World News - Find latest world news and headlines today based on politics, crime, entertainment, sports, lifestyle, technology and many
Saturday, 31 May 2025
A growing number of New Orleans fugitives' friends and family arrested for aiding in jail escape
from Yahoo News - Latest News & Headlines https://ift.tt/kWZAz8t
Ron DeSantis signs bill making gold, silver legal tender, declaring it’ll give Floridians ‘financial freedom’
from Yahoo News - Latest News & Headlines https://ift.tt/QrpVosL
Friday, 30 May 2025
Here are the 10 riskiest foods to eat in the US right now
from Yahoo News - Latest News & Headlines https://ift.tt/EKUtsGr
Thursday, 29 May 2025
Plea entered in deadly Ottawa County road rage case
from Yahoo News - Latest News & Headlines https://ift.tt/v6iH7tM
Wednesday, 28 May 2025
This Unstoppable Dividend-Paying Growth Stock Is Up More Than 260% in 5 Years. Here's Why It Just Hit an All-Time High.
from Yahoo News - Latest News & Headlines https://ift.tt/5CziDud
Israel destroys Houthis' final aircraft in strike on Sanaa International Airport
from Yahoo News - Latest News & Headlines https://ift.tt/2KIlXfP
Tuesday, 27 May 2025
Jim Cramer's warning misses the mark by over 380%
from Yahoo News - Latest News & Headlines https://ift.tt/lu5aXh8
Monday, 26 May 2025
Katrina survivor builds homes from cargo ship containers to weather all storms
from Yahoo News - Latest News & Headlines https://ift.tt/1Ct8vNS
Sunday, 25 May 2025
Saturday, 24 May 2025
Scientists grow concerned over devastating phenomenon impacting world's largest landlocked body of water: 'It is advisable to start action as soon as possible'
from Yahoo News - Latest News & Headlines https://ift.tt/1Kpy4I6
IndiGo flight facing severe weather was denied diversion requests, India says
from Yahoo News - Latest News & Headlines https://ift.tt/fLwZonH
Friday, 23 May 2025
Dave Ramsey Tells A Game Show Contestant Who Won $200,000 How To Manage The Money: 'You're Kind Of In The Middle'
from Yahoo News - Latest News & Headlines https://ift.tt/xhc97Gt
Thursday, 22 May 2025
Small Plane Crashes into Neighborhood, Igniting Fires in 15 Homes and Leaving Jet Fuel 'All Over'
from Yahoo News - Latest News & Headlines https://ift.tt/1XIxrma
New top story on Hacker News: Show HN: Pi Co-pilot – Evaluation of AI apps made easy
8 by achintms | 0 comments on Hacker News.
Hey HN, 2 months ago we shared our first product with the HN community ( https://ift.tt/InyM7L6 ). Despite receiving lots of traffic from HN, we didn’t see any traction or retention. One of our major takeaways was that our product was too complicated. So we’ve spent the last 2 months iterating towards a much more focused product that tries to do just one thing really well. Today, we’d like to share our second launch with HN.
Our original idea was to help software engineers build high-quality LLM applications by integrating their domain knowledge into a scoring system, which could then drive everything from prompt tuning to fine-tuning, RL, and data filtering. But what we quickly learned (with the help of HN, thank you!) is that most people aren’t optimizing as their first, second, or even third step; they’re just trying to ship something reasonable using system prompts and off-the-shelf models.
In looking to build a product that’s useful to a wider audience, we found one piece of the original product that most people _did_ notice and want: the ability to check that the outputs of their AI apps look good. Whether you’re tweaking a prompt, switching models, or just testing a feature, you still need a way to catch regressions and evaluate your changes. Beyond basic correctness, developers also wanted to measure more subtle qualities, like whether a response feels friendly.
So we rebuilt the product around this single use case: helping developers define and apply subjective, nuanced evals to their LLM outputs. We call it Pi Co-pilot.
You can start with any or all of the below:
- a few good/bad examples
- a system prompt, or app description
- an old eval prompt you wrote
The co-pilot helps you turn that into a scoring spec: a set of ~10–20 concrete questions that probe the output against dimensions of quality you care about (e.g. “is it verbose?”, “does it have a professional tone?”, etc.).
For each question, it either:
- selects a fast encoder-based model (trained for scoring), the Pi scorer. See our original post [1] for more details on why this is a good fit for scoring compared to the “LLM as a judge” pattern.
- or generates Python functions when that makes more sense (word count, regex, etc.)
You iterate over examples, tweak questions, adjust scoring behavior, and quickly reach a spec that reflects your actual taste, not some generic benchmark or off-the-shelf metrics. Then you can plug the scoring system into your own workflow: Python, TypeScript, Promptfoo, Langfuse, Spreadsheets, whatever. We provide easy integrations with these systems.
We took inspiration from tools like v0 and Bolt: natural language on the left, structured artifacts on the right. That pattern felt intuitive: explore conversationally, and let the underlying system crystallize it into things you can inspect and use (scoring spec, examples and code).
Here is a Loom demo of this: https://ift.tt/7vWgcj0
We’d appreciate feedback from the community on whether this second iteration of our product feels more useful. We are offering $10 of free credits (about 25M input tokens), so you can try out the Pi co-pilot for your use cases. No sign-in required to start exploring: https://withpi.ai
Overall stack:
- Co-pilot: Next.js and Vercel on GCP
- Models: 4o on Azure, fine-tuned Llama & ModernBert on GCP
- Training: Runpod and SFCompute
– Achint (co-founder, Pi Labs)
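To make the "scoring spec" idea concrete, here is a minimal sketch of what the Python-function side of such a spec might look like. It is not Pi's API: the `Question` class, the helper names, the weights, and the two example questions are all hypothetical, and the learned Pi scorer is not represented at all. Only the word-count and regex style of check mentioned in the post is shown.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Question:
    """One question in a (hypothetical) scoring spec."""
    label: str
    scorer: Callable[[str], float]  # returns a score in [0, 1]
    weight: float = 1.0


def max_words(limit: int) -> Callable[[str], float]:
    """Word-count check: 1.0 if the text has at most `limit` words, else 0.0."""
    def score(text: str) -> float:
        return 1.0 if len(text.split()) <= limit else 0.0
    return score


def lacks_pattern(pattern: str) -> Callable[[str], float]:
    """Regex check: 1.0 if the pattern is absent from the text, else 0.0."""
    compiled = re.compile(pattern, re.IGNORECASE)
    def score(text: str) -> float:
        return 0.0 if compiled.search(text) else 1.0
    return score


def score_response(spec: List[Question], text: str) -> float:
    """Weighted average of all question scores for one response."""
    total_weight = sum(q.weight for q in spec)
    return sum(q.weight * q.scorer(text) for q in spec) / total_weight


# Illustrative spec; a real one would hold ~10-20 questions.
SPEC = [
    Question("Is it concise (at most 50 words)?", max_words(50)),
    Question("Does it avoid apologetic filler?",
             lacks_pattern(r"\bsorry\b|\bapologi[sz]e\b"), weight=2.0),
]

good = "Your order shipped today and should arrive within three days."
bad = "Sorry, I apologize, but I am not sure."

print(score_response(SPEC, good))
print(score_response(SPEC, bad))
```

Running the same spec before and after a prompt or model change gives a simple regression check: a drop in the aggregate score flags the responses worth inspecting.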
Wednesday, 21 May 2025
Tuesday, 20 May 2025
Experts sound alarm after devastating revelations emerge about renowned athletic brand: 'Nothing has changed'
from Yahoo News - Latest News & Headlines https://ift.tt/zKV3Jjr
Monday, 19 May 2025
Sunday, 18 May 2025
Saturday, 17 May 2025
New top story on Hacker News: Show HN: I built a knife steel comparison tool
19 by p-s-v | 1 comment on Hacker News.
Hey HN! I'm a bit of a knife steel geek and got tired of juggling tabs to compare stats. So, I built this tool: https://ift.tt/oynd4PZ It lets you pick steels (like the ones in the screenshot) and see a radar chart comparing their edge retention, toughness, corrosion resistance, and ease of sharpening on a simple 1-10 scale. [Maybe attach the screenshot here if HN allows, or link to it] It's already been super handy for me, and I thought fellow knife/metallurgy enthusiasts here might find it useful too. Would love to hear your thoughts or any steel requests! Cheers!
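As a rough illustration of the radar-chart comparison described above, the sketch below turns 1-10 ratings on the four named axes into polygon vertices on a unit circle. It is an assumption about how such a chart can be computed, not the tool's actual code; the function name is hypothetical and the ratings are made-up placeholder values, not real steel data.

```python
import math
from typing import Dict, List, Tuple

# The four attributes named in the post, one radar axis each.
AXES = ["edge retention", "toughness", "corrosion resistance", "ease of sharpening"]


def radar_vertices(ratings: Dict[str, float],
                   max_rating: float = 10.0) -> List[Tuple[float, float]]:
    """Map 1-10 ratings to (x, y) polygon vertices, one per axis.

    Axes are spaced evenly around the circle; a rating of `max_rating`
    lands on the unit circle.
    """
    points = []
    for i, axis in enumerate(AXES):
        rating = ratings[axis]
        if not 1 <= rating <= max_rating:
            raise ValueError(f"{axis} rating {rating} is outside 1-{max_rating:g}")
        angle = 2 * math.pi * i / len(AXES)
        radius = rating / max_rating
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points


# Placeholder ratings for two hypothetical steels (not real measurements).
steel_a = {"edge retention": 10, "toughness": 5,
           "corrosion resistance": 10, "ease of sharpening": 5}
steel_b = {"edge retention": 5, "toughness": 10,
           "corrosion resistance": 5, "ease of sharpening": 10}

for name, ratings in (("steel_a", steel_a), ("steel_b", steel_b)):
    print(name, [(round(x, 2), round(y, 2)) for x, y in radar_vertices(ratings)])
```

Overlaying the two polygons shows at a glance where one steel trades off against another, which is the point of comparing all four attributes on a shared scale.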
Friday, 16 May 2025
Former Augusta commissioner found dead in Aiken County apartment
from Yahoo News - Latest News & Headlines https://ift.tt/GuFIALc
Thursday, 15 May 2025
Xiaomi's new EV orders slump in China as consumer backlash grows
from Yahoo News - Latest News & Headlines https://ift.tt/mldGDLu
Wednesday, 14 May 2025
Man on skateboard killed by Tesla in Las Vegas hit-and-run
from Yahoo News - Latest News & Headlines https://ift.tt/OTG2H58
Tuesday, 13 May 2025
Monday, 12 May 2025
Sunday, 11 May 2025
Saturday, 10 May 2025
People In HR Revealed Truly Unhinged Reasons Employees Got Fired, And My Jaw Is On The Floor
from Yahoo News - Latest News & Headlines https://ift.tt/evarLDy
Friday, 9 May 2025
India preparing for possible retaliation from Pakistan after deadly missile strikes
from Yahoo News - Latest News & Headlines https://ift.tt/8PHJDzh
Thursday, 8 May 2025
DOGE committee goes off the rails after Democrats suggest Marjorie Taylor Greene engaged in insider trading
from Yahoo News - Latest News & Headlines https://ift.tt/3cpsHXF
Wednesday, 7 May 2025
Tuesday, 6 May 2025
Donald Trump Flips Out At Wall Street Journal Reporter: ‘You Hear Me? What I Said?’
from Yahoo News - Latest News & Headlines https://ift.tt/2MLr85e
Monday, 5 May 2025
Sunday, 4 May 2025
Saturday, 3 May 2025
As Trump moves to tax small parcels, some retailers give up on US
from Yahoo News - Latest News & Headlines https://ift.tt/rODktQJ
Friday, 2 May 2025
Thursday, 1 May 2025