Sunday, 29 March 2026

Friday, 27 March 2026

New top story on Hacker News: Show HN: Sup AI, a confidence-weighted ensemble (52.15% on Humanity's Last Exam)

Show HN: Sup AI, a confidence-weighted ensemble (52.15% on Humanity's Last Exam)
15 by supai | 10 comments on Hacker News.
Hi HN. I'm Ken, a 20-year-old Stanford CS student. I built Sup AI.

I started working on this because no single AI model is right all the time, but their errors don’t strongly correlate. In other words, models often make unique mistakes relative to other models. So I run multiple models in parallel and synthesize the outputs by weighting segments based on confidence. Low entropy in the output token probability distributions correlates with accuracy. High entropy is often where hallucinations begin.

My dad Scott (AI Research Scientist at TRI) is my research partner on this. He sends me papers at all hours, we argue about whether they actually apply and what modifications make sense, and then I build and test things. The entropy-weighting approach came out of one of those conversations.

In our eval on Humanity's Last Exam, Sup scored 52.15%. The best individual model in the same evaluation run got 44.74%. The relative gap is statistically significant (p < 0.001).

Methodology, eval code, data, and raw results:
- https://sup.ai/research/hle-white-paper-jan-9-2026
- https://github.com/supaihq/hle

Limitations:
- We evaluated 1,369 of the 2,500 HLE questions (details in the above links)
- Not all APIs expose token logprobs; we use several methods to estimate confidence when they don't

We tried offering free access and it got abused so badly it nearly killed us. Right now the sustainable option is a $5 starter credit with card verification (no auto-charge). If you don't want to sign up, drop a prompt in the comments and I'll run it myself and post the result.

Try it at https://sup.ai. My dad Scott (@scottmu) is in the thread too. Would love blunt feedback, especially where this really works for you and where it falls short.

Here's a short demo video: https://www.youtube.com/watch?v=DRcns0rRhsg
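
The entropy-weighting idea in the post can be illustrated with a small sketch. This is not Sup AI's implementation: it assumes each model exposes a full probability distribution per output token, it weights whole answers rather than segments, and the function names and the softmax-over-negative-entropy rule are hypothetical choices made for illustration.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mean_entropy(distributions):
    """Average per-token entropy over one model's output."""
    if not distributions:
        raise ValueError("need at least one token distribution")
    return sum(token_entropy(d) for d in distributions) / len(distributions)


def confidence_weights(candidates, temperature=1.0):
    """Map model name -> weight; lower mean entropy gets a higher weight.

    `candidates` maps a model name to its list of per-token distributions.
    The weighting rule (softmax over negative mean entropy) is an assumption.
    """
    scores = {name: -mean_entropy(d) / temperature for name, d in candidates.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def weighted_vote(answers, weights):
    """Pick the answer whose supporting models carry the most total weight."""
    totals = {}
    for name, answer in answers.items():
        totals[answer] = totals.get(answer, 0.0) + weights[name]
    return max(totals, key=totals.get)
```

With one confident model and two uncertain ones, the weighted vote can side with the confident minority, which is the behaviour a plain majority vote cannot give you. Whether that helps in practice depends on how well entropy tracks accuracy for the models involved.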

Saturday, 21 March 2026

New top story on Hacker News: Former FBI Director Robert Mueller Has Died

Former FBI Director Robert Mueller Has Died
26 by WarOnPrivacy | 5 comments on Hacker News.


New top story on Hacker News: Show HN: Joonote – A note-taking app on your lock screen and notification panel

Show HN: Joonote – A note-taking app on your lock screen and notification panel
6 by kilgarenone | 0 comments on Hacker News.
I finally built this app after many years of being sick of unlocking my phone every goddamn time I need to take or view my notes. It particularly sucks when I'm doing my grocery shopping and going down the list.

I started building in June last year. This is a native app written in Kotlin. And since I'm a 100% Web dev guy, I gotta say this wouldn't have been possible without AI to assist me. So this isn't "vibe-coded". I simply used the chat interface on the Gemini website, manually copy-pasting code to build and integrate every single thing in the app! I used Gemini to build it just because I was piggybacking on my last company's enterprise subscription. I personally didn't subscribe to any AI (and still don't cuz the free quota seems enough for me :)

So I certainly have learned a lot about Android development, architecture patterns, Kotlin syntax, and obeying Google's whims. Can't say I love it all, but for the sake of this app, I will :)

Anyway, I finally have the app I wish existed, and I'm using it every day. It not only does the main thing I needed it to do, but there's also all this stuff:
- Make your notes private if you don't want to show them on the lock screen.
- Create check/to-do lists.
- Set one-time or recurring reminders.
- Full-text search your notes in the app.
- Speech-to-text.
- Organize your notes with custom or color labels.
- Pin the app as a widget on your home screen.
- Auto backup and restore your notes on a new install or Android device.
- Works offline.
- And no funny business happening in the background

https://ift.tt/Ki7OaNl

It's a 30-day trial, then a one-time $9.99 to go Pro forever. I would love you all to check it out, FWIW. Ok thanks!

Thursday, 19 March 2026

New top story on Hacker News: Show HN: Dumped Wix for an AI Edge agent so I never have to hire junior staff

Show HN: Dumped Wix for an AI Edge agent so I never have to hire junior staff
5 by axotopia | 2 comments on Hacker News.
I run a building design consultancy. I got tired of paying Wix $40/month for a brochure that couldn’t answer simple service questions, and of wasting hours on the same FAQs. So I killed it all and spent 4 months building a 'talker': https://axoworks.com

The stack is completely duct-taped: Netlify’s 10s serverless timeout forced me to split the agent into three pieces: Brain (Edge), Hands (Browser), and Voice (Edge). I haven’t coded in 30 years. This was 3 steps forward, 2 steps back, heavily guided by AI.

The fight that proved it worked: 2 weeks ago, a licensed architect attacked the bot, trying to prove my business model harms the profession. The AI (DeepSeek-R3) completely dismantled his arguments. It was hilariously caustic. Log: https://ift.tt/BsFleQ5...

A few battle scars:
* Web Speech API works fine, right up until someone speaks Chinese without toggling the language mode. Then it forcefully spits out English phonetic gibberish. Still a headache.
* Liability is the killer. Hallucinate a building code clause? We’re dead. Insurance won’t touch us.
* We publish the audit logs to keep ourselves honest and make sure the system stays hardened. Audit: https://ift.tt/FQAgSMv

The hardest part was getting the intent right: making one LLM pivot seamlessly from a warm principal’s tone with a homeowner, to a defensive bulldog when attacked by a peer. That took 2.5 months of tuning.

We burn through tokens with an 'Eager RAG' hack (pre-fetching guesses) just to improve responsiveness. I also ripped out the “essential” persistent DBs. Less than 5% of visitors ever return, so why bother? If a client drops mid-query, their session vanishes. No server-side queues.

The point: To let me operate with a network of seasoned pros, and trim the fat.

Try to break it. I’ll be in the comments. Kee

Wednesday, 18 March 2026

Sunday, 15 March 2026

New top story on Hacker News: Grandparents are glued to their phones, families are worried [video]

Grandparents are glued to their phones, families are worried [video]
68 by tartoran | 32 comments on Hacker News.


New top story on Hacker News: SuperTux 0.7.0

SuperTux 0.7.0
20 by pentagrama | 1 comment on Hacker News.


New top story on Hacker News: Ask HN: How is AI-assisted coding going for you professionally?

Ask HN: How is AI-assisted coding going for you professionally?
18 by svara | 6 comments on Hacker News.
Comment sections on AI threads tend to split into "we're all cooked" and "AI is useless." I'd like to cut through the noise and learn what's actually working and what isn't, from concrete experience. If you've recently used AI tools for professional coding work, tell us about it. What tools did you use? What worked well and why? What challenges did you hit, and how (if at all) did you solve them? Please share enough context (stack, project type, team size, experience level) for others to learn from your experience. The goal is to build a grounded picture of where AI-assisted development actually stands in March 2026, without the hot air.

New top story on Hacker News: In Memoriam: John W. Addison, my PhD advisor

In Memoriam: John W. Addison, my PhD advisor
9 by herodotus | 0 comments on Hacker News.


Friday, 6 March 2026

New top story on Hacker News: Show HN: Claude-replay – A video-like player for Claude Code sessions

Show HN: Claude-replay – A video-like player for Claude Code sessions
13 by es617 | 7 comments on Hacker News.
I got tired of sharing AI demos with terminal screenshots or screen recordings. Claude Code already stores full session transcripts locally as JSONL files. Those logs contain everything: prompts, tool calls, thinking blocks, and timestamps.

I built a small CLI tool that converts those logs into an interactive HTML replay. You can step through the session, jump through the timeline, expand tool calls, and inspect the full conversation. The output is a single self-contained HTML file with no dependencies. You can email it, host it anywhere, embed it in a blog post, and it works on mobile.

Repo: https://ift.tt/zRu03k4
Example replay: https://es617.github.io/assets/demos/peripheral-uart-demo.ht...
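
As a rough illustration of the JSONL-to-HTML conversion described above, here is a minimal sketch. It is not the claude-replay code and does not assume the real Claude Code transcript schema: the `role`, `timestamp`, and `text` fields, and both function names, are hypothetical stand-ins.

```python
import html
import json


def load_events(jsonl_text):
    """Parse JSONL: one JSON object per non-empty line."""
    events = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if line:
            events.append(json.loads(line))
    return events


def render_replay(events, title="Session replay"):
    """Return one self-contained HTML page with an expandable block per event.

    The field names read here (role, timestamp, text) are assumptions.
    """
    rows = []
    for event in events:
        role = html.escape(str(event.get("role", "unknown")))
        stamp = html.escape(str(event.get("timestamp", "")))
        text = html.escape(str(event.get("text", "")))  # escape so logs cannot inject markup
        rows.append(
            f"<details open><summary>{stamp} {role}</summary><pre>{text}</pre></details>"
        )
    body = "\n".join(rows)
    return (
        "<!doctype html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body></html>\n"
    )
```

Writing the returned string to a single `.html` file gives a page with no external assets, which is what makes it easy to email or host anywhere. A real player would add timeline navigation on top of this.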

Tuesday, 3 March 2026

New top story on Hacker News: Show HN: Open-Source Article 12 Logging Infrastructure for the EU AI Act

Show HN: Open-Source Article 12 Logging Infrastructure for the EU AI Act
12 by systima | 0 comments on Hacker News.
EU legislation (which affects UK and US companies in many cases) requires being able to truly reconstruct agentic events.

I've worked in a number of regulated industries off & on for years, and recently hit this gap. We already had strong observability, but if someone asked me to prove exactly what happened for a specific AI decision X months ago (and demonstrate that the log trail had not been altered), I could not.

The EU AI Act has already entered into force, and its Article 12 kicks in in August this year, requiring automatic event recording and six-month retention for high-risk systems, which many legal commentators have suggested reads more like an append-only ledger requirement than standard application logging.

With this in mind, we built a small, free, open-source TypeScript library for Node apps using the Vercel AI SDK that captures inference as an append-only log. It wraps the model in middleware, automatically logs every inference call to structured JSONL in your own S3 bucket, chains entries with SHA-256 hashes for tamper detection, enforces a 180-day retention floor, and provides a CLI to reconstruct a decision and verify integrity. There is also a coverage command that flags likely gaps (in practice omissions are a bigger risk than edits).

The library is deliberately simple: TS, targeting Vercel AI SDK middleware, S3 or local fs, linear hash chaining. It also works with Mastra (agentic framework), and I am happy to expand its integrations via PRs.

Blog post with link to repo: https://ift.tt/5NXdvct

I'd value feedback, thoughts, and any critique.
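
The linear hash chaining mentioned above can be sketched in a few lines. This is an illustration in Python rather than the library's TypeScript, and it is not the library's actual format: the entry layout, the all-zero genesis value, and the function names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash, payload):
    """SHA-256 over the previous hash plus a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log, payload):
    """Append a record that commits to everything logged before it."""
    prev = log[-1]["hash"] if log else GENESIS
    entry = {"prev_hash": prev, "payload": payload, "hash": entry_hash(prev, payload)}
    log.append(entry)
    return entry


def verify_chain(log):
    """Return the index of the first inconsistent entry, or None if the chain holds."""
    prev = GENESIS
    for index, entry in enumerate(log):
        expected = entry_hash(prev, entry["payload"])
        if entry["prev_hash"] != prev or entry["hash"] != expected:
            return index
        prev = entry["hash"]
    return None
```

Editing or deleting an entry in the middle breaks every later link, so `verify_chain` reports where the trail stops being trustworthy. A plain chain like this does not detect truncation at the tail or calls that were never logged, which fits the post's remark that omissions are a bigger risk than edits.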