Wednesday, 1 July 2026

New top story on Hacker News: Hanami 3.0: In Full Bloom

Hanami 3.0: In Full Bloom
10 by PuercoPop | 0 comments on Hacker News.


New top story on Hacker News: Show HN: Pglayers – PostgreSQL extensions as stackable Docker layers

Show HN: Pglayers – PostgreSQL extensions as stackable Docker layers
15 by iemejia | 2 comments on Hacker News.


New top story on Hacker News: Show HN: Morph Reflexes – Multi-head classifiers for agent traces

Show HN: Morph Reflexes – Multi-head classifiers for agent traces
11 by bhaktatejas922 | 1 comment on Hacker News.
The most common failures for production agents are behavioral: looping, reasoning leakage, user frustration, and more. Using a frontier model like GPT or Sonnet to judge every turn is too expensive and too slow to run at scale. To solve this, we built Reflexes: semantic signals from agent traces, served fast and cheap over an API. It is built on custom kernels and a custom inference engine forked from vLLM.

Under the hood, it is a small LLM architected around multi-head inference. Small models need to be trained for specific tasks, but running 50 separate small models on the same input for 50 tasks makes no sense.

How it works: We use a modern LLM with hybrid attention and remove the decode step. One shared backbone reads the trace once, then many heads classify different signals. Our inference engine reuses the same KV cache and compute across inputs and across all reflexes, which lets prefill compute be 99% reused from reflex to reflex. This is similar in spirit to older 2019-era BERT/HYDRA and other multiple-head techniques; we took the same high-level idea and did the hard work to make it work with a modern architecture and attention. On it, we can run inference in under 30ms and serve the full request in under 90ms, with less than 0.1% overhead for each additional reflex. Whether you run 4 reflexes or 100, the extra overhead is less than 2ms.

Why does optimizing this matter? If you’re even a medium-sized startup, you’re dealing with tens of thousands of agent runs and millions of turns. If you want to track things like user frustration rates over time, frontier LLM-as-judge does not scale. I built a similar stack at Tesla. When ML engineers needed to sample data across petabytes for signals like `is_camera_obfuscated=true`, along with 200 other things, they needed to 1) spin them up quickly and 2) run them at scale efficiently.

What it is not: A dashboard. 99% of dashboards go unused. It is 100% API-first and made for devs who want to use this to trigger their own stuff.

Vibetrain a custom reflex in our dashboard, and/or let it self-improve in production: https://ift.tt/EDGbzvJ
Docs: https://ift.tt/OAu70C5

I’d love feedback from people running agents in prod: what sorts of things do you wish you could track over time across 100% of turns but can't right now?

TLDR: semantic signals from agent traces, super fast, cheap via API
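
The "one shared backbone, many heads" idea from the post can be illustrated with a toy sketch. This is not Morph's implementation: the backbone here is a stand-in that averages pseudo-embeddings, the heads are random linear classifiers, and every name (`encode_trace`, `make_head`, `classify`, the head names) is hypothetical. It only shows why adding a head is cheap once the trace has been encoded.

```python
import math
import random

DIM = 8
BACKBONE_CALLS = {"count": 0}  # counts how often the "expensive" pass runs


def encode_trace(tokens):
    """Toy stand-in for the shared backbone: one pass over the trace."""
    BACKBONE_CALLS["count"] += 1
    features = [0.0] * DIM
    for tok in tokens:
        rng = random.Random(tok)  # deterministic pseudo-embedding per token
        for i in range(DIM):
            features[i] += rng.uniform(-1.0, 1.0)
    n = max(len(tokens), 1)
    return [f / n for f in features]


def make_head(name):
    """Toy classifier head: random weights, purely illustrative (untrained)."""
    rng = random.Random(name)
    weights = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
    return {"name": name, "weights": weights, "bias": rng.uniform(-1.0, 1.0)}


def run_head(head, features):
    """Cheap per-head work: one dot product plus a sigmoid."""
    z = sum(w * x for w, x in zip(head["weights"], features)) + head["bias"]
    return 1.0 / (1.0 + math.exp(-z))


def classify(tokens, heads):
    """Encode the trace once, then score it with every head."""
    features = encode_trace(tokens)  # shared cost, paid once per trace
    return {h["name"]: run_head(h, features) for h in heads}


# Hypothetical head names taken from the failure modes listed in the post.
heads = [make_head(n) for n in ("looping", "reasoning_leakage", "user_frustration")]
scores = classify("user: this is the third time I asked".split(), heads)
print(scores, BACKBONE_CALLS)
```

In this sketch the backbone runs once per trace regardless of how many heads are attached, so each extra head costs only one dot product. The real system described in the post achieves the reuse at the KV-cache and kernel level, which this toy does not attempt to model.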

Friday, 26 June 2026

New top story on Hacker News: What Is a Nomogram and Why Would It Interest Me?

What Is a Nomogram and Why Would It Interest Me?
17 by Eridanus2 | 4 comments on Hacker News.


New top story on Hacker News: Ask HN: Is "no source code was copied" still a sufficient copyright defense?

Ask HN: Is "no source code was copied" still a sufficient copyright defense?
17 by oscgam1 | 13 comments on Hacker News.
We are all familiar with the Corgi event: https://ift.tt/HtAY4cl With the barrier to creating new apps having dropped significantly thanks to LLMs, I am seeing more cases about copyright and unfair competition. I've seen and participated in some of these cases. Usually expert witnesses are required. Curious to hear the community's stance on this one. "Now software developers are feeling what authors and artist felt". https://ift.tt/PhfCA9Q There are several claims along the lines of: copying UI is OK, your product is not differentiated enough. Here is a legal assessment of the situation: https://ift.tt/SFn8Hhp