Monday, 31 March 2025

3 missing US soldiers found dead in Lithuania, search continues for 4th soldier



from Yahoo News - Latest News & Headlines https://ift.tt/6QXL4tN

New top story on Hacker News: Launch HN: Augento (YC W25) – Fine-tune your agents with reinforcement learning

Launch HN: Augento (YC W25) – Fine-tune your agents with reinforcement learning
18 by lmeierhoefer | 1 comments on Hacker News.
Hi HN, we’re the cofounders of Augento ( https://augento.ai/ ). We’re building DeepSeek R1-style fine-tuning as a service. You connect your agent, tell us when it’s right or wrong, and we deliver an LLM optimized for that agent. There’s a demo video at https://www.youtube.com/watch?v=j5RQaTdRrKE , our docs are at https://ift.tt/yspL5BY , and it’s open for anyone to use at https://augento.ai .

Agents fail all the time, especially when you try to use them for something actually useful. Current approaches suck: prompting has intrinsic limits, and supervised fine-tuning requires big explicit datasets that are hard to collect. Two months ago, the DeepSeek R1 paper outlined a way to post-train LLMs with (almost) pure reinforcement learning. We took up their research and built a fine-tuning platform around it. You let us intercept your agent's data flow, and we deliver a fine-tuned open-source model that is trained on the agent's specific task. Instead of providing big datasets of explicit fine-tuning samples, you provide a reward function that judges the model's outputs.

Here are some examples of what this can be used for:

- Coding Agent: We fine-tuned a coding agent that was constantly making syntax errors and failing to handle semantic edge cases properly. By providing a reward function that evaluated its code against the compiler, the agent learned to stop producing these errors. The fine-tuned model reduced critical bugs by 40% with just 20 training samples.
- MCP Tool Specialization: Imagine you have a custom set of internal tools using the MCP protocol, but your agent keeps selecting the wrong tool or passing incompatible parameters. You could fine-tune with a reward function that scores tool selection and parameter matching.
- Browser Agent Navigation: If you're building a browser agent that struggles with complex web UIs or specific sites, you could fine-tune it to better understand UI elements and navigation patterns. With a reward function that scores successful task completion (like "find the best price for this product" or "complete this multi-step form"), you could train an agent that better identifies clickable elements, understands form validation errors, and navigates complex SPAs without getting stuck.
- VLA Robot Control: If you're using vision-language models to control robotic arms or other hardware, you could fine-tune for your specific actuator setup. With a reward function based on high-level task completion, you could train a Vision-Language-Action (VLA) model that translates natural language commands like "move the red block behind the blue cylinder" into actuator controls for your specific hardware.

As you can see from these examples, the current paradigm is best suited for "verifiable domains", where it is possible to give an explicit function that judges the model's outputs. Up next, though, we will also support an "alignment mode", where you don't have to provide a reward function: instead, you give high-level feedback on past failure runs of your agent. Just tag where things went wrong, and we'll handle the rest. This makes it even easier to improve your agents without needing to write formal reward functions.

Our platform is not itself open source, but it fine-tunes open-source language models, i.e. it is an alternative to OpenAI's reinforcement fine-tuning API, but with Qwen, Llama, DeepSeek, etc., and more customizability on the reward model. We charge users for the training and for their inference/interaction with the model later on ($0 monthly flat fee + training cost + inference cost). The platform is self-serve and open to use at https://ift.tt/ys5SA1o . We’ll give you $20 in training credits, which should be enough to connect your agent and deliver some observable improvement on your use case. We’d love to hear your thoughts and feedback!
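To make the coding-agent example concrete, here is a minimal sketch of what a compiler-backed reward function could look like. This is an illustration, not Augento's actual interface: the `reward(prompt, completion)` signature and the scoring scale are assumptions, and a real setup would grade far more than syntax.

```python
def reward(prompt: str, completion: str) -> float:
    """Hypothetical reward function: score a generated Python snippet
    by whether it at least compiles. A real reward would also run
    tests, check style, or probe semantic edge cases."""
    try:
        # compile() catches syntax errors without executing the code
        compile(completion, "<completion>", "exec")
    except SyntaxError:
        return 0.0   # hard failure: the model produced invalid syntax
    return 1.0       # compiles; a richer reward would grade further

# The RL loop would call this on each sampled completion:
print(reward("write a max function",
             "def max2(a, b):\n    return a if a > b else b"))
```

The key property is that the reward is computed automatically from the output itself, which is what makes coding a "verifiable domain" in the post's terminology.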

Friday, 28 March 2025

Canada Announces Bombshell Break With U.S. Over Trump



from Yahoo News - Latest News & Headlines https://ift.tt/KoDnI13

New top story on Hacker News: We hacked Google's A.I Gemini and leaked its source code (at least some part)

We hacked Google's A.I Gemini and leaked its source code (at least some part)
20 by topsycatt | 0 comments on Hacker News.


New top story on Hacker News: Show HN: Hexi, modern header-only network binary serialisation for C++ hackers

Show HN: Hexi, modern header-only network binary serialisation for C++ hackers
13 by Chaosvex | 2 comments on Hacker News.
Over the last few years, I've needed an easy way to quickly serialise and deserialise various network protocols safely and efficiently. Most of the libraries that existed at the time were either quite heavy, had less-than-stellar performance, or sat an abstraction level above what I was looking for. I decided to put together my own class to do the job, starting with an easy, low-overhead way to move bytes in and out of arbitrary buffers.

Along the way, it picked up useful bits and pieces, such as buffer structures and allocators that made the byte shuffling faster, often doing it with zero allocations and zero copies. Safety features came along to make sure that malicious packet data or mistakes in the code wouldn't result in segfaults or vulnerabilities.

It's become useful enough to me that I've packaged it up in its own standalone library on the chance that it might be useful to others. It has zero dependencies other than the standard library and has been designed for quick integration into any project within minutes, or seconds with a copy-paste of the amalgamated header. It can be used in production code, but it's also ideal for those who want to quickly hack away at binary data with minimal fuss.
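Hexi itself is C++, but the core idea — bounds-checked reads out of an existing byte buffer, so a truncated or malicious packet fails cleanly instead of reading out of bounds — can be sketched in a few lines of Python (kept in Python for consistency with this blog's other examples; this is the concept only, not Hexi's API):

```python
import struct

class BufferReader:
    """Minimal bounds-checked binary reader over an existing buffer.
    memoryview avoids copying the underlying bytes (zero-copy reads)."""
    def __init__(self, data: bytes):
        self.view = memoryview(data)
        self.pos = 0

    def read(self, fmt: str):
        size = struct.calcsize(fmt)
        if self.pos + size > len(self.view):
            # Truncated/malicious packet: fail cleanly, never read past the end
            raise ValueError("buffer underrun")
        value = struct.unpack_from(fmt, self.view, self.pos)
        self.pos += size
        return value[0] if len(value) == 1 else value

# Parse a little-endian header: u16 opcode, then u32 payload length
reader = BufferReader(b"\x01\x00\x05\x00\x00\x00")
opcode = reader.read("<H")
length = reader.read("<I")
```

In C++ the same discipline replaces segfaults with a recoverable error path, which is the safety property the post describes.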

Friday, 14 March 2025

New top story on Hacker News: Show HN: Pi Labs – AI scoring and optimization tools for software engineers

Show HN: Pi Labs – AI scoring and optimization tools for software engineers
9 by achintms | 0 comments on Hacker News.
Hey HN, after years building some of the core AI and NLU systems in Google Search, we decided to leave and build outside. Our goal was to put the advanced ML and DS techniques we’ve been using into the hands of all software engineers, so that everyone can build AI and search apps at the same level of performance and sophistication as the big labs.

This was a hard technical challenge, but we were very inspired by the MVC architecture for web development. The intuition there is that when a data model changes, its view gets auto-updated. We built a similar architecture for AI. On one side is a scoring system, which encapsulates in a set of metrics what’s good about the AI application. On the other side is a set of optimizers that “compile” against this scorer: prompt optimization, data filtering, synthetic data generation, supervised learning, RL, etc. The scoring system can be calibrated using developer, user, or rater feedback, and once it’s updated, all the optimizers get recompiled against it.

The result is a setup that makes it easy to incrementally improve the quality of your AI in a tight feedback loop: you update your scorers, they auto-update your optimizers, your app gets better, you see that improvement in interpretable scores, and then you repeat, progressing from simpler to more advanced optimizers and from off-the-shelf to calibrated scorers.

We would love your feedback on this approach. https://build.withpi.ai has a set of playgrounds to help you quickly build a scorer and multiple optimizers; no sign-in required. https://code.withpi.ai has the API reference and Notebook links. Finally, we have a Loom demo [1].

More technical details

Scorers: Our scoring system has three key differences from the common LLM-as-a-judge pattern.

First, rather than a single label or metric from an LLM judge, our scoring system is represented as a tunable tree of metrics, with 20+ dimensions which get combined into a final (non-linear) weighted score. The tree structure makes scores easily interpretable (just look at the breakdown by dimension), extensible (just add or remove a dimension), and adjustable (just re-tune the weights). Training the scoring system with labeled/preference data adjusts the weights. You can automate this process with user feedback signals, resulting in a tight feedback loop.

Second, our scoring system handles natural-language dimensions (great for free-form, qualitative questions requiring NLU) alongside quantitative dimensions (like computations over dates or doc length, which can be provided in Python) in the same tree. When calibrating with your labeled or preference data, the scorer learns how to balance them.

Third, for natural-language scoring, we use specialized smaller encoder models rather than autoregressive models. Encoders are a natural fit for scoring: they are faster and cheaper to run, easier to fine-tune, and more suitable architecturally (bi-directional attention with a regression or classification head) than similarly sized decoder models. For example, we can score 20+ dimensions in sub-100ms, making it possible to use scoring everywhere from evaluation to agent orchestration to reward modeling.

Optimizers: We took the most salient ML techniques and reformulated them as optimizers against our scoring system, e.g. for DSPy, the scoring system acts as its validator; for GRPO, it acts as the reward model. We’re keen to hear the community’s feedback on which techniques to add next.

Overall stack: Playgrounds: Next.js and Vercel. AI: Runpod and GCP for training GPUs, TRL for training algorithms, ModernBERT and Llama as base models, and GCP and Azure for 4o and Anthropic calls.

We’d love your feedback and perspectives: our team will be around to answer questions and discuss. If there’s a lot of interest, we're happy to host a live session! - Achint, co-founder of Pi Labs

[1] https://ift.tt/7KRmcGY
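A toy version of the tunable metric tree described above might look like the following. Every name, weight, and heuristic here is invented for illustration — Pi's real scorers use encoder models and 20+ dimensions combined non-linearly — but it shows why the structure is interpretable (per-dimension breakdown) and adjustable (re-tune the weights):

```python
def score_relevance(response: str, query: str) -> float:
    # Stand-in for an encoder-model score; here: crude keyword overlap
    q = set(query.lower().split())
    r = set(response.lower().split())
    return len(q & r) / max(len(q), 1)

def score_length(response: str) -> float:
    # Quantitative dimension: penalize very short or very long answers
    n = len(response.split())
    return min(n / 20, 1.0) if n <= 100 else 0.5

def tree_score(response: str, query: str, weights=None) -> float:
    # The weights are the tunable part: calibrating on labeled or
    # preference data would re-fit them automatically.
    weights = weights or {"relevance": 0.7, "length": 0.3}
    dims = {
        "relevance": score_relevance(response, query),
        "length": score_length(response),
    }
    # Simple weighted sum; the real system combines dimensions non-linearly
    return sum(weights[k] * v for k, v in dims.items())
```

An optimizer "compiling" against this scorer just means it treats `tree_score` as its objective (DSPy validator, GRPO reward, etc.), so re-tuning the weights automatically changes what every optimizer pursues.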

Wednesday, 12 March 2025

Larry Ellison’s Oracle just reported $130 billion in future contracts—which doesn’t include even a single transaction from Stargate



from Yahoo News - Latest News & Headlines https://ift.tt/ozGc9tB

New drone-destroying laser weapon with 1-mile-range tested for Turkey’s ‘Steel Dome’



from Yahoo News - Latest News & Headlines https://ift.tt/5jGt8FP

New top story on Hacker News: Show HN: Nuanced – Help AI understand code structure, not just text

Show HN: Nuanced – Help AI understand code structure, not just text
12 by aymandfire | 3 comments on Hacker News.
Hi HN! We built Nuanced ( https://ift.tt/8zeQufh ), an open-source Python library that makes AI coding tools smarter about code structure.

The problem: current AI coding assistants see code as just text. They don't understand which functions call which, or how code components depend on each other. This is why they often suggest changes that break dependencies they should have known about. Nuanced solves this by generating call graphs that map these relationships. When you ask "What would break if I change this function?", instead of guessing, the AI can see the actual dependencies.

How it works:

1. Run `nuanced init .` to analyze a Python module in your codebase
2. Use `nuanced enrich app/file.py function_name` to get relationship data
3. Include this data in your AI prompts or integrate it into tools

We're already working with teams building AI coding assistants and security review tools to integrate this capability. Our initial release supports Python, with plans for JavaScript/TypeScript next. I'd love your feedback, especially if you're building dev tools that could benefit from better code structure understanding!
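Step 3 above — folding the relationship data into a prompt — can be sketched as follows. The JSON shape of the enrichment is invented for illustration (see Nuanced's docs for the actual `enrich` output); the point is that structural context goes in front of the question so the model sees real dependencies instead of guessing:

```python
import json

# Hypothetical call-graph fragment for one function; the real schema
# produced by `nuanced enrich` may differ.
enrichment = {
    "function": "app.billing.compute_total",
    "callers": ["app.api.checkout"],
    "callees": ["app.billing.apply_discount", "app.tax.lookup_rate"],
}

def build_prompt(question: str, graph: dict) -> str:
    # Prepend the structural context so the model can reason about
    # dependencies rather than raw text alone.
    return (
        "Call-graph context:\n"
        + json.dumps(graph, indent=2)
        + "\n\nQuestion: " + question
    )

prompt = build_prompt("What would break if I change compute_total?", enrichment)
```

The assembled `prompt` is then what you would pass to your coding assistant alongside the source itself.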

Sunday, 9 March 2025

New top story on Hacker News: Europe bets once again on RISC-V for supercomputing

Europe bets once again on RISC-V for supercomputing
37 by muxamilian | 15 comments on Hacker News.


New top story on Hacker News: Show HN: Evolving Agents Framework

Show HN: Evolving Agents Framework
27 by matiasmolinas | 3 comments on Hacker News.
Hey HN, I've been working on an open-source framework for creating AI agents that evolve, communicate, and collaborate to solve complex tasks. The Evolving Agents Framework allows agents to:

- Reuse, evolve, or create new agents dynamically based on semantic similarity
- Communicate and delegate tasks to other specialized agents
- Continuously improve by learning from past executions
- Define workflows in YAML, making it easy to orchestrate agent interactions
- Search for relevant tools and agents using OpenAI embeddings
- Support multiple AI frameworks (BeeAI, etc.)

Current Status & Roadmap: This is still a draft and a proof of concept (POC). Right now, I’m focused on validating it in real-world scenarios to refine and improve it. Next week, I'm adding a new feature to make it useful for distributed multi-agent systems. This will allow agents to work across different environments, improving scalability and coordination.

Why? Most agent-based AI frameworks today require manual orchestration. This project takes a different approach by allowing agents to decide and adapt based on the task at hand. Instead of always creating new agents, it determines if existing ones can be reused or evolved.

Example Use Case: Let’s say you need an invoice analysis agent.
Instead of manually configuring one, our framework:

1. Checks if a similar agent exists (e.g., a document analyzer)
2. Decides whether to reuse, evolve, or create a new agent
3. Runs the best agent and returns the extracted information

Here's a simple example in Python:

    import asyncio
    from evolving_agents.smart_library.smart_library import SmartLibrary
    from evolving_agents.core.llm_service import LLMService
    from evolving_agents.core.system_agent import SystemAgent

    async def main():
        library = SmartLibrary("agent_library.json")
        llm = LLMService(provider="openai", model="gpt-4o")
        system = SystemAgent(library, llm)
        result = await system.decide_and_act(
            request="I need an agent that can analyze invoices and extract the total amount",
            domain="document_processing",
            record_type="AGENT"
        )
        print(f"Decision: {result['action']}")  # 'reuse', 'evolve', or 'create'
        print(f"Agent: {result['record']['name']}")

    if __name__ == "__main__":
        asyncio.run(main())

Next Steps:

- Validating in real-world use cases and improving agent evolution strategies
- Adding distributed multi-agent support for better scalability
- Full integration with BeeAI Agent Communication Protocol (ACP)
- Better visualization tools for debugging

Would love feedback from the HN community! What features would you like to see? Repo: https://ift.tt/CJ8GUsI
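The reuse/evolve/create decision rests on semantic similarity between the request and the library's existing agents. A stripped-down version of that logic might look like this — the thresholds and toy 2-d vectors are invented for illustration (the framework itself uses OpenAI embeddings):

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def decide(request_vec, library):
    """library: list of (agent_name, embedding). Returns (action, agent)."""
    best_name, best_sim = None, -1.0
    for name, vec in library:
        sim = cosine(request_vec, vec)
        if sim > best_sim:
            best_name, best_sim = name, sim
    if best_sim >= 0.9:          # close match: reuse the agent as-is
        return "reuse", best_name
    if best_sim >= 0.6:          # related: evolve the existing agent
        return "evolve", best_name
    return "create", None        # nothing similar enough: make a new one

# Toy 2-d "embeddings" standing in for real embedding vectors
library = [("document_analyzer", [1.0, 0.1]), ("chat_agent", [0.0, 1.0])]
print(decide([0.9, 0.2], library))
```

An invoice-analysis request would land close to the existing `document_analyzer`, producing a "reuse" decision rather than spinning up a new agent.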

Monday, 3 March 2025

New top story on Hacker News: Show HN: Open-Source Windows AI assistant that uses Word, Excel through COM

Show HN: Open-Source Windows AI assistant that uses Word, Excel through COM
16 by edmgood | 0 comments on Hacker News.
This started off as a project to understand how to get LLMs to interface with more traditional desktop software. We were especially interested in tools related to schematic drafting and molecular simulation, and decided to explore COM automation for more traditional Windows software as a starting point! We've been using it to help some friends automate simple Excel workflows. Thought we'd share!
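For anyone curious what driving Excel over COM looks like from Python, here is a minimal sketch using pywin32's `win32com.client.Dispatch` (Windows-only; the Excel object model calls are standard, but this is our illustration, not the project's actual code):

```python
def a1(row: int, col: int) -> str:
    """Convert 1-based (row, col) to an A1-style reference, e.g. (2, 3) -> 'C2'."""
    letters = ""
    while col > 0:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters + str(row)

def fill_sheet(rows):
    # Windows-only: automate Excel through its COM interface via pywin32
    import win32com.client
    excel = win32com.client.Dispatch("Excel.Application")
    excel.Visible = True
    wb = excel.Workbooks.Add()
    ws = wb.Worksheets(1)
    for r, row in enumerate(rows, start=1):
        for c, value in enumerate(row, start=1):
            ws.Range(a1(r, c)).Value = value
    return wb

if __name__ == "__main__":
    fill_sheet([["item", "qty"], ["widgets", 12]])
```

The same `Dispatch` pattern reaches Word and any other COM-exposed Windows application, which is what makes this approach generalize beyond Excel.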