Monday, 16 December 2024

New top story on Hacker News: Images of gamma-ray flare from supermassive black hole M87

Images of gamma-ray flare from supermassive black hole M87
3 by gmays | 0 comments on Hacker News.


New top story on Hacker News: Our muscles will atrophy as we climb the Kardashev Scale

Our muscles will atrophy as we climb the Kardashev Scale
22 by hosolmaz | 9 comments on Hacker News.


New top story on Hacker News: Lfgss shutting down 16th March 2025 (day before Online Safety Act is enforced)

Lfgss shutting down 16th March 2025 (day before Online Safety Act is enforced)
54 by buro9 | 17 comments on Hacker News.


New top story on Hacker News: Show HN: NCompass Technologies – yet another AI Inference API, but hear us out

Show HN: NCompass Technologies – yet another AI Inference API, but hear us out
3 by adiraja | 5 comments on Hacker News.
Hello HackerNews! I’m excited to share what we’ve been working on at nCompass Technologies: an AI inference platform that gives you a scalable and reliable API to access any open-source AI model, with no rate limits. We don't have rate limits because optimizations we made to our AI model serving software enable us to support a high number of concurrent requests without degrading quality of service for you as a user.

If you’re thinking, "well, aren’t there a bunch of these already?", so were we when we started nCompass. When using other APIs, we found that they weren’t reliable enough to use open-source models in production environments. To resolve this, we're building an AI inference engine that enables you, as an end user, to reliably use open-source models in production.

Underlying this API, we’re building optimizations at the hosting, scheduling and kernel levels with a single goal: minimizing the number of GPUs required to maximize the number of concurrent requests you can serve, without degrading quality of service. We’re still building a lot of our optimizations, but we’ve released what we have so far via our API. We currently keep time-to-first-token (TTFT) 2-4x lower than vLLM at the equivalent concurrent request rate. You can check out a demo of our API here: https://ift.tt/fPrksIQ

As a result of the optimizations we’ve rolled out so far, we’re releasing a few unique features on our API:

1. Rate limits: we don’t have any. Most other APIs out there have strict rate limits and can be rather unreliable. We don’t want APIs for open-source models to remain a solution for prototypes only. We want people to use these APIs like they do OpenAI’s or Anthropic’s and actually build production-grade products on top of open-source models.

2. Underserved models: we have them. There are a ton of models out there, but not all of them are readily available for people to use if they don’t have access to GPUs. We envision our API becoming a system where anyone can launch any custom model of their choice with minimal cold starts and run the model as a simple API call. Our cold starts for any 8B or 70B model are only 40s, and we’ll keep improving this. Towards this goal, we already have models like `ai4bharat/hercule-hi` hosted on our API to support non-English language use cases and models like `Qwen/QwQ-32B-Preview` to support reasoning-based use cases. You can find the other models that we host here: https://ift.tt/fMYyvbd

We’d love for you to try out our API by following the steps here: https://ift.tt/UNKCion . We provide $100 of free credit on sign-up to run models, and like we said, go crazy with your requests; we’d love to see if you can break our system :) We’re still actively building out features and optimizations, and your input can help shape the future of nCompass. If you have thoughts on our platform or want us to host a specific model, let us know at hello@ncompass.tech. Happy Hacking!
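For a sense of what "any open-source model as a simple API call" looks like in practice, here is a minimal Python sketch of assembling a request to a hosted chat-completions endpoint. The base URL, header names, and payload shape are assumptions for illustration (they are not nCompass's documented interface); only the model identifier comes from the post above.

```python
import json

# Placeholder endpoint, invented for illustration only.
API_BASE = "https://api.example-inference.dev/v1"

def build_chat_request(model: str, prompt: str, api_key: str) -> dict:
    """Assemble the URL, headers, and JSON body for one chat-completion call."""
    return {
        "url": f"{API_BASE}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

# With no rate limits on the server side, many such requests could be issued
# concurrently (e.g. via concurrent.futures) without client-side throttling.
req = build_chat_request("Qwen/QwQ-32B-Preview", "Why is the sky blue?", "sk-demo")
```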

Friday, 13 December 2024

New top story on Hacker News: MarkItDown: Python tool for converting files and office documents to Markdown

MarkItDown: Python tool for converting files and office documents to Markdown
2 by Handy-Man | 0 comments on Hacker News.


New top story on Hacker News: Garbage Collected Smart Pointers in Rust via Concurrent Cycle Collection

Garbage Collected Smart Pointers in Rust via Concurrent Cycle Collection
23 by maplant | 1 comment on Hacker News.


New top story on Hacker News: People who are good at reading have different brains

People who are good at reading have different brains
22 by pseudolus | 3 comments on Hacker News.


New top story on Hacker News: Show HN: I made the slowest, most expensive GPT

Show HN: I made the slowest, most expensive GPT
23 by wluk | 13 comments on Hacker News.
This is another one of my automate-my-life projects: I'm constantly asking the same question to different AIs, since there's always the hope of getting a better answer somewhere else. Maybe ChatGPT's answer is too short, so I ask Perplexity. But I realize that's hallucinated, so I try Gemini. That answer sounds right, but I cross-reference with Claude just to make sure.

This doesn't really apply to math/coding (where o1 or Gemini can probably one-shot an excellent response), but more to online search, where information is more fluid and there's no single "right" search engine + text restructuring + model combination every time. Even o1 doesn't have online search, so it's obviously a hard problem to solve. An example is something like "best ski resorts in the US", which will get a different response from every GPT, but most of their rankings won't reflect actual skiers' consensus, say, on Reddit https://ift.tt/6CzA0R4... Because there are so many opinions floating around, a one-shot RAG search + LLM isn't going to have enough context to capture what everyone thinks. And obviously, offline GPTs like o1 and Sonnet/Haiku won't have the latest updates if, for example, a resort closes.

So I’ve spent the last few months experimenting with a new project that's basically the most expensive GPT I’ll ever run. It runs search queries through ChatGPT, Claude, Grok, Perplexity, Gemini, etc., then aggregates the responses. For added financial tragedy, in between it also uses multiple embedding models and performs iterative RAG searches through different search engines. All of this functions as something like one giant AI brain. So I pay for every search, then every embedding, then every intermediary LLM input/output, then the final LLM input/output. On average it costs about 10 to 30 cents per search. It's also extremely slow. https://ithy.com

I know that sounds absurdly overkill, but that’s kind of the point. The goal is to get the most accurate and comprehensive answer possible, because it's been vetted by a bunch of different AIs, each sourcing from different buckets of websites. Context limits today are just large enough that this type of search and cross-model iteration is possible: we can determine the "overlap" between a diverse set of texts to arrive at some sort of consensus. The idea is to get online answers that aren't attainable from any single AI. If you end up trying this out, I'd recommend comparing Ithy's output against the other GPTs to see the difference.

It's going to cost me a fortune to run this project (I'll probably keep it online for a month or two), but I see it as an exploration of what’s possible with today’s model APIs, rather than something that’s immediately practical. Think of it as an online o1 (without the $200/month price tag, though I'm offering a $29/month Pro plan to help subsidize). If nothing else, it’s a fun (and pricey) thought experiment.

Thursday, 12 December 2024

New top story on Hacker News: Show HN: Gentrace – connect to your LLM app code and run/eval it from a UI

Show HN: Gentrace – connect to your LLM app code and run/eval it from a UI
7 by dsaffy | 0 comments on Hacker News.
Hey HN - Doug from Gentrace here. We originally launched via Show HN in August of 2023 as evaluation and observability for generative AI: https://ift.tt/KX6bsUO

Since then, everyone from the model providers to LLM ops companies has built a prompt playground. We had one too, until we realized this was totally the wrong approach:

- It's not connected to your application code
- They don't support all models
- You have to rebuild evals for just this one prompt (you can't use your end-to-end evals)

In other words, it was a ton of work and time to use these to actually make your app better. So, we built a new experience and are relaunching around this idea: Gentrace is a collaborative LLM app testing and experimentation platform that brings together engineers, PMs, subject matter experts, and more to run and test your actual end-to-end app. To do this, use our SDK to:

- connect your app to Gentrace as a live runner over websocket (local) / via webhook (staging, prod)
- wrap your parameters (e.g. prompt, model, top-k) so they become tunable knobs in the front end
- edit the parameters, then run and evaluate the actual app code with datasets and evals in Gentrace

We think it's great for tuning retrieval systems, upgrading models, and iterating on prompts. It's free to trial. Would love to hear your feedback / what you think!
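The "tunable knob" pattern can be illustrated with a small Python sketch. This is not Gentrace's SDK; the `Knob` class and `run_pipeline` function are invented here to show the general idea: each parameter carries a name and a default, and an experiment run can override any of them without touching the app code.

```python
class Knob:
    """A named parameter with a default that an experiment can override."""
    def __init__(self, name, default):
        self.name = name
        self.default = default

    def get(self, overrides):
        return overrides.get(self.name, self.default)

# Wrapped parameters become addressable by name from the outside.
MODEL = Knob("model", "Qwen/QwQ-32B-Preview")
TOP_K = Knob("top_k", 5)

def run_pipeline(query, overrides=None):
    """The app's end-to-end entry point; a runner passes experiment overrides in."""
    overrides = overrides or {}
    # A real app would retrieve top-k documents and call the model here;
    # this sketch just reports which knob values the run resolved to.
    return {
        "model": MODEL.get(overrides),
        "top_k": TOP_K.get(overrides),
        "query": query,
    }
```

A UI-driven experiment then amounts to calling `run_pipeline` repeatedly with different `overrides` dicts, which is why the wrapped parameters show up as editable knobs.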

Tuesday, 10 December 2024

New top story on Hacker News: Ask HN: Those making $500/month on side projects in 2024 – Show and tell

Ask HN: Those making $500/month on side projects in 2024 – Show and tell
87 by cvbox | 72 comments on Hacker News.
It's that time of the year again, so I'd be interested to hear what new (and old) ideas have come up. Previously asked on:

2023 → https://ift.tt/3mUX6ZK
2022 → https://ift.tt/ehPHQCk
2021 → https://ift.tt/ExoqimD
2020 → https://ift.tt/vRdE92T
2019 → https://ift.tt/XnJdAc1
2018 → https://ift.tt/ABfve6h
2017 → https://ift.tt/gqDshEa

Wednesday, 4 December 2024

New top story on Hacker News: Show HN: I combined spaced repetition with emails so you can remember anything

Show HN: I combined spaced repetition with emails so you can remember anything
15 by iskrataa | 3 comments on Hacker News.
Hey HN, I am a student shipping apps in my free time. This is my 4th for the year! Non-fiction books and podcasts have been part of my life for years now, but I always struggled with remembering what I’ve read or listened to. I wanted it to stick even after years. My notes list grew large, but I never really revisited them.

That’s why I created GinkgoNotes. You can enter notes you want to recall and leave it to the app to create a personalised email schedule based on spaced repetition. That means you’ll get your notes emailed to you a couple of times, exactly when you should read them again (based on Ebbinghaus's forgetting curve), so that it's much more likely you'll remember them. I hope this will be as helpful for you as it was for me. Would love some feedback! Iskren
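A spaced-repetition email schedule like the one described can be sketched in a few lines of Python. The expanding intervals below (1, 3, 7, 14, 30 days) are a common choice loosely inspired by Ebbinghaus's forgetting curve; they are illustrative, not GinkgoNotes's actual algorithm.

```python
from datetime import date, timedelta

# Review gaps grow as each successful recall flattens the forgetting curve.
INTERVALS_DAYS = [1, 3, 7, 14, 30]

def email_schedule(created, intervals=INTERVALS_DAYS):
    """Dates on which a note should be re-sent after it is created."""
    return [created + timedelta(days=d) for d in intervals]

schedule = email_schedule(date(2024, 12, 4))
```

Each note then just needs its creation date stored; the mailer compares today's date against the computed schedule to decide what to send.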