What
See the result here. It shows 2804 predictions HN users made, evaluated by an LLM, with categories, filters, and search. Try typing “Linux” in the search bar! The source code is on GitHub.
Why
I stumbled upon a 2010 thread of predictions for the upcoming decade and had fun reading through it. I thought maybe I could find people who were good at predicting the future, and check what they think about the next decade. I decided to use LLMs to evaluate these predictions: with their extensive knowledge of recent history, they should be capable of assessing the accuracy of most of them.
How
Data
Initially, I aimed to analyze all predictions made by Hacker News users. I used a ClickHouse dataset, filtering the data with regular expressions and similar techniques. Claude helped me write the SQL queries, but the results still contained significant noise. To streamline the process, I narrowed the focus to 12 specific Ask HN prediction threads, with the idea of extending it to all HN comments later. Lesson: ClickHouse is very nice and can be a go-to (after SQLite).
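The post doesn’t show the actual filters, so here is only a minimal sketch of the kind of regex-based first pass that could flag prediction-like comments (the patterns and function name are my own illustration, not the project’s):

```python
import re

# Hypothetical patterns for prediction-like language; the real SQL/regex
# filters used in the project are not shown in the post.
PREDICTION_PATTERNS = [
    r"\bby (20[1-3][0-9])\b",     # "by 2020", "by 2030"
    r"\bin (ten|10) years\b",
    r"\bwill (be|have|become)\b",
    r"\bI predict\b",
]
PREDICTION_RE = re.compile("|".join(PREDICTION_PATTERNS), re.IGNORECASE)

def looks_like_prediction(comment: str) -> bool:
    """Cheap first-pass filter; still noisy, as the post notes."""
    return PREDICTION_RE.search(comment) is not None

print(looks_like_prediction("I predict Linux will be mainstream by 2020"))  # True
print(looks_like_prediction("Thanks, great article!"))                      # False
```

A pass like this over-matches badly (any future-tense sentence triggers it), which is consistent with the noise the post describes and why narrowing to dedicated prediction threads was simpler.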
Model Selection
I used the LMSYS Leaderboard to find an open-source model that fits on my Mac M1 Max for local execution. The goal was to obtain structured JSON output. While APIs like Google’s Gemini Flash were cheap, they required multiple queries to achieve the desired output format, because they didn’t force JSON output but only treated the schema as a suggestion. Ultimately, I selected the Nous Research Hermes-2-Theta-Llama-3-70B-GGUF model. The GGUF weights were readily available on HuggingFace, allowing me to configure my JSON schema and run llama.cpp with an additional grammar restriction on the output. This approach worked, but with a processing time of approximately 4 minutes per comment, so I just let it run overnight. Lesson: local LLMs are pretty good already, and pretty simple to set up. Hope Ollama will get proper grammar support too, though.
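The post doesn’t list the schema fields, but the idea is that llama.cpp can convert a JSON schema into a GBNF grammar that constrains sampling so the model can only emit matching JSON. A sketch of what such a schema and a minimal check on the model’s response might look like (field names are hypothetical):

```python
import json

# Hypothetical evaluation schema; the actual fields used in the project
# are not listed in the post.
SCHEMA = {
    "type": "object",
    "properties": {
        "category": {"type": "string"},
        "verdict": {"type": "string", "enum": ["correct", "incorrect", "unclear"]},
        "explanation": {"type": "string"},
    },
    "required": ["category", "verdict", "explanation"],
}

def parse_evaluation(raw: str) -> dict:
    """Parse a grammar-constrained model response and check required fields."""
    data = json.loads(raw)
    missing = [k for k in SCHEMA["required"] if k not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data

sample = '{"category": "Technology", "verdict": "correct", "explanation": "Linux grew on servers."}'
print(parse_evaluation(sample)["verdict"])  # correct
```

With grammar-constrained decoding the parse step essentially cannot fail, which is the advantage over APIs that only nudge the model toward JSON.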
Web Implementation
For the frontend, I used the Skeleton.dev UI toolkit for SvelteKit with DataTables integration. The entire project was deployed to CloudFlare. Given the current dataset size, I opted to store the data in a TypeScript file rather than setting up an external database. However, I’m considering WebAssembly SQLite or CloudFlare’s D1 for future scalability. Lesson: website building is very straightforward these days, especially with a Svelte template.
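Storing the data as a TypeScript file just means serializing the evaluated predictions as an importable module. A sketch of how the processing side could emit such a file (the function name and record fields are my own, not from the project):

```python
import json

def to_ts_module(predictions: list[dict]) -> str:
    """Serialize evaluated predictions as a TypeScript module the
    SvelteKit frontend can import directly, avoiding a database."""
    body = json.dumps(predictions, indent=2)
    return f"export const predictions = {body} as const;\n"

ts = to_ts_module([{"user": "someuser", "year": 2010, "verdict": "correct"}])
print(ts.splitlines()[0])  # export const predictions = [
```

This works fine at a few thousand rows shipped with the bundle; past that, WebAssembly SQLite or D1 would avoid sending the whole dataset to every visitor.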
Results
I deployed the website to https://hn-predictions.eamag.me/. I’ve calculated some basic statistics I found interesting, like predictions about Linux or Bitcoin, but I urge you to go through the table and find something else! The LLM is mostly correct, but often misses nuances or is too conservative in its estimations. The overall success rate is about 50%, so a coin flip. There are people with good predictions, but they never predicted again :( Politics is the hardest category for the HN audience, and Technology is not its strongest area either. Not many people predicted the business highlights of Microsoft and NVIDIA.
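The per-category numbers above come down to a simple grouped ratio over the evaluated predictions. A sketch with made-up sample data (the real dataset is on the website):

```python
from collections import defaultdict

def success_rates(evaluations: list[dict]) -> dict[str, float]:
    """Fraction of correct predictions per category."""
    totals, hits = defaultdict(int), defaultdict(int)
    for e in evaluations:
        totals[e["category"]] += 1
        hits[e["category"]] += e["verdict"] == "correct"
    return {cat: hits[cat] / totals[cat] for cat in totals}

# Made-up sample rows, not real project data.
sample = [
    {"category": "Technology", "verdict": "correct"},
    {"category": "Technology", "verdict": "incorrect"},
    {"category": "Politics", "verdict": "incorrect"},
]
print(success_rates(sample))  # {'Technology': 0.5, 'Politics': 0.0}
```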
Next steps
- Use all HN comments: first write some good SQL queries to filter comments about predictions, then run a very fast LLM to confirm they’re actually about predictions.
- Connect this project to a prediction market like https://manifold.markets/
- Automatically reply to people about their predictions when they can be evaluated