Inspired by this post
> Qwen2.5-coder 7B as an autocomplete model is insane! I'm getting even better results than Copilot and Cursor. Can actually fit ~16k tokens of context at usable speeds. Currently using Q4_K_M and takes ~12GB of VRAM. DAMN
>
> — Brandon G. Neri (@nerijs) October 20, 2024
Here’s how to actually run it:
- Install the Continue extension (continue.dev) from the VS Code marketplace
- Install Homebrew, then install Ollama with `brew install ollama`, and start it with `brew services start ollama`
- Pull the specific model: `ollama pull qwen2.5-coder:7b-base`
- Click the gear icon in the bottom-right corner of Continue to open your config.json and add the model entry
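The original config.json snippet isn't shown here, but as a sketch, a Continue autocomplete entry pointing at the pulled Ollama model typically looks like the following (field names follow Continue's config schema; the `title` is an arbitrary label, and exact fields may vary by Continue version):

```json
{
  "tabAutocompleteModel": {
    "title": "Qwen2.5-Coder 7B",
    "provider": "ollama",
    "model": "qwen2.5-coder:7b-base"
  }
}
```

After saving the file, Continue should start serving completions from the local model instead of a hosted one.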
That’s it! It runs fast on the latest MacBooks.