Inspired by this post:
> DAMN
>
> Qwen2.5-coder 7B as an autocomplete model is insane! I'm getting even better results than Copilot and Cursor. Can actually fit ~16k tokens of context at usable speeds. Currently using Q4_K_M and takes ~12GB of VRAM.
>
> — Brandon G. Neri (@nerijs), October 20, 2024
Here’s how to actually run it:
- Install the Continue extension (continue.dev) from the VS Code marketplace
- Install Homebrew, then install Ollama with `brew install ollama` and start it as a background service with `brew services start ollama`
- Pull a specific model, e.g. `ollama pull qwen2.5-coder:7b-base`
- Click the gear icon in the bottom-right corner of the Continue panel to open your config.json, then add the following inside the top-level object (the `"models"` entries are chat models, so the Claude one is optional and needs your own `"apiKey"`; `"tabAutocompleteModel"` is what drives tab completion):
"models": [
{"model": "claude-3-5-sonnet-20240620",
"provider": "anthropic",
"apiKey": "",
"title": "Claude 3.5 Sonnet"
},
{
"title": "qwen2.5-coder:7b-base",
"provider": "ollama",
"model": "qwen2.5-coder:7b-base"
}
],
"tabAutocompleteModel": {
"title": "qwen2.5-coder:7b-base",
"provider": "ollama",
"model": "qwen2.5-coder:7b-base"
},
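Before testing autocomplete in VS Code, you can sanity-check each piece from a terminal (a quick check, assuming Ollama's default port of 11434; the prompt string is just an arbitrary example):

```sh
# The Ollama server should answer with "Ollama is running"
curl http://localhost:11434

# The pulled model should appear in the local model list
ollama list

# Request a raw completion through Ollama's generate API
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b-base",
  "prompt": "def fibonacci(n):",
  "stream": false
}'
```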
That’s it! It runs fast on the latest MacBooks.
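One note on the tweet's "~16k tokens of context": Ollama's default context window is considerably smaller than that, so you won't get 16k out of the box. One way to raise it is to create a derived model with a larger `num_ctx` via Ollama's Modelfile mechanism (a sketch; the `-16k` tag name here is my own, not an official one):

```sh
# Create a variant of the base model with a 16k context window
cat > Modelfile <<'EOF'
FROM qwen2.5-coder:7b-base
PARAMETER num_ctx 16384
EOF

ollama create qwen2.5-coder:7b-base-16k -f Modelfile
```

Then point the `"model"` fields in config.json at the new tag.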