Inspired by this post
> Qwen2.5-coder 7B as an autocomplete model is insane! I'm getting even better results than Copilot and Cursor. Can actually fit ~16k tokens of context at usable speeds. Currently using Q4_K_M and takes ~12GB of VRAM. DAMN
>
> — Brandon G. Neri (@nerijs) October 20, 2024
Here’s how to actually run it:
- Install the Continue extension (continue.dev) from the VS Code marketplace
- Install Homebrew, then install Ollama with `brew install ollama`, and start it with `brew services start ollama`
- Pull the specific model: `ollama pull qwen2.5-coder:7b-base`
- Click the gear icon in the bottom-right corner of Continue to open your config.json and add the model entry
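The original config.json snippet isn't shown here, but as a sketch, a Continue autocomplete entry pointing at the pulled Ollama model typically looks like the following (field names follow Continue's config schema; the `title` is an arbitrary label, and exact fields may vary by Continue version):

```json
{
  "tabAutocompleteModel": {
    "title": "Qwen2.5-Coder 7B",
    "provider": "ollama",
    "model": "qwen2.5-coder:7b-base"
  }
}
```

After saving the file, Continue should start serving completions from the local model instead of a hosted one.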
That’s it! It runs fast on the latest MacBooks.