Guide to Self Hosting LLMs Faster/Better than Ollama

brucethemoose@lemmy.world · edit-2 17 days ago

kitnaht@lemmy.world · edit-2 17 days ago

If your “FIRST STEP” is to choose an OS: Fuck that.

You should never have to change your OS just to use this crap. It’s all written in Python. It should work on every OS available. Your first step is installing the prerequisites.

If you’re using something like Continue for local coding tasks, CodeQwen is awesome, and you’ll generally want a context window of 120K tokens or so. For coding you want all of the code in context. Otherwise the LLM starts spitting out repetitive stuff, or it can’t ingest all of your context and rewrites stuff that’s already there.
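Here is a rough way to check whether a project will actually fit in a given context window before pointing a coding assistant at it. This is a minimal sketch that assumes the common ~4-characters-per-token rule of thumb rather than a real tokenizer, so actual counts will differ by model. The function names, the file extensions, and the 4096-token reserve for the model's reply are all hypothetical choices for illustration.

```python
import os

# Rule-of-thumb assumption, NOT a real tokenizer: roughly 4 characters per token.
# Real token counts depend on the model's tokenizer and the language of the code.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count (rounded up)."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def estimate_repo_tokens(root: str, extensions=(".py", ".js", ".ts")) -> int:
    """Sum estimated tokens over source files under `root` (hypothetical helper)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):  # str.endswith accepts a tuple
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as f:
                    total += estimate_tokens(f.read())
    return total


def fits_in_context(code_tokens: int, context_window: int,
                    reserve_for_output: int = 4096) -> bool:
    """True if the code plus an assumed reply budget fits in the window."""
    return code_tokens + reserve_for_output <= context_window
```

For example, `fits_in_context(estimate_repo_tokens("your/project"), 32_768)` versus the same call with `120_000` shows whether a 32K window would force the assistant to work from a partial view of the code. For exact numbers, run the model's own tokenizer instead of the character heuristic.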

sturlabragason@lemmy.world · 17 days ago

Choose OS is very relevant when doing cloud stuff.

brucethemoose@lemmy.world · 17 days ago

Or setting up a home server, which I figured some here would do.

brucethemoose@lemmy.world · 17 days ago

CodeQwen 1.5 is pretty old at this point; AFAIK it was made obsolete by their latest release.

The Qwen models (at least 2.5) are really only good up to about 32K, which is still a ton of context. I’ve been testing Qwen 32B at 64K-90K, and even that larger model is… not great.

32K is generally enough to get the gist of whatever you’re trying to fill in.

L_Acacia@lemmy.one · 17 days ago

llama.cpp works on Windows too (or any OS, for that matter), though Linux will give you better performance.

gravitas_deficiency@sh.itjust.works · 17 days ago

Wtf are you talking about. PCIe passthrough exists.

brucethemoose@lemmy.world · 17 days ago

I would not recommend that for performance reasons, AFAIK.

Windows is fine, I should make that more clear.

gravitas_deficiency@sh.itjust.works · 17 days ago

Huh, really? Is there that much of a perf hit using passthrough? I’d have assumed the bottleneck isn’t actually PCIe so much as the beefiness of the GPU crunching the model.

brucethemoose@lemmy.world · edit-2 17 days ago

I have not tested WSL or VMs in Windows in a while, but my impression is that “it depends,” and you should use the native Windows version unless you are having major installation issues.

kitnaht@lemmy.world · edit-2 17 days ago

Why would you even bother trying to run this all through a VM when you can just run it directly? If you’re to the point of using VMs, you don’t need this tutorial anyways.

Are you seriously telling me you’re jumping through all the hoops to spin up a VM on Linux, and then doing all the configuration for GPU passthrough, because you can’t just figure out how to run it locally?

gravitas_deficiency@sh.itjust.works · 17 days ago

Bro this is a community for sharing knowledge and increasing the technical aptitude of fellow users by doing said sharing. Maybe instead of shitting on a pretty solid digest of the fundamentals of setting up something like this, try adding to the body of knowledge instead.