What is the best VPS for running local LLMs on a budget?

Last updated June 17, 2026 · Source-based research, not a hands-on benchmark

The short version

A cheap VPS can run small local LLMs on CPU. That is fine for personal use, light coding help, and cutting API bills, but it will not be fast and it will not handle large models. Pick RackNerd (KVM VPS from $2.24/mo) for the lowest cost, or Hostinger VPS (8 GB RAM on KVM 2 at $8.99/mo) for more headroom and a friendlier setup. Fast or large-model inference needs a GPU, which budget VPS plans do not include.

Screenshot: RackNerd's official site, showing KVM VPS pricing from $2.24/mo across 12 data-center locations (captured June 17, 2026).

The reality: budget VPS are CPU-only

This is the single most important fact before you rent a server. Cheap VPS from RackNerd, Hostinger, and similar providers give you CPU and RAM — no GPU. Tools like Ollama and llama.cpp will happily run a model on CPU, but generation is much slower than on a GPU, and the bigger the model, the slower it gets. So a budget VPS is great for small models and light, personal, or background use, and a poor fit for fast chat or large models. For those you need a GPU instance, which costs far more.
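To see how slow "slower" is on a particular server, one option is to time a generation through Ollama's local REST API. The sketch below assumes Ollama is already installed and serving on its default port (11434) and that the model has been pulled; the model tag and prompt mentioned afterwards are only illustrative. It relies on the `eval_count` and `eval_duration` fields that Ollama returns with a non-streaming `/api/generate` response.

```python
import json
import urllib.request

# Default local endpoint for Ollama's generate API (assumes a stock install).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> bytes:
    """Encode a non-streaming generate request body as JSON bytes."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")


def tokens_per_second(result: dict) -> float:
    """Compute generation speed from an Ollama response.

    eval_count is the number of generated tokens;
    eval_duration is the generation time in nanoseconds.
    """
    return result["eval_count"] / (result["eval_duration"] / 1e9)


def measure(model: str, prompt: str, url: str = OLLAMA_URL) -> float:
    """Send one prompt to the local Ollama server and return tokens per second."""
    request = urllib.request.Request(
        url,
        data=build_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    # CPU inference can take a while, so allow a generous timeout.
    with urllib.request.urlopen(request, timeout=600) as response:
        return tokens_per_second(json.load(response))
```

Calling `measure("llama3.2:3b", "Explain what a VPS is in two sentences.")` on the VPS returns a single tokens-per-second figure. Running it a few times with different model sizes gives a rough feel for whether CPU speed is acceptable for the intended use.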

How much RAM do you actually need?

The model has to fit in RAM. Small open models are surprisingly compact: per Ollama's model library, Llama 3.2 1B is about 1.3 GB to download, Llama 3.2 3B about 2.0 GB, and Gemma 3 4B about 3.3 GB. You need that much RAM for the weights plus headroom for the OS, the context window, and runtime overhead, so choose a plan with RAM comfortably above the model size (the table below works out to roughly 1 to 5 GB of margin).
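Before pulling a model, it can help to check how much memory the server actually has free. The following is a minimal sketch for a Linux VPS that reads `MemAvailable` from `/proc/meminfo` and compares it with the model size plus a margin. The 1 GB default margin is an assumption for illustration, not a figure from Ollama, and the helper names are hypothetical.

```python
def parse_mem_available_gb(meminfo_text: str) -> float:
    """Return the MemAvailable value from /proc/meminfo text, converted to GB.

    /proc/meminfo reports values in kB (really KiB); for a rough check
    we treat GiB and GB as interchangeable.
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])
            return kib / (1024 ** 2)
    raise ValueError("MemAvailable not found in meminfo text")


def model_fits(model_size_gb: float, available_gb: float, headroom_gb: float = 1.0) -> bool:
    """True if the model plus an assumed headroom fits in available memory."""
    return model_size_gb + headroom_gb <= available_gb


def check_host(model_size_gb: float, path: str = "/proc/meminfo") -> bool:
    """Check the current Linux host against a model size in GB."""
    with open(path) as f:
        return model_fits(model_size_gb, parse_mem_available_gb(f.read()))
```

For example, `check_host(2.0)` asks whether a roughly 2.0 GB model such as Llama 3.2 3B would fit right now. Raising `headroom_gb` is sensible when planning to use a long context window.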

Model (quantized)	Approx. size	Comfortable VPS RAM	Fits which plan
Gemma 3 1B	~0.8 GB	2 GB+	Entry RackNerd / any VPS
Llama 3.2 3B	~2.0 GB	4 GB+	Hostinger KVM 1 (4 GB)
Gemma 3 4B	~3.3 GB	8 GB	Hostinger KVM 2 (8 GB)
7B-class	~4–5 GB	8 GB+	Hostinger KVM 2 / larger RackNerd

Model sizes per Ollama's library (June 2026). RAM guidance allows headroom for OS, context, and runtime; CPU inference speed still depends on cores.
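The table's logic can be approximated as "model size plus a margin, rounded up to the next RAM tier". The sketch below is one way to express that. The tier list and the 1 GB margin are assumptions chosen so the output lines up with the table above, and the 7B-class entry uses the upper end of the ~4–5 GB range.

```python
# Common VPS RAM sizes in GB (an assumption; check the provider's actual plans).
RAM_TIERS_GB = (2, 4, 8, 16)

# Approximate quantized model sizes in GB, taken from the table above.
MODELS = {
    "Gemma 3 1B": 0.8,
    "Llama 3.2 3B": 2.0,
    "Gemma 3 4B": 3.3,
    "7B-class": 5.0,
}


def comfortable_ram_gb(model_size_gb: float, headroom_gb: float = 1.0,
                       tiers: tuple = RAM_TIERS_GB) -> int:
    """Return the smallest RAM tier that holds the model plus headroom."""
    needed = model_size_gb + headroom_gb
    for tier in tiers:
        if tier >= needed:
            return tier
    raise ValueError(f"No tier in {tiers} is large enough for {needed:.1f} GB")


for name, size in MODELS.items():
    print(f"{name}: ~{size} GB -> {comfortable_ram_gb(size)} GB RAM")
```

With these assumptions the loop suggests 2, 4, 8, and 8 GB respectively. Treat the result as a starting point rather than a guarantee, since longer contexts and other services on the same server need memory too.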

Best budget picks

Cheapest: RackNerd

RackNerd KVM VPS starts at $2.24/mo across 12 locations, which makes it the lowest-cost way to keep a small model running 24/7. Choose a plan with enough RAM for your target model and accept CPU-speed generation.

More headroom: Hostinger VPS

Hostinger VPS KVM 2 gives 2 vCPU and 8 GB RAM for $8.99/mo (renews at $14.99/mo), enough for 3B–7B-class models, with a friendlier dashboard and guided setup. KVM 1 (4 GB, $6.49/mo) suits 1B–3B models.
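Because the renewal price is higher than the introductory price, the real cost depends on how long you keep the server. Here is a small sketch of that arithmetic. The length of the promotional term is not stated here, so `intro_months` is a placeholder you would replace with the term shown at checkout.

```python
def total_cost(intro_price: float, renewal_price: float,
               intro_months: int, total_months: int) -> float:
    """Total spend over total_months, with intro pricing for the first intro_months."""
    months_at_intro = min(intro_months, total_months)
    months_at_renewal = total_months - months_at_intro
    return round(months_at_intro * intro_price + months_at_renewal * renewal_price, 2)


# Hypothetical scenario: a 12-month intro term, server kept for 24 months.
two_year_cost = total_cost(8.99, 14.99, intro_months=12, total_months=24)
print(f"KVM 2 over 24 months: ${two_year_cost:.2f}")
```

Dividing the result by `total_months` gives an effective monthly price, which is a fairer number to compare against a flat-priced plan.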

When a budget VPS is not enough

If you want fast responses, large models (13B and up), or to serve multiple users, a CPU-only budget VPS will frustrate you. That is GPU territory, and it costs far more. For occasional heavy jobs, a managed cloud you can scale up and down (see Cloudways) or a dedicated GPU host makes more sense than a cheap VPS. For everyday personal use, such as a coding helper, a summarizer, or a private chatbot, a small model on a budget VPS does the job cheaply. Compare the cheapest options in our best cheap VPS guide and the head-to-head RackNerd vs Hostinger VPS. For what solo builders ship on this kind of setup, see our 100 one-person companies study.

FAQ

Can a cheap VPS run a local LLM?

Yes, small ones. A budget VPS runs models on CPU, which works well for small quantized models (1B–4B) for personal and light use. Per Ollama's library these models are roughly 0.8–3.3 GB, so they fit on a VPS with a few GB of RAM. Generation is slower than on a GPU.

How much RAM do I need to run a local model?

Enough to hold the model plus overhead. A ~2 GB model (Llama 3.2 3B) is comfortable on 4 GB RAM; a ~3.3 GB model (Gemma 3 4B) wants 8 GB. Add headroom for the OS and context window.

Do I need a GPU?

Not for small models at light usage — CPU is fine, just slower. You need a GPU for fast responses, large models, or serving many users, and budget VPS do not include one.