How to Run Ollama on a Cheap VPS
Quick Answer
Yes, you can run Ollama on a cheap VPS, but treat it as CPU-only: stick to small quantized models like gemma3:1b and expect modest speeds, not GPU performance. RackNerd is the cheapest way in (KVM VPS from $2.24/mo), while Hostinger is friendlier if you want a managed-leaning dashboard. See our best VPS for local LLM picks for sizing details.

Why self-host Ollama on a VPS
If you are paying per-token API fees for light or batchy workloads, a self-hosted small model on a flat-rate VPS can lower the bill to a predictable monthly cost. Ollama makes this simple: it pulls open models, exposes a local HTTP API, and runs the same way on your laptop or a remote server.
The trade-off is real. Hosted APIs give you frontier models and fast responses; a budget VPS gives you a small, private model that you control end to end. Good fits include internal tools, classification, summarization, prototyping, draft generation, and any task where a 1B-4B model is good enough and privacy or cost predictability matters more than raw quality.
- Predictable cost - a flat monthly VPS fee instead of variable per-token billing.
- Privacy - prompts and data stay on a server you control.
- Always-on endpoint - a persistent API your apps and cron jobs can hit.
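To make the cost argument concrete, here is a minimal break-even sketch. The VPS fee is taken from this guide; the per-token API price is a hypothetical placeholder, not any provider's real rate, so substitute your own figures.

```python
# Break-even sketch: flat-rate VPS vs. per-token API billing.
# The API price used below is a HYPOTHETICAL placeholder, not a real rate.


def monthly_api_cost(tokens_per_month: int, usd_per_million_tokens: float) -> float:
    """Cost of sending `tokens_per_month` tokens through a per-token API."""
    return tokens_per_month / 1_000_000 * usd_per_million_tokens


def break_even_tokens(vps_usd_per_month: float, usd_per_million_tokens: float) -> float:
    """Monthly token volume at which the API bill equals the flat VPS fee."""
    return vps_usd_per_month / usd_per_million_tokens * 1_000_000


if __name__ == "__main__":
    vps_fee = 6.49    # Hostinger KVM 1 intro price quoted in this guide
    api_price = 0.50  # HYPOTHETICAL: USD per 1M tokens
    tokens = break_even_tokens(vps_fee, api_price)
    print(f"Break-even at about {tokens:,.0f} tokens per month")
```

Below the break-even volume the API is cheaper; above it the flat fee wins, provided a small model is good enough for the task and the box can keep up with the load.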
| Model | Download size | Suggested VPS RAM | Best for |
|---|---|---|---|
| gemma3:1b | 815 MB | 2 GB+ | First model, fastest on CPU |
| llama3.2:1b | 1.3 GB | 2 GB+ | Light chat and tooling |
| llama3.2:3b | 2.0 GB | 4 GB+ | Better quality, slower |
| gemma3:4b | 3.3 GB | 8 GB+ | Best small-model quality, slowest |
Pick a VPS: cheapest vs. friendliest
Two routes cover most people. RackNerd is the cheapest practical entry: KVM VPS plans start at $2.24/mo across 12 locations, with full KVM virtualization and raw, self-managed root access. That is ideal if you are comfortable on the command line and want to squeeze the most RAM per dollar. For local LLM work, do not take the very bottom plan; size up so your chosen model fits in RAM with headroom.
Hostinger VPS is the friendlier option. Its KVM 1 plan costs $6.49/mo at the intro rate (renews at $11.99/mo on the 2-year term) and gives 1 vCPU, 4 GB RAM, 50 GB NVMe, and 4 TB transfer. KVM 2 at $8.99/mo intro (renews at $14.99/mo) doubles that to 2 vCPU, 8 GB RAM, 100 GB NVMe, and 8 TB transfer. The dashboard is more managed-leaning, and there is a 30-day money-back window if it does not work out.
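Since RAM is what limits model size, price per GB of RAM is a handy way to compare plans. The sketch below uses only the Hostinger figures quoted above; RackNerd is left out because this guide does not list RAM for its plans.

```python
# Price per GB of RAM for the two Hostinger plans quoted in this guide.
PLANS = {
    "KVM 1": {"ram_gb": 4, "intro_usd": 6.49, "renewal_usd": 11.99},
    "KVM 2": {"ram_gb": 8, "intro_usd": 8.99, "renewal_usd": 14.99},
}


def usd_per_gb(monthly_usd: float, ram_gb: int) -> float:
    """Monthly price divided by RAM, rounded to cents."""
    return round(monthly_usd / ram_gb, 2)


if __name__ == "__main__":
    for name, plan in PLANS.items():
        intro = usd_per_gb(plan["intro_usd"], plan["ram_gb"])
        renewal = usd_per_gb(plan["renewal_usd"], plan["ram_gb"])
        print(f"{name}: ${intro:.2f}/GB intro, ${renewal:.2f}/GB on renewal")
```

On these numbers the larger plan is cheaper per GB at both the intro and renewal rates, which matters if you expect to move beyond 1B models.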
If you would rather not touch the server at all, fully managed cloud like Cloudways exists, but it is priced by server size and aimed at production apps such as WooCommerce, not cheap LLM tinkering. For broader options, see our best cheap VPS roundup and the RackNerd vs. Hostinger VPS comparison.
Realistic specs: it's CPU-only
This is the part people get wrong. Budget VPS plans are CPU-only with no GPU, so inference runs on the processor and is noticeably slower than on a GPU. That is fine for small models and light traffic, but it caps how big you can go.
Plan around the model file size plus operating-system and runtime overhead. As a rough rule, give yourself RAM comfortably larger than the model download. From the official Ollama library, the small models look like this:
- gemma3:1b - 815 MB, the easiest first model to fit.
- llama3.2:1b - 1.3 GB.
- llama3.2:3b - 2.0 GB.
- gemma3:4b - 3.3 GB, workable on an 8 GB box but slower.
A 1B model is the sweet spot for a cheap CPU-only VPS. Stepping up is possible, with a 3B model on a 4 GB plan or a 4B model on an 8 GB plan, but expect longer response times, especially with longer prompts. Larger models are not realistic here.
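The sizing rule can be sketched as a small lookup. The figures are copied from this guide's model table, and the function name is made up for illustration; treat the result as rough guidance rather than a guarantee that a model will run well.

```python
# Figures copied from the model table in this guide; rough guidance only.
MODELS = [
    # (name, download size in GB, suggested VPS RAM in GB)
    ("gemma3:1b", 0.815, 2),
    ("llama3.2:1b", 1.3, 2),
    ("llama3.2:3b", 2.0, 4),
    ("gemma3:4b", 3.3, 8),
]


def models_that_fit(plan_ram_gb: float) -> list[str]:
    """Return the models whose suggested RAM is at or below the plan's RAM."""
    return [name for name, _size_gb, ram_gb in MODELS if ram_gb <= plan_ram_gb]


if __name__ == "__main__":
    for ram in (2, 4, 8):
        print(f"{ram} GB plan: {', '.join(models_that_fit(ram))}")
```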
Install Ollama on Ubuntu
Spin up an Ubuntu VPS, then connect over SSH. The official install is one line from ollama.com:
- Install: curl -fsSL https://ollama.com/install.sh | sh
- Run a small model: ollama run gemma3:1b
The install script sets up the Ollama service; the first ollama run pulls the model, then drops you into an interactive prompt. To use it from your apps, Ollama serves a local HTTP API on port 11434.
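As a minimal sketch of calling that API from an app, the snippet below posts a prompt to Ollama's `/api/generate` endpoint using only the Python standard library. It assumes the script runs on the same machine as Ollama (or through a tunnel) and that gemma3:1b has already been pulled; the helper names and the 300-second timeout are illustrative choices, not part of Ollama.

```python
import json
import urllib.request

# Assumes Ollama is reachable on its default local address and port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "gemma3:1b") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "gemma3:1b", timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["response"]
```

Calling `generate("Summarize this ticket in one sentence: ...")` returns the model's reply as a string. The timeout is generous because CPU-only inference can take a while on long prompts.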
Before you expose it
By default Ollama listens on localhost. Do not blindly open port 11434 to the public internet - there is no built-in authentication. Safer patterns:
- Keep Ollama bound to localhost and put it behind a reverse proxy (such as Nginx) that adds authentication and TLS.
- Or reach it only over a private network or an SSH tunnel from your app server.
- Configure your VPS firewall to deny direct access to the Ollama port.
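After locking things down, it is worth confirming from a machine outside the VPS that the Ollama port is not reachable. This is a minimal sketch using a plain TCP connection attempt; the function name is made up for illustration, and a closed result only shows that the port is unreachable from wherever you ran the check.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Run it from your laptop with your server's public address, for example `port_is_open("203.0.113.10", 11434)` (a documentation-range placeholder IP). You want `False` there, while the same check against localhost on the VPS itself should return `True`.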
New to the command line? Our hosting for beginners guide covers the basics of getting onto a server.
Limits and honest expectations
A cheap VPS running Ollama is a real, useful endpoint - within limits. Keep these in mind before you rely on it:
- Speed - CPU-only inference is slower than GPU inference. Tokens stream at a modest pace, and long prompts or long outputs feel it most.
- Concurrency - a small box handles one or a few requests at a time. It is not built for many simultaneous users.
- Model quality - 1B-4B models trail frontier hosted models. Test whether the quality is good enough for your specific task.
- RAM is the ceiling - if the model does not fit in memory, it will not run well. Size the plan to the model, not the other way around.
- You manage it - on a raw, self-managed VPS you handle updates, security, and the firewall yourself.
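To put a number on the speed limit for your own box, you can compute tokens per second from a non-streaming `/api/generate` reply, which reports `eval_count` (tokens generated) and `eval_duration` (nanoseconds spent generating). The sample values below are made up to show the arithmetic and are not a benchmark.

```python
def tokens_per_second(response: dict) -> float:
    """Generation speed from the eval_count and eval_duration reply fields."""
    eval_count = response["eval_count"]           # tokens generated
    eval_duration_ns = response["eval_duration"]  # nanoseconds spent generating
    if eval_duration_ns <= 0:
        return 0.0
    return eval_count / (eval_duration_ns / 1_000_000_000)


if __name__ == "__main__":
    # MADE-UP sample reply fields, only to demonstrate the calculation.
    sample = {"eval_count": 120, "eval_duration": 15_000_000_000}
    print(f"{tokens_per_second(sample):.1f} tokens/s")
```

Measuring this with your real prompts on the plan you intend to buy is the most reliable way to decide whether a 1B or a 3B-4B model is fast enough.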
Used for the right jobs - internal tooling, prototyping, light batch work - this setup can cost less than per-token billing once your usage is steady enough to justify the flat monthly fee.
FAQ
What size VPS do I need to run Ollama?
Match RAM to the model plus overhead. A 1B model like gemma3:1b (815 MB) runs on a 2 GB plan, while a 4B model like gemma3:4b (3.3 GB) wants 8 GB. Remember that budget VPS plans are CPU-only, so smaller models respond faster.
Can I run Ollama on a VPS without a GPU?
Yes. Ollama runs on CPU, which is all a cheap VPS plan offers, since these plans have no GPU. Inference is slower than on a GPU, so stick to small quantized models such as gemma3:1b or llama3.2:1b for usable speed.
How do I install Ollama on an Ubuntu VPS?
SSH into the server and run the official one-line installer from ollama.com: curl -fsSL https://ollama.com/install.sh | sh. Then start a model with ollama run gemma3:1b. Keep the API bound to localhost, or put it behind an authenticated reverse proxy before exposing it.