Running Gemma 4 locally means your data never leaves your machine. No API costs, no rate limits, complete privacy. This guide shows you how to get Gemma 4 running in under 5 minutes using Ollama.
## What You Need
- A computer with at least 8GB RAM (16GB recommended for larger models)
- macOS, Windows, or Linux
- About 2-5GB of free disk space (depending on model size)
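If you want to check your machine against these requirements on Linux, total RAM is listed in `/proc/meminfo`. A minimal sketch (the parsing helper is our own, not part of Ollama; macOS and Windows users can check About This Mac or Task Manager instead):

```python
def mem_total_gb(meminfo_text: str) -> float:
    # /proc/meminfo reports a line like "MemTotal:  16307332 kB"
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kb = int(line.split()[1])
            return kb / 1024 / 1024  # kB -> GB
    raise ValueError("MemTotal not found")

# Usage on Linux:
# with open("/proc/meminfo") as f:
#     print(f"{mem_total_gb(f.read()):.1f} GB")
```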
## Step 1: Install Ollama

Visit ollama.com and download the installer for your operating system.

**macOS:**

```bash
# Or install via Homebrew
brew install ollama
```

**Linux:**

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

**Windows:** Download the installer from ollama.com/download.
## Step 2: Run Gemma 4

Once Ollama is installed, running Gemma 4 takes a single command:

```bash
ollama run gemma4
```

That's it. Ollama automatically downloads the model and starts an interactive chat session. For other download methods (Hugging Face, LM Studio, Kaggle), see our complete download guide.
## Choosing the Right Model Size
Gemma 4 comes in four sizes. Here's how to choose:
| Model | Parameters | RAM Needed | Best For | Command |
|---|---|---|---|---|
| E2B | 2B | ~4GB | Mobile, quick tasks | `ollama run gemma4:e2b` |
| E4B | 4B | ~6GB | Laptops, daily use | `ollama run gemma4:e4b` |
| 26B MoE | 26B | ~16GB | Best efficiency | `ollama run gemma4:26b` |
| 31B Dense | 31B | ~20GB | Maximum quality | `ollama run gemma4:31b` |
**Recommendation:** Start with E4B if you have a modern laptop. It offers the best balance of speed and quality. Not sure which size fits your use case? Read our detailed model comparison guide.
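If you'd rather pick programmatically, the table above maps roughly to a rule of thumb. A minimal sketch (a hypothetical helper of our own; the thresholds come from the RAM column above):

```python
def suggest_model(free_ram_gb: float) -> str:
    """Suggest a Gemma 4 Ollama tag from the RAM you can spare."""
    if free_ram_gb >= 20:
        return "gemma4:31b"   # 31B dense: maximum quality
    if free_ram_gb >= 16:
        return "gemma4:26b"   # 26B MoE: best efficiency
    if free_ram_gb >= 6:
        return "gemma4:e4b"   # E4B: laptops, daily use
    if free_ram_gb >= 4:
        return "gemma4:e2b"   # E2B: mobile, quick tasks
    raise ValueError("Less than ~4GB free: even E2B may not fit")

print(suggest_model(12.0))  # gemma4:e4b
```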
## Step 3: Use Gemma 4 for Different Tasks
### Text Chat

```bash
ollama run gemma4
>>> Tell me about quantum computing in simple terms
```

### Code Generation
```bash
ollama run gemma4
>>> Write a Python function to sort a list of dictionaries by a key
```

### Image Understanding (Multimodal)
Gemma 4 can analyze images:

```bash
ollama run gemma4
>>> Describe this image: /path/to/image.jpg
```

## Using the API
Ollama also provides a local API at http://localhost:11434:

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "gemma4",
  "prompt": "What is machine learning?"
}'
```

## Performance Tips
- **Close other apps**: free up RAM for the model
- **Use quantized models**: Ollama serves quantized builds by default, which are smaller and faster than full-precision weights
- **GPU acceleration**: if you have a supported NVIDIA GPU, Ollama uses it automatically
- **Adjust context length**: for longer conversations, raise the context window inside a chat session:

```
/set parameter num_ctx 8192
```
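The same context-length setting can be passed per request through the local API via the `options` field. A sketch using only the Python standard library (it assumes Ollama is running on the default port and uses the `gemma4` tag from this guide):

```python
import json
import urllib.request

def build_generate_request(prompt: str, num_ctx: int = 8192) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": "gemma4",        # model tag assumed from this guide
        "prompt": prompt,
        "stream": False,          # one JSON object instead of a token stream
        "options": {"num_ctx": num_ctx},
    }

def generate(prompt: str) -> str:
    # Requires a running Ollama server on the default port.
    body = json.dumps(build_generate_request(prompt)).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (with Ollama running):
# print(generate("What is machine learning?"))
```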
## Gemma 4 vs Cloud APIs
| Feature | Gemma 4 Local (Ollama) | Cloud API (ChatGPT, Gemini) |
|---|---|---|
| Cost | Free forever | Pay per token |
| Privacy | 100% local | Data sent to server |
| Speed | Depends on hardware | Usually faster |
| Internet | Not needed | Required |
| Rate Limits | None | Yes |
| Customization | Full control | Limited |
## Troubleshooting

**"Not enough memory"**: try a smaller model: `ollama run gemma4:e2b`

**Slow responses**: make sure no other heavy apps are running, and check whether the model is on the GPU: `ollama ps`

**Model not found**: update Ollama to the latest version (re-run the installer, or `brew upgrade ollama` on macOS), then try again.
For more detailed solutions to these and other issues, check our Gemma 4 troubleshooting guide.
## Next Steps
- Compare Gemma 4 models in detail on our Models page
- Try LM Studio for a graphical interface
- Explore Google AI Studio for cloud-based access
Gemma 4 is developed by Google DeepMind and released under the Apache 2.0 license. This guide is provided by the Gemma 4 AI community.



