So you want to get Gemma 4 running. Good news — there are a bunch of ways to do it, and at least one of them will be perfect for your situation. Whether you want a one-liner in the terminal or a point-and-click GUI, this guide covers every option.
Let's walk through each method, from easiest to most advanced.
Method 1: Ollama (Recommended for Most People)
This is the fastest way to go from zero to running Gemma 4. One command, and you're chatting.
```shell
# Install Ollama first (macOS)
brew install ollama

# Then run Gemma 4 — it downloads automatically
ollama run gemma4
```

That's literally it. Ollama handles the download and model setup, then drops you into an interactive chat right in your terminal.
Want a specific model size? Just add a tag:
```shell
ollama run gemma4:e2b   # Smallest, fastest
ollama run gemma4:e4b   # Best for most laptops
ollama run gemma4:26b   # MoE, great efficiency
ollama run gemma4:31b   # Maximum quality
```

For the full Ollama setup walkthrough, check out our detailed Ollama guide.
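Beyond the interactive chat, Ollama also serves a local HTTP API (by default at `http://localhost:11434`), which is handy for scripting. Here's a minimal sketch using only the standard library, assuming the `gemma4` model has already been pulled; the endpoint and payload fields follow Ollama's standard `/api/generate` interface:

```python
import json
import urllib.request

def build_generate_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    # stream=False asks for one complete JSON response instead of chunks
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "gemma4") -> str:
    """Send a prompt to the local Ollama server and return the reply text."""
    payload = build_generate_payload(model, prompt)
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Call `generate("Explain quantum computing in one sentence")` while the Ollama server is running and you get the model's reply back as a plain string.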
Best for: Developers, terminal users, anyone who wants the fastest setup.
Method 2: LM Studio (Best GUI Experience)
If you'd rather not touch a terminal, LM Studio is your friend. It's a desktop app with a clean interface for downloading and running local models.
Steps:
- Download LM Studio from lmstudio.ai
- Open the app and search for "gemma4" in the model browser
- Click the download button next to the model size you want
- Once downloaded, click "Chat" and start talking
LM Studio also lets you tweak settings like temperature, context length, and system prompts through a nice sidebar — no config files needed.
For a complete walkthrough, see our LM Studio guide.
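LM Studio can also act as a local server: its developer/server mode exposes an OpenAI-compatible endpoint (by default on port 1234), so anything that speaks the OpenAI chat-completions format can talk to your local Gemma 4. A sketch under that assumption — the model identifier `gemma4-e4b` is a placeholder; use whatever name LM Studio shows for your loaded model:

```python
import json
import urllib.request

def build_chat_payload(model: str, user_message: str) -> dict:
    """Build an OpenAI-style chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": 0.7,
    }

def chat(user_message: str, model: str = "gemma4-e4b") -> str:
    """Send one user message to LM Studio's local server, return the reply."""
    payload = build_chat_payload(model, user_message)
    req = urllib.request.Request(
        "http://localhost:1234/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]
```

Because the format is OpenAI-compatible, the same payload works with official OpenAI client libraries pointed at the local base URL.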
Best for: Non-developers, people who prefer GUIs, anyone who wants to experiment with model settings visually.
Method 3: Hugging Face (Direct Weight Download)
This is the route for ML engineers and researchers who want the raw model weights. You'll download the files directly and load them into your own inference pipeline.
```shell
# Install the Hugging Face CLI
pip install huggingface-hub

# Download Gemma 4 E4B
huggingface-cli download google/gemma-4-e4b

# Or download a specific GGUF quantization
huggingface-cli download google/gemma-4-e4b-GGUF \
    --include "gemma-4-e4b-Q4_K_M.gguf"
```

You can also browse and download from the web UI at huggingface.co/google — just search for "gemma-4".
Note: You'll need to accept Google's license agreement on Hugging Face before downloading. It's Apache 2.0, so no weird restrictions — just a one-time click.
Loading in Python with Transformers:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "google/gemma-4-e4b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto",
)

input_text = "Explain quantum computing in simple terms"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Best for: ML researchers, fine-tuning, custom inference pipelines, integration with existing ML codebases.
Method 4: Google AI Studio (No Download Needed)
Don't want to download anything at all? Google AI Studio lets you use Gemma 4 right in your browser. No setup, no hardware requirements.
Head to aistudio.google.com and select Gemma 4 from the model dropdown. You get a full chat interface, prompt playground, and even API key generation.
```python
# You can also use the API after getting a key
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemma-4-e4b")
response = model.generate_content("Write a haiku about coding")
print(response.text)
```

Check out our Google AI Studio guide for the full walkthrough.
Best for: Quick testing, no-setup exploration, people with limited hardware.
Method 5: Kaggle (Alternative Download Source)
Kaggle hosts Gemma 4 models too. This is especially handy if you're already in the Kaggle ecosystem or want free GPU notebooks to test with.
Steps:
- Go to kaggle.com/models/google/gemma-4
- Accept the license
- Download weights directly, or use them in a Kaggle notebook with free GPU
```python
# In a Kaggle notebook with GPU
import kagglehub

model_path = kagglehub.model_download("google/gemma-4/transformers/e4b")
print(f"Model downloaded to: {model_path}")
```

Best for: Kaggle users, free GPU access for testing, academic research.
Which Method Should You Choose?
Here's the quick decision matrix:
| Method | Setup Time | Difficulty | GPU Needed? | Offline? | Best For |
|---|---|---|---|---|---|
| Ollama | 2 min | Easy | No (but helps) | Yes | Developers, daily use |
| LM Studio | 5 min | Very Easy | No (but helps) | Yes | GUI lovers, beginners |
| Hugging Face | 10-15 min | Advanced | Recommended | Yes | ML engineers, fine-tuning |
| Google AI Studio | 30 sec | Very Easy | No | No | Quick testing, no hardware |
| Kaggle | 5-10 min | Moderate | Free GPUs! | No | Research, experimentation |
My Recommendation
- Just want to try it? → Google AI Studio. Zero setup.
- Want to run it daily on your machine? → Ollama. One command and done.
- Prefer a GUI? → LM Studio. Clean and simple.
- Building something custom? → Hugging Face. Full control.
- Need free GPU time? → Kaggle. Free T4/P100 GPUs.
Storage Requirements
Before you download, make sure you have enough disk space:
| Model | GGUF (Q4_K_M) | Full Weights (FP16) |
|---|---|---|
| E2B | ~1.5 GB | ~4 GB |
| E4B | ~3 GB | ~8 GB |
| 26B MoE | ~8 GB | ~52 GB |
| 31B Dense | ~18 GB | ~62 GB |
Most people should grab the GGUF quantized versions — they're much smaller and the quality difference is minimal for everyday use. Not sure if your machine can handle a particular model size? Check our hardware requirements guide before downloading.
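The sizes in the table follow from simple arithmetic: on-disk size is roughly parameter count times bits per weight, divided by 8 bits per byte. A quick sanity-check helper (the ~4.65 bits/weight figure for Q4_K_M is an approximation, not an official number):

```python
def model_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate on-disk model size in GB: params × bits ÷ 8 bits-per-byte."""
    return params_billions * bits_per_weight / 8

# FP16 stores 16 bits (2 bytes) per parameter: 31B dense ≈ 62 GB
print(round(model_size_gb(31, 16), 1))

# Q4_K_M averages roughly 4.65 bits per weight: 31B dense ≈ 18 GB
print(round(model_size_gb(31, 4.65), 1))
```

The same formula explains why the Q4 quantization cuts storage by roughly 3–4x: you keep every parameter but spend about a quarter of the bits on each one.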
Troubleshooting Downloads
Download too slow?
- Hugging Face: Install `hf-transfer` (`pip install hf-transfer`), then set `HF_HUB_ENABLE_HF_TRANSFER=1` before downloading
- Ollama: Downloads are usually fast, but check your internet connection
- Try a mirror if you're in a region with slow access to the default servers
Not enough disk space?
- Start with E2B or E4B — they're much smaller
- Use quantized (GGUF Q4) versions instead of full-precision weights
- Clean up old models you no longer need: `ollama rm <model_name>`
License issues on Hugging Face?
- Make sure you're logged in: `huggingface-cli login`
- Accept the license on the model page before trying to download
Next Steps
Once you've got Gemma 4 downloaded, here's where to go:
- Set up Ollama properly → How to Run Gemma 4 with Ollama
- Configure LM Studio → LM Studio Guide
- Pick the right model size → Which Gemma 4 Model Should I Use?
- Running into issues? → Gemma 4 Troubleshooting Guide