Docker gives you reproducible, isolated AI deployments. Same container, same results — whether it's your laptop, a staging server, or production. No more "it works on my machine."
Let's set up Gemma 4 in Docker from scratch.
Why Docker for AI?
- Reproducible: Pin your Ollama version, model files, and config
- Isolated: Won't mess with your host system's Python, CUDA, or anything else
- Portable: Build once, deploy anywhere
- Easy cleanup: `docker compose down` and it's gone
If you're just running Gemma 4 for personal use, Ollama directly is simpler. Docker shines when you need consistent deployments across environments or want to bundle Gemma 4 into a larger application stack.
Quick Start with Docker Run
The fastest way to get Gemma 4 running in Docker:
```bash
# Run Ollama in Docker
docker run -d \
  --name gemma4 \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  ollama/ollama

# Pull and run Gemma 4
docker exec gemma4 ollama pull gemma4:26b
docker exec -it gemma4 ollama run gemma4:26b
```

That's it — three commands. The `-v ollama-data:/root/.ollama` volume mount ensures your model persists when the container restarts.
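If you script these steps, the pull can race the server start. A minimal wait loop, assuming the default port mapping above (the function name and retry defaults are my own):

```bash
# wait_for_ollama: poll the Ollama API until it answers or we give up.
# Usage: wait_for_ollama [url] [max_attempts]
wait_for_ollama() {
  url="${1:-http://localhost:11434/api/tags}"
  attempts="${2:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl fail on HTTP errors, -s keeps it quiet
    if curl -fs "$url" > /dev/null 2>&1; then
      echo "Ollama is up at $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "Ollama did not respond at $url" >&2
  return 1
}
```

For example: `wait_for_ollama && docker exec gemma4 ollama pull gemma4:26b`.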
Dockerfile with Ollama
For more control, build a custom image:
```dockerfile
# syntax=docker/dockerfile:1
FROM ollama/ollama:latest

# Set environment
ENV OLLAMA_HOST=0.0.0.0
ENV OLLAMA_KEEP_ALIVE=24h

# Create a startup script that pulls the model on first run
# (the COPY heredoc requires BuildKit)
COPY <<'EOF' /start.sh
#!/bin/bash
ollama serve &
sleep 5

# Pull model if not already present
if ! ollama list | grep -q "gemma4:26b"; then
  echo "Pulling Gemma 4 26B..."
  ollama pull gemma4:26b
fi

# Keep container running
wait
EOF

RUN chmod +x /start.sh

EXPOSE 11434

# The base image's entrypoint is the ollama binary, so override it here;
# a bare CMD would be passed to ollama as arguments
ENTRYPOINT ["/start.sh"]
```

Build and run:
```bash
docker build -t gemma4-server .
docker run -d --name gemma4 -p 11434:11434 -v ollama-data:/root/.ollama gemma4-server
```

Docker Compose (Recommended)
For a proper setup, use docker-compose.yml:
```yaml
services:
  ollama:
    image: ollama/ollama:latest
    container_name: gemma4-ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama-models:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0
      - OLLAMA_KEEP_ALIVE=24h
    restart: unless-stopped
    healthcheck:
      # The ollama image doesn't ship curl, so probe with the CLI instead
      test: ["CMD", "ollama", "list"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s

  webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: gemma4-webui
    ports:
      - "3000:8080"
    volumes:
      - webui-data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      ollama:
        condition: service_healthy
    restart: unless-stopped

volumes:
  ollama-models:
    driver: local
  webui-data:
    driver: local
```

This gives you Ollama + Open WebUI — a complete ChatGPT-like interface for Gemma 4:
```bash
# Start everything
docker compose up -d

# Pull Gemma 4
docker exec gemma4-ollama ollama pull gemma4:26b

# Open the web UI
open http://localhost:3000
```

GPU Passthrough (NVIDIA)
To use your GPU inside Docker, you need the NVIDIA Container Toolkit:
```bash
# Install NVIDIA Container Toolkit (Ubuntu/Debian)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Configure Docker to use NVIDIA runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
```

Update your docker-compose.yml to use the GPU:
```yaml
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama-models:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    restart: unless-stopped
```

Note: On Mac with Apple Silicon, Docker runs in a Linux VM and cannot access Metal acceleration. For Mac, run Ollama natively instead — you'll get Metal GPU acceleration automatically. See our Mac performance guide for details.
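After restarting with the GPU config, it's worth confirming that the container actually sees the card. A small sketch: `gpu_mem_mib` is a hypothetical helper that strips the unit from nvidia-smi's CSV output so scripts can compare VRAM against model size.

```bash
# gpu_mem_mib: turn nvidia-smi CSV output like "24576 MiB" into "24576"
# (first GPU only), handy for scripting VRAM checks.
gpu_mem_mib() {
  awk '{print $1; exit}'
}

# Inside the running container (container name from the compose file above):
# docker exec gemma4-ollama nvidia-smi --query-gpu=memory.total --format=csv,noheader | gpu_mem_mib
```

If nvidia-smi isn't found inside the container, the toolkit isn't wired up; rerun the nvidia-ctk step and restart Docker.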
Persistent Model Storage
Models are large files. You don't want to re-download them every time a container restarts.
Named volume (recommended — Docker manages the storage):
```yaml
volumes:
  ollama-models:
    driver: local
```

Bind mount (you choose the path — good for managing disk space):

```yaml
volumes:
  - /data/ollama-models:/root/.ollama
```

Check model storage size:
```bash
docker exec gemma4-ollama du -sh /root/.ollama/models
```

| Model | Approximate Size (Q4) |
|---|---|
| Gemma 4 E2B | ~1.5 GB |
| Gemma 4 E4B | ~2.5 GB |
| Gemma 4 26B | ~15 GB |
| Gemma 4 31B | ~18 GB |
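Before pulling a larger model, you can sanity-check free space against the sizes above. A rough sketch (the function name and its arguments are my own; it asks df about the filesystem backing the volume):

```bash
# check_space: is there room for a model of roughly MODEL_GB gigabytes
# on the filesystem backing PATH (e.g. the bind-mount directory)?
check_space() {
  path="$1"      # directory backing the Ollama volume
  model_gb="$2"  # approximate model size in GB, from the table above
  free_kb=$(df -Pk "$path" | awk 'NR==2 {print $4}')
  need_kb=$((model_gb * 1024 * 1024))
  if [ "$free_kb" -ge "$need_kb" ]; then
    echo "ok: ${free_kb} KB free, need ~${need_kb} KB"
  else
    echo "low on space: ${free_kb} KB free, need ~${need_kb} KB" >&2
    return 1
  fi
}

# check_space /data/ollama-models 15   # before pulling gemma4:26b
```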
Multi-Model Setup
Want to run multiple Gemma 4 sizes for different use cases? Easy:
```bash
# Pull multiple models
docker exec gemma4-ollama ollama pull gemma4:e4b   # Fast, simple tasks
docker exec gemma4-ollama ollama pull gemma4:26b   # Most tasks
docker exec gemma4-ollama ollama pull gemma4:31b   # Maximum quality

# List all models
docker exec gemma4-ollama ollama list
```

Ollama loads models on demand and unloads idle ones. Only the active model uses VRAM. You can configure how long models stay loaded:
```yaml
environment:
  - OLLAMA_KEEP_ALIVE=5m         # Unload after 5 minutes of idle
  - OLLAMA_MAX_LOADED_MODELS=2   # Keep up to 2 models loaded
```

Exposing the API
The Ollama API runs on port 11434 by default. Once your container is running:
```bash
# List available models
curl http://localhost:11434/api/tags

# Generate a response
curl http://localhost:11434/api/chat -d '{
  "model": "gemma4:26b",
  "messages": [{"role": "user", "content": "Hello!"}]
}'

# The API is also OpenAI-compatible
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma4:26b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

For detailed API usage, see our API tutorial. For production-grade serving with higher throughput, consider vLLM in Docker.
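When scripting against the API you usually want just the reply text. One approach is to disable streaming and filter the JSON; `extract_reply` below is a hypothetical helper built on a Python one-liner:

```bash
# extract_reply: print the assistant text from a non-streaming
# /api/chat response, i.e. the "message.content" field.
extract_reply() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["message"]["content"])'
}

# curl -s http://localhost:11434/api/chat -d '{
#   "model": "gemma4:26b",
#   "messages": [{"role": "user", "content": "Hello!"}],
#   "stream": false
# }' | extract_reply
```

Without `"stream": false` the endpoint returns one JSON object per chunk, which a single json.load can't parse.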
Useful Docker Commands
```bash
# View logs
docker compose logs -f ollama

# Check resource usage
docker stats gemma4-ollama

# Enter the container
docker exec -it gemma4-ollama bash

# Stop everything
docker compose down

# Stop and remove model data
docker compose down -v

# Update Ollama image
docker compose pull && docker compose up -d
```

Next Steps
- Deploy for production: vLLM + Docker guide
- Use the API in your app: API tutorial
- Get reliable JSON from Gemma 4: structured output guide
- Run natively on Mac: Mac performance guide



