If you've got an AMD GPU and want to run Gemma 4 locally, you're in luck — AMD has Day 0 support for Gemma 4 through ROCm. But getting everything working takes a bit more setup than NVIDIA's plug-and-play CUDA ecosystem. This guide walks you through the entire process, from checking GPU compatibility to running inference with vLLM.
Does Your AMD GPU Support Gemma 4?
Not all AMD GPUs work with ROCm. You need a card with a supported architecture. Here's a quick reference:
| GPU Series | Architecture | ROCm Support | Notes |
|---|---|---|---|
| Radeon RX 7900 XTX/XT | RDNA 3 (gfx1100) | Yes | Best consumer option |
| Radeon RX 7800 XT | RDNA 3 (gfx1101) | Yes | Good mid-range |
| Radeon RX 7600 | RDNA 3 (gfx1102) | Partial | Limited VRAM (8GB) |
| Instinct MI250X | CDNA 2 (gfx90a) | Yes | Data center GPU |
| Instinct MI300X | CDNA 3 (gfx942) | Yes | Top-tier performance |
| Radeon RX 6000 series | RDNA 2 | Limited | Community workarounds only |
Important: The architecture string must match exactly. If ROCm detects the wrong architecture, you'll get silent failures or garbage output. Check yours with:
```bash
rocminfo | grep "Name:" | grep "gfx"
```

Installing ROCm on Linux
ROCm is Linux-only for serious ML workloads. Windows support exists through WSL2, but it's limited and not recommended for production use.
Step 1: Check Your Kernel and Driver
```bash
# Check kernel version (5.15+ recommended)
uname -r

# Check if the amdgpu driver is loaded
lsmod | grep amdgpu
```

Step 2: Install ROCm
For Ubuntu 22.04/24.04:
```bash
# Add AMD's package repository
# (for Ubuntu 24.04, replace "jammy" with "noble" in the URL)
wget https://repo.radeon.com/amdgpu-install/latest/ubuntu/jammy/amdgpu-install_6.4.60401-1_all.deb
sudo dpkg -i amdgpu-install_6.4.60401-1_all.deb

# Install ROCm with ML libraries
sudo amdgpu-install --usecase=rocm,ml

# Add your user to the render and video groups
sudo usermod -aG render,video $USER

# Reboot
sudo reboot
```

Step 3: Verify Installation
```bash
# Check ROCm is working
rocm-smi
# You should see your GPU listed with temperature and memory info
```

Running Gemma 4 with the Lemonade Tool
AMD's Lemonade tool is the easiest way to get Gemma 4 running on AMD hardware. It handles model download, quantization, and serving in one package.
```bash
# Install Lemonade
pip install lemonade-sdk

# Run Gemma 4 with automatic optimization
lemonade serve --model gemma-4-12b-it --device rocm

# For the smaller model
lemonade serve --model gemma-4-1b-it --device rocm
```

Lemonade automatically detects your GPU architecture and applies the right optimizations. It's a great starting point before moving to more advanced setups.
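Not sure which variant to serve? A back-of-the-envelope rule is to match the quantized model size to your VRAM. The thresholds below are my rough assumptions for Q4-quantized weights plus KV-cache headroom, not official guidance:

```python
def pick_gemma_variant(vram_gb: float) -> str:
    """Pick a Gemma 4 variant by available VRAM.

    Assumption: the 12B model at Q4 quantization wants roughly 12 GB
    of VRAM with headroom; anything smaller should use the 1B model.
    """
    if vram_gb >= 12:
        return "gemma-4-12b-it"
    return "gemma-4-1b-it"

print(pick_gemma_variant(24))  # e.g. RX 7900 XTX
print(pick_gemma_variant(8))   # e.g. RX 7600
```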
Using vLLM with ROCm
For production inference, vLLM with ROCm support gives you the best throughput:
```bash
# Install vLLM with ROCm support
pip install vllm-rocm

# Start the server
python -m vllm.entrypoints.openai.api_server \
  --model google/gemma-4-12b-it \
  --tensor-parallel-size 1 \
  --dtype float16 \
  --max-model-len 8192
```

SGLang Alternative
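Once the server is up, you can talk to its OpenAI-compatible endpoint from any HTTP client. Here's a standard-library-only sketch; the port 8000 default and the `/v1/chat/completions` path follow vLLM's OpenAI-compatible server, so adjust them if your setup differs:

```python
import json
import urllib.request

def build_payload(prompt: str, model: str = "google/gemma-4-12b-it",
                  max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(prompt: str, base_url: str = "http://localhost:8000") -> str:
    """POST one chat turn to the server and return the assistant's reply."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Example (with the server running):
#   print(chat("Summarize what ROCm is in one sentence."))
```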
SGLang also supports ROCm and can be faster for certain workloads:
```bash
pip install "sglang[rocm]"
python -m sglang.launch_server \
  --model-path google/gemma-4-12b-it \
  --port 8000 \
  --device rocm
```

Common Issues and Fixes
"Triton backend required for multimodal"
If you're trying to use Gemma 4's vision or audio features on AMD, you need the Triton backend compiled for ROCm:
```bash
# Install Triton with ROCm support
pip install triton-rocm

# Set the backend explicitly
export TRITON_BACKEND=rocm
```

Without this, text-only inference works fine, but multimodal inputs will fail silently or throw cryptic errors.
Architecture String Mismatch
This is the most common issue. If you see errors like hipErrorNoBinaryForGpu, your architecture string doesn't match:
```bash
# Check what ROCm thinks your GPU is
rocminfo | grep gfx

# Override if needed (example for RX 7900 XTX)
export HSA_OVERRIDE_GFX_VERSION=11.0.0
```

Out of Memory Errors
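The override value follows a simple pattern: a gfx target like gfx1100 becomes 11.0.0 (major.minor.step, with hex digits for the step). Here's a small helper that derives it; treat this as a convenience sketch of the common naming convention, not an official mapping:

```python
def hsa_override(gfx: str) -> str:
    """Derive an HSA_OVERRIDE_GFX_VERSION value from a gfx target string.

    Follows the common major.minor.step convention (the step digit is hex,
    so gfx90a -> 9.0.10). Assumes 3- or 4-character targets only.
    """
    digits = gfx.removeprefix("gfx")
    if len(digits) == 4:
        major, minor, step = digits[:2], digits[2], digits[3]
    elif len(digits) == 3:
        major, minor, step = digits[0], digits[1], digits[2]
    else:
        raise ValueError(f"unexpected gfx target: {gfx}")
    return f"{int(major)}.{int(minor, 16)}.{int(step, 16)}"

print(hsa_override("gfx1100"))  # 11.0.0
print(hsa_override("gfx90a"))   # 9.0.10
```

Note that in practice you usually override to the nearest officially supported target rather than your card's literal one; RX 7800 XT (gfx1101) owners, for example, often set 11.0.0 instead of 11.0.1.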
AMD GPUs report VRAM differently than NVIDIA cards, so tools written for CUDA can misread what's available. Check the actual available memory:
```bash
rocm-smi --showmeminfo vram
# If you're running out, try a smaller quantization
# Q4_K_M works well on 16GB cards
```

Performance Is Worse Than Expected
Make sure you're not accidentally running on CPU:
```bash
# Verify the GPU is being used
watch -n 1 rocm-smi
# You should see GPU utilization > 0% during inference
```

Performance Expectations
Here's what to expect for token generation speed with Gemma 4 12B Q4_K_M:
| GPU | VRAM | Tokens/sec | Notes |
|---|---|---|---|
| RX 7900 XTX | 24GB | ~35-45 | Best consumer AMD option |
| RX 7800 XT | 16GB | ~25-30 | Good for most tasks |
| MI300X | 192GB | ~120+ | Data center, runs full precision |
| MI250X | 128GB | ~80+ | Previous gen data center |
Windows and WSL2
If you absolutely must use Windows, ROCm works through WSL2 with some limitations:
```bash
# Inside WSL2 Ubuntu
sudo apt install rocm-hip-runtime
# Limited to the HIP runtime only — no full ROCm stack
```

For a better Windows experience, consider using Ollama, which handles AMD GPU detection automatically on supported cards.
Next Steps
- Having issues? Check our Gemma 4 Troubleshooting Guide for solutions to the most common problems
- Not sure if your hardware is enough? Read the Hardware Requirements Guide for detailed VRAM and RAM recommendations
- Want to compare models? See Which Gemma 4 Model Should You Pick? to choose the right size for your AMD GPU
Running Gemma 4 on AMD is totally doable — it just takes a bit more initial setup than NVIDIA. Once ROCm is configured correctly, performance is competitive, and AMD's Day 0 support means you'll get updates alongside NVIDIA users going forward.