How to Run Gemma 4 on AMD GPU (ROCm Setup Guide)

Apr 7, 2026

If you've got an AMD GPU and want to run Gemma 4 locally, you're in luck — AMD has Day 0 support for Gemma 4 through ROCm. But getting everything working takes a bit more setup than NVIDIA's plug-and-play CUDA ecosystem. This guide walks you through the entire process, from checking GPU compatibility to running inference with vLLM.

Does Your AMD GPU Support Gemma 4?

Not all AMD GPUs work with ROCm. You need a card with a supported architecture. Here's a quick reference:

GPU Series            | Architecture     | ROCm Support | Notes
----------------------|------------------|--------------|---------------------------
Radeon RX 7900 XTX/XT | RDNA 3 (gfx1100) | Yes          | Best consumer option
Radeon RX 7800 XT     | RDNA 3 (gfx1101) | Yes          | Good mid-range
Radeon RX 7600        | RDNA 3 (gfx1102) | Partial      | Limited VRAM (8GB)
Instinct MI250X       | CDNA 2 (gfx90a)  | Yes          | Data center GPU
Instinct MI300X       | CDNA 3 (gfx942)  | Yes          | Top-tier performance
Radeon RX 6000 series | RDNA 2           | Limited      | Community workarounds only

Important: The architecture string must match exactly. If ROCm detects the wrong architecture, you'll get silent failures or garbage output. Check yours with:

rocminfo | grep "Name:" | grep "gfx"
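If you'd rather script the check than eyeball it, here's a small helper sketch that pulls the first gfx architecture string out of rocminfo's output (it assumes rocminfo prints the string somewhere in an agent's `Name:` field, which is its usual format):

```shell
# Helper sketch: extract the first gfx architecture string from rocminfo output.
extract_gfx() {
  grep -Eo 'gfx[0-9a-f]+' | head -n 1
}

# On a real system:
#   rocminfo | extract_gfx      # e.g. prints "gfx1100"
```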

Installing ROCm on Linux

ROCm is Linux-only for serious ML workloads. Windows support exists through WSL2, but it's limited and not recommended for production use.

Step 1: Check Your Kernel and Driver

# Check kernel version (5.15+ recommended)
uname -r

# Check if amdgpu driver is loaded
lsmod | grep amdgpu
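The kernel-version check above can also be automated. A minimal sketch using `sort -V` for the version comparison (the 5.15 threshold is the recommendation from above, not a hard ROCm requirement):

```shell
# Sketch: returns success if the kernel version meets the requirement.
kernel_at_least() {
  # usage: kernel_at_least <required> [current]; defaults to this machine's kernel
  req="$1"
  cur="${2:-$(uname -r | cut -d- -f1)}"
  [ "$(printf '%s\n%s\n' "$req" "$cur" | sort -V | head -n 1)" = "$req" ]
}

kernel_at_least 5.15 && echo "kernel is new enough for ROCm" || echo "kernel too old"
```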

Step 2: Install ROCm

For Ubuntu 22.04 or 24.04 (the URL below targets jammy/22.04 — on 24.04, substitute noble in the path):

# Add AMD's package repository
wget https://repo.radeon.com/amdgpu-install/latest/ubuntu/jammy/amdgpu-install_6.4.60401-1_all.deb
sudo dpkg -i amdgpu-install_6.4.60401-1_all.deb

# Install ROCm with ML libraries
sudo amdgpu-install --usecase=rocm,ml

# Add your user to the render and video groups
sudo usermod -aG render,video $USER

# Reboot
sudo reboot

Step 3: Verify Installation

# Check ROCm is working
rocm-smi

# You should see your GPU listed with temperature and memory info

Running Gemma 4 with the Lemonade Tool

AMD's Lemonade tool is the easiest way to get Gemma 4 running on AMD hardware. It handles model download, quantization, and serving in one package.

# Install Lemonade
pip install lemonade-sdk

# Run Gemma 4 with automatic optimization
lemonade serve --model gemma-4-12b-it --device rocm

# For the smaller model
lemonade serve --model gemma-4-1b-it --device rocm

Lemonade automatically detects your GPU architecture and applies the right optimizations. It's a great starting point before moving to more advanced setups.

Using vLLM with ROCm

For production inference, vLLM with ROCm support gives you the best throughput:

# Install vLLM with ROCm support (if a prebuilt ROCm wheel isn't available
# for your setup, AMD also ships ROCm builds as Docker images, e.g. rocm/vllm,
# or you can build vLLM from source against your ROCm install)
pip install vllm-rocm

# Start the server
python -m vllm.entrypoints.openai.api_server \
  --model google/gemma-4-12b-it \
  --tensor-parallel-size 1 \
  --dtype float16 \
  --max-model-len 8192
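Once the server is up, it speaks the OpenAI-compatible API, so you can smoke-test it with curl. Port 8000 is vLLM's default; the prompt and max_tokens here are just illustrative:

```shell
# Smoke-test the OpenAI-compatible completions endpoint
payload='{"model": "google/gemma-4-12b-it", "prompt": "The best thing about AMD GPUs is", "max_tokens": 32}'

curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d "$payload"
```

If the server is healthy, you'll get back a JSON body with a choices array containing the generated text.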

SGLang Alternative

SGLang also supports ROCm and can be faster for certain workloads:

pip install "sglang[rocm]"  # quote the extras so zsh doesn't expand the brackets

python -m sglang.launch_server \
  --model-path google/gemma-4-12b-it \
  --port 8000 \
  --device rocm

Common Issues and Fixes

"Triton backend required for multimodal"

If you're trying to use Gemma 4's vision or audio features on AMD, you need the Triton backend compiled for ROCm:

# Install Triton with ROCm support
pip install triton-rocm

# Set the backend explicitly
export TRITON_BACKEND=rocm

Without this, text-only inference works fine, but multimodal inputs will fail silently or throw cryptic errors.

Architecture String Mismatch

This is the most common issue. If you see errors like hipErrorNoBinaryForGpu, the kernels were compiled for a different architecture than the one ROCm detected:

# Check what ROCm thinks your GPU is
rocminfo | grep gfx

# Override if needed (example for RX 7900 XTX)
export HSA_OVERRIDE_GFX_VERSION=11.0.0
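The override value is just the gfx string split into major.minor.stepping (gfx1100 becomes 11.0.0, gfx1030 becomes 10.3.0). A small helper sketch for the common consumer-card cases — this trick is mainly useful on RDNA cards, and data-center parts generally shouldn't need it:

```shell
# Sketch: derive an HSA_OVERRIDE_GFX_VERSION value from a gfx string.
gfx_to_override() {
  g="${1#gfx}"          # strip the "gfx" prefix
  major="${g%??}"       # everything except the last two characters
  rest="${g#$major}"    # the last two characters: minor + stepping
  minor="${rest%?}"
  step="${rest#?}"
  echo "${major}.${minor}.${step}"
}

export HSA_OVERRIDE_GFX_VERSION="$(gfx_to_override gfx1100)"   # 11.0.0
```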

Out of Memory Errors

AMD GPUs report VRAM differently than NVIDIA cards do, so don't trust numbers from generic monitoring tools. Check the actual available memory:

rocm-smi --showmeminfo vram

# If you're running out, try a smaller quantization
# Q4_K_M works well on 16GB cards
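To sanity-check whether a quantization will fit before downloading it, a back-of-envelope estimate is weights ≈ parameter count × bits-per-weight ÷ 8. The sketch below uses ~4.5 bits per weight as a rough average for Q4_K_M, and it ignores KV cache and runtime overhead, which add a few more GB on top:

```shell
# Rough VRAM estimate for Gemma 4 12B at Q4_K_M (approximation, not a measurement)
awk 'BEGIN {
  params = 12e9        # 12B parameters
  bpw    = 4.5         # approx. average bits per weight for Q4_K_M
  gb     = params * bpw / 8 / 1e9
  printf "weights alone: ~%.1f GB\n", gb
}'
```

That lands around 7 GB for the weights, which is why a 16GB card handles the 12B model comfortably at Q4_K_M.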

Performance Is Worse Than Expected

Make sure you're not accidentally running on CPU:

# Verify GPU is being used
watch -n 1 rocm-smi

# You should see GPU utilization > 0% during inference

Performance Expectations

Here's what to expect for token generation speed with Gemma 4 12B Q4_K_M:

GPU         | VRAM  | Tokens/sec | Notes
------------|-------|------------|---------------------------------
RX 7900 XTX | 24GB  | ~35-45     | Best consumer AMD option
RX 7800 XT  | 16GB  | ~25-30     | Good for most tasks
MI300X      | 192GB | ~120+      | Data center, runs full precision
MI250X      | 128GB | ~80+       | Previous-gen data center
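Tokens/sec translates directly into how long you'll wait for a response. A quick conversion, using a mid-table figure (the numbers above are ballpark, not benchmarks):

```shell
# How long does a 500-token response take at ~40 tokens/sec (RX 7900 XTX territory)?
awk 'BEGIN {
  tokens = 500
  tps    = 40
  printf "500-token response: ~%.1f seconds\n", tokens / tps
}'
```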

Windows and WSL2

If you absolutely must use Windows, ROCm works through WSL2 with some limitations:

# Inside WSL2 Ubuntu
sudo apt install rocm-hip-runtime
# Limited to HIP runtime only — no full ROCm stack

For a better Windows experience, consider using Ollama which handles AMD GPU detection automatically on supported cards.

Next Steps

Running Gemma 4 on AMD is totally doable — it just takes a bit more initial setup than NVIDIA. Once ROCm is configured correctly, performance is competitive, and AMD's Day 0 support means you'll get updates alongside NVIDIA users going forward.
