So you've played with Gemma 4 in a chat window and now you want to build something with it. Good — that's where it gets fun. There are three main ways to call Gemma 4 via API, and each one makes sense for different situations.
Let's walk through all three with real code you can copy and run.
Option 1: Ollama Local API (Free, Private, No Limits)
If you've already set up Ollama locally, you've got an API server running at localhost:11434 right now. No API key needed, no rate limits, completely free.
Python (requests)
import requests
response = requests.post("http://localhost:11434/api/generate", json={
"model": "gemma4",
"prompt": "Explain async/await in Python like I'm 10",
"stream": False
})
print(response.json()["response"])
cURL
curl http://localhost:11434/api/generate -d '{
"model": "gemma4",
"prompt": "Explain async/await in Python like I am 10",
"stream": false
}'
JavaScript (fetch)
const response = await fetch("http://localhost:11434/api/generate", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
model: "gemma4",
prompt: "Explain async/await in Python like I'm 10",
stream: false,
}),
});
const data = await response.json();
console.log(data.response);
Ollama Chat API (Multi-turn)
For conversations with message history, use the chat endpoint:
import requests
response = requests.post("http://localhost:11434/api/chat", json={
"model": "gemma4",
"messages": [
{"role": "system", "content": "You are a helpful coding tutor."},
{"role": "user", "content": "What's the difference between a list and a tuple?"}
],
"stream": False
})
print(response.json()["message"]["content"])
Pros: Zero cost, total privacy, no rate limits, works offline. Cons: Speed depends on your hardware. No GPU = slow.
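If you're wiring the chat endpoint into an actual conversation loop, the only bookkeeping is appending each reply back onto the messages list before the next request. Here's a minimal sketch (with_turn and ask are hypothetical helper names, not part of the Ollama API; it assumes the local server above is running):

```python
import requests

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # default local endpoint

def with_turn(history, role, content):
    # Pure helper: return a new history with one message appended.
    return history + [{"role": role, "content": content}]

def ask(history, user_text, model="gemma4"):
    # Send the full history plus the new user message, then record the reply
    # so the model keeps context on the next turn.
    history = with_turn(history, "user", user_text)
    resp = requests.post(OLLAMA_CHAT_URL, json={
        "model": model,
        "messages": history,
        "stream": False,
    })
    reply = resp.json()["message"]["content"]
    return with_turn(history, "assistant", reply), reply
```

Each call returns the updated history, so chaining calls gives you multi-turn memory for free.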
Option 2: Google AI Studio API (Free Tier Available)
Google offers Gemma 4 through their AI Studio API. You get a generous free tier and it's fast because it runs on Google's TPU infrastructure.
Get Your API Key
- Go to aistudio.google.com
- Click "Get API Key" in the top right
- Create a key (takes 10 seconds)
For a detailed walkthrough, check out our Google AI Studio guide.
Python (google-generativeai SDK)
pip install google-generativeai
import google.generativeai as genai
genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemma-4-27b-it")
response = model.generate_content("Write a Python decorator for retry logic")
print(response.text)
cURL
curl "https://generativelanguage.googleapis.com/v1beta/models/gemma-4-27b-it:generateContent?key=YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"contents": [{
"parts": [{"text": "Write a Python decorator for retry logic"}]
}]
}'
Free Tier Limits
Here's what you get without paying anything:
- 15 requests per minute (RPM)
- 1,500 requests per day (RPD)
- 1 million tokens per minute
That's actually pretty generous for development and small projects. You'll only hit limits if you're building something with real user traffic.
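If you do bump into the 15 RPM ceiling, the standard fix is exponential backoff: wait, double the wait, retry. Here's a minimal sketch (backoff_delay and call_with_retries are hypothetical helper names; catching bare Exception is a simplification, and in practice you'd catch the rate-limit exception specifically):

```python
import time

def backoff_delay(attempt, base=1.0, cap=60.0):
    # 1s, 2s, 4s, 8s, ... capped so a long outage doesn't stall forever.
    return min(base * (2 ** attempt), cap)

def call_with_retries(make_request, max_attempts=5, base=1.0):
    # make_request: any zero-arg callable that raises when rate-limited.
    for attempt in range(max_attempts):
        try:
            return make_request()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller see the error
            time.sleep(backoff_delay(attempt, base=base))
```

Wrap any of the generate_content calls in this and a burst of traffic degrades gracefully instead of crashing.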
Error Handling
The API returns specific error codes you should handle:
import google.generativeai as genai
from google.api_core import exceptions
genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemma-4-27b-it")
try:
response = model.generate_content("Your prompt here")
print(response.text)
except exceptions.ResourceExhausted:
print("Rate limit hit. Wait a minute and try again.")
except exceptions.InvalidArgument as e:
print(f"Bad request: {e}")
except exceptions.NotFound:
print("Model not found. Check the model name.")
except Exception as e:
print(f"Unexpected error: {e}")
Option 3: OpenRouter API (OpenAI-Compatible)
OpenRouter is great if you want to swap between models easily. It uses the same format as OpenAI's API, so if you've built anything with GPT, you can switch to Gemma 4 by changing one line.
Get Your API Key
- Go to openrouter.ai
- Sign up and add credits ($5 minimum)
- Generate an API key from the dashboard
Python
import requests
response = requests.post(
"https://openrouter.ai/api/v1/chat/completions",
headers={
"Authorization": "Bearer YOUR_OPENROUTER_KEY",
"Content-Type": "application/json",
},
json={
"model": "google/gemma-4-27b-it",
"messages": [
{"role": "user", "content": "Compare React and Vue in 5 bullet points"}
],
},
)
print(response.json()["choices"][0]["message"]["content"])
cURL
curl https://openrouter.ai/api/v1/chat/completions \
-H "Authorization: Bearer YOUR_OPENROUTER_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "google/gemma-4-27b-it",
"messages": [
{"role": "user", "content": "Compare React and Vue in 5 bullet points"}
]
}'
Using the OpenAI Python SDK
Since OpenRouter is OpenAI-compatible, you can use the official OpenAI SDK:
from openai import OpenAI
client = OpenAI(
base_url="https://openrouter.ai/api/v1",
api_key="YOUR_OPENROUTER_KEY",
)
response = client.chat.completions.create(
model="google/gemma-4-27b-it",
messages=[
{"role": "user", "content": "Explain monads in plain English"}
],
)
print(response.choices[0].message.content)
This is especially nice because you can switch between Gemma 4, Claude, GPT, Llama, and others just by changing the model string. Want to let the model call external tools and APIs? Check out our function calling guide.
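One pattern worth adopting early: since the model string is the only thing that varies, keep your model IDs in one place. A small sketch (complete is a hypothetical wrapper; the non-Gemma ID below is illustrative only, so verify it against OpenRouter's current model list before relying on it):

```python
import requests

# Friendly aliases -> OpenRouter model IDs. The llama entry is an
# illustrative ID; check openrouter.ai/models for current names.
MODELS = {
    "gemma": "google/gemma-4-27b-it",
    "llama": "meta-llama/llama-3.1-70b-instruct",
}

def complete(alias, prompt, api_key="YOUR_OPENROUTER_KEY"):
    # Swapping providers is now a one-word change at the call site.
    response = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": MODELS[alias],
            "messages": [{"role": "user", "content": prompt}],
        },
    )
    return response.json()["choices"][0]["message"]["content"]
```

This also gives you an obvious place to implement fallbacks: if one alias errors out, retry with another.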
Streaming Responses
Nobody wants to wait 30 seconds for a wall of text. Streaming shows tokens as they're generated — way better UX. Here's how to do it with each method.
Ollama Streaming (Python)
import requests
import json
response = requests.post("http://localhost:11434/api/generate", json={
"model": "gemma4",
"prompt": "Write a short story about a debugging session at 3am",
"stream": True
}, stream=True)
for line in response.iter_lines():
if line:
chunk = json.loads(line)
print(chunk.get("response", ""), end="", flush=True)
Google AI Studio Streaming (Python)
import google.generativeai as genai
genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemma-4-27b-it")
response = model.generate_content(
"Write a short story about a debugging session at 3am",
stream=True
)
for chunk in response:
print(chunk.text, end="", flush=True)
OpenRouter Streaming (Python)
from openai import OpenAI
client = OpenAI(
base_url="https://openrouter.ai/api/v1",
api_key="YOUR_OPENROUTER_KEY",
)
stream = client.chat.completions.create(
model="google/gemma-4-27b-it",
messages=[{"role": "user", "content": "Write a short story about a debugging session at 3am"}],
stream=True,
)
for chunk in stream:
content = chunk.choices[0].delta.content
if content:
print(content, end="", flush=True)
Comparison: Which API Should You Use?
| Feature | Ollama (Local) | Google AI Studio | OpenRouter |
|---|---|---|---|
| Cost | Free | Free tier (15 RPM) | Pay per token |
| Speed | Depends on hardware | Fast (Google TPUs) | Fast |
| Privacy | Complete (offline) | Data sent to Google | Data sent to provider |
| Rate Limits | None | 15 RPM / 1,500 RPD | Based on credits |
| Setup | Install Ollama + model | Get API key | Sign up + add credits |
| OpenAI Compatible | Partial | No (own SDK) | Yes |
| Best For | Privacy, development | Free prototyping | Production, multi-model |
My recommendation:
- Building a side project? Start with Google AI Studio's free tier. It's fast and free.
- Privacy matters? Run Ollama locally. Your data stays on your machine.
- Production app? OpenRouter gives you the most flexibility and the ability to fall back to other models.
- Just learning? Ollama. No API keys, no limits, just code.
Common Gotchas
"Connection refused" on Ollama: Make sure the Ollama server is running. On Mac, check if the Ollama icon is in the menu bar. On Linux, run ollama serve first.
"Model not found" on Google AI Studio: Model names change. Check the AI Studio docs for current model IDs.
Slow responses on Ollama: You're probably running on CPU. That's fine — it works, just slower. See our hardware guide for what to expect.
Timeouts: For long-running generations, increase your HTTP client timeout. Gemma 4's 27B model can take a while on complex prompts.
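For the timeout case specifically, requests accepts a (connect, read) timeout tuple: a short connect timeout fails fast when the server is down, while a long read timeout leaves room for slow generations. A minimal sketch against the local Ollama endpoint (generate and timeout_pair are hypothetical helpers, and the 5s/300s values are arbitrary starting points, not official recommendations):

```python
import requests

def timeout_pair(connect=5, read=300):
    # (connect, read): fail fast on an unreachable server, but allow
    # minutes for a long generation to come back.
    return (connect, read)

def generate(prompt, model="gemma4"):
    try:
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": model, "prompt": prompt, "stream": False},
            timeout=timeout_pair(),
        )
        resp.raise_for_status()
        return resp.json()["response"]
    except requests.exceptions.Timeout:
        return None  # caller decides whether to retry or surface an error
```

Streaming (stream=True) sidesteps most read-timeout pain too, since tokens arrive continuously instead of in one long-awaited response.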
Next Steps
- New to Ollama? Start with our complete Ollama setup guide
- Want to send images to the API? Check out Gemma 4 multimodal guide
- Need better prompts? Browse our 50 best Gemma 4 prompts
- Not sure which model size to pick? Read Gemma 4: Which Model Should You Use?



