If you're building an app on top of Gemma 4, you need structured output — not free-form text. You need JSON that you can parse, validate, and pipe into your database or API. Every single time, without exceptions.
This is one of the trickiest parts of working with local LLMs, but with the right techniques, Gemma 4 can be surprisingly reliable. Let's go through every method.
## Why Structured Output Matters
When you're using Gemma 4 as a component in a larger system — not just chatting with it — you need predictable output:
```
# This is what you want:
{"sentiment": "positive", "confidence": 0.92, "topics": ["pricing", "support"]}

# This is what you don't want:
"The sentiment of this text is positive, with a confidence of about 92%..."
```

The first one can be parsed and used programmatically. The second requires another round of parsing, which adds latency, cost, and failure points.
## Method 1: System Prompt Technique
The simplest approach — tell the model exactly what you want in the system prompt:
```python
import requests
import json

response = requests.post("http://localhost:11434/api/chat", json={
    "model": "gemma4:26b",
    "messages": [
        {
            "role": "system",
            "content": """You are a JSON-only response API.
You MUST respond with valid JSON and nothing else.
No markdown, no explanation, no code blocks — just raw JSON.

Schema:
{
  "sentiment": "positive" | "negative" | "neutral",
  "confidence": number between 0 and 1,
  "topics": string[],
  "summary": string (one sentence)
}"""
        },
        {
            "role": "user",
            "content": "Analyze: 'The new update is amazing! The UI is so much cleaner and everything loads faster. Only complaint is the price went up.'"
        }
    ],
    "stream": False,
})

result = json.loads(response.json()["message"]["content"])
print(result)
```

This works most of the time. But "most of the time" isn't good enough for production. The model might occasionally add a preamble like "Here's the JSON:" or wrap the output in markdown code blocks.
## Method 2: Ollama Format Parameter
Ollama has a built-in `format` parameter that constrains the output to valid JSON:

```python
import json
import requests

response = requests.post("http://localhost:11434/api/chat", json={
    "model": "gemma4:26b",
    "messages": [
        {
            "role": "system",
            "content": "Analyze the sentiment of the given text. Return: sentiment (positive/negative/neutral), confidence (0-1), topics (list), summary (one sentence)."
        },
        {
            "role": "user",
            "content": "The customer service was terrible but the product itself is excellent."
        }
    ],
    "format": "json",
    "stream": False,
})

# This is guaranteed to be valid JSON
result = response.json()["message"]["content"]
parsed = json.loads(result)
```

The `format: "json"` flag tells Ollama to constrain token generation to only produce valid JSON. This is much more reliable than prompt engineering alone.
**Limitation:** It guarantees valid JSON syntax, but it doesn't guarantee the schema. The model might return `{"answer": "positive"}` instead of your expected format. You still need validation.
## Method 3: Schema Definition with Pydantic
For production code, define your expected schema with Pydantic and validate against it:
```python
from pydantic import BaseModel, Field
from typing import Literal
import json
import requests


class SentimentResult(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]
    confidence: float = Field(ge=0, le=1)
    topics: list[str]
    summary: str


def analyze_sentiment(text: str) -> SentimentResult:
    schema_str = json.dumps(SentimentResult.model_json_schema(), indent=2)
    response = requests.post("http://localhost:11434/api/chat", json={
        "model": "gemma4:26b",
        "messages": [
            {
                "role": "system",
                "content": f"""Respond with JSON matching this exact schema:
{schema_str}
No other text. Just valid JSON."""
            },
            {
                "role": "user",
                "content": f"Analyze this text: {text}"
            }
        ],
        "format": "json",
        "stream": False,
    })
    raw = json.loads(response.json()["message"]["content"])
    return SentimentResult.model_validate(raw)


# Usage
result = analyze_sentiment("Great product, terrible shipping time.")
print(f"Sentiment: {result.sentiment} ({result.confidence:.0%})")
print(f"Topics: {', '.join(result.topics)}")
```

This gives you type safety and validation. If the model returns something unexpected, Pydantic throws a clear error instead of silently corrupting your data.
## Method 4: Validation and Retry Pattern
For maximum reliability, add a retry loop:
```python
from pydantic import BaseModel, Field, ValidationError
import json
import requests
import time


def get_structured_output(
    prompt: str,
    schema_class: type[BaseModel],
    model: str = "gemma4:26b",
    max_retries: int = 3,
) -> BaseModel:
    schema_str = json.dumps(schema_class.model_json_schema(), indent=2)
    for attempt in range(max_retries):
        try:
            response = requests.post("http://localhost:11434/api/chat", json={
                "model": model,
                "messages": [
                    {
                        "role": "system",
                        "content": f"Respond ONLY with JSON matching this schema:\n{schema_str}"
                    },
                    {"role": "user", "content": prompt}
                ],
                "format": "json",
                "stream": False,
                "options": {
                    # Low temperature first for consistency, a bit higher on retries
                    "temperature": 0.1 if attempt == 0 else 0.3,
                },
            })
            raw = json.loads(response.json()["message"]["content"])
            return schema_class.model_validate(raw)
        except (json.JSONDecodeError, ValidationError) as e:
            if attempt == max_retries - 1:
                raise ValueError(
                    f"Failed to get valid output after {max_retries} attempts: {e}"
                )
            time.sleep(0.5)
    raise ValueError("Unreachable")


# Usage
class ProductReview(BaseModel):
    rating: int = Field(ge=1, le=5)
    pros: list[str]
    cons: list[str]
    recommendation: bool


review = get_structured_output(
    "Review: 'Solid laptop, great keyboard, battery could be better. 4/5 would buy again.'",
    ProductReview,
)
```

Key design choices:
- Start with low temperature (0.1) for consistency, increase on retries for variety
- Use `format: "json"` to guarantee valid JSON syntax
- Validate with Pydantic for schema correctness
- Cap retries at 3 — if it fails 3 times, the prompt probably needs work
## Common Failures and Fixes
**Model wraps JSON in markdown:**

````
```json
{"key": "value"}
```
````

Fix: Use `format: "json"` in Ollama. If that's not available, strip markdown:
```python
def clean_json(text: str) -> str:
    text = text.strip()
    if text.startswith("```"):
        text = text.split("\n", 1)[1]    # Remove first line (the opening ```)
        text = text.rsplit("```", 1)[0]  # Remove last ```
    return text.strip()
```

**Model adds extra fields:**
The model might return fields you didn't ask for. Pydantic handles this — by default it ignores extra fields. Or set `model_config = ConfigDict(extra="forbid")` to reject them.
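As a minimal sketch of the strict variant (`StrictResult` is a made-up example model, not part of the code above):

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class StrictResult(BaseModel):
    # Reject any field not declared in the schema
    model_config = ConfigDict(extra="forbid")

    sentiment: str
    confidence: float


# Without extra="forbid", the unexpected "mood" field would be
# silently dropped; with it, validation fails loudly instead.
try:
    StrictResult.model_validate(
        {"sentiment": "positive", "confidence": 0.9, "mood": "upbeat"}
    )
except ValidationError as e:
    print(f"Rejected with {e.error_count()} validation error(s)")
```

Failing loudly is usually the right call in a pipeline: a surprise field often means the model drifted from your schema, and you want to catch that at the boundary, not downstream.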
**Model uses wrong types:**

Sometimes the model returns `"0.92"` (a string) instead of `0.92` (a number). Pydantic's `model_validate` handles most type coercion automatically.
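A quick sketch of that coercion in action (`Score` is an illustrative model, not from the examples above); in Pydantic v2's default lax mode, numeric strings validate as numbers:

```python
from pydantic import BaseModel


class Score(BaseModel):
    confidence: float
    count: int


# The model returned strings, but lax-mode validation coerces them
coerced = Score.model_validate({"confidence": "0.92", "count": "3"})
print(coerced.confidence, coerced.count)  # 0.92 3
```

If you want to reject mistyped values instead of coercing them, `model_validate(..., strict=True)` turns this behavior off.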
**Empty or null fields:**

Make fields optional when they might be empty:

```python
class Result(BaseModel):
    name: str
    email: str | None = None  # Model might not find an email
    topics: list[str] = []    # Default to empty list
```

**Nested objects:**
Gemma 4 handles nested JSON well, but keep nesting to 2-3 levels max:

```python
class Address(BaseModel):
    city: str
    country: str


class Person(BaseModel):
    name: str
    age: int
    address: Address  # One level of nesting — fine
```

## Performance Tips
- Lower temperature (0.1-0.3) produces more consistent JSON
- Shorter schemas get better compliance — don't ask for 20 fields at once
- Few-shot examples in the system prompt dramatically improve reliability
- The 26B model is significantly better at JSON than E4B — see model comparison
- Thinking mode helps with complex schemas — see thinking mode guide
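The few-shot tip is worth making concrete. One common pattern is to seed the conversation with a worked user/assistant pair before the real input, so the model sees the exact JSON shape it should produce. A sketch, where `build_few_shot_messages` is a hypothetical helper (not part of Ollama's API) and the example JSON follows the sentiment schema used throughout this guide:

```python
def build_few_shot_messages(system_prompt: str, user_text: str) -> list[dict]:
    """Prepend one worked example before the real request."""
    return [
        {"role": "system", "content": system_prompt},
        # Worked example: fake user input plus the ideal JSON answer
        {"role": "user", "content": "Analyze: 'Fast shipping, box arrived dented.'"},
        {"role": "assistant", "content":
            '{"sentiment": "neutral", "confidence": 0.7, '
            '"topics": ["shipping", "packaging"], '
            '"summary": "Delivery was fast but the packaging was damaged."}'},
        # The real input comes last
        {"role": "user", "content": f"Analyze: {user_text!r}"},
    ]
```

Pass the resulting list as the `messages` field of the `/api/chat` request, exactly as in the earlier examples. One good example is usually enough; two or three help with tricky schemas but eat into your context window.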
## Next Steps
- Use JSON output with the Ollama API in your applications
- Deploy a JSON API server with vLLM + Docker
- Fine-tune Gemma 4 for your specific JSON format
- Learn about thinking mode for complex structured tasks



