Gemma 4 Structured Output: How to Get Reliable JSON Every Time

If you're building an app on top of Gemma 4, you need structured output — not free-form text. You need JSON that you can parse, validate, and pipe into your database or API. Every single time, without exceptions.

This is one of the trickiest parts of working with local LLMs, but with the right techniques, Gemma 4 can be surprisingly reliable. Let's go through every method.

Why Structured Output Matters

When you're using Gemma 4 as a component in a larger system — not just chatting with it — you need predictable output:

# This is what you want:
{"sentiment": "positive", "confidence": 0.92, "topics": ["pricing", "support"]}

# This is what you don't want:
"The sentiment of this text is positive, with a confidence of about 92%..."

The first one can be parsed and used programmatically. The second requires another round of parsing, which adds latency, cost, and failure points.

Method 1: System Prompt Technique

The simplest approach — tell the model exactly what you want in the system prompt:

import requests
import json

response = requests.post("http://localhost:11434/api/chat", json={
    "model": "gemma4:26b",
    "messages": [
        {
            "role": "system",
            "content": """You are a JSON-only response API. 
You MUST respond with valid JSON and nothing else.
No markdown, no explanation, no code blocks — just raw JSON.

Schema:
{
  "sentiment": "positive" | "negative" | "neutral",
  "confidence": number between 0 and 1,
  "topics": string[],
  "summary": string (one sentence)
}"""
        },
        {
            "role": "user",
            "content": "Analyze: 'The new update is amazing! The UI is so much cleaner and everything loads faster. Only complaint is the price went up.'"
        }
    ],
    "stream": False,
})

result = json.loads(response.json()["message"]["content"])
print(result)

This works most of the time. But "most of the time" isn't good enough for production. The model might occasionally add a preamble like "Here's the JSON:" or wrap the output in markdown code blocks.
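If you have to rely on prompting alone, one way to tolerate those two failure modes is to scan the reply for the first parseable JSON value. This is a minimal sketch using only the standard library; extract_json is a hypothetical helper name, and it returns the first valid object or array even if the model emitted several.

```python
import json

def extract_json(reply: str):
    """Return the first JSON object or array found in `reply`.

    Hypothetical helper: not part of Ollama or any library.
    """
    decoder = json.JSONDecoder()
    for i, ch in enumerate(reply):
        if ch not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at index i
            value, _end = decoder.raw_decode(reply, i)
            return value
        except json.JSONDecodeError:
            continue  # this bracket did not start valid JSON; keep scanning
    raise ValueError("no JSON object or array found in reply")

FENCE = "`" * 3  # a markdown code fence, built this way to keep the sample readable
fenced = (
    "Here's the JSON:\n"
    + FENCE + "json\n"
    + '{"sentiment": "positive", "confidence": 0.92}\n'
    + FENCE
)
print(extract_json(fenced))
```

This only recovers the payload; it does nothing to stop the model from producing the preamble in the first place, which is what the next method addresses.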

Method 2: Ollama Format Parameter

Ollama has a built-in format parameter that constrains the output to valid JSON:

response = requests.post("http://localhost:11434/api/chat", json={
    "model": "gemma4:26b",
    "messages": [
        {
            "role": "system",
            "content": "Analyze the sentiment of the given text. Respond in JSON with these keys: sentiment (positive/negative/neutral), confidence (0-1), topics (list), summary (one sentence)."
        },
        {
            "role": "user",
            "content": "The customer service was terrible but the product itself is excellent."
        }
    ],
    "format": "json",
    "stream": False,
})

# With format set, the message content is a JSON string; it still has to be parsed
result = response.json()["message"]["content"]
parsed = json.loads(result)

The format: "json" flag tells Ollama to constrain the token generation to only produce valid JSON. This is much more reliable than prompt engineering alone.

Limitation: It guarantees valid JSON syntax, but it doesn't guarantee the schema. The model might return {"answer": "positive"} instead of your expected format. You still need validation.
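Recent Ollama releases also accept a JSON Schema object as the format value in place of the string "json", which constrains field names and types as well as syntax. The sketch below only builds the request payload. It assumes your Ollama version supports a schema-valued format, and SENTIMENT_SCHEMA and build_schema_request are hypothetical names chosen for illustration.

```python
import json

# Hand-written JSON Schema mirroring the fields from Method 1.
SENTIMENT_SCHEMA = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
        "confidence": {"type": "number"},
        "topics": {"type": "array", "items": {"type": "string"}},
        "summary": {"type": "string"},
    },
    "required": ["sentiment", "confidence", "topics", "summary"],
}

def build_schema_request(text: str, model: str = "gemma4:26b") -> dict:
    """Build an /api/chat payload whose `format` is a schema, not "json"."""
    return {
        "model": model,
        "messages": [
            {
                "role": "system",
                "content": "Analyze the sentiment of the given text. Respond in JSON.",
            },
            {"role": "user", "content": text},
        ],
        "format": SENTIMENT_SCHEMA,
        "stream": False,
    }

payload = build_schema_request(
    "The customer service was terrible but the product itself is excellent."
)
print(json.dumps(payload["format"], indent=2))
# Sending it is unchanged:
# requests.post("http://localhost:11434/api/chat", json=payload)
```

Even with a schema-constrained response, value-level rules such as keeping confidence between 0 and 1 are worth validating on your side.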

Method 3: Schema Definition with Pydantic

For production code, define your expected schema with Pydantic and validate against it:

from pydantic import BaseModel, Field
from typing import Literal
import json
import requests

class SentimentResult(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]
    confidence: float = Field(ge=0, le=1)
    topics: list[str]
    summary: str

def analyze_sentiment(text: str) -> SentimentResult:
    schema_str = json.dumps(SentimentResult.model_json_schema(), indent=2)
    
    response = requests.post("http://localhost:11434/api/chat", json={
        "model": "gemma4:26b",
        "messages": [
            {
                "role": "system",
                "content": f"""Respond with JSON matching this exact schema:
{schema_str}

No other text. Just valid JSON."""
            },
            {
                "role": "user",
                "content": f"Analyze this text: {text}"
            }
        ],
        "format": "json",
        "stream": False,
    })
    
    raw = json.loads(response.json()["message"]["content"])
    return SentimentResult.model_validate(raw)

# Usage
result = analyze_sentiment("Great product, terrible shipping time.")
print(f"Sentiment: {result.sentiment} ({result.confidence:.0%})")
print(f"Topics: {', '.join(result.topics)}")

This gives you type safety and validation. If the model returns something unexpected, Pydantic raises a ValidationError that names the offending field instead of letting bad data flow silently into your system.

Method 4: Validation and Retry Pattern

For maximum reliability, add a retry loop:

from pydantic import BaseModel, Field, ValidationError
import json
import requests
import time

def get_structured_output(
    prompt: str,
    schema_class: type[BaseModel],
    model: str = "gemma4:26b",
    max_retries: int = 3,
) -> BaseModel:
    schema_str = json.dumps(schema_class.model_json_schema(), indent=2)
    
    for attempt in range(max_retries):
        try:
            response = requests.post("http://localhost:11434/api/chat", json={
                "model": model,
                "messages": [
                    {
                        "role": "system",
                        "content": f"Respond ONLY with JSON matching this schema:\n{schema_str}"
                    },
                    {"role": "user", "content": prompt}
                ],
                "format": "json",
                "stream": False,
                "options": {
                    "temperature": 0.1 if attempt == 0 else 0.3,
                },
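            }, timeout=120)  # 120 s is an arbitrary cap; tune it for your hardware
            # Fail fast on HTTP errors (e.g. unknown model) instead of retrying
            response.raise_for_status()

            raw = json.loads(response.json()["message"]["content"])
            return schema_class.model_validate(raw)

        except (json.JSONDecodeError, ValidationError) as e:
            if attempt == max_retries - 1:
                raise ValueError(
                    f"Failed to get valid output after {max_retries} attempts: {e}"
                ) from e
            time.sleep(0.5)  # Brief pause before the next attempt

    # Only reached when max_retries <= 0, because the loop body never ran
    raise ValueError("max_retries must be at least 1")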

# Usage
class ProductReview(BaseModel):
    rating: int = Field(ge=1, le=5)
    pros: list[str]
    cons: list[str]
    recommendation: bool

review = get_structured_output(
    "Review: 'Solid laptop, great keyboard, battery could be better. 4/5 would buy again.'",
    ProductReview,
)

Key design choices:

Start with low temperature (0.1) for consistency, increase on retries for variety
Use format: "json" to guarantee valid JSON syntax
Validate with Pydantic for schema correctness
Cap retries at 3 — if it fails 3 times, the prompt probably needs work
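One optional refinement, not part of the function above: rather than resending the identical prompt on each retry, append the rejected output and the validation error so the model can correct itself. This is a minimal sketch, and build_retry_messages is a hypothetical helper name.

```python
def build_retry_messages(
    messages: list[dict], bad_output: str, error: Exception
) -> list[dict]:
    """Return a new message list that shows the model its failed attempt."""
    return messages + [
        {"role": "assistant", "content": bad_output},
        {
            "role": "user",
            "content": (
                "That response failed validation with this error:\n"
                f"{error}\n"
                "Return corrected JSON only."
            ),
        },
    ]

base = [
    {"role": "system", "content": "Respond ONLY with JSON matching the schema."},
    {"role": "user", "content": "Review: 'Solid laptop, great keyboard.'"},
]
retry = build_retry_messages(base, '{"rating": 9}', ValueError("rating must be <= 5"))
for m in retry:
    print(m["role"], "->", m["content"][:60])
```

The helper returns a new list and leaves the original messages untouched, so each retry can start from the same base conversation.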

Common Failures and Fixes

Model wraps JSON in markdown:

```json
{"key": "value"}
```

Fix: Use format: "json" in Ollama. If that's not available, strip markdown:

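def clean_json(text: str) -> str:
    """Strip a surrounding markdown code fence, if present."""
    text = text.strip()
    if text.startswith("```"):
        # Drop the opening fence line (``` or ```json); partition avoids an
        # IndexError when the reply contains no newline at all
        _, _, text = text.partition("\n")
        text = text.rsplit("```", 1)[0]  # Drop the closing fence
    return text.strip()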

Model adds extra fields:

The model might return fields you didn't ask for. Pydantic handles this: by default it silently drops any field that is not declared on the model. To reject such responses instead, set model_config = ConfigDict(extra="forbid").

Model uses wrong types:

Sometimes the model returns "0.92" (string) instead of 0.92 (number). In its default (lax) mode, Pydantic's model_validate coerces common cases like this automatically, such as a numeric string into a float.
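Both Pydantic behaviours described above (extra-field handling and type coercion) can be checked directly. This is a small sketch assuming Pydantic v2; the Lenient and Strict model names are made up for illustration.

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class Lenient(BaseModel):
    # Default config: extra fields are ignored, lax coercion applies
    confidence: float

class Strict(BaseModel):
    model_config = ConfigDict(extra="forbid")  # unknown fields become errors
    confidence: float

raw = {"confidence": "0.92", "note": "unrequested field"}

lenient = Lenient.model_validate(raw)
print(lenient.confidence)    # the numeric string has been coerced to a float
print(lenient.model_dump())  # "note" is not kept on the model

try:
    Strict.model_validate(raw)
except ValidationError as e:
    print(e.errors()[0]["type"])  # error type reported for the extra field
```

Whether to ignore or forbid extras is a judgment call: ignoring is more forgiving of a chatty model, while forbidding surfaces prompt problems earlier.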

Empty or null fields:

Make fields optional when they might be empty:

class Result(BaseModel):
    name: str
    email: str | None = None  # Model might not find an email
    topics: list[str] = []    # Default to empty list

Nested objects:

Gemma 4 handles nested JSON well, but keep nesting to 2-3 levels max:

class Address(BaseModel):
    city: str
    country: str

class Person(BaseModel):
    name: str
    age: int
    address: Address  # One level of nesting — fine

Performance Tips

Lower temperature (0.1-0.3) produces more consistent JSON
Shorter schemas get better compliance — don't ask for 20 fields at once
Few-shot examples in the system prompt dramatically improve reliability
The 26B model is significantly better at JSON than E4B — see model comparison
Thinking mode helps with complex schemas — see thinking mode guide
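The few-shot tip can be sketched as follows: append one or two worked input/output pairs to the system prompt so the model sees the exact shape you expect. The example pairs and the build_few_shot_system_prompt name are illustrative assumptions, not taken from any library.

```python
import json

# Hypothetical worked examples; replace with pairs from your own domain.
EXAMPLES = [
    (
        "Love the app, but it crashes on startup.",
        {
            "sentiment": "negative",
            "confidence": 0.7,
            "topics": ["stability"],
            "summary": "The user likes the app but it crashes on startup.",
        },
    ),
    (
        "Fast delivery and a fair price.",
        {
            "sentiment": "positive",
            "confidence": 0.9,
            "topics": ["delivery", "pricing"],
            "summary": "The user is happy with delivery speed and price.",
        },
    ),
]

def build_few_shot_system_prompt(instruction: str, examples) -> str:
    """Append Input/Output example pairs to a system instruction."""
    parts = [instruction, "", "Examples:"]
    for text, expected in examples:
        parts.append(f"Input: {text}")
        # json.dumps ensures every example is itself valid JSON
        parts.append(f"Output: {json.dumps(expected)}")
    return "\n".join(parts)

system_prompt = build_few_shot_system_prompt(
    "Respond with JSON only, using the keys sentiment, confidence, topics, summary.",
    EXAMPLES,
)
print(system_prompt)
```

Serializing the examples with json.dumps rather than typing them by hand avoids teaching the model a malformed pattern through a typo in the prompt.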