How LLMs generate structured outputs

Ollama has recently added support for structured outputs. But how exactly do LLMs generate structured outputs?

I have used prompting tricks such as “answer with a JSON only” or showing examples, but they are not very reliable. Ollama can very reliably produce structured outputs: either plain valid JSON or JSON that conforms to a given schema. Granted, an unexpected EOF error sometimes happens, but mostly with more complex schemas and small models (e.g. Llama 3.2 3B).

It’s possible to create schemas using pydantic:

from ollama import chat
from pydantic import BaseModel


class Joke(BaseModel):
    id: int
    setup: str
    punchline: str
    category: str | None = None
    tags: list[str] | None


response = chat(
    messages=[
        {
            "role": "user",
            "content": "Tell me a funny joke",
        }
    ],
    model="deepseek-r1",
    format=Joke.model_json_schema(),
)

joke = Joke.model_validate_json(response.message.content)
print(joke.model_dump_json(indent=2))

Output:

{
  "id": 1,
  "setup": "Why did the chicken cross the road?",
  "punchline": "To get to the other side!",
  "category": null,
  "tags": ["chicken", "road", "joke"]
}

Grammars

Ollama uses llama.cpp under the hood to run LLMs. Structured outputs, such as valid JSON, are possible using constrained decoding. This works by modifying how the next token is selected: at each step, the model can only choose tokens that do not violate the grammar rules.
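
As a toy illustration (not llama.cpp’s actual implementation, and with made-up tokens and logits), constrained decoding boils down to masking out the logits of tokens the grammar forbids at the current position before picking the next token:

import math


def pick_next_token(logits: dict[str, float], allowed: set[str]) -> str:
    # Grammar mask: tokens the grammar forbids get a logit of -inf,
    # so they can never be selected. Greedy pick among the rest.
    masked = {
        tok: (score if tok in allowed else -math.inf)
        for tok, score in logits.items()
    }
    return max(masked, key=masked.get)


# Hypothetical step: suppose the grammar only allows a digit or a closing
# quote here. Even though "hello" has the highest raw logit, it is masked
# out and cannot be emitted.
logits = {"hello": 3.5, "7": 0.9, '"': 1.2}
print(pick_next_token(logits, allowed={"7", '"'}))  # -> '"'

The real implementation works over the model’s full vocabulary and updates the set of allowed tokens as the grammar state advances, but the principle is the same.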

We can use the llama_cpp Python library to convert the JSON schema above to a grammar.
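
A sketch of the conversion, reusing the Joke model from above and assuming the json_schema_to_gbnf helper that ships with llama-cpp-python (the exact entry point may vary between versions):

import json

# Assumption: json_schema_to_gbnf is the schema-to-grammar helper bundled
# with llama-cpp-python; check your installed version for the exact name.
from llama_cpp.llama_grammar import json_schema_to_gbnf

gbnf = json_schema_to_gbnf(json.dumps(Joke.model_json_schema()))
print(gbnf)

This prints a grammar along these lines: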

category ::= string | null
category-kv ::= "\"category\"" space ":" space category
char ::= [^"\\] | "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])
id-kv ::= "\"id\"" space ":" space integer
integer ::= ("-"? integral-part) space
integral-part ::= [0-9] | [1-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9] ([0-9])?)?)?)?)?)?)?)?)?)?)?)?)?)?)?
null ::= "null" space
punchline-kv ::= "\"punchline\"" space ":" space string
root ::= "{" space id-kv "," space setup-kv "," space punchline-kv "," space tags-kv ( "," space ( category-kv ) )? "}" space
setup-kv ::= "\"setup\"" space ":" space string
space ::= " "?
string ::= "\"" char* "\"" space
tags ::= tags-0 | null
tags-0 ::= "[" space (string ("," space string)*)? "]" space
tags-kv ::= "\"tags\"" space ":" space tags

However, we are not limited to JSON outputs. For instance, if we want to rate movies, we can create a grammar that allows only valid ratings as outputs:

from llama_cpp.llama import Llama, LlamaGrammar

grammar = LlamaGrammar.from_string(
    """
    root ::= "5.0" | leading "." trailing
    leading ::= [0-4]
    trailing ::= [0-9]
    """
)


llm = Llama.from_pretrained(
    repo_id="MaziyarPanahi/Llama-3.2-3B-Instruct-GGUF",
    filename="Llama-3.2-3B-Instruct.Q8_0.gguf",
)

response = llm("Rate the movie Dune: Part Two (2024)", grammar=grammar, max_tokens=-1)

print(response["choices"][0]["text"])

Output:

3.5

This bridges an important gap in LLM usage in general: reliably generating output in a structured format. The unstructured strategy of parsing free-form strings with arbitrary content is error-prone. Not only that, but there are claims that structured outputs with constrained decoding outperform unstructured outputs in some tasks.


