
Pydantic vs dataclasses: when the runtime cost earns its keep

Python dataclasses vanish at runtime in the sense that matters — they are just type annotations plus an auto-generated __init__. Nothing inspects the values you pass in. Pydantic models do not vanish. They validate, coerce, and can raise, every time you call model_validate(). That is not a flaw; it is the whole point. But it is a cost, and the question is whether the cost earns its keep in your specific code path.

Pydantic v2 is roughly 1.5 MB installed, with the hot validation path written in Rust (pydantic-core). Each model you define adds schema metadata, and model_validate() spends measurable CPU — single-digit microseconds for simple models, tens to hundreds for nested ones with lists. On a FastAPI server handling a thousand requests a second with simple models, that is one to ten milliseconds per second of pure validation CPU. Effectively free. On a hot path processing a million records a minute, the cost starts to matter, and you start measuring.
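Those figures are easy to sanity-check on your own models. A minimal sketch; the numbers vary by machine, Python version, and model complexity:

```python
import timeit

from pydantic import BaseModel


class Point(BaseModel):
    x: int
    y: int


payload = {"x": 1, "y": 2}

# average the cost of one model_validate() call over 10k iterations
n = 10_000
per_call = timeit.timeit(lambda: Point.model_validate(payload), number=n) / n
print(f"{per_call * 1e6:.2f} µs per model_validate call")
```

Swap in your real models and payloads before drawing conclusions; a flat two-field model is the cheapest case there is.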

None of that is catastrophic. But it is not free either. For code paths where you control both ends — a helper function inside your app that takes a known internal shape — you are paying for insurance on a risk that does not exist.

Runtime validation is insurance. Like insurance, the cost is visible and the upside happens only when you needed it.

— the house style

Where are your trust boundaries?

The question “when does Pydantic earn it” reduces to one about trust boundaries. Here is a map of common positions.

Definitely worth it — network boundaries

Where an external system’s contract can drift without warning. Pydantic catches the drift at the parse site, not ten frames deep in business logic.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Customer(BaseModel):
    id: str
    email: str | None = None

# FastAPI wires model_validate into the route — you don't call it yourself
@app.post("/webhook")
def webhook(customer: Customer):
    process(customer)

Stripe rarely breaks a contract, but their smaller peers do it all the time. When the shape shifts, FastAPI returns a 422 with a useful message, and the error lands at the boundary instead of three async calls later.

Usually worth it — user input

Form bodies, query params, config files, environment variables, LLM outputs.

from pydantic import Field
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    database_url: str
    openai_key: str = Field(..., pattern=r"^sk-")
    max_workers: int = 4

settings = Settings()  # reads env, validates on startup

The Field(pattern=...) check and the string-to-int coercion on max_workers earn Pydantic’s place by themselves. The one-line boundary beats a hand-rolled config loader.

LLM outputs are the new frontier here. You ask GPT for JSON that matches a schema, it mostly complies, and Pydantic tells you precisely when it didn’t — usually the same field, the same way, so you know what to retry.
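That precision is what makes a self-correcting retry loop practical. A sketch; call_llm and the Invoice schema are hypothetical stand-ins for your client and your actual schema:

```python
from pydantic import BaseModel, ValidationError


class Invoice(BaseModel):
    total_cents: int
    currency: str


def parse_llm_json(call_llm, prompt: str, retries: int = 2) -> Invoice:
    """Ask for JSON, validate it, and feed the errors back on failure."""
    for _ in range(retries + 1):
        raw = call_llm(prompt)
        try:
            return Invoice.model_validate_json(raw)
        except ValidationError as exc:
            # append the validation errors so the model can self-correct
            prompt = f"{prompt}\nYour last output was invalid: {exc.errors()}"
    raise ValueError("LLM never produced a valid Invoice")
```

The key detail is that exc.errors() is structured data, not a stack trace, so it makes a usable correction prompt.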

Not worth it — internal utilities

Where the caller’s type checker already guarantees the shape. Do not do this:

# The input comes from your own code. mypy has already checked.
class GetUserInput(BaseModel):
    user_id: str

def get_user(input: GetUserInput): ...

# Do this instead.
from dataclasses import dataclass

@dataclass
class GetUserInput:
    user_id: str

def get_user(input: GetUserInput): ...

Every Pydantic model you add to a purely internal path is runtime cost paying for a compile-time problem mypy already solved.

Depends — everything else

Database results, third-party SDK returns, files on disk. If your ORM’s types are trustworthy — SQLAlchemy 2.0 with modern typing, SQLModel, or a generated client — no Pydantic needed at the query site. If the ORM returns Any or stale shapes, validate at the boundary. Same for a TOML or JSON config file: parse it, then validate it.

Two patterns that make Pydantic cheaper

Use TypeAdapter for one-off validation. When you need to validate a payload without defining a full BaseModel hierarchy:

from pydantic import TypeAdapter

Customers = TypeAdapter(list[Customer])
data = Customers.validate_json(response.text)

One instantiation at module scope, reusable across calls, and no throwaway BaseModel wrapper defined just to hold a list.
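Where the adapter lives matters: built at module scope it is constructed once at import time, while building it inside a hot function repeats schema construction on every call. A minimal sketch:

```python
from pydantic import BaseModel, TypeAdapter


class Customer(BaseModel):
    id: str


# built once at import time; schema construction happens here, not per call
CUSTOMERS = TypeAdapter(list[Customer])


def parse_batch(raw: str) -> list[Customer]:
    # reuses the adapter; validation is the only per-call work
    return CUSTOMERS.validate_json(raw)
```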

Validate at the edge, not inline. One call at the boundary beats ten scattered through the call graph:

# validate once, then trust the type
customer = Customer.model_validate(response_json)
render_profile(customer)
sync_to_cache(customer)
log_activity(customer)

# don't re-validate at every function

This pattern matters more in Python than in TypeScript. In TS, re-typing a value is free at runtime — the as cast compiles away. In Python, every extra model_validate() call is a real function invocation through pydantic-core.

Alternatives worth knowing

  • attrs — older than dataclasses, still actively maintained. More features (slots, frozen classes, factory defaults), with validation available only when you opt in via attrs.validators. For people who liked pre-Pydantic-v2 ergonomics.
  • msgspec — faster than Pydantic on hot paths. Smaller API, stricter by default, with a compiled core written in C. Worth the switch when you process streams of millions of records and the microseconds add up.
  • marshmallow — older, more explicit. Separates the schema from the domain class, which some large codebases still prefer. Awkward next to modern type hints.
  • Just dataclasses + type hints — honestly fine for most internal code. Pyright or mypy with strict mode catches what you need, and nothing runs at import time.

The dataclass output from this tool is the right output when you do not need runtime validation. The Pydantic output is the right output when you do. Picking correctly is a code-path-by-code-path question, not a project-wide decision. Both can live in the same file. Pick Pydantic where the wild data enters, pick dataclasses where you are shuffling shapes your own code already built.