Pydantic vs dataclasses: when the runtime cost earns its keep
Python dataclasses vanish at runtime in the sense that matters — they
are just type annotations plus an auto-generated __init__.
Nothing inspects the values you pass in. Pydantic models do not vanish.
They validate, coerce, and can raise, every time you call
model_validate(). That is not a flaw; it is the whole point.
But it is a cost, and the question is whether the cost earns its keep in
your specific code path.
Pydantic v2 is roughly 1.5 MB installed, with the hot validation path
written in Rust (pydantic-core). Each model you define adds
schema metadata, and model_validate() spends measurable CPU
— single-digit microseconds for simple models, tens to hundreds for
nested ones with lists. On a FastAPI server handling a thousand requests a
second, that is one to ten milliseconds per second of pure validation CPU.
Effectively free. On a hot path processing a million records a minute, the
cost starts to matter, and you start measuring.
None of that is catastrophic. But it is not free either. For code paths where you control both ends — a helper function inside your app that takes a known internal shape — you are paying for insurance on a risk that does not exist.
Runtime validation is insurance. Like insurance, the cost is visible and the upside happens only when you needed it.
— the house style

Where are your trust boundaries?
The question “when does Pydantic earn it” reduces to one about trust boundaries. Here is a map of common positions.
Definitely worth it — network boundaries
Where an external system’s contract can drift without warning. Pydantic catches the drift at the parse site, not ten frames deep in business logic.
from pydantic import BaseModel
from typing import Optional

class Customer(BaseModel):
    id: str
    email: Optional[str] = None

# FastAPI wires model_validate into the route — you don't call it yourself
@app.post("/webhook")
def webhook(customer: Customer):
    process(customer)

Stripe rarely breaks a contract, but their smaller peers do it all the time. When the shape shifts, FastAPI returns a 422 with a useful message, and the error lands at the boundary instead of three async calls later.
Usually worth it — user input
Form bodies, query params, config files, environment variables, LLM outputs.
from pydantic import Field
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    database_url: str
    openai_key: str = Field(..., pattern=r"^sk-")
    max_workers: int = 4

settings = Settings()  # reads env, validates on startup
The Field(pattern=...) check and the string-to-int coercion on
max_workers earn Pydantic’s place by themselves. The
one-line boundary beats a hand-rolled config loader.
LLM outputs are the new frontier here. You ask GPT for JSON that matches a schema, it mostly complies, and Pydantic tells you precisely when it didn’t — usually the same field, the same way, so you know what to retry.
Not worth it — internal utilities
Where the caller’s type checker already guarantees the shape. Do not do this:
# The input comes from your own code. mypy has already checked.
class GetUserInput(BaseModel):
    user_id: str

def get_user(input: GetUserInput): ...

# Do this instead.
from dataclasses import dataclass

@dataclass
class GetUserInput:
    user_id: str

def get_user(input: GetUserInput): ...

Every Pydantic model you add to a purely internal path is runtime cost paying for a compile-time problem mypy already solved.
Depends — everything else
Database results, third-party SDK returns, files on disk. If your
ORM’s types are trustworthy — SQLAlchemy 2.0 with modern typing,
SQLModel, or a generated client — no Pydantic needed at the query
site. If the ORM returns Any or stale shapes, validate at the
boundary. Same for a TOML or JSON config file: parse it, then validate it.
Two patterns that make Pydantic cheaper
Use TypeAdapter for one-off validation. When
you need to validate a payload without defining a full
BaseModel hierarchy:
from pydantic import TypeAdapter

Customers = TypeAdapter(list[Customer])
data = Customers.validate_json(response.text)

One instantiation, reusable, no per-call BaseModel construction overhead.
Validate at the edge, not inline. One call at the boundary beats ten scattered through the call graph:
# validate once, then trust the type
customer = Customer.model_validate(response_json)
render_profile(customer)
sync_to_cache(customer)
log_activity(customer)
# don't re-validate at every function
This pattern matters more in Python than in TypeScript. In TS, re-typing a
value is free at runtime — the as cast compiles away. In
Python, every extra model_validate() call is a real function
invocation through pydantic-core.
Alternatives worth knowing
- attrs — older than dataclasses, still actively maintained. More features (slots, factory defaults, lazy validators). No built-in validation unless you opt in with attr.validators. For people who liked pre-Pydantic-v2 ergonomics.
- msgspec — faster than Pydantic on hot paths. Smaller API, stricter by default, with its hot path written in C. Worth the switch when you process streams of millions of records and the microseconds add up.
- marshmallow — older, more explicit. Separates the schema from the domain class, which some large codebases still prefer. Awkward next to modern type hints.
- Just dataclasses + type hints — honestly fine for most internal code. Pyright or mypy with strict mode catches what you need, and nothing runs at import time.
The dataclass output from this tool is the right choice when you do not need runtime validation; the Pydantic output is right when you do. Picking correctly is a code-path-by-code-path question, not a project-wide decision, and both can live in the same file. Pick Pydantic where the wild data enters; pick dataclasses where you are shuffling shapes your own code already built.