Query budgets and runaway agents

Bound how much your agent can scan, return, mutate, and spend per minute. The patterns that keep one curious prompt from melting your database.

The second-most-common production incident with agentic systems (after cross-tenant leaks) is a runaway query. The agent decides “let me scan everything” and your database gets locked out. This guide is about bounding the worst case.

Six budgets you almost certainly need

OrmAI exposes six knobs. Set all of them in production.

Budget	Default	What it bounds
`max_rows`	100	Rows returned by any single read
`max_scan_rows`	1,000,000	Rows the planner is allowed to examine
`statement_timeout_ms`	5,000	DB statement timeout per call
`max_writes_per_minute`	60	Mutations per principal per minute
`max_aggregations_per_minute`	30	Heavy reads per principal per minute
`max_join_depth`	3	How deeply joins can chain

policy = (
    PolicyBuilder(DEFAULT_PROD)
    .max_rows(200)
    .max_scan_rows(500_000)
    .statement_timeout_ms(3_000)
    .max_writes_per_minute(20)
    .max_aggregations_per_minute(10)
    .max_join_depth(2)
    .build()
)

These defaults are conservative on purpose. Loosen them as you measure real load.

Per-tool overrides

Some tools genuinely need higher limits. Override per-tool:

policy = ...max_rows(200) \
    .override("db.aggregate", max_rows=10_000, statement_timeout_ms=10_000) \
    .override("admin.export_orders", max_rows=100_000, statement_timeout_ms=60_000)

Each override is logged distinctly so the audit log can answer “which calls used the elevated budget?”

Per-tenant budgets

For multi-tenant systems, the per-principal budgets aren’t enough — one chatty tenant can starve everyone else. OrmAI supports per-tenant quotas:

.max_writes_per_minute_per_tenant(20)
.max_aggregations_per_hour_per_tenant(120)

These use a token-bucket implemented either in-memory (single process) or in Redis (cluster). Wire Redis once:

from ormai.budgets import RedisBudgetStore

policy = ...budget_store(RedisBudgetStore(url="redis://...")) ...

Compile-time vs. run-time enforcement

OrmAI enforces budgets in two places:

Compile-time. Before sending the SQL, OrmAI inspects the planned query (using the database’s EXPLAIN) and rejects calls whose estimated row scan exceeds max_scan_rows. The DB never runs the query.
Run-time. The DB-level statement_timeout and the application-level row limit are belt-and-suspenders. If the planner underestimated, the timeout still fires.

This matters because EXPLAIN is cheap. Catching the runaway before it executes is the difference between “user gets a polite error” and “your DB hangs for 90 seconds.”

What happens when a budget is exceeded

The tool call returns a structured error the agent can recover from:

{
  "error": "scan_budget_exceeded",
  "budget": "max_scan_rows",
  "limit": 1000000,
  "estimated": 8421000,
  "suggestion": "narrow the where clause or add a date filter"
}

This shape matters: the agent often can respond to the suggestion (“oh, let me add a date filter”). It can’t respond to a 504 from your gateway after the DB locked up.

Tell your model about this in your system prompt:

If a tool call returns error="scan_budget_exceeded", narrow your query
(add a date range, a tenant scope, a more specific filter) and try again.

Approval-gated burst budgets

Some operations are legitimately expensive but rare. Don’t make the policy permissive just for those — gate them with approval.

.enable_writes(models=["Order"])
.require_approval(
  tool="db.update",
  model="Order",
  when=lambda call: call.affected_rows > 100,
)

Now any update that would touch more than 100 rows enters an “awaiting approval” state. Your application surface (a Slack message, a queue, a UI) collects a human approval and resumes the call. Until approved, the change is held. After approval, it executes.

This pattern lets you keep tight budgets for the 99% case and a controlled escape hatch for the 1%.

Cost budgets (optional)

For LLM applications, the agent’s tool calls themselves cost money (DB time, API calls). OrmAI can attach a per-call cost estimate to each row in the audit log:

.cost_estimator(lambda call: call.scanned_rows * 0.0000001)  # $/row
.max_cost_per_tenant_per_day(5.00)  # USD

When a tenant exceeds their daily budget, subsequent calls return a budget_exhausted error. Useful for usage-based pricing or for stopping a misbehaving customer from running up your bill.

Choosing budgets

Start with these defaults, then loosen one at a time as needed:

max_rows: 100 — almost no agent answer benefits from > 100 rows.
max_scan_rows: 500_000 — caught most unbounded scans in our benchmarks.
statement_timeout_ms: 3_000 — long enough for indexed queries, short enough that the DB is responsive.
max_writes_per_minute: 20 — almost no legitimate agent workflow exceeds this.
max_aggregations_per_minute: 10 — aggregates are by far the most expensive reads.
max_join_depth: 2 — deeper joins are usually the agent flailing.

Run for a week, look at the audit log for budget_exceeded rows, and adjust.

Common mistakes

No max_scan_rows. This is the one that matters. Without it, the agent can issue an EXPLAIN-cheap query whose actual cost is ruinous.
Setting statement_timeout only. Timeouts kill queries after they’ve started consuming resources. Compile-time scan rejection is much cheaper.
Per-process budgets in a multi-process deployment. Use Redis-backed budgets if you have more than one app instance.
Generous burst budgets in dev that get copied to prod. Use DEFAULT_DEV and DEFAULT_PROD as starting templates; they have different defaults for a reason.