Guide
Query budgets and runaway agents
Bound how much your agent can scan, return, mutate, and spend per minute. The patterns that keep one curious prompt from melting your database.
The second-most-common production incident with agentic systems (after cross-tenant leaks) is a runaway query. The agent decides “let me scan everything” and your database gets locked out. This guide is about bounding the worst case.
Six budgets you almost certainly need
OrmAI exposes six knobs. Set all of them in production.
| Budget | Default | What it bounds |
|---|---|---|
max_rows | 100 | Rows returned by any single read |
max_scan_rows | 1,000,000 | Rows the planner is allowed to examine |
statement_timeout_ms | 5,000 | DB statement timeout per call |
max_writes_per_minute | 60 | Mutations per principal per minute |
max_aggregations_per_minute | 30 | Heavy reads per principal per minute |
max_join_depth | 3 | How deeply joins can chain |
policy = (
PolicyBuilder(DEFAULT_PROD)
.max_rows(200)
.max_scan_rows(500_000)
.statement_timeout_ms(3_000)
.max_writes_per_minute(20)
.max_aggregations_per_minute(10)
.max_join_depth(2)
.build()
)
These defaults are conservative on purpose. Loosen them as you measure real load.
Per-tool overrides
Some tools genuinely need higher limits. Override per-tool:
policy = ...max_rows(200) \
.override("db.aggregate", max_rows=10_000, statement_timeout_ms=10_000) \
.override("admin.export_orders", max_rows=100_000, statement_timeout_ms=60_000)
Each override is logged distinctly so the audit log can answer “which calls used the elevated budget?”
Per-tenant budgets
For multi-tenant systems, the per-principal budgets aren’t enough — one chatty tenant can starve everyone else. OrmAI supports per-tenant quotas:
.max_writes_per_minute_per_tenant(20)
.max_aggregations_per_hour_per_tenant(120)
These use a token-bucket implemented either in-memory (single process) or in Redis (cluster). Wire Redis once:
from ormai.budgets import RedisBudgetStore
policy = ...budget_store(RedisBudgetStore(url="redis://...")) ...
Compile-time vs. run-time enforcement
OrmAI enforces budgets in two places:
- Compile-time. Before sending the SQL, OrmAI inspects the planned query (using the database’s
EXPLAIN) and rejects calls whose estimated row scan exceedsmax_scan_rows. The DB never runs the query. - Run-time. The DB-level
statement_timeoutand the application-level row limit are belt-and-suspenders. If the planner underestimated, the timeout still fires.
This matters because EXPLAIN is cheap. Catching the runaway before it executes is the difference between “user gets a polite error” and “your DB hangs for 90 seconds.”
What happens when a budget is exceeded
The tool call returns a structured error the agent can recover from:
{
"error": "scan_budget_exceeded",
"budget": "max_scan_rows",
"limit": 1000000,
"estimated": 8421000,
"suggestion": "narrow the where clause or add a date filter"
}
This shape matters: the agent often can respond to the suggestion (“oh, let me add a date filter”). It can’t respond to a 504 from your gateway after the DB locked up.
Tell your model about this in your system prompt:
If a tool call returns error="scan_budget_exceeded", narrow your query
(add a date range, a tenant scope, a more specific filter) and try again.
Approval-gated burst budgets
Some operations are legitimately expensive but rare. Don’t make the policy permissive just for those — gate them with approval.
.enable_writes(models=["Order"])
.require_approval(
tool="db.update",
model="Order",
when=lambda call: call.affected_rows > 100,
)
Now any update that would touch more than 100 rows enters an “awaiting approval” state. Your application surface (a Slack message, a queue, a UI) collects a human approval and resumes the call. Until approved, the change is held. After approval, it executes.
This pattern lets you keep tight budgets for the 99% case and a controlled escape hatch for the 1%.
Cost budgets (optional)
For LLM applications, the agent’s tool calls themselves cost money (DB time, API calls). OrmAI can attach a per-call cost estimate to each row in the audit log:
.cost_estimator(lambda call: call.scanned_rows * 0.0000001) # $/row
.max_cost_per_tenant_per_day(5.00) # USD
When a tenant exceeds their daily budget, subsequent calls return a budget_exhausted error. Useful for usage-based pricing or for stopping a misbehaving customer from running up your bill.
Choosing budgets
Start with these defaults, then loosen one at a time as needed:
max_rows: 100— almost no agent answer benefits from > 100 rows.max_scan_rows: 500_000— caught most unbounded scans in our benchmarks.statement_timeout_ms: 3_000— long enough for indexed queries, short enough that the DB is responsive.max_writes_per_minute: 20— almost no legitimate agent workflow exceeds this.max_aggregations_per_minute: 10— aggregates are by far the most expensive reads.max_join_depth: 2— deeper joins are usually the agent flailing.
Run for a week, look at the audit log for budget_exceeded rows, and adjust.
Common mistakes
- No
max_scan_rows. This is the one that matters. Without it, the agent can issue anEXPLAIN-cheap query whose actual cost is ruinous. - Setting
statement_timeoutonly. Timeouts kill queries after they’ve started consuming resources. Compile-time scan rejection is much cheaper. - Per-process budgets in a multi-process deployment. Use Redis-backed budgets if you have more than one app instance.
- Generous burst budgets in dev that get copied to prod. Use
DEFAULT_DEVandDEFAULT_PRODas starting templates; they have different defaults for a reason.
Related
Found a typo or want to suggest a topic? Email [email protected].