Production checklist for agent + database systems

What to verify before letting your agent talk to a real database. Compiled from incidents, audits, and three years of shipping.

If you’re about to ship an agent that touches your production database, run through this list. Each item came from a real incident or a real audit finding, in our work or a customer’s.

Identity & isolation

Tenant ID comes from the authenticated session, not the agent’s input. Verified by code review.
tenant_scope() is set in the policy. Verified by inspecting the policy file.
Cross-tenant test exists in CI. Asserts that an agent acting on tenant A cannot see tenant B’s data, even when forging tenant_id in the where clause.
Multi-tenant join targets are denormalized with their own tenant_id. Or you have an explicit theory of why they don’t need it.
Database row-level security is enabled on the highest-stakes tables (compliance-heavy data).
Admin / cross-tenant operations live behind a separate policy and a separate audit channel.
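
The cross-tenant check in this section can be sketched as a small CI test. This is a self-contained illustration and not OrmAI's API: ROWS, Session, and scoped_query are hypothetical stand-ins for your table, your authenticated session, and your tool layer. The point it shows is that the tenant filter is taken from the session and overrides any tenant_id the agent supplies.

```python
from dataclasses import dataclass

# Hypothetical in-memory table standing in for a real multi-tenant table.
ROWS = [
    {"id": 1, "tenant_id": "A", "total": 10},
    {"id": 2, "tenant_id": "B", "total": 99},
    {"id": 3, "tenant_id": "A", "total": 25},
]


@dataclass(frozen=True)
class Session:
    """Authenticated session; the only trusted source of the tenant ID."""
    tenant_id: str


def scoped_query(session: Session, where: dict) -> list:
    """Filter ROWS by `where`, always forcing tenant_id from the session."""
    # Overwrite whatever tenant_id the agent put in its arguments.
    effective = {**where, "tenant_id": session.tenant_id}
    return [r for r in ROWS if all(r.get(k) == v for k, v in effective.items())]


def test_forged_tenant_id_cannot_cross_tenants() -> None:
    session_a = Session(tenant_id="A")
    rows = scoped_query(session_a, {"tenant_id": "B"})  # forged argument
    assert rows, "expected tenant A's rows, not an empty result"
    assert all(r["tenant_id"] == "A" for r in rows)


test_forged_tenant_id_cannot_cross_tenants()
```

In a real suite the same assertion would run against the actual tool entry points, once per tool, rather than against a stand-in function.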

Reads & redaction

Wildcards for secrets (*password*, *secret*, *token*, *api_key*) are in deny_fields.
PII columns are masked, not denied — unless you have a specific reason to deny.
Free-text fields that may contain PII are denied or wrapped in a domain tool with sanitization.
describe_schema() returns only what policy allows. Verified by manual call.
No tool returns raw passwords / tokens / secrets in any code path. Verified by grep + test.
Joined responses respect redaction. Test with the include parameter.
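
To make the deny-versus-mask distinction concrete, here is a minimal sketch of a redaction pass. It does not use OrmAI itself: DENY_PATTERNS mirrors the wildcards listed above, while MASK_FIELDS, mask, and redact are hypothetical names chosen for illustration. Denied keys are dropped from the row entirely; masked keys stay present with most of the value hidden.

```python
from fnmatch import fnmatch

# Assumed policy values, mirroring the checklist items above.
DENY_PATTERNS = ["*password*", "*secret*", "*token*", "*api_key*"]
MASK_FIELDS = {"email", "phone"}  # hypothetical PII columns


def mask(value: str) -> str:
    """Hide all but the last two characters; hide short values completely."""
    if len(value) <= 2:
        return "*" * len(value)
    return "*" * (len(value) - 2) + value[-2:]


def redact(row: dict) -> dict:
    """Drop denied fields entirely and mask PII fields."""
    out = {}
    for key, value in row.items():
        if any(fnmatch(key.lower(), pattern) for pattern in DENY_PATTERNS):
            continue  # denied: the key never reaches the agent
        if key in MASK_FIELDS and value is not None:
            out[key] = mask(str(value))
        else:
            out[key] = value
    return out


row = {"id": 7, "email": "ana@example.com", "password_hash": "x", "stripe_api_key": "sk"}
print(redact(row))
```

The same function should be applied to rows pulled in through joins, which is what the include test above is meant to catch.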

Writes & approvals

require_reason=True on every write-enabled model.
writable_fields constrains updates to the actual fields the agent should be able to mutate.
Approval gates exist for high-stakes writes (price changes, role assignments, large refunds, etc.).
Approval queue has a documented SLA and an auto-deny on timeout.
Approval identity is logged with the write.
Dry-run is exposed as a separate tool for any write that affects > 10 rows.
max_writes_per_minute is set conservatively (≤ 20 unless you have measured otherwise).
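
The write rules above (a required reason, a field allow-list, a per-minute cap) can be sketched as a single guard. This is an illustration under stated assumptions, not OrmAI's implementation: WriteGuard, WriteDenied, and the error codes are hypothetical, and the limiter is a simple in-process sliding window.

```python
import time
from collections import deque
from typing import Optional

# Assumed policy values; the names echo the checklist, the code is illustrative.
WRITABLE_FIELDS = {"status", "notes"}
MAX_WRITES_PER_MINUTE = 20


class WriteDenied(Exception):
    """Raised with a machine-readable code the agent can react to."""

    def __init__(self, code: str, detail: str) -> None:
        super().__init__(f"{code}: {detail}")
        self.code = code


class WriteGuard:
    def __init__(self, clock=time.monotonic) -> None:
        self._clock = clock
        self._recent = deque()  # timestamps of accepted writes

    def check(self, changes: dict, reason: Optional[str]) -> None:
        """Raise WriteDenied unless the write satisfies all three rules."""
        if not reason or not reason.strip():
            raise WriteDenied("reason_required", "writes need a non-empty reason")
        extra = set(changes) - WRITABLE_FIELDS
        if extra:
            raise WriteDenied("field_not_writable", ", ".join(sorted(extra)))
        now = self._clock()
        # Drop timestamps that have left the 60-second window.
        while self._recent and now - self._recent[0] >= 60:
            self._recent.popleft()
        if len(self._recent) >= MAX_WRITES_PER_MINUTE:
            raise WriteDenied("write_rate_exceeded", "retry after the window resets")
        self._recent.append(now)
```

An in-process window like this only holds for a single app instance; with several instances the counter needs shared storage, as the Budgets section notes.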

Budgets

max_scan_rows is set. This is the single most important budget.
statement_timeout_ms is set — both at the DB session level and via OrmAI policy.
max_rows is set per tool.
max_join_depth is set (≤ 3).
Per-tenant quotas exist if you serve multiple tenants from one process.
Budget store is Redis-backed if you run more than one app instance.
Budget-exceeded errors return a structured suggestion the agent can recover from.
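
A structured budget error might look like the following sketch. The field names, the BudgetError type, and check_scan_budget are assumptions made for illustration; only the scan_budget_exceeded code is taken from this checklist. The shape matters more than the names: a stable code, the limit, the observed value, and a concrete next step.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class BudgetError:
    code: str        # stable identifier the system prompt can mention
    limit: int       # the configured budget
    observed: int    # what the query would have cost
    suggestion: str  # concrete next step the agent can take


def check_scan_budget(estimated_rows: int, max_scan_rows: int) -> Optional[BudgetError]:
    """Return None when within budget, otherwise a structured error."""
    if estimated_rows <= max_scan_rows:
        return None
    return BudgetError(
        code="scan_budget_exceeded",
        limit=max_scan_rows,
        observed=estimated_rows,
        suggestion="Add a filter on an indexed column or narrow the date range, then retry.",
    )


def tool_response(error: Optional[BudgetError], rows: list) -> dict:
    """Serialize either the rows or the structured error for the agent."""
    if error is not None:
        return {"ok": False, "error": asdict(error)}
    return {"ok": True, "rows": rows}
```

Returning the error as data instead of raising a free-text exception lets the agent narrow its query and retry within its step budget.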

Audit

Audit sink is configured (SQL, JSON log, or event stream).
Audit DB user has INSERT but no UPDATE privileges, with DELETE granted only for retention.
Audit rows include trace IDs linked to your observability stack.
Hash-chain or append-only retention for the long-term audit copy.
Saved queries are documented for the security team’s most common questions.
Retention policy is set deliberately (90 days hot / 1 year warm / 7 years cold by default).
Sanitized inputs are logged, and a hash of the raw inputs is stored separately for forgery detection.
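
The hash-chain and raw-input-hash items can be sketched with the standard library alone. This is one common construction (each entry's hash covers the previous entry's hash plus a canonical JSON payload), shown as an assumption about how such a chain could work and not as a description of OrmAI's audit format. All function names here are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash preceding the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    # Canonical JSON so the same payload always hashes the same way.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list, payload: dict) -> dict:
    """Append an audit entry linked to the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"payload": payload, "prev_hash": prev_hash, "hash": _digest(prev_hash, payload)}
    chain.append(entry)
    return entry


def verify(chain: list) -> bool:
    """Recompute every link; any edited or removed entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


def raw_input_hash(raw_args: dict) -> str:
    """Hash of the unsanitized arguments, stored beside the sanitized copy."""
    body = json.dumps(raw_args, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()
```

Verification only detects tampering after the fact, so it complements the restricted database privileges above and does not replace them.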

Operational

DEFAULT_PROD policy is the base for production policies, not DEFAULT_DEV.
Policy lives in version control with PR review.
Policy has a regression test suite that asserts which calls succeed and which fail.
Health check endpoint exposes OrmAI version, policy hash, audit sink status.
OrmAI version is pinned to a known-good release.
Rate-limiting is enabled in front of the agent endpoint (per IP, per session).
Structured logs flow to your observability stack.
Alerts are wired for: audit sink failures, policy denial spikes, write rate above baseline, statement timeouts.
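
For the health check item, a policy hash can be derived from a canonical serialization of the policy. The sketch below assumes the policy can be represented as a plain dict and that the caller supplies the OrmAI version string; policy_hash and health_payload are hypothetical helpers, not part of OrmAI.

```python
import hashlib
import json


def policy_hash(policy: dict) -> str:
    """Stable SHA-256 over the policy, independent of key order."""
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def health_payload(policy: dict, ormai_version: str, audit_sink_ok: bool) -> dict:
    """Body for a health endpoint exposing the three items from the checklist."""
    return {
        "status": "ok" if audit_sink_ok else "degraded",
        "ormai_version": ormai_version,
        "policy_hash": policy_hash(policy),
        "audit_sink": "ok" if audit_sink_ok else "failing",
    }
```

Comparing the reported hash with the hash of the policy file at the deployed commit is a quick way to confirm that the running process loaded the policy you reviewed.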

LLM-side hygiene

System prompt instructs the agent to handle structured policy errors (scan_budget_exceeded, tenant_mismatch, etc.).
maxSteps / max_iterations is set on the agent loop (≤ 12).
Tool list given to the model is the actual OrmAI-generated list, not a hand-curated subset.
The model is not told the tenant ID in the prompt. It comes from context only.
The agent’s natural-language output is logged separately from the tool audit, with its own scrubbing.
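
The step cap and structured-error handling can be sketched as a plain loop. Both callables below are hypothetical stand-ins (next_action for the model, call_tool for the tool layer), and the message shapes are assumptions; agent frameworks expose the same idea through their own maxSteps or max_iterations setting.

```python
from typing import Callable

MAX_STEPS = 12  # upper bound from the checklist


def run_agent(
    next_action: Callable[[list], dict],
    call_tool: Callable[[dict], dict],
) -> dict:
    """Drive a tool-calling loop that stops after MAX_STEPS iterations.

    next_action(history) returns {"type": "final", "text": ...}
                         or {"type": "tool", ...}.
    call_tool(action) returns {"ok": True, ...}
                      or {"ok": False, "error": {"code": ...}}.
    """
    history: list = []
    for step in range(MAX_STEPS):
        action = next_action(history)
        if action["type"] == "final":
            return {"status": "done", "steps": step, "text": action["text"]}
        result = call_tool(action)
        # Structured errors go back to the model unchanged so it can adjust.
        history.append({"action": action, "result": result})
    return {"status": "step_limit_reached", "steps": MAX_STEPS, "text": None}
```

Note that the tenant ID appears nowhere in the history passed to the model; the tool layer is expected to read it from the session.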

Compliance specifics

The policy file is the artifact for “what can the agent see/do?” — and it can be handed to an auditor as-is.
Audit log queries for the common SOC 2 / ISO controls are documented as views.
Data subject access request (GDPR) can be answered using the audit log: “every operation involving subject X.”
Data deletion propagates to audit retention rules where required.
Vendors with data access are listed in your DPA — including OrmAI (which is open source and runs in-process, so probably no DPA needed, but verify with your legal team).

Pre-launch verification

Run these in a staging environment with realistic data:

Cross-tenant probe. From an agent acting as tenant A, try every tool with tenant_id: B in arguments. Every one should return only tenant A’s data or a structured denial, never tenant B’s data.
PII probe. Query every model with db.get. Verify that masked fields come back masked and denied fields are absent.
Budget probe. Issue an unbounded query. Confirm the error is structured.
Write probe. Try every write tool without a reason. Confirm denial.
Approval probe. Trigger an approval-gated write. Confirm it enters pending state.
Audit probe. Confirm every tool call wrote an audit row. Confirm denied calls also wrote one.
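
The audit probe can be automated by counting attempts against audit rows. The sketch below is self-contained and hypothetical throughout: audited, update_status, and the row shape are illustrative stand-ins, and a real probe would call your staging tools and read from your configured audit sink.

```python
def audited(tool, audit_log: list):
    """Wrap a tool so every call, allowed or denied, appends one audit row."""
    def wrapper(**kwargs):
        try:
            result = tool(**kwargs)
        except PermissionError as exc:
            audit_log.append({"tool": tool.__name__, "outcome": "denied", "detail": str(exc)})
            raise
        audit_log.append({"tool": tool.__name__, "outcome": "allowed"})
        return result
    return wrapper


def update_status(status: str, reason: str = "") -> str:
    """Toy write tool that refuses to run without a reason."""
    if not reason:
        raise PermissionError("reason_required")
    return status


def audit_probe() -> list:
    """Issue one allowed and one denied call, then compare counts."""
    log: list = []
    tool = audited(update_status, log)
    attempts = 0
    for kwargs in ({"status": "paid", "reason": "invoice settled"}, {"status": "paid"}):
        attempts += 1
        try:
            tool(**kwargs)
        except PermissionError:
            pass  # denial is expected for the call without a reason
    assert len(log) == attempts, "every call must leave exactly one audit row"
    return log
```

The second call doubles as a write probe: it omits the reason, is denied, and must still appear in the audit log.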

Quarterly review

Re-read the policy file. Does it still match the agent’s intended capabilities?
Look at the audit log for the last 90 days. What’s the most common denial? Is the policy too tight or is the agent misbehaving?
Look at the longest-running tool calls. Are they the ones you’d expect?
Look at the write log. Does every write have a coherent reason?
Run the regression suite.