Guide
Production checklist for agent + database systems
The 30 things to verify before letting your agent talk to a real database. Compiled from incidents, audits, and three years of shipping.
Dipankar Sarkar · Updated April 15, 2026 · Tags: production, checklist, operations, soc2
If you’re about to ship an agent that touches your production database, run through this list. Each item came from a real incident or a real audit finding, in our work or a customer’s.
Identity & isolation
- Tenant ID comes from the authenticated session, not the agent’s input. Verified by code review.
- `tenant_scope()` is set in the policy. Verified by inspecting the policy file.
- Cross-tenant test exists in CI. Asserts that an agent acting on tenant A cannot see tenant B’s data, even when forging `tenant_id` in the where clause.
- Multi-tenant join targets are denormalized with their own `tenant_id`. Or you have an explicit theory of why they don’t need it.
- Database row-level security is enabled on the highest-stakes tables (compliance-heavy data).
- Admin / cross-tenant operations live behind a separate policy and a separate audit channel.
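
A minimal sketch of the cross-tenant CI test described above. It does not use OrmAI; `Session`, `scoped_query`, and the in-memory `ROWS` are hypothetical stand-ins for your own session object, query layer, and tables. The point it illustrates is that the session's tenant always overrides whatever the agent passes in the filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Authenticated session: the only trusted source of the tenant ID."""
    tenant_id: str


# Hypothetical in-memory table standing in for a real database.
ROWS = [
    {"id": 1, "tenant_id": "A", "name": "alpha"},
    {"id": 2, "tenant_id": "B", "name": "beta"},
]


def scoped_query(session: Session, where: dict) -> list[dict]:
    """Apply agent-supplied filters, then force the session's tenant."""
    filters = {k: v for k, v in where.items() if k != "tenant_id"}
    filters["tenant_id"] = session.tenant_id  # the session wins over agent input
    return [r for r in ROWS if all(r.get(k) == v for k, v in filters.items())]


def test_forged_tenant_id_is_ignored():
    # The agent acts as tenant A but forges tenant B in the where clause.
    rows = scoped_query(Session(tenant_id="A"), {"tenant_id": "B"})
    assert rows and all(r["tenant_id"] == "A" for r in rows)
```

In a real suite the same assertion would run against every tool the agent can call, not a single query helper.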
Reads & redaction
- Wildcards for secrets (`*password*`, `*secret*`, `*token*`, `*api_key*`) are in `deny_fields`.
- PII columns are masked, not denied, unless you have a specific reason to deny.
- Free-text fields that may contain PII are denied or wrapped in a domain tool with sanitization.
- `describe_schema()` returns only what policy allows. Verified by manual call.
- No tool returns raw passwords / tokens / secrets in any code path. Verified by grep + test.
- Joined responses respect redaction. Test with `include` parameter.
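
To make the deny-versus-mask distinction concrete, here is a rough sketch of wildcard redaction applied recursively, so joined (nested) records are covered too. This is not OrmAI's implementation; `MASK_FIELDS`, `mask`, and `redact` are hypothetical names, and the masking rule (keep the last four characters) is only one possible choice.

```python
from fnmatch import fnmatchcase

# Wildcard patterns from the checklist; matching columns are dropped entirely.
DENY_FIELDS = ["*password*", "*secret*", "*token*", "*api_key*"]
# Hypothetical PII columns that stay visible in masked form.
MASK_FIELDS = {"email", "phone"}


def mask(value: str) -> str:
    """Keep the last four characters and star out the rest."""
    return "*" * max(len(value) - 4, 0) + value[-4:]


def redact(row: dict) -> dict:
    """Return a copy of row with denied keys removed and PII masked."""
    out = {}
    for key, value in row.items():
        if any(fnmatchcase(key.lower(), pat) for pat in DENY_FIELDS):
            continue  # denied: the key never reaches the agent
        if key in MASK_FIELDS and isinstance(value, str):
            value = mask(value)
        elif isinstance(value, dict):
            value = redact(value)  # joined one-to-one record
        elif isinstance(value, list):
            value = [redact(v) if isinstance(v, dict) else v for v in value]
        out[key] = value
    return out
```

A test that feeds a joined record through `redact` and asserts the denied keys are absent at every nesting level is a cheap way to cover the last bullet.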
Writes & approvals
- `require_reason=True` on every write-enabled model.
- `writable_fields` constrains updates to the actual fields the agent should be able to mutate.
- Approval gates exist for high-stakes writes (price changes, role assignments, large refunds, etc.).
- Approval queue has a documented SLA and an auto-deny on timeout.
- Approval identity is logged with the write.
- Dry-run is exposed as a separate tool for any write that affects > 10 rows.
- `max_writes_per_minute` is set conservatively (≤ 20 unless you have measured otherwise).
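
The three cheapest write checks (reason required, field allow-list, per-minute cap) can be sketched as a single guard. This is an illustration under assumptions, not OrmAI's code: `WriteGuard`, `WriteDenied`, and the error codes are hypothetical, and the in-process counter would need a shared store once you run more than one app instance.

```python
import time
from collections import deque


class WriteDenied(Exception):
    """Denial carrying a machine-readable code (codes are hypothetical)."""

    def __init__(self, code: str, message: str):
        super().__init__(message)
        self.code = code


class WriteGuard:
    def __init__(self, writable_fields, max_writes_per_minute=20,
                 clock=time.monotonic):
        self.writable_fields = set(writable_fields)
        self.max_writes = max_writes_per_minute
        self.clock = clock          # injectable so tests can control time
        self._stamps = deque()      # timestamps of recently accepted writes

    def check(self, fields: dict, reason: str = "") -> None:
        """Raise WriteDenied unless the write passes all three checks."""
        if not reason.strip():
            raise WriteDenied("reason_required", "every write needs a reason")
        extra = set(fields) - self.writable_fields
        if extra:
            raise WriteDenied("field_not_writable",
                              f"not writable: {sorted(extra)}")
        now = self.clock()
        # Drop timestamps that fell out of the 60-second sliding window.
        while self._stamps and now - self._stamps[0] >= 60:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_writes:
            raise WriteDenied("write_rate_exceeded",
                              "too many writes in the last minute")
        self._stamps.append(now)
```

Denied writes do not count against the rate limit here; whether they should is a policy decision worth making explicitly.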
Budgets
- `max_scan_rows` is set. This is the single most important budget.
- `statement_timeout_ms` is set, both at the DB session level and via OrmAI policy.
- `max_rows` is set per tool.
- `max_join_depth` is set (≤ 3).
- Per-tenant quotas exist if you serve multiple tenants from one process.
- Budget store is Redis-backed if you run more than one app instance.
- Budget exceeded errors return a structured suggestion the agent can recover from.
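
What "a structured suggestion the agent can recover from" might look like, as a sketch. The `scan_budget_exceeded` code appears elsewhere in this checklist; the payload shape, the `BudgetError` class, and `check_scan_budget` are assumptions made for illustration.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class BudgetError:
    code: str         # machine-readable, stable across releases
    limit: int        # the configured budget
    observed: int     # what the query would have cost
    suggestion: str   # what the agent should try next


def check_scan_budget(estimated_rows: int,
                      max_scan_rows: int) -> Optional[dict]:
    """Return None if within budget, else a structured error payload."""
    if estimated_rows <= max_scan_rows:
        return None
    return asdict(BudgetError(
        code="scan_budget_exceeded",
        limit=max_scan_rows,
        observed=estimated_rows,
        suggestion="Add a filter on an indexed column or narrow the "
                   "date range, then retry.",
    ))
```

Returning a payload instead of a bare exception string gives the model something specific to act on in its next step.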
Audit
- Audit sink is configured (SQL, JSON log, or event stream).
- Audit DB user has INSERT-only privileges (and DELETE only for retention).
- Audit rows include trace IDs linked to your observability stack.
- Hash-chain or append-only retention for the long-term audit copy.
- Saved queries are documented for the security team’s most common questions.
- Retention policy is set deliberately (90 days hot / 1 year warm / 7 years cold by default).
- Sanitized inputs are logged, with a hash of raw inputs separately for forgery detection.
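
For the hash-chain item, a minimal sketch of the idea: each audit entry stores the hash of the previous one, so editing or deleting any earlier row breaks verification from that point on. The record shape and function names are hypothetical; a production version would also anchor the latest hash somewhere the audit DB user cannot write.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(chain: list, record: dict) -> None:
    """Append a record, linking it to the current tail of the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev": prev,
                  "hash": entry_hash(prev, record)})


def verify(chain: list) -> bool:
    """Recompute every link; False if any entry was altered or removed."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```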
Operational
- `DEFAULT_PROD` policy is the base for production policies, not `DEFAULT_DEV`.
- Policy lives in version control with PR review.
- Policy has a regression test suite that asserts which calls succeed and which fail.
- Health check endpoint exposes OrmAI version, policy hash, audit sink status.
- OrmAI version is pinned to a known-good release.
- Rate-limiting is enabled in front of the agent endpoint (per IP, per session).
- Structured logs flow to your observability stack.
- Alerts are wired for: audit sink failures, policy denial spikes, write rate above baseline, statement timeouts.
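
One way to produce the policy hash for the health check, sketched under the assumption that your policy can be serialized to a plain dict. `policy_hash` and `health` are hypothetical helpers, not OrmAI functions; the version string is passed in rather than read from the library.

```python
import hashlib
import json


def policy_hash(policy: dict) -> str:
    """Short, stable fingerprint of a policy, independent of key order."""
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def health(policy: dict, version: str, audit_sink_ok: bool) -> dict:
    """Payload for a health endpoint: version, policy hash, sink status."""
    return {
        "version": version,
        "policy_hash": policy_hash(policy),
        "audit_sink": "ok" if audit_sink_ok else "failing",
    }
```

Comparing the deployed hash against the hash of the policy file at the reviewed commit tells you quickly whether production is running what was approved.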
LLM-side hygiene
- System prompt instructs the agent to handle structured policy errors (`scan_budget_exceeded`, `tenant_mismatch`, etc.).
- `maxSteps` / `max_iterations` is set on the agent loop (≤ 12).
- Tool list given to the model is the actual OrmAI-generated list, not a hand-curated subset.
- The model is not told the tenant ID in the prompt. It comes from context only.
- The agent’s natural-language output is logged separately from the tool audit, with its own scrubbing.
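
If your agent framework does not expose a step cap, the loop bound is easy to enforce yourself. The sketch below is framework-agnostic and hypothetical: `step` stands in for one model turn plus any tool call, and the result dict shape is an assumption.

```python
from typing import Callable


def run_agent(step: Callable[[list], dict], max_iterations: int = 12) -> dict:
    """Run step() until it reports done, or stop at the iteration cap."""
    history: list = []
    for i in range(max_iterations):
        result = step(history)
        history.append(result)  # structured tool errors stay visible to the next step
        if result.get("done"):
            return {"status": "ok", "answer": result.get("answer"),
                    "steps": i + 1}
    # The cap was hit: fail closed instead of looping forever.
    return {"status": "max_iterations_reached", "steps": max_iterations}
```

Because errors are appended to `history` rather than raised, a step that receives `scan_budget_exceeded` can narrow its query on the next turn and still finish within the cap.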
Compliance specifics
- The policy file is the artifact for “what can the agent see/do?” — and it can be handed to an auditor as-is.
- Audit log queries for the common SOC 2 / ISO controls are documented as views.
- Data subject access request (GDPR) can be answered using the audit log: “every operation involving subject X.”
- Data deletion propagates to audit retention rules where required.
- Vendors with data access are listed in your DPA — including OrmAI (which is open source and runs in-process, so probably no DPA needed, but verify with your legal team).
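
A sketch of what "documented as views" and the subject-access query could look like, using SQLite in memory so it runs anywhere. The `audit_log` schema, column names, and sample rows are invented for illustration; your audit sink's actual schema will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE audit_log (
    id         INTEGER PRIMARY KEY,
    ts         TEXT NOT NULL,
    tenant_id  TEXT NOT NULL,
    subject_id TEXT,
    tool       TEXT NOT NULL,
    decision   TEXT NOT NULL
);
-- A saved query for auditors: every call the policy denied.
CREATE VIEW denied_calls AS
    SELECT ts, tenant_id, tool FROM audit_log WHERE decision = 'denied';
""")

# Placeholder rows; tool names and subject IDs are made up.
conn.executemany(
    "INSERT INTO audit_log (ts, tenant_id, subject_id, tool, decision) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("2026-01-01T00:00:00Z", "A", "user-1", "db.get", "allowed"),
        ("2026-01-02T00:00:00Z", "A", "user-2", "db.get", "denied"),
        ("2026-01-03T00:00:00Z", "A", "user-1", "db.get", "denied"),
    ],
)


def operations_for_subject(conn: sqlite3.Connection, subject_id: str) -> list:
    """Every audited operation involving one data subject, oldest first."""
    return conn.execute(
        "SELECT ts, tool, decision FROM audit_log "
        "WHERE subject_id = ? ORDER BY ts",
        (subject_id,),
    ).fetchall()
```

This only works if audit rows record the subject at write time, which is worth checking before the first access request arrives.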
Pre-launch verification
Run these in a staging environment with realistic data:
- Cross-tenant probe. From an agent acting as tenant A, try every tool with `tenant_id: B` in arguments. Every one should return tenant A’s data.
- PII probe. Query every model with `db.get`. Verify masked / denied fields are masked / absent.
- Budget probe. Issue an unbounded query. Confirm the error is structured.
- Write probe. Try every write tool without `reason`. Confirm denial.
- Approval probe. Trigger an approval-gated write. Confirm it enters pending state.
- Audit probe. Confirm every tool call wrote an audit row. Confirm denied calls also wrote one.
Quarterly review
- Re-read the policy file. Does it still match the agent’s intended capabilities?
- Look at the audit log for the last 90 days. What’s the most common denial? Is the policy too tight or is the agent misbehaving?
- Look at the longest-running tool calls. Are they the ones you’d expect?
- Look at the write log. Does every write have a coherent reason?
- Run the regression suite.