Comparison
OrmAI vs. hand-rolled tools
Writing one bespoke function per agent capability is safer than raw SQL, and far easier to ship than people admit. Here's where it works, and where it falls apart at scale.
If you’ve shipped agentic features, you’ve almost certainly built hand-rolled tools: get_orders_for_customer(customer_id), update_subscription_status(sub_id, status). This is the right place to start. It is also the wrong place to stop.
Why hand-rolled tools feel right
The shape is intuitive. Each tool is a function in your codebase. Each one:
- Has a clear name.
- Has a typed signature.
- Lives in a file your code reviewer looks at.
- Calls your existing ORM with parameters you wrote.
Compared to “the agent writes SQL,” this is dramatically safer. There’s no string interpolation. The function does what it says. You can unit test it. You can check who has authority to call it.
For a small number of operations on a small surface — a chatbot that does five things — hand-rolled tools are the right answer. Use OrmAI if and when you need more than what they give you.
Where hand-rolled tools start hurting
The pain shows up at around 10–15 tools. Here is a catalog of the failure modes we’ve seen across teams we’ve worked with.
1. Tenant filtering is re-derived in every tool
def get_orders_for_customer(customer_id: int, tenant_id: int):
    return db.query(Order).filter_by(customer_id=customer_id, tenant_id=tenant_id).all()

def get_invoices_for_customer(customer_id: int, tenant_id: int):
    return db.query(Invoice).filter_by(customer_id=customer_id, tenant_id=tenant_id).all()
Every tool repeats the same tenant_id plumbing. Tenant ID has to be passed in by the agent — which means the agent could lie. You probably wrap with current_tenant() from a context. Now you have two sources of truth: the parameter the agent passed, and the context. Which one is authoritative? When they disagree, what happens? In practice, some of your tools will get this subtly wrong.
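If you centralize this by hand, the shape is roughly the following: a single trusted context that the server sets per request, which tools read instead of accepting tenant_id from the agent at all. This is a self-contained sketch, not OrmAI’s implementation; the in-memory ORDERS list stands in for the ORM call, and all names are illustrative.

```python
from contextvars import ContextVar

# One source of truth for the tenant, set by the server per request.
# Tools read it from context and fail closed (LookupError) if it was
# never set; the agent has no tenant_id parameter to lie about.
_current_tenant: ContextVar[int] = ContextVar("current_tenant")

def set_current_tenant(tenant_id: int) -> None:
    _current_tenant.set(tenant_id)

def current_tenant() -> int:
    return _current_tenant.get()

# Stand-in for the ORM so the sketch is runnable.
ORDERS = [
    {"id": 1, "customer_id": 7, "tenant_id": 42},
    {"id": 2, "customer_id": 7, "tenant_id": 99},
]

def get_orders_for_customer(customer_id: int) -> list[dict]:
    tid = current_tenant()  # trusted context, not an agent-supplied argument
    return [o for o in ORDERS if o["customer_id"] == customer_id and o["tenant_id"] == tid]
```

The catch is that every one of your tools has to remember to call current_tenant() and none of them may accept tenant_id as a parameter; nothing enforces that invariant across the codebase.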
2. Pagination contracts diverge
def list_orders(...) -> list[Order]: ...
def list_invoices(..., page: int = 1, per_page: int = 20) -> dict: ...
def list_customers(..., cursor: str = None, limit: int = 50) -> dict: ...
Three pagination styles in one codebase. The agent has to learn each one separately. The next engineer adds a fourth style. Six months later you can’t change pagination without breaking the agent’s mental model.
3. Field redaction is per-tool, manually
You decide that customer email shouldn’t be returned when an agent queries customers. So you write:
def get_customer(id: int):
    c = db.query(Customer).get(id)
    return {**c.__dict__, "email": redact(c.email)}
Then someone adds list_customers, forgets to redact, and now the field is exposed through one tool. Then someone adds search_customers_by_email. Now your “redact email” rule has to be enforced in seven places. Production data leak: a matter of when.
4. Audit logging is best-effort
Each tool does its own logging. Some log inputs and outputs. Some log just the call. Some log nothing. Some log to the application log; some log to a dedicated table. When the security team asks “show me everything the agent did for tenant 42 yesterday,” you can’t answer it from one place.
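The hand-rolled answer is one audit decorator that every tool is wrapped in, writing every call (name, arguments, outcome) to a single destination. A self-contained sketch, with an in-memory list standing in for the dedicated audit table:

```python
import functools
import time

AUDIT_LOG: list[dict] = []  # stand-in for a dedicated audit table

def audited(fn):
    """Record every tool call in one place, including failures."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        entry = {"tool": fn.__name__, "args": kwargs, "ts": time.time()}
        try:
            result = fn(*args, **kwargs)
            entry["ok"] = True
            return result
        except Exception as exc:
            entry["ok"] = False
            entry["error"] = repr(exc)
            raise
        finally:
            AUDIT_LOG.append(entry)
    return wrapper

@audited
def get_customer(customer_id: int) -> dict:
    return {"id": customer_id}
```

With one decorator and one table, “show me everything the agent did for tenant 42 yesterday” becomes a single query; without it, the answer is scattered across however many logging styles your tools accumulated.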
5. Write gating is a maze
Reads are easy. Writes need approval, reasons, row caps, daily quotas. Each gets bolted onto each write tool, ad hoc. By the time you’re at 20 tools, the write-gating logic is tangled enough that you can’t tell, by inspection, what the agent is actually allowed to mutate.
6. Adding a new “horizontal” capability touches every tool
Now you want:
- A query budget per tenant per hour.
- An “auditor mode” where every read also dumps the row count to a metrics endpoint.
- A “dry-run mode” for staging.
- A new redaction rule because Compliance just asked for one.
Each of these is a change to every tool. There’s no central knob.
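The central knob only exists if all tools share one execution path. Take dry-run mode as the smallest example: a single flag consulted in one place, instead of an if-branch copy-pasted into every write tool. A toy sketch (names illustrative):

```python
DRY_RUN = False  # one central knob, e.g. flipped on in staging

def set_dry_run(enabled: bool) -> None:
    global DRY_RUN
    DRY_RUN = enabled

def execute_write(description: str, do_write) -> str:
    """The single choke point every write goes through."""
    if DRY_RUN:
        return f"DRY RUN: would {description}"
    do_write()
    return f"done: {description}"
```

The same argument applies to budgets, auditor mode, and new redaction rules: a horizontal concern is cheap only when there is one choke point to hang it on.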
How OrmAI changes the shape
OrmAI inverts the abstraction. Instead of one function per capability, you declare:
- The set of models the agent can see (e.g. Order, Customer, Invoice).
- The fields visible per model, with their visibility mode (full, masked, hashed, denied).
- Which models can be written, with what gates.
- Per-tenant budgets, timeouts, scopes.
OrmAI then exposes a small generic toolbox to the agent (db.query, db.get, db.aggregate, db.create, db.update, db.delete) that compiles each call against the policy.
policy = (
    PolicyBuilder(DEFAULT_PROD)
    .register_models([Order, Customer, Invoice])
    .deny_fields("*password*", "*token*")
    .mask_fields(["customer.email", "customer.phone"])
    .tenant_scope("tenant_id")
    .enable_writes(models=["Order", "Invoice"], require_reason=True)
    .max_rows(500)
    .max_writes_per_minute(20)
    .build()
)

toolset = mount_sqlalchemy(engine=engine, session_factory=Session, policy=policy)
That’s the entire substrate. The agent gets six tools instead of fifty. The horizontal concerns (audit, budgets, redaction) live in the policy, not in each function.
When hand-rolled tools are still the right answer
This isn’t a religious argument. Hand-rolled tools win when:
- The tool encapsulates real domain logic, not just a parameterized query. compute_renewal_quote(account) should be a hand-rolled function, and OrmAI lets you register it as a domain tool that gets the same policy/audit treatment as the generic ones.
- The agent surface is genuinely tiny (3–4 operations) and unlikely to grow.
- The operations are not parameterized by the agent’s input (e.g. “always summarize today’s signups” — no parameters, no policy needed).
The right pattern in practice is OrmAI for the generic data access and hand-rolled domain tools for the things that aren’t generic. OrmAI handles registration, policy, and audit for both kinds.
A real example: 23 tools → 6 tools + a policy
A customer-facing analytics agent for a vertical SaaS company. Original implementation: 23 hand-rolled tools across orders, customers, products, sessions. After OrmAI:
- 6 generic tools.
- 4 domain tools (cohort analyses that needed real Python).
- One 80-line policy file.
- A central audit table that the security team queries.
- Cross-tenant isolation enforced once.
Lines of code in the agent’s data layer dropped from ~1,800 to ~340. SOC 2 control evidence went from “screenshots of code review” to “the policy file plus log queries.” The agent answered more questions, not fewer, because the generic tools gave it more flexibility within safe bounds.
Migration recipe
- Inventory your existing tools. Group them by model.
- For each group, identify which calls are pure parameterized reads (most of them). Those collapse into db.query/db.get/db.aggregate.
- For each write tool, decide: does it need a reason field? Two-person approval? A row cap? Encode that in policy.
- Identify the genuinely non-trivial tools (the domain logic). Keep them. Register them with OrmAI so they share audit and policy.
- Cut the generic CRUD tools. Update the agent’s tool list.
Most teams keep 20–40% of their original tools as domain tools. The rest become thin policy entries.