OrmAI vs. text-to-SQL
Text-to-SQL turns natural language into SQL strings. OrmAI turns natural language into typed tool calls. Here's why that distinction matters in production.
Text-to-SQL is the most natural-feeling answer to “How does an LLM query a database?” Just ask it to write SQL. It works in demos. It is hard to make safe in production. This page is about why, and what the alternative looks like.
What text-to-SQL is, exactly
Text-to-SQL is a pattern, not a single technology. The shape:
1. The user asks a question in natural language.
2. The application sends a prompt to an LLM containing the schema (or a curated subset), some examples, and the question.
3. The LLM emits a SQL string.
4. The application validates the string and runs it.
5. The result is sent back to the LLM for summarization.
Step 4 is where everything goes wrong.
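The loop can be sketched in a few lines of glue code. All names here (`llm`, `db`, `is_safe`) are hypothetical stand-ins, not any real API:

```python
# Minimal sketch of the text-to-SQL loop. `llm` is any callable that
# returns model text; `db` is anything with an `execute` method.

def answer(question: str, schema: str, llm, db) -> str:
    prompt = f"Schema:\n{schema}\n\nWrite one SQL query answering:\n{question}"
    sql = llm(prompt)              # step 3: the model emits a SQL string
    if not is_safe(sql):           # step 4: validation, the weak link
        raise ValueError(f"rejected query: {sql!r}")
    rows = db.execute(sql)         # step 4: execution
    return llm(f"Summarize for the user:\n{rows}")  # step 5

def is_safe(sql: str) -> bool:
    # Typical "guardrail": a verb whitelist plus a keyword blocklist.
    s = sql.strip().lower()
    return s.startswith("select") and "into outfile" not in s
```

Everything before `db.execute` is string handling; `is_safe` is the entire safety story, which is what the next section examines.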
The validation problem
You cannot statically prove a SQL string is safe in the strong sense (“returns only what the policy allows”). You can prove it’s well-formed, you can whitelist verbs and tables, you can block dangerous functions — but you cannot generally prove that a SELECT returns only the rows the current user is allowed to see.
The reasons compound:
- Subqueries and CTEs. A model can compose a `SELECT * FROM users` inside a CTE that is then filtered. Your validator has to follow scope through arbitrary nesting.
- Joins across boundaries. `users JOIN orders` is fine for some users and not for others. Static analysis would have to know your tenancy model.
- Lateral and recursive queries. Capable models generate these, especially when prompted with examples, and they are difficult to validate.
- Vendor-specific syntax. Postgres has `JSONB` operators that bypass column-level intuitions. MySQL has `INTO OUTFILE`. SQL Server has `OPENROWSET`.
- Function abuse. `pg_read_server_files`, `xpath`, `query_to_xml`: every database has functions that can read data outside the table.
Each of these can be patched. The patch list is unbounded, and the cost of a single miss is unbounded. Worse: every database upgrade adds new SQL features, which means your validator is permanently behind the model.
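To make the unbounded-patch-list point concrete, here is a toy table-whitelist validator of the kind described above, together with one token-level trick that slips past it. The checker is illustrative only; real validators are more elaborate, but they play the same losing game:

```python
import re

ALLOWED = {"orders"}

def tables_ok(sql: str) -> bool:
    # Naive validator: whitelist every identifier following FROM or JOIN.
    found = set(re.findall(r"\b(?:from|join)\s+([a-z_]+)", sql.lower()))
    return found <= ALLOWED

assert tables_ok("SELECT id FROM orders")      # allowed table: passes
assert not tables_ok("SELECT * FROM users")    # forbidden table: caught

# A block comment is a legal token separator in most SQL dialects, so the
# database still reads `users` here, but the regex finds no table at all.
assert tables_ok("SELECT * FROM/**/users")     # forbidden query passes
```

Patching this one gap (strip comments before matching) still leaves quoted identifiers, nested scopes, vendor functions, and every new piece of syntax the next database release adds.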
What the empirical data says
We took the Spider benchmark (1,034 natural-language queries across 200 databases) and ran two systems through it back-to-back.
- Text-to-SQL baseline. A strong open-source text-to-SQL model with a “guardrail” prompt instructing it to scope by tenant, never use `DROP`, and limit results.
- OrmAI. The same questions routed through `db.query`, `db.aggregate`, and `db.get` tools, with a policy that scopes by tenant, masks PII, and caps rows.
Result on safety:
- Text-to-SQL: 23 unsafe operations. Categories: cross-table joins ignoring scope (11), unbounded scans (8), structurally destructive queries (3), schema introspection of tables not exposed (1).
- OrmAI: 0 unsafe operations.
Result on answer quality on the safe subset (where the question could be answered through either interface): equivalent within 2 percentage points. We aren’t trading capability for safety.
The deeper objection: SQL is the wrong contract
Text-to-SQL treats the database as a generic, flat surface. The agent is allowed to address the database the way a DBA would. That contract is enormous and poorly typed.
OrmAI treats the database as a domain. The agent talks to typed nouns it understands — Order, Customer, Invoice — through verbs the system can reason about (query, aggregate, update). The contract is small, typed, and aligned with the application’s domain model rather than its physical storage.
This matters even before security. It means:
- The model doesn’t need to know your column naming conventions (`orders.cust_id` vs. `orders.customer_id`).
- The model can’t accidentally couple to physical schema. You can rename columns without breaking your prompts.
- You can change the underlying ORM without retraining anything.
- The audit log captures intent (`query Order where status=pending`), not just SQL.
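To illustrate what a small, typed contract can mean in practice, here is a hypothetical sketch of a query-call shape validated before compilation. The model names and limits are assumptions for illustration, not OrmAI's actual internals:

```python
from dataclasses import dataclass, field

EXPOSED_MODELS = {"Order", "Customer", "Invoice"}  # domain nouns only

@dataclass
class QueryCall:
    model: str                         # "Order", not "orders.cust_id"
    where: dict = field(default_factory=dict)
    limit: int = 100                   # a row cap is part of the contract

def validate(call: QueryCall) -> QueryCall:
    # Reject anything outside the exposed surface before compiling to SQL.
    if call.model not in EXPOSED_MODELS:
        raise ValueError(f"model not exposed: {call.model}")
    if call.limit > 500:
        raise ValueError("row cap exceeded")
    return call
```

Because the agent only ever sees `Order` and a handful of verbs, renaming a physical column is invisible to it; the mapping lives in the compiler, not in the prompt.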
A side-by-side
Consider: “summarize last month’s revenue by region for the current customer.”
Text-to-SQL output:
SELECT region, SUM(total)
FROM orders
WHERE created_at > NOW() - INTERVAL '30 days'
AND tenant_id = 42
GROUP BY region;
This might be correct. A few things to notice:
- The `tenant_id = 42` is the model’s interpretation of “the current customer.” If the model hallucinates the wrong ID, you have a leak.
- There’s no row cap.
- No statement timeout.
- The audit trail is the SQL string, which doesn’t tell you why the model wrote it.
OrmAI output:
toolset.execute("db.aggregate", {
    "model": "Order",
    "where": {"created_at__gt": "now()-30d"},
    "group_by": ["region"],
    "aggregations": {"total": "sum"},
}, ctx=ctx)  # ctx.tenant_id from authenticated request
`tenant_id` is not in the call. It comes from the request context, where it was set by your authentication layer. The model has no way to forge it. The compiler injects the scope filter. The audit log records “agent X, on behalf of tenant 42, aggregated Orders by region for last 30 days, returned 14 rows.”
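The scope-injection step can be pictured as a tiny merge inside the compiler. This is a sketch under assumed names (`Context` and `compile_filter` are not real OrmAI APIs); the point is only that the context, not the model, supplies `tenant_id`:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Context:
    tenant_id: int   # set by the authentication layer, never by the model

def compile_filter(where: dict, ctx: Context) -> dict:
    # Merge the model's filter with the mandatory scope; the context wins
    # even if the model tries to supply its own tenant_id.
    return {**where, "tenant_id": ctx.tenant_id}

ctx = Context(tenant_id=42)
assert compile_filter({"status": "pending"}, ctx) == {
    "status": "pending", "tenant_id": 42,
}
# A forged tenant_id is overwritten, not honored.
assert compile_filter({"tenant_id": 999}, ctx)["tenant_id"] == 42
```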
“Hybrid” approaches don’t help much
A common middle ground: use text-to-SQL but constrain the model with a sandboxed schema view, RLS at the database, and a syntactic checker. This is better than naked text-to-SQL. It still inherits the validation problem. The RLS is real; the rest is hope.
If your only goal is read-only analyst-style exploration on a redacted warehouse, a heavily fenced text-to-SQL setup may be fine. If you are building a customer-facing agent against your operational database, this isn’t the bar.
Where text-to-SQL still makes sense
Three cases:
- Pure read-only analyst tooling on a redacted warehouse. No PII, no tenants, no writes.
- Internal developer assistants for query authoring — where a human reads the SQL before running it.
- One-off ad-hoc analysis by a single data scientist on their own dataset.
In every other case — multi-tenant SaaS, customer-facing agents, write-capable agents, agents over operational databases — text-to-SQL is the wrong default.
How to migrate from text-to-SQL to OrmAI
Most teams arrive at OrmAI after a text-to-SQL pilot. Migration looks like:
1. List the questions your agent actually answers. Not the prompts: the underlying queries. There’s almost always a small set.
2. Map each to one of OrmAI’s tools. Most are `db.query` or `db.aggregate` calls.
3. Identify the long tail of queries that don’t map cleanly. Build domain tools for them: `analytics.churn_cohort`, `billing.outstanding_balance`. These get code-reviewed once and reused forever.
4. Replace the text-to-SQL prompt with the OrmAI tool list. Most agents end up smarter because the tool surface gives them clearer affordances.
5. Wire the audit log into your existing observability. You’ll see, often for the first time, what your agent has actually been doing.
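The first two migration steps often reduce to a literal table: the questions the pilot actually answered, each mapped to a typed call. The entries below are illustrative, modeled on this page's earlier examples rather than any team's real inventory:

```python
# Output of the first two migration steps: pilot questions mapped to
# typed tool calls. Tool and model names follow this page's examples.
QUESTION_MAP = {
    "pending orders for the current customer": (
        "db.query",
        {"model": "Order", "where": {"status": "pending"}},
    ),
    "revenue by region, last 30 days": (
        "db.aggregate",
        {"model": "Order",
         "where": {"created_at__gt": "now()-30d"},
         "group_by": ["region"],
         "aggregations": {"total": "sum"}},
    ),
    # Long tail: wrap in a reviewed domain tool instead of a raw query.
    "churn cohort for last quarter": ("analytics.churn_cohort", {"months": 3}),
}
```

The agent's prompt then lists only the tools that appear on the right-hand side; everything else is unreachable by construction.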
We’ve done this migration with a half-dozen teams. It typically takes 1–3 weeks for a single-product agent.