Field-level redaction for PII

Stop your agent from ever seeing what it shouldn't. Mask, hash, or deny columns at the policy level — without touching application code.

The fastest way to fail a privacy review is to discover, three months in, that your agent has been seeing customer email addresses in its tool responses. This guide is about making that discovery impossible by making the underlying class of bug impossible.

The four redaction modes

OrmAI supports four redaction modes for a field. The first returns the field untouched; the other three hide progressively more of it from the agent:

Allow (default): the field is returned as-is.
Mask: the field is returned with a redacted middle (e.g. j***@acme.com, *** *** 1234). Useful for fields the agent needs to recognize without seeing fully.
Hash: the field is returned as a stable, salted hash. Useful for joins/lookups where the agent needs equality but not the value.
Deny: the field is omitted from the response entirely. Useful for secrets and credentials.

You pick a mode per (model, field) pair. The choice is enforced on every read tool and every join target.
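As a rough mental model of per-field enforcement, the sketch below applies a mode to each field of a row. It is not OrmAI's implementation: `Mode`, `redact_row`, and the `modes` mapping are hypothetical names introduced only for illustration, and the mask and hash functions are stand-ins.

```python
from enum import Enum
from typing import Any, Callable, Dict, Tuple


class Mode(Enum):
    ALLOW = "allow"
    MASK = "mask"
    HASH = "hash"
    DENY = "deny"


def redact_row(
    model: str,
    row: Dict[str, Any],
    modes: Dict[Tuple[str, str], Mode],
    mask: Callable[[Any], Any],
    hash_value: Callable[[Any], Any],
) -> Dict[str, Any]:
    """Apply the mode chosen for each (model, field) pair to one row."""
    out: Dict[str, Any] = {}
    for field, value in row.items():
        # Fields without an explicit entry fall back to ALLOW (the default).
        mode = modes.get((model, field), Mode.ALLOW)
        if mode is Mode.DENY:
            continue  # omitted from the response entirely
        if mode is Mode.MASK:
            out[field] = mask(value)
        elif mode is Mode.HASH:
            out[field] = hash_value(value)
        else:
            out[field] = value
    return out


# Hypothetical mode table and stand-in redaction functions.
modes = {
    ("customer", "email"): Mode.MASK,
    ("customer", "ssn"): Mode.HASH,
    ("customer", "password_hash"): Mode.DENY,
}
row = {"id": 1, "email": "ada@example.com", "ssn": "123-45-6789", "password_hash": "x"}
redacted = redact_row(
    "customer", row, modes, mask=lambda v: "***", hash_value=lambda v: "h:..."
)
```

The point of the shape is that the decision is made per field at the data layer, so the same table applies whether the row came from a direct read or from a join.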

Declaring redaction

policy = (
    PolicyBuilder(DEFAULT_PROD)
    .register_models([Customer, Order, Subscription])
    .deny_fields("*password*", "*secret*", "*token*", "*api_key*")  # globs
    .mask_fields([
        "customer.email",
        "customer.phone",
        "customer.tax_id",
    ])
    .hash_fields(["customer.ssn", "subscription.external_billing_id"])
    .build()
)

The glob patterns in deny_fields match across all registered models. This is intentional: secrets should never accidentally show up because someone forgot to add a model to the deny list. If a column anywhere in your schema matches *secret*, it’s denied.

The mask_fields and hash_fields lists use model.field notation because masking and hashing are deliberate, per-field choices: each entry targets exactly one column on one model.
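To see what schema-wide glob matching means in practice, here is a minimal sketch using Python's standard `fnmatch` module. The `schema` dict and the `is_denied` helper are hypothetical, and lowercasing the column name before matching is an assumption; OrmAI's actual matcher may treat case differently.

```python
from fnmatch import fnmatchcase

DENY_GLOBS = ["*password*", "*secret*", "*token*", "*api_key*"]


def is_denied(column: str, globs=DENY_GLOBS) -> bool:
    """Return True if the column name matches any deny glob."""
    name = column.lower()  # assumption: matching is case-insensitive
    return any(fnmatchcase(name, pattern) for pattern in globs)


# Hypothetical schema: model name -> column names.
schema = {
    "customer": ["id", "email", "password_hash", "api_key"],
    "webhook": ["id", "url", "signing_secret"],
}

# The same globs are applied to every model, so a newly added model
# with a secret-looking column is covered without extra configuration.
denied = {
    model: [col for col in cols if is_denied(col)]
    for model, cols in schema.items()
}
```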

What the agent sees

A db.get for a customer:

{"name": "db.get", "arguments": {"model": "Customer", "id": 1}}

Returns:

{
  "id": 1,
  "tenant_id": 42,
  "name": "Ada Lovelace",
  "email": "a***[email protected]",
  "phone": "5********12",
  "tax_id": "12-***6789"
}

The password_hash, api_key, and signing_secret columns are absent — they matched *password*, *api_key*, and *secret*. The agent doesn’t even know they exist (they’re also stripped from describe_schema).

Masking strategies

The default mask is “first character, last character, asterisks in between.” For email, OrmAI is slightly smarter and preserves the domain:

[email protected] → a*****[email protected]

For phone numbers, the last 2 digits are preserved.
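The rules above can be sketched in a few lines. These functions are illustrative, not OrmAI's code: the names are hypothetical, and the handling of very short values and of punctuation in phone numbers is an assumption.

```python
def mask_default(value: str) -> str:
    """Keep the first and last character, replace the rest with asterisks."""
    if len(value) <= 2:
        return "*" * len(value)  # assumption: too short to keep anything
    return value[0] + "*" * (len(value) - 2) + value[-1]


def mask_email(value: str) -> str:
    """Mask the local part only and keep the domain readable."""
    local, sep, domain = value.rpartition("@")
    if not sep:
        return mask_default(value)  # not an email; fall back to the default
    return mask_default(local) + "@" + domain


def mask_phone(value: str) -> str:
    """Keep the first digit and the last two digits."""
    digits = "".join(ch for ch in value if ch.isdigit())
    if len(digits) <= 3:
        return "*" * len(digits)
    return digits[0] + "*" * (len(digits) - 3) + digits[-2:]


masked_name = mask_default("Lovelace")
masked_email = mask_email("ada@example.com")
masked_phone = mask_phone("555-0123-4512")
```

With these definitions, an 11-digit phone number comes out in the same shape as the `phone` value in the sample response earlier in this guide.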

For arbitrary strings, you can override:

.mask_fields([
    ("customer.tax_id", "preserve_last_4"),  # ***-**-1234
    ("customer.api_id", "first_3_chars"),    # acm****
    ("customer.notes", "fully"),             # ********
])

Or supply a callable:

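# Keeps the second-to-last comma-separated part (often the city); raises IndexError if the value has no comma.
.mask_field("customer.full_address", lambda v: v.split(",")[-2].strip() + ", ***")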

The callable runs in-process; it has no access to anything but the value.
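Because the callable receives whatever is stored in the column, it is worth making it tolerant of missing or oddly shaped values. The sketch below is a more defensive variant of the address mask; `mask_address` is a hypothetical helper, and returning `"***"` for addresses with fewer than two parts is an assumption about what you would want.

```python
from typing import Optional


def mask_address(value: Optional[str]) -> Optional[str]:
    """Keep only the second-to-last comma-separated part of an address."""
    if value is None:
        return None  # nothing to mask
    parts = [part.strip() for part in value.split(",") if part.strip()]
    if len(parts) < 2:
        # Too little structure to pick a safe part; hide everything.
        return "***"
    return parts[-2] + ", ***"


full = mask_address("12 Main St, Springfield, IL 62704")
short = mask_address("Nowhere")
```

A named function like this can be passed wherever a lambda is accepted, and it is easier to unit-test on its own.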

Hash mode

Hashing returns a deterministic, salted SHA-256:

{ "ssn": "h:9c2a...8f1" }

The salt is per-policy. The same SSN produces the same hash within a policy, so the agent can join or group on it; across policies (or after a salt rotation), the hash is different.

This is useful when the agent needs to ask “are these two customers the same person?” without seeing the underlying identifier.
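The equality-without-disclosure property can be illustrated with the standard library. This sketch uses HMAC-SHA-256 keyed with the salt, which is one common way to build a salted hash; the guide does not say which construction OrmAI uses, so treat the function, the `h:` prefix handling, and the salt values as assumptions.

```python
import hashlib
import hmac


def hash_field(value: str, salt: bytes) -> str:
    """Deterministic keyed hash: same value + same salt -> same output."""
    digest = hmac.new(salt, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "h:" + digest


# Hypothetical per-policy salts.
salt_a = b"policy-a-salt"
salt_b = b"policy-b-salt"

same_1 = hash_field("123-45-6789", salt_a)
same_2 = hash_field("123-45-6789", salt_a)
other_value = hash_field("987-65-4321", salt_a)
other_policy = hash_field("123-45-6789", salt_b)
```

Within one salt, equal inputs collide by design, which is what lets the agent group or join. Hashes produced under different salts are not comparable, so results should not be carried across a salt rotation.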

Conditional redaction

Some fields should be visible to some agents and not others. OrmAI supports per-principal redaction:

.mask_fields([
    "customer.email",
])
.allow_fields_for(role="support_lead", fields=["customer.email"])

Now any principal whose role is support_lead sees real emails; everyone else sees the mask. The role comes from RunContext.principal.role, which your auth layer populated.
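The resolution order implied here is: a role-specific allow wins over the general mask. A minimal sketch of that lookup follows; `Principal`, `MASKED`, `ALLOW_FOR`, and `effective_mode` are hypothetical names for illustration and do not reflect OrmAI's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    role: str


# Hypothetical in-memory view of the policy.
MASKED = {"customer.email"}
ALLOW_FOR = {"support_lead": {"customer.email"}}


def effective_mode(field: str, principal: Principal) -> str:
    """Resolve the mode for one field as seen by one principal."""
    if field in ALLOW_FOR.get(principal.role, set()):
        return "allow"  # role-specific exception takes precedence
    if field in MASKED:
        return "mask"
    return "allow"


lead_mode = effective_mode("customer.email", Principal(role="support_lead"))
agent_mode = effective_mode("customer.email", Principal(role="support_agent"))
```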

Free-text fields

The hardest case is notes or description columns that contain arbitrary user-entered text. Some of that text is PII; most isn’t. Two patterns:

Deny by default, opt-in tools. Don’t expose customer.notes to the agent at all. Build a domain tool summarize_customer_notes(customer_id) that runs the notes through a redaction model server-side and returns a sanitized summary.
Mask aggressively, then verify. Use a callable mask that runs an entity-recognition model and replaces detected PII with tokens such as <NAME> and <EMAIL>. This is slower, but it is sometimes the right tradeoff for large knowledge-base content.
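To make the second pattern concrete, here is a deliberately simple stand-in for the callable. It uses regular expressions instead of an entity-recognition model, so it only catches emails and phone-like digit runs; names and addresses pass straight through. The patterns, the `<PHONE>` token, and the `scrub` helper are assumptions for illustration.

```python
import re

# Rough patterns; real-world PII detection needs far more than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def scrub(text: str) -> str:
    """Replace email addresses and phone-like sequences with tokens."""
    text = EMAIL_RE.sub("<EMAIL>", text)
    return PHONE_RE.sub("<PHONE>", text)


note = "Call Ada at +1 555-010-0199 or ada@example.com."
scrubbed = scrub(note)
```

Note that "Ada" survives in the output. That gap is the reason the text above recommends a recognition model, plus a verification step, rather than pattern matching alone.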

Do not give the agent raw access to free-text PII fields and “trust it” not to repeat what it sees. It will repeat what it sees.

Schema discovery

describe_schema() returns only what the policy allows. Denied fields are absent. Masked fields are present but flagged so the model knows it’s seeing a mask, not the truth:

{
  "Customer": {
    "fields": {
      "id": { "type": "int", "primary_key": true },
      "name": { "type": "str" },
      "email": { "type": "str", "redaction": "mask" },
      "phone": { "type": "str", "redaction": "mask" }
      // password_hash, api_key absent
    }
  }
}

Your prompt can take advantage of this. An agent reading the schema can tell which fields are masked, so you can instruct it not to use them as join keys or equality filters.
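One way to use the flag is to derive a prompt hint from the schema. The sketch below assumes the schema has already been parsed into a Python dict shaped like the sample above; `masked_fields` and `prompt_hint` are hypothetical helpers.

```python
from typing import Dict, List

# Parsed schema, shaped like the describe_schema() sample above.
schema = {
    "Customer": {
        "fields": {
            "id": {"type": "int", "primary_key": True},
            "name": {"type": "str"},
            "email": {"type": "str", "redaction": "mask"},
            "phone": {"type": "str", "redaction": "mask"},
        }
    }
}


def masked_fields(schema: Dict, model: str) -> List[str]:
    """List the fields of a model that are flagged as masked."""
    fields = schema[model]["fields"]
    return [name for name, meta in fields.items() if meta.get("redaction") == "mask"]


def prompt_hint(schema: Dict, model: str) -> str:
    """Build a one-line instruction for the system prompt."""
    names = ", ".join(f"{model}.{name}" for name in masked_fields(schema, model))
    return f"These fields are masked; do not filter or join on them: {names}"


hint = prompt_hint(schema, "Customer")
```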

Compliance and audit

Field redaction is the easiest control to evidence. The artifact is the policy file. A SOC 2 / HIPAA / GDPR auditor wants to know “which fields are visible to your AI agent?” — you hand them the policy, the deny list, and the masked list.

If you keep the policy in version control (you should), you also get a free history of changes for evidence.

Common mistakes

Whitelisting models, not fields

A policy like “the agent can see Customer” implicitly grants every column. Wrong. Always think field-by-field for sensitive models.

Relying on the prompt to redact

“In your response, never include the customer’s email address.” Don’t do this. A prompt instruction is a request, not an enforcement mechanism, and the model will eventually leak the value. Redact in the data layer.

Forgetting that joins return the joined model

If Order has a customer relationship and the agent issues include: ["customer"], the customer’s redaction rules apply to the embedded object. Test this. We’ve seen one case where redaction rules were applied to direct queries but not to joined responses. It was fixed in OrmAI 0.2.0, but it is worth verifying on your version.
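A test for this does not need to know anything about OrmAI's internals: it can walk whatever the tool returned and fail if a denied column name appears at any depth. The sketch below does that with a recursive check; `find_leaks` and the sample payloads are hypothetical, and in a real test the payload would come from an actual tool call with an include.

```python
from fnmatch import fnmatchcase
from typing import Any, List

DENY_GLOBS = ["*password*", "*secret*", "*token*", "*api_key*"]


def find_leaks(payload: Any, globs=DENY_GLOBS, path: str = "") -> List[str]:
    """Return the paths of keys that match a deny glob, at any nesting depth."""
    leaks: List[str] = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            here = f"{path}.{key}" if path else key
            if any(fnmatchcase(key.lower(), pattern) for pattern in globs):
                leaks.append(here)
            leaks.extend(find_leaks(value, globs, here))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            leaks.extend(find_leaks(item, globs, f"{path}[{index}]"))
    return leaks


# Hypothetical joined response where redaction was skipped on the join.
leaky = {"id": 7, "customer": {"id": 1, "email": "a*a@example.com", "api_key": "k"}}
clean = {"id": 7, "customer": {"id": 1, "email": "a*a@example.com"}}

leaks = find_leaks(leaky)
```

Asserting that the returned list is empty for every tool that supports includes gives a regression test that survives library upgrades.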

Masking but not denying secrets

Secrets should never be masked, only denied. A masked password is still mostly a password.
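This mistake is easy to catch mechanically before a policy ships. The sketch below checks a list of masked fields against secret-looking name patterns; `masked_secrets` is a hypothetical lint helper, not part of OrmAI, and the pattern list simply mirrors the deny globs used earlier in this guide.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

SECRET_GLOBS = ["*password*", "*secret*", "*token*", "*api_key*"]


def masked_secrets(mask_fields: Iterable[str], globs=SECRET_GLOBS) -> List[str]:
    """Return masked fields whose column name looks like a secret."""
    flagged = []
    for qualified in mask_fields:
        # "model.field" -> "field"
        column = qualified.rsplit(".", 1)[-1].lower()
        if any(fnmatchcase(column, pattern) for pattern in globs):
            flagged.append(qualified)
    return flagged


flagged = masked_secrets(
    ["customer.email", "customer.api_key", "webhook.signing_secret"]
)
```

Running a check like this in CI against the policy file turns the rule into something a reviewer does not have to remember.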