Text-to-SOQL that won't leak

Letting an agent turn "list the open loan applications and their amounts" into a SOQL query is a five-line prompt. Letting it do that without ever returning a field the asking user isn't allowed to see is the part a Salesforce architect actually has to answer for.

Field-level security (FLS) and sharing aren't suggestions in a regulated org — they're the control that keeps a support agent from reading a loan applicant's risk tier. An LLM that writes queries has to inherit those controls, not route around them. Here's the pattern I landed on: two layers, and the one that matters is the one that doesn't trust the model.

Layer 1: constrain the prompt

The model only ever sees the accessible fields. The prompt is built from the FLS allow-list, so a restricted field like Risk_Tier__c isn't in the model's vocabulary for that object. This stops most leaks — but "most" isn't a control. Prompts drift, models hallucinate field names, someone pastes a schema. So it's necessary, not sufficient.

Layer 2: enforce on the output

This is the layer that doesn't trust anything. Every generated query is checked against the allow-list before it runs — and the check is deterministic, so it needs no model and no org to test.

1/3Restricted = all − accessible
text_to_soql.py
1def restricted_fields(obj, allowed, full):
2 # everything the object has, minus what this user can see
3 return set(full.get(obj, [])) - allowed.get(obj, set())
4 
5def enforce_fls(soql, allowed, full):
6 obj = parse_object(soql)
7 if obj is None:
8 return False, ["could not parse target object"]
9 if obj not in allowed:
10 return False, [f"object not permitted: {obj}"]
11 restricted = restricted_fields(obj, allowed, full)
12 present = sorted(f for f in restricted
13 if re.search(rf"\b{re.escape(f)}\b", soql))
14 return (len(present) == 0), present
The allow-list is what the user can see; the restriction is everything else the object has. Deriving it by subtraction means a newly-added sensitive field is restricted by default — fail closed, not open.

What it does

Same allow-list, three queries. The first is clean; the others try to reach Risk_Tier__c / Reviewed_By__c through different clauses:

PASS  -> ok
        SELECT Id, Name, Amount__c FROM Loan_Application__c WHERE Status__c = 'Open'
BLOCK -> ['Reviewed_By__c', 'Risk_Tier__c']
        SELECT Id, Risk_Tier__c, Reviewed_By__c FROM Loan_Application__c
BLOCK -> ['Risk_Tier__c']
        SELECT Id, Name FROM Loan_Application__c ORDER BY Risk_Tier__c DESC

The third one is the interesting catch: it doesn't select the restricted field, it orders by it — which still leaks information (the sort reveals relative risk). Scanning the whole query text, not just the SELECT clause, is what closes that gap.

The architect's framing

The thing I'd put in front of a customer isn't the code — it's the principle: "won't leak" is a data-modeling property, enforced in code. The allow-list comes straight from describe (field.accessible per the running user); the guard is the same whether the SQL came from a human, an agent, or a generated report. The LLM is just another untrusted query source, and you already know how to handle those.

The lightweight parser here is illustrative, not a full SOQL grammar — but the shape is the point: generate inside the fence, and verify on the way out.

Code and a runnable notebook: sf-agent-grounding (notebooks/05). It's part of the same project as grounding a Salesforce agent in Data 360.