How I'd advise a customer rolling out AI agents on their data

The valuable advisory work isn't break-fix. It's proactive — getting in front of the architecture before the production incident, the audit finding, or the stalled rollout. So this is the memo I'd write for a team three months into putting an AI agent over their real data — a CRM, a warehouse, internal APIs, whatever — when the security team starts asking whether they're doing it right.

It's deliberately two-voiced, because the job is two-voiced: the exec wants to know "is it safe, and will it scale?"; the platform team wants to know "what do we change on Monday?" An architect's real work is mapping cleanly between those.

Where these rollouts actually go wrong

Not in the demo. In the second quarter, when the agent has real tools, the data model has sprawled, and nobody owns the grounding. The risks that bite:

| Risk | Why it bites | What I'd advise |
| --- | --- | --- |
| Over-privileged tools | Agent tools wired to broad permissions / a powerful service identity mutate beyond intent | Least-privilege identity per tool; mark destructive tools and gate them; inventory every tool and its blast radius |
| Data model sprawl | Tables and views duplicated, ungoverned ingestion, identity resolution guessed-at | A named owner for the data model; naming + review for new entities; prefer federating over copying |
| Grounding drift | Retrieval goes stale as the systems change; the agent cites last quarter's reality | A re-index cadence tied to data change; a freshness SLO; treat the corpus as a maintained asset |
| Permission bypass | The agent returns a row or field the asking user can't see | Enforce row/field permissions on tools and generated queries; verify on the way out, don't trust the prompt |
| PII in prompts & logs | Sensitive data flows into context windows and trace logs | Data masking on sensitive tiers; log redaction; a sensitive-data handling rule per source |
| No evals | Quality degrades silently; nobody notices until a user does | A small set of golden conversations + regression scoring before each change |
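To make "verify on the way out" concrete, here is a minimal sketch of a filter applied to a tool's result before it reaches the agent's context. Everything in it is an assumption for illustration: the `UserAccess` shape, the `owner_id` row rule, and the field names are hypothetical, and a real system would pull these from whatever permission model the data source already enforces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserAccess:
    """What the asking user may see (hypothetical shape, not a real API)."""
    user_id: str
    visible_fields: frozenset[str]
    visible_owner_ids: frozenset[str]  # stand-in for a real row-level rule


def filter_for_user(rows: list[dict], access: UserAccess) -> list[dict]:
    """Drop rows and fields the asking user cannot see.

    Runs on the tool's output, after the query, so it holds even if the
    prompt or the generated query asked for more than the user is allowed.
    """
    allowed = []
    for row in rows:
        # Row-level check: skip rows outside the user's visibility.
        if row.get("owner_id") not in access.visible_owner_ids:
            continue
        # Field-level check: keep only fields the user is entitled to.
        allowed.append({k: v for k, v in row.items() if k in access.visible_fields})
    return allowed


# Illustrative data only.
rows = [
    {"id": 1, "owner_id": "u1", "name": "Acme", "ssn": "123-45-6789"},
    {"id": 2, "owner_id": "u2", "name": "Globex", "ssn": "987-65-4321"},
]
access = UserAccess("u1", frozenset({"id", "name"}), frozenset({"u1"}))
print(filter_for_user(rows, access))  # [{'id': 1, 'name': 'Acme'}]
```

The point of the design is where the check sits: on the result, keyed to the asking user rather than the tool's service identity, so a cleverly worded prompt can't widen it.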

The pattern across all six: the failure isn't the model, it's ungoverned surface area. That's the architect's lane.
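One way to turn the grounding-drift advice into something checkable is a freshness SLO expressed as code. The sketch below assumes you can get two timestamps per source, when the data last changed and when the retrieval index was last rebuilt; the function name, the source names, and the 24-hour threshold are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stale_sources(
    last_indexed: dict[str, datetime],
    last_changed: dict[str, datetime],
    slo: timedelta,
    now: datetime,
) -> list[str]:
    """Return sources whose index has lagged a newer data change for longer than the SLO."""
    stale = []
    for source, changed_at in last_changed.items():
        indexed_at = last_indexed.get(source)
        if indexed_at is not None and indexed_at >= changed_at:
            continue  # index already reflects the latest change
        # Never indexed, or indexed before the latest change:
        # the lag is the time elapsed since that change.
        if now - changed_at > slo:
            stale.append(source)
    return sorted(stale)


# Illustrative timestamps only.
now = datetime(2025, 1, 10, 12, 0, tzinfo=timezone.utc)
last_changed = {
    "crm": now - timedelta(hours=3),
    "warehouse": now - timedelta(hours=30),
    "wiki": now - timedelta(hours=48),
}
last_indexed = {
    "crm": now - timedelta(hours=5),
    "warehouse": now - timedelta(hours=40),
    "wiki": now - timedelta(hours=1),
}
print(stale_sources(last_indexed, last_changed, timedelta(hours=24), now))  # ['warehouse']
```

Here `crm` is behind but still within the SLO, `wiki` was re-indexed after its last change, and only `warehouse` breaches.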

The runbook I'd hand them

Phased, because trying to do it all at once is its own risk.

  1. Baseline & guardrails. Inventory every tool and the identity it runs as; capture the row/field permissions on the data in scope; turn on confirmation gates for anything destructive. Nothing ships until this exists.
  2. Data model review. Walk the entities, identity-resolution rules, and ingestion sources. Kill duplicates, name an owner, decide federate vs. copy per source.
  3. Grounding & tools, tiered. Wire retrieval and tools with least privilege and explicit risk tiers (read / write / destructive). Expose each tool's tier to the client so it can gate accordingly, for example by asking for confirmation before a destructive call.
  4. Evals & observability. Golden conversations, regression scoring, and trace logging with redaction. Make quality measurable before scaling traffic.
  5. Scale & a standing review board. Only now open the aperture — with a recurring architecture review as the control that keeps surface area governed as it grows.
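Phases 1 and 3 can be sketched together as a tool inventory that records each tool's run-as identity and risk tier, plus a gate that refuses destructive calls without confirmation. This is a minimal illustration, not any particular agent framework's API: `Tool`, `Tier`, `call_tool`, and the identity names are all made up for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Tier(Enum):
    READ = "read"
    WRITE = "write"
    DESTRUCTIVE = "destructive"


@dataclass(frozen=True)
class Tool:
    name: str
    run_as: str  # least-privilege identity this tool executes under (hypothetical names)
    tier: Tier
    fn: Callable[..., str]


class ConfirmationRequired(Exception):
    """Raised when a destructive tool is called without explicit confirmation."""


def call_tool(tool: Tool, confirmed: bool = False, **kwargs) -> str:
    # The gate keys off the declared tier, not off anything in the prompt.
    if tool.tier is Tier.DESTRUCTIVE and not confirmed:
        raise ConfirmationRequired(f"{tool.name} is destructive; confirm first")
    return tool.fn(**kwargs)


# The inventory doubles as the phase-1 baseline: every tool, its identity, its tier.
INVENTORY = [
    Tool("lookup_account", "svc-crm-read", Tier.READ,
         lambda account_id: f"account {account_id}"),
    Tool("delete_account", "svc-crm-delete", Tier.DESTRUCTIVE,
         lambda account_id: f"deleted {account_id}"),
]

for tool in INVENTORY:
    print(f"{tool.name}: runs as {tool.run_as}, tier={tool.tier.value}")
```

Calling `call_tool(INVENTORY[1], account_id="42")` raises `ConfirmationRequired`; passing `confirmed=True` lets it through. Keeping the tier on the tool definition means the review board can read the blast radius straight off the inventory.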

The review-board checklist

What I'd actually bring to the table:

Same architecture, two explanations

To the exec: "The agent only ever acts inside the permissions the user already has, it asks before anything irreversible, and we measure its answers before we widen access. The data stays where it lives. We review the whole thing on a cadence — so it scales without the surface area getting away from us."

To the platform team: "Per-tool run-as identities, tier annotations mapped to client confirmation gates, row/field permissions enforced on generated queries, federated data with an owner, a re-index job on data change, and a golden-conversation eval in CI."
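The "golden-conversation eval in CI" piece can start very small. The sketch below assumes the agent can be called as a plain function from prompt to answer and scores it with simple substring checks; `GoldenCase`, `score`, `check_regression`, and the stub agent are hypothetical, and a real suite would likely use richer scoring than string matching.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    prompt: str
    must_include: tuple[str, ...]
    must_not_include: tuple[str, ...] = ()


def score(agent: Callable[[str], str], cases: list[GoldenCase]) -> float:
    """Fraction of golden cases the agent passes (case-insensitive substring checks)."""
    passed = 0
    for case in cases:
        answer = agent(case.prompt).lower()
        has_required = all(s.lower() in answer for s in case.must_include)
        has_forbidden = any(s.lower() in answer for s in case.must_not_include)
        if has_required and not has_forbidden:
            passed += 1
    return passed / len(cases)


def check_regression(agent: Callable[[str], str], cases: list[GoldenCase],
                     baseline: float, tolerance: float = 0.0) -> bool:
    """True if the score has not dropped below the recorded baseline."""
    return score(agent, cases) >= baseline - tolerance


def stub_agent(prompt: str) -> str:
    # Stand-in for the real agent call; canned answers for illustration only.
    answers = {
        "Who owns the Acme account?": "Acme is owned by Dana.",
        "What is Dana's SSN?": "I can't share that.",
    }
    return answers.get(prompt, "I don't know.")


cases = [
    GoldenCase("Who owns the Acme account?", must_include=("dana",)),
    GoldenCase("What is Dana's SSN?", must_include=(), must_not_include=("123-45",)),
]
print(score(stub_agent, cases))                           # 1.0
print(check_regression(stub_agent, cases, baseline=1.0))  # True
```

In CI, a failing `check_regression` would block the change. The second case shows that golden conversations can assert what the agent must not say, which is where the permission and PII rules get exercised.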

Both sentences describe the same system. The architect's job is to keep them true to each other — and to make the safe path the easy one, so the team takes it without being told to.

(This generalizes a specific engagement I wrote up for Salesforce Agentforce + Data 360, but none of it is Salesforce-specific.)