Building a governance MCP server for Salesforce — after Dreamforce 2025

Imagine you're the new architect on a Salesforce customer mid-Agentforce deployment. Three months in. The agents are live in production. The security team starts asking: who can do what across this org, what's drifted since launch, and where are the gaps the auditors will flag?

You open the Salesforce dev console. You open Setup. You wander between Profile pages and Permission Set pages and the Sharing Settings page and the Apex Class manager. There's no single tool, official or otherwise, that answers those three questions. The closest thing is the Salesforce Health Check report, but it's a static page in Setup that doesn't talk to anything else and doesn't explain what its findings mean.

I started building sf-dev-ai a few weeks before Dreamforce 2025. It became more interesting after Dreamforce, when Salesforce shipped products that almost, but not quite, fill the gap above. This post is about what they shipped, the lane that's still open, and the architecture I landed on, including five things that surprised me while building it.

The post-Dreamforce landscape

Three Salesforce products matter for this story.

Agentforce Vibes is the in-IDE coding agent. It launched October 1, 2025 — a VS Code-compatible extension that also runs in Cursor, Windsurf, and MuleSoft's Anypoint Code Builder. Under the hood it's built on the Cline framework. It generates Apex, LWC, and Flow metadata with org context awareness. Multi-model: xGen (Salesforce's in-house model), GPT-5, and Anthropic models are all reachable. This is the coding-loop tool — write code, run it against your org, iterate.

@salesforce/mcp is the official Salesforce DX MCP Server. Apache-2.0. Stdio transport. Published on npm as @salesforce/mcp. It exposes around 60 tools across 14 toolsets: LWC generation, Aura→LWC migration, Code Analyzer, DevOps Center work items, Apex test execution, scratch-org lifecycle, mobile LWC offline checks. It can run SOQL, deploy metadata, and run tests. If you have sf CLI muscle memory, the flag conventions (--orgs DEFAULT_TARGET_ORG, --toolsets data,metadata,testing) are immediately familiar.

Salesforce Hosted MCP Servers went GA in February 2026 — these are HTTP MCP servers hosted by Salesforce that expose CRM business data (Opportunities, Accounts, Cases) plus flows and invocable actions for end-user agents. Different product, different audience, but worth knowing about.

These three together cover most of what an in-IDE developer needs. Want to scaffold an LWC? Vibes via DX MCP's lwc-experts toolset. Want to deploy a metadata bundle? DX MCP's deploy_metadata. Want to run Apex tests in CI? DX MCP's run_apex_test. Want CRM-style agent automation? Hosted MCP.

None of them answers the architect's three questions.

The governance gap

There's a class of question that doesn't fit any of the above. A few examples:

Which non-admin profiles have Modify All Data?
Which permission sets are unassigned, and how long have they been that way?
What's the org-wide default sharing model on each customisable object?
Where are sessions still configured for twelve-hour timeouts?
How does access drift compare between this user and that user?
What's the Lightning page hierarchy on the Opportunity record, and which components reference deprecated APIs?

These aren't build-loop questions. They aren't deploy-loop questions. They're whole-org analyst questions — and the readers are admins, architects, security teams, FDEs landing on a customer engagement. Salesforce's own surface here is the Health Check report in Setup, which has not seen meaningful API exposure.

The first thing I built was a 13-rule static analyser for permission drift. Then a technical-debt scanner for Apex coverage and inactive automation. Then an RBAC auditor for user → permission set graphs. After Dreamforce I added three more rules for Setup-level security (org-wide sharing defaults, session timeout, API enabled on the Standard User profile), bringing the health scanner to 16 rules. Each rule is a deterministic check with a structured Finding output:

{
  checkId: "PROFILE_MODIFY_ALL_DATA",
  category: "profiles",
  severity: "critical",
  title: "Non-admin profiles with Modify All Data",
  affectedItems: [{ name: "Sales User", id: "00ex0..." }],
  remedy: "Remove Modify All Data from non-admin profiles. Use permission sets..."
}

Running the scanner against a real Trailhead Playground returns nine findings (one critical, four high, two medium, two info) for a Grade F and a score of 44/100. Those numbers come from an actual org, not a mock. Once you have the data structured like this, the question becomes how to surface it to an AI agent so the agent can read findings, narrate them, and chain into remediation.
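A minimal TypeScript sketch of that Finding shape plus a severity tally, inferred from the example object above. The type names, the four-level severity union, and the countBySeverity helper are assumptions for illustration rather than the project's actual code, and the sample entries are placeholders that only mirror the severity mix reported for the Playground scan.

```typescript
// Hypothetical types inferred from the example Finding shown earlier.
type Severity = "critical" | "high" | "medium" | "info";

interface Finding {
  checkId: string;
  category: string;
  severity: Severity;
  title: string;
  affectedItems: { name: string; id: string }[];
  remedy: string;
}

// Tally findings per severity so an agent can lead with the worst ones.
function countBySeverity(findings: Finding[]): Record<Severity, number> {
  const counts: Record<Severity, number> = { critical: 0, high: 0, medium: 0, info: 0 };
  for (const f of findings) counts[f.severity] += 1;
  return counts;
}

// Placeholder findings with the same severity mix as the scan described above.
const mix: Severity[] = [
  "critical", "high", "high", "high", "high", "medium", "medium", "info", "info",
];
const sample: Finding[] = mix.map((severity, i) => ({
  checkId: `EXAMPLE_CHECK_${i}`,
  category: "example",
  severity,
  title: "placeholder",
  affectedItems: [],
  remedy: "placeholder",
}));

console.log(countBySeverity(sample));
```

A summary like this is cheap to compute on the server, which leaves the agent free to spend its tokens on explanation rather than counting.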

Architecture core: the tier model

The most original thing in sf-dev-ai is the tier model. Every tool — native or proxied — declares a risk tier, and the tier maps to standard MCP ToolAnnotations that any compliant MCP client reads natively.

Here's the whole shape, in three lanes. The client sees one server; what happens behind it is the rest of this section.

How a call routes through sf-dev-ai

Workspace (MCP client): Claude Code · Cursor · Windsurf
  ↓ stdio · one server, 35 tools, one tier model
sf-dev-ai MCP server
  tier annotation layer: read · create · update · delete → MCP ToolAnnotations
  native tools: health · rbac · debt · crud
  dx_ proxy lane: --with-dx-mcp
  ↓
jsforce (bearer · refresh via sf CLI)  |  @salesforce/mcp (child process)
  ↓
Salesforce org: metadata · SOQL · Apex · permissions
Native governance

The health hub, RBAC auditor and debt scanner run as native tools. They read the org through jsforce — bearer-only, with token refresh delegated to the sf CLI — and return structured findings the agent narrates.

get_health_findings()
  → engine runs 16 deterministic checks
  → jsforce reads the org (bearer token)
  → returns Finding[] { severity, remedy, ... }

Proxy · --with-dx-mcp

With --with-dx-mcp, sf-dev-ai spawns @salesforce/mcp as a child process, lists its tools, and re-exports a curated subset behind a dx_ prefix — with our annotations, not theirs. The client never sees the child.

server.registerTool(`dx_${name}`, {
  description: `[proxied] ${upstream.description}`,
  inputSchema: upstream.inputSchema,
  annotations: ANNOTATIONS[mapping.tier], // ours
}, (args) => client.callTool({ name, arguments: args }));

Tier safety layer

Every tool — native or proxied — declares a risk tier that maps to standard MCP ToolAnnotations. The client reads destructiveHint / readOnlyHint natively and renders confirmation gates. No skill file, no manifest parsing.

// tier → standard MCP ToolAnnotations
delete  → { destructiveHint: true,  ... }
update  → { idempotentHint: true,   ... }
read    → { readOnlyHint: true,     ... }
// client renders the gate; we never build a UI

The tier definitions are one small file — . Walk it tier by tier:

src/lib/mcp/annotations.ts

export const ANNOTATIONS = {
  read: {
    readOnlyHint: true,
    openWorldHint: true,
  },
  readSensitive: {
    readOnlyHint: true,
    openWorldHint: true,
  },
  create: {
    readOnlyHint: false,
    destructiveHint: false,
    idempotentHint: false,
    openWorldHint: true,
  },
  update: {
    readOnlyHint: false,
    destructiveHint: false,
    idempotentHint: true,
    openWorldHint: true,
  },
  delete: {
    readOnlyHint: false,
    destructiveHint: true,
    idempotentHint: false,
    openWorldHint: true,
  },
} as const;

read and readSensitive both set readOnlyHint: true, so a compliant client never gates them. readSensitive is identical on the wire but flagged sensitive in our logs — full Apex bodies, raw permission XML.

These get attached to every tool registration:

server.tool(
  "delete_record",
  "Delete a Salesforce record by ID.",
  { objectName: z.string(), recordId: z.string() },
  ANNOTATIONS.delete,
  async ({ objectName, recordId }) => { /* ... */ }
);

When Claude Code calls tools/list, every tool ships with these hints in the response. Claude Code (and Cursor, and Windsurf) read destructiveHint: true and render a confirmation gate automatically — no skill file, no manifest parsing, just the protocol doing what the protocol is for.

Here's the thing: @salesforce/mcp ships almost no annotations across its ~60 tools. Apart from rare exceptions such as readOnlyHint: true on run_soql_query, the official Salesforce DX MCP Server has no per-tool readOnlyHint, no destructiveHint, no idempotency signal. There's a community thread on production-deploy safety asking for guard rails. The current workaround is "restrict the Salesforce CLI Connected App or revoke API access at the org level." That works, but it's a coarse instrument.

sf-dev-ai ships a finer one. Tier 0 (read) runs automatically. Tier 1 (readSensitive) also runs automatically, but the output is treated as sensitive in logs (full Apex code bodies, raw permission XML). Tier 2 mutations (create, update) require confirmation. Tier 3 destructive operations (delete) require confirmation with an irreversibility warning. The confirmation is delegated to the client via standard MCP, so sf-dev-ai never has to implement a permission UI.
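To make the delegation concrete, here is a sketch of how a client might turn those hints into a gate. The gateFor function and the Gate labels are hypothetical; real clients decide their own policy, and the hints are advisory. The one spec detail relied on is that destructiveHint is treated as true when it is absent on a non-read-only tool.

```typescript
// Subset of MCP ToolAnnotations used here; every field is an optional hint.
interface ToolAnnotations {
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
  idempotentHint?: boolean;
  openWorldHint?: boolean;
}

type Gate = "auto" | "confirm" | "confirm-irreversible";

// Hypothetical client policy, not taken from any specific MCP client.
function gateFor(a: ToolAnnotations): Gate {
  if (a.readOnlyHint === true) return "auto";
  // An absent destructiveHint counts as destructive, matching the spec default.
  if (a.destructiveHint !== false) return "confirm-irreversible";
  return "confirm";
}

// Three of the tiers from the annotations file, copied as plain data.
const tiers = {
  read: { readOnlyHint: true, openWorldHint: true },
  update: { readOnlyHint: false, destructiveHint: false, idempotentHint: true, openWorldHint: true },
  delete: { readOnlyHint: false, destructiveHint: true, idempotentHint: false, openWorldHint: true },
} as const;

console.log(gateFor(tiers.read), gateFor(tiers.update), gateFor(tiers.delete));
```

Under this policy a tool that ships no annotations at all lands in the strictest gate, which is the conservative reading of an unannotated server.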

Composition via proxy — `--with-dx-mcp`

Once you've built a safety layer, the natural next thought is: can it apply to other people's tools too?

Because @salesforce/mcp is Apache-2.0 and exposes a clean stdio MCP interface, sf-dev-ai can spawn it as a child process and re-export a curated subset of its tools — with sf-dev-ai's tier annotations layered on top. That's what the --with-dx-mcp flag does.

npx sf-dev-ai-mcp --orgs my-dev --with-dx-mcp

The proxy lives in . The whitelist lives in :

export const DX_TOOL_MAP = [
  {
    upstreamName: "run_apex_test",
    tier: "update",
    rationale: "Apex tests: idempotent in end-state sense but execute user-defined Apex with arbitrary side effects.",
  },
  {
    upstreamName: "run_code_analyzer",
    tier: "read",
    rationale: "Static analysis (PMD/ESLint/RetireJS); no org-side mutation.",
  },
  {
    upstreamName: "query_code_analyzer_results",
    tier: "read",
    rationale: "Reads previously-stored analyzer results; pure read.",
  },
  {
    upstreamName: "assign_permission_set",
    tier: "create",
    rationale: "Writes a new PermissionSetAssignment row; not idempotent.",
  },
];

Notice what's not there: deploy_metadata (production-deploy safety is significant; users who want it should register @salesforce/mcp directly), DevOps Center (out of governance scope), the LWC suite (~30 tools — wrong lane).
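The map works as an allow-list: an upstream tool that has no entry is never exported. A minimal sketch of that selection step, with hypothetical type and function names (selectProxiedTools is not from the repo) and a two-entry map standing in for the real one:

```typescript
type Tier = "read" | "create" | "update" | "delete";

interface DxMapping { upstreamName: string; tier: Tier }
interface UpstreamTool { name: string; description: string }
interface ProxiedTool { exportedName: string; upstreamName: string; tier: Tier }

// Keep only upstream tools that appear in the allow-list; drop everything else.
function selectProxiedTools(upstream: UpstreamTool[], allowList: DxMapping[]): ProxiedTool[] {
  const tierByName = new Map<string, Tier>();
  for (const m of allowList) tierByName.set(m.upstreamName, m.tier);

  const selected: ProxiedTool[] = [];
  for (const tool of upstream) {
    const tier = tierByName.get(tool.name);
    if (tier === undefined) continue; // not allow-listed, so never exported
    selected.push({ exportedName: `dx_${tool.name}`, upstreamName: tool.name, tier });
  }
  return selected;
}

const allowList: DxMapping[] = [
  { upstreamName: "run_apex_test", tier: "update" },
  { upstreamName: "run_code_analyzer", tier: "read" },
];
const upstreamTools: UpstreamTool[] = [
  { name: "run_apex_test", description: "..." },
  { name: "deploy_metadata", description: "..." }, // absent from the map
  { name: "run_code_analyzer", description: "..." },
];

console.log(selectProxiedTools(upstreamTools, allowList).map((t) => t.exportedName));
```

Defaulting to exclusion means a new upstream release can add tools without any of them reaching the client until someone assigns a tier and writes a rationale.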

At boot, the proxy spawns the child via StdioClientTransport, calls tools/list, looks each upstream tool up in the map, and re-registers it through our server with a dx_ prefix and our annotations:

server.registerTool(
  `dx_${mapping.upstreamName}`,
  {
    description: `[proxied from @salesforce/mcp] ${upstream.description}`,
    inputSchema: upstream.inputSchema,
    annotations: ANNOTATIONS[mapping.tier],
  },
  async (args) => client.callTool({
    name: mapping.upstreamName,
    arguments: args,
  }),
);

The MCP client (Claude Code, Cursor) sees one server, 35 tools, one tier model. The DX MCP child is invisible to the client. If the child crashes, our error handler translates it into a structured isError: true tool result — the parent never panics. If the parent dies (stdin EOF, SIGTERM, SIGPIPE), the child is reaped via client.close().
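The crash-translation part can be sketched as a small wrapper around the upstream call. This is an assumption about the shape of such a handler, not the repo's code: callUpstreamSafely and the reduced ToolResult type are hypothetical, and the failing call below simulates a child that has exited.

```typescript
// Shape of an MCP tool result, reduced to the fields used here.
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

// Wrap an upstream call so a failing child becomes a tool-level error
// instead of an unhandled rejection in the parent process.
async function callUpstreamSafely(
  toolName: string,
  call: () => Promise<ToolResult>,
): Promise<ToolResult> {
  try {
    return await call();
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return {
      content: [{ type: "text", text: `Upstream tool ${toolName} failed: ${message}` }],
      isError: true,
    };
  }
}

// Usage: simulate a child process that has gone away.
const crashed = callUpstreamSafely("run_apex_test", async () => {
  throw new Error("child process exited");
});
crashed.then((result) => {
  console.log(result.isError, result.content.map((c) => c.text).join(""));
});
```

Returning isError: true keeps the failure inside the tool result, so the agent can read the message and decide what to do next while the server stays up.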

The point isn't the proxy itself. The point is that composition via MCP is real. Apache-2.0 means I can build a safety layer on top of someone else's tools without owning their code. The official server gets weekly releases from a Salesforce CLI team; I don't have to keep up.

Static analysis + agent

The second architectural idea I lean on is this: let static analysis do the analysis; let the LLM do the explanation and the action.

The Health Hub runs 16 deterministic checks. Each one is plain TypeScript:

{
  id: "PROFILE_MODIFY_ALL_DATA",
  category: "profiles",
  severity: "critical",
  evaluate: (data) => {
    const flagged = data.profilePermissions.filter(
      (p) => p.PermissionsModifyAllData && !isAdminProfile(p["Profile.Name"])
    );
    if (flagged.length === 0) return [];
    return [{ /* ...finding... */ }];
  }
}

When the agent asks get_health_findings, the tool runs the engine and returns the structured findings. The LLM then explains them, prioritises, and proposes remediations the user can confirm.
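A sketch of what such an engine loop could look like. The Rule interface, the runRules function, the simplified Snapshot, and the EXAMPLE_INFO rule are all hypothetical stand-ins; only the Modify All Data check is modelled on the rule shown above.

```typescript
type Severity = "critical" | "high" | "medium" | "info";
interface Finding { checkId: string; severity: Severity; title: string }

// A rule is a pure function of already-collected org data.
interface Rule<D> {
  id: string;
  severity: Severity;
  evaluate: (data: D) => Finding[];
}

const ORDER: Severity[] = ["critical", "high", "medium", "info"];

// Run every rule against the same snapshot and return findings, worst first.
function runRules<D>(rules: Rule<D>[], data: D): Finding[] {
  return rules
    .flatMap((rule) => rule.evaluate(data))
    .sort((a, b) => ORDER.indexOf(a.severity) - ORDER.indexOf(b.severity));
}

// Hypothetical, simplified snapshot for illustration only.
interface Snapshot {
  profiles: { name: string; modifyAllData: boolean; isAdmin: boolean }[];
}

const rules: Rule<Snapshot>[] = [
  {
    id: "EXAMPLE_INFO",
    severity: "info",
    evaluate: (d) => [
      { checkId: "EXAMPLE_INFO", severity: "info", title: `${d.profiles.length} profiles scanned` },
    ],
  },
  {
    id: "PROFILE_MODIFY_ALL_DATA",
    severity: "critical",
    evaluate: (d) => {
      const flagged = d.profiles.filter((p) => p.modifyAllData && !p.isAdmin);
      if (flagged.length === 0) return [];
      return [{
        checkId: "PROFILE_MODIFY_ALL_DATA",
        severity: "critical",
        title: "Non-admin profiles with Modify All Data",
      }];
    },
  },
];

const snapshot: Snapshot = {
  profiles: [
    { name: "System Administrator", modifyAllData: true, isAdmin: true },
    { name: "Sales User", modifyAllData: true, isAdmin: false },
  ],
};

console.log(runRules(rules, snapshot).map((f) => f.checkId));
```

Because each rule is a pure function of the snapshot, running the engine twice on the same data yields identical findings, which is the property the LLM-only approach loses.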

The temptation in any AI-adjacent project is to let the LLM do the analysis itself. "Just give it the metadata and ask it to find security issues." It works once or twice and then it doesn't — the prompt grows, the cost grows, the findings drift between runs, and you can't audit why it flagged something. Plain code is faster, cheaper, reproducible, and reviewable. The LLM is best at the human-language layer: explaining a finding in context, prioritising, planning a fix, narrating tradeoffs. Use it where it's strong, not where it's expensive.

One specific pattern that's worth calling out: the Setup-level rules I added in Phase 5 read from APIs that might not be available depending on org edition (SecuritySettings via Metadata API, sharing models via EntityDefinition). A single failing API read shouldn't fail the whole scan. The collector wraps each upstream call:

const [entitySharing, securitySettings] = await Promise.all([
  (async () => {
    try {
      return (await conn.query("SELECT ... FROM EntityDefinition ...")).records;
    } catch {
      return null;
    }
  })(),
  (async () => {
    try {
      return await conn.metadata.read("SecuritySettings", "Security");
    } catch {
      return null;
    }
  })(),
]);

If one returns null, the rule that depends on it returns [] silently and the other 15 rules still run. Optional data, mandatory robustness.
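The same pattern can be factored into a small helper. This is a sketch under assumptions: the optional helper, the OrgData fields, and the sessionTimeoutRule check are hypothetical, and the two reads are stand-ins for the real query and metadata calls.

```typescript
// Resolve to null instead of rejecting, so one unavailable API cannot fail the scan.
async function optional<T>(read: () => Promise<T>): Promise<T | null> {
  try {
    return await read();
  } catch {
    return null;
  }
}

// Hypothetical shape of the collected data; field names are illustrative.
interface OrgData {
  entitySharing: { QualifiedApiName: string }[] | null;
  securitySettings: { sessionTimeout: string } | null;
}

// A rule that depends on optional data returns no findings when the data is absent.
function sessionTimeoutRule(data: OrgData): string[] {
  if (data.securitySettings === null) return [];
  return data.securitySettings.sessionTimeout === "TwelveHours"
    ? ["SESSION_TIMEOUT_TOO_LONG"]
    : [];
}

async function collect(): Promise<OrgData> {
  const [entitySharing, securitySettings] = await Promise.all([
    optional(async () => [{ QualifiedApiName: "Account" }]),
    // Stand-in for an API that this org edition does not expose.
    optional(async (): Promise<{ sessionTimeout: string }> => {
      throw new Error("INVALID_TYPE");
    }),
  ]);
  return { entitySharing, securitySettings };
}

const collected = collect();
collected.then((data) => console.log(data, sessionTimeoutRule(data)));
```

One tradeoff worth stating: swallowing the error hides why the data is missing, so a real collector would probably want to log the failure before returning null.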

Five things that surprised me

The architecture story is the one I planned. These are the things that I learned while shipping it.

1. jsforce's refresh-token grant doesn't work against PlatformCLI

This was the demo-breaker. I wired jsforce with the access token and refresh token from ~/.sfdx/<user>.json, fully expecting jsforce to auto-refresh when the access token expired. It didn't. Every call returned Unable to refresh session due to: expired access/refresh token — even when sf CLI showed the org as Connected thirty seconds before.

PlatformCLI is Salesforce's hardcoded Connected App for the sf CLI itself. It requires CLI-bundled credentials and PKCE state that jsforce doesn't have. The CLI handles this internally via @salesforce/core. The fix was to delegate refresh to the CLI, in :

// src/lib/mcp/sf-cli-auth.ts
import { execFileSync } from "node:child_process";

// Ask the sf CLI for a fresh access token; returns null if the CLI call fails
// or the JSON envelope reports a non-zero status.
export function tryGetFreshAccessToken(orgArg: string): string | null {
  try {
    const stdout = execFileSync(
      "sf",
      ["org", "auth", "show-access-token", "--target-org", orgArg, "--json"],
      { stdio: ["ignore", "pipe", "ignore"], timeout: 30_000 },
    ).toString();
    const parsed = JSON.parse(stdout);
    return parsed.status === 0 ? parsed.result?.accessToken ?? null : null;
  } catch {
    return null;
  }
}

Then buildConnection uses bearer-only mode — no oauth2 config, no refreshToken passed to jsforce. The CLI does the refresh; jsforce just uses the bearer. The cost is ~1.5s of sf subprocess at boot, paid once. Worth it.
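One way to keep that boot-time step testable without the sf CLI installed is to split the JSON handling into a pure function. A sketch, assuming the envelope has exactly the fields the snippet above reads (status and result.accessToken); extractAccessToken and the sample payloads are hypothetical.

```typescript
// Assumed shape of the CLI's --json envelope, based only on the fields read above.
interface CliEnvelope {
  status?: number;
  result?: { accessToken?: string };
}

// Parse CLI stdout and return the token, or null on any malformed or failed response.
function extractAccessToken(stdout: string): string | null {
  try {
    const parsed = JSON.parse(stdout) as CliEnvelope;
    if (parsed.status !== 0) return null;
    return parsed.result?.accessToken ?? null;
  } catch {
    return null;
  }
}

// Placeholder payloads; the token value is a dummy string.
console.log(extractAccessToken('{"status":0,"result":{"accessToken":"00D-example-token"}}'));
console.log(extractAccessToken('{"status":1,"message":"expired"}'));
console.log(extractAccessToken("not json"));
```

Every failure path collapses to null, so the caller has a single condition to check before deciding whether the server can boot.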

2. MCP annotations don't "ride" upstream when you proxy

I assumed when the proxy forwards a tools/list response from @salesforce/mcp to its own client, upstream annotations would just pass through. They don't — because @salesforce/mcp doesn't ship most of them, and even when it does (run_soql_query has readOnlyHint: true), the proxy needs control over annotations to provide the unified safety model.

The fix was small but conceptually important: the proxy ignores upstream annotations entirely and writes its own from the tier map. That's what makes the composition work as a safety layer on top of @salesforce/mcp, not just alongside it.

3. Tool name collisions are real and require prefixes

Both servers ship run_soql_query, list_metadata, describe_object, list_apex_classes, list_profiles. Without the dx_ prefix on proxied tools, the MCP SDK throws on duplicate registration. The prefix isn't just a hygiene choice — it's load-bearing.

It also turns out to be a UX win. When Claude Code looks at its tool list and sees get_health_findings next to dx_run_apex_test, the user can tell at a glance which lane each tool belongs to. The prefix is documentation as much as it's a namespace.
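A small sketch of the collision check and the prefixing step. The function names and the abbreviated tool lists are hypothetical; the real lists are much longer.

```typescript
// Return names that exist on both sides and would collide without a prefix.
function findCollisions(nativeNames: string[], upstreamNames: string[]): string[] {
  const native = new Set(nativeNames);
  return upstreamNames.filter((name) => native.has(name));
}

// Prefix upstream names, then fail fast if anything still collides.
function exportNames(nativeNames: string[], upstreamNames: string[], prefix = "dx_"): string[] {
  const prefixed = upstreamNames.map((name) => `${prefix}${name}`);
  const remaining = findCollisions(nativeNames, prefixed);
  if (remaining.length > 0) {
    throw new Error(`Duplicate tool names after prefixing: ${remaining.join(", ")}`);
  }
  return [...nativeNames, ...prefixed];
}

// Abbreviated, illustrative lists.
const nativeTools = ["run_soql_query", "list_profiles", "get_health_findings"];
const dxTools = ["run_soql_query", "run_apex_test"];

console.log(findCollisions(nativeTools, dxTools));
console.log(exportNames(nativeTools, dxTools));
```

Checking for collisions again after prefixing is cheap insurance against a native tool that happens to start with dx_.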

4. ESM module-load order matters more than you think

The MCP server imports a database client (Drizzle + PostgreSQL or SQLite). The DB client throws at module load if DATABASE_URL isn't in process.env. The CLI bin would happily run for --help, but as soon as anything imported the governance tools (which transitively import the DB client), boot failed before printing usage.

The fix is a tiny side-effect-only module — — that loads .env.local via Node 22's process.loadEnvFile:

// bin/_load-env.ts — must be the first import
import { existsSync } from "node:fs";
for (const path of [".env.local", ".env"]) {
  if (existsSync(path)) {
    try { process.loadEnvFile(path); break; } catch {}
  }
}

The entrypoint imports it before anything else — :

// bin/sf-dev-ai-mcp.ts
import "./_load-env";  // MUST be first
import { parseArgs } from "node:util";
// ...everything else

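ESM evaluates imports depth-first in source order, so _load-env finishes running before any later import can read process.env. It has to be a separate file because import declarations are hoisted: a loadEnvFile call written inline in the entrypoint would run only after every imported module had already been evaluated. Stupid simple, but figuring out why you need a separate file is the surprise.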

5. `StdioServerTransport.onclose` doesn't fire on stdin EOF

This one cost me an orphaned child process during the first smoke test. I assumed the MCP server SDK would detect when its stdin closed (because Claude Code disconnected, say) and fire transport.onclose. It doesn't — the SDK's onclose only fires when you call close() explicitly. There's no auto-detection of EOF.

So when the parent process's stdin closed, the server kept running. Which meant the DX MCP child kept running too. Orphans. The fix:

// Treat stdin EOF like a termination signal so the child is always reaped.
process.stdin.on("end", () => void shutdown("stdin EOF"));
for (const sig of ["SIGTERM", "SIGINT", "SIGPIPE"] as const) {
  process.on(sig, () => void shutdown(`signal ${sig}`));
}

Every long-running MCP stdio server should have this. It's not in the SDK examples — and looking at a few community servers, lots of them have the same orphan-child problem just waiting to bite them.
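The handlers above call a shutdown function that isn't shown. A sketch of one way to write it so that overlapping triggers (EOF followed by SIGTERM, say) run cleanup only once. makeShutdown and its injected closeChild and exit parameters are hypothetical; a real server would pass something like client.close() and process.exit.

```typescript
// Build a shutdown function that runs cleanup at most once, however many triggers fire.
function makeShutdown(
  closeChild: () => Promise<void>,
  exit: (code: number) => void,
): (reason: string) => Promise<void> {
  let started = false;
  return async (reason: string) => {
    if (started) return; // a second signal or EOF during cleanup is ignored
    started = true;
    console.error(`shutting down: ${reason}`);
    try {
      await closeChild();
    } finally {
      exit(0); // exit even if closing the child failed
    }
  };
}

// Usage with stand-ins that only record what happened.
let closeCalls = 0;
const exitCodes: number[] = [];
const shutdown = makeShutdown(
  async () => { closeCalls += 1; },
  (code) => { exitCodes.push(code); },
);

void shutdown("stdin EOF");
void shutdown("signal SIGTERM");
```

Injecting the close and exit functions keeps the guard logic testable without actually terminating the process.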

Outlook

A few things I think will be true in the Salesforce-and-AI space over the next year, that this project is shaped by.

Composability is the new feature surface. The default reflex when a platform vendor ships a tool is to either use it or duplicate it. MCP makes a third option real: compose it. Wrap it. Layer on top of it. Apache-2.0 plus a clean stdio interface means the build/deploy/migrate problem and the governance problem don't have to be solved by the same team, the same vendor, or the same codebase. The boundary between "Salesforce's own tools" and "third-party tools" is going to blur a lot.

Tier-based agentic safety becomes table stakes. Once a few MCP servers ship explicit annotations and confirmation gates, the ones that don't will look reckless. The community thread asking @salesforce/mcp for production-deploy guards isn't going away. I'd expect either Salesforce or a downstream layer (which is what sf-dev-ai is, in this lane) to standardise the safety story within the next two releases.

The whole-org analyst lane is still wide open. Vibes is the developer's IDE agent. Hosted MCP is the business user's CRM agent. The lane between — the architect, the admin, the platform engineer, the audit lead — is the underserved one. Health Check sits in Setup. Permission set analysis lives in a spreadsheet someone exports quarterly. There's a lot of room here for opinionated, annotated, MCP-native tooling.

What's next on sf-dev-ai itself. Audit log persistence (the manifest currently declares audit_logging: false as a known gap; closing it unblocks SOC and FedRAMP conversations). Durable workflows for long-running deploys. Cross-org governance diff (two orgs in parallel, structural delta). More health and debt rules — the engine architecture makes pushing new ones cheap, and I'd like to see this comfortably north of 30 rules.

Every customer running Agentforce in production needs governance tooling. That's the technical claim — and I think it's now defensible.

Code is here

The repo is . Apache-2.0. The 60-second try-it path:

git clone https://github.com/dominic-righthere/sf-dev-ai.git
cd sf-dev-ai
cp .env.example .env.local
# edit .env.local: ANTHROPIC_API_KEY + SESSION_SECRET (openssl rand -base64 32)
npm install
npm run db:push    # creates ./sfdev.db (SQLite, no external service)
npm run dev        # http://localhost:3000

For the MCP stdio path in Claude Code, see the README's "Use it from Claude Code / Cursor" section. If you want the unified-proxy mode with @salesforce/mcp, pass --with-dx-mcp. If you find a sixth surprise in the code, I'd be curious to hear about it.