Grounding a Salesforce agent in Data 360 — LlamaIndex vs LangChain
The questions that actually matter about a Salesforce org are not the ones a language model can answer from its weights:
- Which permission sets grant Modify All Data, and to how many users?
- What sharing risks exist on the Loan Application object?
- Which Apex classes are below the test-coverage deploy gate?
These have answers, but the answers live in the org's configuration and its data — not in any model's training set. So the interesting engineering isn't the prompt. It's the grounding: getting the org's real state in front of the model as context, with citations, so the answer is checkable.
I built that grounding layer over Salesforce metadata and Data 360 (Data Cloud) twice, once with LlamaIndex and once with LangChain / LangGraph, over the same corpus, to see where the two frameworks actually differ. The code is sf-agent-grounding; it runs org-free on a committed sample, or against a live org.
The corpus is the work
Retrieval is only as good as what you feed it. A vector index over "the whole org" is noise; the signal is a small set of short, self-describing documents. I model three kinds:
- Objects — name, custom flag, fields.
- Permission sets — and the system permissions that matter, like Modify All Data.
- Governance findings — health, technical-debt, and RBAC results. This is the highest-value source: it's already the answer to "what's wrong," phrased for a human.
One finding document, verbatim from the sample org:
finding:H-001 :: Governance finding [HIGH] in permissions: Permission set
'Loan_Reviewer' grants Modify All Data to 14 non-admin users. Remediation:
Replace Modify All Data with object-level Edit on Loan_Application__c and Case.
Crucially, the document type is framework-neutral — just id / text / metadata. The same Doc feeds both stacks, which is the only way the comparison is fair.
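As a rough illustration of that id / text / metadata shape, here is a minimal sketch. The class layout, the finding_to_doc helper, and the metadata keys are assumptions made for the example, not the repo's actual code; only the rendered text mirrors the sample finding above.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Doc:
    """Framework-neutral document: an id, a text body, and metadata."""
    id: str
    text: str
    metadata: dict[str, Any] = field(default_factory=dict)


def finding_to_doc(finding_id: str, severity: str, area: str,
                   summary: str, remediation: str) -> Doc:
    """Hypothetical helper: render one governance finding as a self-describing Doc."""
    doc_id = f"finding:{finding_id}"
    text = (
        f"{doc_id} :: Governance finding [{severity}] in {area}: "
        f"{summary} Remediation: {remediation}"
    )
    # Metadata keys are illustrative; they let a retriever filter or cite by kind.
    return Doc(id=doc_id, text=text,
               metadata={"kind": "finding", "severity": severity, "area": area})


doc = finding_to_doc(
    "H-001", "HIGH", "permissions",
    "Permission set 'Loan_Reviewer' grants Modify All Data to 14 non-admin users.",
    "Replace Modify All Data with object-level Edit on Loan_Application__c and Case.",
)
print(doc.text)
```

Because the Doc carries nothing framework-specific, each stack can wrap it in its own document type at index time without changing the text that gets embedded.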
The pipeline
Both frameworks run the same five stages:
- Corpus: objects, permission sets, and governance findings become short, self-describing documents, with the same Doc shape for both frameworks.
- Embed: embeddings run locally via fastembed (ONNX), so there is no embeddings API key, and the vectors are deterministic and identical across both stacks. That's what keeps the comparison honest.
- Index: each framework builds its own index over the same vectors: LlamaIndex's VectorStoreIndex, LangChain's InMemoryVectorStore. Same vectors in, searchable index out.
- Retrieve: a question pulls the top-k nearest documents. On every eval question, both stacks retrieved the same grounding document: retrieval parity.
- Answer: Claude answers from the retrieved evidence, with citations. LangChain wraps this in a LangGraph agent that can also decide to query Data 360, and that's where the two diverge.
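The retrieve stage can be sketched without either framework. The example below is a toy, under stated assumptions: toy_embed is a bag-of-words stand-in for the fastembed model, the object document's text is a placeholder, and top_k is a hypothetical helper. It only shows the idea of ranking documents by cosine similarity to the question.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in embedding: token counts. The real pipeline uses fastembed vectors."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def top_k(question: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    """Return the ids of the k documents most similar to the question."""
    q = toy_embed(question)
    ranked = sorted(corpus, key=lambda doc_id: cosine(q, toy_embed(corpus[doc_id])),
                    reverse=True)
    return ranked[:k]


corpus = {
    "finding:H-001": "Permission set 'Loan_Reviewer' grants Modify All Data "
                     "to 14 non-admin users.",
    # Placeholder text; the real object document lists the object's fields.
    "object:Loan_Application__c": "Custom object Loan_Application__c with its fields listed.",
}

print(top_k("Which permission sets grant Modify All Data?", corpus, k=1))
```

Swapping toy_embed for a real embedding model changes the scores, not the structure: embed the question, score every document, keep the top k.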
Two frameworks, one corpus
The entire difference between the two fits side by side: each framework builds its own index and query path over the shared corpus and embeddings.
What the comparison actually showed
I ran the same five questions through both stacks and compared the top retrieved document. The result was the boring-but-important kind:
| Question | LlamaIndex top-1 | LangChain top-1 | Agree |
|---|---|---|---|
| Permission sets granting Modify All Data | finding:H-001 | finding:H-001 | ✅ |
| Sharing risks on Loan Application | finding:H-002 | finding:H-002 | ✅ |
| Apex below the coverage gate | finding:D-001 | finding:D-001 | ✅ |
| Unassigned permission sets | finding:R-001 | finding:R-001 | ✅ |
| Fields on Loan Application | object:Loan_Application__c | object:Loan_Application__c | ✅ |
Retrieval was identical on every question. That's the honest headline: when the embedding model and corpus are the same, retrieval quality tracks the corpus, not the framework. Citation quality is a data-modeling problem, not a framework choice.
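A parity check like this one needs very little code. The sketch below assumes each stack is exposed as a function from a question to a ranked list of document ids; that interface, the top1_agreement helper, and the stub retrievers (which simply replay three rows of the table rather than calling either framework) are all assumptions made for illustration.

```python
from typing import Callable

# Assumed interface: question -> ranked list of document ids.
Retriever = Callable[[str], list[str]]


def top1_agreement(questions: list[str], a: Retriever,
                   b: Retriever) -> list[tuple[str, str, str, bool]]:
    """Compare the top-1 document id from two retrievers, question by question."""
    rows = []
    for q in questions:
        top_a, top_b = a(q)[0], b(q)[0]
        rows.append((q, top_a, top_b, top_a == top_b))
    return rows


# Stub retrievers replaying the table above; real ones would wrap each stack.
EXPECTED = {
    "Permission sets granting Modify All Data": ["finding:H-001"],
    "Sharing risks on Loan Application": ["finding:H-002"],
    "Fields on Loan Application": ["object:Loan_Application__c"],
}


def llamaindex_stub(question: str) -> list[str]:
    return EXPECTED[question]


def langchain_stub(question: str) -> list[str]:
    return EXPECTED[question]


rows = top1_agreement(list(EXPECTED), llamaindex_stub, langchain_stub)
for question, top_a, top_b, agree in rows:
    print(f"{question}: {top_a} vs {top_b} -> {'agree' if agree else 'differ'}")
```

Keeping the comparison at the level of document ids is what makes it framework-neutral: neither stack's node or document wrapper leaks into the eval.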
Where they actually diverge
So why pick one? Not for retrieval — for what surrounds it.
- LlamaIndex is terser to stand up for pure RAG. index.as_query_engine().query(q) is the whole grounded-answer path; as_retriever() is the whole retrieval path. If the job is "answer questions from a corpus," it's less ceremony.
- LangChain + LangGraph earns its weight the moment the model needs to choose. I gave the agent two tools, org retrieval and a Data 360 query, and let it decide. "Show our highest-LTV accounts, then flag governance risks" makes it query Data 360 for the accounts and retrieve findings for the risks, in one turn. That orchestration is the thing LangGraph is for.
The lesson I'd give a customer: the framework decision is a decision about the agent's control flow, not its retrieval. If you're grounding answers, either is fine — optimize the corpus. If the agent has to pick between tools and data sources, reach for the graph.
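To make "control flow, not retrieval" concrete, here is a framework-free sketch of a two-tool turn. Everything in it is hypothetical: the tool names, the placeholder return strings, and especially plan, which uses keyword rules as a stand-in for the decision the model makes inside the LangGraph agent.

```python
from typing import Callable


def retrieve_findings(query: str) -> str:
    """Placeholder tool: would run vector retrieval over the org corpus."""
    return f"[retrieved org documents for: {query}]"


def query_data360(sql: str) -> str:
    """Placeholder tool: would send the SQL to Data 360 and return rows."""
    return f"[Data 360 rows for: {sql}]"


TOOLS: dict[str, Callable[[str], str]] = {
    "retrieve_findings": retrieve_findings,
    "query_data360": query_data360,
}

LTV_SQL = ("SELECT account_name__c, ltv__c FROM UnifiedAccount__dlm "
           "ORDER BY ltv__c DESC LIMIT 5")


def plan(question: str) -> list[tuple[str, str]]:
    """Stand-in for the model's tool choice: keyword rules, not an LLM."""
    q = question.lower()
    calls: list[tuple[str, str]] = []
    if "ltv" in q or "accounts" in q:
        calls.append(("query_data360", LTV_SQL))
    if "risk" in q or "governance" in q:
        calls.append(("retrieve_findings", question))
    return calls


def run_turn(question: str) -> list[tuple[str, str]]:
    """Execute every planned tool call and collect (tool name, result) pairs."""
    return [(name, TOOLS[name](arg)) for name, arg in plan(question)]


for name, result in run_turn("Show our highest-LTV accounts, then flag governance risks"):
    print(name, "->", result)
```

A pure-RAG question would plan a single retrieve_findings call, which is why either framework handles it equally well; the graph matters only once plan can return more than one kind of call.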
The Data 360 angle
The reason this isn't just "RAG over metadata" is the second source. Data 360 (Data Cloud) is queryable with ANSI SQL over REST:
POST /services/data/v64.0/ssot/queryv2
{ "sql": "SELECT account_name__c, ltv__c FROM UnifiedAccount__dlm ORDER BY ltv__c DESC LIMIT 5" }
That unifies CRM, web, and warehouse data behind one model — and with Zero-Copy, the same model federates to Snowflake or BigQuery, so the agent reaches warehouse data without anyone copying it. An agent grounded in both the org's governance state and its unified customer data can answer questions neither source can alone.
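For readers who want to see the request shape in code, here is a minimal standard-library sketch that builds (but does not send) the query request shown above. The instance URL and token are placeholders, build_query_request is a hypothetical helper, and the Bearer-token header assumes a standard Salesforce OAuth access token; response handling is left out.

```python
import json
import urllib.request


def build_query_request(instance_url: str, access_token: str, sql: str,
                        api_version: str = "v64.0") -> urllib.request.Request:
    """Build the POST request for the Data 360 SQL query endpoint."""
    url = f"{instance_url.rstrip('/')}/services/data/{api_version}/ssot/queryv2"
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


req = build_query_request(
    "https://example.my.salesforce.com",  # placeholder instance URL
    "PLACEHOLDER_TOKEN",                  # placeholder access token
    "SELECT account_name__c, ltv__c FROM UnifiedAccount__dlm "
    "ORDER BY ltv__c DESC LIMIT 5",
)
print(req.get_method(), req.full_url)
# Sending it would be urllib.request.urlopen(req) against a real org.
```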
Run it yourself
The whole thing runs with no org and no embeddings key — the sample corpus ships in the repo:
uv sync --extra rag --extra notebooks
uv run jupyter lab # open notebooks/04_compare_and_eval.ipynb
Point SF_ORG_ALIAS at a live org (a current free Developer Edition includes Agentforce and Data Cloud) and the same code pulls your real objects, permission sets, and findings.
Code, notebooks, and the sample corpus: sf-agent-grounding. Apache-2.0. It's the retrieval-shaped companion to sf-dev-ai, which grounds an agent through tools instead.