Grounding a Salesforce agent in Data 360 — LlamaIndex vs LangChain

The questions that actually matter about a Salesforce org are not the ones a language model can answer from its weights:

These have answers, but the answers live in the org's configuration and its data — not in any model's training set. So the interesting engineering isn't the prompt. It's the grounding: getting the org's real state in front of the model as context, with citations, so the answer is checkable.

I built that grounding layer over Salesforce metadata and Data 360 (Data Cloud), and then I built it twice — once with LlamaIndex, once with LangChain / LangGraph — over the same corpus, to see where the two frameworks actually differ. The code is sf-agent-grounding; it runs org-free on a committed sample, or against a live org.

The corpus is the work

Retrieval is only as good as what you feed it. A vector index over "the whole org" is noise; the signal is a small set of short, self-describing documents. I model three kinds: objects, permission sets, and governance findings.

One finding document, verbatim from the sample org:

finding:H-001 :: Governance finding [HIGH] in permissions: Permission set
'Loan_Reviewer' grants Modify All Data to 14 non-admin users. Remediation:
Replace Modify All Data with object-level Edit on Loan_Application__c and Case.

Crucially, the document type is framework-neutral — just id / text / metadata. The same Doc feeds both stacks, which is the only way the comparison is fair.
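As a minimal sketch of that neutral shape, assuming a plain dataclass: the `Doc` fields follow the id / text / metadata description above, while `finding_to_doc` and the keys of the input dict are hypothetical names chosen for illustration, not the repo's actual code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Doc:
    """Framework-neutral document: just id / text / metadata."""
    id: str
    text: str
    metadata: dict = field(default_factory=dict)


def finding_to_doc(finding: dict) -> Doc:
    """Render one governance finding as a short, self-describing document.

    The input keys (id, severity, area, summary, remediation) are assumed.
    """
    doc_id = f"finding:{finding['id']}"
    text = (
        f"{doc_id} :: Governance finding [{finding['severity']}] "
        f"in {finding['area']}: {finding['summary']} "
        f"Remediation: {finding['remediation']}"
    )
    metadata = {
        "kind": "finding",
        "severity": finding["severity"],
        "area": finding["area"],
    }
    return Doc(id=doc_id, text=text, metadata=metadata)


doc = finding_to_doc({
    "id": "H-001",
    "severity": "HIGH",
    "area": "permissions",
    "summary": "Permission set 'Loan_Reviewer' grants Modify All Data to 14 non-admin users.",
    "remediation": "Replace Modify All Data with object-level Edit on Loan_Application__c and Case.",
})
print(doc.text)
```

Putting the id and the kind into the text itself, not only into metadata, is what makes the document self-describing: the embedding sees them too.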

The pipeline

Same shape in both frameworks:

Corpus: objects · permission sets · governance findings
Embed: fastembed · local · no API key
Vector index: VectorStoreIndex / InMemoryVectorStore
Retrieve: top-k nearest documents
Ground: Claude answers from the evidence
Result: grounded answer + citations

Corpus

Retrieval is only as good as the corpus. We turn objects, permission sets, and governance findings into short, self-describing documents — the same Doc shape for both frameworks.

Embed

Embeddings run locally via fastembed (ONNX) — no embeddings API key, deterministic, and identical across both stacks. That's what keeps the comparison honest.

Vector index

Each framework builds its own index over the same vectors: LlamaIndex's VectorStoreIndex, LangChain's InMemoryVectorStore. Same vectors in, searchable index out.

Retrieve

A question pulls the top-k nearest documents. On every eval question, both stacks retrieved the same grounding document — retrieval parity.
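Underneath either framework, top-k retrieval amounts to ranking documents by vector similarity. The sketch below shows that step with cosine similarity in plain Python; the three-dimensional vectors are made up for illustration, and `cosine` and `top_k` are hypothetical helpers, not calls from LlamaIndex or LangChain.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=3):
    """Return the k (doc_id, score) pairs most similar to query_vec."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors, invented for this example; real embeddings have hundreds of dimensions.
doc_vecs = {
    "finding:H-001": [0.9, 0.1, 0.0],
    "finding:H-002": [0.1, 0.9, 0.0],
    "object:Loan_Application__c": [0.0, 0.2, 0.9],
}
hits = top_k([1.0, 0.2, 0.0], doc_vecs, k=2)
print([doc_id for doc_id, _ in hits])
```

With identical vectors on both sides, this ranking is fully determined by the data, which is consistent with both stacks returning the same document.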

Ground

Claude answers from the retrieved evidence, with citations. LangChain wraps this in a LangGraph agent that can also decide to query Data 360 — that's where the two diverge.
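One way to picture the grounding step is as prompt assembly: each retrieved document is labelled with its id so the model can cite it. This is a sketch under assumptions; `build_grounded_prompt` and the instruction wording are hypothetical, and the actual model call is left out.

```python
def build_grounded_prompt(question, docs):
    """Assemble a prompt from retrieved evidence.

    docs: list of (doc_id, text) pairs returned by the retriever.
    """
    evidence = "\n".join(f"[{doc_id}] {text}" for doc_id, text in docs)
    return (
        "Answer using only the evidence below. "
        "Cite the bracketed id of every document you rely on. "
        "If the evidence does not answer the question, say so.\n\n"
        f"Evidence:\n{evidence}\n\n"
        f"Question: {question}"
    )


retrieved = [
    (
        "finding:H-001",
        "Permission set 'Loan_Reviewer' grants Modify All Data to 14 non-admin users.",
    ),
]
prompt = build_grounded_prompt(
    "Which permission sets grant Modify All Data?", retrieved
)
print(prompt)
```

Because the ids in the prompt match the ids in the corpus, a reader can trace any cited claim back to the exact document it came from.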

Two frameworks, one corpus

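Here's the entire difference between the two, side by side.

Same embeddings, on purpose
build_index (LlamaIndex) vs build_store (LangChain)

# LlamaIndex (import paths assume a recent llama-index with the fastembed integration)
from llama_index.core import Document, Settings, VectorStoreIndex
from llama_index.embeddings.fastembed import FastEmbedEmbedding

def build_index(docs):
    # Route every embedding call through the local fastembed model.
    Settings.embed_model = FastEmbedEmbedding(model_name=EMBED_MODEL)
    li_docs = [Document(text=d.text, metadata={**d.metadata, "id": d.id}, id_=d.id)
               for d in docs]
    return VectorStoreIndex.from_documents(li_docs)

# LangChain (kept in a separate module so the two Document classes do not collide)
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
from langchain_core.documents import Document
from langchain_core.vectorstores import InMemoryVectorStore

def build_store(docs):
    # Same model name as above, so both stacks embed identically.
    emb = FastEmbedEmbeddings(model_name=EMBED_MODEL)
    store = InMemoryVectorStore(embedding=emb)
    store.add_documents(
        [Document(page_content=d.text, metadata={**d.metadata, "id": d.id})
         for d in docs]
    )
    return store

Both call fastembed with the same local model, so the vectors are identical and any difference downstream comes from the framework, not the embedding. No embeddings API key is needed, the output is deterministic, and it runs offline.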

What the comparison actually showed

I ran the same five questions through both stacks and compared the top retrieved document. The result was the boring-but-important kind:

Question                                  | LlamaIndex top-1           | LangChain top-1            | Agree
Permission sets granting Modify All Data  | finding:H-001              | finding:H-001              | yes
Sharing risks on Loan Application         | finding:H-002              | finding:H-002              | yes
Apex below the coverage gate              | finding:D-001              | finding:D-001              | yes
Unassigned permission sets                | finding:R-001              | finding:R-001              | yes
Fields on Loan Application                | object:Loan_Application__c | object:Loan_Application__c | yes

Retrieval was identical on every question. That's the honest headline: when the embedding model and corpus are the same, retrieval quality tracks the corpus, not the framework. Citation quality is a data-modeling problem, not a framework choice.
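A parity check of this kind can be sketched as a small harness that takes two retriever callables and compares their top-1 ids. Everything here is illustrative: `top1_agreement` is a hypothetical helper, and the two stub retrievers are stand-ins that simply look up the ids reported for two of the questions.

```python
def top1_agreement(questions, retrieve_a, retrieve_b):
    """Compare the top-1 document id from two retrievers, question by question.

    Returns (rows, rate) where each row is (question, id_a, id_b, agree).
    """
    rows = []
    for question in questions:
        id_a, id_b = retrieve_a(question), retrieve_b(question)
        rows.append((question, id_a, id_b, id_a == id_b))
    rate = sum(1 for row in rows if row[3]) / len(rows) if rows else 0.0
    return rows, rate


# Stub retrievers: in a real run these would wrap each framework's retriever.
EXPECTED = {
    "Permission sets granting Modify All Data": "finding:H-001",
    "Fields on Loan Application": "object:Loan_Application__c",
}
llamaindex_top1 = EXPECTED.get
langchain_top1 = EXPECTED.get

rows, rate = top1_agreement(list(EXPECTED), llamaindex_top1, langchain_top1)
for question, id_a, id_b, agree in rows:
    print(f"{question}: {id_a} vs {id_b} -> {'agree' if agree else 'DIFFER'}")
print(f"agreement rate: {rate:.0%}")
```

Keeping the harness ignorant of either framework is the point: it only sees ids, so it can be rerun unchanged if the embedding model or corpus changes.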

Where they actually diverge

So why pick one? Not for retrieval — for what surrounds it.

The lesson I'd give a customer: the framework decision is a decision about the agent's control flow, not its retrieval. If you're grounding answers, either is fine — optimize the corpus. If the agent has to pick between tools and data sources, reach for the graph.
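To make "control flow, not retrieval" concrete, here is a framework-free sketch of the routing decision an agent graph encodes. It is not LangGraph code: the tool functions are stubs, and the keyword rule in `route` is a placeholder for what would normally be the model's own tool choice.

```python
def retrieve_metadata(question):
    """Stub for the vector-retrieval path over org metadata and findings."""
    return f"retrieved governance docs for: {question}"


def query_data360(question):
    """Stub for the Data 360 SQL path over unified customer data."""
    return f"ran a Data 360 query for: {question}"


TOOLS = {"metadata": retrieve_metadata, "data360": query_data360}


def route(question):
    """Pick a tool name. A keyword rule stands in for the model's decision."""
    customer_terms = ("account", "customer", "ltv")
    lowered = question.lower()
    return "data360" if any(term in lowered for term in customer_terms) else "metadata"


def run(question):
    """One routing step: choose a tool, then call it."""
    tool = route(question)
    return tool, TOOLS[tool](question)


print(run("Which permission sets grant Modify All Data?"))
print(run("Top accounts by LTV"))
```

The retrieval code is untouched by any of this; what the graph adds is the branch.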

The Data 360 angle

The reason this isn't just "RAG over metadata" is the second source. Data 360 (Data Cloud) is queryable with ANSI SQL over REST:

POST /services/data/v64.0/ssot/queryv2
{ "sql": "SELECT account_name__c, ltv__c FROM UnifiedAccount__dlm ORDER BY ltv__c DESC LIMIT 5" }

That unifies CRM, web, and warehouse data behind one model — and with Zero-Copy, the same model federates to Snowflake or BigQuery, so the agent reaches warehouse data without anyone copying it. An agent grounded in both the org's governance state and its unified customer data can answer questions neither source can alone.
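For illustration, the request above can be built with the standard library alone. This is a sketch under assumptions: the instance URL and token are placeholders, `build_query_request` is a hypothetical helper, and bearer-token authorization is assumed. The request is constructed but not sent.

```python
import json
import urllib.request

# Path as quoted in the text above.
QUERY_PATH = "/services/data/v64.0/ssot/queryv2"


def build_query_request(instance_url, access_token, sql):
    """Build (but do not send) the POST request for a Data 360 SQL query."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url=instance_url.rstrip("/") + QUERY_PATH,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


req = build_query_request(
    "https://example.my.salesforce.com",  # placeholder instance URL
    "<access-token>",                      # placeholder token
    "SELECT account_name__c, ltv__c FROM UnifiedAccount__dlm "
    "ORDER BY ltv__c DESC LIMIT 5",
)
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen` and parsing the JSON response, which requires a live org and a valid token.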

Run it yourself

The whole thing runs with no org and no embeddings key — the sample corpus ships in the repo:

uv sync --extra rag --extra notebooks
uv run jupyter lab    # open notebooks/04_compare_and_eval.ipynb

Point SF_ORG_ALIAS at a live org (a current free Developer Edition includes Agentforce and Data Cloud) and the same code pulls your real objects, permission sets, and findings.

Code, notebooks, and the sample corpus: sf-agent-grounding. Apache-2.0. It's the retrieval-shaped companion to sf-dev-ai, which grounds an agent through tools instead.