Grounding a Salesforce agent in Data 360 — LlamaIndex vs LangChain

The questions that actually matter about a Salesforce org are not the ones a language model can answer from its weights:

Which permission sets grant Modify All Data, and to how many users?
What sharing risks exist on the Loan Application object?
Which Apex classes are below the test-coverage deploy gate?

These have answers, but the answers live in the org's configuration and its data — not in any model's training set. So the interesting engineering isn't the prompt. It's the grounding: getting the org's real state in front of the model as context, with citations, so the answer is checkable.

I built that grounding layer over Salesforce metadata and Data 360 (Data Cloud) twice, once with LlamaIndex and once with LangChain / LangGraph, over the same corpus, to see where the two frameworks actually differ. The code is sf-agent-grounding; it runs org-free on a committed sample, or against a live org.

The corpus is the work

Retrieval is only as good as what you feed it. A vector index over "the whole org" is noise; the signal is a small set of short, self-describing documents. I model three kinds:

Objects — name, custom flag, fields.
Permission sets — and the system permissions that matter, like Modify All Data.
Governance findings — health, technical-debt, and RBAC results. This is the highest-value source: it's already the answer to "what's wrong," phrased for a human.

One finding document, verbatim from the sample org:

finding:H-001 :: Governance finding [HIGH] in permissions: Permission set
'Loan_Reviewer' grants Modify All Data to 14 non-admin users. Remediation:
Replace Modify All Data with object-level Edit on Loan_Application__c and Case.

Crucially, the document type is framework-neutral — just id / text / metadata. The same Doc feeds both stacks, which is the only way the comparison is fair.
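
A minimal sketch of what that framework-neutral shape might look like, assuming a plain dataclass. Only the id / text / metadata triple comes from the description above; the `finding_doc` helper, its parameters, and the metadata keys are hypothetical and not necessarily how the repo builds its documents.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Doc:
    """Framework-neutral document: just id / text / metadata."""
    id: str
    text: str
    metadata: dict = field(default_factory=dict)


def finding_doc(finding_id, severity, category, detail, remediation):
    """Hypothetical helper: render one governance finding as a short Doc."""
    text = (
        f"Governance finding [{severity}] in {category}: {detail} "
        f"Remediation: {remediation}"
    )
    return Doc(
        id=f"finding:{finding_id}",
        text=text,
        # Metadata keys are illustrative; pick whatever you want to filter or cite on.
        metadata={"kind": "finding", "severity": severity, "category": category},
    )


doc = finding_doc(
    "H-001",
    "HIGH",
    "permissions",
    "Permission set 'Loan_Reviewer' grants Modify All Data to 14 non-admin users.",
    "Replace Modify All Data with object-level Edit on Loan_Application__c and Case.",
)
print(f"{doc.id} :: {doc.text}")
```

Because the type carries no framework imports, each stack can wrap it in its own document class at index-build time.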

The pipeline

Same shape in both frameworks:

Corpus: objects · permission sets · governance findings
↓
Embed: fastembed · local · no API key
↓
Vector index: VectorStoreIndex / InMemoryVectorStore
↓
Retrieve: top-k nearest documents
↓
Ground: Claude answers from the evidence
↓
Grounded answer + citations

Corpus

Retrieval is only as good as the corpus. We turn objects, permission sets, and governance findings into short, self-describing documents — the same Doc shape for both frameworks.

Embed

Embeddings run locally via fastembed (ONNX) — no embeddings API key, deterministic, and identical across both stacks. That's what keeps the comparison honest.

Vector index

Each framework builds its own index over the same vectors: LlamaIndex's VectorStoreIndex, LangChain's InMemoryVectorStore. Same vectors in, searchable index out.

Retrieve

A question pulls the top-k nearest documents. On every eval question, both stacks retrieved the same grounding document — retrieval parity.
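
To make "top-k nearest" concrete, here is a rough sketch of cosine-similarity retrieval in plain Python. It is not what either framework runs internally, and the three-dimensional vectors are invented for illustration; real embeddings have hundreds of dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed, k=2):
    """Return the k (doc_id, score) pairs most similar to query_vec."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in indexed.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors, made up for this example only.
indexed = {
    "finding:H-001": [0.9, 0.1, 0.0],
    "finding:D-001": [0.1, 0.9, 0.1],
    "object:Loan_Application__c": [0.2, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # stands in for an embedded question

hits = top_k(query, indexed, k=2)
print([doc_id for doc_id, _ in hits])
```

With the same vectors and the same similarity measure, two frameworks have little room to disagree on the nearest document.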

Ground

Claude answers from the retrieved evidence, with citations. LangChain wraps this in a LangGraph agent that can also decide to query Data 360 — that's where the two diverge.
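
The grounding step can be sketched as plain prompt assembly: retrieved documents go in as tagged evidence, and the model is asked to cite those tags. The function name and the instruction wording below are assumptions for illustration, not the prompt the repo uses.

```python
def build_grounded_prompt(question, evidence):
    """Assemble a prompt from (doc_id, text) pairs returned by the retriever."""
    lines = [
        "Answer using only the evidence below.",
        "Cite the id in square brackets after each claim.",
        "If the evidence does not answer the question, say so.",
        "",
        "Evidence:",
    ]
    for doc_id, text in evidence:
        # The id doubles as the citation tag, so answers stay checkable.
        lines.append(f"[{doc_id}] {text}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


prompt = build_grounded_prompt(
    "Which permission sets grant Modify All Data, and to how many users?",
    [(
        "finding:H-001",
        "Permission set 'Loan_Reviewer' grants Modify All Data to 14 non-admin users.",
    )],
)
print(prompt)
```

Keeping the document id in the prompt is what lets a reader trace each claim back to one corpus entry.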

Two frameworks, one corpus

Here's the difference between the two, side by side.

Same embeddings, on purpose

build_index (LlamaIndex) vs build_store (LangChain)

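# LlamaIndex (shown as its own module; imports assumed)
# EMBED_MODEL is a constant defined elsewhere in the repo.
from llama_index.core import Document, Settings, VectorStoreIndex
from llama_index.embeddings.fastembed import FastEmbedEmbedding

def build_index(docs):
    # Use the local fastembed model for every LlamaIndex embedding call.
    Settings.embed_model = FastEmbedEmbedding(model_name=EMBED_MODEL)
    li_docs = [
        Document(text=d.text, metadata={**d.metadata, "id": d.id}, id_=d.id)
        for d in docs
    ]
    return VectorStoreIndex.from_documents(li_docs)

# LangChain (a separate module, so Document here is LangChain's class)
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
from langchain_core.documents import Document
from langchain_core.vectorstores import InMemoryVectorStore

def build_store(docs):
    # Same local model as above, so both stacks index identical vectors.
    emb = FastEmbedEmbeddings(model_name=EMBED_MODEL)
    store = InMemoryVectorStore(embedding=emb)
    store.add_documents(
        [Document(page_content=d.text, metadata={**d.metadata, "id": d.id})
         for d in docs]
    )
    return store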

Both call fastembed with the same local model — identical vectors, so any difference downstream is the framework, not the embedding. No embeddings API key, deterministic, runs offline.

What the comparison actually showed

I ran the same five questions through both stacks and compared the top retrieved document. The result was the boring-but-important kind:

Question	LlamaIndex top-1	LangChain top-1	Agree
Permission sets granting Modify All Data	`finding:H-001`	`finding:H-001`	✅
Sharing risks on Loan Application	`finding:H-002`	`finding:H-002`	✅
Apex below the coverage gate	`finding:D-001`	`finding:D-001`	✅
Unassigned permission sets	`finding:R-001`	`finding:R-001`	✅
Fields on Loan Application	`object:Loan_Application__c`	`object:Loan_Application__c`	✅

The top-1 document was identical on all five questions. That's the honest headline: when the embedding model and corpus are the same, retrieval quality tracks the corpus, not the framework. Citation quality is a data-modeling problem, not a framework choice.
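
A parity check like this is small enough to sketch in a few lines. The version below assumes each stack is wrapped in a function that returns the id of its top-1 document; the dictionary-backed stubs stand in for the real retrievers and simply replay ids from the table above.

```python
def top1_agreement(questions, retrieve_a, retrieve_b):
    """Compare two retrievers question by question; return rows and agreement rate."""
    rows = []
    for q in questions:
        a, b = retrieve_a(q), retrieve_b(q)
        rows.append((q, a, b, a == b))
    agreed = sum(1 for row in rows if row[3])
    rate = agreed / len(rows) if rows else 0.0
    return rows, rate


# Stub retrievers: in a real run these would call each framework's retriever.
expected = {
    "Permission sets granting Modify All Data": "finding:H-001",
    "Sharing risks on Loan Application": "finding:H-002",
    "Apex below the coverage gate": "finding:D-001",
}
llamaindex_top1 = expected.get
langchain_top1 = expected.get

rows, rate = top1_agreement(list(expected), llamaindex_top1, langchain_top1)
for q, a, b, agree in rows:
    print(f"{q}\t{a}\t{b}\t{'agree' if agree else 'DIFFER'}")
print(f"agreement: {rate:.0%}")
```

Comparing ids rather than answer text keeps the check deterministic and independent of the language model.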

Where they actually diverge

So why pick one? Not for retrieval — for what surrounds it.

LlamaIndex is terser to stand up for pure RAG. index.as_query_engine().query(q) is the whole grounded-answer path; as_retriever() is the whole retrieval path. If the job is "answer questions from a corpus," it's less ceremony.
LangChain + LangGraph earns its weight the moment the model needs to choose. I gave the agent two tools — org retrieval and a Data 360 query — and let it decide. "Show our highest-LTV accounts, then flag governance risks" makes it query Data 360 for the accounts and retrieve findings for the risks, in one turn. That orchestration is the thing LangGraph is for.

The lesson I'd give a customer: the framework decision is a decision about the agent's control flow, not its retrieval. If you're grounding answers, either is fine — optimize the corpus. If the agent has to pick between tools and data sources, reach for the graph.
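
The control-flow distinction can be shown without either framework. The sketch below is plain Python, not LangGraph: `fixed_rag` always takes one path, while `tool_agent` lets a chooser decide which tools run. The keyword-based chooser is a crude stand-in for the model's tool selection, and every name here is hypothetical.

```python
def fixed_rag(question, retrieve, answer):
    """Pure RAG: one path, always retrieve then answer."""
    return answer(question, retrieve(question))


def tool_agent(question, choose_tools, tools, answer):
    """Agent: a chooser (the model, in practice) picks which tools to call."""
    evidence, called = [], []
    for name in choose_tools(question):
        called.append(name)
        evidence.extend(tools[name](question))
    return answer(question, evidence), called


def keyword_chooser(question):
    """Stand-in for the model's decision; a real agent would not use keywords."""
    q = question.lower()
    picks = []
    if "ltv" in q or "account" in q:
        picks.append("data360_query")
    if "risk" in q or "permission" in q:
        picks.append("org_retrieval")
    return picks or ["org_retrieval"]


# Stub tools returning canned evidence.
tools = {
    "org_retrieval": lambda q: ["finding:H-001"],
    "data360_query": lambda q: ["UnifiedAccount__dlm: top accounts by ltv__c"],
}


def stub_answer(question, evidence):
    return {"question": question, "evidence": evidence}


result, called = tool_agent(
    "Show our highest-LTV accounts, then flag governance risks",
    keyword_chooser,
    tools,
    stub_answer,
)
print(called)
```

The retrieval call is the same in both functions; what changes is who decides when it runs.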

The Data 360 angle

The reason this isn't just "RAG over metadata" is the second source. Data 360 (Data Cloud) is queryable with ANSI SQL over REST:

POST /services/data/v64.0/ssot/queryv2
{ "sql": "SELECT account_name__c, ltv__c FROM UnifiedAccount__dlm ORDER BY ltv__c DESC LIMIT 5" }

That unifies CRM, web, and warehouse data behind one model — and with Zero-Copy, the same model federates to Snowflake or BigQuery, so the agent reaches warehouse data without anyone copying it. An agent grounded in both the org's governance state and its unified customer data can answer questions neither source can alone.
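
For reference, the query request shown above could be built from Python roughly as follows, using only the standard library. The instance URL and token are placeholders, and `build_query_request` is a hypothetical helper; the sketch constructs the request but does not send it, since that needs a live org and a valid OAuth access token.

```python
import json
import urllib.request

API_PATH = "/services/data/v64.0/ssot/queryv2"


def build_query_request(instance_url, access_token, sql):
    """Build (but do not send) the POST request for a Data 360 SQL query."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        instance_url.rstrip("/") + API_PATH,
        data=body,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_query_request(
    "https://example.my.salesforce.com",  # placeholder instance URL
    "PLACEHOLDER_TOKEN",  # placeholder; use a real access token from your org
    "SELECT account_name__c, ltv__c FROM UnifiedAccount__dlm "
    "ORDER BY ltv__c DESC LIMIT 5",
)
print(req.get_method(), req.full_url)
# To execute against a live org: urllib.request.urlopen(req)
```

Wrapping the request in a function like this is also what makes it easy to expose as an agent tool.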

Run it yourself

The whole thing runs with no org and no embeddings key — the sample corpus ships in the repo:

uv sync --extra rag --extra notebooks
uv run jupyter lab    # open notebooks/04_compare_and_eval.ipynb

Point SF_ORG_ALIAS at a live org (a current free Developer Edition includes Agentforce and Data Cloud) and the same code pulls your real objects, permission sets, and findings.

Code, notebooks, and the sample corpus: sf-agent-grounding. Apache-2.0. It's the retrieval-shaped companion to sf-dev-ai, which grounds an agent through tools instead.