Retrieval-Augmented Generation (RAG)
RAG lets the agent ground its answers in your documents instead of relying only on the model’s training data. You ingest text into a pgvector store once (with the Knowledge Ingestor), then any AI Agent task pointed at the same store retrieves the most relevant chunks and feeds them to the model as context.
When to use it
Use RAG when answers must reflect facts the model can’t know: internal policies, product manuals, contracts, knowledge-base articles, or terms specific to your organisation. The bundled system prompt tells the agent to treat retrieved snippets as the primary source of factual claims.
PostgreSQL + pgvector setup
RAG stores embeddings in PostgreSQL using the pgvector extension. One-time setup on your database:
CREATE EXTENSION IF NOT EXISTS vector;
You do not need to create the table by hand — the Knowledge Ingestor creates it if it does not
exist (createTable = true), using the configured table name (default langchain4j_embeddings).
Embedding models
The embedding model turns text into vectors. Two options:
| | Local (default) | Remote |
|---|---|---|
| Model | AllMiniLmL6V2 (ONNX, bundled) | e.g. OpenAI text-embedding-ada-002 |
| Dimension | 384 | e.g. 1536 |
| External call | None | Yes (calls the provider) |
| How to select | leave embeddingModelName empty | set embeddingModelName (+ apiKey) |
Important
The embeddingDimension must match the model that produced the vectors. If you ingest with one
model or dimension and query with another, retrieval either fails outright (the dimensions differ)
or returns meaningless matches (same dimension, different model). Keep ingestion and the AI Agent
task on the same embeddingModelName and embeddingDimension.
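A mismatch of this kind is cheap to catch before anything is ingested or queried. The following is a minimal sketch and not part of the module: `EmbeddingSettings` and `check_compatible` are hypothetical names introduced for illustration. The only facts taken from this page are the two parameter names and the 384-dimension local default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingSettings:
    """Hypothetical holder for the two parameters that must agree."""
    embedding_model_name: str = ""   # empty means the local AllMiniLmL6V2 model
    embedding_dimension: int = 384   # dimension of the local model


def check_compatible(ingest: EmbeddingSettings, query: EmbeddingSettings) -> list[str]:
    """Return readable problems; an empty list means the settings agree."""
    problems = []
    if ingest.embedding_model_name != query.embedding_model_name:
        problems.append(
            f"embeddingModelName differs: {ingest.embedding_model_name!r} "
            f"vs {query.embedding_model_name!r}"
        )
    if ingest.embedding_dimension != query.embedding_dimension:
        problems.append(
            f"embeddingDimension differs: {ingest.embedding_dimension} "
            f"vs {query.embedding_dimension}"
        )
    return problems
```

Calling `check_compatible(EmbeddingSettings(), EmbeddingSettings("text-embedding-ada-002", 1536))` reports both differences, which is the situation where an ingestor left on the local model feeds an agent configured for a remote one.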
Agent-side RAG configuration
On the AI Agent task, RAG activates as soon as you set PostgreSQL host (pgHost). The full
RAG group:
| Field | Parameter | Default |
|---|---|---|
| PostgreSQL host | pgHost | — (setting it activates RAG) |
| PostgreSQL port | pgPort | 5432 |
| PostgreSQL database | pgDatabase | — |
| PostgreSQL user | pgUser | — |
| PostgreSQL password | pgPassword | — |
| PostgreSQL table | pgTable | langchain4j_embeddings |
| Max RAG results | maxRagResults | 5 |
| Min RAG score | minRagScore | 0.0 (no filtering) |
| Embedding dimension | embeddingDimension | 384 |
| Embedding model name | embeddingModelName | empty → local AllMiniLmL6V2 |
maxRagResults caps how many chunks are retrieved per query; minRagScore (0.0–1.0) drops chunks
below a cosine-similarity threshold. AllMiniLmL6V2 typically yields similarities in the 0.3–0.7
range for related content, so start at 0.0 and tune upward if you see irrelevant context.
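How the two settings interact can be pictured with a small in-memory sketch. It assumes the score is plain cosine similarity and that retrieval scores every stored chunk; the real store may compute or rescale scores differently. The function names and the toy two-dimensional vectors are illustrative only.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, max_rag_results=5, min_rag_score=0.0):
    """Score every chunk, drop those below the threshold, keep the best few.

    `chunks` is a list of (text, vector) pairs; the result is a list of
    (score, text) pairs ordered from most to least similar.
    """
    scored = [(cosine_similarity(query, vec), text) for text, vec in chunks]
    kept = [pair for pair in scored if pair[0] >= min_rag_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:max_rag_results]


# Toy data: three chunks with hand-picked 2-D "embeddings".
chunks = [
    ("refund policy", [1.0, 0.0]),
    ("shipping times", [1.0, 1.0]),
    ("office party", [0.0, 1.0]),
]
for score, text in retrieve([1.0, 0.0], chunks, min_rag_score=0.5):
    print(f"{score:.2f}  {text}")
```

Raising `min_rag_score` removes weak matches regardless of how many slots are free, while `max_rag_results` only truncates the list after filtering, so a high threshold can leave the model with fewer chunks than the cap allows.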
Ingesting knowledge — the Knowledge Ingestor connector
The CIB seven - Knowledge Ingestor template (connectorId = cibseven-knowledge-ingestor) embeds
text from within a process. Per invocation it: splits content into chunks (recursive splitter),
attaches source/metadata to each segment, embeds them, and stores them in pgvector — returning
the number of chunks via ${chunksIngested}.
| Field | Parameter | Default |
|---|---|---|
| Content | content | — (required) |
| Source | source | — (stored as segment metadata) |
| Metadata | metadata | — (comma-separated key=value) |
| Chunk size | chunkSize | 500 |
| Chunk overlap | chunkOverlap | 50 |
| Embedding model name | embeddingModelName | empty → local |
| API key | apiKey | OPENAI_API_KEY env |
| Embedding dimension | embeddingDimension | 384 |
| PostgreSQL host | pgHost | — (required) |
| … | pgPort/pgDatabase/pgUser/pgPassword/pgTable | as above |
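To make chunkSize, chunkOverlap, and the metadata format concrete, here is a simplified sketch of the preparation step. It is not the connector's code: it uses a fixed sliding window rather than the recursive splitter, assumes sizes are counted in characters, and assumes the source is stored under a `source` metadata key. All function names are hypothetical.

```python
def parse_metadata(raw: str) -> dict[str, str]:
    """Parse 'key=value,key=value' into a dict, ignoring empty entries."""
    result = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {item!r}")
        result[key.strip()] = value.strip()
    return result


def split_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Fixed-size window; each chunk repeats the tail of the previous one."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
        start += step
    return chunks


def prepare_segments(content, source, metadata="", chunk_size=500, chunk_overlap=50):
    """Attach the same source and metadata to every chunk."""
    meta = {"source": source, **parse_metadata(metadata)}
    return [
        {"text": chunk, "metadata": dict(meta)}
        for chunk in split_with_overlap(content, chunk_size, chunk_overlap)
    ]
```

With the defaults, a 1000-character document yields three segments starting at offsets 0, 450, and 900. The overlap trades some storage for a better chance that a sentence cut at a chunk boundary still appears whole in one of the neighbouring chunks.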
<camunda:connector>
  <camunda:connectorId>cibseven-knowledge-ingestor</camunda:connectorId>
  <camunda:inputOutput>
    <camunda:inputParameter name="content">${documentText}</camunda:inputParameter>
    <camunda:inputParameter name="source">${documentSource}</camunda:inputParameter>
    <camunda:inputParameter name="pgHost">localhost</camunda:inputParameter>
    <camunda:inputParameter name="pgDatabase">postgres</camunda:inputParameter>
    <camunda:inputParameter name="pgUser">my_user</camunda:inputParameter>
    <camunda:inputParameter name="pgPassword">${pgPassword}</camunda:inputParameter>
    <camunda:outputParameter name="ingestedChunks">${chunksIngested}</camunda:outputParameter>
  </camunda:inputOutput>
</camunda:connector>
Use the same pgTable, embeddingModelName, and embeddingDimension here as on the AI Agent
task that will query it. See the knowledge-base.bpmn demo in Examples.
Ingesting knowledge — the KnowledgeIngestor CLI
For bulk one-off loading of a file (e.g. a PDF) outside a process, the module ships a command-line
ingestor runnable via the Maven exec plugin from the connect/ai-agent module:
mvn exec:java -Dexec.args="\
--file knowledge-base.pdf \
--pgHost localhost --pgUser postgres --pgPassword secret \
--pgDatabase postgres --pgTable langchain4j_embeddings \
--chunkSize 500 --chunkOverlap 50"
--file, --pgHost, --pgUser, --pgPassword are required; the rest default as in
Configuration Reference. The CLI uses the local AllMiniLmL6V2 model (384-dim).
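When several files need loading, the same command can be assembled programmatically. The helper below is a hypothetical sketch: it only builds the argument list for the Maven invocation shown above, using the flag names from this page, and rejects values containing whitespace because it does no quoting inside exec.args.

```python
REQUIRED = ("file", "pgHost", "pgUser", "pgPassword")


def build_ingest_command(**options: str) -> list[str]:
    """Build the argument list for the Maven-based ingestor call.

    Keyword names are used verbatim as --flags, in the order given.
    """
    missing = [name for name in REQUIRED if not options.get(name)]
    if missing:
        raise ValueError("missing required option(s): " + ", ".join(missing))
    parts = []
    for name, value in options.items():
        value = str(value)
        if any(ch.isspace() for ch in value):
            # This sketch does not quote values inside exec.args.
            raise ValueError(f"value for --{name} must not contain whitespace")
        parts += [f"--{name}", value]
    return ["mvn", "exec:java", "-Dexec.args=" + " ".join(parts)]


command = build_ingest_command(
    file="knowledge-base.pdf",
    pgHost="localhost",
    pgUser="postgres",
    pgPassword="secret",
)
print(command)
```

The returned list can be passed to `subprocess.run(command, cwd="connect/ai-agent", check=True)` once per file, for example inside a loop over `pathlib.Path("docs").glob("*.pdf")`; the `docs` directory is only an example.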
Data-residency and performance caveats
Warning — Remote embeddings ignore baseUrl
When you set a remote embeddingModelName, the embedding calls go to the public OpenAI endpoint
regardless of the chat baseUrl. For air-gapped or sovereignty-constrained deployments, use the
local AllMiniLmL6V2 model (leave embeddingModelName empty) so no document text leaves your
network. See Limitations.
- No ANN index by default. The embedding store is created without a vector index, so retrieval is a linear scan. That is fine for small and medium knowledge bases; for very large stores, add an approximate-nearest-neighbour index (pgvector supports HNSW and IVFFlat) on the vector column out-of-band.
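As a starting point for adding such an index, the sketch below generates an HNSW index statement for pgvector. The column name `embedding` and the generated index name are assumptions to verify against the actual table (for example with `\d langchain4j_embeddings` in psql), and HNSW requires a pgvector version that supports it. The helper is hypothetical and only returns SQL text; it does not connect to the database.

```python
import re

# Accept only plain identifiers so the names can be placed in SQL safely.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def hnsw_index_ddl(table: str = "langchain4j_embeddings", column: str = "embedding") -> str:
    """Return a CREATE INDEX statement for a cosine-distance HNSW index.

    `column` defaults to an assumed name; check it against the real schema.
    """
    for name in (table, column):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"not a plain SQL identifier: {name!r}")
    return (
        f"CREATE INDEX IF NOT EXISTS {table}_{column}_hnsw_idx "
        f"ON {table} USING hnsw ({column} vector_cosine_ops);"
    )


print(hnsw_index_ddl())
```

The `vector_cosine_ops` operator class is chosen because this page describes the score as cosine similarity; an index built for a different distance operator would not be used by cosine-distance queries.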