Retrieval-Augmented Generation (RAG)

RAG lets the agent ground its answers in your documents instead of relying only on the model’s training data. You ingest text into a pgvector store once (with the Knowledge Ingestor), then any AI Agent task pointed at the same store retrieves the most relevant chunks and feeds them to the model as context.

When to use it

Use RAG when answers must reflect facts the model can’t know: internal policies, product manuals, contracts, knowledge-base articles, or terms specific to your organisation. The bundled system prompt tells the agent to treat retrieved snippets as the primary source of factual claims.
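To make the flow concrete, here is a minimal sketch of how retrieved snippets might be combined with the user question before the model call. The function name, the numbered-snippet layout, and the instruction wording are hypothetical illustrations. The connector's bundled system prompt is not reproduced here.

```python
from typing import List, Tuple

# Hypothetical instruction text; the connector's bundled system prompt differs.
INSTRUCTION = (
    "Answer using the numbered snippets below as the primary source of facts. "
    "If they do not contain the answer, say so."
)


def build_prompt(question: str, snippets: List[Tuple[str, str]]) -> str:
    """Assemble a grounded prompt from (source, text) snippets.

    Each snippet is numbered and tagged with its source so the model can
    attribute claims. The user question comes last.
    """
    lines = [INSTRUCTION, ""]
    for i, (source, text) in enumerate(snippets, start=1):
        lines.append(f"[{i}] ({source}) {text.strip()}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(
        "How many vacation days do new hires get?",
        [("hr-policy.pdf", "New employees receive 25 vacation days per year.")],
    ))
```

Numbering the snippets and keeping their source next to the text is one common way to let the model cite where a claim came from. It is a design choice of this sketch and not necessarily what the connector does.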

PostgreSQL + pgvector setup

RAG stores embeddings in PostgreSQL using the pgvector extension. One-time setup on your database:

CREATE EXTENSION IF NOT EXISTS vector;

You do not need to create the table by hand — the Knowledge Ingestor creates it if it does not exist (createTable = true), using the configured table name (default langchain4j_embeddings).

Embedding models

The embedding model turns text into vectors. Two options:

	Local (default)	Remote
Model	`AllMiniLmL6V2` (ONNX, bundled)	e.g. OpenAI `text-embedding-ada-002`
Dimension	384	e.g. 1536
External call	None	Yes (calls the provider)
How to select	leave `embeddingModelName` empty	set `embeddingModelName` (+ `apiKey`)

Important

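The embeddingDimension must match the model that produced the vectors. If you ingest with one model or dimension and query with another, retrieval breaks. A dimension mismatch makes pgvector reject the comparison with a dimension error. Two different models that happen to share a dimension produce vectors that are not comparable, so the similarity scores become meaningless. Keep ingestion and the AI Agent task on the same embeddingModelName and embeddingDimension.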
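A cheap safeguard is to compare the two sides' settings and check each vector's length before it is stored or queried. The sketch below uses hypothetical helpers that are not part of the connector. The empty model name stands for the local AllMiniLmL6V2 model, as in the table above. A length check cannot detect two different models that share a dimension, which is why the model name is compared as well.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class EmbeddingSettings:
    # Hypothetical mirror of embeddingModelName / embeddingDimension.
    # An empty model name means the local AllMiniLmL6V2 model.
    model_name: str = ""
    dimension: int = 384


def check_compatible(ingest: EmbeddingSettings, agent: EmbeddingSettings) -> None:
    """Raise if the ingestion side and the querying side disagree on model or dimension."""
    if ingest != agent:
        raise ValueError(
            f"ingestion uses {ingest}, agent uses {agent}; align the settings or re-ingest"
        )


def check_vector(vector: Sequence[float], settings: EmbeddingSettings) -> None:
    """Raise if a vector's length differs from the configured dimension."""
    if len(vector) != settings.dimension:
        raise ValueError(f"expected {settings.dimension} dimensions, got {len(vector)}")


local = EmbeddingSettings()
check_compatible(local, EmbeddingSettings("", 384))  # same settings: passes silently
check_vector([0.0] * 384, local)                      # right length: passes silently
```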

Agent-side RAG configuration

On the AI Agent task, RAG activates as soon as you set the PostgreSQL host (pgHost). The full RAG group:

Field	Parameter	Default
PostgreSQL host	`pgHost`	— (setting it activates RAG)
PostgreSQL port	`pgPort`	`5432`
PostgreSQL database	`pgDatabase`	—
PostgreSQL user	`pgUser`	—
PostgreSQL password	`pgPassword`	—
PostgreSQL table	`pgTable`	`langchain4j_embeddings`
Max RAG results	`maxRagResults`	`5`
Min RAG score	`minRagScore`	`0.0` (no filtering)
Embedding dimension	`embeddingDimension`	`384`
Embedding model name	`embeddingModelName`	empty → local `AllMiniLmL6V2`
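The activation rule and the defaults in the table can be summarised as a small settings object. This is a hypothetical Python model for illustration only. The connector reads these values from the task's input parameters, and the snake_case attribute names stand in for the parameter names above. Treating an empty pgHost as "not set" is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RagSettings:
    # Defaults copied from the table above; None marks fields without a default.
    pg_host: Optional[str] = None                   # pgHost: setting it activates RAG
    pg_port: int = 5432                             # pgPort
    pg_database: Optional[str] = None               # pgDatabase
    pg_user: Optional[str] = None                   # pgUser
    pg_password: Optional[str] = None               # pgPassword
    pg_table: str = "langchain4j_embeddings"        # pgTable
    max_rag_results: int = 5                        # maxRagResults
    min_rag_score: float = 0.0                      # minRagScore (0.0 = no filtering)
    embedding_dimension: int = 384                  # embeddingDimension
    embedding_model_name: str = ""                  # embeddingModelName ("" = local AllMiniLmL6V2)

    @property
    def rag_enabled(self) -> bool:
        # RAG is on as soon as a (non-empty) PostgreSQL host is configured.
        return bool(self.pg_host)

    @property
    def uses_local_embeddings(self) -> bool:
        # An empty model name selects the bundled local model.
        return not self.embedding_model_name


print(RagSettings().rag_enabled, RagSettings(pg_host="localhost").rag_enabled)
```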

maxRagResults caps how many chunks are retrieved per query. minRagScore (0.0–1.0) drops any chunk whose cosine similarity to the query falls below that threshold. AllMiniLmL6V2 typically yields similarities in the 0.3–0.7 range for related content, so start at 0.0 and raise the threshold gradually if irrelevant chunks show up in the context.
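The interaction of the two settings can be illustrated with a brute-force retrieval sketch. It scores every stored chunk against the query by cosine similarity, drops chunks below the minimum score, and keeps the best maxRagResults. The sketch uses raw cosine similarity as described above and toy 3-dimensional vectors, and its function names are hypothetical. The real store performs this search inside PostgreSQL. To match the documented default, a minimum score of 0.0 is treated as "no filtering".

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(query: Sequence[float], store: List[Tuple[str, Sequence[float]]],
             max_results: int = 5, min_score: float = 0.0) -> List[Tuple[str, float]]:
    """Linear scan: score all chunks, filter by min_score, return the top max_results."""
    scored = [(text, cosine(query, vec)) for text, vec in store]
    # 0.0 means "no filtering", matching the minRagScore default.
    kept = scored if min_score <= 0.0 else [item for item in scored if item[1] >= min_score]
    kept.sort(key=lambda item: item[1], reverse=True)  # most similar first
    return kept[:max_results]


store = [
    ("vacation policy", [1.0, 0.0, 0.0]),
    ("expense policy", [0.6, 0.8, 0.0]),
    ("office map", [0.0, 0.0, 1.0]),
]
# With min_score=0.5 the unrelated "office map" chunk (similarity 0.0) is dropped.
print(retrieve([1.0, 0.0, 0.0], store, max_results=5, min_score=0.5))
```

Raising the threshold removes weak matches regardless of how many results are requested. Lowering maxRagResults caps the context size even when many chunks clear the threshold.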

Ingesting knowledge — the Knowledge Ingestor connector

The CIB seven - Knowledge Ingestor template (connectorId = cibseven-knowledge-ingestor) embeds text from within a process. On each invocation it splits the content into chunks with a recursive splitter, attaches the source and metadata to each segment, embeds the segments, and stores them in pgvector. It returns the number of stored chunks in ${chunksIngested}.

Field	Parameter	Default
Content	`content`	— (required)
Source	`source`	— (stored as segment metadata)
Metadata	`metadata`	— (comma-separated `key=value`)
Chunk size	`chunkSize`	`500`
Chunk overlap	`chunkOverlap`	`50`
Embedding model name	`embeddingModelName`	empty → local
API key	`apiKey`	`OPENAI_API_KEY` env
Embedding dimension	`embeddingDimension`	`384`
PostgreSQL host	`pgHost`	— (required)
… pgPort/pgDatabase/pgUser/pgPassword/pgTable		as above
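The per-invocation steps can be sketched as follows. This is a simplified stand-in, not the connector's code. It uses a fixed-size character window with overlap rather than the recursive splitter, which prefers paragraph and sentence boundaries. It also assumes that chunkSize and chunkOverlap count characters and stores the source under a "source" metadata key. The embed-and-store step is a caller-supplied function. The function names and the skipping of malformed metadata entries are hypothetical.

```python
from typing import Callable, Dict, List


def split_text(content: str, chunk_size: int = 500, chunk_overlap: int = 50) -> List[str]:
    """Fixed-size windows of at most chunk_size chars; neighbours share chunk_overlap chars."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(content), step):
        chunks.append(content[start:start + chunk_size])
        if start + chunk_size >= len(content):
            break  # this window already reached the end of the text
    return chunks


def parse_metadata(metadata: str) -> Dict[str, str]:
    """Parse 'key=value,key2=value2'; entries without '=' are skipped (an assumption)."""
    result = {}
    for entry in metadata.split(","):
        key, sep, value = entry.partition("=")
        if sep and key.strip():
            result[key.strip()] = value.strip()
    return result


def ingest(content: str, source: str, metadata: str,
           store: Callable[[str, Dict[str, str]], None],
           chunk_size: int = 500, chunk_overlap: int = 50) -> int:
    """Split, attach source/metadata to each segment, hand it to `store`, return the count."""
    base = parse_metadata(metadata)
    chunks = split_text(content, chunk_size, chunk_overlap)
    for chunk in chunks:
        # Embedding and the pgvector insert would happen inside `store`.
        store(chunk, {**base, "source": source})
    return len(chunks)  # plays the role of ${chunksIngested}
```

The overlap repeats the tail of one chunk at the head of the next, so a sentence cut at a chunk boundary still appears whole in at least one chunk when it is shorter than the overlap.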

<camunda:connector>
  <camunda:connectorId>cibseven-knowledge-ingestor</camunda:connectorId>
  <camunda:inputOutput>
    <!-- Text to embed, plus a source label stored as metadata on every chunk -->
    <camunda:inputParameter name="content">${documentText}</camunda:inputParameter>
    <camunda:inputParameter name="source">${documentSource}</camunda:inputParameter>
    <!-- Connection settings; pgPort and pgTable fall back to 5432 and langchain4j_embeddings -->
    <camunda:inputParameter name="pgHost">localhost</camunda:inputParameter>
    <camunda:inputParameter name="pgDatabase">postgres</camunda:inputParameter>
    <camunda:inputParameter name="pgUser">my_user</camunda:inputParameter>
    <camunda:inputParameter name="pgPassword">${pgPassword}</camunda:inputParameter>
    <!-- Copy the connector's chunksIngested result into a process variable -->
    <camunda:outputParameter name="ingestedChunks">${chunksIngested}</camunda:outputParameter>
  </camunda:inputOutput>
</camunda:connector>

Use the same pgTable, embeddingModelName, and embeddingDimension here as on the AI Agent task that will query it. See the knowledge-base.bpmn demo in Examples.

Ingesting knowledge — the `KnowledgeIngestor` CLI

For one-off bulk loading of a file (e.g. a PDF) outside a process, the connect/ai-agent module ships a command-line ingestor that you run with the Maven exec plugin from that module:

mvn exec:java -Dexec.args="\
  --file knowledge-base.pdf \
  --pgHost localhost --pgUser postgres --pgPassword secret \
  --pgDatabase postgres --pgTable langchain4j_embeddings \
  --chunkSize 500 --chunkOverlap 50"

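--file, --pgHost, --pgUser, and --pgPassword are required. The remaining options fall back to the defaults listed in the Configuration Reference. The CLI uses the local AllMiniLmL6V2 model (384 dimensions), so an AI Agent task that queries a table loaded this way must leave embeddingModelName empty and keep embeddingDimension at 384.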

Data-residency and performance caveats

Warning — Remote embeddings ignore baseUrl

When you set a remote embeddingModelName, the embedding calls go to the public OpenAI endpoint regardless of the chat baseUrl. For air-gapped or sovereignty-constrained deployments, use the local AllMiniLmL6V2 model (leave embeddingModelName empty) so no document text leaves your network. See Limitations.

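No ANN index by default. The embedding store is created without a vector index, so every query is an exact linear scan over all stored rows. That is fine for small and medium knowledge bases. For very large stores, add an approximate-nearest-neighbour index (pgvector supports HNSW and IVFFlat) on the embedding column yourself, outside the connector. Use the cosine operator class (vector_cosine_ops) so the index serves the cosine-similarity search. An ANN index trades a small amount of recall for speed.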