Retrieval-Augmented Generation (RAG)

RAG lets the agent ground its answers in your documents instead of relying only on the model’s training data. You ingest text into a pgvector store once (with the Knowledge Ingestor), then any AI Agent task pointed at the same store retrieves the most relevant chunks and feeds them to the model as context.

When to use it

Use RAG when answers must reflect facts the model can’t know: internal policies, product manuals, contracts, knowledge-base articles, or terms specific to your organisation. The bundled system prompt tells the agent to treat retrieved snippets as the primary source of factual claims.

PostgreSQL + pgvector setup

RAG stores embeddings in PostgreSQL using the pgvector extension. One-time setup on your database:

CREATE EXTENSION IF NOT EXISTS vector;

You do not need to create the table by hand — the Knowledge Ingestor creates it if it does not exist (createTable = true), using the configured table name (default langchain4j_embeddings).

Embedding models

The embedding model turns text into vectors. Two options:

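|                | Local (default)               | Remote                               |
|----------------|-------------------------------|--------------------------------------|
| Model          | AllMiniLmL6V2 (ONNX, bundled) | e.g. OpenAI text-embedding-ada-002   |
| Dimension      | 384                           | e.g. 1536                            |
| External call  | None                          | Yes (calls the provider)             |
| How to select  | leave embeddingModelName empty | set embeddingModelName (+ apiKey)   |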

Important

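The embeddingDimension must match the model that produced the vectors. If you ingest with one model or dimension and query with another, retrieval breaks: vectors of different dimensions cannot be compared, and vectors from different models are not meaningfully comparable even when their dimensions match. Keep ingestion and the AI Agent task on the same embeddingModelName and embeddingDimension.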
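
One way to catch a mismatch early is a small pre-flight check over the two configurations before deploying a process. The sketch below is illustrative only: EmbeddingConfig and check_compatible are hypothetical helpers, not part of the connector. Only the parameter names (pgTable, embeddingModelName, embeddingDimension) and their defaults come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingConfig:
    """Hypothetical holder for the settings that must agree on both sides."""
    pg_table: str = "langchain4j_embeddings"
    embedding_model_name: str = ""   # empty means the local AllMiniLmL6V2 model
    embedding_dimension: int = 384


def check_compatible(ingest: EmbeddingConfig, agent: EmbeddingConfig) -> list[str]:
    """Return a list of human-readable mismatches (empty when the configs agree)."""
    problems = []
    for field in ("pg_table", "embedding_model_name", "embedding_dimension"):
        a, b = getattr(ingest, field), getattr(agent, field)
        if a != b:
            problems.append(f"{field}: ingestion={a!r}, agent={b!r}")
    return problems


# Ingestion used the local default; the agent was pointed at a remote model.
ingest = EmbeddingConfig()
agent = EmbeddingConfig(embedding_model_name="text-embedding-ada-002",
                        embedding_dimension=1536)
mismatches = check_compatible(ingest, agent)
for m in mismatches:
    print("mismatch:", m)
```

In this example the check reports two mismatches (model name and dimension), which is the situation the note above warns about.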

Agent-side RAG configuration

On the AI Agent task, RAG activates as soon as you set PostgreSQL host (pgHost). The full RAG group:

| Field                | Parameter          | Default                              |
|----------------------|--------------------|--------------------------------------|
| PostgreSQL host      | pgHost             | none (setting it activates RAG)      |
| PostgreSQL port      | pgPort             | 5432                                 |
| PostgreSQL database  | pgDatabase         |                                      |
| PostgreSQL user      | pgUser             |                                      |
| PostgreSQL password  | pgPassword         |                                      |
| PostgreSQL table     | pgTable            | langchain4j_embeddings               |
| Max RAG results      | maxRagResults      | 5                                    |
| Min RAG score        | minRagScore        | 0.0 (no filtering)                   |
| Embedding dimension  | embeddingDimension | 384                                  |
| Embedding model name | embeddingModelName | empty → local AllMiniLmL6V2          |

maxRagResults caps how many chunks are retrieved per query; minRagScore (0.0–1.0) drops chunks below a cosine-similarity threshold. AllMiniLmL6V2 typically yields similarities in the 0.3–0.7 range for related content, so start at 0.0 and tune upward if you see irrelevant context.
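
To make the interaction of the two settings concrete, here is a minimal sketch of threshold-then-top-N selection over cosine similarity. It is not the connector's implementation: the retrieve function and the toy vectors are invented for illustration, and how the underlying store computes or normalises its score is not specified on this page.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunks, max_rag_results=5, min_rag_score=0.0):
    """Score every chunk, drop those below the threshold, keep the best N."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    kept = [(s, t) for s, t in scored if s >= min_rag_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:max_rag_results]


# Toy 3-dimensional vectors; real embeddings have 384 or more dimensions.
chunks = [
    ("refund policy", [1.0, 0.0, 0.0]),
    ("shipping times", [0.6, 0.8, 0.0]),
    ("office party", [0.0, 0.0, 1.0]),
]
query = [1.0, 0.0, 0.0]

results = retrieve(query, chunks, max_rag_results=2, min_rag_score=0.5)
for score, text in results:
    print(f"{score:.2f}  {text}")
```

With a threshold of 0.5 the unrelated third chunk is dropped; with the default of 0.0 it would still be eligible and only the result cap would limit the output.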

Ingesting knowledge — the Knowledge Ingestor connector

The CIB seven - Knowledge Ingestor template (connectorId = cibseven-knowledge-ingestor) embeds text from within a process. On each invocation it splits the content into chunks (recursive splitter), attaches source/metadata to each segment, embeds the segments, stores them in pgvector, and returns the number of chunks as ${chunksIngested}.

| Field                | Parameter          | Default                                  |
|----------------------|--------------------|------------------------------------------|
| Content              | content            | none (required)                          |
| Source               | source             | none (stored as segment metadata)        |
| Metadata             | metadata           | none (comma-separated key=value)         |
| Chunk size           | chunkSize          | 500                                      |
| Chunk overlap        | chunkOverlap       | 50                                       |
| Embedding model name | embeddingModelName | empty → local                            |
| API key              | apiKey             | OPENAI_API_KEY env                       |
| Embedding dimension  | embeddingDimension | 384                                      |
| PostgreSQL host      | pgHost             | none (required)                          |
| …                    | pgPort/pgDatabase/pgUser/pgPassword/pgTable | as above        |

<camunda:connector>
  <camunda:connectorId>cibseven-knowledge-ingestor</camunda:connectorId>
  <camunda:inputOutput>
    <camunda:inputParameter name="content">${documentText}</camunda:inputParameter>
    <camunda:inputParameter name="source">${documentSource}</camunda:inputParameter>
    <camunda:inputParameter name="pgHost">localhost</camunda:inputParameter>
    <camunda:inputParameter name="pgDatabase">postgres</camunda:inputParameter>
    <camunda:inputParameter name="pgUser">my_user</camunda:inputParameter>
    <camunda:inputParameter name="pgPassword">${pgPassword}</camunda:inputParameter>
    <camunda:outputParameter name="ingestedChunks">${chunksIngested}</camunda:outputParameter>
  </camunda:inputOutput>
</camunda:connector>

Use the same pgTable, embeddingModelName, and embeddingDimension here as on the AI Agent task that will query it. See the knowledge-base.bpmn demo in Examples.
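
To get a feel for how chunkSize and chunkOverlap interact, the sketch below splits text into fixed-size windows that share an overlap. This is a deliberate simplification, not the connector's recursive splitter (which this page does not describe in detail). It also assumes sizes are measured in characters, and chunk_text is a hypothetical helper.

```python
def chunk_text(text, chunk_size=500, chunk_overlap=50):
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


document = "x" * 1200  # stand-in for a 1200-character document
chunks = chunk_text(document)
print(len(chunks), [len(c) for c in chunks])
```

With the defaults, each new chunk starts 450 characters after the previous one, so a 1200-character input yields three chunks. A larger overlap preserves more context across chunk boundaries at the cost of more stored vectors.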

Ingesting knowledge — the KnowledgeIngestor CLI

For bulk one-off loading of a file (e.g. a PDF) outside a process, the module ships a command-line ingestor runnable via the Maven exec plugin from the connect/ai-agent module:

mvn exec:java -Dexec.args="\
  --file knowledge-base.pdf \
  --pgHost localhost --pgUser postgres --pgPassword secret \
  --pgDatabase postgres --pgTable langchain4j_embeddings \
  --chunkSize 500 --chunkOverlap 50"

--file, --pgHost, --pgUser, and --pgPassword are required; the remaining options take the defaults listed in Configuration Reference. The CLI uses the local AllMiniLmL6V2 model (384-dim).

Data-residency and performance caveats

Warning — Remote embeddings ignore baseUrl

When you set a remote embeddingModelName, the embedding calls go to the public OpenAI endpoint regardless of the chat baseUrl. For air-gapped or sovereignty-constrained deployments, use the local AllMiniLmL6V2 model (leave embeddingModelName empty) so no document text leaves your network. See Limitations.

  • No ANN index by default. The embedding store is created without a vector index, so retrieval is a linear scan. That is fine for small and medium knowledge bases; for very large stores, add an index on the pgvector column yourself, outside the connector.
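
As a starting point for adding such an index, the sketch below builds the DDL for an HNSW index with pgvector's cosine operator class. Several things here are assumptions to verify against your own database: the vector column is assumed to be named embedding, HNSW requires pgvector 0.5.0 or later, and vector_cosine_ops only helps if the store queries by cosine distance. The hnsw_index_ddl helper and the index name are hypothetical.

```python
import re

# Conservative check so table and column names cannot inject SQL.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def hnsw_index_ddl(table="langchain4j_embeddings", column="embedding"):
    """Build a CREATE INDEX statement for an HNSW cosine index on a pgvector column."""
    for name in (table, column):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe SQL identifier: {name!r}")
    index_name = f"{table}_{column}_hnsw_idx"  # naming scheme is illustrative
    return (
        f"CREATE INDEX IF NOT EXISTS {index_name} "
        f"ON {table} USING hnsw ({column} vector_cosine_ops);"
    )


print(hnsw_index_ddl())
```

Run the printed statement with psql or your usual migration tooling. Building the index on a large table can take a while, so schedule it outside peak ingestion.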
