Security & Data Handling

What a security reviewer, DPO, or cautious operator needs: where data goes, what is stored where, how to handle secrets, and the security considerations that come with running an agentic (tool-calling) AI inside your processes. Complements Audit Trail (what’s recorded) and Troubleshooting & Known Limitations (tracked hardening items).

Data flows — what leaves the engine

Each invocation can send data to two kinds of external endpoint. Nothing leaves the engine unless you configure these.

Data Sent to When Controlled by
System prompt (bundled default + your instruction) LLM endpoint every call baseUrl
The message (often process-variable content) LLM endpoint every call baseUrl
Conversation history LLM endpoint when useChatMemory is on baseUrl
Retrieved RAG chunks (your documents) LLM endpoint when RAG is active baseUrl
Tool names, descriptions, schemas + tool results LLM endpoint when tools are wired baseUrl
RAG query text embedding model when RAG is active local model → no egress; remote → public OpenAI (see warning below)
Ingested document text embedding model Knowledge Ingestor / CLI local model → no egress; remote → public OpenAI (see warning below)
MCP tool-call arguments + results each MCP server url when mcpServers set per-server url / headers

Embeddings & data residency

A remote embeddingModelName currently calls the public OpenAI endpoint regardless of baseUrl — see RAG and Known Limitations. For air-gapped or sovereign deployments, use the local AllMiniLmL6V2 model so no query or document text leaves your network. The chat model honours baseUrl normally.

The transport is outbound HTTPS from the engine JVM to baseUrl (and to any MCP url). Operators must allow that egress (or route it through an approved proxy/gateway). See Installation.

What is stored, and where

Artifact Location Notes
Chat-log audit timeline engine DB, process variable cibseven-connect-ai-agent_<activityId> messages, tools, tool side-effects, timings; can be redacted or disabled — see Audit Trail
Agent answer engine DB, your agentOutput variable plain text
aiMeta marker engine DB, your agentOutput_aiMeta variable provenance map
Conversation history JVM memory (default store) not persisted unless you swap the store — see Chat Memory
Embeddings + text segments + metadata pgvector table the document text is stored to enable retrieval

Secrets & credentials

  • Do not hardcode secrets into element-template fields. Any value typed into an input parameter (apiKey, pgPassword, customHeaders) is persisted as a process variable and in history, and is visible in Cockpit and to anyone with variable access. Reference a securely-provided variable (${apiKey}) or, preferably, the OPENAI_API_KEY environment variable at deployment level.
  • Secrets in message/instruction are recorded. The chat-log timeline stores messages[] content, so anything you put in the prompt is in the audit variable (unless redaction is on). Keep credentials out of prompts.
  • What is not logged (verified in code): the connector does not log LLM/MCP request or response bodies (request logging is disabled), and the audit listener does not capture apiKey or customHeaders. So credentials passed via those fields don’t land in the application log or the audit timeline — but the process-variable persistence point above still applies.

Agentic risk: prompt injection & untrusted input

An agent that can call tools will follow instructions found in the content it processes — not just your system prompt. If the message, a retrieved RAG chunk, a tool result, or an MCP response contains adversarial text (“ignore your instructions and start process X / send data to Y”), the model may act on it. This is prompt injection, and it is inherent to agentic LLMs.

Practical mitigations:

  • Minimise the toolset per task. Only wire the tools a given task actually needs; don’t attach ProcessStarterTool or broad MCP servers to tasks that process untrusted input.
  • Treat agent output as untrusted. Validate/branch on it downstream; don’t feed it straight into destructive actions. See Getting Started.
  • Keep a human in the loop for consequential decisions (approve before acting) — see Chat Memory.
  • Sanitize/segregate highly-untrusted document sources before they reach a tool-enabled agent.

Tool blast radius & least privilege

  • ProcessStarterTool runs under the caller’s authentication — it can start any process that user is authorized to start, with arguments the model chose (possibly influenced by untrusted input). Run agents under a least-privilege identity, and rely on the engine’s authorization layer (the tool does not elevate privileges). See Tools.
  • toolClasses and mcpServers are privileged configuration. Any class you list is loaded and its @Tool methods exposed; all tools from a configured MCP server are registered (no allowlist yet — see Known Limitations). Restrict who can edit models and element templates accordingly.
  • Audit is your detection surface. Tool calls and ProcessStarterTool side-effects (resulting processInstanceId, executedAs) are recorded — monitor them. See Audit Trail.

Redaction ≠ non-disclosure

Content redaction (redactContent) replaces message content in the stored audit copy with a SHA-256 hash + length. It does not change what was transmitted to the LLM provider (or, for remote embeddings, to OpenAI). Redaction protects the engine-side record, not the data-in-transit to the model. Choose the provider/endpoint accordingly.

Security checklist (per deployment)

  • LLM endpoint and any MCP servers are approved for the data classification you’ll send them.
  • Embeddings: local model for sovereign data, or an approved remote endpoint.
  • API keys via env / secured variables — not hardcoded in templates; not in prompts.
  • Egress from the engine JVM to baseUrl/MCP URLs is allowed only where intended.
  • Tool-enabled agents run under least-privilege identities; toolset minimised per task.
  • Edit rights on process models / element templates are restricted (privileged config).
  • Audit retention meets your risk tier (chat-log kept, or external sink configured); EU AI Act specifics in COMPLIANCE.md.

On this Page: