Polling Client - Data Copy and Import
The Polling Client is a core component of CIB ins7ght responsible for continuously synchronizing data from CIB seven to the CIB ins7ght database. This section explains the architecture, data flow, and mechanisms behind this critical component.
Architecture Overview
The Polling Client runs as a scheduled background service within the CIB ins7ght application. It retrieves process data from CIB seven through the REST API and stores it in the CIB ins7ght database for analysis.
Key Components:
- Polling Scheduler: Orchestrates the polling cycles and manages timing
- Data Provider: Interfaces with CIB seven REST API to fetch data
- Data Transformer: Converts CIB seven data format to CIB ins7ght schema
- Data Persister: Stores transformed data into the CIB ins7ght database
Data Flow
CIB seven Database (Read-Only)
↓
REST API Layer
↓
Polling Scheduler
↓
Data Providers
↓
Data Transformers
↓
CIB ins7ght Database
↓
Analytics Engine
Polling Mechanism
Polling Cycle
The Polling Client operates in continuous cycles:
- Trigger: The scheduler initiates a polling cycle based on the configured interval
- Fetch: Each data provider retrieves new or updated records from CIB seven
- Transform: Data is converted to the CIB ins7ght schema
- Persist: Transformed data is saved to the CIB ins7ght database
- Wait: The system waits for the next polling interval
Default Polling Interval: 10 seconds (configurable)
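The cycle above can be sketched as a simple loop. This is a minimal illustration, not the actual CIB ins7ght implementation: the function name `run_polling_cycles` and the `fetch`, `transform`, and `persist` callables are hypothetical placeholders, and the `max_cycles` and `sleep` parameters exist only so the loop can be exercised without waiting.

```python
import time
from typing import Callable, Iterable, List, Optional


def run_polling_cycles(
    fetch: Callable[[], Iterable[dict]],
    transform: Callable[[dict], dict],
    persist: Callable[[List[dict]], None],
    interval_seconds: float = 10.0,
    max_cycles: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run trigger -> fetch -> transform -> persist -> wait.

    All callables are hypothetical stand-ins for the real components.
    Returns the number of completed cycles.
    """
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        # Fetch and transform: pull records and convert each one.
        records = [transform(record) for record in fetch()]
        # Persist: write only when the cycle produced something.
        if records:
            persist(records)
        cycles += 1
        # Wait: pause until the next polling interval.
        sleep(interval_seconds)
    return cycles
```

Passing `sleep` as a parameter keeps the timing behavior replaceable, which makes the loop easy to test with a recording function instead of a real delay.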
Incremental Synchronization
After the initial full import, the Polling Client uses incremental synchronization:
- Tracks the last successfully imported timestamp for each data type
- Only fetches records created or modified after the last sync
- Minimizes network traffic and processing overhead
- Ensures efficient real-time data updates
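One way to picture the per-type timestamp tracking is a watermark dictionary that only moves forward when new records arrive. The sketch below is an assumption about the general technique, not the product's code: `sync_incrementally`, the `fetch_since` callable, and the `modified` field are hypothetical names chosen for illustration.

```python
from datetime import datetime
from typing import Callable, Dict, List, Optional


def sync_incrementally(
    data_type: str,
    fetch_since: Callable[[Optional[datetime]], List[dict]],
    watermarks: Dict[str, datetime],
) -> List[dict]:
    """Fetch only records newer than the stored watermark for data_type.

    fetch_since(None) is assumed to return everything (the initial import);
    fetch_since(ts) is assumed to return records modified after ts.
    """
    since = watermarks.get(data_type)
    records = fetch_since(since)
    if records:
        # Advance the watermark to the newest record that was seen.
        watermarks[data_type] = max(r["modified"] for r in records)
    return records
```

With this shape, the first call for a data type behaves like a full import, and every later call returns only the delta since the previous successful call.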
Data Sources
The Polling Client retrieves data from CIB seven via REST API:
1. Process Instances
Purpose: Historical process instance information including execution status and timing
Key Data Elements:
- Process instance identifiers
- Process definition references
- Start and end timestamps
- Duration metrics
- Current state (ACTIVE, COMPLETED, CANCELED)
Synchronization: Incremental updates based on timestamp, fetching only new or modified instances since last poll.
2. Activity Instances
Purpose: Individual activity execution data for detailed performance analysis
Key Data Elements:
- Activity instance identifiers
- Parent process instance reference
- Activity ID and name from process definition
- Start and end timestamps
- Duration metrics
Synchronization: Continuous sync of activity-level execution data to enable bottleneck analysis.
3. Process Definitions
Purpose: Process definition metadata including versioning information
Key Data Elements:
- Process definition identifiers
- Process keys and names
- Version numbers
- Deployment references
Synchronization: Triggered when new process versions are deployed to CIB seven.
4. Incidents
Purpose: Process execution incidents and error tracking
Key Data Elements:
- Incident identifiers
- Process instance references
- Incident types and categories
- Error messages and details
- Timestamp of occurrence
Synchronization: Real-time import of incidents to ensure immediate visibility into process issues.
5. BPMN Process Definitions
Purpose: Serialized BPMN diagrams for process visualization
Key Data Elements:
- BPMN XML content
- Resource names
- Deployment information
Synchronization: Fetched when new process definitions are detected or on-demand for visualization purposes.
Data Providers
Each data type has a dedicated provider component that specializes in fetching and processing specific types of data:
Common Provider Functionality:
- Connection management to CIB seven
- Error handling and retry logic
- Timestamp tracking for incremental sync
- Batch processing capabilities
Provider Types:
The system includes specialized providers for each data source:
- Process Instance Provider: Fetches process instance data
- Activity Instance Provider: Retrieves activity execution data
- Process Definition Provider: Imports process definitions
- Incident Provider: Collects incident information
- BPMN Definition Provider: Downloads BPMN process definitions
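The retry logic shared by the providers is not specified in detail here. As a rough sketch under that caveat, a retry wrapper with exponential backoff could look like the following; the function name `call_with_retry`, the choice of exponential backoff, and the decision to retry only on `ConnectionError` are all assumptions made for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on ConnectionError with exponential backoff.

    The delay doubles after each failed attempt: base_delay, 2x, 4x, ...
    The last failure is re-raised so the caller can log it.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```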
Data Transformation
Raw data from CIB seven is transformed to match the CIB ins7ght schema:
Transformation Steps:
- Field Mapping: CIB seven column names are mapped to CIB ins7ght equivalents
- Data Type Conversion: Date formats, numeric types are standardized
- Calculated Fields: Additional metrics are computed (e.g., normalized durations)
- Relationship Mapping: Foreign key relationships are established
- Validation: Data integrity checks are performed
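The steps above can be illustrated with a small function that maps fields, converts timestamps, computes a duration, and validates the result. This is a sketch only: the source field names follow the JSON typically returned for historic process instances, while the target column names, `FIELD_MAP`, and `transform_process_instance` are hypothetical and do not reflect the real CIB ins7ght schema.

```python
from datetime import datetime, timedelta

# Hypothetical mapping from source field names to target column names.
FIELD_MAP = {
    "id": "process_instance_id",
    "processDefinitionId": "process_definition_id",
    "startTime": "started_at",
    "endTime": "ended_at",
    "state": "state",
}


def transform_process_instance(raw: dict) -> dict:
    """Map, convert, enrich, and validate one raw record."""
    # Field mapping: rename source fields to target columns.
    row = {target: raw.get(source) for source, target in FIELD_MAP.items()}

    # Data type conversion: ISO 8601 strings become datetime objects.
    # Offsets are expected in +HH:MM form for this simple parser.
    for column in ("started_at", "ended_at"):
        if row[column] is not None:
            row[column] = datetime.fromisoformat(row[column])

    # Validation: required fields and chronological order.
    if not row["process_instance_id"] or row["started_at"] is None:
        raise ValueError("record is missing an identifier or start time")
    if row["ended_at"] is not None and row["ended_at"] < row["started_at"]:
        raise ValueError("end time precedes start time")

    # Calculated field: duration in whole milliseconds, if finished.
    row["duration_ms"] = (
        (row["ended_at"] - row["started_at"]) // timedelta(milliseconds=1)
        if row["ended_at"] is not None
        else None
    )
    return row
```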
Initial Import
When CIB ins7ght is first connected to a CIB seven instance:
- Full Historical Import: All existing data is retrieved from CIB seven
- Baseline Establishment: The system establishes a complete historical dataset
- Index Creation: Database indexes are created for optimal query performance
- Statistics Calculation: Initial KPIs and metrics are computed
Note: Initial import duration depends on data volume. For large datasets (millions of records), this process may take several hours.
Error Handling and Resilience
The Polling Client handles failures and supports monitoring as follows:
Failure Recovery
- Transaction Management: Database operations use transactions
- State Persistence: Last successful sync timestamp is tracked in the database
- Resume from Last Success: Polling resumes from the last known good state after failures
- Error Logging: Errors are logged for troubleshooting
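The combination of transactions and a persisted sync timestamp can be illustrated with SQLite from the Python standard library. This is a sketch of the general idea, assuming the data and the timestamp are written in the same transaction. The table names `incident` and `sync_state` and all function names are hypothetical, and the real database and schema will differ.

```python
import sqlite3
from typing import List, Optional, Tuple


def init_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical data table and a sync-state table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS incident (id TEXT PRIMARY KEY, message TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sync_state "
        "(data_type TEXT PRIMARY KEY, last_sync TEXT)"
    )
    conn.commit()


def persist_batch(
    conn: sqlite3.Connection,
    data_type: str,
    rows: List[Tuple[str, str]],
    fetched_until: str,
) -> None:
    """Write the rows and the new sync timestamp atomically."""
    # The connection context manager commits on success and rolls back
    # on any exception, so the timestamp never advances past failed data.
    with conn:
        conn.executemany("INSERT INTO incident (id, message) VALUES (?, ?)", rows)
        conn.execute(
            "INSERT OR REPLACE INTO sync_state (data_type, last_sync) VALUES (?, ?)",
            (data_type, fetched_until),
        )


def last_sync(conn: sqlite3.Connection, data_type: str) -> Optional[str]:
    """Return the last committed sync timestamp, or None before the first sync."""
    row = conn.execute(
        "SELECT last_sync FROM sync_state WHERE data_type = ?", (data_type,)
    ).fetchone()
    return row[0] if row else None
```

Because the timestamp only changes when the batch commits, a restart after a failure reads the previous value from `sync_state` and fetches the failed window again.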
Monitoring
- Logs: Detailed logging for troubleshooting and monitoring polling activity
Performance Optimization
Batch Processing
- Data is fetched in configurable page sizes
- Default page sizes:
  - Deployments: 500
  - Activity Instances: 100,000
  - Process Instances: 500
  - Incidents: 500
- Reduces memory footprint
- Optimizes network utilization
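Paged fetching can be sketched as a generator that requests one page at a time until a short or empty page signals the end. The helper `fetch_all_pages` and the `fetch_page(first_result, max_results)` callable are hypothetical names. The offset/limit style is an assumption about how the underlying REST calls are paged.

```python
from typing import Callable, Iterator, List


def fetch_all_pages(
    fetch_page: Callable[[int, int], List[dict]],
    page_size: int = 500,
) -> Iterator[List[dict]]:
    """Yield pages of at most page_size records until the source is exhausted.

    Only one page is held in memory at a time, which is the point of paging.
    """
    first_result = 0
    while True:
        page = fetch_page(first_result, page_size)
        if not page:
            return
        yield page
        # A short page means there is nothing left to fetch.
        if len(page) < page_size:
            return
        first_result += len(page)
```

A caller would typically transform and persist each yielded page before requesting the next one, so memory use stays bounded by the page size.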
Configuration
Key configuration properties for the Polling Client:
polling:
  time: 10 # Polling interval in seconds
  engineRestUrl: http://localhost:8080/engine-rest
  maxPageSizeDeployments: 500 # Max deployments per request
  maxPageSizeActivityInstances: 100000 # Max activity instances per request
  maxPageSizeProcessInstances: 500 # Max process instances per request
  maxPageSizeIncidents: 500 # Max incidents per request
  batchIntervalInHours: 24 # Batch processing interval
  auth:
    type: basic # Authentication type: basic, webclient, or sso
    basic:
      username: demo
      password: demo
See Configuration Guide for detailed authentication options.
Monitoring and Troubleshooting
Check Polling Status
Polling Info Endpoint:
GET /pollinfo
This endpoint returns the current polling status and timestamps.
Response Example:
[
  {
    "id": "uuid",
    "fetchFrom": "2025-10-29T10:00:00Z",
    "fetchedUntil": "2025-10-29T10:30:00Z",
    "isPollingSince": "2025-10-29T10:30:15Z"
  }
]
Fields:
- fetchFrom: Last timestamp from which data was fetched
- fetchedUntil: Last timestamp until which data was fetched
- isPollingSince: Timestamp when current polling cycle started
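As an illustration of how the response could be used for monitoring, the sketch below parses a `/pollinfo` body and reports the size of the fetched window and how far `fetchedUntil` lags behind a given reference time. The helper names `parse_ts` and `summarize_poll_info` are hypothetical, and the interpretation of the fields follows the descriptions above rather than a formal API contract.

```python
import json
from datetime import datetime
from typing import List


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_poll_info(body: str, now: datetime) -> List[dict]:
    """Summarize each polling entry from a /pollinfo response body."""
    summaries = []
    for entry in json.loads(body):
        fetch_from = parse_ts(entry["fetchFrom"])
        fetched_until = parse_ts(entry["fetchedUntil"])
        summaries.append({
            "id": entry["id"],
            # Length of the time window covered by the last fetch.
            "window_seconds": (fetched_until - fetch_from).total_seconds(),
            # How far the imported data lags behind the reference time.
            "lag_seconds": (now - fetched_until).total_seconds(),
        })
    return summaries
```

A steadily growing lag value across several checks would suggest that polling has stalled or cannot keep up with the incoming data.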
Application Logs:
Monitor polling activity through application logs for detailed information:
- Polling start and completion times
- Number of records fetched per data type
- Connection status to CIB seven
- Any errors or warnings
Common Issues
Polling Not Starting:
- Check CIB seven connection settings (polling.engineRestUrl)
- Verify authentication configuration (polling.auth)
- Review application logs for errors
- Ensure CIB seven REST API is accessible
Slow Import Performance:
- Increase page size parameters for larger datasets
- Check network latency to CIB seven
- Review database performance and indexing
- Verify CIB seven server performance
Data Inconsistencies:
- Verify CIB seven version compatibility
- Check for schema changes in CIB seven
- Review transformation logic for errors
Security Considerations
- Polling Client accesses CIB seven via REST API
- Credentials should be stored as environment variables
- HTTPS/TLS can be configured for encrypted connections to CIB seven
- Application logs record polling activity
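To illustrate the recommendation about environment variables, the following sketch builds an HTTP Basic authorization header from values read at runtime instead of from a configuration file. The variable names `POLLING_AUTH_BASIC_USERNAME` and `POLLING_AUTH_BASIC_PASSWORD` and the function `basic_auth_header` are hypothetical. Consult the Configuration Guide for the names the application actually reads.

```python
import base64
import os
from typing import Mapping


def basic_auth_header(env: Mapping[str, str] = os.environ) -> dict:
    """Build a Basic auth header from environment variables.

    The variable names are hypothetical and used only for illustration.
    """
    username = env.get("POLLING_AUTH_BASIC_USERNAME")
    password = env.get("POLLING_AUTH_BASIC_PASSWORD")
    if not username or not password:
        # Fail fast instead of sending unauthenticated requests.
        raise RuntimeError("polling credentials are not set in the environment")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

Basic authentication only encodes the credentials, so it should be combined with HTTPS/TLS for the connection to CIB seven.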