Connections

Create and manage your data connections for reliable, timely data integration workflows.

Connections are the core of your Precog workspace, controlling how your data flows from sources to destinations. They ensure your data moves at the right time, with the right frequency, and in the right way to meet your business requirements for data freshness and system efficiency.

Understanding Connections

A connection in Precog defines what data flows from your sources to your destinations, and when. A connection can run as a one-time load, on a recurring schedule, or in response to specific events.

Connection Types

Time-Based Schedules

  • Fixed intervals - Hourly, daily, weekly, monthly execution patterns
  • Cron expressions - Precise timing control using standard cron syntax
  • Business calendars - Skip holidays, weekends, or other business-specific dates
  • Timezone handling - Execute based on business timezone requirements

Event-Driven Schedules

  • Data arrival triggers - Execute when new data appears in source systems
  • API notifications - Respond to external system notifications

Dependency-Based Schedules

  • Sequential execution - Run schedules in specific order with dependencies
  • Parallel processing - Execute multiple independent workflows simultaneously
  • Conditional logic - Run schedules based on data conditions or previous results
  • Fan-out patterns - Trigger multiple downstream schedules from one completion

Manual and On-Demand

  • Manual triggers - Execute schedules immediately through the interface
  • API-triggered - Start schedules programmatically through API calls
  • Testing schedules - Run schedules with sample data for validation
  • Recovery runs - Re-execute failed or missed schedule runs

Creating and Configuring Schedules

Basic Schedule Setup

Simple Time-Based Schedule

# Daily schedule example
Name: 'Daily Customer Sync'
Description: 'Sync customer data from CRM to warehouse'
Trigger: Time-based
Frequency: Daily at 2:00 AM
Timezone: America/New_York
Sources: [Salesforce CRM]
Destinations: [Snowflake DW]
Retry Policy: 3 attempts, 5 minutes apart
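Because this schedule ties 2:00 AM to a named timezone, the scheduler has to turn that wall-clock time into a concrete instant. Below is a minimal sketch of that step. It assumes a hypothetical helper, `next_daily_run`, which accepts any `tzinfo`. It is not Precog's implementation. In real use you would pass `zoneinfo.ZoneInfo("America/New_York")` so that daylight saving time is applied.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo


def next_daily_run(now: datetime, tz: tzinfo, hour: int, minute: int = 0) -> datetime:
    """Return the next run (in UTC) for a daily wall-clock time in `tz`.

    Hypothetical helper; `now` must be timezone-aware.
    """
    local_now = now.astimezone(tz)
    # Today's run at the configured local wall-clock time.
    candidate = datetime.combine(local_now.date(), time(hour, minute), tzinfo=tz)
    if candidate <= local_now:
        # Today's slot has already passed: use the same local time tomorrow.
        candidate = datetime.combine(
            local_now.date() + timedelta(days=1), time(hour, minute), tzinfo=tz
        )
    return candidate.astimezone(timezone.utc)
```

With `ZoneInfo("America/New_York")`, this returns 07:00 UTC in winter and 06:00 UTC in summer. On the spring-forward date, 2:00 AM does not exist in New York. With `zoneinfo` and the default `fold=0`, that gap time is interpreted using the offset in effect before the change, so the run lands at 07:00 UTC (3:00 AM EDT) that day.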

Cron Expression Schedule

# Complex cron schedule
Name: "Business Hours Sync"
Description: "Every 2 hours from 8 AM to 6 PM, Monday to Friday"
Trigger: Cron expression
Cron: "0 8-18/2 * * MON-FRI"
Timezone: America/Chicago
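Standard five-field cron has these fields, in order: minute, hour, day of month, month, and day of week. In that syntax, "every 2 hours from 8 AM to 6 PM, Monday to Friday" is written `0 8-18/2 * * MON-FRI`. The small matcher below is a sketch that makes the field rules concrete. `expand_field` and `cron_matches` are hypothetical helpers, not Precog's parser. They do not handle month names, seconds fields, or extensions such as `L`, `W`, and `#`.

```python
from datetime import datetime

DAY_NAMES = {"SUN": 0, "MON": 1, "TUE": 2, "WED": 3, "THU": 4, "FRI": 5, "SAT": 6}


def _value(token: str) -> int:
    """Parse a numeric field value or a three-letter day name."""
    return DAY_NAMES[token.upper()] if token.upper() in DAY_NAMES else int(token)


def expand_field(field: str, low: int, high: int) -> set[int]:
    """Expand one cron field (e.g. '8-18/2', '*/15', 'SAT,SUN') into a set of ints."""
    values: set[int] = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            first, last = base.split("-")
            start, end = _value(first), _value(last)
        else:
            start = _value(base)
            # "5/15" means "from 5 to the end of the range, every 15".
            end = high if step_text else start
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` (to the minute) matches a five-field cron expression."""
    minute, hour, dom, month, dow = expr.split()
    cron_weekday = (when.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    dow_values = {d % 7 for d in expand_field(dow, 0, 7)}  # 7 also means Sunday
    dom_ok = when.day in expand_field(dom, 1, 31)
    dow_ok = cron_weekday in dow_values
    # When both day fields are restricted, classic cron matches if either does.
    day_ok = (dom_ok or dow_ok) if dom != "*" and dow != "*" else (dom_ok and dow_ok)
    return (
        when.minute in expand_field(minute, 0, 59)
        and when.hour in expand_field(hour, 0, 23)
        and when.month in expand_field(month, 1, 12)
        and day_ok
    )
```

For example, Monday 15 January 2024 at 10:00 matches `0 8-18/2 * * MON-FRI`. Neither 9:00 that day nor 10:00 on Saturday 13 January matches. A scheduler would evaluate the expression against wall-clock time in the configured timezone (America/Chicago here).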

Event-Driven Schedule
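An event-driven schedule runs when new data shows up rather than at a fixed time. A common way to implement a data arrival trigger is a high-water mark. The trigger remembers the newest timestamp it has already synced and starts a run only when the source reports something newer. The sketch below assumes a hypothetical `SourceObject` listing and a hypothetical `DataArrivalTrigger` class. Precog's own triggers may rely on pushed API notifications instead of this polling style.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class SourceObject:
    """Hypothetical description of a file or partition seen in a source system."""
    name: str
    modified_at: datetime


class DataArrivalTrigger:
    """Run a sync only when objects newer than the stored watermark appear."""

    def __init__(self, run_sync: Callable[[List[SourceObject]], None]) -> None:
        self.run_sync = run_sync
        self.watermark: Optional[datetime] = None  # newest timestamp already synced

    def check(self, listing: Iterable[SourceObject]) -> bool:
        """Inspect the current source listing; return True if a sync was started."""
        new_objects = [
            obj for obj in listing
            if self.watermark is None or obj.modified_at > self.watermark
        ]
        if not new_objects:
            return False  # nothing new has arrived
        self.run_sync(new_objects)
        # Advance the watermark only after run_sync returns without raising.
        self.watermark = max(obj.modified_at for obj in new_objects)
        return True
```

One limitation of this design is the strict `>` comparison. It can miss an object that arrives late with the same timestamp as the watermark, so production systems often also track which object names have been processed.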

Advanced Scheduling Features

Schedule Dependencies

# Dependency chain example
Schedule 1:
  Name: 'Load Base Data'
  Trigger: Daily at 1:00 AM

Schedule 2:
  Name: 'Calculate Metrics'
  Trigger: Dependency completion
  Depends On: ['Load Base Data']

Schedule 3:
  Name: 'Generate Reports'
  Trigger: Dependency completion
  Depends On: ['Calculate Metrics']
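The dependency chain above forms a directed acyclic graph, and any valid run order is a topological sort of that graph. The sketch below uses Python's standard `graphlib` module. The `dependencies` mapping mirrors the three example schedules. `execution_waves` is a hypothetical helper that also groups schedules that do not depend on each other, which is the basis for parallel and fan-out execution.

```python
from graphlib import TopologicalSorter

# Each schedule maps to the set of schedules it depends on.
dependencies: dict[str, set[str]] = {
    "Load Base Data": set(),
    "Calculate Metrics": {"Load Base Data"},
    "Generate Reports": {"Calculate Metrics"},
}


def execution_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group schedules into waves; every schedule in a wave can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # A real scheduler would call done() as each individual run finishes.
        sorter.done(*ready)
    return waves
```

In the chain above, each wave holds one schedule. If you added a fourth schedule that depends only on 'Load Base Data', it would land in the second wave next to 'Calculate Metrics'.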

Conditional Execution

# Conditional schedule
Name: 'Weekend Full Refresh'
Description: 'Full data refresh only on weekends'
Trigger: Time-based
Cron: '0 2 * * SAT,SUN'
Conditions:
  - Check: Previous weekday incremental runs completed
  - Action: Run full refresh if conditions met
  - Fallback: Skip and alert if conditions not met
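The condition block above can be read as a small decision function over execution history. In this sketch, `weekday_runs` is a hypothetical mapping from each date to the status of that day's incremental run. The hypothetical `weekend_refresh_decision` returns whichever of the two configured outcomes applies.

```python
from datetime import date, timedelta


def weekend_refresh_decision(today: date, weekday_runs: dict[date, str]) -> str:
    """Decide whether the weekend full refresh should run.

    Returns "run_full_refresh" when Monday to Friday of the current week all
    have a "succeeded" incremental run, otherwise "skip_and_alert".
    """
    if today.weekday() < 5:
        raise ValueError("expected a Saturday or Sunday")
    # Monday of the current week (weekday() is 0 for Monday).
    monday = today - timedelta(days=today.weekday())
    weekdays = [monday + timedelta(days=offset) for offset in range(5)]
    if all(weekday_runs.get(day) == "succeeded" for day in weekdays):
        return "run_full_refresh"
    return "skip_and_alert"
```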

Parallel Processing

# Parallel schedule execution
Name: 'Multi-Source Sync'
Description: 'Sync multiple sources simultaneously'
Execution Mode: Parallel
Sub-Schedules:
  - CRM Data Sync
  - Marketing Data Sync
  - Support Data Sync
Max Parallel: 3
Timeout: 2 hours per sub-schedule
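One way to express `Max Parallel: 3` plus a per-sub-schedule timeout is a semaphore combined with a timeout around each task. The sketch below uses Python's `asyncio`. `run_sub_schedule` is a hypothetical stand-in that sleeps instead of syncing data. Each sub-schedule's timeout starts only once it holds one of the parallel slots.

```python
import asyncio


async def run_sub_schedule(name: str, work_seconds: float) -> str:
    """Hypothetical stand-in for a real sync; sleeps instead of moving data."""
    await asyncio.sleep(work_seconds)
    return f"{name}: done"


async def run_parallel(
    jobs: list[tuple[str, float]], max_parallel: int, timeout_s: float
) -> list[str]:
    semaphore = asyncio.Semaphore(max_parallel)  # caps concurrent sub-schedules

    async def guarded(name: str, work_seconds: float) -> str:
        async with semaphore:
            try:
                # wait_for cancels the sub-schedule if it exceeds the timeout.
                return await asyncio.wait_for(run_sub_schedule(name, work_seconds), timeout_s)
            except asyncio.TimeoutError:
                return f"{name}: timed out"

    # gather returns results in the same order as the input jobs.
    return await asyncio.gather(*(guarded(name, secs) for name, secs in jobs))
```

With the example's settings, you would call `run_parallel(jobs, max_parallel=3, timeout_s=2 * 60 * 60)`. A sub-schedule that times out is cancelled and reported, and the others keep running.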

Error Handling and Reliability

Retry Configuration

  • Retry attempts - Number of automatic retry attempts for failed executions
  • Retry intervals - Time between retry attempts (fixed or exponential backoff)
  • Retry conditions - Which types of errors should trigger retries
  • Escalation - Actions to take when all retry attempts are exhausted
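The four retry settings map onto a small wrapper function, sketched here under a few assumptions:

- `TransientError` is a hypothetical error class standing in for retryable failures.
- `attempts` counts the total number of tries, including the first one.
- `escalate` is whatever alerting hook you configure.

The `sleep` parameter is injectable so the behavior can be tested without actually waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error class for failures worth retrying (timeouts, throttling)."""


def run_with_retries(
    task: Callable[[], T],
    attempts: int = 3,
    base_delay_s: float = 300.0,
    backoff: str = "fixed",
    retry_on: tuple[type[BaseException], ...] = (TransientError,),
    escalate: Optional[Callable[[BaseException], None]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `task`, retrying only the error types listed in `retry_on`.

    backoff="fixed" waits base_delay_s between tries; "exponential" doubles
    the wait after each failure. Other errors propagate immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except retry_on as exc:
            if attempt == attempts:
                if escalate is not None:
                    escalate(exc)  # all tries exhausted: hand off to alerting
                raise
            factor = 1 if backoff == "fixed" else 2 ** (attempt - 1)
            sleep(base_delay_s * factor)
    raise AssertionError("unreachable")
```

The daily schedule's `Retry Policy: 3 attempts, 5 minutes apart` corresponds roughly to `run_with_retries(task, attempts=3, base_delay_s=300)`, assuming the three attempts include the first try. Errors outside `retry_on` fail immediately. An invalid credential is one example, since retrying it would only repeat the same failure.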

Timeout and Resource Management

  • Execution timeout - Maximum time allowed for schedule completion
  • Resource limits - CPU, memory, and connection limits for schedule execution
  • Concurrency control - Prevent overlapping executions of the same schedule
  • Queue management - Handle multiple scheduled executions efficiently
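Concurrency control usually means a new run of a schedule is skipped (or queued) while its previous run is still in progress. The sketch below shows the skip variant within a single process, using a hypothetical `NonOverlappingRunner`. A deployment with several workers would need a shared lock instead, such as a database row lock, which this code does not provide.

```python
import threading
from typing import Callable


class NonOverlappingRunner:
    """Skip a schedule's run if its previous run is still in progress (one process only)."""

    def __init__(self) -> None:
        self._locks = {}  # schedule name -> threading.Lock
        self._registry_guard = threading.Lock()

    def _lock_for(self, schedule_name: str) -> threading.Lock:
        # Guard the dict so two threads never create different locks for one name.
        with self._registry_guard:
            return self._locks.setdefault(schedule_name, threading.Lock())

    def try_run(self, schedule_name: str, task: Callable[[], None]) -> bool:
        """Run `task` unless the same schedule is already running; return True if it ran."""
        lock = self._lock_for(schedule_name)
        if not lock.acquire(blocking=False):
            return False  # overlapping execution prevented
        try:
            task()
            return True
        finally:
            lock.release()
```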

Notification and Alerting

  • Success notifications - Confirm successful schedule completion
  • Failure alerts - Immediate notification of schedule failures
  • Performance warnings - Alert when execution times increase significantly
  • Data quality alerts - Notifications for data validation failures
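A performance warning needs a baseline to compare against. The sketch below flags a run that takes more than 1.5 times the median of recent durations. The helper name and both thresholds are hypothetical. It uses the median so that one unusually slow run in the history does not shift the baseline much.

```python
from statistics import median


def is_slow_run(
    latest_s: float, history_s: list[float], factor: float = 1.5, min_history: int = 5
) -> bool:
    """Flag a run whose duration exceeds `factor` times the median of recent runs.

    Returns False until at least `min_history` prior durations are available.
    """
    if len(history_s) < min_history:
        return False
    return latest_s > factor * median(history_s)
```

Suppose recent durations range from 580 to 620 seconds, with a median of 600. A 950-second run would then trigger a warning, and an 800-second run would not.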

Schedule Management

Monitoring and Performance

Execution Monitoring

  • Real-time status - View currently running schedules and their progress
  • Execution history - Complete log of all schedule runs with timing and results
  • Performance metrics - Track execution times, success rates, and resource usage
  • Capacity planning - Monitor system load and plan for schedule expansion

Performance Optimization

  • Timing optimization - Adjust schedule timing to balance freshness with system load
  • Resource allocation - Configure appropriate CPU, memory, and connection resources
  • Batch sizing - Optimize data processing batch sizes for efficiency
  • Parallel execution - Use parallelism to reduce overall processing time

Trend Analysis

  • Execution time trends - Track how schedule performance changes over time
  • Data volume trends - Monitor growth in data processing requirements
  • Error rate analysis - Identify patterns in schedule failures and issues
  • Resource utilization - Understand how schedules use system resources

Maintenance and Updates

Schedule Modifications

  • Timing changes - Update schedule frequency and timing as requirements change
  • Source and destination updates - Modify data flows without breaking dependencies
  • Configuration versioning - Track changes to schedule configuration over time
  • Testing changes - Validate schedule modifications in non-production environments

Capacity Management

  • Load balancing - Distribute schedule execution across available resources
  • Peak time management - Handle high-volume periods with appropriate scheduling
  • Resource scaling - Adjust system resources based on schedule requirements
  • Maintenance windows - Plan schedule downtime for system maintenance

Compliance and Auditing

  • Change tracking - Log all modifications to schedule configuration
  • Access control - Control who can modify or execute schedules
  • Audit trails - Maintain complete records of schedule executions and results
  • Compliance reporting - Generate reports for regulatory and business requirements

Common Scheduling Patterns

Business Intelligence Refresh

# Typical BI refresh pattern
Morning Data Preparation:
  - Time: 1:00 AM - 3:00 AM
  - Load overnight batch files
  - Sync changed records from operational systems
  - Run data quality validations

Business Hours Processing:
  - Time: 6:00 AM - 8:00 PM
  - Incremental updates every 15-30 minutes
  - Real-time processing for critical data
  - User-triggered ad-hoc refreshes

Evening Batch Processing:
  - Time: 8:00 PM - 12:00 AM
  - Full data reconciliation
  - Generate daily reports
  - Archive processed data

Multi-Region Data Sync

# Global data synchronization
Region 1 (US East):
  - Local time: 2:00 AM EST
  - Process North American data
  - Replicate to global warehouse

Region 2 (EU):
  - Local time: 2:00 AM CET
  - Process European data
  - Sync with US processed data

Region 3 (Asia Pacific):
  - Local time: 2:00 AM JST
  - Process APAC data
  - Consolidate global view

Global Aggregation:
  - UTC time: 10:00 AM
  - Combine all regional data
  - Generate global reports and dashboards
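The regional runs and the global aggregation only line up if every local 2:00 AM run falls before 10:00 AM UTC. The sketch below converts each region's run to UTC with fixed standard-time offsets that match the EST, CET, and JST labels. `REGION_OFFSETS` and `regional_runs_utc` are hypothetical names. A real schedule should use IANA zones through `zoneinfo`, such as `America/New_York`, `Europe/Berlin`, and `Asia/Tokyo`, so that EDT and CEST are applied automatically.

```python
from datetime import date, datetime, time, timedelta, timezone

# Fixed standard-time offsets matching the labels above. They ignore daylight
# saving time; use zoneinfo.ZoneInfo with IANA zone names in real schedules.
REGION_OFFSETS = {
    "US East (EST)": timezone(timedelta(hours=-5)),
    "EU (CET)": timezone(timedelta(hours=1)),
    "Asia Pacific (JST)": timezone(timedelta(hours=9)),
}


def regional_runs_utc(run_date: date, local_time: time = time(2, 0)) -> dict[str, datetime]:
    """Convert each region's local run time on `run_date` to UTC."""
    return {
        region: datetime.combine(run_date, local_time, tzinfo=tz).astimezone(timezone.utc)
        for region, tz in REGION_OFFSETS.items()
    }
```

For 15 January 2024, the runs fall at these UTC times:

- US East: 07:00 UTC.
- EU: 01:00 UTC.
- APAC: 17:00 UTC on 14 January.

All three come before the 10:00 UTC aggregation. In summer, the US and EU runs move one hour earlier in UTC, so the ordering still holds.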

Real-Time Processing Pipeline

# Streaming data processing
Continuous Ingestion:
  - Trigger: Event-driven (API notification)
  - Frequency: Real-time processing
  - Batch: Micro-batches every 30 seconds
  - Latency Target: < 1 minute end-to-end

Hourly Aggregation:
  - Trigger: Time-based (hourly)
  - Process: Aggregate micro-batch results
  - Output: Hourly summaries and metrics

Daily Reconciliation:
  - Trigger: Daily at 12:01 AM
  - Process: Compare real-time vs batch processing
  - Output: Data quality reports and corrections
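The 30-second micro-batches in the continuous ingestion stage behave like tumbling windows: each event falls into exactly one fixed, non-overlapping interval. Below is a minimal sketch with hypothetical helpers. It assumes events arrive as `(timestamp, payload)` pairs with timezone-aware timestamps. The second function shows how the hourly aggregation stage could roll those batches up.

```python
from collections import defaultdict
from datetime import datetime, timezone


def window_start(ts: datetime, window_s: int) -> datetime:
    """Start of the tumbling window containing `ts` (ts must be timezone-aware)."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % window_s, tz=timezone.utc)


def micro_batches(
    events: list[tuple[datetime, dict]], window_s: int = 30
) -> dict[datetime, list[dict]]:
    """Group (timestamp, payload) events into fixed, non-overlapping windows."""
    batches: dict[datetime, list[dict]] = defaultdict(list)
    for ts, payload in events:
        batches[window_start(ts, window_s)].append(payload)
    return dict(batches)


def hourly_counts(batches: dict[datetime, list[dict]]) -> dict[datetime, int]:
    """Roll micro-batch sizes up into per-hour event counts."""
    counts: dict[datetime, int] = defaultdict(int)
    for start, payloads in batches.items():
        counts[start.replace(minute=0, second=0)] += len(payloads)
    return dict(counts)
```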

Troubleshooting Common Issues

Schedule Execution Problems

Schedule Not Running

  • Verify schedule is enabled and not in maintenance mode
  • Check that source systems are accessible and functioning
  • Confirm authentication credentials are valid and not expired
  • Review resource availability and system capacity

Missed Executions

  • Check system load during scheduled execution times
  • Verify timezone configuration and daylight saving time handling
  • Review concurrent execution limits and queue management
  • Analyze system maintenance windows that might affect scheduling

Partial Failures

  • Monitor individual components of complex schedules
  • Check dependencies and ensure prerequisite schedules completed successfully
  • Verify data availability and quality in source systems
  • Review timeout settings and execution time limits

Performance Issues

Slow Schedule Execution

  • Analyze data volumes and processing complexity
  • Review network connectivity and bandwidth between systems
  • Optimize query performance and data transfer efficiency
  • Consider breaking large schedules into smaller, parallel components

Resource Contention

  • Monitor system resource usage during peak schedule times
  • Distribute schedule execution across time periods
  • Implement resource limits and priority scheduling
  • Consider upgrading system capacity for critical schedules

Cascading Delays

  • Review schedule dependencies and identify bottlenecks
  • Implement timeout and failure handling to prevent cascade failures
  • Consider parallel execution where dependencies allow
  • Plan buffer time between dependent schedule executions

Data Quality Issues

Incomplete Data Processing

  • Verify source data availability at schedule execution time
  • Check for data validation failures that halt processing
  • Review batch sizing and timeout configurations
  • Implement data completeness checks and alerting
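A basic completeness check compares the number of rows the source reported for a load window with the number that reached the destination. The sketch below uses a hypothetical `completeness_check` with an assumed tolerance parameter. It also fails when the destination has more rows than the source, since that usually points to duplicates rather than complete data.

```python
def completeness_check(
    source_rows: int, destination_rows: int, tolerance: float = 0.0
) -> tuple[bool, float]:
    """Compare row counts for one load window.

    Returns (passed, missing_fraction). `tolerance` is the largest acceptable
    fraction of missing rows; 0.0 demands an exact match.
    """
    if source_rows < 0 or destination_rows < 0:
        raise ValueError("row counts cannot be negative")
    if destination_rows > source_rows:
        # More rows than the source usually indicates duplicated loads.
        return False, 0.0
    if source_rows == 0:
        return True, 0.0
    missing_fraction = (source_rows - destination_rows) / source_rows
    return missing_fraction <= tolerance, missing_fraction
```

For example, loading 990 of 1,000 rows passes with a 2% tolerance but fails under the default exact-match setting.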

Data Consistency Problems

  • Ensure proper transaction handling in multi-step processes
  • Verify that dependencies execute in the correct order
  • Check for race conditions in parallel processing scenarios
  • Implement data reconciliation and validation steps

Timing-Related Issues

  • Coordinate schedule timing with source system data refresh cycles
  • Account for time zone differences in global data processing
  • Handle daylight saving time transitions appropriately
  • Plan for month-end, quarter-end, and year-end date handling

Next Steps

After setting up your schedules:

  1. Monitor History - Track schedule performance and execution results
  2. Configure Sources - Optimize sources for scheduled processing
  3. Set Up Destinations - Ensure destinations can handle scheduled data delivery
  4. Manage Settings - Configure workspace-wide scheduling preferences

For advanced scheduling scenarios or help troubleshooting specific schedule issues, consult your Precog workspace documentation or contact support.