Connection
Create and manage your data connections for reliable, timely data integration workflows.
Connections are the core of your Precog workspace. They control how data flows from sources to destinations, when it moves, and how often, so that you can balance data freshness against system load.
Understanding Connections
A connection in Precog defines what data flows from your sources to your destinations, and when. A connection can run as a simple one-time load, on a schedule at regular intervals, or in response to specific events.
Connection Types
Time-Based Schedules
- Fixed intervals - Hourly, daily, weekly, monthly execution patterns
- Cron expressions - Precise timing control using standard cron syntax
- Business calendars - Skip holidays, weekends, or other business-specific dates
- Timezone handling - Execute based on business timezone requirements
Event-Driven Schedules
- Data arrival triggers - Execute when new data appears in source systems
- API notifications - Respond to external system notifications
Dependency-Based Schedules
- Sequential execution - Run schedules in specific order with dependencies
- Parallel processing - Execute multiple independent workflows simultaneously
- Conditional logic - Run schedules based on data conditions or previous results
- Fan-out patterns - Trigger multiple downstream schedules from one completion
Manual and On-Demand
- Manual triggers - Execute schedules immediately through the interface
- API-triggered - Start schedules programmatically through API calls
- Testing schedules - Run schedules with sample data for validation
- Recovery runs - Re-execute failed or missed schedule runs
Creating and Configuring Schedules
Basic Schedule Setup
Simple Time-Based Schedule
# Daily schedule example
Name: 'Daily Customer Sync'
Description: 'Sync customer data from CRM to warehouse'
Trigger: Time-based
Frequency: Daily at 2:00 AM
Timezone: America/New_York
Sources: [Salesforce CRM]
Destinations: [Snowflake DW]
Retry Policy: 3 attempts, 5 minutes apart
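To make the timezone setting concrete, here is a minimal Python sketch that computes the next run of a daily 2:00 AM schedule. It assumes the run time is interpreted in the configured Timezone (America/New_York); the function name next_daily_run is hypothetical and is not part of Precog. It illustrates the arithmetic only, not how Precog evaluates schedules internally.

```python
from datetime import datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo


def next_daily_run(now_utc: datetime, local_time: time, tz_name: str) -> datetime:
    """Return the next run instant (in UTC) for a daily schedule set in local time."""
    tz = ZoneInfo(tz_name)
    now_local = now_utc.astimezone(tz)
    # Today's run time on the local wall clock.
    candidate = datetime.combine(now_local.date(), local_time, tzinfo=tz)
    if candidate <= now_local:
        # Today's slot has already passed, so use tomorrow's.
        candidate = datetime.combine(
            now_local.date() + timedelta(days=1), local_time, tzinfo=tz
        )
    return candidate.astimezone(timezone.utc)


winter_run = next_daily_run(
    datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc), time(2, 0), "America/New_York"
)
summer_run = next_daily_run(
    datetime(2024, 7, 15, 12, 0, tzinfo=timezone.utc), time(2, 0), "America/New_York"
)
print(winter_run.isoformat())  # 2024-01-16T07:00:00+00:00
print(summer_run.isoformat())  # 2024-07-16T06:00:00+00:00
```

The same local time maps to a different UTC instant in winter and in summer, which is why the schedule's timezone matters for downstream systems that work in UTC.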
Cron Expression Schedule
# Complex cron schedule
Name: "Business Hours Sync"
Description: "Sync every 2 hours during business hours"
Trigger: Cron expression
# Every 2 hours from 8 AM to 6 PM, Monday to Friday
Cron: "0 8-18/2 * * MON-FRI"
Timezone: America/Chicago
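To check what a cron expression actually matches, it can help to expand its fields by hand. The Python sketch below expands the hour and day-of-week fields of a standard five-field expression for the schedule described above (every 2 hours from 8 AM to 6 PM, Monday to Friday). The helper names are hypothetical, the parser covers only lists, ranges, steps, and `*`, and it is not Precog's cron implementation.

```python
DAY_NAMES = {"SUN": 0, "MON": 1, "TUE": 2, "WED": 3, "THU": 4, "FRI": 5, "SAT": 6}


def _to_int(token: str) -> int:
    """Convert a numeric token or a three-letter day name to an integer."""
    token = token.upper()
    return DAY_NAMES[token] if token in DAY_NAMES else int(token)


def expand_field(field: str, low: int, high: int) -> list[int]:
    """Expand one cron field into the sorted list of values it matches."""
    values = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            first, last = base.split("-")
            start, end = _to_int(first), _to_int(last)
        else:
            start = _to_int(base)
            # "N/step" runs from N to the field maximum; a bare "N" is just N.
            end = high if step_text else start
        values.update(range(start, end + 1, step))
    return sorted(values)


# Fields: minute, hour, day of month, month, day of week.
minute, hour, day_of_month, month, day_of_week = "0 8-18/2 * * MON-FRI".split()
run_hours = expand_field(hour, 0, 23)
run_days = expand_field(day_of_week, 0, 6)
print(run_hours)  # [8, 10, 12, 14, 16, 18]
print(run_days)   # [1, 2, 3, 4, 5]
```

Note that the step belongs in the hour field. Writing `*/2` in the minute position would instead fire every 2 minutes within the matching hours.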
Event-Driven Schedule
Advanced Scheduling Features
Schedule Dependencies
# Dependency chain example
Schedule 1:
  Name: 'Load Base Data'
  Trigger: Daily at 1:00 AM
Schedule 2:
  Name: 'Calculate Metrics'
  Trigger: Dependency completion
  Depends On: ['Load Base Data']
Schedule 3:
  Name: 'Generate Reports'
  Trigger: Dependency completion
  Depends On: ['Calculate Metrics']
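A dependency chain like this is a directed graph, and a valid run order is a topological ordering of it. As a rough illustration (not Precog's scheduler), the sketch below uses Python's standard graphlib module to derive the order and to reject circular dependencies. The depends_on mapping mirrors the three schedules above; the function name execution_order is hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Each schedule maps to the schedules it depends on.
depends_on = {
    "Load Base Data": [],
    "Calculate Metrics": ["Load Base Data"],
    "Generate Reports": ["Calculate Metrics"],
}


def execution_order(graph: dict[str, list[str]]) -> list[str]:
    """Return schedules in an order where every dependency runs first."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # A cycle means no schedule in the loop could ever start.
        raise ValueError(f"circular dependency: {exc.args[1]}") from exc


run_order = execution_order(depends_on)
print(run_order)  # ['Load Base Data', 'Calculate Metrics', 'Generate Reports']
```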
Conditional Execution
# Conditional schedule
Name: 'Weekend Full Refresh'
Description: 'Full data refresh only on weekends'
Trigger: Time-based
Cron: '0 2 * * SAT,SUN'
Conditions:
- Check: Previous weekday incremental runs completed
- Action: Run full refresh if conditions met
- Fallback: Skip and alert if conditions not met
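The check, action, and fallback above amount to a small decision function. The following Python sketch shows one way to express it, assuming the scheduler can supply the set of weekdays whose incremental run succeeded. The names and the returned dictionary shape are illustrative assumptions, not a Precog API.

```python
WEEKDAYS = ["MON", "TUE", "WED", "THU", "FRI"]


def decide_weekend_refresh(completed_days: set[str]) -> dict:
    """Decide whether the weekend full refresh should run.

    completed_days holds the weekday codes whose incremental run succeeded.
    """
    missing = [day for day in WEEKDAYS if day not in completed_days]
    if missing:
        # Fallback: skip the refresh and report which runs are missing.
        return {"action": "skip_and_alert", "missing": missing}
    # Action: all preconditions met, run the full refresh.
    return {"action": "run_full_refresh", "missing": []}


print(decide_weekend_refresh({"MON", "TUE", "WED", "THU", "FRI"}))
print(decide_weekend_refresh({"MON", "TUE", "THU"}))
```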
Parallel Processing
# Parallel schedule execution
Name: 'Multi-Source Sync'
Description: 'Sync multiple sources simultaneously'
Execution Mode: Parallel
Sub-Schedules:
- CRM Data Sync
- Marketing Data Sync
- Support Data Sync
Max Parallel: 3
Timeout: 2 hours per sub-schedule
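The Max Parallel and Timeout settings can be pictured as a bounded worker pool. The sketch below is a simplified Python illustration using concurrent.futures; the sub-schedule callables are stand-ins, run_parallel is a hypothetical name, and the timeout here only bounds how long the caller waits for each result (it does not cancel a running worker thread).

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_parallel(sub_schedules: dict, max_parallel: int = 3,
                 timeout_seconds: float = 2 * 60 * 60) -> dict:
    """Run sub-schedules concurrently and collect a status per sub-schedule."""
    results = {}
    # max_workers caps how many sub-schedules execute at the same time.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {name: pool.submit(task) for name, task in sub_schedules.items()}
        for name, future in futures.items():
            try:
                results[name] = ("ok", future.result(timeout=timeout_seconds))
            except FutureTimeout:
                results[name] = ("timeout", None)
            except Exception as exc:  # one failure must not hide the others
                results[name] = ("failed", str(exc))
    return results


def failing_sync():
    raise RuntimeError("source unavailable")


results = run_parallel({
    "CRM Data Sync": lambda: 1200,        # stand-in: rows synced
    "Marketing Data Sync": lambda: 450,
    "Support Data Sync": failing_sync,
})
print(results)
```

Collecting a status per sub-schedule, rather than stopping at the first error, lets the parent schedule report a partial failure precisely.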
Error Handling and Reliability
Retry Configuration
- Retry attempts - Number of automatic retry attempts for failed executions
- Retry intervals - Time between retry attempts (fixed or exponential backoff)
- Retry conditions - Which types of errors should trigger retries
- Escalation - Actions to take when all retry attempts are exhausted
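The four retry settings fit together as a small loop. This Python sketch is one possible shape, under stated assumptions: the parameter names are hypothetical, `backoff=1.0` gives fixed intervals and `backoff=2.0` gives exponential backoff, and "escalation" is modeled simply as re-raising the last error once attempts are exhausted.

```python
import time


def run_with_retries(task, attempts=3, base_delay=300.0, backoff=1.0,
                     retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Run task, retrying on selected error types with a growing delay."""
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except retry_on:
            if attempt == attempts:
                # Retries exhausted: escalate by letting the error propagate.
                raise
            # Delay is base_delay, then base_delay * backoff, and so on.
            sleep(base_delay * backoff ** (attempt - 1))


# Demo with a task that fails twice, and a fake sleep that records delays.
calls = {"count": 0}
delays = []


def flaky_sync():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary network error")
    return "synced"


outcome = run_with_retries(flaky_sync, attempts=3, base_delay=5.0,
                           backoff=2.0, sleep=delays.append)
print(outcome, delays)  # synced [5.0, 10.0]
```

Errors outside `retry_on` (for example a configuration error) propagate immediately, which matches the idea that only some error types should trigger a retry.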
Timeout and Resource Management
- Execution timeout - Maximum time allowed for schedule completion
- Resource limits - CPU, memory, and connection limits for schedule execution
- Concurrency control - Prevent overlapping executions of the same schedule
- Queue management - Handle multiple scheduled executions efficiently
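Concurrency control is often implemented as a per-schedule lock: if a run is still in progress when the next one is due, the new run is skipped or queued instead of overlapping. The Python sketch below shows the skip variant for a single process. The class name OverlapGuard is hypothetical, and a real deployment would need a lock shared across workers (for example in a database), which this sketch does not cover.

```python
import threading


class OverlapGuard:
    """Allow at most one in-flight execution per schedule name."""

    def __init__(self):
        self._locks = {}
        self._registry_lock = threading.Lock()

    def try_run(self, schedule_name, task) -> bool:
        """Run task unless the schedule is already running. Return True if it ran."""
        with self._registry_lock:
            lock = self._locks.setdefault(schedule_name, threading.Lock())
        if not lock.acquire(blocking=False):
            return False  # a previous execution is still in progress
        try:
            task()
            return True
        finally:
            lock.release()


guard = OverlapGuard()
overlap_attempts = []

# While the outer run is active, a second trigger for the same schedule is refused.
outer_ran = guard.try_run(
    "Daily Customer Sync",
    lambda: overlap_attempts.append(guard.try_run("Daily Customer Sync", lambda: None)),
)
print(outer_ran, overlap_attempts)  # True [False]
```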
Notification and Alerting
- Success notifications - Confirm successful schedule completion
- Failure alerts - Immediate notification of schedule failures
- Performance warnings - Alert when execution times increase significantly
- Data quality alerts - Notifications for data validation failures
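A performance warning needs a definition of "significantly". One simple option, sketched below in Python, compares the latest execution time with the median of recent runs. The 1.5x factor and the five-run minimum are arbitrary illustrative assumptions, not Precog defaults.

```python
from statistics import median


def is_slow_run(history_seconds, latest_seconds, factor=1.5, min_history=5):
    """Return True when the latest run is much slower than the recent median."""
    if len(history_seconds) < min_history:
        return False  # not enough history to judge
    return latest_seconds > factor * median(history_seconds)


recent = [600, 620, 590, 610, 605]   # durations of recent runs, in seconds
print(is_slow_run(recent, 1000))     # True  (threshold is 1.5 * 605 = 907.5)
print(is_slow_run(recent, 650))      # False
```

Using the median rather than the mean keeps a single unusually slow run in the history from inflating the threshold.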
Schedule Management
Monitoring and Performance
Execution Monitoring
- Real-time status - View currently running schedules and their progress
- Execution history - Complete log of all schedule runs with timing and results
- Performance metrics - Track execution times, success rates, and resource usage
- Capacity planning - Monitor system load and plan for schedule expansion
Performance Optimization
- Timing optimization - Adjust schedule timing to balance freshness with system load
- Resource allocation - Configure appropriate CPU, memory, and connection resources
- Batch sizing - Optimize data processing batch sizes for efficiency
- Parallel execution - Use parallelism to reduce overall processing time
Trend Analysis
- Execution time trends - Track how schedule performance changes over time
- Data volume trends - Monitor growth in data processing requirements
- Error rate analysis - Identify patterns in schedule failures and issues
- Resource utilization - Understand how schedules use system resources
Maintenance and Updates
Schedule Modifications
- Timing changes - Update schedule frequency and timing as requirements change
- Source and destination updates - Modify data flows without breaking dependencies
- Configuration versioning - Track changes to schedule configuration over time
- Testing changes - Validate schedule modifications in non-production environments
Capacity Management
- Load balancing - Distribute schedule execution across available resources
- Peak time management - Handle high-volume periods with appropriate scheduling
- Resource scaling - Adjust system resources based on schedule requirements
- Maintenance windows - Plan schedule downtime for system maintenance
Compliance and Auditing
- Change tracking - Log all modifications to schedule configuration
- Access control - Control who can modify or execute schedules
- Audit trails - Maintain complete records of schedule executions and results
- Compliance reporting - Generate reports for regulatory and business requirements
Common Scheduling Patterns
Business Intelligence Refresh
# Typical BI refresh pattern
Morning Data Preparation:
- Time: 1:00 AM - 3:00 AM
- Load overnight batch files
- Sync changed records from operational systems
- Run data quality validations
Business Hours Processing:
- Time: 6:00 AM - 8:00 PM
- Incremental updates every 15-30 minutes
- Real-time processing for critical data
- User-triggered ad-hoc refreshes
Evening Batch Processing:
- Time: 8:00 PM - 12:00 AM
- Full data reconciliation
- Generate daily reports
- Archive processed data
Multi-Region Data Sync
# Global data synchronization
Region 1 (US East):
- Local time: 2:00 AM EST
- Process North American data
- Replicate to global warehouse
Region 2 (EU):
- Local time: 2:00 AM CET
- Process European data
- Sync with US processed data
Region 3 (Asia Pacific):
- Local time: 2:00 AM JST
- Process APAC data
- Consolidate global view
Global Aggregation:
- UTC time: 10:00 AM
- Combine all regional data
- Generate global reports and dashboards
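When regional runs are defined in local time, it is worth converting them to UTC to confirm that the global aggregation starts after every region. The Python sketch below does this with the fixed standard-time offsets implied by the abbreviations above (EST = UTC-5, CET = UTC+1, JST = UTC+9). During daylight saving periods the US and EU offsets shift by one hour, which this sketch deliberately ignores.

```python
from datetime import date, datetime, timedelta, timezone

# Fixed standard-time offsets, in hours from UTC.
REGION_OFFSETS = {"US East (EST)": -5, "EU (CET)": 1, "Asia Pacific (JST)": 9}


def local_run_in_utc(day: date, hour: int, offset_hours: int) -> datetime:
    """Convert a local run time on the given day to a UTC instant."""
    local_tz = timezone(timedelta(hours=offset_hours))
    local = datetime(day.year, day.month, day.day, hour, tzinfo=local_tz)
    return local.astimezone(timezone.utc)


day = date(2024, 1, 15)
regional_runs = {
    region: local_run_in_utc(day, 2, offset)
    for region, offset in REGION_OFFSETS.items()
}
aggregation = datetime(2024, 1, 15, 10, 0, tzinfo=timezone.utc)

for region, run in regional_runs.items():
    print(f"{region}: {run.isoformat()}")
all_started_before_aggregation = all(run < aggregation for run in regional_runs.values())
print(all_started_before_aggregation)  # True
```

With these offsets the latest regional start is US East at 07:00 UTC, which leaves three hours before the 10:00 AM UTC aggregation. Whether that is enough depends on how long the regional runs take.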
Real-Time Processing Pipeline
# Streaming data processing
Continuous Ingestion:
- Trigger: Event-driven (API notification)
- Frequency: Real-time processing
- Batch: Micro-batches every 30 seconds
- Latency Target: < 1 minute end-to-end
Hourly Aggregation:
- Trigger: Time-based (hourly)
- Process: Aggregate micro-batch results
- Output: Hourly summaries and metrics
Daily Reconciliation:
- Trigger: Daily at 12:01 AM
- Process: Compare real-time vs batch processing
- Output: Data quality reports and corrections
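The micro-batching step can be illustrated by grouping events into fixed 30-second windows keyed by the window's start time. This Python sketch assumes each event is a (timestamp in seconds, payload) pair; the event shape and the function name micro_batches are illustrative assumptions.

```python
from collections import defaultdict


def micro_batches(events, window_seconds=30):
    """Group (timestamp, payload) events into fixed-size time windows."""
    batches = defaultdict(list)
    for timestamp, payload in events:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        batches[window_start].append(payload)
    return dict(sorted(batches.items()))


events = [(0, "a"), (12.5, "b"), (30, "c"), (59.9, "d"), (61, "e")]
batched = micro_batches(events)
print(batched)  # {0: ['a', 'b'], 30: ['c', 'd'], 60: ['e']}
```

The hourly aggregation can then consume these window-keyed batches without needing to know when each individual event arrived.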
Troubleshooting Common Issues
Schedule Execution Problems
Schedule Not Running
- Verify schedule is enabled and not in maintenance mode
- Check that source systems are accessible and functioning
- Confirm authentication credentials are valid and not expired
- Review resource availability and system capacity
Missed Executions
- Check system load during scheduled execution times
- Verify timezone configuration and daylight saving time handling
- Review concurrent execution limits and queue management
- Analyze system maintenance windows that might affect scheduling
Partial Failures
- Monitor individual components of complex schedules
- Check dependencies and ensure prerequisite schedules completed successfully
- Verify data availability and quality in source systems
- Review timeout settings and execution time limits
Performance Issues
Slow Schedule Execution
- Analyze data volumes and processing complexity
- Review network connectivity and bandwidth between systems
- Optimize query performance and data transfer efficiency
- Consider breaking large schedules into smaller, parallel components
Resource Contention
- Monitor system resource usage during peak schedule times
- Distribute schedule execution across time periods
- Implement resource limits and priority scheduling
- Consider upgrading system capacity for critical schedules
Cascading Delays
- Review schedule dependencies and identify bottlenecks
- Implement timeout and failure handling to prevent cascade failures
- Consider parallel execution where dependencies allow
- Plan buffer time between dependent schedule executions
Data Quality Issues
Incomplete Data Processing
- Verify source data availability at schedule execution time
- Check for data validation failures that halt processing
- Review batch sizing and timeout configurations
- Implement data completeness checks and alerting
Data Consistency Problems
- Ensure proper transaction handling in multi-step processes
- Verify that dependencies execute in the correct order
- Check for race conditions in parallel processing scenarios
- Implement data reconciliation and validation steps
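A basic reconciliation step compares row counts between source and destination after a run and reports any table where they differ. The Python sketch below assumes the counts have already been collected into dictionaries keyed by table name; how they are collected, and the function name, are outside what this page specifies.

```python
def reconcile_counts(source_counts: dict, destination_counts: dict) -> dict:
    """Return {table: (source_rows, destination_rows)} for mismatched tables."""
    mismatches = {}
    for table in sorted(set(source_counts) | set(destination_counts)):
        source_rows = source_counts.get(table, 0)
        destination_rows = destination_counts.get(table, 0)
        if source_rows != destination_rows:
            mismatches[table] = (source_rows, destination_rows)
    return mismatches


mismatches = reconcile_counts(
    {"customers": 1000, "orders": 5400, "tickets": 80},
    {"customers": 1000, "orders": 5390},
)
print(mismatches)  # {'orders': (5400, 5390), 'tickets': (80, 0)}
```

Matching counts do not prove the data is identical, so this is a cheap first check rather than a full validation.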
Timing-Related Issues
- Coordinate schedule timing with source system data refresh cycles
- Account for time zone differences in global data processing
- Handle daylight saving time transitions appropriately
- Plan for month-end, quarter-end, and year-end date handling
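Daylight saving transitions create local times that either do not exist (clocks spring forward) or occur twice (clocks fall back). A schedule set for such a time can be skipped or run twice. The Python sketch below classifies a local wall-clock time using the standard zoneinfo module. The function name is hypothetical and the example dates are the 2024 US transitions.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def classify_local_time(naive: datetime, tz_name: str) -> str:
    """Classify a wall-clock time as 'normal', 'nonexistent', or 'ambiguous'."""
    tz = ZoneInfo(tz_name)
    first = naive.replace(tzinfo=tz, fold=0)
    second = naive.replace(tzinfo=tz, fold=1)
    # A time inside a spring-forward gap does not survive a round trip via UTC.
    round_trip = first.astimezone(timezone.utc).astimezone(tz)
    if round_trip.replace(tzinfo=None) != naive:
        return "nonexistent"
    # A time repeated at fall-back has two different UTC offsets.
    if first.utcoffset() != second.utcoffset():
        return "ambiguous"
    return "normal"


tz_name = "America/New_York"
print(classify_local_time(datetime(2024, 3, 10, 2, 30), tz_name))  # nonexistent
print(classify_local_time(datetime(2024, 11, 3, 1, 30), tz_name))  # ambiguous
print(classify_local_time(datetime(2024, 6, 1, 2, 30), tz_name))   # normal
```

Scheduling daily jobs outside the 1:00 AM to 3:00 AM local window, or defining them in UTC, avoids both cases.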
Next Steps
After setting up your schedules:
- Monitor History - Track schedule performance and execution results
- Configure Sources - Optimize sources for scheduled processing
- Set Up Destinations - Ensure destinations can handle scheduled data delivery
- Manage Settings - Configure workspace-wide scheduling preferences
For advanced scheduling scenarios, or for help troubleshooting a specific schedule, consult your Precog workspace documentation or contact support.