Sources
Connect and manage your data sources for seamless data integration workflows.
Sources are the foundation of your data integration pipelines—they represent the systems, SaaS applications, and APIs where your data originates. Precog provides extensive connectivity options to help you bring data from virtually any system into your workspace for processing and analysis.
Understanding Sources
A source in Precog is any external system or data repository that provides data to your integration pipeline. Sources range from SaaS platforms and databases to real-time API feeds and file exports. Each source is configured with connection details, authentication credentials, and data selection criteria to ensure secure and reliable data access.
Source Categories
Application Data Sources
- Business applications - CRM, ERP, and business management systems
- SaaS platform data - Cloud-based application data and exports
- Application APIs - Direct integration with application programming interfaces
- File-based sources - CSV, JSON, XML files from various applications
SaaS Platforms
- CRM systems - Salesforce, HubSpot, Pipedrive, Zoho CRM
- Marketing platforms - Google Analytics, Facebook Ads, Mailchimp, Marketo
- Financial systems - Stripe, PayPal, QuickBooks, Xero, banking APIs
- Support platforms - Zendesk, Intercom, Freshdesk, ServiceNow
APIs
- REST APIs - Custom and third-party HTTP API endpoints
- GraphQL APIs - Connections to GraphQL endpoints with client-specified field selection
Setting Up Sources
Connection Configuration
SaaS Platform Sources
# Example SaaS connection parameters (Salesforce)
Base URL: https://yourinstance.my.salesforce.com
Client ID: your-client-id
Client Secret: [secure credentials]
Access Token: [OAuth token]
Refresh Token: [refresh token]
API Version: v52.0
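When the access token above expires, it can be renewed with the stored refresh token through a standard OAuth 2.0 exchange. A minimal sketch of that flow in Python (Salesforce's production token endpoint; every credential value is a placeholder):

# Refresh an expired access token (OAuth 2.0 refresh_token grant)
import requests

resp = requests.post(
    "https://login.salesforce.com/services/oauth2/token",
    data={
        "grant_type": "refresh_token",
        "client_id": "your-client-id",          # placeholder
        "client_secret": "your-client-secret",  # placeholder
        "refresh_token": "your-refresh-token",  # placeholder
    },
    timeout=30,
)
resp.raise_for_status()
access_token = resp.json()["access_token"]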
API Sources
# Example API configuration
Base URL: https://api.yourservice.com/v1
Authentication: Bearer Token / API Key / OAuth2
Rate Limits: 1000 requests/hour
Timeout: 30 seconds
Retry Policy: 3 attempts with exponential backoff
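A retry policy like the one above can be sketched in a few lines of Python. This is a minimal illustration rather than Precog's internal implementation; the endpoint is the placeholder URL from the configuration:

# Retry transient failures with exponential backoff (1 s, 2 s, ...)
import time
import requests

def fetch_with_retry(url, headers=None, attempts=3, base_delay=1.0):
    for attempt in range(attempts):
        try:
            resp = requests.get(url, headers=headers, timeout=30)
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the error
            time.sleep(base_delay * 2 ** attempt)

data = fetch_with_retry("https://api.yourservice.com/v1/records")

Backing off exponentially keeps retries cheap for one-off network blips while quickly easing pressure on a struggling service.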
Authentication Methods
SaaS Platform Authentication
- OAuth 2.0 - Modern authorization framework with refresh tokens for SaaS platforms
- API Keys - Application-specific keys for service authentication
- Service accounts - Dedicated accounts for automated data access
- SAML/SSO - Enterprise single sign-on integration
API Authentication
- API Keys - Simple token-based authentication
- Bearer tokens - JWT and other bearer token formats
- OAuth 2.0 - Modern authorization framework with refresh tokens
- Basic Auth - Username/password over HTTPS
- Custom headers - Application-specific authentication methods
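In practice these methods differ mainly in how the credential is attached to each HTTP request. A minimal sketch of the common patterns (the endpoint, tokens, and the API-key header name are placeholders; header names vary by service):

# Attaching credentials to HTTP requests
import requests

url = "https://api.yourservice.com/v1/records"  # placeholder endpoint

# Bearer token (OAuth 2.0 access token or JWT)
requests.get(url, headers={"Authorization": "Bearer your-token"}, timeout=30)

# API key, commonly sent as a custom header
requests.get(url, headers={"X-API-Key": "your-api-key"}, timeout=30)

# Basic Auth: username and password over HTTPS
requests.get(url, auth=("username", "password"), timeout=30)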
Data Selection and Filtering
Data Source Selection
- Choose specific data objects from SaaS platforms (e.g., Salesforce objects, Stripe customers)
- Select API endpoints for targeted data extraction
- Configure application-specific access for multi-tenant applications
- Set up data pattern matching for dynamic source discovery
Query-Based Selection
- Use custom SQL queries for complex data selection
- Apply WHERE clauses for date ranges and filtered datasets
- Join multiple tables at the source for efficiency
- Parameterize queries for dynamic data selection (see the sketch below)
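For database sources such as the PostgreSQL example later on this page, query-based selection might look like the sketch below. psycopg2 is assumed as the client library, and the orders columns are illustrative:

# Parameterized, source-side selection with a JOIN and date filter
import psycopg2

conn = psycopg2.connect(
    host="db.company.com", dbname="crm_production",
    user="readonly-user", sslmode="require",  # placeholder user
)
with conn.cursor() as cur:
    # Join at the source and filter with %s parameters so only the
    # needed rows ever leave the database.
    cur.execute(
        """
        SELECT c.id, c.email, o.order_id, o.total
        FROM customers c
        JOIN orders o ON o.customer_id = c.id
        WHERE o.created_at >= %s AND o.created_at < %s
        """,
        ("2024-01-01", "2024-02-01"),
    )
    rows = cur.fetchall()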
API Endpoint Configuration
- Select specific API endpoints and resources
- Configure query parameters for filtered data requests
- Set up pagination handling for large datasets (sketched below)
- Manage rate limiting and request throttling
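Pagination conventions vary by API (page numbers, offsets, cursors). A minimal sketch of page-number pagination against a hypothetical endpoint that returns a JSON array per page:

# Walk a page-numbered API until an empty page comes back
import requests

def fetch_all(base_url, headers, page_size=100):
    records, page = [], 1
    while True:
        resp = requests.get(
            base_url,
            headers=headers,
            params={"page": page, "per_page": page_size},
            timeout=30,
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            break
        records.extend(batch)
        page += 1
    return records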
Source Management
Connection Testing and Validation
Connectivity Tests
- Network connectivity - Verify that Precog can reach your source system
- Authentication validation - Confirm credentials and permissions are correct
- Data access testing - Verify that selected data can be retrieved successfully
- Performance benchmarking - Test data retrieval speed and reliability
Ongoing Monitoring
- Connection health checks - Regular validation of source connectivity
- Performance monitoring - Track query response times and throughput
- Error rate tracking - Monitor failed connection attempts and data errors
- Credential expiration alerts - Notifications before authentication expires
Security and Compliance
Data Protection
- Encryption in transit - All data transferred using TLS/SSL encryption
- Credential security - Secure storage of authentication information
- Network isolation - VPC and firewall configuration for secure access
- Access logging - Complete audit trail of source system access
Compliance Considerations
- Data residency - Control where data processing occurs geographically
- PII handling - Special handling for personally identifiable information
- Regulatory compliance - GDPR, HIPAA, and SOC 2 compliance features
- Data retention - Configure appropriate data retention and deletion policies
Performance Optimization
Query Optimization
- Efficient queries - Optimize SQL queries and API calls for performance
- Indexing considerations - Work with database administrators on optimal indexing
- Batch processing - Configure appropriate batch sizes for data retrieval
- Parallel processing - Use multiple connections when supported by source systems (see the sketch below)
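As a rough illustration of the parallel-connection idea, the sketch below extracts several tables concurrently. fetch_table is a hypothetical per-table extraction function, and the worker count should stay within the source's connection limits:

# Pull independent tables in parallel, one connection per worker
from concurrent.futures import ThreadPoolExecutor

def fetch_table(table):
    """Placeholder: extract one table over its own connection."""
    ...

tables = ["customers", "orders", "products"]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = dict(zip(tables, pool.map(fetch_table, tables)))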
Network and Bandwidth
- Compression - Enable data compression for large datasets
- Connection pooling - Reuse connections for improved efficiency
- Regional proximity - Deploy processing close to source systems when possible
- Bandwidth management - Schedule large data transfers during off-peak hours
Common Source Types and Configuration
Salesforce CRM
# Salesforce source configuration
Type: Salesforce API
Authentication: OAuth 2.0
Objects: [Account, Contact, Opportunity, Lead]
API Version: v58.0
Batch Size: 2000 records
Rate Limiting: Respect Salesforce limits
Incremental: LastModifiedDate field
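Incremental extraction keyed on LastModifiedDate boils down to a filtered SOQL query against Salesforce's REST query endpoint. A minimal sketch, with the instance URL, token, and watermark as placeholders:

# Fetch only Accounts changed since the last successful sync
import requests

instance = "https://yourinstance.my.salesforce.com"  # placeholder
token = "your-access-token"                          # placeholder
watermark = "2024-01-01T00:00:00Z"                   # persisted last-sync time

soql = (
    "SELECT Id, Name, LastModifiedDate FROM Account "
    f"WHERE LastModifiedDate > {watermark}"  # SOQL datetimes are unquoted
)
resp = requests.get(
    f"{instance}/services/data/v58.0/query",
    headers={"Authorization": f"Bearer {token}"},
    params={"q": soql},
    timeout=30,
)
resp.raise_for_status()
records = resp.json()["records"]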
PostgreSQL Database
# PostgreSQL source configuration
Type: PostgreSQL
Host: db.company.com
Port: 5432
Database: crm_production
Schema: public
Tables: [customers, orders, products]
SSL: required
Connection Pool: 5 connections
Incremental: updated_at timestamp
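The incremental setting above corresponds to a watermark pattern: persist the highest updated_at seen, then fetch only newer rows on the next run. A rough sketch with psycopg2 (credentials omitted; the watermark is a placeholder):

# Incremental pull driven by an updated_at watermark
import psycopg2

last_sync = "2024-01-01T00:00:00Z"  # watermark saved from the prior run

conn = psycopg2.connect(
    host="db.company.com", port=5432, dbname="crm_production", sslmode="require"
)
with conn.cursor() as cur:
    cur.execute(
        "SELECT * FROM public.customers "
        "WHERE updated_at > %s ORDER BY updated_at",
        (last_sync,),
    )
    for row in cur:
        ...  # hand each changed row to the pipeline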
Google Analytics
# Google Analytics source configuration
Type: Google Analytics 4
Property ID: 123456789
Authentication: Service Account
Metrics: [sessions, pageviews, conversions]
Dimensions: [date, source, medium, campaign]
Date Range: Last 30 days
Sampling: Unsampled (when possible)
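For reference, the equivalent request through Google's google-analytics-data Python client looks roughly like this; service-account credentials are assumed to be available via GOOGLE_APPLICATION_CREDENTIALS, and the property ID matches the example above:

# Run a GA4 report for the last 30 days
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    DateRange, Dimension, Metric, RunReportRequest,
)

client = BetaAnalyticsDataClient()  # reads service-account credentials
request = RunReportRequest(
    property="properties/123456789",
    dimensions=[Dimension(name="date"), Dimension(name="source")],
    metrics=[Metric(name="sessions")],
    date_ranges=[DateRange(start_date="30daysAgo", end_date="today")],
)
response = client.run_report(request)
for row in response.rows:
    print(row.dimension_values[0].value, row.metric_values[0].value)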
Troubleshooting Common Issues
Connection Problems
Network Connectivity
- Verify firewall rules allow connections from Precog IP addresses
- Check that source systems are accessible from external networks
- Confirm DNS resolution for hostnames and endpoints
- Test connectivity using network tools like ping and telnet, or a short script like the one below
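Where ping and telnet are unavailable, a few lines of Python answer the same basic question: can the host and port be reached at all? The host and port below come from the PostgreSQL example on this page:

# Quick TCP reachability check for a source system
import socket

host, port = "db.company.com", 5432
try:
    with socket.create_connection((host, port), timeout=5):
        print(f"Reachable: {host}:{port}")
except OSError as exc:
    print(f"Cannot reach {host}:{port}: {exc}")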
Authentication Failures
- Verify credentials are correct and not expired
- Check that user accounts have necessary permissions
- Confirm OAuth tokens are valid and not revoked
- Review API key access levels and restrictions
Permission Issues
- Ensure database users have SELECT permissions on required tables
- Verify API accounts have access to necessary resources
- Review role-based access controls in source systems
Performance Issues
Slow Data Retrieval
- Optimize queries by adding appropriate WHERE clauses
- Consider adding database indexes for frequently queried columns
- Reduce batch sizes if source systems are overwhelmed
- Schedule large data transfers during off-peak hours
Timeout Errors
- Increase connection timeout settings for slow networks
- Break large queries into smaller, more manageable chunks (see the sketch after this list)
- Implement retry logic for transient network issues
- Consider using database views to pre-optimize complex queries
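One common way to break up a large extraction is keyset chunking: page through the table by primary key so each statement stays small and fast. A rough sketch against the earlier PostgreSQL example (chunk size and columns are illustrative):

# Extract a large table in bounded, index-friendly chunks
import psycopg2

conn = psycopg2.connect(host="db.company.com", dbname="crm_production")
last_id, chunk = 0, 10_000
with conn.cursor() as cur:
    while True:
        cur.execute(
            "SELECT id, email FROM customers "
            "WHERE id > %s ORDER BY id LIMIT %s",
            (last_id, chunk),
        )
        rows = cur.fetchall()
        if not rows:
            break
        last_id = rows[-1][0]  # advance the keyset cursor
        ...  # process this chunk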
Rate Limiting
- Respect API rate limits and implement proper throttling
- Spread requests over time to avoid hitting limits (paced requests are sketched after this list)
- Use bulk APIs when available for large data volumes
- Cache frequently accessed but slowly changing data
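Spreading requests evenly is often enough to stay under a quota. For the 1000 requests/hour limit used in the earlier example, that works out to one request roughly every 3.6 seconds; a minimal client-side pacer might look like:

# Pace calls so the hourly quota is never exceeded
import time

MIN_INTERVAL = 3600 / 1000  # 1000 requests/hour -> one every 3.6 s
_last_call = 0.0

def throttled(fn, *args, **kwargs):
    global _last_call
    wait = MIN_INTERVAL - (time.monotonic() - _last_call)
    if wait > 0:
        time.sleep(wait)
    _last_call = time.monotonic()
    return fn(*args, **kwargs)

A token-bucket limiter is the natural next step when short bursts are acceptable, since it allows bursts while preserving the same long-run rate.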
Data Quality Issues
Missing or Incomplete Data
- Verify that all required tables and fields are accessible
- Check for data type mismatches and conversion issues
- Review NULL value handling and default value assignments
- Confirm date range filters are not excluding expected data
Data Format Problems
- Ensure character encoding matches between source and destination
- Handle special characters and Unicode properly
- Verify number and date format compatibility
Inconsistent Data
- Identify and handle duplicate records appropriately
- Implement data validation rules to catch quality issues early
- Set up alerts for unexpected data patterns or volumes
- Document known data quality issues and their handling
Next Steps
After setting up your sources:
- Configure Destinations - Set up where your data will flow
- Create Schedules - Automate your data processing
- Monitor History - Track source performance and troubleshoot issues
- Manage Settings - Configure workspace-wide source preferences
For specific source type documentation and advanced configuration options, consult the detailed connector guides in your Precog workspace or contact support for assistance with complex source setups.