Sources

Connect and manage your data sources for seamless data integration workflows.

Sources are the foundation of your data integration pipelines: they represent the systems, SaaS applications, and APIs where your data originates. Precog provides extensive connectivity options to help you bring data from virtually any system into your workspace for processing and analysis.

Understanding Sources

A source in Precog is any external system or data repository that provides data to your integration pipeline. Sources range from SaaS platforms and business applications to real-time API feeds. Each source is configured with connection details, authentication credentials, and data selection criteria to ensure secure and reliable data access.

Source Categories

Application Data Sources

  • Business applications - CRM, ERP, and business management systems
  • SaaS platform data - Cloud-based application data and exports
  • Application APIs - Direct integration with application programming interfaces
  • File-based sources - CSV, JSON, XML files from various applications

SaaS Platforms

  • CRM systems - Salesforce, HubSpot, Pipedrive, Zoho CRM
  • Marketing platforms - Google Analytics, Facebook Ads, Mailchimp, Marketo
  • Financial systems - Stripe, PayPal, QuickBooks, Xero, banking APIs
  • Support platforms - Zendesk, Intercom, Freshdesk, ServiceNow

APIs

  • REST APIs - Custom and third-party HTTP API endpoints
  • GraphQL APIs - Query-language APIs where clients request exactly the fields they need

Setting Up Sources

Connection Configuration

SaaS Platform Sources

# Example SaaS connection parameters
Base URL: https://your-instance.my.salesforce.com/
Client ID: your-client-id
Client Secret: [secure credentials]
Access Token: [OAuth token]
Refresh Token: [refresh token]
API Version: v52.0

API Sources

# Example API configuration
Base URL: https://api.yourservice.com/v1
Authentication: Bearer Token / API Key / OAuth2
Rate Limits: 1000 requests/hour
Timeout: 30 seconds
Retry Policy: 3 attempts with exponential backoff
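The retry policy above (3 attempts with exponential backoff) can be sketched in a few lines of Python. This is an illustrative pattern, not Precog's internal implementation; the `flaky_fetch` function stands in for any request to a source API.

```python
import time

def fetch_with_retry(fetch, max_attempts=3, base_delay=1.0):
    """Call fetch(), retrying on failure with exponential backoff.

    Delays grow as base_delay * 2**attempt: 1s, 2s, 4s, ...
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            time.sleep(base_delay * (2 ** attempt))

# Example: a source that fails twice, then succeeds.
calls = {"n": 0}
def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return {"status": "ok"}

result = fetch_with_retry(flaky_fetch, base_delay=0.01)
```

Backing off exponentially gives an overloaded source time to recover instead of hammering it with immediate retries.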

Authentication Methods

SaaS Platform Authentication

  • OAuth 2.0 - Modern authorization framework with refresh tokens for SaaS platforms
  • API Keys - Application-specific keys for service authentication
  • Service accounts - Dedicated accounts for automated data access
  • SAML/SSO - Enterprise single sign-on integration

API Authentication

  • API Keys - Simple token-based authentication
  • Bearer tokens - JWT and other bearer token formats
  • OAuth 2.0 - Modern authorization framework with refresh tokens
  • Basic Auth - Username/password over HTTPS
  • Custom headers - Application-specific authentication methods
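Several of these methods come down to setting the right HTTP headers. A sketch of the three simplest: the `Authorization` header formats for Bearer and Basic Auth are standard, while `X-API-Key` is a common but application-specific convention that varies by service.

```python
import base64

def bearer_headers(token):
    # Bearer tokens (JWT or opaque) go in the Authorization header.
    return {"Authorization": f"Bearer {token}"}

def api_key_headers(key, header_name="X-API-Key"):
    # Header name varies by service; X-API-Key is a common convention.
    return {header_name: key}

def basic_auth_headers(username, password):
    # Basic Auth is base64("user:pass"); only ever send it over HTTPS.
    creds = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {"Authorization": f"Basic {creds}"}
```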

Data Selection and Filtering

Data Source Selection

  • Choose specific data objects from SaaS platforms (e.g., Salesforce objects, Stripe customers)
  • Select API endpoints for targeted data extraction
  • Configure application-specific access for multi-tenant applications
  • Set up data pattern matching for dynamic source discovery

Query-Based Selection

  • Use custom SQL queries for complex data selection
  • Apply WHERE clauses for date ranges and filtered datasets
  • Join multiple tables at the source for efficiency
  • Parameterize queries for dynamic data selection
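A parameterized date-range query looks like the following sketch, shown against an in-memory SQLite database so it is self-contained; the `orders` table and its columns are hypothetical.

```python
import sqlite3

# In-memory stand-in for a source database; table and columns are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL, created_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 50.0, "2024-01-05"), (2, 75.0, "2024-02-10"), (3, 20.0, "2024-03-01")],
)

# Parameterized WHERE clause: the date range is bound at run time,
# so one query serves daily, weekly, or backfill extractions.
query = "SELECT id, total FROM orders WHERE created_at >= ? AND created_at < ?"
rows = conn.execute(query, ("2024-02-01", "2024-03-01")).fetchall()
# rows -> [(2, 75.0)]
```

Binding values as parameters (rather than concatenating them into the SQL string) also protects against injection when range values come from configuration.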

API Endpoint Configuration

  • Select specific API endpoints and resources
  • Configure query parameters for filtered data requests
  • Set up pagination handling for large datasets
  • Manage rate limiting and request throttling
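Pagination handling typically follows a cursor loop like the sketch below. The `fetch_page` contract here is an assumption for illustration; real APIs vary (offset/limit, page tokens, `Link` headers).

```python
def fetch_all(fetch_page, page_size=100):
    """Drain a paginated endpoint.

    fetch_page(cursor, limit) is assumed to return (items, next_cursor),
    with next_cursor=None signalling the last page.
    """
    items, cursor = [], None
    while True:
        batch, cursor = fetch_page(cursor, page_size)
        items.extend(batch)
        if cursor is None:
            return items

# Fake endpoint serving 250 records in pages of up to 100.
DATA = list(range(250))
def fake_page(cursor, limit):
    start = cursor or 0
    end = min(start + limit, len(DATA))
    return DATA[start:end], (end if end < len(DATA) else None)

records = fetch_all(fake_page, page_size=100)
```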

Source Management

Connection Testing and Validation

Connectivity Tests

  • Network connectivity - Verify that Precog can reach your source system
  • Authentication validation - Confirm credentials and permissions are correct
  • Data access testing - Verify that selected data can be retrieved successfully
  • Performance benchmarking - Test data retrieval speed and reliability

Ongoing Monitoring

  • Connection health checks - Regular validation of source connectivity
  • Performance monitoring - Track query response times and throughput
  • Error rate tracking - Monitor failed connection attempts and data errors
  • Credential expiration alerts - Notifications before authentication expires
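The credential expiration check can be sketched as a comparison against a warning window. The 14-day default below is illustrative, not a Precog setting, and the credential names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def expiring_credentials(creds, warn_days=14, now=None):
    """Return names of credentials that expire within warn_days.

    creds maps a source name to its expiry datetime.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now + timedelta(days=warn_days)
    return [name for name, expires in creds.items() if expires <= cutoff]

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
creds = {
    "salesforce": datetime(2024, 6, 10, tzinfo=timezone.utc),  # 9 days out
    "stripe": datetime(2024, 9, 1, tzinfo=timezone.utc),       # far out
}
alerts = expiring_credentials(creds, now=now)
# alerts -> ["salesforce"]
```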

Security and Compliance

Data Protection

  • Encryption in transit - All data transferred using TLS/SSL encryption
  • Credential security - Secure storage of authentication information
  • Network isolation - VPC and firewall configuration for secure access
  • Access logging - Complete audit trail of source system access

Compliance Considerations

  • Data residency - Control where data processing occurs geographically
  • PII handling - Special handling for personally identifiable information
  • Regulatory compliance - GDPR, HIPAA, SOC2 compliance features
  • Data retention - Configure appropriate data retention and deletion policies

Performance Optimization

Query Optimization

  • Efficient queries - Optimize SQL queries and API calls for performance
  • Indexing considerations - Work with database administrators on optimal indexing
  • Batch processing - Configure appropriate batch sizes for data retrieval
  • Parallel processing - Use multiple connections when supported by source systems
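Batch processing reduces to chunking a record stream into fixed-size pieces, as in this minimal sketch:

```python
def batches(records, batch_size):
    """Yield records in fixed-size chunks so the source is never asked
    for (or sent) more rows than it can comfortably handle at once."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

chunks = list(batches(list(range(10)), batch_size=4))
# chunks -> [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Tuning `batch_size` trades memory and per-request overhead against load on the source system.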

Network and Bandwidth

  • Compression - Enable data compression for large datasets
  • Connection pooling - Reuse connections for improved efficiency
  • Regional proximity - Deploy processing close to source systems when possible
  • Bandwidth management - Schedule large data transfers during off-peak hours

Common Source Types and Configuration

Salesforce CRM

# Salesforce source configuration
Type: Salesforce API
Authentication: OAuth 2.0
Objects: [Account, Contact, Opportunity, Lead]
API Version: v58.0
Batch Size: 2000 records
Rate Limiting: Respect Salesforce limits
Incremental: LastModifiedDate field
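The `LastModifiedDate` incremental field above translates into a SOQL filter built from the last successful sync time. A sketch (note that SOQL datetime literals are written unquoted):

```python
# Build an incremental SOQL query from the last successful sync time.
# The ISO-8601 datetime literal is unquoted in SOQL.
last_sync = "2024-06-01T00:00:00Z"

soql = (
    "SELECT Id, Name, LastModifiedDate FROM Account "
    f"WHERE LastModifiedDate > {last_sync} "
    "ORDER BY LastModifiedDate"
)
```

Ordering by `LastModifiedDate` makes it safe to advance the sync watermark to the last record retrieved, even if a run is interrupted partway through.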

PostgreSQL Database

# PostgreSQL source configuration
Type: PostgreSQL
Host: db.company.com
Port: 5432
Database: crm_production
Schema: public
Tables: [customers, orders, products]
SSL: required
Connection Pool: 5 connections
Incremental: updated_at timestamp
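The `updated_at` incremental strategy works by persisting a high-water mark between runs. A self-contained sketch using in-memory SQLite in place of the PostgreSQL source; the `customers` schema is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Acme", "2024-05-01"), (2, "Globex", "2024-06-15")],
)

watermark = "2024-06-01"  # high-water mark persisted after the previous run

# Pull only rows modified since the watermark, then advance it to the
# newest updated_at seen so the next run starts where this one ended.
rows = conn.execute(
    "SELECT id, name, updated_at FROM customers WHERE updated_at > ?",
    (watermark,),
).fetchall()
new_watermark = max(r[2] for r in rows) if rows else watermark
```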

Google Analytics

# Google Analytics source configuration
Type: Google Analytics 4
Property ID: 123456789
Authentication: Service Account
Metrics: [sessions, pageviews, conversions]
Dimensions: [date, source, medium, campaign]
Date Range: Last 30 days
Sampling: Unsampled (when possible)

Troubleshooting Common Issues

Connection Problems

Network Connectivity

  • Verify firewall rules allow connections from Precog IP addresses
  • Check that source systems are accessible from external networks
  • Confirm DNS resolution for hostnames and endpoints
  • Test connectivity using network tools like ping and telnet

Authentication Failures

  • Verify credentials are correct and not expired
  • Check that user accounts have necessary permissions
  • Confirm OAuth tokens are valid and not revoked
  • Review API key access levels and restrictions

Permission Issues

  • Ensure database users have SELECT permissions on required tables
  • Verify API accounts have access to necessary resources
  • Review role-based access controls in source systems

Performance Issues

Slow Data Retrieval

  • Optimize queries by adding appropriate WHERE clauses
  • Consider adding database indexes for frequently queried columns
  • Reduce batch sizes if source systems are overwhelmed
  • Schedule large data transfers during off-peak hours

Timeout Errors

  • Increase connection timeout settings for slow networks
  • Break large queries into smaller, more manageable chunks
  • Implement retry logic for transient network issues
  • Consider using database views to pre-optimize complex queries

Rate Limiting

  • Respect API rate limits and implement proper throttling
  • Spread requests over time to avoid hitting limits
  • Use bulk APIs when available for large data volumes
  • Cache frequently accessed but slowly changing data
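Spreading requests over time can be done with a simple client-side throttle that enforces a minimum spacing between calls. A sketch, using the 1000 requests/hour figure from the example configuration earlier (which implies one request every 3.6 seconds):

```python
class Throttle:
    """Spread requests evenly under a requests/hour budget.

    wait_time() reports how long to pause before the next call;
    record() marks when a call was actually made.
    """
    def __init__(self, requests_per_hour):
        self.min_interval = 3600.0 / requests_per_hour
        self.last_call = None

    def wait_time(self, now):
        if self.last_call is None:
            return 0.0
        elapsed = now - self.last_call
        return max(0.0, self.min_interval - elapsed)

    def record(self, now):
        self.last_call = now

t = Throttle(requests_per_hour=1000)  # min spacing: 3.6 seconds
t.record(now=0.0)
pause = t.wait_time(now=1.0)  # 2.6 s remaining before the next request
```

In production you would call `time.sleep(pause)` before each request; timestamps are passed in explicitly here so the logic is easy to test.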

Data Quality Issues

Missing or Incomplete Data

  • Verify that all required tables and fields are accessible
  • Check for data type mismatches and conversion issues
  • Review NULL value handling and default value assignments
  • Confirm date range filters are not excluding expected data

Data Format Problems

  • Ensure character encoding matches between source and destination
  • Handle special characters and Unicode properly
  • Verify number and date format compatibility

Inconsistent Data

  • Identify and handle duplicate records appropriately
  • Implement data validation rules to catch quality issues early
  • Set up alerts for unexpected data patterns or volumes
  • Document known data quality issues and their handling
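Handling duplicates often means keeping only the most recent version of each record. A sketch of that rule; the `id`, `updated_at`, and `email` field names are illustrative, not a fixed schema.

```python
def dedupe_latest(records, key="id", version="updated_at"):
    """Keep only the most recent record per key.

    Field names are illustrative; adapt them to your source schema.
    """
    latest = {}
    for rec in records:
        k = rec[key]
        if k not in latest or rec[version] > latest[k][version]:
            latest[k] = rec
    return list(latest.values())

records = [
    {"id": 1, "updated_at": "2024-01-01", "email": "old@example.com"},
    {"id": 1, "updated_at": "2024-03-01", "email": "new@example.com"},
    {"id": 2, "updated_at": "2024-02-01", "email": "b@example.com"},
]
clean = dedupe_latest(records)
# clean -> two records, with id 1 carrying the 2024-03-01 version
```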

Next Steps

After setting up your sources:

  1. Configure Destinations - Set up where your data will flow
  2. Create Schedules - Automate your data processing
  3. Monitor History - Track source performance and troubleshoot issues
  4. Manage Settings - Configure workspace-wide source preferences

For specific source type documentation and advanced configuration options, consult the detailed connector guides in your Precog workspace or contact support for assistance with complex source setups.