Semantic Modeling for Snowflake Cortex

Overview

Semantic Modeling bridges the gap between raw data tables and business understanding. When you connect a data source to Precog and load its data into Snowflake, Precog can automatically generate semantic models that describe your data in terms analysts and business users understand.

These semantic models power Snowflake Cortex Analyst, enabling natural language queries against your data warehouse. Instead of writing SQL, users can ask questions like "What were our top-selling products last quarter?" and get accurate answers.

Why It Matters

Data teams spend significant time translating business questions into SQL queries. Semantic models capture this translation once, making it available to everyone in the organization.

With Precog-generated semantic models:

  • Business users can query data using natural language through Snowflake Cortex Analyst
  • Analysts spend less time on ad-hoc requests and more on strategic analysis
  • Data engineers define meaning once rather than repeatedly explaining table structures

The result is faster insights with fewer errors and less back-and-forth between teams.

How It Works

When you configure a data source in Precog, the platform analyzes your source schema and generates a YAML-based semantic model following Snowflake's specification. This model includes:

Logical Tables

Each source table becomes a logical table with clearly defined columns categorized as:

  • Dimensions — Categorical attributes for grouping and filtering (e.g., customer name, product category)
  • Time Dimensions — Date and timestamp fields for time-based analysis
  • Facts — Numeric measures that can be aggregated (e.g., order amount, quantity)

Metrics

Pre-defined calculations that combine facts with aggregation logic:

  • Revenue = SUM(order_amount)
  • Average Order Value = AVG(order_amount)
  • Customer Count = COUNT(DISTINCT customer_id)
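
In the generated YAML, each of these calculations becomes a named metric with a description and a SQL aggregation expression. As a sketch, the three metrics above might look like this (the column names assume the orders example shown later in this section):

metrics:
  - name: total_revenue
    description: Total revenue across all orders
    expr: SUM(order_amount)
  - name: average_order_value
    description: Average amount per order
    expr: AVG(order_amount)
  - name: customer_count
    description: Number of distinct customers who placed orders
    expr: COUNT(DISTINCT customer_id)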

Relationships

How tables connect to each other for multi-table queries:

  • Orders → Customers (many-to-one via customer_id)
  • Order Items → Products (many-to-one via product_id)
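
In the YAML model, each connection is declared as a named relationship with its join columns and cardinality. As a sketch, the Order Items → Products link above might be expressed as follows (the table and column names are illustrative assumptions):

relationships:
  - name: order_items_to_products
    left_table: order_items
    right_table: products
    relationship_columns:
      - left_column: PRODUCT_ID
        right_column: ID
    join_type: left_outer
    relationship_type: many_to_one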

Filters

Query constraints that business users frequently apply:

  • Active Customers Only
  • Last 12 Months
  • Exclude Test Transactions
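
Each named filter maps to a reusable SQL predicate in the model. A sketch of how the filters above might be expressed, assuming hypothetical columns such as CUSTOMER_STATUS, ORDER_DATE, and IS_TEST:

filters:
  - name: active_customers_only
    description: Restrict results to currently active customers
    expr: CUSTOMER_STATUS = 'ACTIVE'
  - name: last_12_months
    description: Orders placed within the last 12 months
    expr: ORDER_DATE >= DATEADD(month, -12, CURRENT_DATE())
  - name: exclude_test_transactions
    description: Remove internal test transactions
    expr: IS_TEST = FALSE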

The Semantic Model Format

Precog generates semantic models in YAML format, following Snowflake's Cortex Analyst specification. Here's an example structure:

name: ecommerce_semantic_model
description: Semantic model for e-commerce analytics

tables:
  - name: orders
    description: Customer orders with transaction details
    base_table:
      database: ANALYTICS
      schema: ECOMMERCE
      table: ORDERS

    dimensions:
      - name: order_status
        description: Current status of the order
        expr: ORDER_STATUS
        data_type: VARCHAR

    time_dimensions:
      - name: order_date
        description: Date when the order was placed
        expr: ORDER_DATE
        data_type: DATE

    facts:
      - name: order_amount
        description: Total amount of the order
        expr: ORDER_AMOUNT
        data_type: NUMBER

    metrics:
      - name: total_revenue
        description: Sum of all order amounts
        expr: SUM(order_amount)

    filters:
      - name: completed_orders
        description: Only completed orders
        expr: ORDER_STATUS = 'COMPLETED'

relationships:
  - name: orders_to_customers
    left_table: orders
    right_table: customers
    relationship_columns:
      - left_column: CUSTOMER_ID
        right_column: ID
    join_type: left_outer
    relationship_type: many_to_one

Working with Semantic Models in Precog

Viewing Your Semantic Model

After a successful data load, Precog automatically uploads your semantic model to Snowflake. From the source details page, you can access a deep link that takes you directly to your semantic model in Snowflake.

In Snowflake, you can view the YAML file directly and make edits if needed — such as adding descriptions, synonyms, or custom metrics.
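
For example, adding synonyms to a dimension helps Cortex Analyst recognize the vocabulary your users actually use. A minimal sketch of such an edit to the order_status dimension from the example above (the synonym values are illustrative):

dimensions:
  - name: order_status
    description: Current status of the order
    expr: ORDER_STATUS
    data_type: VARCHAR
    # synonyms let users refer to this column by alternative names
    synonyms:
      - status
      - order state
      - fulfillment stage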

Regenerating Models

To regenerate your semantic model:

  1. Edit your use case — Update the use case description or settings on the source details page
  2. Trigger a data reload — Run your connection to reload the datasets

Precog regenerates the semantic model during the reload and uploads the updated version to Snowflake automatically.

Using Semantic Models with Snowflake Cortex Analyst

Precog automatically uploads your semantic model to Snowflake, so you can start using Cortex Analyst right away.

Users can ask questions in plain English:

  • "Show me monthly revenue trends for the past year"
  • "Which customers have the highest lifetime value?"
  • "Compare sales by region for Q4"

Cortex Analyst translates these questions into SQL using your semantic model's definitions.

Best Practices

Writing Effective Use Case Descriptions

The quality of your semantic model depends on clear, detailed use case descriptions. A well-written use case helps Precog generate a more accurate and useful model, which leads to better natural language query results in Cortex Analyst.

The Formula

A good use case includes four elements:

[Business Domain] + [Key Metrics/KPIs] + [Key Entities] + [Analysis Type]

When configuring semantic modeling for a source, take time to describe:

  • What business questions this data should answer
  • Who will use the data (analysts, executives, operations teams)
  • Key metrics and KPIs that matter for this data
  • Important relationships between entities

Examples: Poor vs. Better Use Cases

Vague descriptions produce generic models. Specific descriptions produce models tuned to your actual needs.

Accounting (Xero)

  • Poor: "Financial reporting"
    Better: "Analyze accounts payable and receivable performance including invoice aging, payment trends, and cash flow by customer and supplier"
  • Poor: "Invoice analysis"
    Better: "Track invoice processing including outstanding amounts, overdue rates, and payment timing by customer and invoice type"
  • Poor: "Cash flow"
    Better: "Monitor cash flow including receipts, payments, and bank balance trends by account and period"

Sales/CRM (HubSpot)

  • Poor: "Sales analysis"
    Better: "Track sales pipeline performance including deal velocity, win rates by stage, revenue by product line, and rep performance metrics"
  • Poor: "CRM data"
    Better: "Analyze customer engagement including contact activity, deal progression, and conversion rates by company size and industry"
  • Poor: "Pipeline report"
    Better: "Track deal pipeline including stage duration, drop-off rates, and forecast accuracy by owner and product"

Quick Templates

Use these templates as starting points for your use cases:

For accounting data:

Analyze [process] including [metric1], [metric2], and [metric3]
by [dimension1] and [dimension2]

Example: "Analyze accounts receivable including aging analysis, collection rates, and outstanding amounts by customer segment and invoice type"

For sales/CRM data:

Track [sales/marketing process] performance including [metric1],
[metric2], and [metric3] across [entity1] and [entity2]

Example: "Track deal pipeline performance including conversion rates, average deal size, and sales cycle length across products and territories"

Common Metrics by Domain

Include metrics relevant to your data source:

Accounting: Invoice aging, outstanding amounts, overdue rates, payment timing, days sales outstanding (DSO), cash flow, receipts, payments, spend by category, budget variance

Sales/CRM: Deal velocity, win rates, close rates, pipeline value, forecast accuracy, conversion rates by stage, average deal size, revenue by product, contact activity, engagement scores, lead source attribution

What to Avoid

  • Vague terms alone: "analysis" or "reporting" without specifics
  • Missing metrics: Not stating what you want to measure
  • No dimensions: Omitting how you want to slice the data
  • Assumed context: The system doesn't know your business jargon

Practical Insight

Semantic modeling transforms Precog from a data loading tool into an analytics enabler. By capturing business meaning at the point of data ingestion, you create a foundation for self-service analytics that scales across your organization.

Start with core business entities — customers, orders, products — and expand as users identify additional needs. The goal isn't perfect coverage on day one, but a model that grows with your analytics maturity.