Snowflake (Snowplow Schema)

Prerequisites:

To integrate with Snowflake using the Snowplow schema, you will need access to your Snowflake console.

For this self-service integration, we also have some data requirements:

  1. All of your events must be unified into a single table, as opposed to having separate tables for each event type.
  2. There can only be a maximum of one authenticatedID and one unauthenticatedID for aliasing.
  3. The event timestamp must be in UTC.
  4. All joins must be done beforehand.
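
As an illustration of requirement 3, the following is a minimal Python sketch of normalizing a timezone-aware timestamp to UTC before it is written to your events table. The function name to_utc and the sample values are hypothetical; how you actually convert timestamps depends on your own pipeline.

```python
from datetime import datetime, timedelta, timezone


def to_utc(ts: datetime) -> datetime:
    """Convert a timezone-aware datetime to UTC.

    Naive datetimes are rejected because their offset is unknown,
    so they cannot be converted safely.
    """
    if ts.tzinfo is None:
        raise ValueError("timestamp has no timezone information")
    return ts.astimezone(timezone.utc)


# Hypothetical sample: 09:30 recorded at UTC-05:00.
local_value = datetime(2024, 1, 15, 9, 30, tzinfo=timezone(timedelta(hours=-5)))
utc_value = to_utc(local_value)
```

Rejecting naive timestamps, rather than guessing their offset, is a design choice in this sketch and not a requirement stated by Analytics.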

Instructions:

  1. In Analytics, click on the gear icon and select Project Settings.
  2. Select the Data Sources tab.
  3. Select New Data Source.
  4. Select Connect via Data Warehouse or Lake.
  5. Select Snowflake as your data connection and Snowplow as the connection schema, then click Connect.
  6. You should see the Snowflake + Snowplow Overview screen. Click Next.

Connection Information

  1. Log in to your Snowflake account.
  2. Enter the Account Info into Analytics.

    • Account Info is everything to the left of .gcp.snowflakecomputing.com/… in your Snowflake URL.
  3. Enter the Warehouse name.
  4. Enter the Database name.
  5. Click into Warehouses and copy the Schema.
  6. Enter the Table name.
  7. For Auto-Generated Password, we randomly generate a password for you to use. If you would like to create your own password, please replace the autofilled value in that field.
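
The Account Info rule in step 2 can be sketched in Python as follows. This is only an illustration: the function name account_info_from_url and the example URL are hypothetical, and the sketch assumes that the URL scheme (https://) is not part of the Account Info value.

```python
from urllib.parse import urlparse

# Suffix named in the instructions above.
SUFFIX = ".gcp.snowflakecomputing.com"


def account_info_from_url(url: str) -> str:
    """Return the part of the hostname to the left of SUFFIX."""
    host = urlparse(url).hostname or ""
    if not host.endswith(SUFFIX) or len(host) == len(SUFFIX):
        raise ValueError(f"URL host does not end with {SUFFIX}: {url!r}")
    return host[: -len(SUFFIX)]


# Hypothetical example URL, not a real account.
account_info = account_info_from_url(
    "https://xy12345.us-central1.gcp.snowflakecomputing.com/console"
)
```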

Grant Permissions

  1. You will need to copy and paste these code snippets into your Snowflake worksheet.
  2. Navigate to the Worksheets tab and paste the snippets into the SQL runner, and hit Run All.
  3. The last snippet needs to be applied in Admin -> Security -> + Network Policy.
  4. Click Next to test your connection.

Data Loading

  1. Start Date
    Select the date from which Analytics should load your data.

    • If your data history exceeds 1 billion events, a Solutions Engineer will contact you to assist with the integration.
  2. Schedule Interval
    Select the frequency to make new data available in Analytics.
  3. Processing Delay
    Select when we should start extracting your data in UTC. This time should be when all of your previous day’s data is fully available in your table for extraction.

Event Modeling

  1. In the Structured Event Name section, select the field that should be used to derive Analytics event names. Most customers use the se_action field, but the right choice depends on your implementation.
    We will first look at this field’s value to use as the event name in Analytics. If this value is null, then we will use the event_name field. If this field’s value is also null, we will then use the event field.
    If you are not using Snowplow structured events, select none.
  2. For Timestamp, select the field that represents the time that the event was performed. Analytics will use this field to run its queries. If unsure, leave as derived_tstamp.

    • collector_tstamp - Timestamp for the event recorded by the collector.
    • dvce_created_tstamp - Timestamp for the event recorded on the client device.
    • dvce_sent_tstamp - When the event was actually sent by the client device.
    • etl_tstamp - Timestamp for when the event was validated and enriched. Note: the name is historical and does not mean that the event is loaded at this point (this is further downstream).
    • derived_tstamp - Timestamp making allowance for inaccurate device clock.
    • true_tstamp - User-set “true timestamp” for the event.
  3. For Vendor Name, input the Snowplow vendor names used so we can simplify your event property names.
  4. Click Next.
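
The event name fallback described in step 1 can be sketched in Python as shown here. This is an illustration of the described order (selected field, then event_name, then event), not the actual Analytics implementation. The function name derive_event_name is hypothetical, and the sketch assumes that only a missing or None value counts as null and that selecting none skips straight to event_name.

```python
from typing import Mapping, Optional


def derive_event_name(
    row: Mapping[str, Optional[str]],
    structured_field: Optional[str] = "se_action",
) -> Optional[str]:
    """Return the first non-null value in the documented fallback order."""
    candidates = []
    if structured_field:  # None models selecting "none" (assumption)
        candidates.append(structured_field)
    candidates += ["event_name", "event"]
    for field in candidates:
        value = row.get(field)
        if value is not None:
            return value
    return None


# Hypothetical rows for illustration.
structured = {"se_action": "add_to_cart", "event_name": "event", "event": "struct"}
page_view = {"se_action": None, "event_name": "page_view", "event": "page_view"}
```

With these sample rows, the first resolves through se_action and the second falls back to event_name.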

After this step, we will perform a few checks on your data with the model that you provided. The checks are:

  • Valid event field (Do at least 80% of your records have a value for the event field?)
  • Valid timestamp field (Do at least 80% of your records have a value for the timestamp field?)
  • Total number of unique events. We recommend 20-300 unique events; the limit is 2000.
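
If you want to estimate the outcome of these checks before running the integration, the following Python sketch applies the same thresholds to a sample of rows. The function names and the sample data are hypothetical, and the sketch assumes that a record "has a value" when the field is present and not None; the exact rules Analytics applies may differ.

```python
from typing import Any, Dict, List, Mapping


def fill_rate(rows: List[Mapping[str, Any]], field: str) -> float:
    """Fraction of rows whose value for `field` is not None."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(field) is not None) / len(rows)


def check_event_model(
    rows: List[Mapping[str, Any]], event_field: str, timestamp_field: str
) -> Dict[str, Any]:
    """Apply the 80% fill-rate thresholds and count unique event names."""
    names = {r[event_field] for r in rows if r.get(event_field) is not None}
    return {
        "valid_event_field": fill_rate(rows, event_field) >= 0.80,
        "valid_timestamp_field": fill_rate(rows, timestamp_field) >= 0.80,
        "unique_events": len(names),
        "within_recommended_range": 20 <= len(names) <= 300,
        "within_limit": len(names) <= 2000,
    }


# Hypothetical sample: 10 rows, 9 with an event name, 7 with a timestamp.
sample = [
    {"event": f"e{i % 3}", "derived_tstamp": "2024-01-01" if i < 7 else None}
    for i in range(9)
] + [{"event": None, "derived_tstamp": None}]
report = check_event_model(sample, "event", "derived_tstamp")
```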

User Identification (Aliasing)

After some basic checks, we can define your users within your data. For more information on User Identification (Aliasing), please refer to this article.

  1. If you choose to enable Aliasing, click Enabled.
  2. Type - Select the Snowplow field type

    • Atomic - If the anonymous ID field is an atomic field, select this option.

      • Field Name - Select the field that should be used to identify anonymous users
    • Context - If your anonymous ID is contained within the Contexts field, choose this option

      • Field Name - Select the context field that contains your anonymousID
  3. If you choose to disable Aliasing, click Disabled.

    • Type - Select the Snowplow field type

      • Atomic - If the anonymous ID field is an atomic field, select this option.

        • Field Name - Select the field that should be used to identify anonymous users
      • Context - If your anonymous ID is contained within the Contexts field, choose this option

        • Field Name - Select the context field that contains your anonymousID

If your data uses a non-null placeholder value to represent null UserID values, click on the Show Advanced button and enter those placeholder values in the field provided.

After this step, we will perform additional checks on your data with the user model that you provided. The checks are:

  • User Hotspot (Is there a single UserID that represents over 40% of your records?)
  • Anti-Hotspot (Does your data have too many unique userIDs? A good events table contains multiple events per user)
  • Aliasing
    • Too many unauthenticated IDs for a single authenticated userID
    • Too many authenticated IDs for a single anonymous ID
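
The User Hotspot check can be approximated on a sample of your own data with a sketch like the one below. The function name find_user_hotspot and the sample IDs are hypothetical. The sketch assumes that the share is measured against all records, including those with a null UserID, and that null IDs themselves are never reported as a hotspot; Analytics may compute this differently.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def find_user_hotspot(
    user_ids: Iterable[Optional[str]], threshold: float = 0.40
) -> Optional[Tuple[str, float]]:
    """Return (user_id, share) if one ID exceeds `threshold` of all records."""
    ids = list(user_ids)
    counts = Counter(uid for uid in ids if uid is not None)
    if not counts:
        return None
    user_id, count = counts.most_common(1)[0]
    share = count / len(ids)
    return (user_id, share) if share > threshold else None


# Hypothetical sample: "u1" accounts for 5 of 10 records.
hotspot = find_user_hotspot(["u1"] * 5 + ["u2"] * 3 + ["u3"] * 2)
no_hotspot = find_user_hotspot(["u1", "u2", "u3", "u4"])
```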

Assisted Modeling

You should see a summary of your data based on the last 7 days in three main blocks.

  1. Events Summary
    You should see a daily breakdown of your Total Event Count, and the number of Unique Event Names. If there are certain events to exclude, please click on the Exclude checkbox for those events.

    • If you would like to exclude any events by regex or property value, please contact a product specialist.
    • If this section looks good, click Next
  2. Properties Summary
    Here you will see the number of Unique Event Property Names. If there are certain properties to exclude, please click on the Exclude checkbox for those properties.

    • If you require more advanced configurations such as parsing out JSON fields, creating derived properties, or excluding properties based on regex, please contact a product specialist.
    • If this section looks good, click Next
  3. Users Summary
    This section lists the number of Unique users seen. If the numbers do not look correct, please go back to the User Identification (Aliasing) section to confirm that the correct ID was chosen. Please note that the counts may not reflect exactly what gets loaded into Analytics due to aliasing and other event modeling configurations.

    • If this section looks good, click Next

Waiting for Data

If you see this screen, you’re all done! You should see your data in Analytics within 48-72 hours and will be notified by email.

Advanced Settings

For additional advanced settings, such as excluding certain events and properties, please refer to this page.

If you have any questions or concerns about this integration, please contact your Customer Support Manager, or email support@mparticle.com.
