Snowplow Schema Overview

Overview

Analytics supports loading Snowplow data via Snowflake, BigQuery, and S3. Snowplow is an open-source platform that lets businesses capture granular, event-level data on user behavior across multiple touchpoints and store it in a single location. The platform is designed to operate at enterprise scale, and Snowplow events can be fed into most analytics tools.

Because Snowplow collects event-level data that is already organized into events and properties, Snowplow data needs little to no adjustment before it is connected to Analytics for analysis.

After the Snowplow Enrich step, Snowplow events are stored in an AWS S3 bucket or streamed to AWS Kinesis. Analytics reads Snowplow data from either source (this page refers to both as the Unified Log). Users can extend their data model with other data sources at the Data Modeling step (see Advanced Data Modeling) before conducting analysis in Analytics.

Snowplow Data Flow

A Snowplow implementation can track a range of predefined events, custom structured events, and unstructured events that users model themselves.

By default, a wide range of common properties is logged with any implementation of Snowplow. In addition, customers can use out-of-the-box entities and define custom ones.

Deriving Analytics Events and Properties

Overview

Because Snowplow inherently uses an event-based model, there is no transformation needed to plug Snowplow data into Analytics for analysis. See below for how events and properties are derived from specific types of Snowplow events.

By default, for all event generation, the ‘domain_userid’ field in the Snowplow Unified Log is used as the unique user identifier in Analytics. The ‘collector_tstamp’ field in the Unified Log is used as the event timestamp in Analytics.
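As a minimal sketch of the mapping described above, the snippet below reads the user identifier and timestamp from a single Unified Log row. It assumes the row has already been parsed into a Python dict keyed by Snowplow field name and that ‘collector_tstamp’ is an ISO 8601-style string; the function name `identify` is hypothetical and not part of any Analytics or Snowplow API.

```python
from datetime import datetime
from typing import Tuple


def identify(row: dict) -> Tuple[str, datetime]:
    """Return (user identifier, event timestamp) for one Unified Log row.

    Assumption: `row` is a dict keyed by Snowplow field name, and
    `collector_tstamp` is a string such as "2023-01-15 10:30:00".
    """
    user_identifier = row["domain_userid"]
    event_timestamp = datetime.fromisoformat(row["collector_tstamp"])
    return user_identifier, event_timestamp


# Illustrative row with made-up values.
example_row = {
    "domain_userid": "a1b2c3",
    "collector_tstamp": "2023-01-15 10:30:00",
}
user_identifier, event_timestamp = identify(example_row)
print(user_identifier, event_timestamp.isoformat())
```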

Event and Properties Summary Table

Snowplow Entity | Unique Identifier Used | Timestamp | Properties
Predefined events (page views, page pings, ecommerce transactions, errors) | ‘domain_userid’ | ‘collector_tstamp’ | Common and Platform-specific fields, and applicable Custom Contexts
Structured Events | ‘domain_userid’ | ‘collector_tstamp’ | Common and Platform-specific fields, ‘se_category’, ‘se_label’, ‘se_property’, ‘se_value’, and applicable Custom Contexts
Unstructured Events | ‘domain_userid’ | ‘collector_tstamp’ | Common and Platform-specific fields, and applicable Custom Contexts

Predefined Events

Snowplow has a set of predefined event types that can be instrumented:

  • Page views
  • Page pings
  • Ecommerce transactions
  • Errors

If these events are instrumented, the integration generates them in Analytics, using the ‘event’ field in the Unified Log as the event name.

Common and Platform-Specific Properties

For all Snowplow events, a range of datetime, user, and device fields is recorded along with the event. If instrumented, all of these fields are generated as Analytics properties. Any platform-specific fields are recorded as well, such as page referrer and URL information for a web-specific instrumentation.

Structured Events

Snowplow custom structured events are generated using the ‘se_action’ field as the event name in Analytics (or the ‘event_name’ field if ‘se_action’ is not populated). The ‘se_category’, ‘se_label’, ‘se_property’, and ‘se_value’ fields are added as Analytics properties in addition to all common and platform-specific properties.
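The rule above can be sketched roughly as follows. This is an illustration only, not the integration's actual code: it assumes each Unified Log row is a dict keyed by field name, treats empty or missing values as "not populated", and covers only the four ‘se_’ properties (common and platform-specific properties are left out for brevity). The function name `structured_event` and the output shape are hypothetical.

```python
# The four structured-event fields that become Analytics properties.
SE_PROPERTY_FIELDS = ("se_category", "se_label", "se_property", "se_value")


def structured_event(row: dict) -> dict:
    """Derive an event name and the se_* properties from a structured event row."""
    # Use se_action as the name; fall back to event_name if it is not populated.
    name = row.get("se_action") or row.get("event_name")
    # Keep only the se_* fields that are actually present on this row.
    properties = {
        field: row[field]
        for field in SE_PROPERTY_FIELDS
        if row.get(field) is not None
    }
    return {"name": name, "properties": properties}


# Illustrative row with made-up values.
example = structured_event(
    {
        "se_action": "add_to_cart",
        "se_category": "shop",
        "se_label": "summer_sale",
        "se_value": 9.99,
        "event_name": "event",
    }
)
print(example["name"], example["properties"])
```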

Unstructured Events

Snowplow allows customers to model flexible custom unstructured events as needed. The ‘event_name’ field is used for the event name in Analytics (or it defaults to ‘unstruct’ if ‘event_name’ is not populated).
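Taken together, the naming rules for predefined, structured, and unstructured events could be expressed as a single function. The sketch below assumes each row is a dict keyed by field name and that the ‘event’ field holds ‘struct’ for structured events and ‘unstruct’ for unstructured events; `derive_event_name` is a hypothetical helper, not an Analytics function.

```python
from typing import Optional


def derive_event_name(row: dict) -> Optional[str]:
    """Pick the Analytics event name for one Unified Log row."""
    event_type = row.get("event")
    if event_type == "struct":
        # Structured events: se_action, else event_name.
        return row.get("se_action") or row.get("event_name")
    if event_type == "unstruct":
        # Unstructured events: event_name, else the default "unstruct".
        return row.get("event_name") or "unstruct"
    # Predefined events (page views, page pings, ...): the event field itself.
    return event_type


print(derive_event_name({"event": "page_view"}))
print(derive_event_name({"event": "unstruct", "event_name": "link_click"}))
print(derive_event_name({"event": "unstruct"}))
```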

Custom Contexts

Snowplow allows customers to define their own context around events, such as extra user properties for a customer (membership information, age, etc.) or extra properties about a product for a purchase event (SKU, tags, product name, etc.).
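To illustrate how context fields might surface as event properties, here is a small sketch. The input shape (a list of dicts with ‘name’ and ‘data’ keys) and the ‘context.field’ naming of the resulting properties are assumptions made for this example; this page does not specify how Analytics names properties derived from custom contexts.

```python
def context_properties(contexts: list) -> dict:
    """Flatten custom contexts into a single property dict.

    Assumption: each context looks like {"name": ..., "data": {...}}.
    Property keys are built as "<context name>.<field>" (hypothetical naming).
    """
    properties = {}
    for context in contexts:
        for key, value in context["data"].items():
            properties[f'{context["name"]}.{key}'] = value
    return properties


# Illustrative contexts for a purchase event, with made-up values.
purchase_contexts = [
    {"name": "membership", "data": {"tier": "gold", "age": 34}},
    {"name": "product", "data": {"sku": "SKU-123", "product_name": "Trail Shoe"}},
]
flattened = context_properties(purchase_contexts)
print(flattened)
```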

Aliasing

For S3 or other batch-based Snowplow integrations where the data contains both authenticated and unauthenticated sessions for its users, Analytics automatically aliases these sessions by tying the ‘user_id’ field to the associated ‘domain_userid’, so that each user has a single shared history for analysis. This is only available to Snowplow Relay customers on the Pro or Enterprise plans. For further reading on the Analytics aliasing process, see the Aliasing Documentation.
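The idea behind aliasing can be sketched in a few lines. This is a simplified illustration, not the Analytics aliasing process itself: it assumes rows are dicts keyed by field name, that a populated ‘user_id’ marks an authenticated session, and that each ‘domain_userid’ maps to at most one ‘user_id’. The helper names `build_alias_map` and `resolve_user` are hypothetical.

```python
def build_alias_map(rows: list) -> dict:
    """Map each domain_userid to the user_id seen alongside it, if any."""
    alias_map = {}
    for row in rows:
        if row.get("user_id") and row.get("domain_userid"):
            alias_map[row["domain_userid"]] = row["user_id"]
    return alias_map


def resolve_user(row: dict, alias_map: dict) -> str:
    """Return the shared identity for a row, falling back to domain_userid."""
    return alias_map.get(row["domain_userid"], row["domain_userid"])


# Illustrative rows: d1 browses anonymously, then logs in as "alice";
# d2 never authenticates.
rows = [
    {"domain_userid": "d1", "user_id": None},
    {"domain_userid": "d1", "user_id": "alice"},
    {"domain_userid": "d2", "user_id": None},
]
alias_map = build_alias_map(rows)
resolved = [resolve_user(row, alias_map) for row in rows]
print(resolved)  # ['alice', 'alice', 'd2']
```

In this sketch the anonymous session from d1 is attributed to "alice" once any row links the two identifiers, which is what gives the user a shared history.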
