Data Warehouse

The mParticle integration with Databricks allows you to forward your data from mParticle to Databricks. Databricks is a Delta Lake platform built on Apache Spark that facilitates both distributed data storage and computation. When connected to Databricks, arbitrary work and SQL queries can be scheduled to run against a configured Compute Cluster or SQL Warehouse, whose capacity can be tailored to your needs.

Prerequisites

Before setting up the Databricks integration within mParticle, you must configure the following within your Databricks account:

  • A SQL Warehouse to run future queries.
  • A dedicated Service Principal to allow mParticle to upload data to your Databricks catalog.

    • When creating a Service Principal, you will create a Client ID / Client Secret pair. mParticle uses these credentials to generate OAuth access tokens so that it can upload data to your catalog on your behalf.
    • When creating your new service principal, make sure to grant it access to the Catalog or Schema you create for this integration. If not, the access token that mParticle generates on behalf of the Service Principal won’t allow access to your Databricks resources.
  • A Catalog and Schema within your Databricks workspace.

1. Create a SQL Warehouse

A SQL Warehouse within Databricks is a computational resource that allows you to run SQL queries on your data. You will need to create a SQL Warehouse that mParticle can use when forwarding your data.

To create a new SQL warehouse:

  1. Log into your Databricks account.
  2. Under SQL in the left hand nav, select SQL Warehouse.
  3. Click Create SQL warehouse.

screenshot showing the Databricks UI when creating a new SQL Warehouse

  4. Enter a meaningful name for your new warehouse, and select a Cluster size appropriate for your organization’s needs.
  5. Under Auto stop, enter a time duration appropriate for your organization. Note that “cold” cluster start-up times can reach several minutes, which may impact how efficiently mParticle is able to forward data to Databricks.
  6. Under Type, select Serverless.
  7. Click Create.

To learn more about SQL warehouses and their configuration settings, visit the Databricks documentation: Create a SQL warehouse.

2. Create a new Service Principal

A service principal in Databricks is an API-only identity used to grant automated tools and applications, like mParticle, secure access to your data catalogs. mParticle will use the service principal you create to authenticate itself when forwarding your data to Databricks.

To create a new Service Principal:

  1. Navigate to Settings, and select Identity and access under Workspace admin.

screenshot showing the Identity and Access settings page in the Databricks UI

  2. Click Manage under Service principals.
  3. Click Add service principal.
  4. Make sure to enable Databricks SQL access and Workspace access under Entitlements. You can enable these for an existing service principal by navigating to the Configurations tab when viewing the service principal and enabling both settings before clicking Update.

screenshot showing entitlement settings for a service principal in the Databricks UI

  5. After creating your new service principal, go to the Secrets tab on the Service principal details page, and click Generate secret. Save your Client ID and Client Secret, as you will need to enter them when configuring your connection to Databricks in mParticle.

screenshot showing the Secrets tab of the new Service Principal in the Databricks UI

To learn more about Service Principals, visit the Databricks documentation: Manage service principals.

3. Create a new Catalog

All data in Databricks is organized within Catalogs. Catalogs contain Schemas that define the structure of your data, and tables that contain the data itself.

To create a new catalog:

  1. Navigate to Catalog in the left hand nav of your Databricks workspace.
  2. Click Create Catalog.
  3. Under Type, select Standard, or whichever type best suits your organization’s use case. The mParticle Databricks integration supports all catalog types.
  4. Click Create.
  5. After creating and opening your catalog, go to the Permissions tab.
  6. Click Grant, enter the service principal you created in 2 Create a new Service Principal under Principals, and enable the following privileges to ensure mParticle can generate the necessary tables in your catalog:

    • USE CATALOG
    • USE SCHEMA
    • READ VOLUME
    • WRITE VOLUME
    • CREATE TABLE
    • CREATE VOLUME

screenshot showing the Permissions tab of the catalog in the Databricks UI

You can learn more about catalogs in the Databricks documentation: What are catalogs in Databricks?
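If you prefer to grant these privileges from the Databricks SQL Editor instead of the Permissions tab, the equivalent statement can be generated with a small helper. This is a minimal sketch: the catalog name and the application ID below are placeholders, and it assumes your workspace identifies the service principal in SQL by its application ID, so confirm that against your own Databricks setup before running the statement.

```python
# Privileges listed in the Permissions step above.
MPARTICLE_PRIVILEGES = [
    "USE CATALOG",
    "USE SCHEMA",
    "READ VOLUME",
    "WRITE VOLUME",
    "CREATE TABLE",
    "CREATE VOLUME",
]


def quote_identifier(name: str) -> str:
    """Wrap a name in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def build_catalog_grant(catalog: str, principal: str) -> str:
    """Build a single GRANT statement covering every required privilege."""
    privileges = ", ".join(MPARTICLE_PRIVILEGES)
    return (
        f"GRANT {privileges} ON CATALOG {quote_identifier(catalog)} "
        f"TO {quote_identifier(principal)}"
    )


# Placeholder values: replace with your catalog and service principal.
print(build_catalog_grant("mparticle_catalog", "00000000-0000-0000-0000-000000000000"))
```

Paste the printed statement into the SQL Editor while signed in as a user who is allowed to manage permissions on the catalog.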

4. Create a new Schema

Within the Databricks data hierarchy, a schema is a subcomponent of a catalog that defines in more granularity how your data is organized and structured.

To create a new schema:

  1. Navigate to Catalog in the left hand nav of your Databricks workspace.
  2. Select the catalog you created in the previous step, and click Create schema.
  3. Enter a meaningful name and description before clicking Create.
  4. Since you already granted read/write privileges to your service principal for the catalog containing this schema, you don’t need to grant it privileges here. However, depending on how your organization has structured your Databricks workspace, you may choose to set these privileges at the schema scope instead of the catalog scope.

It is also possible to create a new schema using the Databricks SQL Editor.
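As a rough illustration of the SQL Editor route, the helper below assembles a CREATE SCHEMA statement. The catalog name, schema name, and description are placeholders, and the helper rejects descriptions containing quotes or backslashes rather than guessing at escaping rules.

```python
def quote_identifier(name: str) -> str:
    """Wrap a name in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def build_create_schema(catalog: str, schema: str, comment: str = "") -> str:
    """Build a CREATE SCHEMA statement for the given catalog and schema."""
    statement = (
        f"CREATE SCHEMA IF NOT EXISTS "
        f"{quote_identifier(catalog)}.{quote_identifier(schema)}"
    )
    if comment:
        # Keep the sketch simple: refuse text that would need escaping.
        if "'" in comment or "\\" in comment:
            raise ValueError("comment must not contain quotes or backslashes")
        statement += f" COMMENT '{comment}'"
    return statement


# Placeholder names: replace with your own catalog and schema.
print(build_create_schema("mparticle_catalog", "mparticle_events",
                          "Events forwarded from mParticle"))
```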

Configure your Databricks integration in mParticle

1. Create an outbound configuration for Databricks

To create an outbound configuration for Databricks within mParticle:

  1. Log into your mParticle account.
  2. From the Overview Map in the new UI, click Add under Outputs. If you’re using the Classic UI, navigate to Setup > Outputs in the left hand nav bar.
  3. Go to the Data Warehouse tab and use the Add Data Warehouse dropdown menu to add a new Databricks configuration.

screenshot showing the data warehouse tab of the setup outputs page in the mParticle UI

  4. Hover over Databricks and click Configure.

screenshot showing the Databricks option in the setup outputs page

  5. Enter a unique Configuration name, and check Use same settings for Development & Production if you want to use the same configuration settings for both your development and production data.
  6. Click Save.
  7. You will be taken to the Settings tab for your new configuration. Under Databricks Parameters, enter the following:
  • Deployment Name: The Databricks deployment name of the workspace containing the SQL Warehouse, Service Principal, and Catalog/Schema you created in the Prerequisites section.

    • If you log into your Databricks account, the URL in your browser will resemble https://<deployment-name>.databricks.com, where <deployment-name> is the deployment name.
  • Warehouse ID: The ID of the SQL Warehouse you created in 1 Create a SQL Warehouse.
  • Catalog Name: The name of the Catalog you created in 3 Create a new Catalog.
  • Schema Name: The name of the Schema you created in 4 Create a new Schema.
  • Client ID: The Client ID you generated for your service principal in 2 Create a new Service Principal.
  • Client Secret: The Client Secret you generated for your service principal in 2 Create a new Service Principal.
  • Event Stats Threshold: the number of events that must be reached before mParticle begins adding additional events to their own dedicated table.

    • Until this threshold is reached, all events will be uploaded to a common table.
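If you are unsure which part of your workspace URL to enter as the Deployment Name, the sketch below extracts it. It assumes the deployment name is the first label of the workspace hostname, which matches the example in the settings reference (a Server Hostname of 1234.cloud.databricks.com gives 1234); the function name is hypothetical.

```python
from urllib.parse import urlparse


def deployment_name(url_or_host: str) -> str:
    """Return the first label of a Databricks workspace hostname."""
    value = url_or_host.strip()
    # urlparse only fills in the hostname when a scheme is present.
    if "://" not in value:
        value = "https://" + value
    host = urlparse(value).hostname
    if not host or "." not in host:
        raise ValueError(f"not a workspace hostname: {url_or_host!r}")
    return host.split(".", 1)[0]


print(deployment_name("https://1234.cloud.databricks.com/sql/warehouses"))
```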

2. Create a connection with your new Databricks output

  1. From your mParticle account, navigate to Connections > Connect.
  2. Select the configured input whose data you want to forward to Databricks.
  3. Click Connect Output, and select Databricks from your list of configured outputs.
  4. Set Connection Status to Active or Inactive. You can always activate a connection later after completing the configuration, but only active connections will forward data.
  5. If you enable Send Batches without Events, mParticle will forward all batches to Databricks, even if they only contain user data with no event data.
  6. For feed connections only: If you enable Split Partner Feed Data by Event Name, mParticle will separate partner feed data by each unique event name. If you leave this setting disabled, mParticle will place all data from a single partner feed into a single table.

    • This setting only applies to partner feed data. If you create a connection to Databricks using one of the other platform inputs, mParticle will place data from that input into a single table.
  7. For feed connections only: Enter an optional, custom name for the table mParticle will add your data to.

    • If left blank, mParticle will use the name of the partner.
    • This setting is ignored if Split Partner Feed Data by Event Name is enabled.
  8. Click Add Connection.

Data mapping

All Databricks tables that mParticle generates are created within the schema you created in step 4 of the prerequisites. Databricks refers to databases and schemas interchangeably. The schema you create when configuring this integration serves as the main database that will contain the actual tables of data forwarded from mParticle. For more information, read about schemas in the Databricks documentation.

When mParticle adds data to a table in your Databricks schema, all the main objects and fields listed in the mParticle JSON schema are automatically mapped to objects and fields within Databricks. This includes complex objects or collections, allowing you to forward any event data from mParticle to Databricks.

For example, see the following sample CommerceEvent with a ProductAction field after it has been forwarded to Databricks:

screenshot showing a product action event in the Databricks UI

The same associated event data that was available in mParticle is queryable within Databricks.

What tables will my data be added to?

When determining which table a given event will be added to in Databricks, mParticle employs a cache that tracks all events forwarded to Databricks in the given workspace within the last 30 days.

The Event Stats Threshold configuration setting is cross-referenced with this cache to determine how many events must be forwarded to Databricks before mParticle begins adding them to their own, dedicated table.

If a given event type’s frequency exceeds the configured Event Stats Threshold, those events start being uploaded to their own dedicated table. Until the threshold is reached, events are uploaded to the common table, which is named [your-schema-name]_otherevents.
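The routing rule described above can be sketched as follows. This is an illustration of the documented behavior, not mParticle's implementation: the function names are hypothetical, the default of 10000 comes from the settings reference, and treating a count exactly equal to the threshold as "not yet exceeded" is an assumption.

```python
DEFAULT_EVENT_THRESHOLD = 10000  # default listed in the settings reference


def common_table_name(schema_name: str) -> str:
    """Name of the shared table used until an event passes the threshold."""
    return f"{schema_name}_otherevents"


def uses_dedicated_table(events_in_last_30_days: int,
                         threshold: int = DEFAULT_EVENT_THRESHOLD) -> bool:
    """True once an event's 30-day count exceeds the threshold.

    Assumption: a count equal to the threshold still goes to the common table.
    """
    return events_in_last_30_days > threshold


print(common_table_name("mparticle_events"))
print(uses_dedicated_table(2500))
print(uses_dedicated_table(25000))
```

A lower threshold produces more dedicated tables sooner; a higher one keeps more event types together in the common table.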

Partner Feed connection settings

The only exception to how mParticle adds your data to Databricks tables pertains to Feed connections. There are two special settings for Databricks Feed connections that can influence the tables events are added to.

Databricks Table Name

When creating a Databricks connection, you can specify a name for a table that mParticle will create to store your feed data in. If you leave this setting blank, mParticle creates a table with the name set to the partner feed name. This setting is only applicable to feed inputs.

If Split Partner Feed Data by Event Name is enabled, this setting is ignored.

Split Partner Feed Data by Event Name

When creating a Databricks connection, you can enable a setting called Split Partner Feed Data by Event Name. If enabled, then mParticle will create a separate table for each unique event name forwarded to Databricks. If this setting is disabled, then all Partner feed data is added to a single table.
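Taken together, the two feed settings interact as in the sketch below. The function and parameter names are hypothetical, and the value returned when splitting is enabled is only a stand-in: this page does not specify how mParticle derives table names from event names.

```python
from typing import Optional


def feed_table_label(partner_name: str,
                     event_name: str,
                     custom_table_name: Optional[str] = None,
                     split_by_event_name: bool = False) -> str:
    """Pick the table a partner feed event is grouped under."""
    if split_by_event_name:
        # One table per unique event name; the custom name is ignored.
        # Assumption: the event name stands in for the real table name.
        return event_name
    # Otherwise use the custom table name, falling back to the partner name.
    return custom_table_name or partner_name


print(feed_table_label("ExamplePartner", "purchase"))
print(feed_table_label("ExamplePartner", "purchase", custom_table_name="partner_feed"))
print(feed_table_label("ExamplePartner", "purchase", "partner_feed", split_by_event_name=True))
```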

Data forwarding

When forwarding event data to Databricks, mParticle generates an OAuth access token using the Service Principal credentials you set up in 2 Create a new Service Principal. After authenticating to Databricks with the OAuth access token, mParticle creates a new Unity Catalog Volume called mparticle_staging under the schema you set up in 4 Create a new schema. This Unity Catalog Volume acts as a staging area for your data before it’s ultimately loaded into the appropriate Databricks table.
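To check that your Client ID and Client Secret can produce an access token before you enter them in mParticle, you can build the same kind of token request yourself. The sketch below uses the Databricks OAuth client-credentials endpoint (`/oidc/v1/token` on the workspace host). It is not mParticle's own code, and the host and credentials shown are placeholders.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(workspace_host: str,
                        client_id: str,
                        client_secret: str) -> urllib.request.Request:
    """Build (but do not send) an OAuth client-credentials token request."""
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode("ascii")
    return urllib.request.Request(
        url=f"https://{workspace_host}/oidc/v1/token",
        data=body,
        headers={
            "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


# Placeholder values: replace with your workspace host and credentials.
request = build_token_request("example.cloud.databricks.com", "my-client-id", "my-secret")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` sends it; a successful response is a JSON document containing an `access_token` field. A failure at this step usually points to an incorrect secret or a service principal that lacks the entitlements described in the prerequisites.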

All data ingested into mParticle that is to be forwarded to Databricks is written to Parquet files, which are uploaded to the staging Unity Catalog Volume in Databricks. mParticle then automatically issues the necessary commands to load the Parquet data into the respective tables and to clean up any previously loaded files.

screenshot showing the example schema in the Databricks UI
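To make the staging flow more concrete, the sketch below builds the path of the staging volume and one possible load statement. Unity Catalog volumes are addressed as `/Volumes/<catalog>/<schema>/<volume>`, and `COPY INTO` is one standard Databricks way to load Parquet files into a table. The exact commands mParticle issues are not documented on this page, so treat the statement, the file name, and the table name as assumptions for illustration only.

```python
def staging_volume_path(catalog: str, schema: str,
                        volume: str = "mparticle_staging",
                        file_name: str = "") -> str:
    """Path of the staging volume, or of a file inside it."""
    base = f"/Volumes/{catalog}/{schema}/{volume}"
    return f"{base}/{file_name}" if file_name else base


def build_copy_into(catalog: str, schema: str, table: str, source_path: str) -> str:
    """One possible statement for loading staged Parquet files into a table."""
    target = ".".join(f"`{part}`" for part in (catalog, schema, table))
    return f"COPY INTO {target} FROM '{source_path}' FILEFORMAT = PARQUET"


# Placeholder names: the file and table below are hypothetical.
path = staging_volume_path("mparticle_catalog", "mparticle_events",
                           file_name="batch.parquet")
print(path)
print(build_copy_into("mparticle_catalog", "mparticle_events",
                      "mparticle_events_otherevents", path))
```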

Upload frequency

mParticle forwards data to Databricks in bulk. By default, an upload occurs every 90 minutes or as soon as 100,000 messages have accumulated in the upload queue, whichever comes first.
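The default upload trigger can be expressed as a simple check. This is a minimal sketch of the rule stated above, with hypothetical names; it does not reflect how mParticle schedules uploads internally.

```python
UPLOAD_INTERVAL_SECONDS = 90 * 60   # 90 minutes
MAX_QUEUED_MESSAGES = 100_000


def should_upload(queued_messages: int, seconds_since_last_upload: float) -> bool:
    """True when either default limit has been reached, whichever comes first."""
    return (
        queued_messages >= MAX_QUEUED_MESSAGES
        or seconds_since_last_upload >= UPLOAD_INTERVAL_SECONDS
    )


print(should_upload(queued_messages=1_200, seconds_since_last_upload=600))
print(should_upload(queued_messages=100_000, seconds_since_last_upload=600))
print(should_upload(queued_messages=1_200, seconds_since_last_upload=5_400))
```

In practice this means a low-volume input can wait up to 90 minutes before its data appears in Databricks, while a high-volume input is uploaded more often.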

Accessing your data in Databricks

Once data has been loaded into a given table in your Databricks workspace, it can be easily queried using standard SQL syntax. This can be accomplished from within Databricks’ SQL Editor.

screenshot showing the SQL Editor in the Databricks UI
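Queries can also be submitted programmatically. The sketch below builds a request for the Databricks SQL Statement Execution API (`POST /api/2.0/sql/statements`), using the same Warehouse ID, Catalog Name, and Schema Name you configured for the integration. The host, token, warehouse ID, and table name are placeholders, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_statement_request(workspace_host: str, access_token: str,
                            warehouse_id: str, catalog: str, schema: str,
                            statement: str) -> urllib.request.Request:
    """Build (but do not send) a SQL Statement Execution API request."""
    payload = {
        "warehouse_id": warehouse_id,
        "catalog": catalog,      # default catalog for the statement
        "schema": schema,        # default schema for the statement
        "statement": statement,
        "wait_timeout": "30s",   # wait up to 30 seconds for the result
    }
    return urllib.request.Request(
        url=f"https://{workspace_host}/api/2.0/sql/statements",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values: replace with your own workspace details.
request = build_statement_request(
    "example.cloud.databricks.com", "my-access-token", "abc123",
    "mparticle_catalog", "mparticle_events",
    "SELECT * FROM mparticle_events_otherevents LIMIT 10",
)
print(request.get_method(), request.full_url)
```

Because the catalog and schema are supplied as defaults, the statement can refer to tables by their unqualified names.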

Settings reference

Configuration settings

| Setting name | Type | Required? | Encrypted? | Default setting | Description |
|---|---|---|---|---|---|
| Deployment Name | string | yes | no | null | The Databricks deployment that’s associated with the given Service Principal and SQL Warehouse. For example, if your Server Hostname is 1234.cloud.databricks.com, the Deployment Name you should enter is 1234. |
| Warehouse ID | string | yes | no | null | The ID of the SQL Warehouse on which to execute SQL statements. |
| Service Principal Client ID | string | yes | no | null | The dedicated Service Principal’s Client ID, which will be used to generate OAuth Access Tokens to facilitate future uploads. |
| Service Principal Client Secret | string | yes | yes | null | The dedicated Service Principal’s Client Secret, which will be used to generate OAuth Access Tokens to facilitate future uploads. |
| Catalog Name | string | yes | no | null | The default catalog for statement execution. |
| Schema Name | string | yes | no | null | The default schema for statement execution. |
| Events Threshold | int | yes | no | 10000 | The number of events that must be seen before mParticle starts forwarding them to their own dedicated table. Until this threshold is reached, events are uploaded to a common table. |

Connection settings

| Setting name | Type | Required | Default | Input | Description |
|---|---|---|---|---|---|
| Databricks Table Name | string | no | null | Feed | Table name for this partner feed. If not set, the partner name will be used. Only applicable to feed inputs; no effect on app inputs. If the “Split Partner Feed Data by Event Name” checkbox is enabled, this setting is not used. |
| Split Partner Feed Data by Event Name | boolean | no | false | Feed | If enabled, split partner feed data by event name. Otherwise, load data into the same table. |
| Send Batches without Events | boolean | no | true | All | If enabled, an event batch that contains no events will be forwarded. |
    Last Updated: December 5, 2024