The mParticle integration with Databricks allows you to forward your data from mParticle to Databricks. Databricks is a Delta Lake platform built on Apache Spark that facilitates both distributed data storage and computation. When connected to Databricks, arbitrary work and SQL queries can be scheduled to run against a configured Compute Cluster or SQL Warehouse, whose capacity can be tailored according to your needs.
Before setting up the Databricks integration within mParticle, you must configure the following within your Databricks account:
A dedicated Service Principal to allow mParticle to upload data to your Databricks catalog.
A SQL Warehouse within Databricks is a computational resource that allows you to run SQL queries on your data. You will need to create a SQL Warehouse that mParticle can use when forwarding your data.
To create a new SQL warehouse:
To learn more about SQL warehouses and their configuration settings, visit the Databricks documentation: Create a SQL warehouse.
A service principal in Databricks is an API-only identity used to grant automated tools and applications, like mParticle, secure access to your data catalogs. mParticle will use the service principal you create to authenticate itself when forwarding your data to Databricks.
To create a new Service Principal:
To learn more about Service Principals, visit the Databricks documentation: Manage service principals.
All data in Databricks is organized within Catalogs. Catalogs contain Schemas that define the structure of your data, and tables that contain the data itself.
To create a new catalog:
Click Grant. Under Principals, enter the service principal you created in "2 Create a new Service Principal", then enable the following privileges to ensure mParticle can generate the necessary tables in your catalog:
You can learn more about catalogs in the Databricks documentation: What are catalogs in Databricks?
Within the Databricks data hierarchy, a schema is a subcomponent of a catalog that defines in more granularity how your data is organized and structured.
To create a new schema:
It is also possible to create a new schema using the Databricks SQL Editor.
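As a rough illustration of the SQL Editor route, the following Python sketch builds a CREATE SCHEMA statement that could be pasted into the editor. The catalog and schema names (mparticle_catalog, mparticle_events) and both helper functions are hypothetical placeholders chosen for this example; they are not values defined by mParticle or Databricks.

```python
def quote_identifier(name: str) -> str:
    """Wrap an identifier in backticks, doubling any embedded backtick."""
    if not name:
        raise ValueError("identifier must not be empty")
    return "`" + name.replace("`", "``") + "`"


def create_schema_statement(catalog: str, schema: str) -> str:
    """Build a CREATE SCHEMA statement for a schema inside a catalog."""
    return (
        "CREATE SCHEMA IF NOT EXISTS "
        f"{quote_identifier(catalog)}.{quote_identifier(schema)}"
    )


# Hypothetical names, used only for illustration.
print(create_schema_statement("mparticle_catalog", "mparticle_events"))
```

Quoting each identifier separately keeps names that contain unusual characters from breaking the statement.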
To create an outbound configuration for Databricks within mParticle:
Deployment Name: The Databricks deployment name of the workspace containing the SQL Warehouse, Service Principal, and Catalog/Schema you created in the Prerequisites section. It appears in the workspace URL, https://<deployment-name>.databricks.com, where <deployment-name> is the deployment name.
Event Stats Threshold: The number of events that must be reached before mParticle begins adding additional events to their own dedicated table.
For feed connections only: If you enable Split Partner Feed Data by Event Name, mParticle will separate partner feed data by each unique event name. If you leave this setting disabled, mParticle will place all data from a single partner feed into a single table.
For feed connections only: Enter an optional, custom name for the table mParticle will add your data to.
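For the Deployment Name setting, the value is the first label of your workspace hostname. The sketch below shows one way to derive it in Python, assuming the hostname follows the pattern shown in this document (for example, 1234.cloud.databricks.com). The function name deployment_name_from_hostname is hypothetical and is not part of any mParticle or Databricks library.

```python
from urllib.parse import urlparse


def deployment_name_from_hostname(server_hostname: str) -> str:
    """Return the first DNS label of a Databricks server hostname or URL."""
    value = server_hostname.strip()
    # urlparse only fills in .hostname when a "//" authority marker is present.
    if "//" not in value:
        value = "//" + value
    host = urlparse(value).hostname
    if not host:
        raise ValueError(f"cannot read a hostname from {server_hostname!r}")
    return host.split(".")[0]


# Example hostname taken from the settings table in this document.
print(deployment_name_from_hostname("1234.cloud.databricks.com"))
```

The helper accepts either a bare hostname or a full URL, so a value copied from the browser address bar works as well.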
All Databricks tables that mParticle generates are created within the schema you created in step 4 of the prerequisites. Databricks refers to databases and schemas interchangeably. The schema you create when configuring this integration serves as the main database that will contain the actual tables of data forwarded from mParticle. For more information, read about schemas in the Databricks documentation.
When mParticle adds data to a table in your Databricks schema, all the main objects and fields listed in the mParticle JSON schema are automatically mapped to objects and fields within Databricks. This includes complex objects or collections, allowing you to forward any event data from mParticle to Databricks.
For example, see the following sample CommerceEvent with a ProductAction field after it has been forwarded to Databricks:
The same associated event data that was available in mParticle is queryable within Databricks.
When determining which table a given event will be added to in Databricks, mParticle employs a cache that tracks all events forwarded to Databricks in the given workspace within the last 30 days.
The Event Stats Threshold configuration setting is cross-referenced with this cache to determine how many events must be forwarded to Databricks before mParticle begins adding them to their own, dedicated table.
If a given event type’s frequency exceeds the configured Event Stats Threshold, then those events will start to be uploaded to their own dedicated table. Until the threshold is reached, events are uploaded to the common table named [your-schema-name]_otherevents.
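The routing rule described above can be sketched as a small Python function. This is an illustration of the documented behavior, not mParticle's implementation: the function name, its parameters, and the dedicated table name passed in are hypothetical, and the default threshold of 10000 is taken from the settings table in this document.

```python
def target_table(
    schema_name: str,
    dedicated_table: str,
    recent_event_count: int,
    threshold: int = 10000,
) -> str:
    """Pick the table an event would land in, based on its recent frequency.

    recent_event_count stands in for the 30-day cache described above.
    dedicated_table is a hypothetical name; the real naming is up to mParticle.
    """
    if recent_event_count > threshold:
        # Frequency exceeds the threshold: the event gets its own table.
        return dedicated_table
    # Otherwise the event goes to the shared catch-all table.
    return f"{schema_name}_otherevents"


print(target_table("analytics", "purchase", recent_event_count=250))
print(target_table("analytics", "purchase", recent_event_count=25000))
```

The comparison is strict because the text says the frequency must exceed the threshold; an event seen exactly 10000 times would still go to the common table under this reading.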
The only exception to how mParticle adds your data to Databricks tables pertains to Feed connections. There are two special settings for Databricks Feed connections that can influence the tables events are added to.
When creating a Databricks connection, you can specify a name for a table that mParticle will create to store your feed data in. If you leave this setting blank, mParticle creates a table with the name set to the partner feed name. This setting is only applicable to feed inputs.
If Split Partner Feed Data by Event Name is enabled, this setting is ignored.
When creating a Databricks connection, you can enable a setting called Split Partner Feed Data by Event Name. If enabled, then mParticle will create a separate table for each unique event name forwarded to Databricks. If this setting is disabled, then all Partner feed data is added to a single table.
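How the two feed settings interact can be sketched in Python as follows. The function and parameter names are hypothetical, and returning the event name itself as the table name is an assumption made for illustration; the exact per-event table naming is not specified here.

```python
from typing import Optional


def feed_table(
    partner_feed_name: str,
    event_name: str,
    custom_table_name: Optional[str] = None,
    split_by_event_name: bool = False,
) -> str:
    """Choose a table for partner feed data under the two feed settings."""
    if split_by_event_name:
        # Splitting takes precedence; the custom table name is ignored.
        # Using the raw event name as the table name is an assumption.
        return event_name
    # A blank custom name falls back to the partner feed name.
    return custom_table_name or partner_feed_name


print(feed_table("examplefeed", "email_open"))
print(feed_table("examplefeed", "email_open", custom_table_name="feed_data"))
print(feed_table("examplefeed", "email_open", "feed_data", split_by_event_name=True))
```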
When forwarding event data to Databricks, mParticle generates an OAuth access token using the Service Principal credentials you set up in "2 Create a new Service Principal". After authenticating to Databricks with the OAuth access token, mParticle creates a new Unity Catalog Volume called mparticle_staging under the schema you set up in "4 Create a new schema". This Unity Catalog Volume acts as a staging area for your data before it’s ultimately loaded into the appropriate Databricks table.
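To make the token step more concrete, the sketch below builds (but does not send) a client-credentials token request of the kind Databricks documents for service principals, using only the Python standard library. Whether mParticle issues exactly this request is an assumption; the function name, host, and credential values are hypothetical placeholders.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(
    workspace_host: str, client_id: str, client_secret: str
) -> urllib.request.Request:
    """Build an OAuth client-credentials request for a Databricks workspace."""
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode("ascii")
    # The service principal's client ID and secret go in HTTP Basic auth.
    basic = base64.b64encode(
        f"{client_id}:{client_secret}".encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        url=f"https://{workspace_host}/oidc/v1/token",
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


# Placeholder values only; nothing is sent over the network here.
request = build_token_request(
    "1234.cloud.databricks.com", "example-client-id", "example-secret"
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would perform the call; the JSON response is expected to carry the access token used for subsequent uploads.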
All data ingested into mParticle that is to be forwarded to Databricks is written to parquet files, which are uploaded to the staging Unity Catalog Volume in Databricks. mParticle then automatically issues the necessary commands to load the parquet data into the respective tables and to clean up any previously loaded files.
mParticle forwards data to Databricks in bulk. By default, an upload occurs every 90 minutes or as soon as 100,000 messages have accumulated in the upload queue, whichever comes first.
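The default upload trigger amounts to a simple "time or size, whichever comes first" check. A minimal Python sketch of that rule, using the two default values stated above, looks like this. The constant and function names are hypothetical and do not reflect mParticle's internal code.

```python
UPLOAD_INTERVAL_SECONDS = 90 * 60  # 90 minutes
MAX_QUEUED_MESSAGES = 100_000


def should_upload(queued_messages: int, seconds_since_last_upload: float) -> bool:
    """Return True when either default upload condition has been met."""
    return (
        queued_messages >= MAX_QUEUED_MESSAGES
        or seconds_since_last_upload >= UPLOAD_INTERVAL_SECONDS
    )


print(should_upload(queued_messages=500, seconds_since_last_upload=600))
print(should_upload(queued_messages=100_000, seconds_since_last_upload=600))
print(should_upload(queued_messages=500, seconds_since_last_upload=5400))
```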
Once data has been loaded into a given table in your Databricks workspace, it can be easily queried using standard SQL syntax. This can be accomplished from within Databricks’ SQL Editor.
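As a starting point, the following Python sketch assembles a simple SELECT statement that could be pasted into the SQL Editor to preview rows in the common table. The catalog and schema names are hypothetical placeholders, and the helper names are invented for this example; only the _otherevents suffix comes from this document.

```python
def quote_identifier(name: str) -> str:
    """Wrap an identifier in backticks, doubling any embedded backtick."""
    if not name:
        raise ValueError("identifier must not be empty")
    return "`" + name.replace("`", "``") + "`"


def preview_query(catalog: str, schema: str, table: str, limit: int = 10) -> str:
    """Build a SELECT that previews the first rows of a three-part table name."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    full_name = ".".join(quote_identifier(part) for part in (catalog, schema, table))
    return f"SELECT * FROM {full_name} LIMIT {limit}"


# Hypothetical catalog and schema; the table follows [your-schema-name]_otherevents.
schema = "mparticle_events"
print(preview_query("mparticle_catalog", schema, f"{schema}_otherevents"))
```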
Setting name | Type | Required? | Encrypted? | Default setting | Description |
---|---|---|---|---|---|
Deployment Name | string | yes | no | null | The Databricks deployment that’s associated with the given Service Principal and SQL Warehouse. For example, if your Server Hostname is 1234.cloud.databricks.com, the Deployment Name that you should enter is 1234. |
Warehouse ID | string | yes | no | null | The SQL Warehouse ID upon which to execute SQL statements. |
Service Principal Client ID | string | yes | no | null | The dedicated Service Principal’s Client ID, which will be used to generate OAuth Access Tokens to facilitate future uploads. |
Service Principal Client Secret | string | yes | yes | null | The dedicated Service Principal’s Client Secret, which will be used to generate OAuth Access Tokens to facilitate future uploads. |
Catalog Name | string | yes | no | null | The default catalog for statement execution. |
Schema Name | string | yes | no | null | The default schema for statement execution. |
Events Threshold | int | yes | no | 10000 | The threshold to determine the number of events that need to be seen before we start forwarding them to their own, dedicated table. Until this threshold is reached, events will be uploaded to a common table. |
Setting name | Type | Required | Default | Input | Description |
---|---|---|---|---|---|
Databricks Table Name | string | no | null | Feed | Table name for this partner feed. If not set, the partner name will be used. Only applicable to feed inputs; it has no effect on app inputs. If the “Split Partner Feed Data by Event Name” checkbox is enabled, this setting is not used. |
Split Partner Feed Data by Event Name | boolean | no | false | Feed | If enabled, split partner feed data by event name. Otherwise load data into the same table. |
Send Batches without Events | boolean | no | true | All | If enabled, an event batch that contains no events will be forwarded. |