Warehouse Sync is mParticle’s reverse-ETL solution, allowing you to use an external warehouse as a data source for your mParticle account. By ingesting data from your warehouse using either one-time, on-demand, or recurring syncs, you can benefit from mParticle’s data governance, personalization, and prediction features without having to modify your existing data infrastructure.
You can use Warehouse Sync to ingest both user and event data from the following warehouse providers:
Map your warehouse data to fields in mParticle
All functionality of Warehouse Sync is also available from an API. To learn how to create and manage syncs between mParticle and your data warehouse using the API, visit the mParticle developer documentation.
Before beginning the warehouse sync setup, obtain the following information for your mParticle account. Depending on which warehouse you plan to use, you will need some or all of these values:
- Your mParticle region ID: US1, US2, AU1, or EU1
- Your Pod AWS Account ID that correlates with your mParticle region ID:

| Region ID | Pod AWS Account ID |
| --- | --- |
| US1 | 338661164609 |
| US2 | 386705975570 |
| AU1 | 526464060896 |
| EU1 | 583371261087 |
To find your Org ID, Account ID, Pod ID, and Pod AWS Account ID:

1. From your web browser's toolbar, navigate to view the page source for the mParticle app.
2. Search the page source for the values of accountId, orgId, podName, and podId.

Work with your warehouse administrator or IT team to ensure your warehouse is reachable and accessible by mParticle.
First, you must create a user in your Snowflake account that mParticle can use to access your warehouse.

```sql
-- Mark your new user as a legacy service user to exclude it from Snowflake's
-- multifactor authentication policy
CREATE OR REPLACE USER {{user_name}} PASSWORD = '{{unique_secure_password}}' TYPE = LEGACY_SERVICE;
```

Where:

- {{user_name}}: the ID of the user that mParticle will log in as while executing SQL commands on your Snowflake instance.
- {{unique_secure_password}}: a unique, strong password for your new user.

mParticle recommends creating a dedicated role for the mParticle warehouse sync user created in step 1.1 above.
From the Users & Roles page of your Snowflake account’s Admin settings, select the Roles tab.
Click the + Role button.
Enter a name for your new role, and select the set of permissions to assign to the role using the Grant to role dropdown.
Select your new role from the list of roles, and click the Grant to User button.
Under User to receive grant, select the new user you created in step 1.1, and click Grant.
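If you prefer to configure the role with SQL rather than through the Admin UI, the steps above can be approximated in a Snowflake worksheet. This is a minimal sketch, not the exact grants your setup may need; the role name mparticle_role and all {{...}} placeholders are assumptions to replace with your own values:

```sql
-- Sketch: create a dedicated role and grant it read access (placeholder names).
CREATE ROLE IF NOT EXISTS mparticle_role;
GRANT USAGE ON WAREHOUSE {{warehouse_name}} TO ROLE mparticle_role;
GRANT USAGE ON DATABASE {{database_name}} TO ROLE mparticle_role;
GRANT USAGE ON SCHEMA {{database_name}}.{{schema_name}} TO ROLE mparticle_role;
GRANT SELECT ON TABLE {{database_name}}.{{schema_name}}.{{table_name}} TO ROLE mparticle_role;

-- Grant the role to the user created in step 1.1.
GRANT ROLE mparticle_role TO USER {{user_name}};
```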
Before continuing, make sure to save the following information, as you will need to refer to it when connecting Warehouse Sync to your Snowflake account:

Your service_account_key will be the contents of the generated JSON file. Save this value for your Postman setup.

Navigate to your BigQuery instance from console.cloud.google.com.

- project_id is the first portion of the Dataset ID (the portion before the `.`). In the example above, it is mp-project.
- dataset_id is the second portion of the Dataset ID (the portion immediately after the `.`). In the example above, it is mp-dataset.
- region is the Data location. This is us-east4 in the example above.

```sql
-- Create a unique user for mParticle
CREATE USER {{user_name}} WITH PASSWORD '{{unique_secure_password}}';

-- Grant schema usage permissions to the new user
GRANT USAGE ON SCHEMA {{schema_name}} TO {{user_name}};

-- Grant SELECT privilege on any tables/views mParticle needs to access
GRANT SELECT ON TABLE {{schema_name}}.{{table_name}} TO {{user_name}};
```
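If mParticle needs to read every table in the schema, Redshift also supports a schema-wide grant; a minimal sketch, assuming the same placeholders as above:

```sql
-- Sketch: grant SELECT on all existing tables in the schema at once.
GRANT SELECT ON ALL TABLES IN SCHEMA {{schema_name}} TO {{user_name}};
```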
Replace {{mp_pod_aws_account_id}} with one of the following values according to your mParticle instance's location:

| Region ID | Pod AWS Account ID |
| --- | --- |
| US1 | 338661164609 |
| US2 | 386705975570 |
| AU1 | 526464060896 |
| EU1 | 583371261087 |
```json
{
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Effect": "Allow",
      "Resource": "arn:aws:iam::{{mp_pod_aws_account_id}}:role/ingest-pipeline-data-external-{{mp_org_id}}-{{mp_acct_id}}",
      "Sid": ""
    }
  ],
  "Version": "2012-10-17"
}
```
Name your new policy mparticle_redshift_assume_role_policy, and click Create policy.

Name your new role mparticle_redshift_role, and click Create role.

Your configuration will differ between Amazon Redshift and Amazon Redshift Serverless. To complete your configuration, follow the appropriate steps for your use case below.
Make sure to save the value of your new role’s ARN. You will need to use this when setting up Postman in the next section.
- For Amazon Redshift: associate the new mparticle_redshift_role with your cluster.
- For Amazon Redshift Serverless: associate the new mparticle_redshift_role with your namespace.

Warehouse Sync uses the Databricks-to-Databricks Delta Sharing protocol to ingest data from Databricks into mParticle.
Complete the following steps to prepare your Databricks instance for Warehouse Sync.
Enter mParticle_{YOUR-DATA-POD} under Recipient name, where {YOUR-DATA-POD} is either us1, us2, eu1, or au1 depending on the location of the data pod configured for your mParticle account.

In Sharing identifier, enter one of the following identifiers, depending on the location of your mParticle account's data pod:

| Pod | Sharing identifier |
| --- | --- |
| US1 | aws:us-east-1:e92fd7c1-5d24-4113-b83d-07e0edbb787b |
| US2 | aws:us-east-1:e92fd7c1-5d24-4113-b83d-07e0edbb787b |
| EU1 | aws:eu-central-1:2b8d9413-05fe-43ce-a570-3f6bc5fc3acf |
| AU1 | aws:ap-southeast-2:ac9a9fc4-22a2-40cc-a706-fef8a4cd554e |
Within the Create share window, enter mparticle_{YOUR-MPARTICLE-ORG-ID}_{YOUR-MPARTICLE-ACCOUNT-ID} under Share name, where {YOUR-MPARTICLE-ORG-ID} and {YOUR-MPARTICLE-ACCOUNT-ID} are your mParticle Org and Account IDs.
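The recipient and share created in the UI steps above can also be created with Databricks SQL. A sketch, assuming a US1 pod; the org, account, catalog, schema, and table names are placeholders:

```sql
-- Sketch: create the mParticle recipient from the US1 sharing identifier.
CREATE RECIPIENT mParticle_us1
  USING ID 'aws:us-east-1:e92fd7c1-5d24-4113-b83d-07e0edbb787b';

-- Create the share, add a table to it, and grant the recipient access.
CREATE SHARE mparticle_myorg_myaccount;
ALTER SHARE mparticle_myorg_myaccount ADD TABLE my_catalog.my_schema.my_table;
GRANT SELECT ON SHARE mparticle_myorg_myaccount TO RECIPIENT mParticle_us1;
```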
Databricks Delta Sharing does not currently support the TIMESTAMP_NTZ data type.
Other data types that are not currently supported by the Databricks integration for Warehouse Sync (for both user and events data) include:
If you are ingesting events data through Warehouse Sync, the following data types are unsupported:
While multi-dimensional, or nested, arrays are unsupported, you can still ingest simple arrays with events data.
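For example, a one-dimensional array column can be selected directly in your data model. A sketch, where mp.demo.eventdata and product_tags are hypothetical names:

```sql
-- Sketch: simple arrays are ingestible with events data; nested arrays are not.
SELECT
  event_name,
  event_timestamp,
  product_tags  -- e.g. ['sale', 'clearance']
FROM mp.demo.eventdata
```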
You can also create a new warehouse input from the Integrations Directory:
After selecting your warehouse provider, the Warehouse Sync setup wizard will open where you will:
The setup wizard presents different configuration options depending on the warehouse provider you select. Use the tabs below to view the specific setup instructions for Amazon Redshift, Google BigQuery, and Snowflake.
The configuration name is specific to mParticle and will appear in your list of warehouse inputs. You can use any configuration name, but it must be unique since this name is used when configuring the rest of your sync settings.
The database name identifies your database in Amazon Redshift. This must be a valid Amazon Redshift name, and it must exactly match the name for the database you want to connect to.
This is the unique string used to identify an IAM (Identity and Access Management) role within your AWS account. AWS IAM role ARNs follow the format arn:aws:iam::account:role/role-name-with-path, where account is your AWS account number and role-name-with-path is the name and path of the role in your AWS account.
Learn more about AWS identifiers in IAM identifiers in the AWS documentation.
mParticle uses the host name and port number when connecting to, and ingesting data from, your warehouse.
Provide the username and password associated with the AWS IAM Role you entered in step 1.4. mParticle will use these credentials when logging into AWS before syncing data.
The configuration name is specific to mParticle and will appear in your list of warehouse inputs. You can use any configuration name, but it must be unique since this name is used when configuring the rest of your sync settings.
Enter the ID for the project in BigQuery containing the dataset. You can find your Project ID from the Google BigQuery Console.
Enter the ID for the dataset you’re connecting to. You can find your Dataset ID from the Google BigQuery Console.
Enter the region where your dataset is localized. For example, aws-us-east-1 or aws-us-west-2. You can find the region for your dataset from the Google BigQuery Console.
Finally, enter the service account ID and upload a JSON file containing your source account key associated with the Project ID you entered in step 1.3. mParticle uses this information to log into BigQuery on your behalf when syncing data.
Your service account ID must match the source account key.
The configuration name is specific to mParticle and will appear in your list of warehouse inputs. You can use any configuration name, but it must be unique since this name is used when configuring the rest of your sync settings.
Select either Prod or Dev depending on whether you want your warehouse data sent to the mParticle development or production environments. This setting determines how mParticle processes ingested data.
Your Snowflake account ID uniquely identifies your Snowflake account. You can find it by logging into your Snowflake account and locating your account_locator, cloud_region_id, and cloud values. For more detail, see the Snowflake documentation.
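One way to look these values up is from a Snowflake worksheet; a minimal sketch using Snowflake's built-in context functions:

```sql
-- Returns the account locator and the cloud region of the current account.
SELECT CURRENT_ACCOUNT() AS account_locator,
       CURRENT_REGION()  AS region;
```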
The Snowflake warehouse name is used to find the specific database you are connecting to. Each Snowflake warehouse can contain multiple databases.
The specific database you want to sync data from.
Enter the region identifier for where your Snowflake data is localized.
Finally, you must provide the username, password, and role specific to the database in Snowflake you are connecting to. These credentials are independent from the username and password you use to access the main Snowflake portal. If you don’t have these credentials, contact the Snowflake database administrator on your team.
The configuration name is specific to mParticle and will appear in your list of warehouse inputs. You can use any configuration name, but it must be unique since this name is used when configuring the rest of your sync settings.
The value you enter for your Databricks provider must match the name of the Databricks organization that contains the schema you want to ingest data from. This is the same value that you saved in Step 4 of the Databricks setup, Find and save your Databricks provider name, prior to creating your warehouse connection.
Enter the name of the schema in your Databricks account that you want to ingest data from. Databricks uses the terms database and schema interchangeably, so in this situation the schema is the specific collection of tables and views that mParticle will access through this Warehouse Sync connection.
Your data model describes which columns from your warehouse to ingest into mParticle, and which mParticle fields each column should map to. While mParticle data models are written in SQL, all warehouse providers process SQL slightly differently so it is important to use the correct SQL syntax for the warehouse provider you select.
For a detailed reference of all SQL commands Warehouse Sync supports alongside real-world example SQL queries, see Warehouse Sync SQL reference.
mParticle submits the SQL query you provide to your warehouse to retrieve specific columns of data. Depending on the SQL operators and functions you use, the columns selected from your database are transformed, or mapped, to user profile attributes in your mParticle account.
If you use an attribute name in your SQL query that doesn’t exist in your mParticle account, mParticle creates an attribute with the same name and maps this data to the new attribute.
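For example, aliasing a column in your query is enough to create a new attribute. In this sketch, loyalty_tier is a hypothetical warehouse column, and vip_status is a hypothetical attribute name that does not yet exist in mParticle, so it would be created on ingest:

```sql
-- Sketch: the alias determines the mParticle attribute name.
SELECT
  email,
  loyalty_tier AS vip_status
FROM mp.demo.userdata
```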
mParticle automatically maps matching column names in your warehouse to reserved mParticle user attributes and device IDs. For example, if your database contains a column named customer_id, it is automatically mapped to the customer_id user identifier in mParticle. For a complete list of reserved attribute names, see Reserved mParticle user or device identity column names.
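If a warehouse column doesn't already use a reserved name, you can alias it in your data model so it auto-maps. A sketch, where account_uuid is a hypothetical column name:

```sql
-- Sketch: alias a column to the reserved customer_id identity name.
SELECT
  account_uuid AS customer_id,
  first_name,
  last_name
FROM mp.demo.userdata
```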
Below is an example of a simple table and SQL query to create a data model:

Table name: mp.demo.userdata

| first_name | last_name | email |
| --- | --- | --- |
| John | Doe | john@example.com |
Example SQL query:

```sql
SELECT
  first_name,
  last_name,
  email
FROM mp.demo.userdata
```
This SQL query selects the first_name, last_name, and email columns from the table called mp.demo.userdata. In the next step, we will set up the mappings for this user data.
After creating a data model that specifies which columns in your warehouse you want to ingest, you must map each column to its respective field within mParticle with a data mapping.
To create a data mapping, first use the dropdown menu titled Type of data to sync to select either User Attributes & Identities Only or Events, depending on whether you want to ingest user data or event data.
To continue with our example user data table and SQL query from above, we’ll select User Attributes & Identities Only:
When mapping attributes and identities from your warehouse to fields in mParticle, you must always create a user_identities or user_attributes object first. You can then create the individual mappings for your identities and attributes within these objects, as shown in the next two sections.
To map your user identities:

1. Select Object as the mapping type.
2. Enter user_identities as the field in mParticle.
3. Under the new user_identities object, click the + button.
4. Select Column as the mapping type.
5. Select the email column from your warehouse.
6. Select Email Address as the field in mParticle.
To map your user attributes:

1. Select Object as the mapping type.
2. Enter user_attributes as the field in mParticle.
3. Under the new user_attributes object, click the + button.
4. Select Column as the mapping type.
5. Select $unmapped to select all currently unmapped columns.

$unmapped selects all columns that are not already mapped to mParticle. In our example, because we've already mapped the email column to the Email Address field in mParticle, it will be excluded.

When using $unmapped for your warehouse column selection, you can enter an asterisk (*) for Field in mParticle to use each unmapped column name in your warehouse as the respective field name in mParticle. This allows you to map all of your attributes at once so you don't have to create a separate mapping for each individual attribute.

If you are ingesting event data instead of user data (as shown above), select Events under Type of data to sync once you reach the Review Mapping step.
To map your event data:

1. Select Object as the mapping type.
2. Enter events as the field in mParticle.
3. Under the new events object, click the + button to add a new mapping. Select the type of mapping you want to use from the Mapping Type dropdown:
   - Column: maps a column from your database to a field in mParticle
   - Static: maps a static value that you define to a field in mParticle
   - Ignore: prevents a field that you specify from being ingested into mParticle

The next steps will vary depending on the data you are ingesting and the mapping type you select. Following are several examples of how to use each mapping type.
Select String, Number, or Boolean for the data type of the static value you are mapping.

To add additional mappings, click Add Mapping. You must create a mapping for every column you selected in your data model.
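A static value can also be produced in the data model itself rather than in the mapping UI, by selecting a literal in your SQL. A sketch, where acquisition_channel is a hypothetical field name and 'warehouse' a hypothetical value:

```sql
-- Sketch: every ingested row carries the same literal value.
SELECT
  customer_id,
  'warehouse' AS acquisition_channel
FROM mp.demo.userdata
```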
When you have finished creating your mappings, click Next.
The sync frequency settings determine when the initial sync with your database will occur, and how frequently any subsequent syncs will be executed.
Select either Prod or Dev depending on whether you want your warehouse data sent to the mParticle development or production environments. This setting determines how mParticle processes ingested data.
Input protection levels determine how data ingested from your warehouse can contribute to new or existing user profiles in mParticle:
To learn more about these settings and how they can be used in different scenarios, see Input protections.
There are two sync modes: incremental and full.
The remaining setting options change depending on the mode you select from the Sync mode dropdown menu. Navigate between the two configuration options using the tabs below:
Following are the sync settings for incremental syncs:
The main difference between full and incremental sync configurations is the use of an iterator field for incremental syncs. Both full and incremental syncs support all three sync modes (Interval, Once, and On Demand).
Select the column name from your warehouse using the Iterator dropdown menu. The options are generated from the SQL query you ran when creating your data model.
mParticle uses the iterator to determine which data to include and exclude during each incremental sync.
Select the iterator data type using the Iterator data type dropdown. This value must match the data type of the iterator column as it exists in your warehouse.
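In practice this means your data model should select a monotonically increasing column to serve as the iterator. A sketch, where updated_at is a hypothetical timestamp column:

```sql
-- Sketch: updated_at can be chosen as the iterator so each incremental
-- sync ingests only rows changed since the previous sync's high-water mark.
SELECT
  customer_id,
  email,
  updated_at
FROM mp.demo.userdata
```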
Select one of the following schedule types:
Sync frequencies can be either Hourly, Daily, Weekly, or Monthly.
The date and time you select for your initial sync is used to calculate the date and time for the next sync. For example, if you select Hourly for your frequency and 11/15/2023 07:00 PM UTC for your initial sync, then the next sync will occur at 11/15/2023 08:00 PM UTC.
Use the date and time picker to select the calendar date and time (in UTC) for your initial sync. Subsequent interval syncs will be scheduled based on this initial date and time.
Following are the sync settings for full syncs:
The main difference between full and incremental sync configurations is the use of an iterator field for incremental syncs. Both full and incremental syncs support all three sync modes (Interval, Once, and On Demand).
Select one of the following schedule types:
Sync frequencies can be either Hourly, Daily, Weekly, or Monthly.
The date and time you select for your initial sync is used to calculate the date and time for the next sync. For example, if you select Hourly for your frequency and 11/15/2023 07:00 PM UTC for your initial sync, then the next sync will occur at 11/15/2023 08:00 PM UTC.
Use the date and time picker to select the calendar date and time (in UTC) for your initial sync. Subsequent interval syncs will be scheduled based on this initial date and time.
The value you select for Sync Start Date determines how much old, historical data mParticle ingests from your warehouse in your initial sync. When determining how much historical data to ingest, mParticle uses the column in your database you selected as the Timestamp field in the Create Data Model step.
When your initial sync begins, mParticle also begins ingesting any historical data. If mParticle hasn't finished ingesting historical data before a subsequent sync is due to start, the subsequent sync still executes, and the historical data continues syncing in the background.
Historical data syncing doesn’t contribute to any rate limiting on subsequent syncs.
After entering your sync settings, click Next.
mParticle generates a preview for warehouse syncs that have been configured but not yet activated. Use this page and the sample enriched user profiles to confirm the following:
After reviewing your sync configuration, click Activate to activate your sync.
Any configured warehouse syncs are listed on this page, grouped by warehouse provider. Expand a warehouse provider to view and manage a sync.
To view details for an existing sync, select it from the list of syncs. A summary page is displayed, showing the current status (Active or Paused), sync frequency, and a list of recent or in-progress syncs.
To pause a sync, click Pause Sync. Paused syncs will only resume running on their configured schedule after you click Resume.
To run an on-demand sync, click Run Sync under Sync Frequency.
Use the Data Model, Mapping, and Settings tabs to view and edit your sync configuration details. Clicking Edit from any of these tabs opens the respective step of the setup wizard where you can make and save your changes.