Historical Data and Warehouse Sync

When ingesting historical event data into mParticle with Warehouse Sync, it is important to consider how that data is handled, along with data quality, data retention, and platform limits. Historical data is processed differently by mParticle, so pipelines that ingest it have special considerations and requirements.

This guide outlines key points and best practices for handling historical and large-volume data ingestion.

There are two approaches to ingesting data with Warehouse Sync:

  • Full syncs: Ingesting data in batches, on demand. Full pipelines re-read the entire result set every time they run and are well-suited for manual syncs of a specific set of data, small tables, or ad-hoc, on-demand replays/retries.
  • Incremental syncs: Ingesting data incrementally, on a schedule. Incremental pipelines require an iterator column and only fetch rows whose iterator value is greater than the last successful run. This is the recommended approach for ongoing data syncs.

When ingesting historical data, you can use a combination of these two pipeline types to backfill old data and then switch to an ongoing incremental sync.

What is historical data?

As a general rule, any event with a timestamp older than 30 days is considered historical. This data requires special handling to ensure it is processed correctly and made available for long-term use cases like audience segmentation with extended lookback windows.

How mParticle processes historical data

Data that you flag as historical is handled differently from real-time data:

  • No real-time forwarding: Historical events are not forwarded to any connected event, data warehouse, or audience outputs. This prevents old data from triggering real-time workflows in downstream systems.
  • Optimized for historical workflows: The data is retained and can power features that rely on long-term activity (for example, audiences with extended lookback windows), though it remains subject to your account’s data retention policies.
  • Delayed UI visibility: Because historical uploads bypass the real-time processing stream, these events and the profile updates they generate may take longer to appear in interfaces such as Live Stream or the User Activity view. Generally, historical data is available in the User Activity view within 24 hours of ingestion.
  • Data retention limits still apply: mParticle enforces data retention policies that define how long your data remains available and whether older data can still be used. Before loading very old events, coordinate with your mParticle account team to ensure your retention configuration can accept them; otherwise, the data may be discarded during ingest.

All other processing rules for identity resolution and storage remain the same for historical data.

Types of historical data you can ingest

A core concept in mParticle is the relationship between user data (attributes and identities) and event data. Understanding this is critical for a successful historical data ingest.

Data is sent to mParticle in batches (multiple events grouped together) to optimize throughput. Each batch contains event data and provides context about the user in the form of user attributes and identities.

  • Events are actions a user takes at a specific point in time. For example, a product_view event captures that a user looked at a product. Event data is captured and stored as events using mParticle’s event data format.
  • User Attributes are properties that describe a user, like membership_tier: gold.
  • User Identities are the identifiers used to recognize a user, like an email address or customer ID.

User attributes and identities are stored in a persistent User Profile, which creates a complete, 360-degree view of your user.

When ingesting historical data, you can include user attributes and identities in the same batch as your events. However, in many backfill scenarios, it’s more effective to ingest them separately. This is particularly true when your goal is to have the final user profile reflect the most recent information about a user, rather than the state they were in when a historical event occurred. For example, you can send a batch containing only user_attributes and user_identities to bring a user’s profile up to date, then ingest that user’s events once the profile reflects the most recent information.
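
For instance, a profile-only backfill might use a data model like the minimal sketch below. The customers table and its columns are illustrative assumptions; in the data mapping step, email would map to an identity, membership_tier to a user attribute, and updated_at would serve only as the iterator.

-- Profile-only sync: no event rows, only identities and attributes
SELECT
  customer_id,       -- mapped to a customer ID identity
  email,             -- mapped to the email identity
  membership_tier,   -- mapped to a user attribute
  updated_at         -- iterator column for incremental syncs; not mapped
FROM customers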

Historical data ingestion limits and performance considerations

mParticle enforces service limits specific to Warehouse Sync, including limits on:

  • The number of active pipelines per account
  • The number of records ingested per interval type (hourly, daily, weekly, and monthly)
  • The number of historical records ingested
  • The number of database columns ingested

Warehouse Sync ingests data at a rate determined by your account’s configured limits and expected data volumes. When the Warehouse Sync API or UI shows a “success” status, it means your data has been ingested and can be used by downstream features and sent to output integrations. Keep in mind that while recent non-historical data may be available quickly, some downstream features and integrations may require additional processing time before all ingested data is available. This is especially true for large or historical data loads, which are processed differently from real-time data, and for connected outputs, which have their own processing times.

  • Review the Default Service Limits before ingesting a large volume of data.
  • Estimate the total number of records you plan to ingest, both for the initial historical load and for ongoing scheduled runs. Consider whether you will need to re-ingest historical data for testing or validation, and how many new or updated records will be included in each subsequent interval.
  • Based on your estimates, coordinate with your mParticle Customer Success team if you anticipate exceeding any limits.

    • Additionally, communicate your timeline so the large data load can be processed according to your requirements.

Distinguishing batch and event timestamps

When ingesting historical data, it’s essential to distinguish between batch timestamps and event timestamps:

  • Batch timestamp: Indicates when the batch was created. Setting this is optional; set it only if you need to control the order in which user attributes are synchronized, specifically when updates may arrive out of order and must be processed according to the batch’s timestamp.
  • Event timestamp: Represents when each event actually occurred. Event timestamps must not be later than the batch timestamp and are subject to your account’s data retention window. You can control how the platform processes events based on the event timestamp using the source_info.is_historical field.

Key points

  • If you are ingesting event data, do not set the batch timestamp. Let mParticle use the event timestamps to determine event order and processing.
  • If you need to update user attributes based on when they were last changed, set the batch timestamp to match the user attribute’s timestamp, and ingest these updates separately from event data.

This approach ensures that both event and profile data are processed accurately and in the correct order, while avoiding issues with data retention, forwarding, and audience availability.

Strategies for ingesting historical data

There are two common strategies for ingesting historical data efficiently:

  1. Single incremental pipeline with an early from date – Set the from value to the start of the period you want to ingest, and leave until blank. The initial run will backfill the entire range, after which the pipeline will automatically switch to incremental updates on its schedule.
  2. Separate historical full and incremental pipelines – Create a full On Demand pipeline and use a WHERE clause in your data model to limit the dataset (for example, WHERE event_timestamp BETWEEN '2024-01-01' AND '2024-06-30'). Trigger the pipeline whenever you need to ingest another slice, updating the clause between runs as needed. Once you have finished ingesting the historical data, disable the full pipeline and create a new incremental pipeline to sync ongoing new data on a regular schedule, setting the from value to the last date you ingested with the full pipeline.

How to create a pipeline for historical data

This section provides an overview of creating a Warehouse Sync pipeline specifically for ingesting historical data. For a complete, detailed walkthrough of how to create a Warehouse Sync pipeline, refer to the main Warehouse Sync setup guide.

1. Connect to your warehouse

The first step is to connect mParticle to your data warehouse. This process is the same for all Warehouse Sync pipelines.

2. Create a data model

Your data model is the SQL query that defines what data to pull from your warehouse. When ingesting historical data, your query needs special consideration.

Mark data as historical

To ensure mParticle correctly identifies and processes your historical data, you must explicitly flag it when creating your data model. The “Sync Historical Data” setting in your Warehouse Sync pipeline only controls how far back the pipeline reads from your warehouse; it does not automatically mark old data as historical.

Example SQL:

Add a CASE statement to your SQL query to create an is_historical column.

SELECT
  *,
  CASE
    WHEN event_timestamp < CURRENT_DATE - INTERVAL '30 days' THEN TRUE
    ELSE FALSE
  END AS is_historical
FROM your_table

Filter the data you ingest

Warehouse Sync provides two complementary ways to control which rows are pulled from your warehouse:

SQL WHERE clauses

Add predicates such as WHERE event_timestamp >= DATE '2023-01-01' or a bounded BETWEEN statement inside the query you save with the data model. The filter executes in your warehouse on every run. This is recommended for filtering on fields other than the iterator field, such as event timestamp, event type, or user region, and is useful for validating your pipeline by limiting the query to a small subset of your data.
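
As a hedged illustration that reuses the table and column names from the example above, a data model for a single backfill slice might combine the is_historical flag with a bounded filter:

SELECT
  *,
  CASE
    WHEN event_timestamp < CURRENT_DATE - INTERVAL '30 days' THEN TRUE
    ELSE FALSE
  END AS is_historical
FROM your_table
WHERE event_timestamp BETWEEN '2024-01-01' AND '2024-06-30'

Between runs, update the BETWEEN bounds to ingest the next slice, as described in the strategies above.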

Iterator window (from / until)

Use the Sync Historical Data setting in the UI or use the from and until fields on sync_mode when calling the API to set the minimum and maximum iterator values a pipeline will ever request. This is recommended for a seamless transition between your initial backfill and ongoing incremental syncs, and makes your pipeline runs more easily auditable via the pipeline status APIs.
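
For example, when creating an incremental pipeline through the API, the iterator window is set with the from and until fields on sync_mode, roughly as in the sketch below. The iterator field name and the surrounding property names are illustrative assumptions; consult the Warehouse Sync API Reference for the authoritative request schema.

{
  "sync_mode": {
    "type": "incremental",
    "iterator_field": "updated_at",
    "iterator_data_type": "timestamp",
    "from": "2024-01-01T00:00:00Z",
    "until": null
  }
}

Leaving until blank lets the pipeline continue with ongoing incremental runs after the initial backfill, as described in the strategies above.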

A row must satisfy both the iterator window and your SQL to be ingested, giving you granular control over the time span and the business logic applied to your data.

Choose an iterator column

When using incremental pipelines in Warehouse Sync, you must specify an iterator column—a timestamp field (such as datetime, date, or Unix timestamp) that tracks which rows have already been processed. This iterator column is essential for reliable incremental updates and should be distinct from your event timestamp field whenever possible.

Best practices for iterator columns:

  • Use a dedicated iterator column (such as a system timestamp indicating when a row was inserted or updated) for incremental syncs. This is preferred because the time an event occurred (event timestamp) often differs from when the row was added or updated in your warehouse (iterator/system time); the sketch after this list illustrates the distinction.
  • Use the iterator column to track which rows have been processed, ensuring reliable incremental updates.
  • Use the event timestamp field for filtering in your SQL (for example, in a WHERE clause) and for mapping to the appropriate mParticle field.
  • When setting up your mappings, you may ignore the iterator column if it’s only used for sync tracking.
  • You may use the event timestamp field as the iterator column if convenient, but this is not recommended for most use cases.
  • You can offset the iterator column by a fixed amount to accommodate late-arriving data. For example, if data typically arrives in your warehouse one day after creation, set the pipeline’s delay field to 1d to account for this upstream processing time.
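
As a minimal sketch of these practices (the events table and its updated_at and event_timestamp columns are illustrative), the data model below uses a dedicated system column as the iterator while filtering and mapping on the event timestamp:

SELECT
  event_name,
  event_timestamp,   -- when the event occurred; mapped to the event time and used in WHERE filters
  user_id,
  updated_at         -- when the row was inserted or updated in the warehouse; used only as the iterator
FROM events
WHERE event_timestamp >= DATE '2023-01-01'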

3. Review your mappings

In the “Create data mapping” step of your Warehouse Sync setup, map the is_historical column from your SQL query to the source_info.is_historical field. You should also set a channel for the data. For more details on this process, see Field Transformations.

Example Field Transformation Mapping:

[
  {
    "mapping_type": "column",
    "source": "is_historical",
    "destination": "source_info.is_historical"
  },
  {
    "mapping_type": "static",
    "destination": "source_info.channel",
    "value": "server_to_server"
  }
]

4. Set your sync settings

Your sync settings determine whether your pipeline runs on a schedule (incremental) or on-demand (full), and over what time period. These settings work together with the WHERE clause in your data model to control what data is ingested.

  • For full sync pipelines, you will typically trigger them on-demand for specific historical ranges defined in your SQL query.
  • For incremental sync pipelines, you will set a schedule and an iterator window. For a historical backfill, set the from date to the beginning of your historical period. Subsequent runs will automatically pick up where the last run left off.

5. Final review

Before activating your pipeline, review all settings to ensure they match your intended historical ingestion strategy. It’s a best practice to start with a small test batch to validate your configuration before running a large backfill.

Best practices for large historical data backfills

Successfully ingesting large volumes of historical data requires thoughtful preparation. In addition to the specific practices embedded in the steps above, keep the following general guidelines in mind:

  • Start small: Always begin with a small test batch. This allows you to validate your configuration, surface any issues early, and make adjustments before committing to a full-scale backfill.
  • Batch your loads: Break your historical data into manageable segments (for example, by year or month). Batching makes it easier to monitor progress and to isolate and troubleshoot errors, and it helps prevent overwhelming your pipeline. Keep in mind that very large backfills can take several days, may not display intermediate progress in the mParticle UI, and often cannot be paused or stopped once started.
  • Evaluate connected systems and features:

    • Tiered Events: Use Tiered Events to control which events are stored or evaluated. Not all events may be necessary for replays or analysis, especially if you use Audiences or Calculated Attributes. With Tiered Events, you can collect data without storing it, or store data without evaluating it.
    • Forwarding: Since historical data is not forwarded, consider which data is needed for your use case and what is not.
    • Audiences: Audiences rely on event timestamps for processing and retention. Avoid setting batch timestamps when ingesting events, as this can cause very old events to be excluded if the batch timestamp falls outside the audience’s retention window. Because audience calculations performed during ingest may be based on incomplete data, it’s best to set up your audience after the backfill is complete to ensure calculations are accurate and include the full historical dataset.
    • Calculated Attributes: Calculated Attributes behave similarly to audiences—calculations performed during ingest may be based on partial data, affecting accuracy until all data is loaded.
  • Review connected outputs: If you use connected outputs, confirm they can handle the increased data volume. For audiences, consider configuring them after the backfill to ensure calculations include the full historical period.
  • Stagger multiple sources: When syncing from multiple sources, schedule syncs at different times to avoid overloading your pipeline and to reduce processing delays.
  • Ingest user profiles first: If you are loading both user and event data, always ingest user profiles before events. This ensures user information is current and available for accurate event processing.
  • Let mParticle set batch timestamps: For event data, do not manually set the batch timestamp. Allow mParticle to use the event timestamps to maintain correct sequencing and processing.
  • Coordinate with mParticle: Before starting a large backfill (especially for data older than 30 days), notify your mParticle customer success team. They can help you prepare and monitor your ingest and verify that your account limits are sufficient for the expected data volume.
  • Choose the right pipeline type: Use incremental pipelines for ongoing syncs and for a seamless transition from your historical backfill to regular updates; reserve full pipelines for ad-hoc, on-demand replays where you want a complete refresh. Remember that a pipeline’s sync_mode can’t be changed after creation.
  • Filter Intelligently: Combine SQL WHERE clauses in your data model with pipeline from/until iterator windows to precisely control which rows are ingested and avoid overlapping backfill runs.

Work with your customer success team

Your mParticle customer success team can:

  • Advise on best practices for large ingests
  • Help adjust account limits if necessary
  • Provide guidance on error handling and observability

Engage your Customer Success team early in the planning process for large or unusual data ingests.

Last Updated: July 18, 2025