When ingesting historical event data into mParticle with Warehouse Sync, it is important to consider historical data handling, data quality, data retention, and platform limits. Historical data is processed differently by mParticle, so pipelines that ingest it have special considerations and requirements.
This guide outlines key points and best practices for handling historical and large-volume data ingestion.
There are two approaches to ingesting data with Warehouse Sync:
Incremental pipelines run on a schedule and ingest only rows that are new or updated since the previous run, while full pipelines ingest the entire result of your data model on each run. When ingesting historical data, you can use a combination of these two pipeline types to backfill old data and then switch to an ongoing incremental sync.
As a general rule, any event with a timestamp older than 30 days is considered historical. This data requires special handling to ensure it is processed correctly and made available for long-term use cases like audience segmentation with extended lookback windows.
Data that you flag as historical is handled differently from real-time data in how it is forwarded and made available to downstream features; all other processing rules for identity resolution and storage remain the same for historical data.
A core concept in mParticle is the relationship between user data (attributes and identities) and event data. Understanding this is critical for a successful historical data ingest.
Data is sent to mParticle in batches (multiple events batched together) to optimize throughput. Each batch contains event data, and provides context about the user in the form of user attributes and identities.
For example, a product_view event captures that a user looked at a product; event data is captured and stored as events using mParticle's event data format. A user attribute such as membership_tier: gold describes the user themselves. User attributes and identities are stored in a persistent User Profile, which creates a complete, 360-degree view of your user.
When ingesting historical data, you can include user attributes and identities in the same batch as your events. However, in many backfill scenarios, it's more effective to ingest them separately. This is particularly true when your goal is to have the final user profile reflect the most recent information about a user, rather than the state they were in when a historical event occurred. For example, you can send a batch containing only user_attributes and user_identities to update a user's profile so that it is accurate and up-to-date, then ingest the user's events once the profile reflects the most recent information.
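As a rough illustration (the identity type and attribute names below are placeholders, not required values), a profile-only batch in mParticle's JSON batch format could look like this:

{
  "environment": "production",
  "user_identities": {
    "customer_id": "cust_12345"
  },
  "user_attributes": {
    "membership_tier": "gold"
  },
  "events": []
}

Because this batch contains no events, only the user's profile is updated; subsequent batches can then carry the historical events themselves.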
mParticle enforces service limits specific to Warehouse Sync, including limits on the number of pipelines you can create and the volume of data you can ingest; see the Default Service Limits documentation for current values.
Warehouse Sync ingests data at a rate according to your account's configured limits and expected data volumes. When the Warehouse Sync API or UI shows a “success” status, it means your data has been ingested and can be used by downstream features and sent to output integrations. Keep in mind that while recent non-historical data may be available quickly, some downstream features and integrations may require additional processing time before all ingested data is available. This is especially true for large or historical data loads, which are processed differently from real-time data, and for connected outputs, which have their own processing times.
Based on your estimates, coordinate with your mParticle Customer Success team if you anticipate exceeding any limits.
When ingesting historical data, it's essential to distinguish between batch timestamps and event timestamps, and to flag historical batches with the source_info.is_historical field. This approach ensures that both event and profile data are processed accurately and in the correct order, while avoiding issues with data retention, forwarding, and audience availability.
There are two common strategies for ingesting historical data efficiently:
Incremental pipeline with a backdated from date – Set the from value to the start of the period you want to ingest, and leave until blank. The initial run will backfill the entire range, after which the pipeline will automatically switch to incremental updates on its schedule.
Full On Demand pipeline with a WHERE clause – Create a full On Demand pipeline and use a WHERE clause in your data model to limit the dataset (for example, WHERE event_timestamp BETWEEN '2024-01-01' AND '2024-06-30'). Trigger the pipeline whenever you need to ingest another slice, updating the clause between runs as needed. Once you have finished ingesting the historical data, disable the full pipeline and create a new incremental pipeline to sync ongoing new data on a regular schedule, setting the from value to the last date you ingested with the full pipeline. A configuration sketch for this approach follows below.
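For the second strategy, the pipeline's sync mode and schedule are what make it a full On Demand pipeline. As a hedged sketch only (the exact request schema is defined in the Warehouse Sync API Reference, and the shape shown here is an illustrative assumption), the relevant portion of the pipeline configuration might look like:

{
  "sync_mode": {
    "type": "full"
  },
  "schedule": {
    "type": "on_demand"
  }
}

With an on-demand schedule, each run is triggered manually after you update the WHERE clause for the next historical slice.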
This section provides an overview of creating a Warehouse Sync pipeline specifically for ingesting historical data. For a complete, detailed walkthrough of how to create a Warehouse Sync pipeline, refer to the main Warehouse Sync setup guide.
The first step is to connect mParticle to your data warehouse. This process is the same for all Warehouse Sync pipelines.
Your data model is the SQL query that defines what data to pull from your warehouse. When ingesting historical data, your query needs special consideration.
To ensure mParticle correctly identifies and processes your historical data, you must explicitly flag it when creating your data model. The “Sync Historical Data” setting in your Warehouse Sync pipeline only controls how far back the pipeline reads from your warehouse; it does not automatically mark old data as historical.
Add a CASE statement to your SQL query to create an is_historical column. Example SQL:
SELECT
*,
CASE
WHEN event_timestamp < CURRENT_DATE - INTERVAL '30 days' THEN TRUE
ELSE FALSE
END AS is_historical
FROM your_table
Warehouse Sync provides two complementary ways to control which rows are pulled from your warehouse:
WHERE clauses – Add predicates such as WHERE event_timestamp >= DATE '2023-01-01' or a bounded BETWEEN statement inside the query you save with the data model. The filter executes in your warehouse on every run. This is recommended for filtering on fields other than the iterator field, such as event timestamp, event type, or user region, and is useful for validating your pipeline by limiting the dataset to a small subset of data.
Pipeline iterator window (from / until) – Use the Sync Historical Data setting in the UI or use the from and until fields on sync_mode when calling the API to set the minimum and maximum iterator values a pipeline will ever request. This is recommended for a seamless transition between your initial backfill and ongoing incremental syncs, and makes your pipeline runs more easily auditable via the pipeline status APIs (see the sketch below).
A row must satisfy both the iterator window and your SQL to be ingested, giving you granular control over the time span and the business logic applied to your data.
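As a hedged example of the second mechanism (field names here follow our reading of the Warehouse Sync API Reference and the values are illustrative, so confirm the exact schema there before use), a bounded iterator window on an incremental pipeline might be expressed as:

{
  "sync_mode": {
    "type": "incremental",
    "iterator_field": "updated_at",
    "iterator_data_type": "timestamp_ltz",
    "from": "2023-01-01T00:00:00Z",
    "until": "2024-06-30T23:59:59Z"
  }
}

Leaving until unset keeps the window open-ended, which is what the backdated-from backfill strategy described earlier relies on.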
When using incremental pipelines in Warehouse Sync, you must specify an iterator column—a timestamp field (such as datetime, date, or Unix timestamp) that tracks which rows have already been processed. This iterator column is essential for reliable incremental updates and should be distinct from your event timestamp field whenever possible.
Best practices for iterator columns:
Keep the event timestamp as its own column, separate from the iterator, so it remains available for filtering (for example, in a WHERE clause) and for mapping to the appropriate mParticle field.
If upstream jobs take time to finish loading rows into your warehouse, set the delay field to 1d to account for this upstream processing time.
In the “Create data mapping” step of your Warehouse Sync setup, map the is_historical column from your SQL query to the source_info.is_historical field. You should also set a channel for the data. For more details on this process, see Field Transformations.
Example Field Transformation Mapping:
[
{
"mapping_type": "column",
"source": "is_historical",
"destination": "source_info.is_historical"
},
{
"mapping_type": "static",
"destination": "source_info.channel",
"value": "server_to_server"
}
]
Your sync settings determine whether your pipeline runs on a schedule (incremental) or on-demand (full), and over what time period. These settings work together with the WHERE clause in your data model to control what data is ingested.
For an incremental backfill, set the from date to the beginning of your historical period. Subsequent runs will automatically pick up where the last run left off.
Before activating your pipeline, review all settings to ensure they match your intended historical ingestion strategy. It's a best practice to start with a small test batch to validate your configuration before running a large backfill.
Successfully ingesting large volumes of historical data requires thoughtful preparation. In addition to the specific practices embedded in the steps above, keep the following general guidelines in mind:
Evaluate connected systems and features: confirm that the downstream outputs and features connected to your workspace are prepared to receive the historical data you plan to ingest.
Plan your pipeline configuration up front, because a pipeline's sync_mode can't be changed after creation.
Use from/until iterator windows to precisely control which rows are ingested and avoid overlapping backfill runs.
Your mParticle Customer Success team can help you plan your ingestion strategy and adjust account limits if needed. Engage your Customer Success team early in the planning process for large or unusual data ingests.