Backfilling is the process of importing previously blocked data from a Quarantine Output, optionally transforming it, and then uploading it back into mParticle. We provide instructions and helper scripts for you to backfill blocked data to mParticle’s Events API below. You cannot replay blocked data through the UI.
Read more about our Blocking feature here.
Replaying event attributes requires replaying of events
Replaying event attributes is not possible without replaying their associated events, which can lead to event duplication.
Avoid additional MTU charges
If the backfilled MPID and the original MPID do not match, the user will be counted twice, which increases the number of unique MPIDs that determines your mParticle bill.
Sooner is better than later
We advise replaying data within 2 weeks of the date it was quarantined. Many downstream tools will not accept data over a certain age, so the sooner you replay data, the better.
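As a rough illustration of the 2-week guidance, the sketch below checks whether a quarantine date still falls inside the recommended window. The function name and the idea that you track the quarantine date yourself as a timezone-aware datetime are assumptions made for this example, not part of mParticle's tooling.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Recommended replay window from the guidance above (2 weeks).
REPLAY_WINDOW = timedelta(weeks=2)


def within_replay_window(quarantined_at: datetime,
                         now: Optional[datetime] = None) -> bool:
    """Return True if the data was quarantined no more than 2 weeks ago.

    Both datetimes must be timezone-aware (hypothetical helper).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - quarantined_at <= REPLAY_WINDOW
```

Data that fails this check may still be replayable, but you should confirm that each target integration accepts events of that age first.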
Batch and event timestamps
To send data to mParticle via our Events API, events are stored in a batch (see our Events API docs pages for additional detail). Both mParticle batches and events have a timestamp attached to them. To ensure that events are backfilled with the original timestamp, it’s essential to preserve the value stored in the timestamp_unixtime_ms field that each event object contains (the timestamp attached at the batch level can be ignored).
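One way to guard against accidentally changing event timestamps is to compare them before and after your transformation. The following is a minimal sketch, assuming the batch is a Python dict with an events list and that each event keeps timestamp_unixtime_ms inside a data object; verify this layout against your own quarantined data. The helper names are hypothetical.

```python
import copy
from typing import Callable, List, Optional


def event_timestamps(batch: dict) -> List[Optional[int]]:
    """Collect timestamp_unixtime_ms from every event, in order.

    Assumes events live under batch["events"] and each event stores its
    timestamp under event["data"] (an assumption about the batch layout).
    """
    return [
        event.get("data", {}).get("timestamp_unixtime_ms")
        for event in batch.get("events", [])
    ]


def transform_preserving_timestamps(batch: dict,
                                    transform: Callable[[dict], dict]) -> dict:
    """Apply `transform` to a copy of the batch and verify that the
    per-event timestamps are unchanged afterwards."""
    original = event_timestamps(batch)
    result = transform(copy.deepcopy(batch))
    if event_timestamps(result) != original:
        raise ValueError("event timestamps changed during transformation")
    return result
```

The check runs on a deep copy, so the blocked batch you loaded stays untouched if the transformation turns out to be wrong.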
Avoid batch deduplication
To prevent batches from being deduplicated in mParticle’s internal data pipeline, remove the batch_id from the blocked batch before backfilling it to mParticle.
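Removing the identifier can be as small as the sketch below. It assumes the blocked batch has been loaded into a Python dict and that batch_id is a top-level key; the function name is hypothetical.

```python
import copy


def strip_batch_id(batch: dict) -> dict:
    """Return a copy of the batch without its batch_id.

    Assumes batch_id sits at the top level of the batch (an assumption
    about the layout). A missing batch_id is not an error.
    """
    cleaned = copy.deepcopy(batch)
    cleaned.pop("batch_id", None)
    return cleaned
```

Returning a copy rather than editing in place keeps the original blocked batch available for comparison or a retry.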
Backfilling data requires some coding skills
To fix and replay data, you need to know how to code.
Backfilling blocked data is non-trivial because you are typically interested in backfilling data to several downstream event integrations.
Based on your unique set of target event integrations, you should devise a strategy for your data backfill. The following questions will guide your backfill strategy:
Which integrations do I need to backfill?
Different integrations have different limitations when it comes to receiving historical data. Establish the limitations of a target integration by reading its developer docs or by sending a small amount of test data through an mParticle connection.
Do I need to backfill unplanned event attributes?
Event attributes cannot be replayed without their associated events. You’ll need a strategy (e.g. deleting previously sent yet incomplete events) to avoid event duplication if you want to replay blocked event attributes.
Which mParticle Input should I use to backfill my data?
The cleanest solution is typically to create a new Custom Feed for the purpose of your backfill. You can connect only the integrations that you want to backfill to that feed and then tear it down again once the backfill is complete.
However, some integrations are not available through the Custom Feed Input. In those cases, you will need to either (i) use the keys and secret of the original Input (e.g. Web) in our backfill script or (ii) send data directly to the integration’s API (after transforming it to match their data model).
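Whichever Input you choose, its key and secret are what authenticate the upload. The sketch below builds (but does not send) an HTTP request for one batch using only the Python standard library. The endpoint URL and the use of HTTP Basic authentication are assumptions here; confirm both, including the correct host for your account's region, in the Events API docs. The function name is hypothetical.

```python
import base64
import json
import urllib.request

# Assumed endpoint; check the Events API docs for the host that
# matches your account before using it.
EVENTS_API_URL = "https://s2s.mparticle.com/v2/events"


def build_backfill_request(batch: dict, key: str, secret: str,
                           url: str = EVENTS_API_URL) -> urllib.request.Request:
    """Build a POST request that uploads one batch as JSON.

    The key and secret belong to the Input used for the backfill
    (for example, a Custom Feed created for this purpose).
    """
    token = base64.b64encode(f"{key}:{secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(batch).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends the batch. Keeping construction separate from sending makes it easy to inspect a request, or to test with a small amount of data, before running the full backfill.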
Once you have a strategy for your backfill, here are the steps to backfill your data:
Locate the context node in the blocked batch. Within the context node, you will see a node labeled block_metadata. This node contains the data you have blocked. Reference our sample data below to understand the complete data structure.
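Reading that node from a loaded batch could look like the following minimal sketch. It assumes the batch is a Python dict with context at the top level and makes no assumptions about what block_metadata contains; the function name is hypothetical.

```python
from typing import Any, Optional


def get_block_metadata(batch: dict) -> Optional[Any]:
    """Return the block_metadata node from the batch's context node,
    or None if either node is missing."""
    context = batch.get("context") or {}
    return context.get("block_metadata")
```

A None result means the batch carries no blocked data under this layout, so it is worth logging such batches rather than silently skipping them.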