Warehouse Sync can be used to ingest two types of data:
If you want to ingest event data, you must use the Field Transformations API to map the source data coming from your warehouse to the mParticle event data schema.
A field transformation maps external data (such as a column, row, field, or more complex data object) to an event attribute or field within the mParticle platform. When ingesting data through a warehouse sync pipeline, a field transformation tells mParticle exactly where to store each new piece of data within the context of the JSON schema, the overarching definition for how data is organized in mParticle.
Field transformations are JSON formatted specifications created using the Field Transformations API, a subcomponent of the mParticle Platform API. The Field Transformations API is grouped with the Platform API instead of the Warehouse Sync API because its functionality is not necessarily limited to Warehouse Sync, and future mParticle features may leverage field transformations. The Field Transformations API simply provides a structured method of mapping one data object to another.
Imagine the following simple data table and mParticle JSON data schema:
Example source database table:
eventId | sessionId | timeStamp | eventType |
---|---|---|---|
1234 | 5678 | 1402521613976 | screen_view |
… | … | … | … |
Example mParticle JSON data schema:
{
"events": [
{
"data": {
"event_id": 1234,
"session_id": 5678,
"timestamp_unixtime_ms": 1402521613976
},
"event_type": "screen_view"
}
]
}
When ingesting this data through a warehouse sync pipeline, we need to map each source column of our table to the appropriate fields in the mParticle JSON schema:
Source column name | Destination field name |
---|---|
eventId |
event_id |
sessionId |
session_id |
timeStamp |
timestamp_unixtime_ms |
eventType |
event_type |
The field transformation would be:
{
"id": "example-field-transformation-id",
"name": "Example Field Transformation",
"destination_type": "event_batch",
"mappings": [
{
"mapping_type": "column",
"source": "$eventId",
"destination": "events[].data.event_id"
},
{
"mapping_type": "column",
"source": "$sessionId",
"destination": "events[].data.session_id"
},
{
"mapping_type": "column",
"source": "$timeStamp",
"destination": "events[].data.timestamp_unixtime_ms"
},
{
"mapping_type": "column",
"source": "$eventType",
"destination": "events[].event_type"
}
],
"created_on": "2023-11-14T21:15:43.182Z",
"created_by": "developer@example.com",
"last_modified_on": "2023-11-14T21:15:43.182Z",
"last_modified_by": "developer@example.com"
}
Note that each field mapping is listed as an individual item within the array called mappings
.
When we refer to the source field name in the mParticle JSON schema, we use a simplified JSON path that reflects the nested structure of events data. For example, events[].data.event_id
refers to the field called event_id
in the data
object that sits within the events
array. You can find a detailed explanation of the JSON path format in the Field Transformation API reference.
Source data fields and their destinations in mParticle are expressed using the mappings
array in a field transformation. For every data object (either a column, single value, or array) you want to map, include a separate configuration object in the mappings
array.
Each mapping object in the mappings
array can be configured with the following settings:
mapping_type
: specifies the way the source data is mapped. Options include:
column
: maps a column in your database to a destination in mParticlestatic
: maps the value given to the value
property to a destination field in mParticleignore
: excludes the source data defined for source
from being ingestedsource
: the name of the column or field being mapped fromdestination
: the name of the field in mParticle being mapped tovalue
: used with a static
or column
mapping type. The value assigned to this property will be mapped directly to the mParticle field set in destination
There are four steps to creating and using a field transformation:
This first step is accomplished when setting up a warehouse sync pipeline. Part of the warehouse sync configuration requires you to create a data model. This data model is a SQL query that mParticle sends to your warehouse to retrieve the names of the data columns and fields in your warehouse that your pipeline will ingest.
These column and field names are what you supply for the values of the source
setting of a mapping.
For more information about writing SQL queries for your data model, see the Warehouse Sync SQL Reference.
Your field destinations are the names of the fields you want to map your source data as specified in the mParticle JSON data structure.
Since data in mParticle is structured as a series of nested JSON objects and arrays, mappings refers to these fields using simplified JSON paths.
For every column of data you plan to ingest, you must create a mapping so that mParticle can determine where to put the data in that column.
Mappings are configured using a combination of the mapping_type
, source
, destination
, and value
settings.
Example mapping:
{
"mapping_type": "column",
"source": "$eventId",
"destination": "events[].data.event_id"
}
To learn about the specific mapping settings, see Mapping object settings in the Field Transformations API reference.
Once you have completed writing each mapping object in your mappings array for all of your source data fields, you can create a field transformation by sending a POST
API request to the endpoint located at:
https://api.mparticle.com/platform/v2/workspaces/{workspaceId}/transformations/fields
The body of your API request must contain:
id
for your field transformation. This ID is referenced when creating your warehouse sync pipeline.name
for your field transformation.destination_type
for your field transformation. Currently, the only valid value is event_batch
.mappings
array you created in step 3.Example field transformation request body:
{
"id": "unique-id",
"name": "your-field-transformation-name",
"destination_type": "event_batch",
"mappings": [
{
"mapping_type": "column",
"source": "your-column-name",
"destination": "mparticle-field"
}
]
}
When creating a warehouse sync pipeline, you can use the ID of the field transformation you just created as the value for field_transformation_id
.
All data in mParticle is stored in either list fields or single-value fields. Single-value fields contain things like transaction IDs, email addresses, or product names. List fields contain lists of data objects, such as ecommerce products or error data.
When mParticle ingests commerce or crash event data from your warehouse, data that is mapped to a list field can be grouped within a single “event” according to a shared unique ID that you specify (like a transaction ID for a commerce product action).
To group commerce or crash data into a list field in mParticle, you must:
events[].data.source_message_id
field in mParticle.Map the remaining columns in your source data to their respective fields in mParticle. Note the following:
The list fields in the mParticle JSON schema that commerce and crash event data can be mapped to are:
events[].data.product_impressions
events[].data.product_impressions[].products
events[].data.product_impressions[].product_impressions_list
to further segment product impressions into product impression lists.events[].data.promotion_action.promotions
events[].data.product_action.products
events[].data.shopping_cart.products
events[].data.breadcrumbs
source_message_id
of the crash_report
event type.mParticle groups event data into list fields if, and only if, a shared unique ID (such as a transaction ID or promotion ID) is mapped from your source data to the events[].data.source_message_id
field in mParticle.
As mParticle ingests data, any time a unique set of source field values are mapped to a list field, a new object is added to that list.
For example, if a source field is mapped to events[].data.product_action.products[].name
, then a new product object will be added to events[].data.product_action.products[]
whenever unique value for name
is found.
Imagine that an ecommerce customer purchases three different items in the same transaction. If Warehouse Sync were to ingest that event data without the shared transaction ID, it would create three separate events with product
lists in mParticle, one for each of the three purchased items.
However, if a mapping exists between the source transaction ID and the source_message_id
, Warehouse Sync creates a single product
list containing all three items.
Each row sharing the same events[].data.source_message_id
must able to be grouped within the same event, otherwise your pipeline will return an error. In the source data below, each row sharing the same transaction_id
of 1
must have the same value for action
, which is purchase
.
transaction_id | action | name | price |
---|---|---|---|
1 | purchase |
shirt | 20.00 |
1 | purchase |
hat | 10.00 |
1 | purchase |
scarf | 5.00 |
[
{
"mapping_type": "column",
"source": "transaction_id",
"destination": "events[].data.source_message_id"
},
{
"mapping_type": "column",
"source": "action",
"destination": "events[].data.product_action.action"
},
{
"mapping_type": "column",
"source": "name",
"destination": "events[].data.product_action.products[].name"
},
{
"mapping_type": "column",
"source": "price",
"destination": "events[].data.product_action.products[].price"
}
]
{
"events": [
{
"data": {
"source_message_id": "1",
"product_action": {
"action": "purchase",
"transaction_id": 1,
"products": [
{
"name": "shirt",
"price": "20.00"
},
{
"name": "hat",
"price": "10.00"
},
{
"name": "scarf",
"price": "5.00"
}
]
}
}
}
]
}
Without mapping transaction_id
to source_message_id
, the output would instead be:
{
"events": [
{
"data": {
"product_action": {
"action": "purchase",
"products": [
{
"name": "shirt",
"price": "20.00"
}
]
}
}
},
{
"data": {
"product_action": {
"action": "purchase",
"products": [
{
"name": "hat",
"price": "10.00"
}
]
}
}
},
{
"data": {
"product_action": {
"action": "purchase",
"products": [
{
"name": "scarf",
"price": "5.00"
}
]
}
}
}
]
}
event_id | impression_group_id | product_impressions | product_name | product_price |
---|---|---|---|---|
1 | 1 | “banner ad” | “shirt” | 20.00 |
1 | 1 | “banner ad” | “hat” | 10.00 |
1 | 2 | “video ad” | “scarf” | 5.00 |
[
{
"mapping_type": "column",
"source": "event_id",
"destination": "events[].data.source_message_id"
},
{
"mapping_type": "column",
"source": "product_impressions",
"destination": "events[].data.product_impressions[].product_impressions_list"
},
{
"mapping_type": "column",
"source": "product_impressions",
"destination": "events[].data.product_impressions"
},
{
"mapping_type": "column",
"source": "product_name",
"destination": "events[].data.product_impressions[].products[].name"
},
{
"mapping_type": "column",
"source": "price",
"destination": "events[].data.product_impressions[].products[].price"
}
]
{
"events": [
{
"data": {
"source_message_id": "1",
"product_impressions": [
{
"product_impression_list": "banner ad",
"products": [
{
"name": "shirt",
"price": "20.00"
},
{
"name": "hat",
"price": "10.00"
}
]
},
{
"product_impression_list": "video ad",
"products": [
{
"name": "scarf",
"price": "5.00"
}
]
}
]
}
}
]
}
Following are some field transformation examples for common use cases:
first_name | premium_start_date | premium_discounts_applied | free_trial_expiration | free_trial_ad_impressions |
---|---|---|---|---|
“Bob” | 2023-11-29 |
["blackfriday20"] |
null | null |
[
{
"mapping_type": "column",
"source": "premium_*",
"destination": "user_attributes.*",
"ignore_when": "$null"
},
{
"mapping_type": "column",
"source": "free_trial_*",
"destination": "user_attributes.*",
"ignore_when": "$null"
}
]
{
"user_attributes": {
"premium_start_date": "2023-11-29",
"premium_discounts_applied": ["blackfriday20"]
}
}
first_name | premium_start_date | premium_discounts_applied | free_trial_expiration | free_trial_ad_impressions |
---|---|---|---|---|
“Bob” | 2023-11-29 |
["cybermonday24"] |
[
{
"mapping_type": "column",
"source": "premium_*",
"destination": "user_attributes.*",
"ignore_when": "$empty"
},
{
"mapping_type": "column",
"source": "free_trial_*",
"destination": "user_attributes.*",
"ignore_when": "$empty"
}
]
{
"user_attributes": {
"premium_start_date": "2023-11-29",
"premium_discounts_applied": ["cybermonday24"]
}
}
first_name | favorite_store_1 | favorite_store_2 | favorite_store_3 |
---|---|---|---|
“Bob” | “target” | “old navy” | “walmart” |
{
"mapping_type": "column",
"source": "favorite_store_*",
"destination": "user_attribute.favorite_stores[]",
"value": "{{ value | upcase }}"
}
{
"user_attributes": {
"favorite_stores": [
"TARGET",
"OLD NAVY",
"WALMART"
]
}
}
Imagine the following object of data within a column called profile_data
:
profile_data |
---|
{"customer_id": "12345", "first_name": "Bob", "ua_shirt_size": "M", "ua_favorite_color": "blue"} |
Using the column
mapping type and the *
wildcard, we create the following mapping:
[
{
"mapping_type": "column",
"source": "profile_data.customer_id",
"destination": "user_identities.customer_id"
},
{
"mapping_type": "column",
"source": "profile_data.fname",
"destination": "user_attributes.$firstname"
},
{
"mapping_type": "column",
"source": "profile_data.ua_*",
"destination": "user_attributes.*"
}
]
{
"user_identities": {
"customer_id": "12345"
},
"user_attributes": {
"$firstname": "Bob",
"ua_shirt_size": "M",
"ua_favorite_color": "blue"
}
}
Was this page helpful?