mParticle released version 2 of the Warehouse Sync API on October 31, 2023. All Warehouse Sync pipelines created using v1 of the API were automatically migrated to v2, so you do not need to take any action to update or migrate your existing pipelines.
This doc highlights the key differences between the v1 and v2 APIs by resource group.
v2 has removed, reorganized, and renamed many of the configuration fields specific to Snowflake, BigQuery, and Redshift. Navigate between each warehouse provider using the tabs below to see tables illustrating these changes side-by-side.
v1 Field | v2 Field | Change description |
---|---|---|
id |
id |
No change |
name |
name |
No change |
workspace_id |
- | Removed in v2 |
- | state |
New field. Can be new , active or inactive |
is_faulted |
status |
Renamed |
type |
service_provider |
Renamed |
- | config |
v2 uses a JSON objected called config to contain the warehouse specific config settings |
source_account_id |
account_identifier |
Renamed |
region |
region |
No change |
warehouse |
warehouse |
No change |
database |
database |
No change |
role |
role |
No change |
user |
user |
No change |
password |
password |
No change |
snowflake_aws_iam_user_arn |
aws_iam_user_arn |
Renamed |
snowflake_aws_external_id |
aws_external_id |
Renamed |
created_on |
created_on |
No change |
created_by |
created_by |
No change |
last_modified_on |
last_modified_on |
No change |
last_modified_by |
last_modified_by |
No change |
{
"id": "example-connection",
"name": "Example Connection",
"workspace_id": 5328,
"is_faulted": false,
"type": "snowflake",
"source_account_id": "gd1234",
"region": "us-central1.gcp",
"warehouse": "compute_wh",
"database": "indicative",
"role": "mp_role",
"user": "mp_user",
"password": "************",
"snowflake_aws_iam_user_arn": "arn:aws:iam::123456:user/externalstages/abcdefg",
"snowflake_aws_external_id": "GD1234=2_abcdefg==",
"created_on": "2023-02-03T23:53:08.413",
"created_by": "developer@mparticle.com",
"last_modified_on": null,
"last_modified_by": null
}
{
"id": "example-connection",
"name": "Example Snowflake Connection",
"state": "active",
"status": "healthy",
"service_provider": "Snowflake",
"config": {
"account_identifier": "gd12345",
"region": "us-central1.gcp",
"warehouse": "compute_wh",
"database": "my_database",
"role": "mparticle_role",
"user": "mparticle_user",
"password": "************",
"aws_iam_user_arn": "arn:aws:iam::123456:user/externalstages/abcdefg",
"aws_external_id": "GD1234=2_abcdefg=="
},
"created_on": "2023-02-03T23:53:08.413",
"created_by": "developer@mparticle.com",
"last_modified_on": null,
"last_modified_by": null
}
v1 Field | v2 Field | Change description |
---|---|---|
id |
id |
No change |
name |
name |
No change |
workspace_id |
- | Removed in v2 |
- | state |
New field. Can be new , active or inactive |
is_faulted |
status |
Renamed |
type |
service_provider |
Renamed |
- | config |
v2 uses a JSON objected called config to contain the warehouse specific config settings |
region |
region |
No change |
service_account_id |
service_account_id |
No change |
service_account_key |
service_account_key |
No change |
project_id |
project_id |
No change |
dataset_id |
dataset_id |
No change |
created_on |
created_on |
No change |
created_by |
created_by |
No change |
last_modified_on |
last_modified_on |
No change |
last_modified_by |
last_modified_by |
No change |
{
"id": "example-connection",
"name": "Example Connection",
"workspace_id": 1234,
"is_faulted": false,
"type": "bigquery",
"region": "us-east4",
"service_account_id": "mp_service",
"service_account_key": "************",
"project_id": "mp-project",
"dataset_id": "mp-dataset",
"created_on": "2023-02-03T23:53:08.413",
"created_by": "developer@mparticle.com",
"last_modified_on": null,
"last_modified_by": null
}
{
"id": "example-bigquery-connection",
"name": "Example BigQuery Connection",
"state": "active",
"status": "healthy",
"service_provider": "BigQuery",
"config": {
"region": "us-east1",
"project_id": "my-gcp-project",
"dataset_id": "my-dataset",
"service_account_id": "mparticle-account@my-gcp-project.iam.gserviceaccount.com",
"service_account_key": "************"
},
"created_on": "2023-02-03T23:53:08.413",
"created_by": "developer@mparticle.com",
"last_modified_on": null,
"last_modified_by": null
}
v1 Field | v2 Field | Change description |
---|---|---|
id |
id |
No change |
name |
name |
No change |
workspace_id |
- | Removed in v2 |
- | state |
New field. Can be new , active or inactive |
is_faulted |
status |
Renamed |
type |
service_provider |
Renamed |
- | config |
v2 uses a JSON objected called config to contain the warehouse specific config settings |
database |
database |
No change |
user |
user |
No change |
password |
password |
No change |
redshift_aws_iam_role_arn |
aws_iam_role_arn |
Renamed |
host |
host |
No change |
port |
port |
No change |
created_by |
created_by |
No change |
last_modified_on |
last_modified_on |
No change |
last_modified_by |
last_modified_by |
No change |
{
"id": "example-connection",
"name": "Example Connection",
"workspace_id": 5328,
"is_faulted": false,
"type": "redshift",
"database": "mp",
"user": "mp_user",
"password": "************",
"redshift_aws_iam_role_arn": "arn:aws:iam::1234567:role/mparticle_redshift_role",
"host": "cluster.abcd1234.us-east-1.redshift.amazonaws.com",
"port": "5439",
"created_on": "2023-02-03T23:53:08.413",
"created_by": "developer@mparticle.com",
"last_modified_on": null,
"last_modified_by": null
}
{
"id": "example-redshift-connection",
"name": "Redshift Connection",
"state": "active",
"status": "healthy",
"service_provider": "Redshift",
"config": {
"database": "dev",
"host": "dwi-test.ab123yyxwwzz.us-east-1.redshift.amazonaws.com",
"port": "5439",
"user": "mParticle",
"password": "************",
"aws_iam_role_arn": "arn:aws:iam::123456789:role/mParticle_role"
},
"created_on": "2023-02-03T23:53:08.413",
"created_by": "developer@mparticle.com",
"last_modified_on": null,
"last_modified_by": null
}
The main difference between data models in v1 and v2 is that v2 data models are defined as SQL statements within a new JSON object called config
. Each of the four timestamp configuration fields in v1 have also been removed, because v2 Warehouse Sync pipelines allow you to define what data to include in your syncs using any column in your database you want using the sync_mode
fields of your pipeline.
v1 Field | v2 Field | Change description |
---|---|---|
id |
id |
No change |
name |
name |
No change |
workspace_id |
- | Removed in v2 |
- | state |
New field. Can be active or inactive . |
- | status |
New field |
type |
type |
No change |
- | config |
v2 uses a JSON objected called config to contain the data model’s SQL query |
sql_query |
sql_query |
sql_query has been moved into the JSON object called config |
load_timestamp_field_type |
- | Removed in v2 |
load_timestamp_field_name |
- | Removed in v2 |
load_timestamp_field_time_zone |
- | Removed in v2 |
load_timestamp_field_time_offset |
- | Removed in v2 |
created_on |
created_on |
No change |
created_by |
created_by |
No change |
last_modified_on |
last_modified_on |
No change |
last_modified_by |
last_modified_by |
No change |
{
"id": "example-data-model",
"name": "Example Data Model",
"workspace_id": 5328,
"type": "sql",
"sql_query": "SELECT email AS email, COUNT(id) AS \"count_of_open_tickets\", LAST_UPDATED_DATE_TIME FROM mp.demo_service.tickets WHERE t.status = 'open'",
"load_timestamp_field_type": "timestamp_ltz",
"load_timestamp_field_name": "LAST_UPDATED_DATE_TIME",
"load_timestamp_field_time_zone": null,
"load_timestamp_field_time_offset": 0,
"created_on": "2022-09-16T16:58:47.317",
"created_by": "developer@mparticle.com",
"last_modified_on": "2022-11-04T16:48:21.507",
"last_modified_by": "developer@mparticle.com"
}
{
"id": "string",
"name": "example-data-model",
"state": "active",
"status": "valid",
"errors": [
{
"message": "string"
}
],
"type": "sql",
"config": {
"sql_query": "SELECT email AS email, COUNT(id) AS \"count_of_open_tickets\", LAST_UPDATED_DATE_TIME FROM mp.demo_service.tickets WHERE t.status = 'open'"
},
"created_on": "2023-10-24T21:05:19.281Z",
"created_by": "developer@example.com",
"last_modified_on": "2023-10-24T21:05:19.281Z",
"last_modified_by": "developer@example.com"
}
The main difference between v1 and v2 pipelines is how you configure the type, or “sync mode”, of your Warehouse Sync pipeline, and the schedule of your pipeline.
In v2, all of your sync mode settings are contained in a single JSON object called sync_mode
. There are two basic types of syncs in v2:
In v1, mParticle would determine what data to sync from your warehouse into mParticle by looking at a column in your warehouse that had to contain timestamps. This allowed mParticle to recognize data of a certain age in your warehouse that should either be included or excluded in your syncs according to your settings.
In v2, you can use any column in your warehouse to define what data to include in your syncs, as long as that column contains a datatype that can be incremented. This is defined by the iterator
and iterator_data_type
fields. For most use cases, the best iterator will still be a column containing timestamps, but you can use any column that contains easily compared integers, numbers, dates, timestamps, or strings.
The from
and unil
fields allow you to specify what data to include in your syncs. For example, a sync could include all data from
an iterator value of 1 January 2020
until
1 February 2023
. Note that the from
and until
fields are not used to schedule when syncs occur, but to define a range of rows in your database to sync.
In v2, all of your scheduling settings are contained in a single JSON object called schedule
. All syncs in v2 can be scheduled to be ran:
type
in the schedule
object to once
type
to on-demand
type
to interval
For a complete reference of the difference between v1 and v2 pipelines, refer to the following table:
v1 Field | v2 Field | Change description |
---|---|---|
id |
id |
No change |
name |
name |
No change |
workspace_id |
- | Removed in v2 |
is_active |
state |
is_active and is_draft are replaced by state in v2 |
is_draft |
- | Removed in v2 |
connection_id |
connection_id |
No change |
data_model_id |
data_model_id |
No change |
partner_feed_id |
partner_feed_id |
No change |
partner_feed_key |
partner_feed_key |
No change |
- | sync_mode |
v2 uses a JSON object named sync_mode to contain the settings for which data is synced between your database and mParticle |
- | type |
New field. Can be incremental or full |
- | iterator_field |
New field |
- | iterator_data_type |
New field. Since iterator_field can use any datatype as long as it’s an incremental datatype, you must specify the the datatype mParticle should expect |
- | from |
New field. Specifies the initial value of iterator_field from which data will begin syncing |
- | until |
New field. Specifies the final value of iterator_field that will terminate a sync |
- | schedule |
v2 uses a JSON object named schedule to contain the sync scheduling settings |
schedule_interval |
type |
type replaces schedule_interval . Can beinterval , once , or on-demand |
- | frequency |
New field. Only used if type is set to interval . Can be hourly , daily , weekly , monthly |
- | delay |
New field. Determines amount of time to wait before a scheduled interval sync |
schedule_start_time |
start |
start replaces schedule_start_time . Specifies the timestamp of the initial run of a scheduled sync |
schedule_end_time |
end |
end replaces schedule_end_time . Specifies the timestamp of final run of a scheduled sync |
plan_id |
data_plan_id |
Renamed |
plan_version |
data_plan_version |
Renamed |
environment |
environment |
No change |
Was this page helpful?