Warehouse Sync API v2 Migration

mParticle released version 2 of the Warehouse Sync API on October 31, 2023. All Warehouse Sync pipelines created using v1 of the API were automatically migrated to v2, so you do not need to take any action to update or migrate your existing pipelines.

This doc highlights the key differences between the v1 and v2 APIs by resource group.

Connections

v2 has removed, reorganized, and renamed many of the configuration fields specific to Snowflake, BigQuery, and Redshift. Navigate between each warehouse provider using the tabs below to see tables illustrating these changes side-by-side.

Snowflake

v1 Field v2 Field Change description
id id No change
name name No change
workspace_id - Removed in v2
- state New field. Can be new, active or inactive
is_faulted status Renamed
type service_provider Renamed
- config v2 uses a JSON objected called config to contain the warehouse specific config settings
source_account_id account_identifier Renamed
region region No change
warehouse warehouse No change
database database No change
role role No change
user user No change
password password No change
snowflake_aws_iam_user_arn aws_iam_user_arn Renamed
snowflake_aws_external_id aws_external_id Renamed
created_on created_on No change
created_by created_by No change
last_modified_on last_modified_on No change
last_modified_by last_modified_by No change

Example v1 Snowflake connection

{
  "id": "example-connection",
  "name": "Example Connection",
  "workspace_id": 5328,
  "is_faulted": false,
  "type": "snowflake",
  "source_account_id": "gd1234",
  "region": "us-central1.gcp",
  "warehouse": "compute_wh",
  "database": "indicative",
  "role": "mp_role",
  "user": "mp_user",
  "password": "************",
  "snowflake_aws_iam_user_arn": "arn:aws:iam::123456:user/externalstages/abcdefg",
  "snowflake_aws_external_id": "GD1234=2_abcdefg==",
  "created_on": "2023-02-03T23:53:08.413",
  "created_by": "developer@mparticle.com",
  "last_modified_on": null,
  "last_modified_by": null
}

Example v2 Snowflake connection

{
  "id": "example-connection",
  "name": "Example Snowflake Connection",
  "state": "active",
  "status": "healthy",
  "service_provider": "Snowflake",
  "config": {
    "account_identifier": "gd12345",
    "region": "us-central1.gcp",
    "warehouse": "compute_wh",
    "database": "my_database",
    "role": "mparticle_role",
    "user": "mparticle_user",
    "password": "************",
    "aws_iam_user_arn": "arn:aws:iam::123456:user/externalstages/abcdefg",
    "aws_external_id": "GD1234=2_abcdefg=="
  },
  "created_on": "2023-02-03T23:53:08.413",
  "created_by": "developer@mparticle.com",
  "last_modified_on": null,
  "last_modified_by": null
}

Google BigQuery

v1 Field v2 Field Change description
id id No change
name name No change
workspace_id - Removed in v2
- state New field. Can be new, active or inactive
is_faulted status Renamed
type service_provider Renamed
- config v2 uses a JSON objected called config to contain the warehouse specific config settings
region region No change
service_account_id service_account_id No change
service_account_key service_account_key No change
project_id project_id No change
dataset_id dataset_id No change
created_on created_on No change
created_by created_by No change
last_modified_on last_modified_on No change
last_modified_by last_modified_by No change

Example v1 BigQuery connection

{
  "id": "example-connection",
  "name": "Example Connection",
  "workspace_id": 1234,
  "is_faulted": false,
  "type": "bigquery",
  "region": "us-east4",
  "service_account_id": "mp_service",
  "service_account_key": "************",
  "project_id": "mp-project",
  "dataset_id": "mp-dataset",
  "created_on": "2023-02-03T23:53:08.413",
  "created_by": "developer@mparticle.com",
  "last_modified_on": null,
  "last_modified_by": null
}

Example v2 BigQuery connection

{
  "id": "example-bigquery-connection",
  "name": "Example BigQuery Connection",
  "state": "active",
  "status": "healthy",
  "service_provider": "BigQuery",
  "config": {
    "region": "us-east1",
    "project_id": "my-gcp-project",
    "dataset_id": "my-dataset",
    "service_account_id": "mparticle-account@my-gcp-project.iam.gserviceaccount.com",
    "service_account_key": "************"
  },
  "created_on": "2023-02-03T23:53:08.413",
  "created_by": "developer@mparticle.com",
  "last_modified_on": null,
  "last_modified_by": null
}

Amazon Redshift

v1 Field v2 Field Change description
id id No change
name name No change
workspace_id - Removed in v2
- state New field. Can be new, active or inactive
is_faulted status Renamed
type service_provider Renamed
- config v2 uses a JSON objected called config to contain the warehouse specific config settings
database database No change
user user No change
password password No change
redshift_aws_iam_role_arn aws_iam_role_arn Renamed
host host No change
port port No change
created_by created_by No change
last_modified_on last_modified_on No change
last_modified_by last_modified_by No change

Example v1 Redshift connection

{
  "id": "example-connection",
  "name": "Example Connection",
  "workspace_id": 5328,
  "is_faulted": false,
  "type": "redshift",
  "database": "mp",
  "user": "mp_user",
  "password": "************",
  "redshift_aws_iam_role_arn": "arn:aws:iam::1234567:role/mparticle_redshift_role",
  "host": "cluster.abcd1234.us-east-1.redshift.amazonaws.com",
  "port": "5439",
  "created_on": "2023-02-03T23:53:08.413",
  "created_by": "developer@mparticle.com",
  "last_modified_on": null,
  "last_modified_by": null
}

Example v2 Redshift connection

{
  "id": "example-redshift-connection",
  "name": "Redshift Connection",
  "state": "active",
  "status": "healthy",
  "service_provider": "Redshift",
  "config": {
    "database": "dev",
    "host": "dwi-test.ab123yyxwwzz.us-east-1.redshift.amazonaws.com",
    "port": "5439",
    "user": "mParticle",
    "password": "************",
    "aws_iam_role_arn": "arn:aws:iam::123456789:role/mParticle_role"
  },
  "created_on": "2023-02-03T23:53:08.413",
  "created_by": "developer@mparticle.com",
  "last_modified_on": null,
  "last_modified_by": null
}

Data models

The main difference between data models in v1 and v2 is that v2 data models are defined as SQL statements within a new JSON object called config. Each of the four timestamp configuration fields in v1 have also been removed, because v2 Warehouse Sync pipelines allow you to define what data to include in your syncs using any column in your database you want using the sync_mode fields of your pipeline.

v1 Field v2 Field Change description
id id No change
name name No change
workspace_id - Removed in v2
- state New field. Can be active or inactive.
- status New field
type type No change
- config v2 uses a JSON objected called config to contain the data model’s SQL query
sql_query sql_query sql_query has been moved into the JSON object called config
load_timestamp_field_type - Removed in v2
load_timestamp_field_name - Removed in v2
load_timestamp_field_time_zone - Removed in v2
load_timestamp_field_time_offset - Removed in v2
created_on created_on No change
created_by created_by No change
last_modified_on last_modified_on No change
last_modified_by last_modified_by No change

Example v1 data model

{
    "id": "example-data-model",
    "name": "Example Data Model",
    "workspace_id": 5328,
    "type": "sql",
    "sql_query": "SELECT email AS email, COUNT(id) AS \"count_of_open_tickets\", LAST_UPDATED_DATE_TIME FROM mp.demo_service.tickets WHERE t.status = 'open'",
    "load_timestamp_field_type": "timestamp_ltz",
    "load_timestamp_field_name": "LAST_UPDATED_DATE_TIME",
    "load_timestamp_field_time_zone": null,
    "load_timestamp_field_time_offset": 0,
    "created_on": "2022-09-16T16:58:47.317",
    "created_by": "developer@mparticle.com",
    "last_modified_on": "2022-11-04T16:48:21.507",
    "last_modified_by": "developer@mparticle.com"
}

Example v2 data model

{
    "id": "string",
    "name": "example-data-model",
    "state": "active",
    "status": "valid",
    "errors": [
        {
          "message": "string"
        }
    ],
    "type": "sql",
    "config": {
      "sql_query": "SELECT email AS email, COUNT(id) AS \"count_of_open_tickets\", LAST_UPDATED_DATE_TIME FROM mp.demo_service.tickets WHERE t.status = 'open'"
    },
    "created_on": "2023-10-24T21:05:19.281Z",
    "created_by": "developer@example.com",
    "last_modified_on": "2023-10-24T21:05:19.281Z",
    "last_modified_by": "developer@example.com"
}

Pipelines

The main difference between v1 and v2 pipelines is how you configure the type, or “sync mode”, of your Warehouse Sync pipeline, and the schedule of your pipeline.

Sync mode

In v2, all of your sync mode settings are contained in a single JSON object called sync_mode. There are two basic types of syncs in v2:

  • Incremental syncs: these pipelines ingest data from your warehouse into mParticle in discrete segments on a recurring schedule.
  • Full syncs: these pipelines ingest all of the data you specify using your sync mode settings each time your pipeline is ran.

In v1, mParticle would determine what data to sync from your warehouse into mParticle by looking at a column in your warehouse that had to contain timestamps. This allowed mParticle to recognize data of a certain age in your warehouse that should either be included or excluded in your syncs according to your settings.

In v2, you can use any column in your warehouse to define what data to include in your syncs, as long as that column contains a datatype that can be incremented. This is defined by the iterator and iterator_data_type fields. For most use cases, the best iterator will still be a column containing timestamps, but you can use any column that contains easily compared integers, numbers, dates, timestamps, or strings.

The from and unil fields allow you to specify what data to include in your syncs. For example, a sync could include all data from an iterator value of 1 January 2020 until 1 February 2023. Note that the from and until fields are not used to schedule when syncs occur, but to define a range of rows in your database to sync.

Sync schedule

In v2, all of your scheduling settings are contained in a single JSON object called schedule. All syncs in v2 can be scheduled to be ran:

  • Once, by setting type in the schedule object to once
  • On-demand, by setting type to on-demand
  • Incrementally, according to your schedule settings, by setting type to interval

For a complete reference of the difference between v1 and v2 pipelines, refer to the following table:

v1 Field v2 Field Change description
id id No change
name name No change
workspace_id - Removed in v2
is_active state is_active and is_draft are replaced by state in v2
is_draft - Removed in v2
connection_id connection_id No change
data_model_id data_model_id No change
partner_feed_id partner_feed_id No change
partner_feed_key partner_feed_key No change
- sync_mode v2 uses a JSON object named sync_mode to contain the settings for which data is synced between your database and mParticle
- type New field. Can be incremental or full
- iterator_field New field
- iterator_data_type New field. Since iterator_field can use any datatype as long as it’s an incremental datatype, you must specify the the datatype mParticle should expect
- from New field. Specifies the initial value of iterator_field from which data will begin syncing
- until New field. Specifies the final value of iterator_field that will terminate a sync
- schedule v2 uses a JSON object named schedule to contain the sync scheduling settings
schedule_interval type type replaces schedule_interval. Can beinterval, once, or on-demand
- frequency New field. Only used if type is set to interval. Can be hourly, daily, weekly, monthly
- delay New field. Determines amount of time to wait before a scheduled interval sync
schedule_start_time start start replaces schedule_start_time. Specifies the timestamp of the initial run of a scheduled sync
schedule_end_time end end replaces schedule_end_time. Specifies the timestamp of final run of a scheduled sync
plan_id data_plan_id Renamed
plan_version data_plan_version Renamed
environment environment No change

Was this page helpful?