A SQL query can be provided as part of the Data Model request body with type
set to sql
. mParticle passes your SQL query to your warehouse,
so be sure to use valid syntax for your warehouse. For example, each warehouse has slightly different ways to utilize SQL functions with
different parameters.
The data in your warehouse is mapped to the user profile in mParticle according to the column names in the SQL you provide in the data model request following these rules:
""
or []
to preserve casing.The following SQL commands are fully supported:
The following SQL commands are not supported:
User attribute pipelines can be created without field transformations to allow for simpler mappings based on column names in your SQL query.
Every column is mapped to a user attribute except:
load_timestamp_field_type
field (case-insensitive) in the data model requestFor a row to be associated to a profile, the following reserved column names are mapped to the mParticle User Identities or Device Identities. Additional columns are mapped as custom user attributes.
The following column names in your warehouse (case-insensitive) will be mapped to mParticle user identities automatically:
mpid
customerid
,customer_id
facebook
twitter
google
microsoft
yahoo
email
facebook_custom_audience_id
other
other_id_2
other_id_3
other_id_4
other_id_5
other_id_6
other_id_7
other_id_8
other_id_9
other_id_10
mobile_number
phone_number_2
phone_number_3
The following column names in your warehouse (case-insensitive) will be mapped to mParticle device identities automatically:
android_uuid
ios_advertising_id
push_token
ios_idfv
android_advertising_id
amp_id
roku_advertising_id
roku_publisher_id
microsoft_advertising_id
microsoft_publisher_id
fire_advertising_id
mp_deviceid
mParticle’s demo database has a few user attributes and alongside a prediction for the likelihood they may purchase. The following query will import this alongside their favorite categories as a simple list and apply it to the corresponding Profile according to their customer_id.
SELECT
a.date_updated,
a.customer_id,
a.firstname AS "$firstname",
c.propensity_to_buy AS "propensity_to_buy",
ARRAY_AGG(f.value) WITHIN GROUP (ORDER BY f.value ASC) AS "favorite_categories"
FROM mp.demo.attr a
JOIN mp.demo.calc c ON a.customer_id = c.customer_id
JOIN mp.demo.favs f ON a.customer_id = f.customer_id
GROUP BY
a.date_updated,
a.customer_id,
a.firstname,
c.propensity_to_buy
mParticle’s demo ticket database contains details about the number of requests a user has made. The following query will import the number of open tickets to that user’s Profile according to their e-mail address.
SELECT
u.email AS email,
COUNT(t.id) AS "count_of_open_tickets"
FROM mp.demo_service.tickets t
JOIN mp.demo_service.users u
ON u.id = t.requester_user_id
WHERE t.status = 'open'
mParticle materializes your query by wrapping it in an outer SELECT statement to build more complex statements to execute against your data warehouse. These are the queries you will see looking at the history/audit logs in your data warehouse. For example, assume that a data model has the following query:
SELECT
a.scanned_timestamp_ms,
c.propensity_to_buy
FROM demodw.demo.mp_dw_demo_attr a
JOIN demodw.demo.mp_dw_demo_calc c ON a.customer_id = c.customer_id
The query will be wrapped into (but not limited to) queries such as:
Query the number of rows in your provided data model. The values in the filter predicate are available as data_interval_start
and data_interval_end
in the pipeline run status API:
SELECT COUNT(*)
FROM
(
SELECT a.scanned_timestamp_ms, c.propensity_to_buy
FROM demodw.demo.mp_dw_demo_attr a
JOIN demodw.demo.mp_dw_demo_calc c ON a.customer_id = c.customer_id
)
WHERE SCANNED_TIMESTAMP_MS BETWEEN '2023-03-01 14:28:55+0000' AND '2023-03-01 14:41:17+0000'
Query the number of columns in your provided data model:
SELECT *
FROM
(
SELECT a.scanned_timestamp_ms, c.propensity_to_buy
FROM demodw.demo.mp_dw_demo_attr a
JOIN demodw.demo.mp_dw_demo_calc c ON a.customer_id = c.customer_id
)
LIMIT 0
Query generated from a scheduled sync run. The values in the filter predicate are available as data_interval_start
and data_interval_end
in the pipeline run status API.
SELECT OBJECT_CONSTRUCT_KEEP_NULL(*)
FROM
(
SELECT a.scanned_timestamp_ms, c.propensity_to_buy
FROM demodw.demo.mp_dw_demo_attr a
JOIN demodw.demo.mp_dw_demo_calc c ON a.customer_id = c.customer_id
)
WHERE SCANNED_TIMESTAMP_MS BETWEEN '2023-03-01 14:28:55+0000' AND '2023-03-01 14:41:17+0000'
Was this page helpful?