BigQuery (Define Your Own Schema)

Prerequisites

You must grant ‘bigquery.dataViewer’ access to Analytics’ service account for your BigQuery project. To complete the following steps, you need administrative access to both the BigQuery console and your BigQuery database.

For this self-service integration, we also have some data requirements:

  1. All of your events must be unified in a single table rather than split into separate tables for each event type.
  2. Every event must have a user_id, an event_name, and a timestamp field. The fields do not have to use these names, and any additional fields will be treated as event properties.
  3. There can be at most one authenticatedID field and one unauthenticatedID field for aliasing.
  4. The event timestamp must be in UTC.
  5. JSON fields must be pre-parsed and flattened into their own fields.
  6. All joins must be done beforehand.
  7. Sharded tables (BigQuery tables whose names end with a suffix in the _MMDDYYYY format) are not currently supported.

We can still support integrations that do not meet the above requirements, but you will need to get in touch with a product specialist. Likewise, if you need additional enrichments, such as joining with user property tables or deriving custom user_ids, please contact us.
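
As an illustration of requirements 5 and 7, the following Python sketch flattens a nested JSON record into top-level fields and flags table names that end in an eight-digit date suffix. The helper names, the underscore separator, and the sample record are assumptions made for this example; they are not part of the integration.

```python
import re
from typing import Any

# Assumed pattern: a table name ending in an underscore plus 8 digits,
# for example events_12012023 (the _MMDDYYYY suffix from requirement 7).
SHARD_SUFFIX = re.compile(r"_\d{8}$")


def is_sharded_table(table_name: str) -> bool:
    """Return True when the table name ends in an 8-digit date suffix."""
    return bool(SHARD_SUFFIX.search(table_name))


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into top-level fields joined by underscores."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


# Hypothetical event with a nested JSON "properties" field.
event = {
    "user_id": "u1",
    "event_name": "Purchase",
    "timestamp": "2023-12-01T10:00:00Z",
    "properties": {"plan": "pro", "cart": {"items": 3}},
}
flat_event = flatten(event)
```

After flattening, every value sits in its own top-level field (for example properties_plan), so each one can be picked up as a separate event property.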

Adding a Data Source In Analytics

  1. In Analytics, click on the gear icon and select Project Settings.
  2. Select the Data Sources tab.
  3. Select New Data Source.
  4. Select Connect via Data Warehouse or Lake.
  5. Select BigQuery as your data connection and Define your own schema as the connection schema and click Connect.
  6. You should see the Google BigQuery overview screen. Click Next.

Connection Information

  1. Open the BigQuery console on Google Cloud Platform and select a project.
  2. Enter the GCP Project ID containing your data.
  3. Enter the Dataset Name.
  4. Enter the Table Name and click Next in Analytics.

Grant Permissions

  1. This integration works by sharing the dataset with Analytics’ service account and only requires read-only access to that dataset. Analytics takes on the cost of the query and caches this data in Analytics’ proprietary analytics engine.

    1. Within the BigQuery Console, select your Project and your dataset from the previous section.
    2. Click on Share Dataset.
    3. In the Dataset Permissions panel, enter the following service account in the Add Members field:

      integrations@indicative-988.iam.gserviceaccount.com

    4. In the Select a Role dropdown, select BigQuery Data Viewer and click Add.
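
If you manage dataset permissions as data rather than through the console, the change made in the steps above amounts to adding one read-only entry to the dataset's access list. The Python sketch below assumes the entry shape used by the BigQuery dataset resource (a role plus userByEmail), where READER is the dataset-level role that BigQuery documents as equivalent to BigQuery Data Viewer. The add_reader helper and the admin address are hypothetical.

```python
SERVICE_ACCOUNT = "integrations@indicative-988.iam.gserviceaccount.com"


def add_reader(access: list[dict], email: str) -> list[dict]:
    """Return a copy of a dataset access list with a read-only entry for email.

    Assumption: entries look like {"role": ..., "userByEmail": ...}.
    """
    entry = {"role": "READER", "userByEmail": email}
    if entry in access:
        # Already shared; return an unchanged copy so the call is idempotent.
        return list(access)
    return [*access, entry]


# Hypothetical existing access list for the dataset.
current = [{"role": "OWNER", "userByEmail": "admin@example.com"}]
updated = add_reader(current, SERVICE_ACCOUNT)
```

The helper only builds the new access list. Applying it to a real dataset still requires the console steps above or a BigQuery client of your choice.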

Data Loading

  1. Load Timestamp Field
    Select the field used to identify new data. We recommend using a timestamp that denotes when the event was published rather than the actual event timestamp, so that late-arriving data can be collected. This will not affect your analyses, since our queries reference the event timestamp. If you choose to load data every 3, 6, or 12 hours, select a load timestamp field with at least hour precision (not a date-only field).

    For example, suppose an event with an event timestamp of 12/1 was published to the table on 12/3. If the event timestamp were used as the load timestamp, this event would never be collected, because the daily extract for 12/3 would look only for events that occurred on 12/3. Using the publishing timestamp allows us to extract all new data that was published to the table on a nightly basis.

  2. Start Date
    Select the date from which Analytics should start loading your data.

    ::: success If your data history exceeds 1 billion events, a Solutions Engineer will contact you to assist with the integration. :::

  3. Schedule Interval
    Select the frequency to make new data available in Analytics.
  4. Processing Delay
    Select when we should start extracting your data in UTC. This time should be when all of your previous day’s data is fully available in your table for extraction.
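
To see why the Load Timestamp Field recommendation matters, here is a small Python sketch of the 12/1 and 12/3 example from step 1. The field names, the year, and the daily_extract helper are assumptions for illustration only; the actual extraction logic is not described in this article.

```python
from datetime import date, datetime

# Two hypothetical rows. The second one occurred on 12/1 but was only
# published to the table on 12/3, so it is late-arriving data.
rows = [
    {"event": "Signup",
     "event_ts": datetime(2023, 12, 3, 9, 0),
     "published_ts": datetime(2023, 12, 3, 9, 5)},
    {"event": "Purchase",
     "event_ts": datetime(2023, 12, 1, 18, 0),
     "published_ts": datetime(2023, 12, 3, 2, 0)},
]


def daily_extract(rows, load_field, day):
    """Return the rows whose load timestamp falls on the given day."""
    return [r for r in rows if r[load_field].date() == day]


extract_day = date(2023, 12, 3)
by_event_ts = daily_extract(rows, "event_ts", extract_day)
by_published_ts = daily_extract(rows, "published_ts", extract_day)
```

Filtering on event_ts picks up only the Signup row, while filtering on published_ts picks up both rows, including the late Purchase.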

Event Modeling

  1. Events Field
    Select the field that should be used for your Analytics event names. We recommend choosing a field that will result in 20-300 unique values.
  2. Timestamp Field
    Select the field that represents the time at which the event was performed. This is the timestamp that will be used for querying.

::: success After this step, we will perform a few checks on your data with the model that you provided. The checks are:

  • Valid event field (Do at least 80% of your records have a value for the event field?)
  • Valid timestamp field (Do at least 80% of your records have a value for the timestamp field?)
  • Total number of unique events (We recommend 20-300 unique events; the limit is 2000.) :::
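
A rough way to run similar checks on a sample of your own data before connecting is sketched below in Python. The 80% and 2000 thresholds come from the list above; the function name, the row format, and treating empty strings as missing values are assumptions, and this is not the code Analytics runs.

```python
def check_event_model(rows, event_field, timestamp_field,
                      min_fill=0.8, max_unique_events=2000):
    """Approximate the event-model checks on a list of dict rows."""
    def has_value(row, field):
        # Assumption: None and the empty string both count as missing.
        return row.get(field) not in (None, "")

    total = len(rows)
    with_event = sum(1 for r in rows if has_value(r, event_field))
    with_ts = sum(1 for r in rows if has_value(r, timestamp_field))
    unique_events = {r[event_field] for r in rows if has_value(r, event_field)}
    return {
        "valid_event_field": total > 0 and with_event / total >= min_fill,
        "valid_timestamp_field": total > 0 and with_ts / total >= min_fill,
        "unique_events": len(unique_events),
        "within_event_limit": len(unique_events) <= max_unique_events,
    }


# Hypothetical sample: 10 rows, 9 with an event name, 7 with a timestamp.
sample = [{"name": f"event_{i % 3}", "ts": "2023-12-01T00:00:00Z"} for i in range(7)]
sample += [{"name": "event_0", "ts": None},
           {"name": "event_1", "ts": None},
           {"name": None, "ts": None}]
report = check_event_model(sample, "name", "ts")
```

In this sample the event field passes (90% filled) while the timestamp field fails (70% filled), which is the kind of result that would call for choosing a different timestamp field.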

User Modeling

After some basic checks, we can define your users within your data. For more information on User Identification (Aliasing), please refer to this article.

  1. If you choose to enable Aliasing:

    1. Unauthenticated ID - Input the field used to identify anonymous users.
    2. Authenticated ID - Input the field used to identify known users.
  2. If you choose to disable Aliasing, press Disabled:

    1. Unauthenticated ID - Enter the field used to identify your users. All users must have a value for this field.

If your data uses a non-null placeholder value to represent missing UserIDs, click the Show Advanced button and enter those placeholder values in the field provided.

::: success After this step, we will perform additional checks on your data with the user model that you provided. The checks are:

  • User Hotspot (Is there a single UserID that represents over 40% of your records?)
  • Anti-Hotspot (Does your data have too many unique userIDs? A good events table contains multiple events per user)
  • Aliasing
    - Too many unauthenticated IDs for a single authenticated userID
    - Too many authenticated IDs for a single anonymous ID :::
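
The user checks can be approximated on a sample of your data in the same spirit. In the Python sketch below, the 40% hotspot threshold comes from the list above. The minimum of two events per user for the anti-hotspot check is an assumed value, since no threshold is stated, and the aliasing helper only reports the largest fan-out in each direction rather than applying a limit. All function and field names are hypothetical.

```python
from collections import Counter, defaultdict


def check_user_model(rows, user_field, null_markers=(),
                     hotspot_share=0.4, min_events_per_user=2.0):
    """Approximate the hotspot and anti-hotspot checks on dict rows."""
    ids = [r.get(user_field) for r in rows]
    # Treat placeholder values (for example "N/A") as missing user IDs.
    known = [v for v in ids if v is not None and v not in null_markers]
    counts = Counter(known)
    top_share = max(counts.values()) / len(rows) if counts else 0.0
    events_per_user = len(known) / len(counts) if counts else 0.0
    return {
        "hotspot": top_share > hotspot_share,
        # Assumed threshold: fewer than 2 events per user on average.
        "anti_hotspot": events_per_user < min_events_per_user,
    }


def alias_fanout(rows, unauth_field, auth_field):
    """Return (max unauthenticated IDs per authenticated ID, and the reverse)."""
    unauth_per_auth = defaultdict(set)
    auth_per_unauth = defaultdict(set)
    for r in rows:
        u, a = r.get(unauth_field), r.get(auth_field)
        if u is None or a is None:
            continue
        unauth_per_auth[a].add(u)
        auth_per_unauth[u].add(a)
    return (
        max((len(s) for s in unauth_per_auth.values()), default=0),
        max((len(s) for s in auth_per_unauth.values()), default=0),
    )


# Hypothetical sample where "N/A" stands in for a missing user ID.
sample = [
    {"anon": "a1", "uid": "u1"},
    {"anon": "a2", "uid": "u1"},
    {"anon": "a3", "uid": "u1"},
    {"anon": "a3", "uid": "u2"},
    {"anon": "a4", "uid": "N/A"},
]
user_report = check_user_model(sample, "uid", null_markers=("N/A",))
fanout = alias_fanout(sample, "anon", "uid")
```

Here u1 accounts for 3 of 5 records, so the hotspot check is flagged, and the fan-out shows up to three unauthenticated IDs mapped to one authenticated ID.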

Assisted Modeling


You should see a summary of your data, based on the last 7 days, in the summary blocks described below. You only need to be concerned if the margin of error is significant; if so, please reach out to a product specialist:

  1. Events Summary
    You should see a daily breakdown of your Total Event Count, and the number of Unique Event Names. If there are certain events to exclude, please click on the Exclude checkbox for those events.

    If you would like to exclude any events by regex or property value, please contact a product specialist.

  2. Properties Summary
    Here you will see the number of Unique Property Names. If there are certain properties to exclude, please click on the Exclude checkbox for those properties.

    If you require more advanced configurations such as parsing out JSON fields, creating derived properties, or excluding properties based on regex, please contact a product specialist.

  3. Users Summary
    This section lists the number of Unique users seen. If the numbers do not look correct, please go back to the User Modeling section to confirm that the correct ID was chosen.
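
To sanity-check the Events Summary against your own numbers, you can compute a comparable daily breakdown yourself. The Python sketch below is one possible reading, assuming the window covers the 7 full days before a given date; the function name, field names, and sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def events_summary(rows, event_field, timestamp_field, today, days=7):
    """Daily total event count and unique event names for the last `days` days."""
    start = today - timedelta(days=days)
    totals = defaultdict(int)
    names = defaultdict(set)
    for r in rows:
        day = r[timestamp_field].date()
        # Assumed window: start <= day < today.
        if start <= day < today:
            totals[day] += 1
            names[day].add(r[event_field])
    return {
        day: {"total_events": totals[day], "unique_event_names": len(names[day])}
        for day in sorted(totals)
    }


# Hypothetical rows; the 11/30 and 12/8 rows fall outside the window.
sample = [
    {"name": "Signup", "ts": datetime(2023, 12, 1, 8, 0)},
    {"name": "Purchase", "ts": datetime(2023, 12, 1, 9, 0)},
    {"name": "Purchase", "ts": datetime(2023, 12, 1, 10, 0)},
    {"name": "Signup", "ts": datetime(2023, 12, 7, 12, 0)},
    {"name": "Signup", "ts": datetime(2023, 11, 30, 23, 0)},
    {"name": "Signup", "ts": datetime(2023, 12, 8, 1, 0)},
]
summary = events_summary(sample, "name", "ts", today=date(2023, 12, 8))
```

If your own counts differ noticeably from what the summary screen shows, that is a good reason to revisit the earlier modeling steps or contact a product specialist.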

Waiting For Data


If you see this screen, you’re all done! You should see your data in Analytics within 48-72 hours and will be notified by email.
