AWS S3 Integration (Define Your Own Schema)

The S3 integration with Analytics is available for Enterprise customers only. If you are interested, please contact us. You must grant Analytics access to S3 by editing the bucket policy of an existing S3 bucket.

To perform the following steps, you must have administrative access to the AWS Console and to your S3 bucket.

If you are integrating Snowplow data via AWS S3, please use our Snowplow Integration documentation instead.

Instructions

You must configure a policy that grants Analytics programmatic access to the S3 bucket.

In Analytics

  1. In Analytics, click on the gear icon and select Project Settings.
  2. Select the Data Sources tab.
  3. Select New Data Source.
  4. Select Amazon S3 and Define your own schema. Click Connect.
  5. You should see the S3 screen.
  6. Click Next.

Connection Information

  1. Sign in to the AWS Management Console and open the IAM console at https://console.aws.amazon.com/iam/.
  2. For Source Format, select the file format of the data in your S3 bucket.
  3. For Bucket Name, enter the name of the S3 bucket that Analytics should connect to. Click Show Me on the right to see how to find this information.
  4. For File Path, enter the file path of the data you want to use in Analytics. Click Show Me on the right to see how to find this information.
  5. Click Next.
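
If you only have an S3 URI for your data, the Bucket Name and File Path values can be derived from it. The following is a minimal sketch, not part of Analytics: `split_s3_uri` and `check_bucket_name` are hypothetical helper names, the name check covers only the basic S3 bucket naming rules, and whether Analytics expects a leading or trailing slash in File Path is an assumption you should confirm with Show Me.

```python
import re
from urllib.parse import urlparse

# Basic S3 bucket naming rules: 3-63 characters, lowercase letters, digits,
# dots and hyphens, starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def check_bucket_name(name):
    """Return True if `name` passes the basic bucket naming rules."""
    return bool(_BUCKET_RE.match(name)) and ".." not in name


def split_s3_uri(uri):
    """Split 's3://bucket/path/to/data' into (bucket, path)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Assumption: the path is entered without a leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


if __name__ == "__main__":
    bucket, path = split_s3_uri("s3://my-bucket/events/2024/")
    print(bucket, path, check_bucket_name(bucket))
```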

Grant Permissions

  1. In this section, click the box that contains the policy to copy it to your clipboard. You will need it in step 4 of this section.
  2. Go back to the AWS Console. Select the bucket and click on the Permissions tab.
  3. Click on Bucket Policy.
  4. Enter the copied policy from step 1 into the editor and click Save.
  5. Click Next in Analytics.
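
Before saving the policy, it can help to confirm that the copied text is valid JSON and that its resources point at the intended bucket. The sketch below is an optional local check under stated assumptions: the helper names are hypothetical, and the only structure it relies on is the standard `Statement` and `Resource` keys of an AWS policy document. The policy itself must still come from Analytics.

```python
import json


def policy_resources(policy_text):
    """Parse a bucket policy and return every Resource ARN it names."""
    policy = json.loads(policy_text)  # raises ValueError on malformed JSON
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be an object
        statements = [statements]
    resources = []
    for statement in statements:
        value = statement.get("Resource", [])
        resources.extend([value] if isinstance(value, str) else value)
    return resources


def policy_targets_bucket(policy_text, bucket):
    """Return True if every Resource is the bucket or an object inside it."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    resources = policy_resources(policy_text)
    return bool(resources) and all(
        arn == bucket_arn or arn.startswith(bucket_arn + "/")
        for arn in resources
    )
```

For example, `policy_targets_bucket(copied_text, "my-bucket")` returning `False` suggests that the bucket name entered under Connection Information does not match the bucket you are editing.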

Event Modeling

  1. Events Field - Enter the name of the field that should be used to derive your Analytics event names.
  2. Timestamp - Enter the name of the timestamp field that Analytics should use for querying.
  3. Click Next.
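
To confirm that the fields you enter here exist in your data, you can scan a sample file locally. This is a minimal sketch that assumes newline-delimited JSON as the source format; `missing_fields` and the field names in the example are hypothetical, and other source formats would need a different parser.

```python
import json


def missing_fields(lines, event_field, timestamp_field):
    """Return (line_number, field) pairs for records lacking a required field."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        for field in (event_field, timestamp_field):
            # A missing key, a null, or an empty string all count as absent.
            if record.get(field) in (None, ""):
                problems.append((number, field))
    return problems


if __name__ == "__main__":
    sample = [
        '{"event_name": "Sign Up", "ts": "2024-01-01T00:00:00Z"}',
        '{"ts": "2024-01-01T00:05:00Z"}',
    ]
    print(missing_fields(sample, "event_name", "ts"))
```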

User Modeling

For more information on User Identification (Aliasing), please refer to this article.

  1. If you choose to enable Aliasing:

    • Unauthenticated ID - Input the field used to identify anonymous users.
    • Authenticated ID - Input the field used to identify known users.
  2. If you choose to disable Aliasing, press Disabled:

    • Unauthenticated ID - Enter the field used to identify your users. All users must have a value for this field.
  3. Click Next.
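
The difference between the two ID fields can be illustrated by checking which one identifies each record. The sketch below is an illustration only and does not reproduce how Analytics resolves users internally; `classify_record` and the example field names are hypothetical. Pass `authenticated_field=None` to model the case where Aliasing is disabled.

```python
def classify_record(record, unauthenticated_field, authenticated_field=None):
    """Report which ID field, if any, identifies the user in `record`."""

    def has_value(field):
        return field is not None and record.get(field) not in (None, "")

    if has_value(authenticated_field):
        return "authenticated"
    if has_value(unauthenticated_field):
        return "unauthenticated"
    # With Aliasing disabled, every record is expected to avoid this case.
    return "missing"


if __name__ == "__main__":
    print(classify_record({"anon_id": "a1", "user_id": "u9"}, "anon_id", "user_id"))
    print(classify_record({"anon_id": "a1"}, "anon_id", "user_id"))
    print(classify_record({"user_id": "u9"}, "anon_id"))
```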

Scheduling

  1. Select the Schedule Interval to adjust the frequency at which new data is available in Analytics.
  2. Set the Schedule Time for when the data should be extracted from your S3 bucket. All of the data must be available by this time to avoid loading partial data.
  3. Select Save.
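
One way to judge whether a Schedule Time is safe is to compare it against when your files actually land in the bucket. The following is a minimal sketch under stated assumptions: `late_arrivals` is a hypothetical helper, and the mapping of object keys to last-modified times is something you would collect yourself (for example, from an S3 listing), using timezone-aware datetimes throughout.

```python
from datetime import datetime, timezone


def late_arrivals(objects, scheduled_at):
    """Return keys of objects written at or after the scheduled extraction time."""
    return [key for key, modified in objects.items() if modified >= scheduled_at]


if __name__ == "__main__":
    scheduled = datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc)
    objects = {
        "events/a.json": datetime(2024, 1, 2, 5, 30, tzinfo=timezone.utc),
        "events/b.json": datetime(2024, 1, 2, 6, 15, tzinfo=timezone.utc),
    }
    # Any key listed here would have been missed by an extraction at `scheduled`.
    print(late_arrivals(objects, scheduled))
```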

Waiting for Data

Advanced Settings

For additional advanced settings such as excluding certain events and properties, please refer to this page.

If you have any questions or concerns about this integration, please contact your Customer Support Manager or email support@mparticle.com.
