Guides

Import Data with CSV Files

This topic contains reference information and instructions for advanced techniques when importing data using CSV files.

Configuration Settings

Configuration Name	Data Type	Default	Description
Setting Name	string		If the event type is custom event, and every row in your file has the same event name, specify the name for all events. If also set in column `events.data.event_name`, the data in the file takes precedence.
Custom Event Type	string		If Event Type is Custom Event, and every row in your file is the same Event Type, use this to set the Type for all events. If also set in `events.data.custom_event_type`, the data in the file takes precedence.
Custom Manifest	string		To override the default format, include a custom manifest.
Expect Encrypted Files	Boolean	`false`	Only encrypted OR unencrypted files can be accepted, but not both. You must use PGP encryption with mParticle’s public key. See Drop encrypted files

Use a custom manifest

A custom manifest allows you to use files created by other systems without transformation. In the mParticle UI, when you’re configuring the Custom CSV input, you can provide a JSON manifest to describe how you want to map your CSV data to mParticle’s fields in one of two ways:

With a header row: Map your column names to mParticle’s column names.
Without a header row: Map your columns, which must appear in a specific order, to mParticle’s column names.

In order to guarantee that the new/changed manifest applies to your CSV files, please ensure a gap of about 5 minutes between the manifest change and uploading CSV files.

With a header row

If your CSV has a header row, you can map the column names in your header directly to mParticle fields. The manifest must set "hasHeaderRow": true and contain an array of columns objects, each giving a column name, an action ("keep" or "ignore") and a target mParticle field. The order of entries in the column array doesn’t need to match the order of columns in the CSV file.

Example with a header row

Assume that you drop the following manifest on the SFTP server:

{
  "hasHeaderRow": true, // REQUIRED
  "columns": [
    {
      "column": "Event Name",
      "action": "keep",
      "field": "events.data.event_name"
    },
    {
      "column": "Custom Type",
      "action": "keep",
      "field": "events.data.custom_event_type"
    },
    {
      "column": "Email",
      "action": "keep",
      "field": "user_identities.email"
    },
    {
      "column": "Facebook",
      "action": "ignore",
      "field": "user_identities.facebook"
    },
    {
      "column": "Home City",
      "action": "keep",
      "field": "user_attributes.$city"
    },
    {
      "column": "Category",
      "action": "keep",
      "field": "events.data.custom_attributes.category"
    },
    {
      "column": "Destination",
      "action": "keep",
      "field": "events.data.custom_attributes.destination"
    },
    {
      "column": "Time",
      "action": "keep",
      "field": "events.data.timestamp_unixtime_ms"
    },
    {
      "column": "Environment",
      "action": "keep",
      "field": "environment"
    }
  ]
}

Then assume that you drop a CSV file with the following header row and one row of data:

Event Name,Custom Type,Email,Facebook,Home City,Category,Destination,Time,Environment Viewed Video,other,h.jekyll.md@example.com,h.jekyll.md,London,Destination Intro,Paris,1466456299032,development

The resulting batch will be:

{
    "events" :
    [
        {
            "data" : {
                "event_name": "Viewed Video",
                "custom_event_type": "other",
                "custom_attributes": {
                    "category": "Destination Intro",
                    "destination": "Paris"                   
                },
                "timestamp_unixtime_ms": "1466456299032"
            },
            "event_type" : "custom_event"
        }
    ],
    "user_attributes" : {
        "$city": "London"
    },
    "user_identities" : {
        "email": "h.jekyll.md@example.com"
    },
    "environment" : "development"
}

Without a header row

If your CSV does not have a header row, you can map columns to mParticle fields by the order they appear in the CSV.

The manifest must include "hasHeaderRow": false and contain an array of columns objects, each giving an action ("keep" or "ignore") and a target mParticle field. For this method to work, you must be able to guarantee the same column order in each CSV file you upload.

The following is an example of the same custom manifest as the previous example, but without a header row:

{
    "hasHeaderRow": false, // REQUIRED
    "columns": [{
   		 "action": "keep",
   		 "field": "events.data.event_name"
   	 },
   	 {
   		 "action": "keep",
   		 "field": "events.data.custom_event_type"
   	 },
   	 {
   		 "action": "keep",
   		 "field": "user_identities.email"
   	 },
   	 {
   		 "action": "ignore",
   		 "field": "user_identities.facebook"
   	 },
   	 {
   		 "action": "keep",
   		 "field": "user_attributes.$city"
   	 },
   	 {
   		 "action": "keep",
   		 "field": "events.data.custom_attributes.category"
   	 },
   	 {
   		 "action": "keep",
   		 "field": "events.data.custom_attributes.destination"
   	 },
   	 {
   		 "action": "keep",
   		 "field": "timestamp_unixtime_ms"
   	 },
   	 {
   		 "action": "keep",
   		 "field": "environment"
   	 }
    ]
}

Encrypted files

The Custom CSV Feed can accept files encrypted with PGP, using mParticle’s Public Key. You can use software like GPG Tools for OS X or Windows. Never use web-based tools to encrypt, hash, or encode data.

To enable encryption, set the configuration setting Expect Encrypted Files to true.

It’s most efficient to send multiple files in a zipped, or gzipped format. Encrypt the final ZIP file instead of each individual file.

Use the public encryption key corresponding to your pod:

US1

EU1

AU1

US2

Sensitive data

Use the following public encryption key for sending sensitive data.

Processing speed

For files within recommended size limits, processing speed is consistent with the Events API: about 270 rows per second. To increase the processing speed for your CSV feed, contact mParticle Support.

Processing Behavior

Files aren’t guaranteed to be processed in sequence; files are not linked to one another. Each file is independent and there’s no way to indicate if two files were split from a master file.
You can observe how much data has been processed using Data Master and your outbound connections. There is no notification.
Once dropped, files start processing at any time. Deleting a file from the dropped folder is not a guarantee that it won’t be processed. Overwriting files can lead to partial or incomplete imports, or other errors may occur.
Rows may be batched together for processing. Thus, there may be fewer processed batches than rows. The only fields not considered unique identifiers for batching are event-specific, such as event name, custom event attributes, and the batch-level timestamp. If two rows have the exact same set of attributes and identifiers otherwise, then they may be batched together for processing.
Each file is processed beginning to end. A file is never split or read asynchronously. Header mappings are on a per-configuration basis and are applied to all potential files (if their headers are not valid already). There is no way to associate a mapping with either a filename or filename pattern.
Processed files are deleted within 30 days.

Was this page helpful?