The S3 monitor is a long-standing way to get data from an S3 bucket into DataSet. It relies on SQS: when an object is added to the bucket, S3 posts a notification to an SQS queue, and DataSet reads the queue and syncs the new data.
This is useful for bringing large amounts of arbitrary data into DataSet, such as text files or CSVs.
This tutorial walks through the setup on both the AWS side and the DataSet side. It is meant to be used in conjunction with the full docs (US, Skylight US).
Solution
Pre Reqs
1. DataSet Account with Full Access or Skylight
2. AWS account with access to create an S3 bucket, SQS queue, IAM role, and policy
Step 1: Configure AWS
Step 1a: Set up S3 Bucket
1. Create an S3 bucket.
2. We will configure notifications later, in Step 1d.
Step 1b: Configure SQS Queue
- Create SQS Queue
- In the Amazon SQS console, in the Queues list, choose the queue name.
- On the Access policy tab, choose Edit.
- See documentation here for policy and more detail.
- TROUBLESHOOTING TIP - in some cases you may want to try
"Action": [ "sqs:SendMessage" ],
- Choose Save.
- The SQS queue you created is another resource in your AWS account, and it has a unique Amazon Resource Name (ARN). You will need this ARN in the next step. The ARN will be of the following format:
arn:aws:sqs:aws-region:account-id:queue-name
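To make the ARN format and the queue's access policy more concrete, here is a minimal Python sketch. It only builds the policy document as JSON that could be pasted into the Access policy editor. The helper names (`sqs_arn`, `queue_access_policy`) and the example values are hypothetical, and the statement follows the general pattern AWS documents for letting S3 send messages to a queue. Treat the linked documentation as the authority for the exact policy.

```python
import json


def sqs_arn(region: str, account_id: str, queue_name: str) -> str:
    """Build a queue ARN of the form arn:aws:sqs:aws-region:account-id:queue-name."""
    return f"arn:aws:sqs:{region}:{account_id}:{queue_name}"


def queue_access_policy(queue_arn: str, bucket_name: str, account_id: str) -> dict:
    """Policy that lets the named S3 bucket send event messages to the queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "s3.amazonaws.com"},
                "Action": ["sqs:SendMessage"],
                "Resource": queue_arn,
                # Restrict senders to this bucket in this account (assumed scoping).
                "Condition": {
                    "ArnLike": {"aws:SourceArn": f"arn:aws:s3:::{bucket_name}"},
                    "StringEquals": {"aws:SourceAccount": account_id},
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own region, account ID, and names.
    arn = sqs_arn("us-east-1", "123456789012", "my-dataset-queue")
    print(arn)
    policy = queue_access_policy(arn, "my-dataset-bucket", "123456789012")
    print(json.dumps(policy, indent=2))
```

Keeping the ARN in one helper means the same value can be reused for the IAM policy and the bucket notification in the later steps.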
Step 1c: Configure IAM
- Make a note of your AWS account ID (a 12-digit number). You can find it near the top of the AWS My Account page.
- Log in to the Amazon AWS console. From the Services menu, choose "IAM".
- Go to the Roles list.
- Click "Create Role".
- Under "Select type of trusted entity" select "Another AWS account".
- For "Account ID" enter "913057016266".
- Under options, check "Require external ID" and enter your external ID value (you must be logged in to view it).
- Click "Next: Permissions", then "Create policy". This will open in a new tab.
- Select the following values:
Effect: Allow
AWS Service: Amazon S3
Actions: check GetObject
Amazon Resource Name: arn:aws:s3:::bucket-name/*
Replace bucket-name with the name of the S3 bucket you created in Step 1a.
- Click "Add additional permissions".
- Update the form with the following values:
Effect: Allow
AWS Service: Amazon SQS
Actions: check GetQueueAttributes, DeleteMessage, and ReceiveMessage
Amazon Resource Name: arn:aws:sqs:us-east-1:account-id:queue-name
Replace account-id with your 12-digit AWS account ID, without hyphens. Replace queue-name with the name of the SQS queue you created in Step 1b, and us-east-1 with the queue's region if it differs.
- Note: If the contents of your S3 bucket are encrypted, you will also need to add "KMS" permissions to this policy.
- Click "Review policy", name it, then click "Create policy".
- Return to the create role tab and select your newly created policy and hit "Next".
- Skip past adding tags and give your role a name, then hit "Create role".
- Take note of your Role ARN. This will be used to configure access on the DataSet side.
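The console steps above produce two JSON documents: a trust policy on the role and a permissions policy attached to it. The following Python sketch shows roughly what they would look like, which can help when reviewing the result or scripting the setup. The function names and example values are hypothetical, and `YOUR_EXTERNAL_ID` is a placeholder for the external ID value, which is not reproduced here.

```python
import json

# Account ID entered in the "Another AWS account" step above.
DATASET_ACCOUNT_ID = "913057016266"


def trust_policy(external_id: str) -> dict:
    """Trust policy letting the DataSet account assume the role with an external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{DATASET_ACCOUNT_ID}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


def permissions_policy(bucket_name: str, queue_arn: str) -> dict:
    """Permissions from the steps above: read objects, and read/delete queue messages."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
            {
                "Effect": "Allow",
                "Action": [
                    "sqs:GetQueueAttributes",
                    "sqs:DeleteMessage",
                    "sqs:ReceiveMessage",
                ],
                "Resource": queue_arn,
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own external ID, bucket, and queue ARN.
    print(json.dumps(trust_policy("YOUR_EXTERNAL_ID"), indent=2))
    queue_arn = "arn:aws:sqs:us-east-1:123456789012:my-dataset-queue"
    print(json.dumps(permissions_policy("my-dataset-bucket", queue_arn), indent=2))
```

This sketch does not include the KMS permissions mentioned in the note above; an encrypted bucket would need an additional statement.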
Step 1d: Set up S3 Bucket notifications
- Go back to the S3 bucket you just created.
- Navigate to Properties, then Event Notifications, then click Create Event Notification.
- Give it a name and select All object create events.
- Under Destination, select SQS, then select the queue from the dropdown or enter its ARN.
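For reference, the notification configured in the console corresponds to a small configuration document. The Python sketch below builds it in the shape the S3 bucket notification API expects (`QueueConfigurations` with `s3:ObjectCreated:*` events). The function name and the example notification name are hypothetical, and applying the result through an SDK or the CLI is left as an assumption about your tooling.

```python
import json


def notification_configuration(name: str, queue_arn: str) -> dict:
    """Send every object-created event on the bucket to the given SQS queue."""
    return {
        "QueueConfigurations": [
            {
                "Id": name,
                "QueueArn": queue_arn,
                # Equivalent to "All object create events" in the console.
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own notification name and queue ARN.
    config = notification_configuration(
        "dataset-s3-monitor",
        "arn:aws:sqs:us-east-1:123456789012:my-dataset-queue",
    )
    print(json.dumps(config, indent=2))
```

If saving the notification fails with a destination validation error, the queue's access policy from Step 1b is the first thing to check.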
Step 2: Configure DataSet
- Log in to DataSet.
- Select your name in the top right corner, navigate to Monitors, then click Edit Json.
- Enter this information as a new object. See the full docs for more info.
{
  type: "s3Bucket",
  region: "us-east-1",
  roleToAssume: "arn:aws:iam::account_id:role/role",
  queueUrl: "https://sqs.us-east-1.amazonaws.com/account_id/sqs_queue_id",
  fileFormat: "text",
  hostname: "s3",
  parser: "s3-parser"
}
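Because the monitor entry repeats the region and account ID in several places, it can be easier to generate it than to edit it by hand. The Python sketch below fills in the fields shown above from a few inputs. The function names and example values are hypothetical, it assumes the last segment of the queue URL is the queue name, and it emits standard quoted JSON on the assumption that the monitor editor accepts it.

```python
import json


def queue_url(region: str, account_id: str, queue_name: str) -> str:
    """Queue URL in the form used in the monitor entry above."""
    return f"https://sqs.{region}.amazonaws.com/{account_id}/{queue_name}"


def s3_monitor_entry(region: str, account_id: str, role_name: str, queue_name: str) -> dict:
    """Build the s3Bucket monitor object with the same fields as the example above."""
    return {
        "type": "s3Bucket",
        "region": region,
        "roleToAssume": f"arn:aws:iam::{account_id}:role/{role_name}",
        "queueUrl": queue_url(region, account_id, queue_name),
        "fileFormat": "text",
        "hostname": "s3",
        "parser": "s3-parser",
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own account ID, role, and queue name.
    entry = s3_monitor_entry("us-east-1", "123456789012", "dataset-s3-role", "my-dataset-queue")
    print(json.dumps(entry, indent=2))
```

The region in the entry should match the region in the queue's ARN from Step 1b.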