Introduction
This tutorial walks through setting up the DataSet Pub/Sub monitor to ingest GCP logs. The setup is designed to be simple.
To improve out-of-the-box support for different systems in DataSet Monitors, you can now set up the Pub/Sub DataSet Monitor to pull in messages and retain them in DataSet.
Pub/Sub is used for streaming analytics and data integration pipelines to ingest and distribute data. It is equally effective as messaging-oriented middleware for service integration or as a queue to parallelize tasks.
This integration supports many use cases, including:
- Ingesting user interaction and server events. If you would like to pull your GCP Stackdriver (Google Cloud operations suite) logs or events into DataSet to get a high-level view of your cloud, you can do so.
- Real-time event distribution. You can send one stream of data to feed multiple systems, DataSet included.
- Replicating data among databases. Pub/Sub is commonly used to distribute change events from databases. You can build visibility into your databases using Pub/Sub and DataSet.
Prerequisites:
- DataSet Account
- Full Access DataSet Permission
- GCP Account
- GCP permission to add Pub/Sub Subscriber permissions to a topic
Solution Architecture
Here is the high-level architecture of the solution discussed in this tutorial.
Instructions
- Sign in to your Google Admin console.
- In the Admin console, go to Menu → Account → Account settings → Legal and compliance.
- Click Sharing options.
- To share data, click Enabled.
- To turn off sharing, click Disabled. No new data is shared with Google Cloud services. Existing shared data is deleted according to the Google Cloud admin activity audit log retention period.
- Click Save.
- Create a Pub/Sub topic and subscription.
- Log in to GCP.
- Go to Pub/Sub > Topics.
- Create a new topic with the default subscription. Keep the subscription delivery type at “Pull”.
- Go to your Pub/Sub topic and select permissions. Add scalyr@scalyr-gcp-integrations.iam.gserviceaccount.com with the role of “Pub/Sub Subscriber”.
- Create a subscription.
- Navigate to your Pub/Sub topic subscription and select permissions.
- Add scalyr@scalyr-gcp-integrations.iam.gserviceaccount.com and give it the role of “Pub/Sub Subscriber”.
- Note the "Subscription name" and "Topic name". The names are paths, with the syntax:
  Subscription name: "<your project id>/subscriptions/<your subscription id>"
  Topic name: "<your project id>/topics/<your topic id>"
- Set the subscription parameters:
  Parameter | Value
  Delivery type | Pull
  Expiration period | Never expire
  Message retention | 24 hours (1 day) should be safe
  Acknowledgment deadline | 300 seconds
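If you script your GCP setup rather than using the console, the subscription parameters above can probably be expressed as a `gcloud` command. The sketch below only builds the argument list and prints it; it does not run anything. The function name `subscription_create_args` and the IDs `my-sub` and `my-topic` are illustrative assumptions, and the flags should be checked against your installed `gcloud` version before use.

```python
import shlex


def subscription_create_args(subscription_id: str, topic_id: str) -> list[str]:
    """Build a gcloud argv that mirrors the subscription parameters listed above.

    No push endpoint is passed, so the subscription stays on pull delivery.
    """
    return [
        "gcloud", "pubsub", "subscriptions", "create", subscription_id,
        f"--topic={topic_id}",
        "--ack-deadline=300",               # acknowledgment deadline: 300 seconds
        "--message-retention-duration=1d",  # message retention: 24 hours
        "--expiration-period=never",        # expiration period: never expire
    ]


# Hypothetical IDs, for illustration only.
args = subscription_create_args("my-sub", "my-topic")
print(shlex.join(args))
```

Printing the command before running it makes it easy to review the values against the parameters above.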
8. Save your project ID, subscription ID, and topic ID.
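The IDs to save are the individual segments of the "Subscription name" and "Topic name" paths. As a minimal sketch, the helper below splits such a path into its project ID and resource ID. The function name `split_resource_name` is hypothetical, and accepting an optional leading `projects/` segment is an assumption about how the name may appear when copied from the GCP console.

```python
def split_resource_name(name: str, kind: str) -> tuple[str, str]:
    """Return (project_id, resource_id) from a Pub/Sub name path.

    kind is "topics" or "subscriptions". A leading "projects/" segment is
    accepted but not required (assumption: the console may display it).
    """
    parts = name.strip().strip("/").split("/")
    if parts and parts[0] == "projects":
        parts = parts[1:]
    if len(parts) != 3 or parts[1] != kind or not parts[0] or not parts[2]:
        raise ValueError(f"expected '<project id>/{kind}/<id>', got {name!r}")
    return parts[0], parts[2]


# Hypothetical names, for illustration only.
project_id, subscription_id = split_resource_name(
    "projects/my-project/subscriptions/my-sub", "subscriptions"
)
_, topic_id = split_resource_name("my-project/topics/my-topic", "topics")
print(project_id, subscription_id, topic_id)
```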
9. From the DataSet console, edit the /scalyr/monitors configuration file (US|EU) and add a new monitor:
{
  type: "pubsub",
  projectId: "<your project id>",
  subscriptionId: "<your subscription id>",
  topicId: "<your topic id>",
  hostname: "pubsub",
  parser: "<scalyr parser name>",
  logFile: "<topicId>",
  executionIntervalMinutes: 1.0, // optional: poll every 1 minute
  timeoutSeconds: 60.0, // should be greater than 15 seconds
}
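If you manage several topics, it may help to generate the monitor entry rather than type it by hand. The sketch below builds the same fields as the entry above and prints them as standard JSON, on the assumption that the monitors configuration file accepts plain JSON as well as the relaxed syntax shown. The function name `pubsub_monitor` and the example values are hypothetical.

```python
import json


def pubsub_monitor(project_id: str, subscription_id: str, topic_id: str,
                   parser: str, interval_minutes: float = 1.0,
                   timeout_seconds: float = 60.0) -> dict:
    """Build a Pub/Sub monitor entry with the fields shown in the example above."""
    if timeout_seconds <= 15:
        # The example entry notes that the timeout should exceed 15 seconds.
        raise ValueError("timeoutSeconds should be greater than 15 seconds")
    return {
        "type": "pubsub",
        "projectId": project_id,
        "subscriptionId": subscription_id,
        "topicId": topic_id,
        "hostname": "pubsub",
        "parser": parser,
        "logFile": topic_id,  # the example uses the topic ID as the log file name
        "executionIntervalMinutes": interval_minutes,
        "timeoutSeconds": timeout_seconds,
    }


# Hypothetical values, for illustration only.
entry = pubsub_monitor("my-project", "my-sub", "my-topic", "my-parser")
print(json.dumps(entry, indent=2))
```

The printed object can then be pasted into the monitors configuration file.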
10. Create an aggregated sink to pull from. This needs to be done at the organization level rather than the project level.
11. Select the organization level from the drop-down next to the menu.
- Go to IAM and confirm that the user is a Logging Admin at the organization level.
- Go to Log Router → Create Sink.
- Add a name, then select Next.
- Select Cloud Pub/Sub topic → Use Cloud Pub/Sub topic in project.
- For Sink destination, add pubsub.googleapis.com/{Topic Name}, then select Next. If you do not see the option, you are not at the organization level.
- Select “Include logs ingested by the organization and all child resources”.
- Select Next and Save. You should see logs in DataSet within a few minutes.
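The sink steps above can also be sketched as a `gcloud` command for readers who prefer the command line. The code below builds the sink destination string in the form given in the steps and assembles an argument list; it does not execute anything. The function names, the sink name `dataset-sink`, and the organization ID `123456789012` are hypothetical, and the topic name is assumed to be in the `projects/<project id>/topics/<topic id>` form. Verify the flags against your `gcloud` version before running the command.

```python
import shlex


def sink_destination(topic_name: str) -> str:
    """Prefix the topic name path with the Pub/Sub service host."""
    return f"pubsub.googleapis.com/{topic_name.strip('/')}"


def sink_create_args(sink_name: str, topic_name: str, organization_id: str) -> list[str]:
    """Build a gcloud argv for an organization-level aggregated sink."""
    return [
        "gcloud", "logging", "sinks", "create", sink_name,
        sink_destination(topic_name),
        f"--organization={organization_id}",
        "--include-children",  # include logs from all child resources
    ]


# Hypothetical values, for illustration only.
args = sink_create_args(
    "dataset-sink", "projects/my-project/topics/my-topic", "123456789012"
)
print(shlex.join(args))
```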