This tutorial walks through setting up the DataSet Pub/Sub monitor to ingest GCP logs. We have kept the setup short: a topic, a pull subscription, a monitor entry, and a log sink.
To improve our out-of-the-box support for different systems with DataSet Monitors, users can now set up the Pub/Sub DataSet Monitor to pull in messages and retain them in DataSet.
Pub/Sub is used for streaming analytics and data integration pipelines to ingest and distribute data. It is equally effective as messaging-oriented middleware for service integration or as a queue to parallelize tasks.
This integration supports many use cases, including:
- Ingesting user interaction and server events. If you would like to pull your GCP Stackdriver (Google Cloud operations suite) logs or events into DataSet to get a high-level view of your cloud, you can do so.
- Real-time event distribution. You can send one stream of data to feed multiple systems, DataSet included.
- Replicating data among databases. Pub/Sub is commonly used to distribute change events from databases. You can build visibility into your databases using Pub/Sub and DataSet.
Prerequisites
- DataSet account
- Full Access DataSet permission
- GCP account
- GCP permission to grant the Pub/Sub Subscriber role on a topic
Solution Architecture
Here is the high-level architecture of the solution we will be discussing.
Sign in to your Google Admin console.
In the Admin console, go to Menu → Account → Account settings → Legal and compliance.
Click Sharing options.
To share data, click Enabled.
To turn off sharing, click Disabled. No new data is shared with Google Cloud services. Existing shared data is deleted according to the Google Cloud admin activity audit log retention period.
Click Save.
Create a Pub/Sub Topic and Subscription
Log in to GCP.
Go to Pub/Sub > Topics.
Create a new topic with the default subscription. Keep the subscription delivery type at “Pull”.
Go to your Pub/Sub topic and select Permissions. Add the required principal with the role “Pub/Sub Subscriber”.
- Create a subscription.
- Navigate to your Pub/Sub topic subscription and select Permissions.
- Add the same principal and give it the role “Pub/Sub Subscriber”.
Note the "Subscription name" and "Topic name". The names are full resource paths, with the syntax:
Subscription name: "projects/<your project id>/subscriptions/<your subscription id>"
Topic name: "projects/<your project id>/topics/<your topic id>"
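The monitor configuration later in this tutorial asks for the bare project, subscription, and topic ids rather than the full paths. The sketch below shows one way to build and split those paths. It assumes the full Pub/Sub resource name format, which starts with a `projects/` segment; the function names are hypothetical and not part of any DataSet or Google tooling.

```python
from typing import Tuple


def subscription_path(project_id: str, subscription_id: str) -> str:
    """Build the full Pub/Sub subscription name for a project."""
    return f"projects/{project_id}/subscriptions/{subscription_id}"


def topic_path(project_id: str, topic_id: str) -> str:
    """Build the full Pub/Sub topic name for a project."""
    return f"projects/{project_id}/topics/{topic_id}"


def split_resource_name(name: str) -> Tuple[str, str]:
    """Split 'projects/<project>/<kind>/<id>' into (project id, resource id).

    <kind> must be 'subscriptions' or 'topics'.
    """
    parts = name.split("/")
    valid = (
        len(parts) == 4
        and parts[0] == "projects"
        and parts[2] in ("subscriptions", "topics")
        and all(parts)
    )
    if not valid:
        raise ValueError(f"not a Pub/Sub topic or subscription name: {name!r}")
    return parts[1], parts[3]
```

For example, `split_resource_name("projects/my-project/subscriptions/my-sub")` yields the project id and subscription id to save for the monitor entry.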
Set Subscription Parameters
| Parameter | Value |
| Delivery type | Pull |
| Expiration period | Never expire |
| Message retention | 24 hours (1 day) should be safe |
| Acknowledgment deadline | 300 seconds |
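If you prefer to script the subscription instead of using the console, the same parameters map onto the fields of the Pub/Sub REST API `Subscription` resource. The sketch below only builds the request body and does not call any API. It assumes that omitting `pushConfig` gives a pull subscription and that an `expirationPolicy` without a `ttl` means the subscription never expires; verify both against the current Pub/Sub documentation. The function name and defaults are hypothetical.

```python
import json


def subscription_body(project_id: str, topic_id: str,
                      ack_deadline_seconds: int = 300,
                      retention_hours: int = 24) -> dict:
    """Build a Subscription body matching the parameters in this tutorial."""
    # Pub/Sub accepts acknowledgment deadlines between 10 and 600 seconds.
    if not 10 <= ack_deadline_seconds <= 600:
        raise ValueError("ack deadline must be between 10 and 600 seconds")
    return {
        "topic": f"projects/{project_id}/topics/{topic_id}",
        "ackDeadlineSeconds": ack_deadline_seconds,
        # Durations are expressed in seconds with an "s" suffix.
        "messageRetentionDuration": f"{retention_hours * 3600}s",
        # Assumption: an expiration policy with no ttl means "never expire".
        "expirationPolicy": {},
        # No "pushConfig" key: the subscription uses pull delivery.
    }


if __name__ == "__main__":
    print(json.dumps(subscription_body("my-project", "my-topic"), indent=2))
```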
8. Save your project id, subscription id, and topic id.
9. From the DataSet console, edit /scalyr/monitors (US|EU) and add a new monitor:
{
  type: "pubsub",
  projectId: "<your project id>",
  subscriptionId: "<your subscription id>",
  topicId: "<your topic id>",
  hostname: "pubsub",
  parser: "<scalyr parser name>",
  logFile: "<topicId>",
  executionIntervalMinutes: 1.0, // optionally change polling to 1 minute
  timeoutSeconds: 60.0, // should be > 15 seconds
}
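When you manage several topics, it can help to generate the monitor entry rather than type it by hand. The following is a minimal sketch that builds the entry shown above as a Python dictionary and prints it as JSON. It assumes that `logFile` is meant to hold the topic id and that the configuration editor accepts fully quoted JSON keys; the function name `pubsub_monitor` is hypothetical.

```python
import json


def pubsub_monitor(project_id: str, subscription_id: str, topic_id: str,
                   parser: str, interval_minutes: float = 1.0,
                   timeout_seconds: float = 60.0) -> dict:
    """Build one Pub/Sub monitor entry with the fields used in this tutorial."""
    # The tutorial notes that timeoutSeconds should be greater than 15.
    if timeout_seconds <= 15:
        raise ValueError("timeoutSeconds should be greater than 15")
    return {
        "type": "pubsub",
        "projectId": project_id,
        "subscriptionId": subscription_id,
        "topicId": topic_id,
        "hostname": "pubsub",
        "parser": parser,
        "logFile": topic_id,  # assumption: logFile mirrors the topic id
        "executionIntervalMinutes": interval_minutes,
        "timeoutSeconds": timeout_seconds,
    }


if __name__ == "__main__":
    # Placeholder values; replace them with the ids saved in step 8.
    entry = pubsub_monitor("my-project", "my-sub", "my-topic", "my-parser")
    print(json.dumps(entry, indent=2))
```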
10. Create an aggregated sink to pull from. This must be done at the organization level rather than the project level.
11. Select the organization from the drop-down next to the menu.
- Go to IAM and confirm that the user is a Logging Admin at the organization level.
- Go to Log Router → Create Sink.
- Add a name, then select Next.
- Select Cloud Pub/Sub topic → Use Cloud Pub/Sub topic in project.
- For Sink destination, add {Topic Name}, then select Next. If you do not see the option, you are not at the organization level.
- Select “Include logs ingested by this organization and all child resources”.
- Select Next and Save. You should see logs in DataSet within a few minutes.
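The console steps above correspond to a `LogSink` resource in the Cloud Logging API. As a rough illustration, the sketch below builds the body such a sink would carry: a Pub/Sub destination and `includeChildren` set so that logs from child resources are included. It does not call the API, and the function name, sink name, and filter are hypothetical.

```python
def aggregated_sink_body(sink_name: str, project_id: str, topic_id: str,
                         log_filter: str = "") -> dict:
    """Build a LogSink body that routes organization logs to a Pub/Sub topic."""
    body = {
        "name": sink_name,
        # Pub/Sub sink destinations use this prefixed topic path.
        "destination": (
            f"pubsub.googleapis.com/projects/{project_id}/topics/{topic_id}"
        ),
        # Include logs from the organization's child folders and projects.
        "includeChildren": True,
    }
    if log_filter:
        # Optional: restrict which log entries are routed.
        body["filter"] = log_filter
    return body
```

Leaving `log_filter` empty routes every log entry the sink sees, which can be a large volume at the organization level; a filter such as `severity>=WARNING` narrows it.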