DataSet lets you modify or discard logs at several stages of the ingestion pipeline, before they are stored in DataSet and become available for searching.
Within the Scalyr Agent:
- Sampling: match lines via a pattern and specify a sampling rate between 0 and 1, where 0 means no matching lines are retained, 0.1 means 10% are retained, and 1 means all are retained. In the Linux agent, you can also specify a lineGrouper to group multi-line messages such as stack traces together before sampling. Read more: Linux agent | K8s agent
- Redaction: match lines via a pattern and rewrite a portion of the line, either with static text or a hash. Useful for redacting sensitive information before it leaves your servers. Read more: Help center | Linux agent | K8s agent
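
To make the two agent-side behaviors concrete, here is a minimal Python sketch of pattern-based sampling and redaction. It is an illustration of the semantics described above, not the Scalyr Agent's implementation or configuration syntax: the function names `sample_line` and `redact_line`, the `salt` parameter, and the use of a truncated SHA-256 digest are all assumptions made for this example.

```python
import hashlib
import random
import re


def sample_line(line, pattern, rate, rng=random.random):
    """Return True if the line should be kept.

    Lines that do not match the pattern are always kept. Matching lines
    are kept with probability `rate` (0 keeps none, 1 keeps all).
    """
    if re.search(pattern, line) is None:
        return True  # the sampling rule does not apply to this line
    # rng() returns a float in [0.0, 1.0), so rate=0 never keeps a match
    # and rate=1 always keeps it.
    return rng() < rate


def redact_line(line, pattern, replacement=None, salt=""):
    """Rewrite the matched portion of a line.

    If `replacement` is given, it is used as static text. Otherwise the
    matched text is replaced with a short salted hash (hypothetical
    scheme), so equal values stay correlatable without being readable.
    """
    def _substitute(match):
        if replacement is not None:
            return replacement
        digest = hashlib.sha256((salt + match.group(0)).encode("utf-8"))
        return digest.hexdigest()[:12]

    return re.sub(pattern, _substitute, line)


if __name__ == "__main__":
    line = "DEBUG login user=bob password=hunter2"
    if sample_line(line, r"DEBUG", 0.1):
        print(redact_line(line, r"password=\S+", "password=REDACTED"))
```

Hashing instead of static text is worth considering when you still need to tell whether two log lines refer to the same value, for example the same user ID, without exposing the value itself.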
Within the DataSet UI:
- Log Processing - Data Scrubbing: gives you a convenient UI for rewriting log lines, which is especially useful when data comes from a source other than the Scalyr Agent. Specify a filter to match certain log lines, then match specific text within those lines and rewrite it. Read more.
- Parsing: provides a wide variety of tools for transforming logs before they are stored, including the ability to group lines together (e.g. stack traces), discard lines, parse fields from log lines, associate data between different lines, and rewrite fields. Read more: Product docs | Help center articles
- Cost Management: allows you to define log categories and a planned log volume for each category, and notifies you if a category exceeds its planned volume. You can also define discard rules that drop logs you usually do not need, which saves on cost, with the option to re-enable those logs in the DataSet UI. For example, discard production debug logs normally, but turn off the filter to capture additional detail when troubleshooting. Read more.
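
The discard-rule behavior described under Cost Management can be pictured with a small Python sketch. This is a conceptual model only: `DiscardRule`, `ingest`, and `over_plan` are hypothetical names invented for this example and do not correspond to a DataSet API.

```python
import re
from dataclasses import dataclass


@dataclass
class DiscardRule:
    """A toggleable rule that drops log lines matching a pattern."""
    name: str
    pattern: str
    enabled: bool = True

    def discards(self, line: str) -> bool:
        # A disabled rule never discards anything.
        return self.enabled and re.search(self.pattern, line) is not None


def ingest(lines, rules):
    """Keep only the lines that no enabled rule discards."""
    return [ln for ln in lines if not any(r.discards(ln) for r in rules)]


def over_plan(ingested_bytes: int, planned_bytes: int) -> bool:
    """Return True when a category has exceeded its planned volume."""
    return ingested_bytes > planned_bytes


if __name__ == "__main__":
    logs = ["DEBUG cache miss", "ERROR timeout", "DEBUG retrying"]
    rule = DiscardRule(name="prod-debug", pattern=r"^DEBUG")

    print(ingest(logs, [rule]))   # debug lines dropped during normal operation

    rule.enabled = False          # turn the filter off while troubleshooting
    print(ingest(logs, [rule]))   # all lines retained
```

Keeping the rule defined but disabled, rather than deleting it, mirrors the workflow above: you can capture debug detail during an incident and switch the filter back on afterwards.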