DataSet offers several methods for redacting events. Depending on the use case and ingestion method, redaction can be configured through parsers, scrubbing rules, or cost management. However, DataSet has no feature for keeping only selected events from a specific application log.
DataSet is a platform for collecting all types of data and enabling users to act on that data. While filtering sensitive information is a high-demand feature, we are rarely asked to discard everything except a small portion of an application's messages.
This use case has only come up for cost-saving purposes. Even though DataSet doesn't have a feature built specifically for it, the agent's sampling_rules can be configured to ingest only events that match certain patterns.
For instance, if you want to discard every log level except INFO, ERROR, and WARN, adding the agent config snippet below to the application's log section gets the job done.
Sample logs
2022-Jan-18 13:37:18,503 DEBUG TaskManager - Awaiting stop of: com.zendesk.maxwell.monitoring.MaxwellHTTPServerWorker@1c783dfa
2022-Jan-18 13:37:18,503 INFO TaskManager - Stopped all tasks
2022-Jan-18 13:37:18,505 ERROR MaxwellContext - Shutdown failed
Exception in thread "main" java.lang.NullPointerException
at com.example.myproject.Book.getTitle(Book.java:16)
at com.example.myproject.Author.getBookTitles(Author.java:25)
at com.example.myproject.Bootstrap.main(Bootstrap.java:14)
2022-Jan-18 13:37:18,503 INFO TaskManager - Restart
Agent config
lineGroupers: [
  {
    start: "^\\d+-\\w+-\\d+ \\d+:\\d+:\\d+,\\d+ ",
    haltBefore: "^\\d+-\\w+-\\d+ \\d+:\\d+:\\d+,\\d+ "
  }
],
sampling_rules: [
  { match_expression: "^\\d+-\\w+-\\d+ \\d+:\\d+:\\d+,\\d+ INFO", sampling_rate: 1 },
  { match_expression: "^\\d+-\\w+-\\d+ \\d+:\\d+:\\d+,\\d+ ERROR", sampling_rate: 1 },
  { match_expression: "^\\d+-\\w+-\\d+ \\d+:\\d+:\\d+,\\d+ WARN", sampling_rate: 1 },
  { match_expression: ".*", sampling_rate: 0 }
]
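To see what the lineGroupers entry contributes, the sketch below imitates the grouping in plain Python. It assumes that a line matching start opens a new event and that following lines are appended until the next line matching haltBefore. The function name group_lines is hypothetical and this is not the agent's code; it only illustrates why a stack trace would travel with its ERROR line, so that the ERROR rule keeps the whole event.

```python
import re

# Same pattern as the start/haltBefore values in the agent config.
TIMESTAMP = re.compile(r"^\d+-\w+-\d+ \d+:\d+:\d+,\d+ ")

def group_lines(lines):
    """Append lines without a leading timestamp to the previous event (illustrative only)."""
    events = []
    for line in lines:
        if TIMESTAMP.match(line) or not events:
            events.append(line)          # a timestamped line starts a new event
        else:
            events[-1] += "\n" + line    # continuation line, e.g. a stack frame
    return events

raw_lines = [
    "2022-Jan-18 13:37:18,503 INFO TaskManager - Stopped all tasks",
    "2022-Jan-18 13:37:18,505 ERROR MaxwellContext - Shutdown failed",
    'Exception in thread "main" java.lang.NullPointerException',
    "at com.example.myproject.Book.getTitle(Book.java:16)",
    "2022-Jan-18 13:37:18,503 INFO TaskManager - Restart",
]

events = group_lines(raw_lines)
for event in events:
    print(repr(event))
```

Under these assumptions the five raw lines collapse into three events, and the exception lines are part of the ERROR event rather than standalone lines that the ".*" rule would drop.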
Note that the agent evaluates the sampling rules in order and applies the first one that matches, so placing the catch-all ".*" rule with a sampling_rate of 0 at the end is the trick for this solution.
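The effect of rule order can be checked with a small simulation. This is a minimal sketch, assuming that the first rule whose match_expression is found in the event decides its sampling rate and that events matching no rule are kept. The helper keep_event is hypothetical and not part of the agent.

```python
import random
import re

# (match_expression, sampling_rate) pairs in the same order as the agent config.
RULES = [
    (r"^\d+-\w+-\d+ \d+:\d+:\d+,\d+ INFO", 1),
    (r"^\d+-\w+-\d+ \d+:\d+:\d+,\d+ ERROR", 1),
    (r"^\d+-\w+-\d+ \d+:\d+:\d+,\d+ WARN", 1),
    (r".*", 0),
]

def keep_event(event, rules=RULES):
    """Return True if the event would be ingested (assumed first-match semantics)."""
    for pattern, rate in rules:
        if re.search(pattern, event):
            # random.random() is in [0, 1): rate 1 always keeps, rate 0 always drops.
            return random.random() < rate
    return True  # assumption: an event that matches no rule is kept

info = "2022-Jan-18 13:37:18,503 INFO TaskManager - Stopped all tasks"
debug = "2022-Jan-18 13:37:18,503 DEBUG TaskManager - Awaiting stop of: worker"
error = ("2022-Jan-18 13:37:18,505 ERROR MaxwellContext - Shutdown failed\n"
         'Exception in thread "main" java.lang.NullPointerException')

for event in (info, debug, error):
    print(keep_event(event), event.splitlines()[0])

# Moving the catch-all rule to the front makes it win for every event.
catch_all_first = [RULES[-1]] + RULES[:-1]
print(keep_event(info, catch_all_first))
```

With the catch-all rule last, the INFO and ERROR events are kept and the DEBUG event is dropped. With it first, even the INFO event is dropped, which is why the order matters.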
Please refer to the article "configuring sampling rule" for additional information.