Ingestion testing with the Agent

Introduction

When adding or updating a regular expression in the logs section of the Agent configuration, we recommend testing it before modifying your production configuration. The Agent uses Python's "re" regular expression library and can apply lineGroupers, scrubbing (redaction rules), and discards (sampling rules) before logs are uploaded to DataSet. Because these functions run before upload, they may also reduce network traffic and egress charges.

Once logs have been uploaded to DataSet, UI-based discard rules (User Menu -> "Billing & Usage" -> Discard Filters section), scrubbing rules (User Menu -> "Manage Logs" -> "Log Processing"), or parsers may be applied. The Agent can be used to simulate log ingestion to verify that these rules are working as expected.

I run the latest Agent in a VM and upload logs to a test account, but any console environment (e.g., EC2 or a Linux box) where you have access to the Agent, a text editor, and the necessary user permissions should suffice.

Test Sampling Rules

A customer asked whether the following sampling rules would work as expected. Because the sampling_rate of every rule is 0, any log event that matches one of these rules is discarded:

log.config.scalyr.com/sampling_rules.0.match_expression: "(PROTOCOL\\(V2\\)|desired protocol magic|TCP: new client|IDENTIFY:)"
log.config.scalyr.com/sampling_rules.0.sampling_rate: "0"
log.config.scalyr.com/sampling_rules.1.match_expression: "state rdy"
log.config.scalyr.com/sampling_rules.1.sampling_rate: "0"
log.config.scalyr.com/sampling_rules.2.match_expression: "200 GET /(ping|metrics|stats)"
log.config.scalyr.com/sampling_rules.2.sampling_rate: "0"
log.config.scalyr.com/sampling_rules.3.match_expression: "sending heartbeat"
log.config.scalyr.com/sampling_rules.3.sampling_rate: "0"
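The effect of these rules can be approximated locally before touching the Agent. The following is a minimal sketch, not the Agent's actual implementation: it assumes the first rule whose match_expression is found anywhere in the line decides the outcome, and that sampling_rate is the fraction of matching lines that are kept. The keep_line helper and the test lines are hypothetical.

```python
import random
import re

# The four match expressions from the annotations above, written as Python
# raw strings (the doubled backslashes in the config become single ones).
SAMPLING_RULES = [
    (re.compile(r"(PROTOCOL\(V2\)|desired protocol magic|TCP: new client|IDENTIFY:)"), 0.0),
    (re.compile(r"state rdy"), 0.0),
    (re.compile(r"200 GET /(ping|metrics|stats)"), 0.0),
    (re.compile(r"sending heartbeat"), 0.0),
]


def keep_line(line, rules=SAMPLING_RULES, rng=random.random):
    """Return True if the line would be uploaded, False if discarded.

    Assumption: the first rule that matches anywhere in the line applies,
    and the line is kept with probability equal to its sampling rate.
    """
    for pattern, rate in rules:
        if pattern.search(line):
            # rng() is in [0, 1), so a rate of 0 never keeps the line.
            return rng() < rate
    return True  # lines that match no rule are always kept


if __name__ == "__main__":
    for line in ["HELLO IDENTIFY: yourself", "[time] 200 GET /ping", "200 GET /PING", "--test--"]:
        print("keep   " if keep_line(line) else "discard", line)
```

With a sampling_rate of 0 the result is deterministic, which is why a discard-everything rule is easy to reason about but hard to observe: the matching events simply never arrive.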

Agent Configuration

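To test these rules, I set up redaction_rules instead of sampling_rules. A discarded event leaves nothing to inspect in DataSet, whereas a redacted event is still uploaded, so it is easy to see which regular expressions matched. The configuration below covers the expressions from sampling rules 0 and 2. Because each match_expression is wrapped in .* on both sides, a matching log event is replaced in its entirety by blah1 or blah2.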

...
  {
    path: "/var/log/test.log",
    attributes: { parser: "test" },
    redaction_rules: [
      {
        match_expression: ".*(PROTOCOL\\(V2\\)|desired protocol magic|TCP: new client|IDENTIFY:).*",
        replacement: "blah1"
      },
      {
        match_expression: ".*200 GET \/(ping|metrics|stats).*",
        replacement: "blah2"
      }
    ]
  }
...
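The backslashes in these match expressions are easy to get wrong, so it can help to check what regular expression the "re" library ends up with. The sketch below assumes that escapes inside quoted strings in the Agent configuration decode the same way as JSON string escapes; Python's json module is used here only as a stand-in for the Agent's own configuration parser.

```python
import json
import re

# The quoted strings exactly as they appear in the configuration above.
CONFIG_STRINGS = [
    r'".*(PROTOCOL\\(V2\\)|desired protocol magic|TCP: new client|IDENTIFY:).*"',
    r'".*200 GET \/(ping|metrics|stats).*"',
]

# Assumption: config string escapes follow JSON rules, so "\\" decodes to a
# single backslash and "\/" decodes to a plain forward slash.
patterns = [json.loads(text) for text in CONFIG_STRINGS]

for pattern in patterns:
    print(pattern)

# The decoded expression escapes the parentheses, so they match literally,
# and the surrounding .* makes the replacement cover the whole line.
redacted = re.sub(patterns[0], "blah1", "this is a log PROTOCOL(V2) whee")
print(redacted)
```

Under that assumption, the first expression reaches the regex engine as PROTOCOL\(V2\) (literal parentheses rather than a capture group), and the \/ in the second expression is simply a forward slash.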

Create Sample Log Events

No sample log events were available, so I created my own. I assembled the log events in a file (/tmp/blah.txt in this example):

PROTOCOL(V2) this is a log
this is a log PROTOCOL(V2) whee
protocol(v2) this is a log
[log stuff] desired protocol magic blah
[more log stuff] TCP: new client test test
HELLO IDENTIFY: yourself
--test--
[time] 200 GET /ping
[time] 200 GET /metrics
[time] [protocol] 200 GET /stats
200 GET /PING
--test--
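Before uploading, it can be useful to predict what each of these events should look like after redaction, so there is something to compare the DataSet results against. This is a rough local approximation only: it assumes the Agent applies each redaction rule in order with the equivalent of re.sub, and the redact helper is hypothetical.

```python
import re

# The two redaction rules from the Agent configuration, as Python raw strings.
REDACTION_RULES = [
    (re.compile(r".*(PROTOCOL\(V2\)|desired protocol magic|TCP: new client|IDENTIFY:).*"), "blah1"),
    (re.compile(r".*200 GET /(ping|metrics|stats).*"), "blah2"),
]

# The sample log events from /tmp/blah.txt.
SAMPLE_EVENTS = """\
PROTOCOL(V2) this is a log
this is a log PROTOCOL(V2) whee
protocol(v2) this is a log
[log stuff] desired protocol magic blah
[more log stuff] TCP: new client test test
HELLO IDENTIFY: yourself
--test--
[time] 200 GET /ping
[time] 200 GET /metrics
[time] [protocol] 200 GET /stats
200 GET /PING
--test--
""".splitlines()


def redact(line):
    """Apply every redaction rule in order (assumed Agent behavior)."""
    for pattern, replacement in REDACTION_RULES:
        line = pattern.sub(replacement, line)
    return line


results = [redact(line) for line in SAMPLE_EVENTS]
for original, result in zip(SAMPLE_EVENTS, results):
    print(f"{original!r} -> {result!r}")
```

The lowercase protocol(v2) line and the uppercase /PING line are deliberate negative cases: Python's "re" matching is case-sensitive by default, so both should pass through unmodified, as should the --test-- separators.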

Append to the Agent-monitored log file

Append the log events to the Agent-monitored log file (/var/log/test.log):

$ cat /tmp/blah.txt >> /var/log/test.log

This enables multiple log events to be tested simultaneously.
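If the test is repeated often, the append step can also be scripted. The following is a small sketch with a hypothetical append_events helper; it opens the target in append mode so existing content is never rewritten, and it adds a trailing newline when the sample file lacks one so the last event is a complete line.

```python
from pathlib import Path


def append_events(source, target):
    """Append the contents of source to target and return the line count.

    The target is opened in append mode, so bytes already present in the
    monitored file are left untouched.
    """
    events = Path(source).read_text()
    if events and not events.endswith("\n"):
        events += "\n"  # make sure the final event is newline-terminated
    with open(target, "a") as log:
        log.write(events)
    return events.count("\n")
```

For the files in this article, the call would be append_events("/tmp/blah.txt", "/var/log/test.log"), run as a user with write permission on the log file.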

Confirm results in DataSet

Query: logfile='/var/log/test.log'

Note: The log filename was truncated in this screenshot

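As the results show, it's easy to identify the log events which were matched and modified by the redaction rules: they appear as blah1 or blah2. Events that should not match, such as the lowercase protocol(v2) line and the uppercase /PING line, are uploaded unmodified because the match is case-sensitive. Since the redaction rules use the same expressions, these regular expressions will work as expected when added to the Agent configuration as sampling_rules.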

Conclusion

The Agent can be used to upload logs to a DataSet account for testing purposes. This simplifies the process of testing regular expressions and Agent-based functions on log events before they are implemented in production environments.

The above method can also be used when setting up scrubbing, discard rules, or parsers in the DataSet UI. When doing so, you would perform the following steps:

1. Set up the Agent - Designate a log file that the Agent will monitor and upload events from.
2. Create sample log events - You can create your own test logs, copy events from the file where they originated, or copy individual log events from the DataSet "Search" page. (Hint: Click the log event you wish to copy, then click the "Copy" button beside the log event in the "Inspect Log Line" dialog. This copies the entire unmodified log event to your clipboard.)
3. Update the log file that is monitored by the Agent - Be sure to append log events to the file, as the Agent reads log files incrementally and stores the cursor location.
4. Verify the results in DataSet with a search query - Are the events formatted, parsed, or discarded as expected?