Troubleshooting the Scalyr Agent – DataSet Customer Portal

Introduction

I wrote this article to help you troubleshoot the Scalyr Agent. In most cases, a repeatable set of steps can identify the likely source of a problem.

The Scalyr Agent checks its configuration files for changes every 30 to 60 seconds and automatically reloads any that have been updated. If the new configuration contains an error, the Agent attempts to keep running with the last known-good configuration, so deploying a bad configuration does not abruptly terminate it. However, a critical configuration error can still prevent the Agent from restarting successfully. As you will learn from this article, diagnosing these errors is relatively straightforward. I've also included some of the best practices that the Support team has developed.
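The Python sketch below illustrates the general "last known-good" pattern. It is not the Agent's actual implementation. The ConfigReloader class is hypothetical, and it assumes a strict-JSON file, whereas the Agent's own parser may be more lenient. A real agent would call reload() on a timer, for example every 30 to 60 seconds.

```python
import json
from pathlib import Path


class ConfigReloader:
    """Hypothetical sketch of a last-known-good reload (not the Scalyr Agent's code)."""

    def __init__(self, path):
        self.path = Path(path)
        self.active = None  # last configuration that loaded successfully

    def reload(self):
        """Load the file; on error, fall back to the last known-good configuration."""
        try:
            candidate = json.loads(self.path.read_text())
        except (OSError, json.JSONDecodeError) as exc:
            if self.active is None:
                # Nothing to fall back on: comparable to a critical error at startup
                raise RuntimeError(f"no usable configuration: {exc}") from exc
            return self.active  # keep running with the previous configuration
        self.active = candidate
        return self.active
```

The key design choice is that a parse failure never replaces self.active; only a successfully loaded file does that. The one case with no fallback is a bad file on the very first load, which mirrors a critical error keeping the Agent from starting.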

Scalyr Agent Status

To check the current status of the Scalyr Agent, open a console on the host and run:

scalyr-agent-2 status -v

Note: We recommend including the complete output of the status command when filing a ticket with the Support team, as this expedites our investigation.
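Support asks for the complete output, so it can help to save it to a file instead of copying it from the terminal. The Python sketch below runs scalyr-agent-2 status -v and writes both stdout and stderr to a file. The save_status function and the scalyr-status.txt file name are illustrative choices, not part of the Agent.

```python
import subprocess
from pathlib import Path


def save_status(output_file="scalyr-status.txt",
                command=("scalyr-agent-2", "status", "-v")):
    """Run the status command and save its complete output for a Support ticket."""
    result = subprocess.run(list(command), capture_output=True, text=True)
    # Keep stderr as well: error details may be printed there rather than to stdout
    Path(output_file).write_text(result.stdout + result.stderr)
    return result.returncode
```

Calling save_status() on the Agent host produces a file you can attach to the ticket as-is.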

The status -v command returns the following sections:

Agent Configuration

Displays the configuration file(s) that are presently loaded by the Scalyr Agent. Critical formatting issues within the configuration file(s) will be displayed here when they occur. We recommend checking this section after making modifications (especially when working with escaped characters).
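Escaping mistakes usually involve backslashes, as in regular expressions or Windows paths, or embedded double quotes. Assuming the Agent follows standard JSON string escaping, one way to get these right is to let a JSON library produce the literal for you. The json_literal helper below is a hypothetical wrapper around Python's json.dumps.

```python
import json


def json_literal(value):
    """Return `value` as a JSON string literal, escaping backslashes and quotes."""
    return json.dumps(value)


# Values as you mean them (raw strings) and how they must be written in the config file
print(json_literal(r"\d{4}-\d{2}-\d{2}"))  # "\\d{4}-\\d{2}-\\d{2}"
print(json_literal(r"C:\logs\app.log"))    # "C:\\logs\\app.log"
print(json_literal('say "hi"'))            # "say \"hi\""
```

Pasting the printed form into the configuration, including its surrounding quotes, avoids single unescaped backslashes. A single backslash is the most common cause of "invalid escape" errors.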

Log Transmission

Includes the host machine's connection status along with the last message returned by the DataSet API endpoint. Check this section first if you suspect connectivity or authentication (API key) issues.

All actively monitored logs, whether defined individually or by wildcard, are listed here. If an application recently wrote to a log, that log's most recent upload time and status are included. Note that a log file is uploaded only once, even if more than one entry in the configuration matches it.
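One way to picture the once-per-file rule is to expand every configured path and wildcard and then remove duplicates. The sketch below illustrates that idea with Python's glob module. It is hypothetical and does not reproduce the Agent's matching logic. In particular, resolving symlinks with os.path.realpath is our own assumption.

```python
import glob
import os


def files_to_upload(patterns):
    """Expand configured paths/wildcards and return each matching file only once."""
    seen = set()
    ordered = []
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            real = os.path.realpath(path)  # treat different spellings of one file as the same
            if real not in seen:
                seen.add(real)
                ordered.append(real)
    return ordered
```

For example, files_to_upload(["/var/log/app/*.log", "/var/log/app/access.log"]) lists access.log once, even though both entries match it.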

Monitors

The status of each active monitor on your host is listed here, with any errors displayed inline.

Overview

The status -v command provides the operational status of a running instance of the Scalyr Agent. For additional detail, consult the Agent log (next section).

Scalyr Agent Log

Each instance of the Scalyr Agent uploads a diagnostic log to DataSet by default. This log has a minimal footprint and adds very little to your overall log volume, so we recommend leaving the implicit_agent_log_collection parameter at its default value of true.

In addition, if the DataSet Support team can access this log (which requires shared access to your account), we may be able to identify a resolution more quickly.

You can quickly check for warnings and errors by running this DataSet search:

severity > 3 logfile='/var/log/scalyr-agent-2/agent.log'
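The filter above relies on DataSet's numeric severity field. As we understand the scale, it runs from 0 (finest) to 6 (fatal), so severity > 3 selects warning, error, and fatal events and excludes info and lower. Please confirm this against the DataSet query documentation. The short Python sketch below spells out the filter. The SEVERITY table and matches_query function are illustrative only.

```python
# Assumed DataSet severity scale; verify against the DataSet query documentation
SEVERITY = {0: "finest", 1: "finer", 2: "fine", 3: "info",
            4: "warning", 5: "error", 6: "fatal"}


def matches_query(severity):
    """Mirror the DataSet filter `severity > 3`."""
    return severity > 3


print([name for level, name in SEVERITY.items() if matches_query(level)])
# ['warning', 'error', 'fatal']
```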

If the status command indicated an issue, check the local Scalyr Agent log file. It is located at /var/log/scalyr-agent-2/agent.log on Linux or C:\Program Files (x86)\Scalyr\log\agent.log on Windows. The local file may also contain additional debugging information that isn't uploaded to DataSet.
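If the Agent log has not reached DataSet, you can approximate the search above locally by scanning agent.log for warning and error lines. The Python sketch below assumes that each line includes its level as a separate uppercase word, such as WARNING or ERROR. That line format is an assumption, and the pattern will also match those words when they appear in message text. The problem_lines function is hypothetical.

```python
import re

# Assumption: the log level appears as a standalone uppercase word on each line
LEVEL_PATTERN = re.compile(r"\b(WARN|WARNING|ERROR|CRITICAL|FATAL)\b")


def problem_lines(log_path="/var/log/scalyr-agent-2/agent.log"):
    """Yield (line_number, line) for lines that look like warnings or errors."""
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if LEVEL_PATTERN.search(line):
                yield number, line.rstrip("\n")
```

On Windows, pass the path explicitly, for example problem_lines(r"C:\Program Files (x86)\Scalyr\log\agent.log").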

Depending on the issue you've experienced (particularly if a connectivity or other issue prevented the Agent log from being uploaded), we may ask you to send the entire agent.log file as a tar archive for evaluation.
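If we ask for the file, you can create the archive with Python's standard tarfile module, as in the sketch below. Including rotated copies that match agent.log* (such as agent.log.1) is our assumption about how older log data is named on your host. The bundle_agent_logs function and the agent-logs.tar.gz file name are illustrative.

```python
import glob
import os
import tarfile


def bundle_agent_logs(log_dir="/var/log/scalyr-agent-2", output="agent-logs.tar.gz"):
    """Create a gzip-compressed tarball of agent.log plus any rotated copies."""
    paths = sorted(glob.glob(os.path.join(log_dir, "agent.log*")))
    if not paths:
        raise FileNotFoundError(f"no agent.log found in {log_dir}")
    with tarfile.open(output, "w:gz") as archive:
        for path in paths:
            # Store each file without its directory prefix
            archive.add(path, arcname=os.path.basename(path))
    return paths
```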

Scalyr Agent Configuration File

If you recently changed the Scalyr Agent's configuration, or are receiving a configuration-related error that doesn't make sense, please feel free to contact us. In many cases, configuration file errors are caused by unescaped characters, invalid JSON formatting, or an invalid setting.
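A quick way to find JSON formatting mistakes is to run the file through a strict JSON parser, which reports the line and column of the first problem. Treat the result as a hint only, because the Agent's own parser may accept extensions, such as comments, that strict JSON rejects. The locate_json_error function below is a hypothetical helper built on Python's json module.

```python
import json


def locate_json_error(text):
    """Return None if `text` is strict JSON, else 'line X, column Y: message'."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None


# Example: a trailing comma inside the "logs" array on line 3
config = '{\n  "api_key": "REDACTED",\n  "logs": [ { "path": "/var/log/app.log" }, ]\n}'
print(locate_json_error(config))
```

To check a real file, pass its contents, for example locate_json_error(open("/etc/scalyr-agent-2/agent.json").read()).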

Note: Please include your Scalyr Agent configuration files (/etc/scalyr-agent-2/agent.json and /etc/scalyr-agent-2/agent.d/.* on Linux, or C:\Program Files (x86)\Scalyr\config\agent.json on Windows), with the API key and any other confidential information redacted.
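The minimal Python sketch below redacts the API key before you share a file. It edits the raw text rather than parsing it, so comments and formatting are preserved. It assumes the key is stored in an api_key field with a double-quoted value. It does not find other secrets, so please still review the file by hand. The redact_api_key function is a hypothetical helper.

```python
import re

# Matches api_key with or without quotes around the field name, then a quoted value
API_KEY_PATTERN = re.compile(r'(["\']?\bapi_key\b["\']?\s*:\s*)"[^"]*"')


def redact_api_key(config_text):
    """Replace the api_key value with a placeholder, leaving the rest of the text untouched."""
    return API_KEY_PATTERN.sub(r'\1"REDACTED"', config_text)
```

For example, redact_api_key('{"api_key": "abc123", "logs": []}') returns '{"api_key": "REDACTED", "logs": []}'.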

Kubernetes Configuration

Kubernetes deployments use some additional configuration files, which are useful to review when an issue occurs. When contacting Support about a potential issue, please include the following if possible:

The ConfigMap and DaemonSet files
The version of Kubernetes
The platform (GKE, AKS, EKS, etc.) you are working with
Whether you are deploying with CloudFormation, Terraform, or a similar product
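You can collect the first two items with kubectl. The Python sketch below saves the ConfigMap and DaemonSet as YAML, along with the output of kubectl version, into one folder. The namespace and resource names (scalyr, scalyr-config, scalyr-agent-2) are hypothetical placeholders, so replace them with the names used in your deployment. Redact any secrets from the output before sending it. You still need to note the platform and deployment tool by hand.

```python
import subprocess
from pathlib import Path


def support_commands(namespace="scalyr", configmap="scalyr-config",
                     daemonset="scalyr-agent-2"):
    """Return (output file, kubectl command) pairs; the default names are placeholders."""
    return [
        ("configmap.yaml",
         ["kubectl", "get", "configmap", configmap, "-n", namespace, "-o", "yaml"]),
        ("daemonset.yaml",
         ["kubectl", "get", "daemonset", daemonset, "-n", namespace, "-o", "yaml"]),
        ("kubernetes-version.txt", ["kubectl", "version"]),
    ]


def collect(out_dir="scalyr-support", **names):
    """Run each command and save its stdout and stderr for the Support ticket."""
    Path(out_dir).mkdir(exist_ok=True)
    for filename, command in support_commands(**names):
        result = subprocess.run(command, capture_output=True, text=True)
        Path(out_dir, filename).write_text(result.stdout + result.stderr)
```

For example, collect(namespace="monitoring") gathers everything into a scalyr-support folder, assuming the Agent runs in a namespace named monitoring.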