I wrote this article to provide some insight into troubleshooting the Scalyr Agent. In most cases, a repeatable set of steps can be used to identify potential areas of concern.
The Scalyr Agent checks for updated configuration files every 30 seconds to 1 minute and reloads them automatically. If an error is found, the Agent will attempt to use the last known-good configuration. This ensures that the Agent is not abruptly terminated if a bad configuration is deployed. However, a critical configuration error can still prevent the Agent from restarting successfully. As you will learn from this article, diagnosing these errors is relatively straightforward. I've also included some of the best practices that the Support Team has developed.
Scalyr Agent Status
The immediate status of the Scalyr Agent can be determined by accessing the console, PowerShell, k8s pod, or Docker container and typing:
scalyr-agent-2 status -v
Note: We recommend including the output of the status command in its entirety when filing a ticket with the Support team, as this expedites our investigation.
The status command returns the following sections:
Displays the configuration file(s) that are presently loaded by the Scalyr Agent. Critical formatting issues within the configuration file(s) will be displayed here when they occur. We recommend checking this section after making modifications (especially when working with escaped characters).
Includes the host machine's connection status along with the last message returned by the DataSet API endpoint. This is the first section to check if you are wondering about connectivity or authentication (API key) issues.
All actively monitored logs (either individually or by wildcard) are listed here. If your log was recently modified by an application, its most recent upload time and status are included. It should be noted that a log file will only be uploaded once, even if it was defined multiple times within the configuration.
The status of each active monitor on your host is listed here. Errors are displayed inline.
The status command provides the operational status of a running instance of the Scalyr Agent. For additional detail, consult the Agent log (next section).
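Because the full status output is so useful in a support ticket, it can help to save it to a file rather than copying it from the terminal. The following is a minimal Python sketch, assuming scalyr-agent-2 is on the PATH and that you have permission to run it. The function name save_status_output and the output file name are hypothetical, chosen only for illustration.

```python
import subprocess
from pathlib import Path


def save_status_output(out_path, command=("scalyr-agent-2", "status", "-v")):
    """Run the status command and write everything it prints to out_path.

    stdout and stderr are merged so that error messages are not lost.
    Returns the command's exit code.
    """
    result = subprocess.run(
        list(command),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep error output alongside normal output
        text=True,
        check=False,  # a non-zero exit code is still worth recording
    )
    Path(out_path).write_text(result.stdout, encoding="utf-8")
    return result.returncode


# Example call (hypothetical file name); attach the resulting file to the ticket:
# save_status_output("scalyr-agent-status.txt")
```

The command is passed as a parameter so the same helper can be pointed at a differently named or located Agent executable if your installation requires it.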
Scalyr Agent Log
Each instance of the Scalyr Agent uploads a diagnostic log to DataSet by default; this log has a minimal footprint and won't impact your overall log volume. For this reason, we recommend leaving the implicit_agent_log_collection parameter set to its default value of true.
Furthermore, if the DataSet Support team can access this information (requires shared access to your account), it may enable us to identify a resolution more quickly.
You can quickly check for errors by running this DataSet search:
$severity > 3 $logfile == '/var/log/scalyr-agent-2/agent.log'
If the status command highlighted an issue, check the local Scalyr Agent log file, which is located at /var/log/scalyr-agent-2/agent.log (Linux) or C:\Program Files (x86)\Scalyr\log\agent.log (Windows). Additional debugging information (which isn't uploaded to DataSet) may also be available.
Depending on the issue you've experienced (particularly if a connectivity or other issue prevented the Agent log from being uploaded), we may ask for the agent.log file to be tarred and sent in for evaluation.
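When the Agent log could not be uploaded, a quick local scan can narrow down where to look before sending the whole file. The sketch below filters log lines by severity. It assumes each line of agent.log contains a Python-style level name such as WARNING or ERROR as a separate word; that format is an assumption, and find_problem_lines is a hypothetical helper, not part of the Agent.

```python
import re

# Assumption: agent.log lines carry a level name (e.g. "ERROR") as a
# standalone word. Adjust the pattern if your log format differs.
LEVEL_PATTERN = re.compile(r"\b(WARNING|ERROR|CRITICAL)\b")


def find_problem_lines(lines, max_results=50):
    """Return up to max_results of the most recent lines that mention a
    warning, error, or critical level. max_results should be positive."""
    matches = [line.rstrip("\n") for line in lines if LEVEL_PATTERN.search(line)]
    return matches[-max_results:]


# Example usage against the Linux log path mentioned in this article:
# with open("/var/log/scalyr-agent-2/agent.log", encoding="utf-8", errors="replace") as f:
#     for line in find_problem_lines(f):
#         print(line)
```

Only the tail of the matches is returned because the most recent errors are usually the relevant ones when an issue has just appeared.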
Scalyr Agent Configuration File
If you recently made a change to the Scalyr Agent's configuration, or are receiving a related error that doesn't make sense, please feel free to contact us. In many cases, errors in the configuration file(s) are caused by unescaped characters, invalid JSON formatting, or an invalid setting.
Note: It's a good idea to include the Scalyr Agent configuration file (on Windows, C:\Program Files (x86)\Scalyr\config\agent.json) after redacting the API key and any other confidential information.
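Redacting the API key by hand is easy to forget, so a small script can do it before the file is shared. The sketch below works on the raw text instead of parsing the file, since the configuration may contain comments or unquoted key names that a strict JSON parser would reject. It assumes the key is stored in an api_key field with a double-quoted value; redact_api_key is a hypothetical helper, and any other confidential values still need to be removed manually.

```python
import re

# Matches an api_key entry with a double-quoted value, with or without
# quotes around the key name, e.g.  api_key: "abc"  or  "api_key": "abc".
API_KEY_PATTERN = re.compile(r'("?api_key"?\s*:\s*)"[^"]*"')


def redact_api_key(config_text, placeholder="REDACTED"):
    """Return config_text with every api_key value replaced by placeholder."""
    return API_KEY_PATTERN.sub(
        lambda m: f'{m.group(1)}"{placeholder}"', config_text
    )


# Example usage: write a redacted copy next to the original and share the copy.
# with open("agent.json", encoding="utf-8") as f:
#     redacted = redact_api_key(f.read())
# with open("agent.redacted.json", "w", encoding="utf-8") as f:
#     f.write(redacted)
```

Writing the result to a separate file leaves the live configuration untouched, so the running Agent is not affected.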
Kubernetes makes use of some additional configuration files, which are useful to review when an issue occurs. When contacting Support about a potential issue, please include the following if possible:
- The ConfigMap and DaemonSet files
- The version of Kubernetes
- The platform (GKE, AKS, EKS, etc.) you are working with
- Whether you are deploying with CloudFormation, Terraform, or a similar product