Introduction
On ECS Fargate, DataSet uses the Fluentd output plugin to push logs to the addEvents API endpoint, ingesting the data into DataSet. This is useful if you want to use DataSet as a log aggregator.
Several configuration options can be set to customize ingestion and parsing on the DataSet side.
Instructions
You can configure these settings in the Task Definition of the container you want to stream logs from.
Here is an example Task Definition for a single container.
{
    "essential": true,
    "image": "<your Docker image>", # The Docker image of the app you want to run and collect the logs of
    "name": "<your app name>",
    "logConfiguration": {
        "logDriver": "awsfirelens", # The "awsfirelens" log driver routes this container's stdout through the log router container, and from there up to Scalyr
        "options": { # The options for this log driver are translated into key-value pairs in the fluentd output configuration; this is a minimal configuration
            "@type": "scalyr", # "@type" is always "scalyr" to tell fluentd to use the Scalyr output plugin
            "api_write_token": "<your write API key>" # A valid write API key for Scalyr
        }
    },
    "memory": 100
}
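The awsfirelens log driver assumes a companion log router container in the same task definition. Below is a minimal sketch of what that container entry might look like; the image name and memory value are placeholder assumptions, and the image is assumed to be a custom fluentd build with the Scalyr output plugin installed.
{
    "essential": true,
    "image": "<your fluentd image with fluent-plugin-scalyr installed>", # Assumed custom image; a stock fluentd image does not include the Scalyr plugin
    "name": "log_router",
    "firelensConfiguration": {
        "type": "fluentd" # Marks this container as the FireLens log router, running fluentd
    },
    "memoryReservation": 50
}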
You can use logConfiguration options to configure various attributes for the container defined in the task definition.
Here is an example of adding server_attributes with a serverHost and a parser. Note that the JSON object is passed as an escaped string. You can read more about serverAttributes here.
"options": {
"@type":"scalyr",
"api_write_token": "<your write API key>"
"server_attributes": "{\"serverHost\": \"fluentd_host\",\"parser\": \"myapp\" }"
}
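For readability, the escaped server_attributes value above corresponds to this JSON object before string-escaping:
{
    "serverHost": "fluentd_host",
    "parser": "myapp"
}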
The supported options are listed below.
DataSet specific options
compression_type - compress the data sent to DataSet to reduce network traffic. Options are `bz2` and `deflate`. See here for more details. This feature is optional. (A combined example using several of these options follows this list.)
api_write_token - your DataSet write logs token. See here for more details. This value must be specified.
server_attributes - a JSON hash containing custom server attributes you want to include with each log request. This value is optional and defaults to nil.
use_hostname_for_serverhost - if `true`, and `server_attributes` is nil or does not include a `serverHost` field, the plugin adds a `serverHost` field set to the hostname that fluentd is running on. Defaults to `true`.
scalyr_server - the DataSet server to send API requests to. This value is optional and defaults to agent.scalyr.com/.
ssl_ca_bundle_path - a path on your server pointing to a valid certificate bundle. This value is optional and defaults to nil, which means it will look for a valid certificate bundle on its own.
Note: if the certificate bundle does not contain a certificate chain that verifies the DataSet SSL certificate then all requests to DataSet will fail unless ssl_verify_peer is set to false. If you suspect logging to DataSet is failing due to an invalid certificate chain, you can grep through the Fluentd output for warnings that contain the message 'certificate verification failed'. The full text of such warnings will look something like this:
>2015-04-01 08:47:05 -0400 [warn]: plugin/out_scalyr.rb:85:rescue in write: SSL certificate verification failed. Please make sure your certificate bundle is configured correctly and points to a valid file. You can configure this with the ssl_ca_bundle_path configuration option. The current value of ssl_ca_bundle_path is '/etc/ssl/certs/ca-bundle.crt'
>2015-04-01 08:47:05 -0400 [warn]: plugin/out_scalyr.rb:87:rescue in write: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed
>2015-04-01 08:47:05 -0400 [warn]: plugin/out_scalyr.rb:88:rescue in write: Discarding buffer chunk without retrying or logging to <secondary>
The cURL project maintains CA certificate bundles automatically converted from mozilla.org here.
ssl_verify_peer - verify SSL certificates when sending requests to DataSet. This value is optional, and defaults to true.
ssl_verify_depth - the depth to use when verifying certificates. This value is optional, and defaults to 5.
message_field - DataSet expects all log events to have a 'message' field containing the contents of a log message. If your event has the log message stored in another field, you can specify the field name here, and the plugin will rename that field to 'message' before sending the data to DataSet. Note: this will override any existing 'message' field if the log record contains both a 'message' field and the field specified by this config option.
max_request_buffer - the maximum size in bytes of each request to send to DataSet. Defaults to 5,500,000 (5.5MB). Fluentd chunks that generate JSON requests larger than max_request_buffer will be split into multiple separate requests. Note: the maximum size the DataSet servers accept is 6MB, and requests containing more data than this will be rejected.
force_message_encoding - Set a specific encoding for all your log messages (defaults to nil). If your log messages are not in UTF-8, this can cause problems when converting the message to JSON in order to send to the DataSet server. You can avoid these problems by setting an encoding for your log messages so they can be correctly converted.
replace_invalid_utf8 - If this value is true and force_message_encoding is set to 'UTF-8' then all invalid UTF-8 sequences in log messages will be replaced with <?>. Defaults to false. This flag has no effect if force_message_encoding is not set to 'UTF-8'.
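To illustrate how several of these options combine, here is a hypothetical options block. The deflate compression choice, the app_log field name, and the encoding settings are illustrative assumptions, not recommendations; note that FireLens option values are always strings.
"options": {
    "@type": "scalyr",
    "api_write_token": "<your write API key>",
    "compression_type": "deflate", # Compress request bodies to reduce network traffic
    "message_field": "app_log", # Hypothetical field holding the log message; renamed to 'message' before sending
    "force_message_encoding": "UTF-8", # Coerce log messages to UTF-8
    "replace_invalid_utf8": "true" # Replace invalid UTF-8 sequences with <?>
}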
Buffer options
retry_max_times - the maximum number of times to retry a failed post request before giving up. Defaults to 40.
retry_wait - the initial time to wait before retrying a failed request. Defaults to 5 seconds. Wait times will increase up to a maximum of retry_max_interval.
retry_max_interval - the maximum time to wait between retrying failed requests. Defaults to 30 seconds. Note: This is not the total maximum time of all retry waits, but rather the maximum time to wait for a single retry.
flush_interval - how often to upload logs to DataSet. Defaults to 5 seconds.
flush_thread_count - the number of threads to use to upload logs. This is currently fixed at 1; fluentd will fail with a ConfigError if it is set to anything greater.
chunk_limit_size - the maximum amount of log data to send to DataSet in a single request. Defaults to 2.5MB. Note: if you set this value too large, then DataSet may reject your requests. Requests smaller than 6 MB will typically be accepted by DataSet, but note that the 6 MB limit also includes the entire request body and all associated JSON keys and punctuation, which may be considerably larger than the raw log data. This value should be set lower than the `max_request_buffer` option.
queue_limit_length - the maximum number of chunks to buffer before dropping new log requests. Defaults to 1024. Together with chunk_limit_size, this determines the total amount of buffer available in the event of request failures before requests are dropped. (A sketch of the equivalent fluentd buffer configuration follows this list.)
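Since FireLens translates these options into the fluentd output configuration, the buffer options above correspond to standard fluentd buffer parameters. As a rough sketch, the equivalent fluentd match block might look like the following; the values shown are the documented defaults, and the exact translation FireLens performs is an assumption here.
<match **>
  @type scalyr
  api_write_token <your write API key>
  <buffer>
    retry_max_times 40        # Give up after 40 failed attempts
    retry_wait 5s             # Initial wait before the first retry; grows up to retry_max_interval
    retry_max_interval 30s    # Maximum wait for any single retry
    flush_interval 5s         # Upload logs every 5 seconds
    flush_thread_count 1      # Must remain 1 for this plugin
    chunk_limit_size 2621440  # 2.5MB per request; keep below max_request_buffer
    queue_limit_length 1024   # Chunks buffered before new log requests are dropped
  </buffer>
</match>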
The DataSet output plugin has a number of sensible defaults, so the minimum configuration requires only your DataSet 'write logs' token.