Introduction
The DataSet Kafka Connector supports sending custom application messages to DataSet. To use this feature, specify how application message fields map to DataSet event attributes in the custom_app_event_mapping
section of the DataSet connector config. The mapping is optional, but defining it lets you make full use of the DataSet platform.
Mapping JSON data to DataSet
Edit the sink config
cd $KAFKA_SCALYR_SINK_CONFIG
vim connect-scalyr-sink-custom-app.json
{
"name": "scalyr-sink-connector",
"config": {
"connector.class": "com.scalyr.integrations.kafka.ScalyrSinkConnector",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable":"false",
"tasks.max": "3", //should match tasks max for topic.
"topics": "logs",
"api_key": "<SCALYR LOG WRITE API TOKEN>",
"event_enrichment": "tag=kafka",
"custom_app_event_mapping":"[{\"matcher\": {\"attribute\": \"app.name\", \"value\": \"myapp\"}, \"eventMapping\": {\"message\": \"message\", \"logfile\": \"log.path\", \"source\": \"host.hostname\", \"parser\": \"fields.parser\", \"version\": \"app.version\", \"appField1\":\"appField1\", \"appField2\":\"nested.appField2\"}}]"
}
}

Note: JSON does not allow inline comments; tasks.max should match the topic, and as a rule it should not exceed the topic's partition count, since extra sink tasks will sit idle.
Modify or add a custom_app_event_mapping section
[{
"matcher": {
"attribute": "app.name",
"value": "myapp"
},
"eventMapping": {
"message": "message",
"logfile": "log.file.path",
"serverHost": "host.hostname",
"parser": "fields.parser",
"version": "app.version",
"appField1", "appField1",
"appField2", "nested.appField2"
},
"delimiter":"\\."
}]
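The matcher controls which messages a mapping applies to: the eventMapping is used only for messages whose matcher attribute (here, the name field nested under app) has the given value. Because custom_app_event_mapping is a JSON array, it can hold one entry per application. The sketch below adds a hypothetical second application (otherapp, with illustrative field names):

[{
  "matcher": { "attribute": "app.name", "value": "myapp" },
  "eventMapping": { "message": "message", "parser": "fields.parser" }
},
{
  "matcher": { "attribute": "app.name", "value": "otherapp" },
  "eventMapping": { "message": "body.text", "parser": "fields.parser" }
}]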
Map fields to their proper key
Refer to the table in Appendix A for the reserved DataSet attributes. Nested JSON fields can be referenced with the dotted naming convention. For example, to map log.file.path to the DataSet reserved attribute logfile, set "logfile": "log.file.path". The sample Filebeat event below shows the kind of nested structure these dotted paths refer to.
{
"@timestamp": "2020-02-15T01:59:18.429Z",
"@metadata": {
"beat": "filebeat",
"type": "_doc",
"version": "7.6.0",
"pipeline": "filebeat-7.6.0-system-syslog-pipeline"
},
"message": "Feb 14 17:53:46 user com.apple.xpc.launchd[1] (com.apple.xpc.launchd.domain.pid.WebContent.22691): Path not allowed in target domain: type = pid, path = /System/Library/StagedFrameworks/Safari/SafariShared.framework/Versions/A/XPCServices/com.apple.Safari.SearchHelper.xpc/Contents/MacOS/com.apple.Safari.SearchHelper error = 147: The specified service did not ship in the requestor's bundle, origin = /System/Library/StagedFrameworks/Safari/WebKit.framework/Versions/A/XPCServices/com.apple.WebKit.WebContent.xpc",
"input": {
"type": "log"
},
"event": {
"module": "system",
"dataset": "system.syslog",
"timezone": "-08:00"
},
"host": {
"hostname": "Test-Host",
"architecture": "x86_64",
"os": {
"platform": "darwin",
"version": "10.14.6",
"family": "darwin",
"name": "Mac OS X",
"kernel": "18.7.0",
"build": "18G95"
},
"id": "4C372C34-DFFD-5B38-B575-DCF17623AD29",
"name": "Test-Host"
},
"log": {
"file": {
"path": "/var/log/system.log"
},
"offset": 176072
},
"fileset": {
"name": "syslog"
},
"service": {
"type": "system"
},
"ecs": {
"version": "1.4.0"
},
"agent": {
"hostname": "Test-Host",
"id": "7714b3df-79af-4c7a-8c47-d45b134bbd24",
"version": "7.6.0",
"type": "filebeat",
"ephemeral_id": "a65e0775-fb06-409f-a366-c04a2157603c"
}
}
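Note that this sample event has no app.name field, so the matcher from the earlier example would not select it. A mapping tailored to this event could instead match on a field Filebeat always sets, such as agent.type; the sketch below is illustrative, and the matcher choice is an assumption, not a connector requirement:

[{
  "matcher": {
    "attribute": "agent.type",
    "value": "filebeat"
  },
  "eventMapping": {
    "message": "message",
    "logfile": "log.file.path",
    "serverHost": "host.hostname"
  }
}]

With this mapping, the sample event would yield a DataSet event whose message is the syslog line, logfile is /var/log/system.log, and serverHost is Test-Host.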
Flatten the custom_app_event_mapping section
Be sure to escape double quotes when embedding the mapping as a string in the connector config:
"custom_app_event_mapping":"[{\"matcher\": {\"attribute\": \"app.name\", \"value\": \"myapp\"}, \"eventMapping\": {\"message\": \"message\", \"logfile\": \"log.path\", \"source\": \"host.hostname\", \"parser\": \"fields.parser\", \"version\": \"app.version\", \"appField1\":\"appField1\", \"appField2\":\"nested.appField2\"}}]"
Start the connector
Post the config to the Kafka Connect REST API (http://localhost:8083 is the default Connect REST endpoint; adjust for your deployment):
curl -X POST -H "Content-Type: application/json" -d @connect-scalyr-sink-custom-app.json http://localhost:8083/connectors
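To verify the connector started, query its status through the same REST API (assuming the default endpoint and the connector name scalyr-sink-connector from the config above):

curl http://localhost:8083/connectors/scalyr-sink-connector/status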
Appendix A
If you are configuring a custom connector, the following table explains how to map fields to DataSet.

| Field | Requirement | Description |
| --- | --- | --- |
| message | required | The log message body; it is interpreted by the parser and allows for free-text search. |
| serverHost | recommended | Allows for the proper display of the home page dashboard and is used to report on log volume. |
| logfile | recommended | Used to report on log volume; appears as an attribute in the home page dashboard and the log volume dashboard. |
| timestamp | optional | Should be in the message or in a field; it can be extracted by a parser later. Important unless you can tolerate the ingestion time being used as the timestamp (when no suitable timestamp is found). Keeping the timestamp assigned by your platform associated with the log in DataSet keeps your logs in order and ensures higher accuracy in searches. |
| severity | optional | Converted to an int in the range 1-6; can be used to easily search for errors. |
| parser | recommended | The value of this field creates a parser with that name if one does not already exist. Use one parser per logical data grouping; for example, firewall logs and HTTP access logs require two separate parsers. You can specify prebuilt parsers from this list, but creating your own will impart more functionality. |
| serverIP | optional | Optional field that will show up on the home page. |
An application may have nested fields, while DataSet events support a flat key/value structure. Nested fields are therefore referenced in the format field1.field2.field3, where the nested field names are separated by a delimiter. By default, the delimiter is a period (.).
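For example, with the default delimiter, the dotted path field1.field2.field3 selects "value" from a payload shaped like this (illustrative field names):

{
  "field1": {
    "field2": {
      "field3": "value"
    }
  }
}

This mirrors the nested.appField2 reference in the mapping examples above.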