Introduction
DataSet provides two types of RESTful APIs for log ingestion: addEvents and uploadLogs. While both APIs are used to upload logs, they have characteristics that optimize them for particular use cases.
This article contains a summary of each API.
uploadLogs
- Easy to use
- Good for raw text ingestion
- Not to be used for intensive data ingestion (high average log volume or high-frequency ingestion)
uploadLogs is well suited to simple integrations and requires a minimal set of parameters to be supplied. Each call to uploadLogs sends raw log text to DataSet. Attributes are extracted from the log events upon ingestion by the specified parser. As such, it is important to use a well-structured log format and extract the `timestamp` parameter (via the parser) to ensure that chronological order between events is maintained.
For instance, the quick-start app is implemented using the uploadLogs API: it reads the content of the file at once and makes a single request to bulk upload all messages to DataSet.
```python
import logging

import requests

with open(file_path, "r") as f:
    content = f.read()

uploadLog_url = (
    'https://{scalyr_url}/api/uploadLogs'
    '?token={token}&host={host}&logfile={logfile}&parser={parser}'
).format(scalyr_url=scalyr_url, token=api_key, host='support.scalyr.com',
         logfile=raw_filename, parser=parser)
logging.info(uploadLog_url)

resp = requests.post(uploadLog_url, data=content.encode('utf-8'),
                     headers={'content-type': 'text/plain'})
```
Note that the max size for a single uploadLogs API request is around 5 MB, so if the data exceeds that limit, break it into chunks beforehand to ensure that requests complete successfully.
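One way to stay under that limit is to split the log on line boundaries before uploading. The helper below is a minimal sketch, not part of the uploadLogs API: `chunk_lines` and the 4 MB threshold are illustrative choices.

```python
MAX_CHUNK_BYTES = 4 * 1024 * 1024  # stay safely below the ~5 MB request limit


def chunk_lines(lines, max_bytes=MAX_CHUNK_BYTES):
    """Yield strings of concatenated log lines, each at most max_bytes
    when UTF-8 encoded, splitting only on line boundaries."""
    chunk, size = [], 0
    for line in lines:
        line_size = len(line.encode('utf-8'))
        # Start a new chunk once adding this line would exceed the limit.
        if chunk and size + line_size > max_bytes:
            yield ''.join(chunk)
            chunk, size = [], 0
        chunk.append(line)
        size += line_size
    if chunk:
        yield ''.join(chunk)
```

Each chunk can then be sent as the body of its own uploadLogs request, exactly as in the quick-start example above.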
Summary
The uploadLogs API call is used to send complete log events within the `message` field, with no modification to the underlying attributes. A parser is specified to extract attributes as needed, and batching of multiple log events is supported.
addEvents
addEvents has several distinct advantages; however, these come at the expense of increased complexity:
- Offers granular control over attributes and the contents of the `message` field. addEvents enables your application to define attributes: rather than sending the entire log event and relying on a parser to extract fields, your application can create DataSet attributes directly.
  - Advantage: this approach may reduce your log volume, since only essential fields are transferred.
  - Disadvantage: free-text search (e.g. searching for "value0" wherever it occurs) may not work if you are sending an abbreviated `message` field. If this is the case, you will need to specify the attribute to search within when looking for a particular value (e.g. `attribute contains "value1"`).
  - Note: it is still possible to parse logs that are sent via addEvents; however, the field(s) you wish to parse must be part of the `message` field, because parsers can only be applied to the `message` field. Additionally, you will need to specify the `parser` attribute in the `attrs` field.
- Can be optimized for higher log volume. addEvents is intended to send multiple log events at once. This reduces the frequency of API calls, as log events can be grouped and sent at fixed intervals (rather than one call per line).
  - However, there are several important considerations to note (see the documentation).
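As a sketch of what a batched addEvents request body might look like, the snippet below groups several log lines into one payload. The helper name `build_addevents_payload` and the `myParser` value are illustrative assumptions; consult the addEvents documentation for the full set of supported fields.

```python
import time
import uuid


def build_addevents_payload(api_key, log_lines, session, parser='myParser'):
    """Group multiple log lines into a single addEvents request body.

    Each event carries a nanosecond timestamp (ts) and an attrs map; the
    parser attribute tells DataSet how to parse the message field.
    """
    base_ts = time.time_ns()
    events = [
        {'ts': str(base_ts + i),  # monotonically increasing, preserves order
         'attrs': {'message': line, 'parser': parser}}
        for i, line in enumerate(log_lines)
    ]
    return {'token': api_key, 'session': session, 'events': events}


session = str(uuid.uuid4())  # reuse the same session id across requests
payload = build_addevents_payload('YOUR_WRITE_API_KEY',
                                  ['GET /index.html 200', 'GET /login 302'],
                                  session)
# The payload would then be POSTed as JSON, e.g.:
# requests.post('https://app.scalyr.com/api/addEvents', json=payload)
```

Building one payload for many lines, and sending it every few seconds, is what keeps the API call frequency low.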
Summary
The addEvents API call gives users full control over the attributes that are sent with each log event, and it supports batching multiple log events for efficiency.
Important Considerations
Fault Tolerance
It is essential to consider implementing logic to prevent the loss of logs when using uploadLogs or addEvents. For example, is it essential for your application to pick up where it left off when:
- It terminates unexpectedly or is restarted
- The network connection it relies upon disconnects for an arbitrary amount of time
- The log it is reading from is rotated
The open source Scalyr Agent was designed to handle these use cases and more. It is a good example for those who wish to create their own solution. You can review the code at our GitHub repository.
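To illustrate the "pick up where it left off" idea, here is a minimal sketch of offset checkpointing. The checkpoint file name and helper names are illustrative, and a production implementation (like the Scalyr Agent's) must also handle log rotation:

```python
import json
import os

CHECKPOINT_FILE = 'checkpoint.json'  # illustrative location


def load_offset():
    """Return the byte offset where the previous run stopped reading."""
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return json.load(f).get('offset', 0)
    return 0


def save_offset(offset):
    """Persist the current read position so a restart can resume."""
    with open(CHECKPOINT_FILE, 'w') as f:
        json.dump({'offset': offset}, f)


def read_new_lines(log_path):
    """Read only the lines appended since the last checkpoint."""
    with open(log_path) as f:
        f.seek(load_offset())
        lines = f.readlines()
        save_offset(f.tell())
    return lines
```

In practice you would save the offset only after DataSet acknowledges the upload, so that a failed request does not cause those lines to be skipped on restart.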
Pub/Sub
The rule of thumb for both the uploadLogs and addEvents APIs is to group multiple log events within a single request, then send requests once every few seconds. DataSet is built on top of an extremely scalable and robust backend, so we have a stronger tolerance against malicious traffic than other platforms. Even so, to protect our servers from excessive bursts of traffic, we rate limit API calls from individual accounts. Consequently, requests that exceed this rate limit may be dropped.
The log retrieval functions provided by Google Cloud prevent logs from being batched within the DataSet addEvents API. Please note that using the uploadLogs or addEvents APIs to send one log event at a time from your Google Cloud application is prohibited, as this will have detrimental effects on your account's performance. Furthermore, issuing thousands of API requests over the course of a few seconds is considered misuse of our API.
Instead, we recommend ingesting Google Cloud logs with our pub/sub monitor, as it is capable of grouping thousands of Google Cloud logs before making an ingestion request to DataSet.