How DataSet Log Volume is Calculated

Updated March 17, 2023 22:19

Introduction

DataSet bills on ingested log volume, so it is important to understand how that volume is calculated. The rules below are intended to help you streamline the logs that your applications and platform emit, and to optimize the Agent (or other) upload mechanisms.

Counted Toward Log Volume

Raw log messages are counted.
Attribute values that originate in the Agent or at the DataSet API are counted.
- Attribute names are not counted; each attribute costs 1 byte plus the length of its value.

In theory, a log event with no attributes would cost 1 byte.
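
The counting rule above can be sketched as a small estimator. This is an illustration only, not DataSet's billing code: the function names `attribute_cost` and `estimate_event_cost` are hypothetical, and the sketch assumes that value length is measured in UTF-8 bytes and that the raw message is charged like any other attribute value. The article states neither of these. Under that reading, an event whose only field is an empty message comes to 1 byte.

```python
from typing import Mapping


def attribute_cost(value: str) -> int:
    """Estimated bytes charged for one attribute: 1 byte plus the value length.

    Assumption: length is measured in UTF-8 bytes. The attribute name is
    not an input because name lengths are not counted.
    """
    return 1 + len(value.encode("utf-8"))


def estimate_event_cost(attributes: Mapping[str, str]) -> int:
    """Sum the estimated cost of every counted attribute on one event."""
    return sum(attribute_cost(value) for value in attributes.values())


# Hypothetical event: 1 + 15 bytes for the message, 1 + 9 bytes for region.
event = {"message": "GET /health 200", "region": "us-east-1"}
print(estimate_event_cost(event))  # prints 26
```

Because names are free, shortening an attribute name saves nothing in this model; only dropping an attribute or shortening its value reduces the estimate.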

Not Counted

Metalog events are not counted.
sessionInfo/serverInfo fields and server attributes (app, launchTime, severity, session, machine, serverType, serverScope, sessionType, serverHost) are not counted.
The logfile attribute and the internal K8s attributes (containerName, containerId, pod_name, pod_namespace, namespace, pod_uid, k8s_container_name, original_file, scalyr-category, k8s_node, container_id) are not counted.
There is no separate charge for attributes parsed from the message attribute. You are billed only once for the complete log event.
- Note: Parsers can only be applied to the original log event, which is contained in the message attribute.
- Hence, attributes that are created by the parser are not counted.
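
When estimating volume for an event, the exclusions listed in this section can be applied as a simple name filter before any bytes are counted. The sketch below is a hypothetical helper, not part of the Agent or the DataSet API: `UNCOUNTED_ATTRIBUTES` and `billable_attributes` are names invented for illustration, and the set contains only the attribute names given in this article. It works on attributes as uploaded, so parser-created attributes never appear in its input.

```python
from typing import Dict, Mapping

# Attribute names this article lists as not counted toward log volume.
UNCOUNTED_ATTRIBUTES = frozenset({
    # Server attributes
    "app", "launchTime", "severity", "session", "machine",
    "serverType", "serverScope", "sessionType", "serverHost",
    # logfile and internal K8s attributes
    "logfile", "containerName", "containerId", "pod_name", "pod_namespace",
    "namespace", "pod_uid", "k8s_container_name", "original_file",
    "scalyr-category", "k8s_node", "container_id",
})


def billable_attributes(attributes: Mapping[str, str]) -> Dict[str, str]:
    """Return only the attributes whose values would count toward volume."""
    return {
        name: value
        for name, value in attributes.items()
        if name not in UNCOUNTED_ATTRIBUTES
    }


# Hypothetical uploaded event with two excluded and two counted attributes.
event = {
    "message": "user login ok",
    "serverHost": "web-01",
    "pod_name": "auth-7c9f",
    "tenant": "acme",
}
print(sorted(billable_attributes(event)))  # prints ['message', 'tenant']
```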

Comments

Nikita Nefedov

April 29, 2022 13:53
Hi Mark!

A bit unclear on attribute costs.

> Attribute name lengths aren't counted, but each attribute costs 1 byte

Was this meant to be "each attribute costs at least 1 byte?"

> Attributes that are created by the parser are not counted

Cool, so if we take a log event that looks like this:
```
[29/Apr/2022:13:49:15 +0000] "POST /auth/api/15/feature-flags/context HTTP/1.1" 200 200 "-"
```
and we parse out the following part into an attribute called `msg`:
```
"POST /auth/api/15/feature-flags/context HTTP/1.1" 200
```
we will not pay extra for `msg`? (Apart, maybe, from the `index:length` pair that you would have to store for each event to seek into the raw log and get the `msg` part.)

Another question: do I understand correctly that you store attribute values basically as `index:length` pairs, with which you can find the attribute's value in the raw message, and that you only retrieve the value when needed and never store it separately? Do you then also have to index the attribute for search? I'd imagine indexing takes some space as well. If it does, can we opt out of indexing for some attributes?