Introduction
Our Engineering Team recently released a number of useful features and improvements. I highly recommend checking them out if you haven't already had a chance to do so. PowerQueries enable DataSet customers to extract valuable insights from their log data by combining our blazing fast search with the analytical capabilities of SQL. Best of all, PowerQueries can be used immediately. Simply upload your log data -- there are no prerequisites!
Feb., 2023
Improvements
Changes to or (||
) operator
a || b
now returns a
if a
evaluates to a "true-ish" value (i.e. not false, null, 0, or an empty string), otherwise it evaluates to b
.
Historically, a || b
evaluated to either true
or false
. The new behavior allows you to write expressions like x || 'default'
, which will yield x
unless x
is undefined, in which case it yields default
– similar to SQL's COALESCE operator.
For example,
tag=*
| let a = true
| let b = false
| let c = b || a
| let d = b || false
| let e = b || "default value"
| limit 1
Returns
Grouping functions now ignore missing or null
values
These functions include mean
, min
, max
, stddev
, and median
/ percentile functions. Previously, these functions would treat null as 0.
New Features
Grouping functions can specify a where
or not_null
condition
Only values which meet the condition will be used in the calculation. For example,
logfile='firewall' serverHost='syslog-01' !(Application in ('snmp','http-proxy','http-audio'))
| group Bytes_Received_Greater_Than_10000_Bytes_Only=sum(Bytes_Received where Bytes_Received > 10000) by Application
| sort -Bytes_Received_Greater_Than_10000_Bytes_Only
Only results whose Bytes_Received
attribute contains a value > 10000 bytes will be included in the results (consequently, some of the Application
rows to have 0 as a result):
Similarly, a function like mean(Bytes_Received not_null)
would ignore null values. For the mean
function, this is identical to the updated default behavior, however, it may come in handy in other contexts.
Transpose - limit
For instance, transpose host limit 3
will transpose on the host
field as usual, but will only show the top three host values (the three hosts whose graphs would reach the highest Y value). A negative limit, e.g. transpose host limit -3
, yields the lowest values instead of the highest values.
Transpose - by multiple values
The ability to transpose multiple values, e.g. create a transposed graph showing two or more metrics using the new on
option. For instance, transpose host on
timestamp
will treat all columns except host
and timestamp
as metrics, and will produce a breakdown of each of those columns. For example:
Destination_Zone=*
| group count() by timestamp=timebucket('10m'),Source_User,Protocol
Can be restructured as follows by adding:
...
| transpose Protocol on timestamp, Source_User
number(x)
and bool(x)
These functions convert a value to numeric form or to true/false (e.g. 0 -> false, 1 -> true), respectively. number()
is handy for converting values that were parsed as strings back to numbers for calculations. bool()
also allows emulating the old behavior of ||
, i.e. bool(x || y)
. For example:
tag=*
| let a = 0
| let b = 1
| let c = "2"
| let d = "3"
| let e = bool(a || b)
| let f = number(c) + number(d)
| let g = bool(a)
| let h = bool(b)
| limit 1
Comparison of dotted version strings
pad_version
can be used to compare dotted version strings, such as 3.19.5. It forces each component to have at least five digits: pad_version('3.19.5')
would return "00003.00019.00005". These strings can be compared using standard operators such as <
, >
, and the sort
command.
Sept., 2020
timebucket
The timebucket function enables you to convert the timestamp associated with a log into a generalized interval, such as day, hour, minute, etc. You can then use the timebucket to group log events by day.
("error") && $k8s-deployment == 'paymentservice' "Received error status code unknown" |
group count(1) by timestamp = timebucket("1d")
Returns
Note 1: This function is still pretty new and awaiting additional refinement. For now, it's limited to UTC time. In other words, if I look at the value for today that timebucket("1d") returns, it will be 2020-08-30 17:00 since my account is set to Pacific Standard Time (PST) and this is the equivalent of 2020-08-31 00:00 in UTC
Note 2: Without the timestamp assignment (above), timebucket will return the associated Unix epoch
transpose
Similar to the SQL function, transpose turns rows into columns. For example,
"error" && k8s-deployment = * |
group count(1) by k8s-deployment
Returns
With | transpose k8s-deployment
as the final line, this becomes
querystart / queryend / queryspan
The querystart
and queryend
functions enable the start and end time of a query to be extracted in various units (days, hours, seconds). Default is nanoseconds if no unit is provided. I found this to be extremely useful when generating DataSet search URLs from the results of a PowerQuery. Similarly, queryspan
is used to return the length of the query timerange.
Please note that the query timeframe is defined via the UI's date picker
Comments
0 comments
Please sign in to leave a comment.