Our Engineering Team recently released a number of useful features and improvements. I highly recommend checking them out if you haven't already had a chance to do so. PowerQueries enable DataSet customers to extract valuable insights from their log data by combining our blazing fast search with the analytical capabilities of SQL. Best of all, PowerQueries can be used immediately. Simply upload your log data -- there are no prerequisites!
Changes to or (
a || b now returns
a evaluates to a "true-ish" value (i.e. not false, null, 0, or an empty string), otherwise it evaluates to
a || b evaluated to either
false. The new behavior allows you to write expressions like
x || 'default', which will yield
x is undefined, in which case it yields
default – similar to SQL's COALESCE operator.
| let a = true
| let b = false
| let c = b || a
| let d = b || false
| let e = b || "default value"
| limit 1
Grouping functions now ignore missing or
These functions include
median / percentile functions. Previously, these functions would treat null as 0.
Grouping functions can specify a
Only values which meet the condition will be used in the calculation. For example,
logfile='firewall' serverHost='syslog-01' !(Application in ('snmp','http-proxy','http-audio'))
| group Bytes_Received_Greater_Than_10000_Bytes_Only=sum(Bytes_Received where Bytes_Received > 10000) by Application
| sort -Bytes_Received_Greater_Than_10000_Bytes_Only
Only results whose
Bytes_Received attribute contains a value > 10000 bytes will be included in the results (consequently, some of the
Application rows to have 0 as a result):
Similarly, a function like
mean(Bytes_Received not_null) would ignore null values. For the
mean function, this is identical to the updated default behavior, however, it may come in handy in other contexts.
Transpose - limit
transpose host limit 3 will transpose on the
host field as usual, but will only show the top three host values (the three hosts whose graphs would reach the highest Y value). A negative limit, e.g.
transpose host limit -3, yields the lowest values instead of the highest values.
Transpose - by multiple values
The ability to transpose multiple values, e.g. create a transposed graph showing two or more metrics using the new
on option. For instance,
transpose host on
timestamp will treat all columns except
timestamp as metrics, and will produce a breakdown of each of those columns. For example:
| group count() by timestamp=timebucket('10m'),Source_User,Protocol
Can be restructured as follows by adding:
| transpose Protocol on timestamp, Source_User
These functions convert a value to numeric form or to true/false (e.g. 0 -> false, 1 -> true), respectively.
number() is handy for converting values that were parsed as strings back to numbers for calculations.
bool() also allows emulating the old behavior of
bool(x || y). For example:
| let a = 0
| let b = 1
| let c = "2"
| let d = "3"
| let e = bool(a || b)
| let f = number(c) + number(d)
| let g = bool(a)
| let h = bool(b)
| limit 1
Comparison of dotted version strings
pad_version can be used to compare dotted version strings, such as 3.19.5. It forces each component to have at least five digits:
pad_version('3.19.5') would return "00003.00019.00005". These strings can be compared using standard operators such as
>, and the
The timebucket function enables you to convert the timestamp associated with a log into a generalized interval, such as day, hour, minute, etc. You can then use the timebucket to group log events by day.
("error") && $k8s-deployment == 'paymentservice' "Received error status code unknown" |
group count(1) by timestamp = timebucket("1d")
Note 1: This function is still pretty new and awaiting additional refinement. For now, it's limited to UTC time. In other words, if I look at the value for today that timebucket("1d") returns, it will be 2020-08-30 17:00 since my account is set to Pacific Standard Time (PST) and this is the equivalent of 2020-08-31 00:00 in UTC
Note 2: Without the timestamp assignment (above), timebucket will return the associated Unix epoch
Similar to the SQL function, transpose turns rows into columns. For example,
"error" && k8s-deployment = * |
group count(1) by k8s-deployment
| transpose k8s-deployment as the final line, this becomes
querystart / queryend / queryspan
queryend functions enable the start and end time of a query to be extracted in various units (days, hours, seconds). Default is nanoseconds if no unit is provided. I found this to be extremely useful when generating DataSet search URLs from the results of a PowerQuery. Similarly,
queryspan is used to return the length of the query timerange.
Please note that the query timeframe is defined via the UI's date picker