Introduction
A customer inquired about identifying logs that were recently modified, but not during the current day. Although calculating the data written to a particular log file is straightforward, identifying files that weren't written to is a bit trickier, since the logVolume
metric is not reported for inactive log files.
This query is useful for anyone interested in the log activity of a particular server, and could be used to identify logs for use in alerts or search queries.
Assumptions
- This query should be run over multiple days (we used 3 days in our example), as it identifies logs that were modified during this period
- Today is 2020-09-04 and we are in Pacific Standard Time (PST).
- The value for today that
timebucket("1d")
returns is 2020-09-03 17:00 (PST) - This is the equivalent of 2020-09-04 00:00 (UTC)
- As previously mentioned, a constraint of the
timebucket
function is its use of the UTC timezone (this will likely be addressed in a future update).
- The value for today that
Query
The final query is provided here for convenience; however, I will break it down by segment and include some tips and explanations that may assist readers with adapting it for their particular needs
tag='logVolume' metric='logBytes' !(forlogfile=='none') |
group log_mb=sum(value) / 1024 / 1024 by ts=timebucket("1d"), host, forlogfile |
columns ts, host, forlogfile, ts2=timebucket(queryend(),"1d") |
let endDate=(ts=ts2) |
sort ts, host, forlogfile |
group tally=count(forlogfile), tallyEndDate=count(endDate), lastActive=last(ts) by host, forlogfile |
filter tallyEndDate=0 |
columns timestamp=lastActive, host, forlogfile, tally, tallyEndDate
Lines 1-5
1: We are evaluating the logBytes
metric for a particular log file
2: Total the logBytes
and convert from bytes to megabytes. Group by day (as an epoch value), host, and logfile. This aggregation condenses the log events down to an essential list.
3: columns
only returns specified columns, similar to a SELECT
statement.
3: ts2=timebucket(queryend(),"1d")
takes the end of the query (as specified in the PowerQuery UI) and converts it to a day and assigns it to each row to identify whether the row was modified at this time.
4: let endDate=(ts=ts2)
generates a boolean (either 0 or 1) based on whether the file was modified on the PowerQuery's end date
5: sort
is needed because the last
function is used in an upcoming group by
statement. Otherwise, you wouldn't want to sort the data returned by the query at this point.
For example,
Lines 6-8
6: We're performing another group by
to further distill the log file statistics. The individual days when the log file was modified is no longer of concern, as we're interested in whether the log file was modified today. We use the sum
function to achieve this (although the largest value a given log file from a server could have is 1 , due to the group by
on line 2).
6: The last
function is used to carry the last date when the log file / server combination was modified
7: filter
restricts the results to log files that were not modified on the day of the PowerQuery's endDate
8: Setting the timestamp=
causes the the epoch to be displayed as a timestamp
Comments
0 comments
Please sign in to leave a comment.