Introduction
This article will go through a series of real-world scenarios showing different examples of practical applications of PowerQuery array functions.
Task: We want to get the top 5 destination IP addresses for each of the source client IP addresses
Array functions used:
array_agg(columnName)
This is a grouping operator (for use in a group statement) that creates an array containing all non-null values that are grouped into a particular row.
Note that it collects all non-null values, not just distinct/unique values. The order in which the values are collected is undefined.
dst.ip.address=*
| group total = count() by src.ip.address, dst.ip.address
| sort -total
| group dst_ip_addrs = array_agg(dst.ip.address, 5), count_array = array_agg(format("%d", total), 5) by src.ip.address
| filter net_private(src.ip.address)
Here we count the number of destination IP addresses, grouping by source/client IP address and getting the top 5 of the contacted destination IP addresses as the resulting array and the number of counts in another array accordingly.
Task: We want to get all command line parameters that were used when the process was started
Array functions used:
extract_matches(string, regex)
Returns an array of all regex matches in a string. Matches are not case-sensitive unless _matchcase is used.
If the regular expression has capturing groups, the return is the content of the first capturing group. Returns null if there are no matches.
src.process.cmdline=* endpoint.os='windows'
|let parameters = extract_matches(src.process.cmdline,'\-+[a-z]+[\\s|=|:][\s}{"\-a-z0-9]+')
| filter len(parameters) >1
We extract command line parameters into an array by matching regex we defined as the second argument of the extract_matches function.
Task: We want to get lists of unique IPv4 and IPv6 addresses from the event grouped by location
Array functions used:
array_agg_distinct(columnName)
Same as array_agg(columnName), but keeps unique values
array_filter(array, func(x))
Return a new array, consisting of those entries in the input array for which the function returns true
logfile='eventhub' callerIpAddress=*
|group ips = array_agg_distinct(callerIpAddress ) by properties.location
| filter len(ips) > 1
| let ipv4_array = array_filter(ips, func(x) -> net_ipv4(x))
| let ipv6_array = array_filter(ips, func(x) -> net_ipv6(x))
| filter len(ipv4_array) > 1
First, we are getting array with unique values of IP addresses grouped by location by using array_agg_distinct function, then we filter this array by using array_filter function with func net_ipv4 and func_ipv6 which return true if this is IPv4 or IPv6 address accordingly.
Task: We want to get the first role from the list of the roles grouped by IP address
Example: We have a list of the roles in the event and we want to take only the first role from each of the events and then count them against the IP address
Array functions used:
array_split(columnName, pattern)
Finds all matches for pattern (a regular expression) in string, and uses these as delimiters to split the string into an array.
array_get(array, index)
Return the value at the specified (zero-based) index in the array. A negative value is interpreted relative to the end of the array, e.g. -1 refers to the last entry in the array. If the index is past either end of the array, null is returned.
logfile='eventhub' callerIpAddress=* properties.roles=*
|columns callerIpAddress, properties.roles
|parse "$roles$" from properties.roles
|let roles_arr= array_split(roles, ' ')
|let firstrole = array_get(roles_arr, 0)
|group count() by firstrole,callerIpAddress
With array_split function, we are forming the array from the string field using space as delimiter.
Then using array_get function we fetch the first member of the array accessing it by index 0 and grouping it by callerIpAddress column
Task: We want to get a member of a nested JSON object from an attribute, which consists of the JSON objects
Example: We have an attribute that contains JSON objects, in this case, it is AWS CloudTrail logs, where want to extract "accountId" field
Array functions used:
array_from_json
Parses a string containing JSON object into an array
array_get(array, index)
Return the value at the specified (zero-based) index in the array. A negative value is interpreted relative to the end of the array, e.g. -1 refers to the last entry in the array. If the index is past either end of the array, null is returned.
resources = *
| columns resources , first_json = array_get(array_from_json(resources), 0)
| let accountId = json_object_value(first_json, 'accountId')
| group total = count() by accountId
| sort -total
With array_from_json function we transform the string that contains JSON objects into an array of JSON objects and then with array_get we take only the first JSON object. Using json_object_value function we extract "accountId" field from the JSON object.
Comments
0 comments
Please sign in to leave a comment.