Introduction
Due to the open-ended nature of PowerQueries, there are many ways to accomplish a single task. I received some helpful input from our Engineering team regarding the best practices for creating PowerQueries and wanted to take this opportunity to provide them to our customers. I will be adding more to this section over time, so please feel free to check in later.
Example Query
This article will be referencing the following simple query that I (admittedly) wrote:
(($euidalert == *)) && $logfile == "/var/log/argh.log" |
filter ($status==500) |
group c=count() by euidAlert=lower($euidalert) |
filter ($total >= 2) |
sort -c
Query Structure
The $ Prefix
The $ prefix (sigil) does not need to be used and can be omitted from PowerQueries
Implied AND (&&)
The double ampersand (&&) is implied and not necessary in the first line
Line 1
The first line of a PowerQuery is typically a search query — whenever possible, include as much detail here to improve relevance. Combining this criteria into the first line (as opposed to using successive filter
statements) improves the performance of the overall query.
For example, lines 1-2 of the original query can be simplified:
(($euidalert == *)) && $logfile == "/var/log/argh.log" |
filter ($status==500) |
...
// Becomes
euidalert=* logfile == "/var/log/argh.log" status=500
...
Pipe Characters (|)
We recommend prefixing each line with the pipe, as this makes the PowerQuery more readable. It's also easier to distinguish between new lines and continued lines when this is followed
The Final Query
euidalert=* logfile == "/var/log/argh.log" status=500
| group c=count() by euidAlert=lower(euidalert)
| filter total >= 2
| sort -c
By following these steps, our original query was reduced from 160 to 131 characters (nearly 20%) while becoming significantly more legible.
Conclusion
As you can see from the final example, our PowerQuery retained its original functionality with fewer characters and improved legibility. Creating a query with less characters is particularly advantageous when the query is complex and the DataSet API is used to issue it.
Since PowerQueries are relatively similar to SQL, I tend to follow SQL best practices when implementing them, such as:
- Minimize aggregation and avoid using passthroughs. For example, if you're doing a GROUP at an early stage of the query and are carrying a column through multiple GROUP statements, consider calculating this column by aggregation at a later stage (of course, this depends on the data structure and what you're trying to do).
- Avoid sorting until the final level of a query — sorting before then has no benefit and typically isn't needed unless an aggregate function like first / last / newest / oldest requires it.
- Use JOIN and UNION statements sparingly
- Assign shorter / more relevant column names as soon as possible (this is normally done in the SELECT statement), as this reduces the hassle of grouping columns with special characters in their name
- Only extract and process columns that are vital to the final solution
Although PowerQueries are fast, these tips ensure that any query you write will be user friendly and optimized.
Comments
0 comments
Please sign in to leave a comment.