Translate SQL to PowerQueries – DataSet Customer Portal

SQL, Structured Query Language, dates back to the 1970s and has been an enduring technology that has organized data and transactions from small to large businesses. To improve accessibility to our customer’s data, DataSet offers "powerqueries" that our customers can take advantage of previous training and knowledge around the SQL language. While powerqueries adhere to the SQL standards, they have their differences because our database is different. This article covers the similarities and differences between the two query languages.

The Search Filter Creates a Table

Log data by itself isn't a structured format, however, with the help of a parser, individual fields can be extracted from the message and formed abstractions of DB's columns. In powerqueries, everything except the join command starts with a regular search statement, so the search is not only a data filter but returns a conceptual table with parsed fields as columns.

As for rows of the table, the number of returning rows is determined by the search filter as well as the time range of the query. In SQL, the time range query is only applicable if the table has a timestamp column. For instance, you can run the following SQL query to get all data from the table sales in the last hour.

mysql> select * 
       from sales 
       where timestamp > now() - interval 1 hour;

For this query to work, it requires the table to have a "timestamp" column with a "datetime" type. In comparison to DataSet, the timestamp is a mandatory field of all data. It enables any powerqueries to scan against the selected time range without additional condition statements.

Screen_Shot_2022-11-02_at_4.58.52_PM.png

Convert SQL to PowerQuery

Now, we have the foundation of table composition in powerqueries, we're ready to review the actual syntax.

Let's use the same query on the DataSet PowerQuery page as an example. The example is to "determine the rate of 404s by URL path, sorted by the worst offenders". The assumption is that the data source is an access log that includes URL paths, status codes, etc. Let's take a closer look at how to get the answer via powerqueries v.s. SQL.

SQL

SELECT COUNT(*) as requests, COUNT(*) FILTER (WHERE status = 404) AS errors, ROUND (COUNT(*) FILTER (WHERE status = 404)/ COUNT(*), 3) as rate, uriPath 
FROM accesslog
GROUP BY uriPath 
HAVING ROUND (COUNT(*) FILTER (WHERE status = 404)/ COUNT(*), 3) > 0.01 
ORDER BY rate DESC;

PowerQuery

dataset = "accesslog"
| group requests = count(), errors = count(status == 404) by uriPath
| let rate = errors / requests
| filter rate > 0.01
| sort -rate

You may find the two statements are quite similar, but PowerQuery is actually easier to read due to its intuitive syntax. For instance, it's much quicker to define new variables such as error and rate in this case. Furthermore, those newly defined variables are immediately available for further filtering due to the flexibilities and dynamic characteristics of powerqueries.

Advanced SQL features that PowerQuery Supports

Besides the common statements such as select, where/having, group by, or order by, powerqueries has a number of advanced commands that match with SQL language:

Ternary

Powerqueries ternary function starts with the let command. You can find examples in this KB article. It behaves exactly the same as the CASE statement or IF function in SQL.

SELECT uriPath, (CASE WHEN status != 200 THEN 'error' ELSE 'ok' END) AS status 
FROM accesslog

Set the status to "ok" if the status code equals 200. Any other values will be an "error'.

Parse

parse is a powerqueries command that extracts a specific string from the field or the entire message. The closest equivalent in SQL is regexp_match() or substring() function.

SELECT substring(uriPath from '.+/([\w]+$)') as path_suffix, uriPath from dataset_table;

Get the suffix of the "uriPath" and assign it to the column "path_suffix"

Lookup

lookup command enables powerqueries users to reference values from a pre-existing table in a DataSet configuration file (i.e. datatables). A similar operation in SQL is to import a table that was previously exported via the "dump" command.

mysql -u username -p -D database_name < tableName.sql

To import a single table into an existing MySQL database

Join

Powerqueries supports both join and left join statements. One slight difference if you're trying to compare it with SQL is that in a PowerQuery command like | join A, B on X, each row in A results in at most one output row. By default, DataSet treatsjoin as a sort of a "lookup" command, where we look for records inBthat can add fields toA.

In contrast, if a record in A matches three records in B, SQL will yield three records in the output, combining the same A row with each of the three B rows.

Here is an example output of a join command in SQL:

select * from x
   a    b
0  1  foo
1  2  bar
2  2  foo


select * from y
   a       c
0  1    spam
1  1     ham
2  2    eggs
3  2   bacon
4  2  cheese

select a,b,c from x join y using (a)
   a    b       c
0  1  foo     ham
1  1  foo    spam
2  2  bar   bacon
3  2  bar  cheese
4  2  bar    eggs
5  2  foo   bacon
6  2  foo  cheese
7  2  foo    eggs

In powerqueries, the join command would output just one row per row in the first child query:

select a,b,c from x join y using (a)
   a    b       c
0  1  foo     ham
1  2  bar   bacon
2  2  foo   bacon

To get the same result as SQL, the user could use the special command sql inner join as a workaround.

The Search Filter Creates a Table

Convert SQL to PowerQuery

Advanced SQL features that PowerQuery Supports

Related articles