SQL, Structured Query Language, dates back to the 1970s and has been an enduring technology that has organized data and transactions from small to large businesses. To improve accessibility to our customer’s data, DataSet offers "powerqueries" that our customers can take advantage of previous training and knowledge around the SQL language. While powerqueries adhere to the SQL standards, they have their differences because our database is different. This article covers the similarities and differences between the two query languages.
The Search Filter Creates a Table
Log data by itself isn't a structured format, however, with the help of a parser, individual fields can be extracted from the message and formed abstractions of DB's columns. In powerqueries, everything except the
join command starts with a regular search statement, so the search is not only a data filter but returns a conceptual table with parsed fields as columns.
As for rows of the table, the number of returning rows is determined by the search filter as well as the time range of the query. In SQL, the time range query is only applicable if the table has a timestamp column. For instance, you can run the following SQL query to get all data from the table sales in the last hour.
mysql> select * from sales where timestamp > now() - interval 1 hour;
For this query to work, it requires the table to have a "timestamp" column with a "datetime" type. In comparison to DataSet, the timestamp is a mandatory field of all data. It enables any powerqueries to scan against the selected time range without additional condition statements.
Convert SQL to PowerQuery
Now, we have the foundation of table composition in powerqueries, we're ready to review the actual syntax.
Let's use the same query on the DataSet PowerQuery page as an example. The example is to "determine the rate of 404s by URL path, sorted by the worst offenders". The assumption is that the data source is an access log that includes URL paths, status codes, etc. Let's take a closer look at how to get the answer via powerqueries v.s. SQL.
SELECT COUNT(*) as requests, COUNT(*) FILTER (WHERE status = 404) AS errors, ROUND (COUNT(*) FILTER (WHERE status = 404)/ COUNT(*), 3) as rate, uriPath
GROUP BY uriPath
HAVING ROUND (COUNT(*) FILTER (WHERE status = 404)/ COUNT(*), 3) > 0.01
ORDER BY rate DESC;
dataset = "accesslog"
| group requests = count(), errors = count(status == 404) by uriPath
| let rate = errors / requests
| filter rate > 0.01
| sort -rate
You may find the two statements are quite similar, but PowerQuery is actually easier to read due to its intuitive syntax. For instance, it's much quicker to define new variables such as error and rate in this case. Furthermore, those newly defined variables are immediately available for further filtering due to the flexibilities and dynamic characteristics of powerqueries.
Advanced SQL features that PowerQuery Supports
Besides the common statements such as
group by, or
by, powerqueries has a number of advanced commands that match with SQL language:
Powerqueries ternary function starts with the
let command. You can find examples in this KB article. It behaves exactly the same as the
CASE statement or
IF function in SQL.
SELECT uriPath, (CASE WHEN status != 200 THEN 'error' ELSE 'ok' END) AS status
Set the status to "ok" if the status code equals 200. Any other values will be an "error'.
parse is a powerqueries command that extracts a specific string from the field or the entire message. The closest equivalent in SQL is
SELECT substring(uriPath from '.+/([\w]+$)') as path_suffix, uriPath from dataset_table;
Get the suffix of the "uriPath" and assign it to the column "path_suffix"
lookup command enables powerqueries users to reference values from a pre-existing table in a DataSet configuration file (i.e. datatables). A similar operation in SQL is to import a table that was previously exported via the "dump" command.
mysql -u username -p -D database_name < tableName.sql
To import a single table into an existing MySQL database
Powerqueries supports both
left join statements. One slight difference if you're trying to compare it with SQL is that in a PowerQuery command like
| join A, B on X, each row in A results in at most one output row. By default, DataSet treats
join as a sort of a "lookup" command, where we look for records in
Bthat can add fields to
In contrast, if a record in A matches three records in B, SQL will yield three records in the output, combining the same A row with each of the three B rows.
joincommand in SQL:
select * from x a b 0 1 foo 1 2 bar 2 2 foo select * from y a c 0 1 spam 1 1 ham 2 2 eggs 3 2 bacon 4 2 cheese select a,b,c from x join y using (a) a b c 0 1 foo ham 1 1 foo spam 2 2 bar bacon 3 2 bar cheese 4 2 bar eggs 5 2 foo bacon 6 2 foo cheese 7 2 foo eggs
In powerqueries, the
join command would output just one row per row in the first child query:
select a,b,c from x join y using (a) a b c 0 1 foo ham 1 2 bar bacon 2 2 foo bacon
To get the same result as SQL, the user could use the special command
sql inner join as a workaround.