Best Practices for Querying DataSet via the API – DataSet Customer Portal

DataSet makes a number of API endpoints available for querying data, most notably query for retrieving log lines, timeseriesQuery for retrieving numeric or graph data, and powerQuery for issuing PowerQueries. Power users of these APIs may run into limits that prevent queries from completing successfully. Here are some considerations to keep in mind.

Rate Limiting

Each query uses some amount of CPU. Particularly with the query and powerQuery APIs, complex queries or queries over long time periods can require extensive CPU. If you are experiencing API errors due to rate limiting, search for tag='audit' cpuUsage=* in your DataSet account to view an audit trail of your CPU usage and identify the queries that consume the most CPU. DataSet uses a "leaky bucket" model to limit CPU consumption; more detail on this is available in our documentation.
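
To build intuition for how a leaky bucket behaves, here is a minimal, generic sketch. It is not DataSet's implementation: the class name, the capacity, and the leak rate are all hypothetical, and the real accounting is described in the documentation mentioned above. Each request adds its CPU cost to the bucket, the bucket drains at a steady rate, and a request that would overflow the bucket is rejected.

```java
/** Generic leaky-bucket model. Capacity and leak rate are hypothetical, not DataSet's real limits. */
final class LeakyBucket {
    private final double capacity;      // maximum CPU-seconds the bucket can hold
    private final double leakPerSecond; // CPU-seconds drained per wall-clock second
    private double level = 0.0;         // current fill level
    private double lastSeconds = 0.0;   // time of the previous call

    LeakyBucket(double capacity, double leakPerSecond) {
        this.capacity = capacity;
        this.leakPerSecond = leakPerSecond;
    }

    /** Tries to charge cost CPU-seconds at time nowSeconds; returns false if that would overflow the bucket. */
    boolean tryConsume(double cost, double nowSeconds) {
        // Drain the bucket for the time elapsed since the previous call.
        double elapsed = Math.max(0.0, nowSeconds - lastSeconds);
        level = Math.max(0.0, level - elapsed * leakPerSecond);
        lastSeconds = nowSeconds;

        if (level + cost > capacity) {
            return false; // rate limited: the caller should back off and retry later
        }
        level += cost;
        return true;
    }
}
```

In this model, a burst of expensive queries fills the bucket quickly, while the same queries spread over time succeed because the bucket has drained between calls.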

Timeouts

DataSet query APIs have a hard timeout that prevents queries from running indefinitely and consuming an inordinate amount of resources. If a timeseriesQuery times out, we may still be building the timeseries. Try again in 20 minutes or so, at which point the query will likely return successfully.

For query and powerQuery, consider reducing the time span, for example from 7 days down to 1 day. You can make multiple API calls programmatically and combine the results. This is analogous to what the DataSet UI does to execute queries over long time periods (those searches where you see a % complete progress bar).
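
The splitting and combining step can be sketched roughly as follows. This is an illustration under stated assumptions, not part of any DataSet library: ChunkedQuery, splitRange, queryInChunks, and the runQuery callback are hypothetical names, and runQuery stands in for whatever code issues a single query or powerQuery call for one time window.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

final class ChunkedQuery {
    /** Splits [start, end) into consecutive windows no longer than chunk. */
    static List<Instant[]> splitRange(Instant start, Instant end, Duration chunk) {
        if (chunk.isZero() || chunk.isNegative()) {
            throw new IllegalArgumentException("chunk must be positive");
        }
        List<Instant[]> windows = new ArrayList<>();
        Instant cursor = start;
        while (cursor.isBefore(end)) {
            Instant next = cursor.plus(chunk);
            if (next.isAfter(end)) {
                next = end; // the final window may be shorter than chunk
            }
            windows.add(new Instant[] {cursor, next});
            cursor = next;
        }
        return windows;
    }

    /**
     * Runs runQuery once per window and concatenates the results in time order.
     * runQuery is a hypothetical stand-in for one API call over [windowStart, windowEnd).
     */
    static <T> List<T> queryInChunks(Instant start, Instant end, Duration chunk,
                                     BiFunction<Instant, Instant, List<T>> runQuery) {
        List<T> combined = new ArrayList<>();
        for (Instant[] window : splitRange(start, end, chunk)) {
            combined.addAll(runQuery.apply(window[0], window[1]));
        }
        return combined;
    }
}
```

Concatenation suits log lines. For aggregates such as counts, the per-window results would be merged with the matching operation instead (for example, summed).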

If you are using the Java client library, this functionality is available starting with version 6.0.23. To take advantage of it, do the following:

Add a new parameter when calling the QueryService constructor, specifying the number of hours for each query. We suggest 12 hours: new QueryService("...your API token...", 12).
When using this constructor variant, for now, only the logQuery() and numericQuery() APIs are supported. If you need other APIs, you'll need to use a separate instance of QueryService where you use the old one-argument constructor.
For numericQuery, the NumericQueryResult.values list will contain separate values for each 12-hour query. If you passed function="count" and buckets=1 (to count the number of matching records), then you'll want to simply sum the values to get the total record count.
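
As a rough illustration of the summing step, the helper below adds up a list of per-chunk counts. It is kept free of the client library so that it runs on its own. ChunkedCount and totalCount are hypothetical names, and the library calls shown in the comment are an assumption about how the pieces fit together, so check them against the version of the client you use.

```java
import java.util.List;

final class ChunkedCount {
    /**
     * Sums per-chunk counts, such as the entries of NumericQueryResult.values
     * when function="count" and buckets=1 are used with a chunked QueryService.
     */
    static long totalCount(List<Double> perChunkCounts) {
        double sum = 0.0;
        for (Double value : perChunkCounts) {
            if (value != null) { // defensive: skip missing values
                sum += value;
            }
        }
        return Math.round(sum);
    }

    // Assumed usage with the client library on the classpath (not verified here):
    //   QueryService service = new QueryService("...your API token...", 12);
    //   NumericQueryResult result = service.numericQuery(filter, "count", startTime, endTime, 1);
    //   long total = ChunkedCount.totalCount(result.values);
}
```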

We will be enhancing the DataSet API to provide similar chunked-search functionality in the future.
