DataSet makes a number of API endpoints available for querying data, most notably:
- query, for retrieving log lines
- timeseriesQuery, for retrieving numeric or graph data
- powerQuery, for issuing PowerQueries

Power users of these APIs may run into limits which prevent queries from completing successfully. Here are some considerations to keep in mind.
Each query consumes some amount of CPU. Complex queries, particularly via the powerQuery API, and queries over long time periods can require extensive CPU. If you are experiencing API errors due to rate limiting, search for tag='audit' cpuUsage=* in your DataSet account to view an audit trail of your CPU usage and identify which queries are consuming the most CPU. DataSet uses a "leaky bucket" model to limit CPU consumption; more detail on this is available in our documentation.
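The "leaky bucket" idea can be sketched in a few lines: each account's bucket fills with the CPU cost of each query and drains at a constant rate over time, and a query is rejected when it would overflow the bucket. The capacity and drain rate below are illustrative values, not DataSet's actual limits.

```java
// Illustrative sketch of a leaky-bucket CPU limiter. The capacity and
// drain rate are made-up example values, not DataSet's real limits.
class LeakyBucket {
    private final double capacity;   // maximum accumulated CPU-seconds
    private final double drainRate;  // CPU-seconds forgiven per second
    private double level = 0.0;      // current accumulated CPU-seconds
    private long lastUpdateMs;

    LeakyBucket(double capacity, double drainRate, long nowMs) {
        this.capacity = capacity;
        this.drainRate = drainRate;
        this.lastUpdateMs = nowMs;
    }

    /** Returns true if a query costing cpuSeconds is allowed at time nowMs. */
    boolean tryConsume(double cpuSeconds, long nowMs) {
        // Drain the bucket for the time elapsed since the last query.
        level = Math.max(0.0, level - drainRate * (nowMs - lastUpdateMs) / 1000.0);
        lastUpdateMs = nowMs;
        if (level + cpuSeconds > capacity) {
            return false;  // bucket would overflow: the query is rate-limited
        }
        level += cpuSeconds;
        return true;
    }
}
```

The practical consequence of this model is that a burst of expensive queries can be rejected even though the same queries, spaced out over time, would all succeed.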
DataSet query APIs have a hard timeout that prevents queries from running indefinitely and consuming an inordinate amount of resources.
- If a timeseriesQuery times out, we may still be building the timeseries in the background. Try again in 20 minutes or so, at which point the query will likely return successfully.
- If a powerQuery times out, consider reducing the time span, for example from 7 days down to 1 day. You can make multiple API calls programmatically and combine the results. This is analogous to what the DataSet UI does to execute queries over long time periods (those searches where you see a percent-complete progress bar).
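Splitting a long time span into shorter windows can be sketched as follows. This is a generic helper, not part of the DataSet client library; you would issue one API call per returned window and combine the results yourself.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split a long query window into fixed-size chunks so that each
// individual API call covers a shorter time span. Times are epoch millis.
class TimeChunker {
    /** Returns [start, end) pairs covering [startMs, endMs) in chunkMs steps. */
    static List<long[]> chunks(long startMs, long endMs, long chunkMs) {
        List<long[]> result = new ArrayList<>();
        for (long t = startMs; t < endMs; t += chunkMs) {
            // The final chunk is clamped so it never extends past endMs.
            result.add(new long[] { t, Math.min(t + chunkMs, endMs) });
        }
        return result;
    }
}
```

For example, a 7-day window chunked into 1-day steps yields seven [start, end) pairs; you would run the same query against each pair and concatenate or aggregate the per-chunk results.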
If you are using the Java client library, this functionality is available starting with version 6.0.23. To take advantage of it:
- Add a new parameter when calling the QueryService constructor, specifying the number of hours for each chunked query. We suggest 12 hours: new QueryService("...your API token...", 12).
- When using this constructor variant, for now, only the numericQuery() APIs are supported. If you need other APIs, use a separate instance of QueryService created with the old one-argument constructor.
- The NumericQueryResult.values list will contain separate values for each 12-hour query. If you passed buckets=1 (to count the number of matching records), simply sum the values to get the total record count.
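Summing the per-chunk counts is straightforward. The helper below is a standalone sketch that operates on the list of values; in a real program that list would come from a NumericQueryResult returned by numericQuery().

```java
import java.util.List;

// Sketch: with buckets=1, each chunked query contributes one count to the
// values list; the total record count is simply the sum of those entries.
class CountTotaler {
    /** Sums per-chunk record counts into a single total. */
    static double totalCount(List<Double> values) {
        double total = 0.0;
        for (Double v : values) {
            total += (v == null) ? 0.0 : v;  // treat missing buckets as zero
        }
        return total;
    }
}
```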
We will be enhancing the DataSet API to provide similar chunked-search functionality in the future.