Advanced Filtering
Filters¶
Advanced Filtering allows users to request data matching 1 or more conditions on any data field.
The use of the filters
parameter overrides the following parameters:
name
url
category
uses_product
uses_category
parent
language
All of the functionality of those parameters is available via filters=
. For example, uses_category=CRM
is equivalent to:
[{"field": "tag_categories", "operator": "CONTAINS_ANY", "value": "CRM"}]
When using filters
, it is preferable to POST
the JSON-encoded data in the body
of the request
Example - Python
import requests
import os
import json
response = requests.post(
"https://sourcestack-api.com/companies",
headers={
"X-API-KEY": os.environ["SOURCESTACK_KEY"],
"Content-Type": 'application/json' # this must be included for POSTs
},
data=json.dumps({
"count_only": True,
"filters": [{"field": "categories", "operator": "CONTAINS_ALL", "value": "Big Data"}]
})
)
print(response.json())
Operators¶
Filters are composed of one or more operators
acting upon any given field
- for example:
[{"field": "sku_count", "operator": "GREATER_THAN", "value": 10}]
The type
of the field
governs which operators
may be used with it; not every operator
+ field
combination is viable. Check the Data Fields documentation to confirm the type of each field.
Supported Combinations¶
operator | text (str) | number (int+float) | list (array) | true/false (boolean) | datetime (dt_str) |
---|---|---|---|---|---|
GREATER_THAN | |||||
LESS_THAN | |||||
EQUALS | |||||
NOT_EQUALS | |||||
IN | |||||
NOT_IN | |||||
CONTAINS_ANY | |||||
NOT_CONTAINS_ANY | |||||
CONTAINS_ALL | |||||
NOT_CONTAINS_ALL |
Combining Filters¶
Startup tier SourceStack customers can use up to 5 filters per request (Scaling/Enterprise tiers - up to 10 filters)
The logic of combining multiple filters is equivalent to a logical AND
- for example:
field1 NOT IN ('foo', 'bar') AND field2 > 5
[{"field": "field1", "operator": "NOT_IN", "value": ["foo", "bar"]}, {"field": "field2", "operator": "GREATER_THAN", "value": "5"}]
There is no support for OR
logic when combining filters - instead, use CONTAINS_ANY
or NOT_CONTAINS_ANY
, where applicable.
Datetime Formatting¶
The first_indexed
, last_indexed
, last_modified
, sku_published_at
, sku_created_at
, sku_updated_at
, job_created_at
, and job_published_at
fields are %Y-%m-%d %H:%M:%S
formatted datetimes, standardized to UTC-0 (e.g. 2022-03-03 19:04:07
).
Datetime filters will tolerate most datetime formats and epoch timestamps (e.g. 1613964267
). We generally recommend using %Y-%m-%d
format (e.g. 2020-05-22
) where possible.
The API does not accept timezones/offsets and will assume all datetimes input are standardized to UTC-0.
Use of EQUALS
and NOT_EQUALS
Advanced Filtering operators with datetime
fields is permitted but very rare; you must specify the datetime to the second.
You may use the following special values to refer to a timestamp a relative amount of time in the past: LAST_1D
, LAST_2D
, LAST_7D
, LAST_14D
, LAST_30D
.
Special Filters¶
SourceStack supports certain custom Advanced Filtering values to enable specific functionality:
value | supported operators | description |
---|---|---|
GLYPH | CONTAINS_ANY , NOT_CONTAINS_ANY | Matches non-ASCII-printable Unicode characters, like emojis (All characters not in Hex 20-7F ) |
NULL | EQUALS , NOT_EQUALS | Matches rows where there is no value in the given column |
LAST_1D | GREATER_THAN , LESS_THAN | Matches rows where the datetime value was before or after a given datetime one day ago (relative to midnight that day, UTC) |
To get SKUs where the sku_name does not contain emojis or non-Latin characters:
[{"field":"sku_name","operator":"NOT_CONTAINS_ANY","value":"GLYPH"}]
To get companies where the estimated web traffic rank (majestic_rank
) is in the top 1,000,000:
[{"field":"majestic_rank","operator":"NOT_EQUALS","value":"NULL"}]
To get job posts that were indexed in the last 7 days:
[{"field": "last_indexed", "operator": "GREATER_THAN", "value": "LAST_7D"}]