Advanced Filtering
Filters¶
Advanced Filtering allows users to request data matching 1 or more conditions on any data field.
The use of the filters parameter overrides the following parameters:
nameurlcategoryuses_productuses_categoryparentlanguage
All of the functionality of those parameters is available via filters=. For example, uses_category=CRM is equivalent to:
[{"field": "tag_categories", "operator": "CONTAINS_ANY", "value": "CRM"}]
When using filters, it is preferable to POST the JSON-encoded data in the body of the request
Example - Python
import requests
import os
import json
response = requests.post(
"https://sourcestack-api.com/companies",
headers={
"X-API-KEY": os.environ["SOURCESTACK_KEY"],
"Content-Type": 'application/json' # this must be included for POSTs
},
data=json.dumps({
"count_only": True,
"filters": [{"field": "categories", "operator": "CONTAINS_ALL", "value": "Big Data"}]
})
)
print(response.json())
Operators¶
Filters are composed of one or more operators acting upon any given field - for example:
[{"field": "sku_count", "operator": "GREATER_THAN", "value": 10}]
The type of the field governs which operators may be used with it; not every operator + field combination is viable. Check the Data Fields documentation to confirm the type of each field.
Supported Combinations¶
| operator | text (str) | number (int+float) | list (array) | true/false (boolean) | datetime (dt_str) |
|---|---|---|---|---|---|
GREATER_THAN | |||||
LESS_THAN | |||||
EQUALS | |||||
NOT_EQUALS | |||||
IN | |||||
NOT_IN | |||||
CONTAINS_ANY | |||||
NOT_CONTAINS_ANY | |||||
CONTAINS_ALL | |||||
NOT_CONTAINS_ALL |
Combining Filters¶
Startup tier SourceStack customers can use up to 5 filters per request (Scaling/Enterprise tiers - up to 10 filters)
The logic of combining multiple filters is equivalent to a logical AND - for example:
field1 NOT IN ('foo', 'bar') AND field2 > 5
[{"field": "field1", "operator": "NOT_IN", "value": ["foo", "bar"]}, {"field": "field2", "operator": "GREATER_THAN", "value": "5"}]
There is no support for OR logic when combining filters - instead, use CONTAINS_ANY or NOT_CONTAINS_ANY, where applicable.
Datetime Formatting¶
The first_indexed, last_indexed, last_modified, sku_published_at, sku_created_at, sku_updated_at, job_created_at, and job_published_at fields are %Y-%m-%d %H:%M:%S formatted datetimes, standardized to UTC-0 (e.g. 2022-03-03 19:04:07).
Datetime filters will tolerate most datetime formats and epoch timestamps (e.g. 1613964267). We generally recommend using %Y-%m-%d format (e.g. 2020-05-22) where possible.
The API does not accept timezones/offsets and will assume all datetimes input are standardized to UTC-0.
Use of EQUALS and NOT_EQUALS Advanced Filtering operators with datetime fields is permitted but very rare; you must specify the datetime to the second.
You may use the following special values to refer to a timestamp a relative amount of time in the past: LAST_1D, LAST_2D, LAST_7D, LAST_14D, LAST_30D.
Special Filters¶
SourceStack supports certain custom Advanced Filtering values to enable specific functionality:
| value | supported operators | description |
|---|---|---|
GLYPH | CONTAINS_ANY, NOT_CONTAINS_ANY | Matches non-ASCII-printable Unicode characters, like emojis (All characters not in Hex 20-7F) |
NULL | EQUALS, NOT_EQUALS | Matches rows where there is no value in the given column |
LAST_1D | GREATER_THAN, LESS_THAN | Matches rows where the datetime value was before or after a given datetime one day ago (relative to midnight that day, UTC) |
To get SKUs where the sku_name does not contain emojis or non-Latin characters:
[{"field":"sku_name","operator":"NOT_CONTAINS_ANY","value":"GLYPH"}]
To get companies where the estimated web traffic rank (majestic_rank) is in the top 1,000,000:
[{"field":"majestic_rank","operator":"NOT_EQUALS","value":"NULL"}]
To get job posts that were indexed in the last 7 days:
[{"field": "last_indexed", "operator": "GREATER_THAN", "value": "LAST_7D"}]