Skip to content

Advanced Filtering

Filters

Advanced Filtering allows users to request data matching 1 or more conditions on any data field.


The use of the filters parameter overrides the following parameters:

  • name
  • url
  • category
  • uses_product
  • uses_category
  • parent
  • language

All of the functionality of those parameters is available via filters=. For example, uses_category=CRM is equivalent to:

[{"field": "tag_categories", "operator": "CONTAINS_ANY", "value": "CRM"}]


When using filters, it is preferable to POST the JSON-encoded data in the body of the request

Example - Python
import requests
import os
import json

response = requests.post(
    "https://sourcestack-api.com/companies",
    headers={
      "X-API-KEY": os.environ["SOURCESTACK_KEY"],
      "Content-Type": 'application/json' # this must be included for POSTs
    },
    data=json.dumps({
      "count_only": True,
      "filters": [{"field": "categories", "operator": "CONTAINS_ALL", "value": "Big Data"}]
    })          
)
print(response.json())


Operators

Filters are composed of one or more operators acting upon any given field - for example:

[{"field": "sku_count", "operator": "GREATER_THAN", "value": 10}]


The type of the field governs which operators may be used with it; not every operator + field combination is viable. Check the Data Fields documentation to confirm the type of each field.


Supported Combinations

operator text (str) number (int+float) list (array) true/false (boolean) datetime (dt_str)
GREATER_THAN
LESS_THAN
EQUALS
NOT_EQUALS
IN
NOT_IN
CONTAINS_ANY
NOT_CONTAINS_ANY
CONTAINS_ALL
NOT_CONTAINS_ALL


Combining Filters

Startup tier SourceStack customers can use up to 5 filters per request (Scaling/Enterprise tiers - up to 10 filters)

The logic of combining multiple filters is equivalent to a logical AND - for example:

field1 NOT IN ('foo', 'bar') AND field2 > 5
[{"field": "field1", "operator": "NOT_IN", "value": ["foo", "bar"]}, {"field": "field2", "operator": "GREATER_THAN", "value": "5"}]

There is no support for OR logic when combining filters - instead, use CONTAINS_ANY or NOT_CONTAINS_ANY, where applicable.


Datetime Formatting

The first_indexed, last_indexed, last_modified, sku_published_at, sku_created_at, sku_updated_at, job_created_at, and job_published_at fields are %Y-%m-%d %H:%M:%S formatted datetimes, standardized to UTC-0 (e.g. 2022-03-03 19:04:07).

Datetime filters will tolerate most datetime formats and epoch timestamps (e.g. 1613964267). We generally recommend using %Y-%m-%d format (e.g. 2020-05-22) where possible.

The API does not accept timezones/offsets and will assume all datetimes input are standardized to UTC-0.

Use of EQUALS and NOT_EQUALS Advanced Filtering operators with datetime fields is permitted but very rare; you must specify the datetime to the second.

You may use the following special values to refer to a timestamp a relative amount of time in the past: LAST_1D, LAST_2D, LAST_7D, LAST_14D, LAST_30D.


Special Filters

SourceStack supports certain custom Advanced Filtering values to enable specific functionality:

value supported operators description
GLYPH CONTAINS_ANY, NOT_CONTAINS_ANY Matches non-ASCII-printable Unicode characters, like emojis (All characters not in Hex 20-7F)
NULL EQUALS, NOT_EQUALS Matches rows where there is no value in the given column
LAST_1D GREATER_THAN, LESS_THAN Matches rows where the datetime value was before or after a given datetime one day ago (relative to midnight that day, UTC)


To get SKUs where the sku_name does not contain emojis or non-Latin characters:

[{"field":"sku_name","operator":"NOT_CONTAINS_ANY","value":"GLYPH"}]

To get companies where the estimated web traffic rank (majestic_rank) is in the top 1,000,000:

[{"field":"majestic_rank","operator":"NOT_EQUALS","value":"NULL"}]

To get job posts that were indexed in the last 7 days:

[{"field": "last_indexed", "operator": "GREATER_THAN", "value": "LAST_7D"}]