Learn how to trigger data collection using the Web Scraper API with options for discovery and PDP scrapers. Customize requests, set delivery options, and retrieve data efficiently.
Inputs can be passed inline as a JSON array of objects.
Example: [{"url":"https://www.airbnb.com/rooms/50122531"}]
data: inputs can also be uploaded from a file.
Example (curl): data=@path/to/your/file.csv
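A minimal sketch of both ways to trigger a collection. The endpoint URL, the dataset_id query parameter, and the multipart form field shown here are assumptions; confirm the exact names against the API reference for your dataset.

# Inline JSON inputs
curl -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=gd_l1vikfnt1wgvvqz95w" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '[{"url":"https://www.airbnb.com/rooms/50122531"}]'

# Inputs uploaded from a file instead of inline JSON
curl -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=gd_l1vikfnt1wgvvqz95w" \
  -H "Authorization: Bearer <token>" \
  -F 'data=@path/to/your/file.csv'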
There are two scraper types, and the expected input differs between them (illustrated below):
PDP (with URL input): the input is always a URL, pointing to the page to be scraped.
Discovery (by method): inputs can vary according to the specific scraper and discovery method, for example keywords, category or best-sellers URLs, and locations.
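For illustration, a PDP input versus a discovery input might look like the sketch below. The keyword field name and its value are assumptions; the actual input fields depend on the specific scraper.

PDP input:       [{"url":"https://www.airbnb.com/rooms/50122531"}]
Discovery input: [{"keyword":"coffee maker"}]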
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Dataset ID for which data collection is triggered. Example: "gd_l1vikfnt1wgvvqz95w"
List of output columns, separated by | (e.g., url|about.updated_on). Filters the response to include only the specified fields. Example: "url|about.updated_on"
Set it to "discover_new" to trigger a collection that includes a discovery phase.
Specifies which discovery method to use. Available options: "keyword", "best_sellers_url", "category_url", "location" and more (according to the specific API). Relevant only for collections that include a discovery phase.
Include an errors report with the results.
Limit the number of results per input (x >= 1). Relevant only for collections that include a discovery phase.
Limit the total number of results (x >= 1).
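A sketch of a discovery-phase trigger. The type, discover_by, and limit_per_input parameter names are assumptions; only the values "discover_new" and "keyword" come from the options described above, and the input field again depends on the specific scraper.

curl -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=gd_l1vikfnt1wgvvqz95w&type=discover_new&discover_by=keyword&limit_per_input=10" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '[{"keyword":"coffee maker"}]'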
URL where the notification will be sent once the collection is finished. Notification will contain snapshot_id and status.
Webhook URL where data will be delivered.
Specifies the format of the data to be delivered to the webhook endpoint. Available options: json, ndjson, jsonl, csv.
Authorization header to be used when sending the notification to the notify URL or delivering data via the webhook endpoint.
By default, the data will be sent to the webhook compressed. Pass true to send it uncompressed.
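A sketch of delivery options on the trigger call. The notify, endpoint, format, and uncompressed_webhook parameter names are assumptions, and https://example.com/... stands in for your own URLs (URL values passed as query parameters should be URL-encoded, as shown).

curl -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=gd_l1vikfnt1wgvvqz95w&notify=https%3A%2F%2Fexample.com%2Fnotify&endpoint=https%3A%2F%2Fexample.com%2Fwebhook&format=json&uncompressed_webhook=true" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '[{"url":"https://www.airbnb.com/rooms/50122531"}]'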
The body is of type object[] (an array of input objects).
Collection job successfully started
The response is of type object.
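As a sketch, a successful response is an object carrying the snapshot identifier referenced above; the exact shape may vary, and the value shown is a placeholder.

{"snapshot_id": "<snapshot_id>"}

The snapshot_id can then be used to check the collection status and retrieve results once the job finishes, as indicated in the notify description above.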