This update gives you a more granular, streamlined way to request and manage your data collections, so that the datasets you generate match your specific needs.

Understanding When to Use Each API:

The three primary API endpoints serve distinct purposes in the data collection workflow: requesting a collection, checking its status, and initiating it.

Initial Collection Without Customer-Defined View:

Requesting a Collection:

Endpoint: POST https://api.brightdata.com/datasets/request_collection

Parameters:
  • dataset_id (string, required): Dataset ID
  • type (string, required): discover_new OR url_collection
  • inputs (array): JSON array
  • file (multipart): multipart CSV upload
Example
curl "https://api.brightdata.com/datasets/request_collection?dataset_id=gd_l1viktl72bvl7bjuj0&type=discover_new" -H "Authorization: Bearer API_KEY" -H "Content-Type: application/json" -k -d '[{"id":"user-id"}]'
Processing may take several minutes, depending on the number of inputs. When you request discovery (‘discover_new’), finding all links (PDPs) may take additional time.
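The curl call above can also be sketched in Python. This is a minimal sketch using only the standard library; the helper name build_request_collection is hypothetical, and the request is built but not sent, so nothing here assumes the shape of the API's response.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.brightdata.com/datasets"


def build_request_collection(api_key, dataset_id, collection_type, inputs):
    """Build (but do not send) the POST request for request_collection.

    Hypothetical helper: it mirrors the curl example, nothing more.
    """
    if collection_type not in ("discover_new", "url_collection"):
        raise ValueError("type must be discover_new or url_collection")
    # dataset_id and type travel in the query string, as in the curl example.
    query = urllib.parse.urlencode(
        {"dataset_id": dataset_id, "type": collection_type}
    )
    return urllib.request.Request(
        f"{BASE_URL}/request_collection?{query}",
        data=json.dumps(inputs).encode("utf-8"),  # JSON array of inputs
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request_collection(
    "API_KEY", "gd_l1viktl72bvl7bjuj0", "discover_new", [{"id": "user-id"}]
)
```

To send it, pass req to urllib.request.urlopen and read the response body; the request_id needed by the next step is expected to come from that response.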

Checking Status of the Collection Above:

Endpoint: GET https://api.brightdata.com/datasets/request_collection

Parameters:
  • request_id (string, required): Obtained from the response to the previous API call.
  • freshness_ms (string, required): Sets data freshness. If the data was collected within this period (e.g., you requested 1 week and the data was collected 5 days ago), no new scrape occurs. If the data is not fresh, we scrape it now. Common values:
      • 1 week: 604,800,000 ms
      • 1 month: 2,592,000,000 ms
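The two freshness values above are plain unit conversions, which a short sketch can reproduce. The helper names are hypothetical, a month is taken as 30 days (which matches the 2,592,000,000 ms figure), and is_fresh only illustrates the rule on the client side; treating the boundary as inclusive is an assumption, since the actual check is performed by the API.

```python
from datetime import timedelta


def freshness_ms(window: timedelta) -> int:
    """Convert a freshness window to whole milliseconds."""
    return window // timedelta(milliseconds=1)


def is_fresh(age: timedelta, window: timedelta) -> bool:
    """Illustrate the freshness rule: data no older than the window is fresh.

    Assumption: the boundary is inclusive; the API decides this server-side.
    """
    return age <= window


ONE_WEEK_MS = freshness_ms(timedelta(weeks=1))   # 604800000
ONE_MONTH_MS = freshness_ms(timedelta(days=30))  # 2592000000 (30-day month)
```

With a one-week window, data collected 5 days ago counts as fresh, while data collected 8 days ago would be scraped again.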
Example
curl -k "https://api.brightdata.com/datasets/request_collection?request_id=REQUEST_ID&freshness_ms=2592000000" -H "Authorization: Bearer API_KEY" 
Response Indicating Number of Records and Freshness Found:
{
    "dataset_id": "XXX_DATASET_ID",
    "total_lines": 100,
    "fresh_count": 30,
    "name": "linkedin_companies custom input",
    "status": "done",
    "request_id": "XXXX"
}
The request is still running:
{
    "total_lines": 100,
    "status": "running"
}
Issue with one (or more) inputs: in this case the field "url" was sent as "URL", so the required "url" field is reported as missing.
{
    "request_id": "xxxx",
    "error": "Validation failed",
    "error_code": "validation",
    "validation_errors": [
        {
            "line": "{\"URL\":\"https://www.tiktok.com/search?q=tjd\"}",
            "index": 1,
            "errors": [
                ["url", "Required field"]
            ]
        }
    ]
}
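The three response shapes above can be told apart with a small helper. This is a sketch under assumptions: summarize_status is a hypothetical name, only the fields shown in the examples are read, and fresh_count is assumed to count records already within the freshness window.

```python
def summarize_status(response: dict) -> str:
    """Turn a status response into a one-line summary (hypothetical helper)."""
    if "validation_errors" in response:
        problems = []
        for item in response["validation_errors"]:
            # Each entry in "errors" is a [field, message] pair.
            for field, message in item["errors"]:
                problems.append(f"input {item['index']}: {field}: {message}")
        return "invalid: " + "; ".join(problems)
    status = response.get("status")
    if status == "done":
        return (
            f"done: {response['fresh_count']} of "
            f"{response['total_lines']} records are fresh"
        )
    if status == "running":
        return "running"
    return "unknown"


validation_failure = {
    "request_id": "xxxx",
    "error": "Validation failed",
    "error_code": "validation",
    "validation_errors": [
        {
            "line": '{"URL":"https://www.tiktok.com/search?q=tjd"}',
            "index": 1,
            "errors": [["url", "Required field"]],
        }
    ],
}
print(summarize_status(validation_failure))
```

Checking for validation_errors first keeps a failed request from being mistaken for one that is still running.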

Initiating a Collection:

Endpoint: POST https://api.brightdata.com/datasets/initiate_collection

Parameters:
  • request_id (string, required): The unique identifier of the collection request.
  • freshness_ms (string, required): The desired data freshness, in milliseconds.
Example
curl -X POST -k "https://api.brightdata.com/datasets/initiate_collection" -d '{"request_id":"j_ln2x567b2961de0d1x","freshness_ms":2592000000}' -H "Authorization: Bearer API_KEY" -H "content-type: application/json"
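The order of the sections suggests a workflow of request, status check, then initiation. The sketch below wires those steps together, assuming that initiation should wait until the status check reports "done"; that ordering is an assumption, and wait_then_initiate, check_status, and initiate are hypothetical names. The callables are injected so that the HTTP calls stay outside the loop.

```python
import time


def wait_then_initiate(check_status, initiate, attempts=10, delay_s=30.0,
                       sleep=time.sleep):
    """Poll the status check, then initiate once the request is done.

    check_status: callable returning the parsed status response (a dict).
    initiate: callable that sends the initiate_collection POST.
    Assumption: initiation should only follow a "done" status.
    """
    for _ in range(attempts):
        status = check_status()
        if "error" in status:
            # For example, the "Validation failed" response.
            raise RuntimeError(status["error"])
        if status.get("status") == "done":
            return initiate()
        sleep(delay_s)  # still running: wait before asking again
    raise TimeoutError("collection request did not finish in time")
```

In practice, check_status would wrap the GET request_collection call and initiate would wrap the POST shown in the curl example; the 30-second delay is an arbitrary starting point, given that processing may take several minutes.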

Collection After Defining a View:


Initiating a Collection:

Endpoint: POST https://api.brightdata.com/datasets/initiate

Parameters:
  • dataset_id (string, required)
  • view (string, required)
  • type (string, required): discover_new OR url_collection
  • inputs (array): JSON array
  • file (multipart): multipart CSV upload
Example
curl "https://api.brightdata.com/datasets/initiate?dataset_id=XXX_DATASET_ID&type=url_collection&view=XXX_VIEW_ID" -H "Authorization: Bearer API_KEY" -H "Content-Type: application/json" -k -d '[{"id":"user-id"}]'
The dataset is delivered according to the settings configured for this view.

With these endpoints, you can tailor each collection to your project's requirements.
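Since inputs can be supplied as a JSON array or as CSV, a short sketch can show how the two relate and how the view-based URL is assembled. Both helper names are hypothetical, and mapping CSV column headers to input keys is an assumption about how the inputs correspond, not documented behavior.

```python
import csv
import io
import json
import urllib.parse


def inputs_from_csv(csv_text):
    """Read CSV text into a list of input objects, one per row.

    Assumption: CSV column headers correspond to the JSON input keys.
    """
    return [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]


def initiate_url(dataset_id, view, collection_type):
    """Build the URL for the view-based initiate endpoint."""
    query = urllib.parse.urlencode(
        {"dataset_id": dataset_id, "type": collection_type, "view": view}
    )
    return f"https://api.brightdata.com/datasets/initiate?{query}"


url = initiate_url("XXX_DATASET_ID", "XXX_VIEW_ID", "url_collection")
body = json.dumps(inputs_from_csv("id\nuser-id\n"))
```

Here url and body match the URL and the -d payload of the curl example above.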

Retrieving the Results of a Snapshot That Was Already Collected:

curl "https://api.brightdata.com/datasets/snapshots/SNAPSHOT_ID/download" -H "Authorization: Bearer API_KEY"
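The download call can be sketched the same way in Python. The helper name build_snapshot_download and the sample ID are hypothetical; the request is built but not sent, and the format of the returned data is not specified here, so the sketch makes no attempt to parse it.

```python
import urllib.parse
import urllib.request


def build_snapshot_download(api_key, snapshot_id):
    """Build (but do not send) the GET request that downloads a snapshot."""
    # Percent-encode the ID so it is safe to place in the URL path.
    safe_id = urllib.parse.quote(snapshot_id, safe="")
    return urllib.request.Request(
        f"https://api.brightdata.com/datasets/snapshots/{safe_id}/download",
        headers={"Authorization": f"Bearer {api_key}"},
    )


download = build_snapshot_download("API_KEY", "s_example123")
```

Passing download to urllib.request.urlopen returns the raw response bytes, which you can write to a file as they are.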