Bright Data Best Practices

Install
npx claude-code-templates@latest --skill web-data/bright-data-best-practices

Bright Data APIs

Bright Data provides infrastructure for web data extraction at scale. Four primary APIs cover different use cases — always pick the most specific tool for the job.

Choosing the Right API

| Use Case | API | Why |
|---|---|---|
| Scrape any webpage by URL (no interaction) | Web Unlocker | HTTP-based, auto-bypasses bot detection, cheapest |
| Google / Bing / Yandex search results | SERP API | Specialized for SERP extraction, returns structured data |
| Structured data from Amazon, LinkedIn, Instagram, TikTok, etc. | Web Scraper API | Pre-built scrapers, no parsing needed |
| Click, scroll, fill forms, run JS, intercept XHR | Browser API | Full browser automation |
| Puppeteer / Playwright / Selenium automation | Browser API | Connects via CDP/WebDriver |

Authentication Pattern (All APIs)

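The REST APIs authenticate with an API key sent as a Bearer token. Browser API instead takes zone credentials embedded in the connection URL. Export both as environment variables: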

bash
export BRIGHTDATA_API_KEY="your-api-key"         # From Control Panel > Account Settings
export BRIGHTDATA_UNLOCKER_ZONE="zone-name"       # Web Unlocker zone name
export BRIGHTDATA_SERP_ZONE="serp-zone-name"      # SERP API zone name
export BROWSER_AUTH="brd-customer-ID-zone-NAME:PASSWORD"  # Browser API credentials

REST API authentication header (used by Web Unlocker, SERP API, and Web Scraper API):

Authorization: Bearer YOUR_API_KEY
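
As a minimal sketch of the pattern above, the helpers below read the key from the environment and build the Bearer header. The names `load_api_key` and `auth_headers` are illustrative only and are not part of any Bright Data SDK.

```python
import os


def load_api_key(env=os.environ):
    """Read BRIGHTDATA_API_KEY, failing early if it is missing or blank."""
    key = env.get("BRIGHTDATA_API_KEY", "").strip()
    if not key:
        raise RuntimeError("BRIGHTDATA_API_KEY is not set")
    return key


def auth_headers(api_key):
    """Build the Authorization header used by the REST endpoints."""
    return {"Authorization": f"Bearer {api_key}"}
```

Passing `headers=auth_headers(load_api_key())` to `requests.post` keeps the key out of source files.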

Web Unlocker API

HTTP-based scraping proxy. Best for simple page fetches without browser interaction.

Endpoint: POST https://api.brightdata.com/request

python
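import os

import requests

API_KEY = os.environ["BRIGHTDATA_API_KEY"]  # exported in the authentication step

response = requests.post(
    "https://api.brightdata.com/request",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "zone": "YOUR_ZONE_NAME",
        "url": "https://example.com/product/123",
        "format": "raw"
    }
)
response.raise_for_status()  # surface HTTP errors instead of parsing an error body
html = response.text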

Key Parameters

| Parameter | Type | Description |
|---|---|---|
| zone | string | Zone name (required) |
| url | string | Target URL with http:// or https:// (required) |
| format | string | "raw" (HTML) or "json" (structured wrapper) (required) |
| method | string | HTTP verb, default "GET" |
| country | string | 2-letter ISO for geo-targeting (e.g., "us", "de") |
| data_format | string | Transform: "markdown" or "screenshot" |
| async | boolean | true for async mode |
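
The parameter rules above can be checked before a request is sent. The following is a sketch under stated assumptions: `build_unlocker_payload` is a hypothetical helper, it validates only what the table states, and `run_async` stands in for the `async` field because `async` is a reserved word in Python.

```python
def build_unlocker_payload(zone, url, fmt="raw", method=None, country=None,
                           data_format=None, run_async=False):
    """Assemble a Web Unlocker request body from the documented parameters."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must start with http:// or https://")
    if fmt not in ("raw", "json"):
        raise ValueError('format must be "raw" or "json"')
    if data_format is not None and data_format not in ("markdown", "screenshot"):
        raise ValueError('data_format must be "markdown" or "screenshot"')
    if country is not None and len(country) != 2:
        raise ValueError("country must be a 2-letter ISO code")

    # Required fields first, then only the optional ones that were supplied.
    payload = {"zone": zone, "url": url, "format": fmt}
    if method is not None:
        payload["method"] = method
    if country is not None:
        payload["country"] = country.lower()
    if data_format is not None:
        payload["data_format"] = data_format
    if run_async:
        payload["async"] = True
    return payload
```

The result is meant to be passed as the `json=` argument of `requests.post`.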

Quick Patterns

python
# Get markdown (best for LLM input)
response = requests.post(
    "https://api.brightdata.com/request",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"zone": ZONE, "url": url, "format": "raw", "data_format": "markdown"}
)

# Geo-targeted request
json={"zone": ZONE, "url": url, "format": "raw", "country": "de"}

# Screenshot for debugging
json={"zone": ZONE, "url": url, "format": "raw", "data_format": "screenshot"}

# Async for bulk processing
json={"zone": ZONE, "url": url, "format": "raw", "async": True}

Critical rule: Never use Web Unlocker with Puppeteer, Playwright, Selenium, or anti-detect browsers. Use Browser API instead.

See references/web-unlocker.md for complete reference including proxy interface, special headers, async flow, features, and billing.


SERP API

Structured search engine result extraction for Google, Bing, Yandex, DuckDuckGo.

Endpoint: POST https://api.brightdata.com/request (same as Web Unlocker)

python
response = requests.post(
    "https://api.brightdata.com/request",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "zone": "YOUR_SERP_ZONE",
        "url": "https://www.google.com/search?q=python+web+scraping&brd_json=1&gl=us&hl=en",
        "format": "raw"
    }
)
data = response.json()
for result in data.get("organic", []):
    print(result["rank"], result["title"], result["link"])

Essential Google URL Parameters

| Parameter | Description | Example |
|---|---|---|
| q | Search query | q=python+web+scraping |
| brd_json | Parsed JSON output | brd_json=1 (always use for data pipelines) |
| gl | Country for search | gl=us |
| hl | Language | hl=en |
| start | Pagination offset | start=10 (page 2), start=20 (page 3) |
| tbm | Search type | tbm=nws (news), tbm=isch (images), tbm=vid (videos) |
| brd_mobile | Device | brd_mobile=1 (mobile), brd_mobile=ios |
| brd_browser | Browser | brd_browser=chrome |
| brd_ai_overview | Trigger AI Overview | brd_ai_overview=2 |
| uule | Encoded geo location for precise location targeting | |

Note: The num parameter is deprecated as of September 2025. Use start for pagination.
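
To illustrate how these parameters combine, here is a small sketch that builds the search URL with the standard library. `google_serp_url` is a hypothetical helper, and the 10-results-per-page step is inferred from the `start=10` (page 2) and `start=20` (page 3) examples above.

```python
from urllib.parse import urlencode

RESULTS_PER_PAGE = 10  # inferred from start=10 (page 2), start=20 (page 3)


def google_serp_url(query, page=1, country=None, language=None, search_type=None):
    """Build a Google search URL that requests parsed JSON output."""
    if page < 1:
        raise ValueError("page must be 1 or greater")
    params = {"q": query, "brd_json": 1}
    if country:
        params["gl"] = country
    if language:
        params["hl"] = language
    if search_type:
        params["tbm"] = search_type
    if page > 1:
        # Paginate with start, since num is deprecated.
        params["start"] = (page - 1) * RESULTS_PER_PAGE
    return "https://www.google.com/search?" + urlencode(params)
```

The returned string goes into the `url` field of the request body.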

Parsed JSON Response Structure

json
{
  "organic": [{"rank": 1, "global_rank": 1, "title": "...", "link": "...", "description": "..."}],
  "paid": [],
  "people_also_ask": [],
  "knowledge_graph": {},
  "related_searches": [],
  "general": {"results_cnt": 1240000000, "query": "..."}
}
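
A short sketch of reading this structure defensively, since any section may be empty or absent for a given query. `summarize_serp` is an illustrative name, and it touches only the fields shown in the sample above.

```python
def summarize_serp(data):
    """Return the query, the reported result count, and (rank, title, link) rows."""
    general = data.get("general", {})
    rows = [
        (item.get("rank"), item.get("title", ""), item.get("link", ""))
        for item in data.get("organic", [])
    ]
    return {
        "query": general.get("query"),
        "results_cnt": general.get("results_cnt"),
        "organic": rows,
    }
```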

Bing Key Parameters

| Parameter | Description |
|---|---|
| q | Search query |
| setLang | Language (prefer 4-letter: en-US) |
| cc | Country code |
| first | Pagination (increment by 10: 1, 11, 21...) |
| safesearch | off, moderate, strict |
| brd_mobile | Device type |
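
Bing's `first` parameter is 1-based, which is easy to get wrong when porting Google pagination code. The sketch below shows the arithmetic. `bing_serp_url` is a hypothetical helper, and the `https://www.bing.com/search` base URL is an assumption not stated in this document.

```python
from urllib.parse import urlencode


def bing_serp_url(query, page=1, language=None, country=None):
    """Build a Bing search URL; first runs 1, 11, 21, ... for pages 1, 2, 3, ..."""
    if page < 1:
        raise ValueError("page must be 1 or greater")
    params = {"q": query}
    if language:
        params["setLang"] = language
    if country:
        params["cc"] = country
    params["first"] = (page - 1) * 10 + 1
    return "https://www.bing.com/search?" + urlencode(params)
```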

Async for Bulk SERP

python
# Submit
response = requests.post(
    "https://api.brightdata.com/request",
    params={"async": "1"},
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"zone": SERP_ZONE, "url": "https://www.google.com/search?q=test&brd_json=1", "format": "raw"}
)
response_id = response.headers.get("x-response-id")

# Retrieve (retrieve calls are NOT billed)
result = requests.get(
    "https://api.brightdata.com/serp/get_result",
    params={"response_id": response_id},
    headers={"Authorization": f"Bearer {API_KEY}"}
)

Billing: Pay per 1,000 successful requests only. Async retrieve calls are not billed.

See references/serp-api.md for complete reference including Maps, Trends, Reviews, Lens, Hotels, Flights parameters.


Web Scraper API

Pre-built scrapers for structured data extraction from 100+ platforms. No parsing logic needed.

Sync Endpoint: POST https://api.brightdata.com/datasets/v3/scrape
Async Endpoint: POST https://api.brightdata.com/datasets/v3/trigger

python
# Sync (up to 20 URLs, returns immediately)
response = requests.post(
    "https://api.brightdata.com/datasets/v3/scrape",
    params={"dataset_id": "YOUR_DATASET_ID", "format": "json"},
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": [{"url": "https://www.amazon.com/dp/B09X7M8TBQ"}]}
)

if response.status_code == 200:
    data = response.json()  # Results ready
elif response.status_code == 202:
    snapshot_id = response.json()["snapshot_id"]  # Poll for completion

Parameters

| Parameter | Type | Description |
|---|---|---|
| dataset_id | string | Scraper identifier from the Scraper Library (required) |
| format | string | json (default), ndjson, jsonl, csv |
| custom_output_fields | string | Pipe-separated fields: url\|title\|price |
| include_errors | boolean | Include error info in results |
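
These query parameters can be assembled in one place, as in the sketch below. `scraper_params` is a hypothetical helper, and sending the boolean as the lowercase string `"true"` is an assumption about how the endpoint parses it.

```python
ALLOWED_FORMATS = {"json", "ndjson", "jsonl", "csv"}


def scraper_params(dataset_id, fmt="json", fields=None, include_errors=False):
    """Build the query-string parameters for a Web Scraper API call."""
    if not dataset_id:
        raise ValueError("dataset_id is required")
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")
    params = {"dataset_id": dataset_id, "format": fmt}
    if fields:
        # custom_output_fields is pipe-separated, e.g. url|title|price
        params["custom_output_fields"] = "|".join(fields)
    if include_errors:
        params["include_errors"] = "true"  # assumed serialization
    return params
```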

Request Body

json
{
  "input": [
    { "url": "https://www.amazon.com/dp/B09X7M8TBQ" },
    { "url": "https://www.amazon.com/dp/B0B7CTCPKN" }
  ]
}
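
Because the sync endpoint is described above as accepting up to 20 URLs, longer lists need to be split before sending. A minimal sketch, with `batch_inputs` as an illustrative name:

```python
def batch_inputs(urls, batch_size=20):
    """Yield request bodies of the documented shape, batch_size URLs at a time."""
    urls = list(urls)
    for start in range(0, len(urls), batch_size):
        chunk = urls[start:start + batch_size]
        yield {"input": [{"url": u} for u in chunk]}
```

Each yielded dictionary can be passed as the `json=` argument of one request.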

Poll for Async Results

python
import time

# Trigger
snapshot_id = requests.post(
    "https://api.brightdata.com/datasets/v3/trigger",
    params={"dataset_id": DATASET_ID, "format": "json"},
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": [{"url": u} for u in urls]}
).json()["snapshot_id"]

# Poll
while True:
    status = requests.get(
        f"https://api.brightdata.com/datasets/v3/progress/{snapshot_id}",
        headers={"Authorization": f"Bearer {API_KEY}"}
    ).json()["status"]

    if status == "ready": break
    if status == "failed": raise Exception("Job failed")
    time.sleep(10)

# Download
data = requests.get(
    f"https://api.brightdata.com/datasets/v3/snapshot/{snapshot_id}",
    params={"format": "json"},
    headers={"Authorization": f"Bearer {API_KEY}"}
).json()

Progress status values: starting → running → ready | failed

Data retention: 30 days.

Billing: Per delivered record. Invalid input URLs that fail are still billable.
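
The polling loop shown earlier runs until the job reports ready or failed, so a stuck job would block forever. The sketch below adds an upper bound. `wait_for_snapshot` and its parameters are hypothetical, and `get_status` is any callable that returns the current status string, for example a wrapper around the progress endpoint.

```python
import time


def wait_for_snapshot(get_status, timeout=600.0, interval=10.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns "ready"; raise on failure or timeout."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == "ready":
            return status
        if status == "failed":
            raise RuntimeError("Job failed")
        if clock() >= deadline:
            raise TimeoutError(f"snapshot still {status!r} after {timeout} seconds")
        sleep(interval)
```

Injecting `sleep` and `clock` keeps the function testable without real waiting.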

See references/web-scraper-api.md for complete reference including scraper types, output formats, delivery options, and billing details.


Browser API (Scraping Browser)

Full browser automation via CDP/WebDriver. Handles CAPTCHA, fingerprinting, and anti-bot detection automatically.

Connection:

  • Playwright/Puppeteer: wss://${AUTH}@brd.superproxy.io:9222
  • Selenium: https://${AUTH}@brd.superproxy.io:9515

javascript
const { chromium } = require("playwright-core");

const AUTH = process.env.BROWSER_AUTH;

// Top-level await is unavailable in CommonJS, so run the session inside an async function.
(async () => {
  const browser = await chromium.connectOverCDP(`wss://${AUTH}@brd.superproxy.io:9222`);
  try {
    const page = await browser.newPage();
    page.setDefaultNavigationTimeout(120000); // Always set to 2 minutes

    await page.goto("https://example.com", { waitUntil: "domcontentloaded" });
    const html = await page.content();
  } finally {
    await browser.close(); // Release the session even if navigation fails
  }
})();

python
import asyncio
import os

from playwright.async_api import async_playwright

AUTH = os.environ["BROWSER_AUTH"]

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.connect_over_cdp(f"wss://{AUTH}@brd.superproxy.io:9222")
        page = await browser.new_page()
        page.set_default_navigation_timeout(120000)  # 2 minutes, as in the JavaScript example
        await page.goto("https://example.com", wait_until="domcontentloaded")
        html = await page.content()
        await browser.close()
        return html

html = asyncio.run(main())

Custom CDP Functions

| Function | Purpose |
|---|---|
| Captcha.solve | Manually trigger CAPTCHA solving |
| Captcha.setAutoSolve | Enable/disable auto CAPTCHA solving |
| Proxy.setLocation | Set precise geo location (call BEFORE goto) |
| Proxy.useSession | Maintain same IP across sessions |
| Emulation.setDevice | Apply device profile (iPhone 14, etc.) |
| Emulation.getSupportedDevices | List available device profiles |
| Unblocker.enableAdBlock | Block ads to save bandwidth |
| Unblocker.disableAdBlock | Re-enable ads |
| Input.type | Fast text input for bulk form filling |
| Browser.addCertificate | Install client SSL cert for session |
| Page.inspect | Get DevTools debug URL for live session |

javascript
// CDP session pattern for custom functions.
// This is the Puppeteer form; with Playwright use: await page.context().newCDPSession(page)
const client = await page.target().createCDPSession();

// CAPTCHA solve with timeout
const result = await client.send("Captcha.solve", { timeout: 30000 });

// Precise geo location (must be before goto)
await client.send("Proxy.setLocation", {
  latitude: 37.7749,
  longitude: -122.4194,
  distance: 10,
  strict: true
});

// Block unnecessary resources
await client.send("Network.setBlockedURLs", { urls: ["*google-analytics*", "*.ads.*"] });

// Device emulation
await client.send("Emulation.setDevice", { deviceName: "iPhone 14" });

Session Rules

  • One initial navigation per session — new URL = new session
  • Idle timeout: 5 minutes
  • Max duration: 30 minutes

Geolocation

  • Country-level: append -country-us to credentials username
  • EU-wide: append -country-eu (routes through 29+ European countries)
  • Precise: use Proxy.setLocation CDP command (before navigation)
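
The country suffix goes on the username half of the credentials, before the colon. As a sketch, assuming credentials in the `USERNAME:PASSWORD` form shown for `BROWSER_AUTH`, the hypothetical helpers below add the suffix and build the Playwright/Puppeteer endpoint:

```python
def with_country(auth, country):
    """Append -country-XX to the username part of "USERNAME:PASSWORD" credentials."""
    username, sep, password = auth.partition(":")
    if not sep:
        raise ValueError("auth must look like USERNAME:PASSWORD")
    return f"{username}-country-{country.lower()}:{password}"


def cdp_endpoint(auth):
    """Build the WebSocket endpoint used by Playwright and Puppeteer (port 9222)."""
    return f"wss://{auth}@brd.superproxy.io:9222"
```

For EU-wide routing, pass `"eu"` as the country.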

Error Codes

| Code | Issue | Fix |
|---|---|---|
| 407 | Wrong port | Playwright/Puppeteer → 9222, Selenium → 9515 |
| 403 | Bad auth | Check credentials format and zone type |
| 503 | Service scaling | Wait 1 minute, reconnect |
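
One way to act on these codes is to retry only the transient one. The sketch below encodes the table as data; the names are illustrative, and treating 407 and 403 as non-retryable is an interpretation of the fixes listed, since both call for a configuration change.

```python
BROWSER_API_ERRORS = {
    407: ("Wrong port", "Use 9222 for Playwright/Puppeteer, 9515 for Selenium"),
    403: ("Bad auth", "Check credentials format and zone type"),
    503: ("Service scaling", "Wait 1 minute, reconnect"),
}

RETRY_DELAY_SECONDS = {503: 60}  # only 503 is described as transient


def describe_error(status):
    """Return a one-line hint for a connection error code."""
    issue, fix = BROWSER_API_ERRORS.get(status, ("Unknown", "Not covered by the table"))
    return f"{status}: {issue}. {fix}."


def retry_delay(status):
    """Seconds to wait before reconnecting, or None if a retry will not help."""
    return RETRY_DELAY_SECONDS.get(status)
```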

Billing: Traffic-based only. Block images/CSS/fonts to reduce costs.
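
A possible way to block those resource types from Playwright for Python is a route handler, sketched below. It assumes request interception behaves over a remote CDP connection as it does locally, which this document does not confirm. `should_block` and `block_heavy_resources` are illustrative names.

```python
BLOCKED_RESOURCE_TYPES = {"image", "stylesheet", "font"}


def should_block(resource_type):
    """True for resource types that add traffic but are rarely needed for scraping."""
    return resource_type in BLOCKED_RESOURCE_TYPES


async def block_heavy_resources(route):
    """Playwright route handler: abort images, CSS, and fonts; let the rest through."""
    if should_block(route.request.resource_type):
        await route.abort()
    else:
        await route.continue_()


# Register on an existing Playwright page before navigating:
#     await page.route("**/*", block_heavy_resources)
```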

See references/browser-api.md for complete reference including all CDP functions, bandwidth optimization, CAPTCHA patterns, and debugging.


Detailed References

  • references/web-unlocker.md — Web Unlocker: full parameter list, proxy interface, special headers, async flow, features, billing, anti-patterns
  • references/serp-api.md — SERP API: all Google params (Maps, Trends, Reviews, Lens, Hotels, Flights), Bing params, parsed JSON structure, async, billing
  • references/web-scraper-api.md — Web Scraper API: sync vs async, all parameters, polling, scraper types, output formats, billing
  • references/browser-api.md — Browser API: connection strings, session rules, all CDP functions, geo-targeting, bandwidth optimization, CAPTCHA, debugging, error codes
