Extract clean, structured content from URLs
Turn any web page into clean, structured data. Extract content from up to 10 URLs per request with optional AI-powered summaries, structured extraction via JSON schemas, and screenshot capture.
- Synchronous (default): Returns extracted content directly
- Async: Set
async: trueto process in the background and poll ajob_id - Pay-per-success: Only charged for URLs that are successfully processed (0.001 with AI features). Failed URLs are free
- Partial / total failure: A
206is returned when some URLs fail; a422is returned when every URL fails
Authorizations
API key for authentication. Get yours at platform.valyu.ai.
Body
Content extraction request parameters.
URLs to extract content from. Must be valid HTTP/HTTPS URLs. Duplicates are silently removed.
1 - 10 elements[
"https://en.wikipedia.org/wiki/Quantum_computing",
"https://arxiv.org/abs/2401.00001"
]Maximum character length of extracted content per URL.
"short"- 25,000 characters (default)"medium"- 50,000 characters"large"- 100,000 characters"max"- No limit- Any positive integer for a custom character limit
short, medium, large, max Maximum budget in USD for this request. If estimated cost exceeds this, the request is rejected. Defaults to 2x the maximum possible cost.
x > 00.05
Controls AI-powered content processing. Adds $0.001 per URL when enabled.
false- No AI processing, raw extracted content only (default)true- Generate a text summary of each page"string"- Generate a summary guided by custom instructions (max 500 characters){json_schema}- Extract structured data matching the provided JSON schema
Controls how pages are rendered for extraction.
"auto"- Automatically selects the best method per URL (default)"normal"- HTTP-only extraction, fastest"high"- Full Chrome rendering, best for JavaScript-heavy pages
auto, normal, high Capture a screenshot of each page. Screenshot URL returned in screenshot_url field of each result.
Process URLs asynchronously in the background. Returns a job_id to poll via GET /v1/contents/jobs/{job_id}. Useful for slow, JavaScript-heavy pages.
HTTPS URL to receive a webhook POST when async processing completes. Only used when async is true. The webhook payload is signed with HMAC-SHA256.
"https://your-app.com/webhooks/valyu"
Response
All URLs processed successfully (synchronous).
Synchronous content extraction response.
Whether the extraction completed successfully.
true
Transaction ID for tracking.
"tx_a1b2c3d4-e5f6-7890-abcd-ef1234567890"
Number of deduplicated URLs submitted.
3
Number of URLs successfully processed.
3
Number of URLs that failed processing.
0
Extraction results for each URL.
Total cost in USD (only successful URLs are charged).
0.003
Sum of all result content lengths.
45230
Error message if some URLs failed. null on complete success.
null

