The Contents API enables you to extract clean, structured content from web pages with optional AI-powered processing, including summarization and structured data extraction.
Basic Usage
from valyu import Valyu

valyu = Valyu()

response = valyu.contents([
    "https://en.wikipedia.org/wiki/Machine_learning"
])

print(f"Processed {response.urls_processed} of {response.urls_requested} URLs")

if response.results:
    for result in response.results:
        print(f"Title: {result.title}")
        print(f"Content length: {result.length} characters")
        print(f"Content preview: {result.content[:200]}...")
Parameters
URLs (Required)
| Parameter | Type | Description |
|---|---|---|
| urls | List[str] | Array of URLs to process (max 10 sync, max 50 async) |
Options (Optional)
| Parameter | Type | Description | Default |
|---|---|---|---|
| summary | bool, str, or dict | AI processing configuration: False (none), True (auto), string (custom instructions), or dict (JSON schema) | False |
| extract_effort | "normal", "high", or "auto" | Processing effort level for content extraction | "normal" |
| response_length | str or int | Content length per URL: "short" (25k), "medium" (50k), "large" (100k), "max", or a custom character count | "short" |
| screenshot | bool | Request page screenshots. When True, results include a screenshot_url field | False |
Response Format
class ContentsResponse:
    success: bool
    error: Optional[str]
    tx_id: str
    urls_requested: int
    urls_processed: int
    urls_failed: int
    results: List[ContentsResult]
    total_cost_dollars: float
    total_characters: int

class ContentsResult:
    url: str
    title: str
    content: Union[str, dict]  # string for raw content, dict for structured
    description: Optional[str]
    length: int
    price: float
    source: str
    summary_success: Optional[bool]
    data_type: Optional[str]
    image_url: Optional[Dict[str, str]]
    screenshot_url: Optional[str]  # Only present when screenshot=True
    citation: Optional[str]
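Because content can be either a string or a dict, slicing it directly (as in `result.content[:200]`) only works for raw text and raises an error for structured output. The sketch below shows one way to build a preview that handles both shapes. `content_preview` is a hypothetical helper written for this page, not part of the SDK.

```python
import json
from typing import Union


def content_preview(content: Union[str, dict], limit: int = 200) -> str:
    """Return a short preview of raw (str) or structured (dict) content."""
    if isinstance(content, dict):
        # Structured extraction: serialize so it can be truncated like text
        text = json.dumps(content, ensure_ascii=False, sort_keys=True)
    else:
        text = content
    # Only append an ellipsis when something was actually cut off
    return text if len(text) <= limit else text[:limit] + "..."
```

With a real response this would be called as `content_preview(result.content)` inside the loop over `response.results`.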
Parameter Examples
Extract clean content without AI processing:
response = valyu.contents([
    "https://www.python.org",
    "https://nodejs.org"
])

if response.results:
    for result in response.results:
        print(f"{result.title}: {result.length} characters")
AI Summary (Boolean)
Get automatic AI summaries of the extracted content:
response = valyu.contents([
    "https://en.wikipedia.org/wiki/Artificial_intelligence"
], summary=True, response_length="medium")

if response.results and response.results[0].content:
    print("AI Summary:", response.results[0].content)
Custom Summary Instructions
Provide specific instructions for AI summarization:
response = valyu.contents([
"https://en.wikipedia.org/wiki/Artificial_intelligence"
],
summary="Summarize the main AI trends mentioned in exactly 3 bullet points",
response_length="medium",
extract_effort="high")
Extract specific data points using JSON schema:
response = valyu.contents([
"https://www.openai.com"
],
extract_effort="high",
response_length="large",
summary={
"type": "object",
"properties": {
"company_name": {
"type": "string",
"description": "The name of the company"
},
"industry": {
"type": "string",
"enum": ["tech", "finance", "healthcare", "retail", "other"],
"description": "Primary industry sector"
},
"key_products": {
"type": "array",
"items": {"type": "string"},
"maxItems": 5,
"description": "Main products or services"
},
"founded_year": {
"type": "number",
"description": "Year the company was founded"
}
},
"required": ["company_name", "industry"]
})
if response.results and response.results[0].content:
    print("Extracted data:", response.results[0].content)
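Structured output is produced by a model, so it can be worth checking the returned dict before relying on it. The following is a minimal sketch that checks only the `required` and `enum` keywords of the schema you passed as `summary`. It is not a full JSON Schema validator, and `check_extraction` is a hypothetical helper, not an SDK function.

```python
from typing import Any, Dict, List


def check_extraction(data: Any, schema: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in data; an empty list means it passed."""
    if not isinstance(data, dict):
        return ["content is not a dict; structured extraction may have failed"]

    problems: List[str] = []
    # Every field listed under "required" must be present
    for key in schema.get("required", []):
        if key not in data:
            problems.append(f"missing required field: {key}")
    # Fields with an "enum" must hold one of the allowed values
    for key, spec in schema.get("properties", {}).items():
        if key in data and "enum" in spec and data[key] not in spec["enum"]:
            problems.append(f"{key}={data[key]!r} is not one of {spec['enum']}")
    return problems
```

For the example above, `check_extraction(response.results[0].content, schema)` would flag a missing `company_name` or an `industry` value outside the five allowed options, assuming `schema` holds the same dict that was passed as `summary`.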
Response Length Control
Control the amount of content extracted per URL:
response = valyu.contents([
"https://arxiv.org/abs/2301.00001",
"https://arxiv.org/abs/1706.03762",
"https://www.science.org/doi/10.1126/science.1234567"
],
response_length="large", # More content for academic papers
summary="Extract the main research findings and methodology",
extract_effort="high")
Control the extraction quality and processing intensity:
# Normal (default) - Fast
normal_response = valyu.contents(urls, extract_effort="normal")
# High - Enhanced quality for complex layouts and JS heavy pages
high_quality_response = valyu.contents(urls, extract_effort="high")
# Auto - Intelligent effort selection
auto_response = valyu.contents(urls, extract_effort="auto")
Response Length Options
Control content length with predefined or custom limits:
# Predefined lengths
short_response = valyu.contents(urls, response_length="short") # 25k characters
medium_response = valyu.contents(urls, response_length="medium") # 50k characters
large_response = valyu.contents(urls, response_length="large") # 100k characters
full_response = valyu.contents(urls, response_length="max") # No limit
# Custom length
custom_response = valyu.contents(urls, response_length=15000) # Custom character limit
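To make the mapping between the accepted values and character limits explicit, here is a small client-side sketch. It only mirrors the numbers listed above (25k, 50k, 100k, no limit) and is useful for things like pre-sizing buffers or validating user input. `resolve_char_limit` is a hypothetical name, and this is not how the API itself is implemented.

```python
from typing import Optional, Union

# Preset names and their character limits as documented above; None means no limit
PRESET_LIMITS = {"short": 25_000, "medium": 50_000, "large": 100_000, "max": None}


def resolve_char_limit(response_length: Union[str, int]) -> Optional[int]:
    """Translate a response_length value into a character limit (None = unlimited)."""
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(response_length, bool):
        raise TypeError("response_length must be a preset name or an integer")
    if isinstance(response_length, int):
        if response_length <= 0:
            raise ValueError("custom response_length must be positive")
        return response_length
    if response_length not in PRESET_LIMITS:
        raise ValueError(f"unknown response_length preset: {response_length!r}")
    return PRESET_LIMITS[response_length]
```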
Use Case Examples
Research Paper Analysis
Build an AI-powered academic research assistant that extracts and analyzes research papers:
def analyze_research_paper(paper_url: str):
    response = valyu.contents([paper_url],
summary={
"type": "object",
"properties": {
"title": {"type": "string"},
"authors": {
"type": "array",
"items": {"type": "string"}
},
"abstract": {"type": "string"},
"key_contributions": {
"type": "array",
"items": {"type": "string"},
"maxItems": 5,
"description": "Main contributions of the research"
},
"methodology": {
"type": "string",
"description": "Research methodology and approach"
},
"results_summary": {
"type": "string",
"description": "Summary of key findings and results"
},
"implications": {
"type": "string",
"description": "Broader implications and significance"
},
"citations_count": {"type": "number"},
"publication_date": {"type": "string"}
},
"required": ["title", "abstract", "key_contributions", "methodology"]
},
response_length="max",
extract_effort="high")
    # Structured output is returned in `content` (see ContentsResult above)
    if response.success and response.results and isinstance(response.results[0].content, dict):
        analysis = response.results[0].content
        print("=== Research Paper Analysis ===")
        print(f"Title: {analysis['title']}")
        print(f"Authors: {', '.join(analysis.get('authors', []))}")
        print(f"\nAbstract: {analysis['abstract']}")
        print("\nKey Contributions:")
        for i, contrib in enumerate(analysis.get('key_contributions', []), 1):
            print(f"{i}. {contrib}")
        print(f"\nMethodology: {analysis['methodology']}")
        # results_summary and implications are not required by the schema
        print(f"\nResults: {analysis.get('results_summary', 'n/a')}")
        print(f"\nImplications: {analysis.get('implications', 'n/a')}")
        return analysis
    return None

# Usage
paper_analysis = analyze_research_paper(
    "https://arxiv.org/abs/2024.01234"
)
E-commerce Product Intelligence
Create a product research tool that extracts comprehensive product data:
from typing import List

def analyze_products(product_urls: List[str]):
    response = valyu.contents(product_urls,
summary={
"type": "object",
"properties": {
"products": {
"type": "array",
"items": {
"type": "object",
"properties": {
"product_name": {"type": "string"},
"brand": {"type": "string"},
"price": {"type": "string"},
"original_price": {"type": "string"},
"discount_percentage": {"type": "string"},
"description": {"type": "string"},
"key_features": {
"type": "array",
"items": {"type": "string"},
"maxItems": 8
},
"specifications": {
"type": "object",
"description": "Technical specifications"
},
"customer_rating": {"type": "number"},
"review_count": {"type": "number"},
"availability": {
"type": "string",
"enum": ["in_stock", "out_of_stock", "limited", "pre_order"]
},
"shipping_info": {"type": "string"},
"warranty_info": {"type": "string"}
},
"required": ["product_name", "price", "description"]
}
},
"comparison_summary": {
"type": "string",
"description": "Overall comparison of the products"
}
}
},
extract_effort="high",
response_length="large")
    if response.success and response.results and response.results[0].content:
        analysis = response.results[0].content
        print("=== Product Analysis ===")
        for i, product in enumerate(analysis.get('products', []), 1):
            print(f"\n{i}. {product['product_name']}")
            # Only product_name, price, and description are required by the schema
            print(f"   Brand: {product.get('brand', 'n/a')}")
            print(f"   Price: {product['price']}")
            print(f"   Rating: {product.get('customer_rating', 'n/a')}/5 ({product.get('review_count', 0)} reviews)")
            print(f"   Availability: {product.get('availability', 'n/a')}")
            if product.get('key_features'):
                print("   Key Features:")
                for feature in product['key_features']:
                    print(f"     • {feature}")
        print("\n=== Comparison Summary ===")
        print(analysis.get('comparison_summary', ''))
        return analysis
    return None

# Usage
product_comparison = analyze_products([
    "https://amazon.com/product1",
    "https://bestbuy.com/product2",
    "https://target.com/product3"
])
Technical Documentation Processor
Build a documentation analysis tool that extracts API information and technical details:
from typing import List

def process_documentation(doc_urls: List[str]):
    response = valyu.contents(doc_urls,
summary={
"type": "object",
"properties": {
"documentation_overview": {
"type": "string",
"description": "Overview of what the documentation covers"
},
"api_endpoints": {
"type": "array",
"items": {
"type": "object",
"properties": {
"method": {"type": "string"},
"path": {"type": "string"},
"description": {"type": "string"},
"parameters": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {"type": "string"},
"type": {"type": "string"},
"required": {"type": "boolean"},
"description": {"type": "string"}
}
}
},
"response_format": {"type": "string"}
}
}
},
"authentication": {
"type": "object",
"properties": {
"method": {"type": "string"},
"description": {"type": "string"},
"example": {"type": "string"}
}
},
"rate_limits": {"type": "string"},
"code_examples": {
"type": "array",
"items": {
"type": "object",
"properties": {
"language": {"type": "string"},
"example": {"type": "string"},
"description": {"type": "string"}
}
}
},
"common_errors": {
"type": "array",
"items": {"type": "string"}
}
},
"required": ["documentation_overview", "api_endpoints", "authentication"]
},
extract_effort="high",
response_length="large")
    if response.success and response.results and response.results[0].content:
        docs = response.results[0].content
        print("=== API Documentation Analysis ===")
        print(f"\nOverview: {docs['documentation_overview']}")
        print("\n=== Authentication ===")
        auth = docs.get('authentication', {})
        print(f"Method: {auth.get('method')}")
        print(f"Description: {auth.get('description')}")
        print("\n=== API Endpoints ===")
        for i, endpoint in enumerate(docs.get('api_endpoints', []), 1):
            # Nested endpoint fields are not marked required, so read them defensively
            print(f"\n{i}. {endpoint.get('method')} {endpoint.get('path')}")
            print(f"   Description: {endpoint.get('description')}")
            if endpoint.get('parameters'):
                print("   Parameters:")
                for param in endpoint['parameters']:
                    required = "(required)" if param.get('required') else "(optional)"
                    print(f"     • {param.get('name')} ({param.get('type')}) {required}: {param.get('description')}")
        if docs.get('rate_limits'):
            print("\n=== Rate Limits ===")
            print(docs['rate_limits'])
        return docs
    return None

# Usage
api_docs = process_documentation([
    "https://docs.example.com/api-reference",
    "https://developers.service.com/guide"
])
Async Processing
For large-scale extraction (11-50 URLs) or non-blocking workflows, use async mode.
Async mode is required when submitting more than 10 URLs. Each request accepts at most 50 URLs, which are processed in batches of 5 with a 120-second timeout per URL (compared with 25 seconds in sync mode). Jobs expire after 7 days.
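The limits above imply a small planning step when you have an arbitrary list of URLs. The sketch below decides whether a list fits in one sync request or must be split into async requests of at most 50 URLs, and estimates how many server-side batches of 5 a job will run. `plan_requests` and `expected_batches` are hypothetical helpers; only the numbers 10, 50, and 5 come from this page.

```python
import math
from typing import List, Tuple

SYNC_MAX_URLS = 10      # sync limit stated above
ASYNC_MAX_URLS = 50     # async limit stated above
SERVER_BATCH_SIZE = 5   # async jobs are processed in batches of 5


def plan_requests(urls: List[str]) -> List[Tuple[bool, List[str]]]:
    """Split urls into (use_async_mode, chunk) pairs that respect the limits."""
    if not urls:
        return []
    if len(urls) <= SYNC_MAX_URLS:
        return [(False, list(urls))]
    # More than 10 URLs: async mode, at most 50 URLs per request
    return [
        (True, urls[i:i + ASYNC_MAX_URLS])
        for i in range(0, len(urls), ASYNC_MAX_URLS)
    ]


def expected_batches(url_count: int) -> int:
    """Number of server-side batches a job with url_count URLs would run."""
    return math.ceil(url_count / SERVER_BATCH_SIZE)
```

Each `(use_async_mode, chunk)` pair would then map to one `valyu.contents(chunk, async_mode=use_async_mode)` call.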
Submit and wait
The simplest approach — pass wait=True to block until the job completes:
result = valyu.contents(
urls=["https://example.com/page1", "https://example.com/page2", ...],
async_mode=True,
wait=True, # blocks until job completes
poll_interval=5, # seconds between polls (default: 5)
max_wait_time=3600, # max seconds to wait (default: 3600)
)
for r in result["results"]:
    print(f"{r['title']}: {r['length']} characters")

print(f"Total cost: ${result['actual_cost_dollars']}")
Submit, then wait separately
For more control, submit first, then call wait_for_contents_job() with a progress callback:
# Submit — returns immediately
job = valyu.contents(
urls=["https://example.com/page1", "https://example.com/page2", ...],
async_mode=True,
webhook_url="https://your-app.com/webhooks/valyu", # optional
)
print(f"Job ID: {job['job_id']}")
# Store the webhook_secret immediately: it is ONLY returned here.
# save_webhook_secret is a placeholder for your own persistence code.
if job.get("webhook_secret"):
    save_webhook_secret(job["job_id"], job["webhook_secret"])
# Wait with progress tracking
result = valyu.wait_for_contents_job(
job["job_id"],
poll_interval=5,
max_wait_time=3600,
on_progress=lambda s: print(f" {s['status']} — batch {s.get('current_batch', '?')}/{s.get('total_batches', '?')}"),
)
if result["status"] in ("completed", "partial"):
    for r in result["results"]:
        print(f"{r['title']}: {r['length']} characters")
Manual polling
If you prefer full control over the polling loop:
import time

while True:
    status = valyu.get_contents_job(job["job_id"])
    print(f"Status: {status['status']}")
    if status["status"] in ("completed", "partial", "failed"):
        break
    time.sleep(2)
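A bare `while True` loop never gives up if a job stalls. Here is a minimal sketch of the same loop with a deadline. `poll_until_done` is a hypothetical helper, and `fetch_status` stands in for a call such as `lambda: valyu.get_contents_job(job["job_id"])`. The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.

```python
import time
from typing import Any, Callable, Dict

# Terminal states as listed in the polling example above
TERMINAL_STATUSES = {"completed", "partial", "failed"}


def poll_until_done(
    fetch_status: Callable[[], Dict[str, Any]],
    poll_interval: float = 5.0,
    max_wait_time: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll fetch_status until a terminal status or until max_wait_time elapses."""
    deadline = clock() + max_wait_time
    while True:
        status = fetch_status()
        if status["status"] in TERMINAL_STATUSES:
            return status
        # Stop if the next poll would land after the deadline
        if clock() + poll_interval > deadline:
            raise TimeoutError(f"job still {status['status']!r} after {max_wait_time}s")
        sleep(poll_interval)
```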
Async parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| async_mode | bool | Process URLs asynchronously. Required for more than 10 URLs. | False |
| webhook_url | str | HTTPS URL to receive results via webhook POST. | None |
| wait | bool | Block until the job completes (SDK handles polling). | False |
| poll_interval | int | Seconds between polls when wait=True or using wait_for_contents_job. | 5 |
| max_wait_time | int | Max seconds to wait before timing out. | 3600 |
| on_progress | Callable | Callback invoked on each poll with the current status dict. | None |
Webhook verification
Webhooks are signed using HMAC-SHA256 with format "{timestamp}.{json_body}". See the Content Extraction guide for full verification examples.
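As a rough illustration of the signing scheme, the sketch below recomputes an HMAC-SHA256 over `"{timestamp}.{json_body}"` and compares it in constant time. Several details are assumptions rather than documented behavior: that the signature is hex-encoded, that the timestamp is in seconds, how the signature and timestamp reach your handler, and the 300-second replay window. Treat the Content Extraction guide as authoritative. `sign_payload` and `verify_webhook` are hypothetical names.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_payload(secret: str, timestamp: str, json_body: str) -> str:
    """HMAC-SHA256 over "{timestamp}.{json_body}", hex-encoded (encoding is an assumption)."""
    message = f"{timestamp}.{json_body}".encode("utf-8")
    return hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()


def verify_webhook(
    secret: str,
    timestamp: str,
    json_body: str,
    received_signature: str,
    tolerance_seconds: int = 300,  # assumed replay window, not from the docs
    now: Optional[float] = None,
) -> bool:
    """Return True if the signature matches and the timestamp is recent."""
    expected = sign_payload(secret, timestamp, json_body)
    # Constant-time comparison avoids leaking how many characters matched
    if not hmac.compare_digest(expected.encode("utf-8"), received_signature.encode("utf-8")):
        return False
    try:
        sent_at = float(timestamp)  # assumes a Unix timestamp in seconds
    except ValueError:
        return False
    current = time.time() if now is None else now
    return abs(current - sent_at) <= tolerance_seconds
```

Use the raw request body exactly as received for `json_body`; re-serializing parsed JSON can change whitespace or key order and break the signature.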
Async response types
# Initial response (HTTP 202)
class ContentsAsyncResponse:
    success: bool
    job_id: str
    status: str  # Always "pending"
    urls_total: int
    poll_url: str
    tx_id: str
    webhook_secret: Optional[str]  # ONLY returned here; store immediately

# Job status response (polling / wait result)
class ContentsJobResponse:
    success: bool
    job_id: str
    status: str  # "pending" | "processing" | "completed" | "partial" | "failed"
    urls_total: int
    urls_processed: int
    urls_failed: int
    created_at: int  # Milliseconds since epoch
    updated_at: int
    current_batch: Optional[int]  # Present during "processing"
    total_batches: Optional[int]  # Present during "processing"
    results: Optional[List[ContentsResult]]  # Present when completed/partial
    actual_cost_dollars: Optional[float]  # Present when completed/partial
    error: Optional[str]  # Present when partial/failed
Async Client (asyncio)
This section covers the AsyncValyu Python client — calling
the Contents API with async/await inside an event loop. That’s
different from the server-side async jobs described under
Async Processing above, which use
async_mode=True to submit a long-running job server-side. You
can combine both: use AsyncValyu to submit and poll a server-side
async job without blocking your event loop. The last example below
shows how.
AsyncValyu.contents accepts the exact same arguments and returns
the same response types (ContentsResponse,
ContentsJobCreateResponse, ContentsJobStatus) as the synchronous
contents — summary, extract_effort, response_length,
max_price_dollars, screenshot, async_mode, webhook_url,
wait, and the polling knobs all behave identically. The only
difference is that the call is awaited.
import asyncio
from valyu import AsyncValyu

async def main():
    async with AsyncValyu() as valyu:
        response = await valyu.contents(
            urls=["https://arxiv.org/abs/1706.03762"],
        )
        for r in response.results:
            print(r.title, "—", r.length, "chars")

asyncio.run(main())
The natural fit for async contents is many single-URL extractions
running in parallel — for example, expanding a list of search
hits into full-text, or warming a cache from a feed of URLs:
import asyncio
from valyu import AsyncValyu

urls = [
    "https://arxiv.org/abs/1706.03762",
    "https://arxiv.org/abs/2005.14165",
    "https://arxiv.org/abs/2106.09685",
]

async def main():
    async with AsyncValyu() as valyu:
        responses = await asyncio.gather(*[
            valyu.contents(urls=[u]) for u in urls
        ])
        for url, r in zip(urls, responses):
            print(url, "→", r.urls_processed, "/", r.urls_requested)

asyncio.run(main())
Each request extracts one URL; running them through
asyncio.gather means total wall time is the slowest single URL,
not the sum. Bound concurrency with asyncio.Semaphore when your
list grows past a few dozen — see the Async Usage section of the
Python SDK overview for that pattern.
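For reference, bounding concurrency with asyncio.Semaphore generally looks like the sketch below. `gather_bounded` is a hypothetical helper and `worker` is a stand-in for any coroutine function, so this shows the general asyncio pattern rather than the SDK's own recipe.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def gather_bounded(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[R]],
    limit: int = 10,
) -> List[R]:
    """Run worker(item) for every item with at most `limit` calls in flight."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: T) -> R:
        # Each task waits here until one of the `limit` slots is free
        async with semaphore:
            return await worker(item)

    # gather preserves input order in its result list
    return await asyncio.gather(*(run_one(item) for item in items))
```

With the client from the example above, the call would look like `await gather_bounded(urls, lambda u: valyu.contents(urls=[u]), limit=10)`.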
Server-side async jobs without blocking the event loop
When a single job spans 11-50 URLs, you have to use async_mode=True
(see Async Processing above). Calling that
through AsyncValyu means polling the job status without parking a
thread: wait_for_contents_job is awaitable and sleeps between polls
using asyncio.sleep, leaving the event loop free to do other work.
import asyncio
from valyu import AsyncValyu

urls = [f"https://example.com/page{i}" for i in range(1, 31)]

async def main():
    async with AsyncValyu() as valyu:
        job = await valyu.contents(urls=urls, async_mode=True)
        result = await valyu.wait_for_contents_job(
            job.job_id,
            poll_interval=5,
            max_wait_time=3600,
        )
        print(f"Completed {result.urls_processed}/{result.urls_total}")

asyncio.run(main())
You can also submit several server-side async jobs concurrently and
wait on all of them:
async def main():
    async with AsyncValyu() as valyu:
        batches = [
            [f"https://source-a.com/page{i}" for i in range(30)],
            [f"https://source-b.com/page{i}" for i in range(30)],
        ]
        jobs = await asyncio.gather(*[
            valyu.contents(urls=b, async_mode=True) for b in batches
        ])
        results = await asyncio.gather(*[
            valyu.wait_for_contents_job(j.job_id) for j in jobs
        ])
        for b, r in zip(batches, results):
            print(f"batch of {len(b)}: {r.urls_processed} processed")
If you prefer a single call, pass wait=True and AsyncValyu will
submit the job and await its terminal state for you, the same way the
sync client does with blocking I/O:
async def main():
    # `async with` is only valid inside a coroutine
    async with AsyncValyu() as valyu:
        result = await valyu.contents(
            urls=urls,
            async_mode=True,
            wait=True,
            poll_interval=5,
        )
See the Python SDK overview for all
AsyncValyu constructor options and lifecycle patterns.
Error Handling
# Wrapped in a function so the early `return` is valid; the name is illustrative
def extract_with_checks(urls, **options):
    response = valyu.contents(urls, **options)

    if not response.success:
        print("Contents extraction failed:", response.error)
        return None

    # Check for partial failures
    if response.urls_failed and response.urls_failed > 0:
        print(f"{response.urls_failed} of {response.urls_requested} URLs failed")

    # Process successful results
    if response.results:
        for index, result in enumerate(response.results):
            print(f"Result {index + 1}:")
            print(f"  Title: {result.title}")
            print(f"  URL: {result.url}")
            print(f"  Length: {result.length} characters")
            if result.summary_success:
                print(f"  Summary: {result.content}")
    return response
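The response reports how many URLs failed but the fields shown on this page do not say which ones. One possible way to find them is to compare the URLs you requested with the URLs present in the results, as sketched below. This assumes each `result.url` echoes the requested URL verbatim, which may not hold if the service normalizes or follows redirects. `missing_urls` is a hypothetical helper.

```python
from typing import Iterable, List


def missing_urls(requested: Iterable[str], returned: Iterable[str]) -> List[str]:
    """Return requested URLs that have no matching entry in returned, in request order."""
    seen = set(returned)
    return [url for url in requested if url not in seen]
```

With a real response this would be `missing_urls(urls, [r.url for r in response.results or []])`, and the returned list could be passed to a retry.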