Content Endpoint Guide

Turn any web page into clean, structured data. The Contents API extracts content from URLs with batch processing, AI-powered summaries, and structured outputs.

What You Can Do

Feed your AI - Clean data without noise
Aggregate content - Extract structured data from multiple sources
Transform content - Convert web pages into usable formats
Automate research - Pull key information from articles, papers, and reports

Features

Batch Processing

Submit up to 10 URLs per request.
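
For URL lists longer than this limit, one option is to split them into groups of 10 before calling the API. The sketch below is SDK-independent; `chunk_urls` and `MAX_URLS_PER_REQUEST` are illustrative names, not part of the Valyu API.

```python
from typing import Iterator, List

# Limit stated above: at most 10 URLs per request.
MAX_URLS_PER_REQUEST = 10


def chunk_urls(urls: List[str], size: int = MAX_URLS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


if __name__ == "__main__":
    urls = [f"https://example.com/page-{i}" for i in range(23)]
    for batch in chunk_urls(urls):
        # Each batch could be passed as the `urls` argument of one request.
        print(len(batch))
```

Each yielded batch can then be sent as its own request.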

AI-Powered Structuring

Use JSON schemas to extract specific data points.

Smart Summarisation

Generate tailored summaries with custom instructions.

Pay-per-Success

Only pay for URLs that are successfully processed.

Getting Started

Basic Extraction

from valyu import Valyu

valyu = Valyu()  # Uses VALYU_API_KEY from env

data = valyu.contents(
    urls=[
        "https://techcrunch.com/category/artificial-intelligence/",
    ],
    response_length="medium",
    extract_effort="auto",
)
print(data["results"][0]["content"][:500])

Returns clean markdown content for each URL.

Response Length

Length	Characters	Use for
`short`	25,000	Summaries, key points
`medium`	50,000	Articles, blog posts
`large`	100,000	Academic papers, long-form content
`max`	Unlimited	Full document extraction
Custom integer	1,000-1,000,000	Specific requirements
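
A client-side check can catch an invalid `response_length` before a request is sent. This is a minimal sketch based only on the table above; `validate_response_length` is a hypothetical helper, and treating the 1,000 and 1,000,000 bounds as inclusive is an assumption.

```python
from typing import Union

# Preset names and custom range taken from the table above.
NAMED_LENGTHS = {"short", "medium", "large", "max"}
MIN_CUSTOM_LENGTH = 1_000
MAX_CUSTOM_LENGTH = 1_000_000  # assumed inclusive


def validate_response_length(value: Union[str, int]) -> Union[str, int]:
    """Return `value` unchanged if it matches the table, else raise ValueError."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        raise ValueError("response_length must be a preset name or an integer")
    if isinstance(value, str):
        if value not in NAMED_LENGTHS:
            raise ValueError(f"unknown preset: {value!r}")
        return value
    if isinstance(value, int):
        if not MIN_CUSTOM_LENGTH <= value <= MAX_CUSTOM_LENGTH:
            raise ValueError("custom length must be between 1,000 and 1,000,000")
        return value
    raise ValueError("response_length must be a preset name or an integer")
```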

Extract Effort

Effort	Description
`normal`	Standard speed and quality (default)
`high`	Better quality, slower
`auto`	Automatically chooses the right level

Screenshot Capture

Capture visual screenshots of pages alongside content extraction:

from valyu import Valyu

valyu = Valyu()

data = valyu.contents(
    urls=["https://example.com/article"],
    extract_effort="auto",
    screenshot=True,
)
print(data["results"][0]["screenshot_url"])

Screenshots are captured during page rendering and returned as pre-signed URLs. PDF files do not support screenshots.
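
Because PDFs do not produce screenshots, a mixed batch may contain results without a `screenshot_url`. The sketch below collects the URLs defensively from a response shaped like the examples in this guide; `screenshot_urls` is a hypothetical helper, and it assumes the field is either missing or null when no screenshot was taken.

```python
from typing import Any, Dict, List, Optional


def screenshot_urls(data: Dict[str, Any]) -> List[Optional[str]]:
    """Return one entry per result: the screenshot URL, or None if there is none."""
    return [result.get("screenshot_url") for result in data.get("results", [])]


if __name__ == "__main__":
    # Hand-written sample data, not real API output.
    sample = {
        "results": [
            {"url": "https://example.com/article", "screenshot_url": "https://example.com/shot.png"},
            {"url": "https://example.com/report.pdf"},
        ]
    }
    for url in screenshot_urls(sample):
        print(url if url else "no screenshot available")
```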

Advanced Features

Summary Options

The `summary` parameter accepts four types of values:

No AI Processing (`false`)

from valyu import Valyu

valyu = Valyu()

data = valyu.contents(
    urls=["https://example.com/article"],
    extract_effort="normal",
    summary=False,
)
print(data["results"][0]["content"][:300])

Basic Summary (`true`)

from valyu import Valyu

valyu = Valyu()

data = valyu.contents(
    urls=["https://example.com/article"],
    extract_effort="auto",
    summary=True,
)
print(data["results"][0]["content"])

Custom Instructions (`string`)

from valyu import Valyu

valyu = Valyu()

data = valyu.contents(
    urls=["https://example.com/research-paper"],
    extract_effort="auto",
    summary="Summarise the methodology, key findings, and practical applications in 2-3 paragraphs",
)
print(data["results"][0]["content"])

Structured Extraction (`object`)

from valyu import Valyu

valyu = Valyu()

data = valyu.contents(
    urls=["https://example.com/product-page"],
    extract_effort="auto",
    summary={
        "type": "object",
        "properties": {
            "product_name": {"type": "string", "description": "Name of the product"},
            "price": {"type": "number", "description": "Product price in USD"},
            "features": {
                "type": "array",
                "items": {"type": "string"},
                "maxItems": 5,
                "description": "Key product features",
            },
            "availability": {
                "type": "string",
                "enum": ["in_stock", "out_of_stock", "preorder"],
                "description": "Product availability status",
            },
        },
        "required": ["product_name", "price"],
    },
)
print(data["results"][0]["content"])

JSON Schema Reference

For structured extraction, you can use any valid JSON Schema. See the JSON Schema Type Reference for details. Limits:

5,000 characters max
3 levels deep max
20 properties per object max
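
These limits can be checked locally before a schema is sent. The sketch below is one possible check, not the API's own validation: `schema_problems` is a hypothetical helper, the character count is measured on the `json.dumps` output, and depth is counted with the root object as level 1. The API may measure both differently.

```python
import json
from typing import Any, List

# Limits listed above.
MAX_SCHEMA_CHARS = 5_000
MAX_DEPTH = 3
MAX_PROPERTIES = 20


def schema_problems(schema: Any) -> List[str]:
    """Return human-readable limit violations; an empty list means none were found."""
    problems: List[str] = []

    size = len(json.dumps(schema))
    if size > MAX_SCHEMA_CHARS:
        problems.append(f"schema is {size} characters (max {MAX_SCHEMA_CHARS})")

    def walk(node: Any, depth: int, path: str) -> None:
        if not isinstance(node, dict):
            return
        if node.get("type") == "object":
            if depth > MAX_DEPTH:
                problems.append(f"{path} is nested {depth} levels deep (max {MAX_DEPTH})")
            properties = node.get("properties", {})
            if len(properties) > MAX_PROPERTIES:
                problems.append(f"{path} has {len(properties)} properties (max {MAX_PROPERTIES})")
            for name, child in properties.items():
                walk(child, depth + 1, f"{path}.{name}")
        elif node.get("type") == "array":
            # Assumption: arrays do not add a level; only nested objects do.
            walk(node.get("items"), depth, f"{path}[]")

    walk(schema, 1, "$")
    return problems
```

Running a check like this in tests or at startup surfaces an oversized schema before any request is made.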

Common types:

string - Text with optional format validation
number / integer - Numbers with optional min/max
boolean - True/false
array - Lists with optional size limits
object - Nested structures

Examples

News Aggregator

Extract structured article data:

{
  "urls": [
    "https://techcrunch.com/category/artificial-intelligence/",
    "https://venturebeat.com/category/entrepreneur/",
    "https://www.bbc.co.uk/news/technology"
  ],
  "extract_effort": "auto",
  "summary": {
    "type": "object",
    "properties": {
      "headline": { "type": "string" },
      "summary_text": { "type": "string" },
      "category": { "type": "string" },
      "tags": {
        "type": "array",
        "items": { "type": "string" },
        "maxItems": 5
      }
    },
    "required": ["headline"]
  }
}

Research Paper

Extract structured academic data:

{
  "urls": ["https://arxiv.org/paper/example"],
  "response_length": "max",
  "extract_effort": "high",
  "summary": {
    "type": "object",
    "properties": {
      "title": { "type": "string" },
      "abstract": { "type": "string" },
      "methodology": { "type": "string" },
      "key_findings": {
        "type": "array",
        "items": { "type": "string" }
      },
      "limitations": { "type": "string" }
    },
    "required": ["title"]
  }
}

Product Info

Extract product data:

{
  "urls": ["https://company.com/product-A", "https://company.com/product-B"],
  "extract_effort": "auto",
  "summary": {
    "type": "object",
    "properties": {
      "product_name": { "type": "string" },
      "features": {
        "type": "array",
        "items": { "type": "string" }
      },
      "pricing": { "type": "string" },
      "target_audience": { "type": "string" }
    },
    "required": ["product_name"]
  }
}

Response Format

Raw Content (summary: false)

{
  "success": true,
  "error": null,
  "tx_id": "tx_12345678-1234-1234-1234-123456789abc",
  "results": [
    {
      "title": "AI Breakthrough in Natural Language Processing",
      "url": "https://example.com/article?utm_source=valyu",
      "content": "# AI Breakthrough in Natural Language Processing\n\nPage content in markdown...",
      "description": "Latest AI developments",
      "source": "web",
      "price": 0.001,
      "length": 12840,
      "data_type": "unstructured",
      "image_url": {
        "main": "https://example.com/hero-image.jpg"
      }
    }
  ],
  "urls_requested": 1,
  "urls_processed": 1,
  "urls_failed": 0,
  "total_cost_dollars": 0.001,
  "total_characters": 12840
}

Summary (summary: true or string)

{
  "success": true,
  "results": [
    {
      "title": "AI Breakthrough in Natural Language Processing",
      "content": "This article discusses a breakthrough in AI...",
      "summary_success": true,
      "price": 0.002,
      "data_type": "unstructured"
    }
  ]
}

Structured (JSON Schema)

{
  "success": true,
  "results": [
    {
      "title": "AI Breakthrough in Natural Language Processing",
      "content": {
        "title": "AI Breakthrough in Natural Language Processing",
        "author": "John Doe",
        "category": "technology",
        "key_points": [
          "New AI model achieves 95% accuracy",
          "Reduces computational requirements by 40%"
        ]
      },
      "summary_success": true,
      "price": 0.002,
      "data_type": "structured"
    }
  ]
}
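
Since `content` is a markdown string for raw and summary responses but a JSON object for structured ones, calling code may want to branch on `data_type`. The following is a minimal sketch over plain dictionaries shaped like the examples above; `content_as_text` is a hypothetical helper, not an SDK function.

```python
import json
from typing import Any, Dict


def content_as_text(result: Dict[str, Any]) -> str:
    """Return a printable string for either kind of `content`."""
    content = result.get("content")
    if result.get("data_type") == "structured" or isinstance(content, dict):
        # Structured extraction: content is a JSON object matching the schema.
        return json.dumps(content, indent=2)
    # Raw or summarised content: a markdown or plain-text string.
    return "" if content is None else str(content)


if __name__ == "__main__":
    # Hand-written sample results, not real API output.
    raw = {"content": "# Heading\n\nBody text", "data_type": "unstructured"}
    structured = {"content": {"title": "Example", "key_points": ["a", "b"]}, "data_type": "structured"}
    print(content_as_text(raw))
    print(content_as_text(structured))
```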

Response Fields

Field	Description
`title`	Extracted page title
`url`	Original URL with UTM tracking parameters
`content`	Extracted content (markdown or JSON)
`description`	Page meta description or excerpt
`source`	Source type - always `"web"` for URL processing
`price`	Cost for processing this URL in dollars
`length`	Character count of extracted content
`data_type`	`"unstructured"` or `"structured"`
`summary_success`	Whether AI processing succeeded (only when `summary` parameter is used)
`image_url`	Dictionary of extracted image URLs
`screenshot_url`	Pre-signed URL to page screenshot (only when the `screenshot` parameter is set to true)

Best Practices

Choosing Summary Type

false: Fastest and cheapest, with no AI processing
true: Basic summary for overviews
"string": Custom instructions for specific needs
{object}: Structured extraction for data processing
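
The four value types map naturally onto Python's `bool`, `str`, and `dict`. As a rough illustration, the helper below labels which mode a given `summary` value selects; `summary_mode` and the label strings are made up for this sketch.

```python
from typing import Any, Dict, Union


def summary_mode(summary: Union[bool, str, Dict[str, Any]]) -> str:
    """Name the processing mode implied by a `summary` value."""
    if isinstance(summary, bool):
        return "basic_summary" if summary else "raw"
    if isinstance(summary, str):
        return "custom_instructions"
    if isinstance(summary, dict):
        return "structured"
    raise TypeError("summary must be a bool, a string, or a JSON Schema dict")


if __name__ == "__main__":
    print(summary_mode(False))
    print(summary_mode("Summarise the key findings"))
    print(summary_mode({"type": "object", "properties": {}}))
```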

JSON Schema Tips

Use clear descriptions to guide extraction
Use enums for consistent categorisation
Keep schemas to 3 levels deep or fewer
Mark essential fields as required

Batch Processing

Group similar content types together
Choose appropriate response length
Check summary_success for AI status
Track total_cost_dollars
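
When a job spans several requests, the per-response counters can be added up to track cost and failures in one place. This sketch works on response dictionaries shaped like the examples in this guide; `summarise_batches` is a hypothetical helper, and missing fields are treated as zero.

```python
from typing import Any, Dict, Iterable


def summarise_batches(responses: Iterable[Dict[str, Any]]) -> Dict[str, float]:
    """Add up URL counters, cost, and AI failures across several responses."""
    totals: Dict[str, float] = {
        "urls_requested": 0,
        "urls_processed": 0,
        "urls_failed": 0,
        "total_cost_dollars": 0.0,
        "summary_failures": 0,
    }
    for response in responses:
        for key in ("urls_requested", "urls_processed", "urls_failed"):
            totals[key] += response.get(key, 0)
        totals["total_cost_dollars"] += response.get("total_cost_dollars", 0.0)
        # summary_success only appears when the summary parameter was used.
        totals["summary_failures"] += sum(
            1 for result in response.get("results", [])
            if result.get("summary_success") is False
        )
    return totals
```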

Error Handling

# `response` is assumed to be the raw HTTP response object (for example, a requests.Response)
payload = response.json()

# Handle complete failures (HTTP 422)
if response.status_code == 422:
    error_message = payload["error"]
    print(f"Request failed: {error_message}")
else:
    results = payload.get("results", [])

    # Check for partial failures (HTTP 206)
    if response.status_code == 206:
        failed_count = payload["urls_failed"]
        print(f"{failed_count} URL(s) failed, {len(results)} succeeded")

    # Check AI processing success; summary_success is only present when summary was used
    for result in results:
        if result.get("summary_success") is False:
            print(f"AI processing failed for {result.get('url')}")
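
After a partial failure it can be useful to know which URLs to retry. The sketch below compares the requested URLs with those in `results`, ignoring the UTM parameters that the response appends. It assumes failed URLs are simply absent from `results` and that URLs are otherwise returned unchanged; neither is confirmed by this guide. `strip_utm` and `find_failed_urls` are hypothetical helpers.

```python
from typing import Any, Dict, List
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_utm(url: str) -> str:
    """Drop utm_* query parameters so returned URLs compare equal to requested ones."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_")
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))


def find_failed_urls(requested: List[str], results: List[Dict[str, Any]]) -> List[str]:
    """Return the requested URLs that have no matching entry in `results`."""
    returned = {strip_utm(result["url"]) for result in results if "url" in result}
    return [url for url in requested if strip_utm(url) not in returned]


if __name__ == "__main__":
    requested = ["https://example.com/a", "https://example.com/b"]
    results = [{"url": "https://example.com/a?utm_source=valyu"}]
    print(find_failed_urls(requested, results))
```

If the API rewrites URLs in other ways (redirects, trailing slashes), the comparison would need to be loosened accordingly.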

Try the Contents API

Full API reference with interactive examples

Next Steps

API Reference

Complete parameter documentation

Python SDK

Python integration

TypeScript SDK

TypeScript integration

Integrations

LangChain, LlamaIndex, and more
