DeepResearch Documentation

DeepResearch is an asynchronous research API that performs comprehensive research by searching multiple sources, analyzing content, and generating detailed reports. Unlike synchronous APIs, DeepResearch runs tasks in the background, allowing for thorough multi-step research that can take minutes to complete.

When to Use DeepResearch

Use DeepResearch when you need:

In-depth analysis - Complex research across multiple sources
Report generation - Markdown or PDF output with citations
Structured data extraction - Research results in custom JSON formats
Background processing - Long-running research without blocking your application

For quick answers to simple questions, consider using the Answer API instead.

Features

Multi-Source Research

Searches web, academic, and proprietary sources in a single task.

Research Modes

Choose fast for quick lookups, standard for balanced research, heavy for complex analysis, or max for exhaustive research.

Multiple Outputs

Get results as markdown, PDF, or structured JSON.

File Analysis

Attach PDFs, images, and documents for analysis.

URL Extraction

Include specific URLs to analyze as part of research.

Webhooks

Get notified when research completes.

Quick Start

Create a Research Task

from valyu import Valyu

valyu = Valyu()

# Create a research task
task = valyu.deepresearch.create(
    query="What are the key differences between RAG and fine-tuning for LLMs?",
    mode="standard"
)

print(f"Task created: {task.deepresearch_id}")
print(f"Status: {task.status}")

Wait for Completion

# Wait for the task to complete
result = valyu.deepresearch.wait(
    task.deepresearch_id,
    poll_interval=5,      # Check every 5 seconds
    max_wait_time=1800    # Timeout after 30 minutes (standard mode)
)

if result.status == "completed":
    print("Research completed!")
    print(result.output)
    
    # Access sources used
    for source in result.sources:
        print(f"- {source.title}: {source.url}")
    
    # Check cost
    print(f"Cost: ${result.cost}")

Task Statuses

When you create a task, it goes through the following statuses:

Status	Description
`queued`	Task is waiting to start due to rate limits or capacity
`running`	Task is actively researching
`completed`	Research finished successfully
`failed`	Research failed (check `error` field)
`cancelled`	Task was cancelled by user

Queued Tasks

Tasks may be queued when:

Your organization has multiple concurrent tasks running
System capacity is temporarily limited

Queued tasks start automatically when capacity becomes available. No action is required.

task = valyu.deepresearch.create(query="Research query")

if task.status == "queued":
    print(f"Task queued: {task.message}")
    # Task will start automatically - just wait for it
    
result = valyu.deepresearch.wait(task.deepresearch_id)

The wait method handles queued tasks automatically. It continues polling until the task completes, fails, or is cancelled.
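The polling behaviour described above can be sketched without the SDK. This is a minimal illustration, not the SDK's actual implementation: `fetch_status` is a hypothetical callable (for example, something like `lambda: valyu.deepresearch.status(task_id).status`), and the `sleep` parameter exists only so the loop can be exercised without real delays. The terminal statuses are taken from the Task Statuses table.

```python
import time
from typing import Callable

# Terminal statuses taken from the Task Statuses table.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def poll_until_done(
    fetch_status: Callable[[], str],
    poll_interval: float = 5.0,
    max_wait_time: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the task reaches a terminal status or the time budget is spent.

    fetch_status is a hypothetical callable returning the current status string.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        # "queued" and "running" both mean: keep waiting.
        if waited + poll_interval > max_wait_time:
            raise TimeoutError(f"Task still {status!r} after {waited:.0f}s")
        sleep(poll_interval)
        waited += poll_interval
```

A queued task needs no special handling here: it is simply another non-terminal status, which matches the statement that queued tasks start on their own.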

Research Modes

Mode	Price	Best For	Max Steps
fast	$0.10	Quick queries, batch processing	10
standard	$0.50	Balanced research	15
heavy	$2.50	Complex topics requiring fact verification	15
max	$15.00	Exhaustive research with maximum quality	25

# Use fast mode for quick lookups
task = valyu.deepresearch.create(
    query="What is quantum computing?",
    mode="fast"
)

# Use heavy mode for complex research
task = valyu.deepresearch.create(
    query="Analyze the competitive landscape of the cloud computing market in 2024",
    mode="heavy"
)

# Use max mode for exhaustive research
task = valyu.deepresearch.create(
    query="Comprehensive analysis of AI safety research with fact verification",
    mode="max"
)
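For budgeting a batch, the prices in the table can be turned into a small estimator. This is a sketch that assumes the listed price is a flat charge per task, which the table suggests but does not state outright; `estimate_cost` is a hypothetical helper, not part of the SDK.

```python
from decimal import Decimal

# Per-task prices copied from the Research Modes table (USD).
MODE_PRICES = {
    "fast": Decimal("0.10"),
    "standard": Decimal("0.50"),
    "heavy": Decimal("2.50"),
    "max": Decimal("15.00"),
}


def estimate_cost(task_counts: dict[str, int]) -> Decimal:
    """Return the estimated total in USD for a mix of tasks, keyed by mode."""
    total = Decimal("0")
    for mode, count in task_counts.items():
        if mode not in MODE_PRICES:
            raise ValueError(f"Unknown mode: {mode!r}")
        if count < 0:
            raise ValueError("Task counts must be non-negative")
        total += MODE_PRICES[mode] * count
    return total


# 20 fast tasks and 2 heavy tasks: 20 * 0.10 + 2 * 2.50
print(estimate_cost({"fast": 20, "heavy": 2}))
```

`Decimal` is used instead of `float` so that sums of prices such as 0.10 stay exact.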

Output Formats

Markdown (Default)

task = valyu.deepresearch.create(
    query="Explain quantum computing advancements",
    output_formats=["markdown"]
)

Markdown + PDF

task = valyu.deepresearch.create(
    query="Write a report on renewable energy trends",
    output_formats=["markdown", "pdf"]
)

# After completion, access the PDF URL
result = valyu.deepresearch.wait(task.deepresearch_id)
if result.pdf_url:
    print(f"PDF available at: {result.pdf_url}")

Structured JSON

Get research results in a custom schema using JSON Schema specification:

task = valyu.deepresearch.create(
    query="Research competitor pricing in the SaaS market",
    output_formats=[{
        "type": "object",
        "properties": {
            "competitors": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "pricing_model": {"type": "string"},
                        "price_range": {"type": "string"},
                        "key_features": {
                            "type": "array",
                            "items": {"type": "string"}
                        }
                    },
                    "required": ["name", "pricing_model"]
                }
            },
            "market_summary": {"type": "string"},
            "recommendations": {
                "type": "array",
                "items": {"type": "string"}
            }
        },
        "required": ["competitors", "market_summary"]
    }]
)

You cannot mix JSON Schema with markdown/pdf formats. Use one or the other.

The schema must be a valid JSON Schema. Use type, properties, required, items, and other standard JSON Schema keywords.
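The rule against mixing a schema with named formats can be checked locally before a request is sent. The helper below is a hypothetical sketch, not an SDK function; it only classifies the list and does not validate the schema itself.

```python
from typing import Any

# Named formats shown in this section.
NAMED_FORMATS = {"markdown", "pdf"}


def classify_output_formats(output_formats: list[Any]) -> str:
    """Return "named" or "schema"; raise ValueError for a mixed or malformed list."""
    names = [f for f in output_formats if isinstance(f, str)]
    schemas = [f for f in output_formats if isinstance(f, dict)]
    if len(names) + len(schemas) != len(output_formats):
        raise ValueError("Each entry must be a format name or a JSON Schema object")
    if names and schemas:
        raise ValueError("JSON Schema cannot be mixed with markdown/pdf formats")
    unknown = sorted(set(names) - NAMED_FORMATS)
    if unknown:
        raise ValueError(f"Unknown output format(s): {unknown}")
    return "schema" if schemas else "named"
```

Calling this before `create` turns a server-side rejection into an immediate local error.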

Search Configuration

Search parameters control which data sources are queried, what content is included/excluded, and how results are filtered by date or category. These parameters are specified in the search object within the request.

Search Type

Controls which backend search systems are queried:

"all" (default): Searches both web and proprietary data sources
"web": Searches only web sources (general web search, news, articles)
"proprietary": Searches only proprietary data sources (academic papers, finance data, patents, etc.)

When set at the request level, this parameter cannot be overridden by the AI agent during research.

task = valyu.deepresearch.create(
    query="Recent advances in quantum computing",
    search={"search_type": "proprietary"}
)

Included Sources

Restricts search to the specified source types. Any other sources the AI agent attempts to use are ignored. Available source types:

"web": General web search results (news, articles, websites)
"academic": Academic papers and research databases (ArXiv, PubMed, BioRxiv/MedRxiv, Clinical trials, FDA drug labels, WHO health data, NIH grants, Wikipedia)
"finance": Financial and economic data (Stock/crypto/FX prices, SEC filings, Company financial statements, Economic indicators, Prediction markets)
"patent": Patent and intellectual property data (USPTO patent database, Patent abstracts, claims, descriptions)
"transportation": Transit and transportation data (UK National Rail schedules, Maritime vessel tracking)
"politics": Government and parliamentary data (UK Parliament members, bills, votes)
"legal": Case law and legal data (UK court judgments, Legislation text)

task = valyu.deepresearch.create(
    query="Latest AI research",
    search={
        "search_type": "proprietary",
        "included_sources": ["academic", "web"]
    }
)

Excluded Sources

Excludes specific source types from search results. Uses the same source type values as included_sources. Cannot be used simultaneously with included_sources (use one or the other).

task = valyu.deepresearch.create(
    query="Clinical trial results",
    search={
        "search_type": "proprietary",
        "excluded_sources": ["web", "patent"]
    }
)
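Because `included_sources` and `excluded_sources` are mutually exclusive, it can help to build the source part of the `search` object through one function that enforces the rule. This is a minimal sketch; `build_source_filter` is a hypothetical helper, and the allowed values are copied from the lists above.

```python
from typing import Optional

# Values listed under Search Type and Included Sources.
SEARCH_TYPES = {"all", "web", "proprietary"}
SOURCE_TYPES = {
    "web", "academic", "finance", "patent",
    "transportation", "politics", "legal",
}


def build_source_filter(
    search_type: str = "all",
    included_sources: Optional[list[str]] = None,
    excluded_sources: Optional[list[str]] = None,
) -> dict:
    """Return the source-related keys of the search object after validating them."""
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"Unknown search_type: {search_type!r}")
    if included_sources and excluded_sources:
        raise ValueError("Use included_sources or excluded_sources, not both")
    for name in (included_sources or []) + (excluded_sources or []):
        if name not in SOURCE_TYPES:
            raise ValueError(f"Unknown source type: {name!r}")

    search: dict = {"search_type": search_type}
    if included_sources:
        search["included_sources"] = list(included_sources)
    if excluded_sources:
        search["excluded_sources"] = list(excluded_sources)
    return search
```

The returned dictionary is intended to be passed as the `search` argument shown in the examples above.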

Start Date

Format: ISO date format (YYYY-MM-DD). Filters search results to only include content published or dated on or after this date. Applied to both publication dates and event dates when available. Works across all source types.

task = valyu.deepresearch.create(
    query="Recent AI developments",
    search={"start_date": "2024-01-01"}
)

End Date

Format: ISO date format (YYYY-MM-DD). Filters search results to only include content published or dated on or before this date. Applied to both publication dates and event dates when available. Works across all source types.

task = valyu.deepresearch.create(
    query="Historical market analysis",
    search={"end_date": "2020-12-31"}
)

Country Code

Format: ISO 3166-1 alpha-2 code (e.g., "US", "GB", "DE"). Biases web search results towards content relevant to the specified country or region.

task = valyu.deepresearch.create(
    query="Local business regulations",
    search={"country_code": "GB"}
)

Country code filtering primarily affects web search results. Academic and proprietary data sources may not support location-based filtering.

Important Notes

Parameter Enforcement

Request-level parameters are enforced and cannot be overridden by the AI agent during research. This ensures consistent search behavior throughout the research process. Tool-level source specifications are ignored if request-level sources are specified.

Date Filtering

Dates are applied to both publication dates and event dates when available. ISO format (YYYY-MM-DD) is required. Date filtering works across all source types. If only start_date is provided, results include all content from that date forward. If only end_date is provided, results include all content up to that date. Both dates can be combined for a specific date range.
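The date rules above can be enforced before sending a request. The sketch below uses a hypothetical helper, `build_date_filter`; rejecting a range whose start falls after its end is a local safeguard and an assumption, since the documentation does not say how the API treats such a range.

```python
from datetime import date
from typing import Optional


def build_date_filter(
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
) -> dict:
    """Validate YYYY-MM-DD strings and return the date keys for the search object."""
    parsed = {}
    for key, value in (("start_date", start_date), ("end_date", end_date)):
        if value is None:
            continue
        try:
            day = date.fromisoformat(value)
        except ValueError:
            raise ValueError(f"{key} must be YYYY-MM-DD, got {value!r}") from None
        # Round-trip check: newer Python versions also accept forms like "20240101".
        if day.isoformat() != value:
            raise ValueError(f"{key} must be YYYY-MM-DD, got {value!r}")
        parsed[key] = day

    if len(parsed) == 2 and parsed["start_date"] > parsed["end_date"]:
        raise ValueError("start_date must not be after end_date")
    return {key: day.isoformat() for key, day in parsed.items()}
```

Either date may be omitted, which mirrors the open-ended ranges described above.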

File Attachments

Analyze documents as part of research:

import base64

# Read and encode a PDF
with open("report.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode()

task = valyu.deepresearch.create(
    query="Summarize the key findings from this report and compare with current market trends",
    mode="heavy",
    files=[{
        "data": f"data:application/pdf;base64,{pdf_data}",
        "filename": "report.pdf",
        "mediaType": "application/pdf",
        "context": "Q4 2024 financial report"
    }]
)

Supported file types include PDFs, images (PNG, JPEG, GIF, WebP), text files (plain text, CSV, Markdown), and Office documents (Word, Excel, PowerPoint).

File Uploads

DeepResearch accepts file attachments via the files array in the request body. Files are validated on upload and rejected with a 400 status if they violate any constraints.

Supported File Types

Type	MIME Type	Extensions	Max Size
PDF	`application/pdf`	.pdf	50 MB
PNG	`image/png`	.png	20 MB
JPEG	`image/jpeg`	.jpg, .jpeg	20 MB
GIF	`image/gif`	.gif	20 MB
WebP	`image/webp`	.webp	20 MB
Plain text	`text/plain`	.txt, .md, .log	10 MB
CSV	`text/csv`	.csv	10 MB
Markdown	`text/markdown`	.md	10 MB
Word	`application/vnd.openxmlformats-officedocument.wordprocessingml.document`	.docx	50 MB
Excel	`application/vnd.openxmlformats-officedocument.spreadsheetml.sheet`	.xlsx	20 MB
PowerPoint	`application/vnd.openxmlformats-officedocument.presentationml.presentation`	.pptx	50 MB

Total size limit: 100 MB across all files in a single request. Max files per request: 10.
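These limits can be checked client-side before encoding and uploading anything. The following is a sketch only: `check_files` is a hypothetical helper, and it assumes 1 MB means 1024 * 1024 bytes, which the documentation does not specify.

```python
# Per-type limits in MB, copied from the Supported File Types table.
MAX_MB_BY_MIME = {
    "application/pdf": 50,
    "image/png": 20,
    "image/jpeg": 20,
    "image/gif": 20,
    "image/webp": 20,
    "text/plain": 10,
    "text/csv": 10,
    "text/markdown": 10,
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": 50,
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": 20,
    "application/vnd.openxmlformats-officedocument.presentationml.presentation": 50,
}
MB = 1024 * 1024  # Assumption: the service may count decimal megabytes instead.
MAX_TOTAL_MB = 100
MAX_FILES = 10


def check_files(files: list[tuple[str, str, int]]) -> list[str]:
    """Return a list of problems for (filename, media_type, size_in_bytes) entries."""
    problems = []
    if len(files) > MAX_FILES:
        problems.append(f"{len(files)} files given, maximum is {MAX_FILES}")

    total = 0
    for index, (filename, media_type, size) in enumerate(files):
        total += size
        limit = MAX_MB_BY_MIME.get(media_type)
        if limit is None:
            problems.append(f"files[{index}]: unsupported type {media_type}")
        elif size > limit * MB:
            problems.append(f"files[{index}]: {filename} exceeds {limit} MB")

    if total > MAX_TOTAL_MB * MB:
        problems.append(f"total size exceeds {MAX_TOTAL_MB} MB")
    return problems
```

An empty list means the files pass these local checks; the server remains the final authority.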

How Files Are Processed

Most file types are passed directly to the LLM as native file content parts. The exception is PPTX, which is not natively supported by Claude/Gemini. PPTX files are automatically converted to markdown text (slide-by-slide) before being sent to the model. Extracted text is truncated at 500K characters to prevent context overflow.

Error Responses

All validation errors return HTTP 400 with a JSON body:

{ "error": "..." }

Unsupported file type

Returned when the MIME type is not in the whitelist.

{ "error": "files[0]: Unsupported file type \"application/x-msdownload\". Supported types: .pdf, .png, .jpg, .jpeg, .gif, .webp, .txt, .md, .log, .csv, .docx, .xlsx, .pptx" }

Extension mismatch

Returned when the file extension doesn’t match the declared MIME type.

{ "error": "files[0]: Extension \".txt\" does not match MIME type \"application/pdf\". Expected: .pdf" }

Per-file size exceeded

Returned when a single file exceeds the limit for its type.

{ "error": "files[2]: File \"huge.pdf\" is 62.3 MB, exceeds 50 MB limit for application/pdf" }

Total size exceeded

Returned when the combined size of all files exceeds 100 MB.

{ "error": "Total file size 112.5 MB exceeds 100 MB limit" }

Structural errors

Returned when the file object is malformed (missing fields, wrong types, invalid data URL format).

{ "error": "files[0].data is required and must be a string (data URL)" }
{ "error": "files[0].filename is required and must be a string" }
{ "error": "files[0].mediaType is required and must be a string" }
{ "error": "files[0].data must be a data URL (e.g., \"data:application/pdf;base64,...\")" }

Too many files

{ "error": "Maximum 10 files allowed per request" }
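Several of the messages above begin with `files[N]`, which identifies the offending entry. A small parser can map an error back to the file that caused it. This is a sketch based solely on the example messages shown here; the wording is not documented as a stable contract, so treat string matching as best-effort. `failing_file_index` is a hypothetical helper.

```python
import re
from typing import Optional

# Matches the "files[N]" prefix seen in the example error messages.
_FILE_INDEX = re.compile(r"^files\[(\d+)\]")


def failing_file_index(error_message: str) -> Optional[int]:
    """Return the index of the rejected file, or None for request-level errors."""
    match = _FILE_INDEX.match(error_message)
    return int(match.group(1)) if match else None
```

Request-level errors, such as the total size or file count limits, carry no index and yield `None`.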

URL Extraction

Include specific URLs to analyze:

task = valyu.deepresearch.create(
    query="Compare the approaches described in these articles",
    urls=[
        "https://example.com/article-1",
        "https://example.com/article-2"
    ]
)

Task Management

Check Status

status = valyu.deepresearch.status(task_id)

print(f"Status: {status.status}")
if status.progress:
    print(f"Progress: {status.progress.current_step}/{status.progress.total_steps}")

Add Follow-up Instructions

While a task is running, you can add instructions that refine or adjust the scope of the research and guide how the report is generated.

Follow-up instructions can only be added before the writing phase starts. Once research completes and report generation begins, new instructions are rejected.

# Add first instruction
valyu.deepresearch.update(
    task_id,
    instruction="Focus more on peer-reviewed sources"
)

# Add another instruction
valyu.deepresearch.update(
    task_id,
    instruction="Include a comparison table of major providers"
)

Submit instructions as early as possible during the research phase. Check task status to know when research has completed.

Cancel a Task

valyu.deepresearch.cancel(task_id)

Delete a Task

valyu.deepresearch.delete(task_id)

List All Tasks

tasks = valyu.deepresearch.list(limit=50)

for task in tasks.data:
    print(f"{task['query']} - {task['status']}")

Webhooks

Webhooks provide real-time notifications when a DeepResearch task completes or fails, eliminating the need for polling.

When to Use Webhooks

Approach	Best For
Webhooks	Event-driven workflows
Polling	Simple scripts, real-time progress tracking

Setting Up Webhooks

When you provide a webhook_url, the server generates a cryptographic secret for signature verification:

task = valyu.deepresearch.create(
    query="Research market trends",
    webhook_url="https://your-app.com/webhooks/deepresearch"
)

# IMPORTANT: Save the secret immediately - it's only returned once
webhook_secret = task.webhook_secret
print(f"Store this secret securely: {webhook_secret}")

The webhook_secret is only returned in the initial task creation response. Store it securely in your database or secrets manager—you cannot retrieve it later.

Webhook URLs must use HTTPS. HTTP URLs are rejected for security.

Webhook Payload

When the task completes or fails, your endpoint receives a POST request with the full task data:

{
  "deepresearch_id": "f992a8ab-4c91-4322-905f-190107bd5a5b",
  "status": "completed",
  "mode": "standard",
  "query": "Research market trends",
  "output_formats": ["markdown"],
  "output": "# Market Trends Analysis\n\n## Overview...",
  "pdf_url": "https://storage.valyu.ai/reports/...",
  "sources": [
    {
      "title": "Market Analysis Report 2024",
      "url": "https://example.com/report",
      "snippet": "Key findings indicate...",
      "source": "web",
      "word_count": 2500
    }
  ],
  "images": [],
  "cost": 0.50,
  "error": null,
  "created_at": 1759617800000,
  "updated_at": 1759617836483,
  "completed_at": 1759617836483,
  "search_params": {},
  "code_execution": true,
  "current_step": 5,
  "total_steps": 5
}
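The `created_at`, `updated_at`, and `completed_at` values in the payload above look like Unix timestamps in milliseconds. That reading is an assumption based on their magnitude and on the millisecond unit documented for the timestamp header. Under that assumption, a short sketch for converting them (both helper names are hypothetical):

```python
from datetime import datetime, timezone


def ms_to_datetime(ms: int) -> datetime:
    """Convert a millisecond Unix timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def elapsed_seconds(payload: dict) -> float:
    """Return the time between task creation and completion, in seconds."""
    return (payload["completed_at"] - payload["created_at"]) / 1000


payload = {"created_at": 1759617800000, "completed_at": 1759617836483}
print(ms_to_datetime(payload["completed_at"]).isoformat())
print(f"{elapsed_seconds(payload):.3f} s")
```

Using timezone-aware datetimes avoids ambiguity when the receiving server runs in a different time zone.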

Request Headers

Each webhook request includes headers for verification:

Header	Description
`X-Webhook-Signature`	HMAC-SHA256 signature in format `sha256=<hex_signature>`
`X-Webhook-Timestamp`	Unix timestamp (milliseconds) when the request was sent
`Content-Type`	`application/json`
`User-Agent`	`Valyu-DeepResearch/1.0`

Verifying Webhook Signatures

Always verify the signature to ensure the webhook is authentic:

import hmac
import hashlib

def verify_webhook(
    payload_body: str,
    signature_header: str,
    timestamp_header: str,
    secret: str
) -> bool:
    """Verify the webhook signature is valid."""
    # Reconstruct the signed payload: timestamp.payload
    signed_payload = f"{timestamp_header}.{payload_body}"
    
    # Generate expected signature
    expected_signature = "sha256=" + hmac.new(
        secret.encode(),
        signed_payload.encode(),
        hashlib.sha256
    ).hexdigest()
    
    # Use constant-time comparison to prevent timing attacks
    return hmac.compare_digest(expected_signature, signature_header)


# Example Flask endpoint
from flask import Flask, request, jsonify

app = Flask(__name__)
WEBHOOK_SECRET = "your-stored-secret"  # Load from secure storage

@app.route("/webhooks/deepresearch", methods=["POST"])
def handle_webhook():
    signature = request.headers.get("X-Webhook-Signature")
    timestamp = request.headers.get("X-Webhook-Timestamp")
    payload = request.get_data(as_text=True)
    
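    # Reject requests with missing headers first: compare_digest raises TypeError on None
    if not signature or not timestamp:
        return jsonify({"error": "Missing signature headers"}), 401

    if not verify_webhook(payload, signature, timestamp, WEBHOOK_SECRET):
        return jsonify({"error": "Invalid signature"}), 401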
    
    data = request.json
    
    if data["status"] == "completed":
        print(f"Research completed: {data['deepresearch_id']}")
        print(f"Output: {data['output'][:200]}...")
    elif data["status"] == "failed":
        print(f"Research failed: {data['error']}")
    
    return jsonify({"received": True}), 200
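Since the timestamp is part of the signed payload, it can also be used to reject stale requests and limit replay of captured webhooks. The sketch below is an optional extra, not something the documentation requires. The five-minute tolerance is an arbitrary assumption, and it is not documented whether retried deliveries carry a fresh timestamp, so choose a window generous enough to cover retries. `is_fresh` is a hypothetical helper.

```python
import time
from typing import Optional


def is_fresh(
    timestamp_header: Optional[str],
    tolerance_seconds: int = 300,
    now_ms: Optional[int] = None,
) -> bool:
    """Return True if the X-Webhook-Timestamp value (Unix ms) is close to the current time."""
    try:
        sent_ms = int(timestamp_header)
    except (TypeError, ValueError):
        # Missing or non-numeric header.
        return False
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return abs(now_ms - sent_ms) <= tolerance_seconds * 1000
```

Run this check only after the signature has been verified, so that the timestamp itself is known to be authentic.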

Retry Behavior

The webhook service automatically retries failed deliveries:

Property	Value
Maximum retries	5 attempts
Timeout per request	15 seconds
Backoff strategy	Exponential: 1s → 2s → 4s → 8s → 16s
4xx errors	No retry (client error)
5xx errors	Will retry (server error)

Return a 2xx status code quickly to acknowledge receipt. Process the webhook payload asynchronously to avoid timeouts.
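One way to acknowledge quickly is to hand the payload to a background worker and return at once. The sketch below uses only the standard library and is illustrative: `WebhookInbox` is a hypothetical class, the in-memory set used for duplicate detection would not survive a restart, and a production system would more likely use a durable queue. Duplicates are guarded against because a delivery that times out on the sender's side may be retried even though it was received.

```python
import logging
import queue
import threading
from typing import Callable


class WebhookInbox:
    """Accept payloads immediately and process them on a worker thread."""

    def __init__(self, handler: Callable[[dict], None]) -> None:
        self._handler = handler
        self._queue: queue.Queue = queue.Queue()
        self._seen: set = set()
        self._lock = threading.Lock()
        threading.Thread(target=self._run, daemon=True).start()

    def accept(self, payload: dict) -> bool:
        """Enqueue the payload and return at once; False means a repeated delivery."""
        key = (payload["deepresearch_id"], payload["status"])
        with self._lock:
            if key in self._seen:
                return False
            self._seen.add(key)
        self._queue.put(payload)
        return True

    def _run(self) -> None:
        while True:
            payload = self._queue.get()
            try:
                self._handler(payload)
            except Exception:
                # Keep the worker alive if one payload fails.
                logging.exception("Webhook processing failed")
            finally:
                self._queue.task_done()

    def join(self) -> None:
        """Block until every accepted payload has been handled."""
        self._queue.join()
```

In a request handler, the pattern would be to verify the signature, call `accept`, and return a 2xx response regardless of whether the payload was new or a duplicate.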

Webhook Events

Webhooks are triggered for:

Event	When
`completed`	Research finished successfully
`failed`	Research encountered an error

Webhooks are not sent for cancelled tasks. If you need to track cancellations, use the status endpoint or list endpoint to check task states.

Complete Webhook Flow

1. POST /deepresearch with webhook_url
   ↓
2. API returns task with webhook_secret (store this!)
   ↓
3. Research executes asynchronously
   ↓
4. On completion/failure → POST to your webhook_url
   • Headers: X-Webhook-Signature, X-Webhook-Timestamp
   • Body: Full task payload (output, sources, cost, etc.)
   ↓
5. Your server verifies signature and processes result
   ↓
6. Return 2xx to acknowledge receipt

Progress Callbacks

Track progress in real-time:

def on_progress(status):
    if status.progress:
        print(f"Step {status.progress.current_step}/{status.progress.total_steps}")
    print(f"Status: {status.status}")

result = valyu.deepresearch.wait(
    task_id,
    poll_interval=5,
    on_progress=on_progress
)

Response Structure

Completed Task

{
  "deepresearch_id": "f992a8ab-4c91-4322-905f-190107bd5a5b",
  "status": "completed",
  "query": "What are the key differences between RAG and fine-tuning?",
  "mode": "standard",
  "output_type": "markdown",
  "output": "# RAG vs Fine-tuning\n\n## Overview...",
  "sources": [
    {
      "title": "Understanding RAG Systems",
      "url": "https://example.com/rag-guide",
      "snippet": "Retrieval-Augmented Generation combines...",
      "source": "web",
      "word_count": 2500
    }
  ],
  "cost": 0.50,
  "completed_at": 1759617836483,
  "pdf_url": "https://s3.amazonaws.com/..."
}

Error Handling

task = valyu.deepresearch.create(query="Research query")

if not task.success:
    print(f"Failed to create task: {task.error}")
else:
    try:
        result = valyu.deepresearch.wait(task.deepresearch_id)
    except TimeoutError:
        print("Task took too long")
        valyu.deepresearch.cancel(task.deepresearch_id)
    except ValueError as e:
        print(f"Task failed: {e}")

Best Practices

Polling Strategy

Fast mode: Poll every 2-5 seconds, timeout after 10 minutes
Standard mode: Poll every 5-10 seconds, timeout after 30 minutes
Heavy mode: Poll every 10-30 seconds, timeout after 120 minutes
Max mode: Poll every 30-60 seconds, timeout after 180 minutes
Use webhooks for production to avoid polling overhead

Recommended Timeouts

Mode	Recommended Timeout
`fast`	10 minutes (600 seconds)
`standard`	30 minutes (1800 seconds)
`heavy`	120 minutes (7200 seconds)
`max`	180 minutes (10800 seconds)
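The recommended values can be kept in one place and looked up by mode. In this sketch the timeouts come from the table above and each polling interval is the lower bound of the range given under Polling Strategy; `wait_kwargs` is a hypothetical helper, and its keys match the `poll_interval` and `max_wait_time` parameters used in the Quick Start.

```python
# Timeouts from the Recommended Timeouts table; intervals are the lower
# bound of each range listed under Polling Strategy.
WAIT_SETTINGS = {
    "fast": {"poll_interval": 2, "max_wait_time": 600},
    "standard": {"poll_interval": 5, "max_wait_time": 1800},
    "heavy": {"poll_interval": 10, "max_wait_time": 7200},
    "max": {"poll_interval": 30, "max_wait_time": 10800},
}


def wait_kwargs(mode: str) -> dict:
    """Return polling keyword arguments for the given research mode."""
    try:
        # Copy so callers cannot mutate the shared defaults.
        return dict(WAIT_SETTINGS[mode])
    except KeyError:
        raise ValueError(f"Unknown mode: {mode!r}") from None
```

The result is meant to be unpacked into the wait call, for example `valyu.deepresearch.wait(task.deepresearch_id, **wait_kwargs("heavy"))`.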

Cost Optimization

Use fast mode for quick lookups and simple questions
Use standard mode for moderate research needs
Use heavy mode for complex topics requiring fact verification
Use max mode only for exhaustive research requiring maximum quality
Filter sources with search config to focus on relevant content
Set start_date and end_date to limit scope

Research Quality

Provide clear, specific research queries
Use strategy to guide the research approach
Add context to file attachments

Limitations

Limit	Value
Maximum files per request	10
Maximum URLs per request	10
Maximum MCP servers	5
Maximum previous reports for context	3
Recommended file size	< 10MB per file

Next Steps

Batch Processing

Process multiple research tasks efficiently

API Reference

Complete endpoint documentation

Python SDK

Python SDK reference

TypeScript SDK

TypeScript SDK reference
