DeepResearch is an asynchronous research API that performs comprehensive research by searching multiple sources, analyzing content, and generating detailed reports. Unlike synchronous APIs, DeepResearch runs tasks in the background, allowing for thorough multi-step research that can take minutes to complete.

When to Use DeepResearch

Use DeepResearch when you need:
  • In-depth analysis - Complex research across multiple sources
  • Report generation - Markdown or PDF output with citations
  • Structured data extraction - Research results in custom JSON formats
  • Background processing - Long-running research without blocking your application
For quick answers to simple questions, consider using the Answer API instead.

Features

Multi-Source Research

Searches web, academic, and proprietary sources in a single task.

Research Modes

Choose fast for quick lookups, standard for balanced research, or heavy for in-depth analysis.

Multiple Outputs

Get results as markdown, PDF, or structured JSON.

File Analysis

Attach PDFs, images, and documents for analysis.

URL Extraction

Include specific URLs to analyze as part of research.

Webhooks

Get notified when research completes.

Quick Start

Create a Research Task

from valyu import Valyu

valyu = Valyu()

# Create a research task
task = valyu.deepresearch.create(
    query="What are the key differences between RAG and fine-tuning for LLMs?",
    mode="standard"
)

print(f"Task created: {task.deepresearch_id}")
print(f"Status: {task.status}")

Wait for Completion

# Wait for the task to complete
result = valyu.deepresearch.wait(
    task.deepresearch_id,
    poll_interval=5,      # Check every 5 seconds
    max_wait_time=1800    # Timeout after 30 minutes (standard mode)
)

if result.status == "completed":
    print("Research completed!")
    print(result.output)
    
    # Access sources used
    for source in result.sources:
        print(f"- {source.title}: {source.url}")
    
    # Check cost
    print(f"Cost: ${result.cost}")

Task Statuses

When you create a task, it goes through the following statuses:
| Status | Description |
| --- | --- |
| queued | Task is waiting to start due to rate limits or capacity |
| running | Task is actively researching |
| completed | Research finished successfully |
| failed | Research failed (check error field) |
| cancelled | Task was cancelled by user |

Queued Tasks

Tasks may be queued when:
  • Your organization has multiple concurrent tasks running
  • System capacity is temporarily limited
Queued tasks start automatically when capacity becomes available. No action is required.
task = valyu.deepresearch.create(query="Research query")

if task.status == "queued":
    print(f"Task queued: {task.message}")
    # Task will start automatically - just wait for it
    
result = valyu.deepresearch.wait(task.deepresearch_id)
The wait method handles queued tasks automatically. It continues polling until the task completes, fails, or is cancelled.
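
The polling behavior described above can be sketched without the SDK. This is an illustration only, not the SDK's actual implementation: fetch_status is a hypothetical callable standing in for a status lookup, and the terminal statuses are taken from the Task Statuses table.

```python
import time
from typing import Callable

# Terminal statuses taken from the Task Statuses table.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_terminal(
    fetch_status: Callable[[], str],
    poll_interval: float = 5.0,
    max_wait_time: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until a terminal status is seen or the time budget runs out.

    fetch_status is a hypothetical stand-in for a status lookup; it is
    not part of the Valyu SDK.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        # "queued" and "running" are treated the same way: keep waiting.
        if waited + poll_interval > max_wait_time:
            raise TimeoutError(f"still {status!r} after {waited:.0f}s")
        sleep(poll_interval)
        waited += poll_interval
```

Because queued is simply another non-terminal status, a queued task needs no special handling in the loop.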

Research Modes

| Mode | Use Case | Typical Completion Time |
| --- | --- | --- |
| fast | Quick answers, lightweight research, simple lookups | ~5 minutes |
| standard | Balanced research, deeper insights without long wait times | ~10-20 minutes |
| heavy | In-depth, long-running research tasks, complex analysis | Up to ~90 minutes |

# Use heavy mode for complex research
task = valyu.deepresearch.create(
    query="Analyze the competitive landscape of the cloud computing market in 2024",
    mode="heavy"
)

Output Formats

Markdown (Default)

task = valyu.deepresearch.create(
    query="Explain quantum computing advancements",
    output_formats=["markdown"]
)

Markdown + PDF

task = valyu.deepresearch.create(
    query="Write a report on renewable energy trends",
    output_formats=["markdown", "pdf"]
)

# After completion, access the PDF URL
result = valyu.deepresearch.wait(task.deepresearch_id)
if result.pdf_url:
    print(f"PDF available at: {result.pdf_url}")

Structured JSON

Get research results in a custom schema using JSON Schema specification:
task = valyu.deepresearch.create(
    query="Research competitor pricing in the SaaS market",
    output_formats=[{
        "type": "object",
        "properties": {
            "competitors": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "pricing_model": {"type": "string"},
                        "price_range": {"type": "string"},
                        "key_features": {
                            "type": "array",
                            "items": {"type": "string"}
                        }
                    },
                    "required": ["name", "pricing_model"]
                }
            },
            "market_summary": {"type": "string"},
            "recommendations": {
                "type": "array",
                "items": {"type": "string"}
            }
        },
        "required": ["competitors", "market_summary"]
    }]
)
You cannot mix JSON Schema with markdown/pdf formats. Use one or the other.
The schema must be a valid JSON Schema. Use type, properties, required, items, and other standard JSON Schema keywords.
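
The no-mixing rule can be checked on the client before a request is sent. The helper below is a hypothetical sketch, not part of the SDK. It assumes that format names are plain strings, that a schema is a dict, and that an empty list is invalid (the page does not say how an empty list is treated).

```python
from typing import Any, List

# Named formats mentioned on this page.
TEXT_FORMATS = {"markdown", "pdf"}


def check_output_formats(output_formats: List[Any]) -> str:
    """Return "text" or "schema"; raise ValueError for a mixed or invalid list."""
    if not output_formats:
        # Assumption: an empty list is rejected rather than defaulted.
        raise ValueError("output_formats must not be empty")
    schemas = [f for f in output_formats if isinstance(f, dict)]
    names = [f for f in output_formats if isinstance(f, str)]
    if len(schemas) + len(names) != len(output_formats):
        raise ValueError("each entry must be a format name or a schema object")
    if schemas and names:
        raise ValueError("JSON Schema cannot be mixed with markdown/pdf")
    if names:
        unknown = sorted(set(names) - TEXT_FORMATS)
        if unknown:
            raise ValueError(f"unknown format(s): {unknown}")
        return "text"
    return "schema"
```

This only checks the shape of the list. It does not validate the schema itself; a full JSON Schema validator would be needed for that.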

Search Configuration

Search parameters control which data sources are queried, what content is included/excluded, and how results are filtered by date or category. These parameters are specified in the search object within the request.

Search Type

Controls which backend search systems are queried:
  • "all" (default): Searches both web and proprietary data sources
  • "web": Searches only web sources (general web search, news, articles)
  • "proprietary": Searches only proprietary data sources (academic papers, finance data, patents, etc.)
When set at the request level, this parameter cannot be overridden by the AI agent during research.
task = valyu.deepresearch.create(
    query="Recent advances in quantum computing",
    search={"search_type": "proprietary"}
)

Included Sources

Restricts search to only the specified source types. When specified, only these sources will be searched. If the AI agent attempts to use other sources, they will be ignored. Available source types:
  • "web": General web search results (news, articles, websites)
  • "academic": Academic papers and research databases (ArXiv, PubMed, BioRxiv/MedRxiv, Clinical trials, FDA drug labels, WHO health data, NIH grants, Wikipedia)
  • "finance": Financial and economic data (Stock/crypto/FX prices, SEC filings, Company financial statements, Economic indicators, Prediction markets)
  • "patent": Patent and intellectual property data (USPTO patent database, Patent abstracts, claims, descriptions)
  • "transportation": Transit and transportation data (UK National Rail schedules, Maritime vessel tracking)
  • "politics": Government and parliamentary data (UK Parliament members, bills, votes)
  • "legal": Case law and legal data (UK court judgments, Legislation text)
task = valyu.deepresearch.create(
    query="Latest AI research",
    search={
        "search_type": "all",
        "included_sources": ["academic", "web"]
    }
)

Excluded Sources

Excludes specific source types from search results. Uses the same source type values as included_sources. Cannot be used simultaneously with included_sources (use one or the other).
task = valyu.deepresearch.create(
    query="Clinical trial results",
    search={
        "search_type": "proprietary",
        "excluded_sources": ["web", "patent"]
    }
)

Start Date

Format: ISO date format (YYYY-MM-DD). Filters search results to include only content published or dated on or after this date. Applied to both publication dates and event dates when available. Works across all source types.
task = valyu.deepresearch.create(
    query="Recent AI developments",
    search={"start_date": "2024-01-01"}
)

End Date

Format: ISO date format (YYYY-MM-DD). Filters search results to include only content published or dated on or before this date. Applied to both publication dates and event dates when available. Works across all source types.
task = valyu.deepresearch.create(
    query="Historical market analysis",
    search={"end_date": "2020-12-31"}
)

Category

Filters results by a specific category. Available category values depend on the data source, and some source types may not support category filtering at all.
task = valyu.deepresearch.create(
    query="Technology trends",
    search={"category": "technology"}
)

Country Code

Format: ISO 3166-1 alpha-2 code (e.g., "US", "GB", "DE"). Biases web search results towards content relevant to the specified country or region.
task = valyu.deepresearch.create(
    query="Local business regulations",
    search={"country_code": "GB"}
)
Country code filtering primarily affects web search results. Academic and proprietary data sources may not support location-based filtering.

Important Notes

Parameter Enforcement

Request-level parameters are enforced and cannot be overridden by the AI agent during research. This ensures consistent search behavior throughout the research process. Tool-level source specifications are ignored if request-level sources are specified.

Date Filtering

Dates are applied to both publication dates and event dates when available. ISO format (YYYY-MM-DD) is required. Date filtering works across all source types. If only start_date is provided, results include all content from that date forward. If only end_date is provided, results include all content up to that date. Both dates can be combined for a specific date range.
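
The date and source rules above can be enforced before a request is built. The helper below is a hypothetical sketch, not part of the SDK. The key names match the search examples on this page, while the check that start_date does not fall after end_date is an added assumption about what a sensible range looks like.

```python
import re
from datetime import date
from typing import Any, Dict, List, Optional

_ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def parse_iso_date(value: str) -> date:
    """Accept only YYYY-MM-DD; reject other layouts and impossible dates."""
    if not _ISO_DATE.fullmatch(value):
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    return date.fromisoformat(value)


def build_search_config(
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
    included_sources: Optional[List[str]] = None,
    excluded_sources: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build a search dict, failing early on combinations the page rules out."""
    if included_sources and excluded_sources:
        raise ValueError("use included_sources or excluded_sources, not both")
    start = parse_iso_date(start_date) if start_date else None
    end = parse_iso_date(end_date) if end_date else None
    # Assumption: an inverted range is a caller mistake worth rejecting.
    if start and end and start > end:
        raise ValueError("start_date is after end_date")
    candidates = {
        "start_date": start_date,
        "end_date": end_date,
        "included_sources": included_sources,
        "excluded_sources": excluded_sources,
    }
    # Drop unset keys so only the filters actually requested are sent.
    return {key: value for key, value in candidates.items() if value}
```

The returned dict is intended to be passed as the search argument shown in the examples above.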

File Attachments

Analyze documents as part of research:
import base64

# Read and encode a PDF
with open("report.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode()

task = valyu.deepresearch.create(
    query="Summarize the key findings from this report and compare with current market trends",
    mode="heavy",
    files=[{
        "data": f"data:application/pdf;base64,{pdf_data}",
        "filename": "report.pdf",
        "mediaType": "application/pdf",
        "context": "Q4 2024 financial report"
    }]
)
Supported file types include PDFs, images (PNG, JPEG, WebP), and documents.
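
When attaching several files, the encoding step can be wrapped in a small helper. This is a hypothetical sketch, not an SDK function. It reuses the dictionary keys from the example above (data, filename, mediaType, context) and assumes the media type can be guessed from the file extension.

```python
import base64
import mimetypes
from typing import Dict


def make_attachment(data: bytes, filename: str, context: str = "") -> Dict[str, str]:
    """Build one entry for the files list from raw bytes."""
    media_type, _ = mimetypes.guess_type(filename)
    if media_type is None:
        raise ValueError(f"cannot infer media type for {filename!r}")
    encoded = base64.b64encode(data).decode("ascii")
    attachment = {
        "data": f"data:{media_type};base64,{encoded}",
        "filename": filename,
        "mediaType": media_type,
    }
    # Context is included only when provided.
    if context:
        attachment["context"] = context
    return attachment
```

A caller would read the file in binary mode and pass the bytes in, for example make_attachment(open("report.pdf", "rb").read(), "report.pdf").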

URL Extraction

Include specific URLs to analyze:
task = valyu.deepresearch.create(
    query="Compare the approaches described in these articles",
    urls=[
        "https://example.com/article-1",
        "https://example.com/article-2"
    ]
)

Task Management

Check Status

status = valyu.deepresearch.status(task_id)

print(f"Status: {status.status}")
if status.progress:
    print(f"Progress: {status.progress.current_step}/{status.progress.total_steps}")

Add Follow-up Instructions

While a task is running, you can add follow-up instructions to refine or adjust the scope of the research and the resulting report.
Follow-up instructions can only be added before the writing phase starts. Once research completes and report generation begins, new instructions are rejected.
# Add first instruction
valyu.deepresearch.update(
    task_id,
    instruction="Focus more on peer-reviewed sources"
)

# Add another instruction
valyu.deepresearch.update(
    task_id,
    instruction="Include a comparison table of major providers"
)
Submit instructions as early as possible during the research phase. Check task status to know when research has completed.

Cancel a Task

valyu.deepresearch.cancel(task_id)

Delete a Task

valyu.deepresearch.delete(task_id)

List All Tasks

tasks = valyu.deepresearch.list(api_key_id="your-api-key-id", limit=50)

for task in tasks.data:
    print(f"{task['query']} - {task['status']}")

Webhooks

Webhooks provide real-time notifications when a DeepResearch task completes or fails, eliminating the need for polling.

When to Use Webhooks

| Approach | Best For |
| --- | --- |
| Webhooks | Event-driven workflows |
| Polling | Simple scripts, real-time progress tracking |

Setting Up Webhooks

When you provide a webhook_url, the server generates a cryptographic secret for signature verification:
task = valyu.deepresearch.create(
    query="Research market trends",
    webhook_url="https://your-app.com/webhooks/deepresearch"
)

# IMPORTANT: Save the secret immediately - it's only returned once
webhook_secret = task.webhook_secret
print(f"Store this secret securely: {webhook_secret}")
The webhook_secret is only returned in the initial task creation response. Store it securely in your database or secrets manager—you cannot retrieve it later.
Webhook URLs must use HTTPS. HTTP URLs are rejected for security.
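
The HTTPS requirement can be caught locally before creating a task. The check below is a hypothetical helper, not part of the SDK, and it only mirrors the rule stated above. The server may apply further validation.

```python
from urllib.parse import urlparse


def is_valid_webhook_url(url: str) -> bool:
    """True when the URL uses the https scheme and names a host."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and bool(parsed.netloc)
```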

Webhook Payload

When the task completes or fails, your endpoint receives a POST request with the full task data:
{
  "deepresearch_id": "f992a8ab-4c91-4322-905f-190107bd5a5b",
  "status": "completed",
  "mode": "standard",
  "query": "Research market trends",
  "output_formats": ["markdown"],
  "output": "# Market Trends Analysis\n\n## Overview...",
  "pdf_url": "https://storage.valyu.ai/reports/...",
  "sources": [
    {
      "title": "Market Analysis Report 2024",
      "url": "https://example.com/report",
      "snippet": "Key findings indicate...",
      "source": "web",
      "word_count": 2500
    }
  ],
  "images": [],
  "cost": 0.50,
  "error": null,
  "created_at": 1759617800000,
  "updated_at": 1759617836483,
  "completed_at": 1759617836483,
  "search_params": {},
  "code_execution": true,
  "current_step": 5,
  "total_steps": 5
}
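
The created_at, updated_at, and completed_at values in the sample payload look like Unix epoch milliseconds. That is an inference from the sample values, not something the payload states. Under that assumption, a handler might summarize a payload as follows; summarize_payload and ms_to_datetime are hypothetical helper names.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict


def ms_to_datetime(ms: int) -> datetime:
    """Convert epoch milliseconds (assumed unit) to an aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def summarize_payload(raw_body: str) -> Dict[str, Any]:
    """Pull out the fields most handlers need from a webhook body."""
    data = json.loads(raw_body)
    created = data.get("created_at")
    completed = data.get("completed_at")
    duration = None
    if created is not None and completed is not None:
        duration = (completed - created) / 1000  # seconds
    return {
        "id": data["deepresearch_id"],
        "status": data["status"],
        "source_count": len(data.get("sources") or []),
        "duration_seconds": duration,
    }
```

For the sample payload above, the difference between completed_at and created_at works out to 36,483 ms, or about 36.5 seconds.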

Request Headers

Each webhook request includes headers for verification:
| Header | Description |
| --- | --- |
| X-Webhook-Signature | HMAC-SHA256 signature in format sha256=<hex_signature> |
| X-Webhook-Timestamp | Unix timestamp (milliseconds) when the request was sent |
| Content-Type | application/json |
| User-Agent | Valyu-DeepResearch/1.0 |
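
The X-Webhook-Timestamp header can also be used to reject stale deliveries. The sketch below is an optional extra check, not something this page requires. The five-minute tolerance is an assumed value, and is_fresh is a hypothetical helper name.

```python
import time
from typing import Optional


def is_fresh(
    timestamp_header: Optional[str],
    tolerance_seconds: int = 300,  # assumed window, not documented
    now_ms: Optional[int] = None,
) -> bool:
    """True when the millisecond timestamp is within the tolerance of now."""
    try:
        sent_ms = int(timestamp_header)
    except (TypeError, ValueError):
        # Missing or non-numeric header.
        return False
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return abs(now_ms - sent_ms) <= tolerance_seconds * 1000
```

A freshness check complements signature verification rather than replacing it: an unsigned timestamp proves nothing by itself.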

Verifying Webhook Signatures

Always verify the signature to ensure the webhook is authentic:
import hmac
import hashlib

def verify_webhook(
    payload_body: str,
    signature_header: str,
    timestamp_header: str,
    secret: str
) -> bool:
    """Verify the webhook signature is valid."""
    # Reconstruct the signed payload: timestamp.payload
    signed_payload = f"{timestamp_header}.{payload_body}"
    
    # Generate expected signature
    expected_signature = "sha256=" + hmac.new(
        secret.encode(),
        signed_payload.encode(),
        hashlib.sha256
    ).hexdigest()
    
    # Use constant-time comparison to prevent timing attacks
    return hmac.compare_digest(expected_signature, signature_header)


# Example Flask endpoint
from flask import Flask, request, jsonify

app = Flask(__name__)
WEBHOOK_SECRET = "your-stored-secret"  # Load from secure storage

@app.route("/webhooks/deepresearch", methods=["POST"])
def handle_webhook():
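    signature = request.headers.get("X-Webhook-Signature")
    timestamp = request.headers.get("X-Webhook-Timestamp")
    payload = request.get_data(as_text=True)
    
    # Reject requests missing either header; compare_digest raises TypeError on None
    if not signature or not timestamp:
        return jsonify({"error": "Missing signature headers"}), 401
    
    if not verify_webhook(payload, signature, timestamp, WEBHOOK_SECRET):
        return jsonify({"error": "Invalid signature"}), 401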
    
    data = request.json
    
    if data["status"] == "completed":
        print(f"Research completed: {data['deepresearch_id']}")
        print(f"Output: {data['output'][:200]}...")
    elif data["status"] == "failed":
        print(f"Research failed: {data['error']}")
    
    return jsonify({"received": True}), 200

Retry Behavior

The webhook service automatically retries failed deliveries:
| Property | Value |
| --- | --- |
| Maximum retries | 5 attempts |
| Timeout per request | 15 seconds |
| Backoff strategy | Exponential: 1s → 2s → 4s → 8s → 16s |
| 4xx errors | No retry (client error) |
| 5xx errors | Will retry (server error) |
Return a 2xx status code quickly to acknowledge receipt. Process the webhook payload asynchronously to avoid timeouts.
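
One way to acknowledge quickly is to hand the verified payload to a background worker and return immediately. The sketch below uses only the standard library. start_worker is a hypothetical helper. Dropping duplicate events is a design assumption, motivated by the possibility that a retried delivery arrives more than once.

```python
import queue
import threading
from typing import Callable


def start_worker(handle: Callable[[dict], None]) -> "queue.Queue":
    """Start a daemon thread that processes queued payloads one at a time."""
    inbox: "queue.Queue" = queue.Queue()
    seen = set()  # (task id, status) pairs already handled

    def run() -> None:
        while True:
            payload = inbox.get()
            try:
                if payload is None:  # sentinel: stop the worker
                    return
                key = (payload["deepresearch_id"], payload["status"])
                if key in seen:
                    continue  # duplicate delivery, skip
                seen.add(key)
                handle(payload)
            finally:
                inbox.task_done()

    threading.Thread(target=run, daemon=True).start()
    return inbox
```

Inside the webhook endpoint, the handler would call inbox.put(data) after verifying the signature and then return a 2xx response. A production setup would more likely use a durable queue, since this in-memory version loses pending work on restart and stops if handle raises.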

Webhook Events

Webhooks are triggered for:
| Event | When |
| --- | --- |
| completed | Research finished successfully |
| failed | Research encountered an error |
Webhooks are not sent for cancelled tasks. If you need to track cancellations, use the status endpoint or list endpoint to check task states.

Complete Webhook Flow

1. POST /deepresearch with webhook_url

2. API returns task with webhook_secret (store this!)

3. Research executes asynchronously

4. On completion/failure → POST to your webhook_url
   • Headers: X-Webhook-Signature, X-Webhook-Timestamp
   • Body: Full task payload (output, sources, cost, etc.)

5. Your server verifies signature and processes result

6. Return 2xx to acknowledge receipt

Progress Callbacks

Track progress in real-time:
def on_progress(status):
    if status.progress:
        print(f"Step {status.progress.current_step}/{status.progress.total_steps}")
    print(f"Status: {status.status}")

result = valyu.deepresearch.wait(
    task_id,
    poll_interval=5,
    on_progress=on_progress
)

Response Structure

Completed Task

{
  "deepresearch_id": "f992a8ab-4c91-4322-905f-190107bd5a5b",
  "status": "completed",
  "query": "What are the key differences between RAG and fine-tuning?",
  "mode": "standard",
  "output_type": "markdown",
  "output": "# RAG vs Fine-tuning\n\n## Overview...",
  "sources": [
    {
      "title": "Understanding RAG Systems",
      "url": "https://example.com/rag-guide",
      "snippet": "Retrieval-Augmented Generation combines...",
      "source": "web",
      "word_count": 2500
    }
  ],
  "cost": 0.50,
  "completed_at": 1759617836483,
  "pdf_url": "https://s3.amazonaws.com/..."
}

Error Handling

task = valyu.deepresearch.create(query="Research query")

if not task.success:
    print(f"Failed to create task: {task.error}")
else:
    try:
        result = valyu.deepresearch.wait(task.deepresearch_id)
    except TimeoutError:
        print("Task took too long")
        valyu.deepresearch.cancel(task.deepresearch_id)
    except ValueError as e:
        print(f"Task failed: {e}")

Best Practices

Polling Strategy

  • Fast mode: Poll every 2-5 seconds, timeout after 10 minutes
  • Standard mode: Poll every 5-10 seconds, timeout after 30 minutes
  • Heavy mode: Poll every 10-30 seconds, timeout after 120 minutes
  • Use webhooks for production to avoid polling overhead
| Mode | Recommended Timeout |
| --- | --- |
| fast | 10 minutes (600 seconds) |
| standard | 30 minutes (1800 seconds) |
| heavy | 120 minutes (7200 seconds) |
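
These recommendations can be kept in one place and looked up by mode. The mapping below is a hypothetical convenience, not part of the SDK. The timeouts come from the table above, and each poll interval is the upper end of the recommended range, which is a choice rather than a requirement.

```python
from typing import Dict

# Timeouts from the table above; intervals are the top of each suggested range.
POLLING: Dict[str, Dict[str, int]] = {
    "fast": {"poll_interval": 5, "max_wait_time": 600},
    "standard": {"poll_interval": 10, "max_wait_time": 1800},
    "heavy": {"poll_interval": 30, "max_wait_time": 7200},
}


def polling_params(mode: str) -> Dict[str, int]:
    """Return a copy of the polling settings for a research mode."""
    if mode not in POLLING:
        raise ValueError(f"unknown mode: {mode!r}")
    return dict(POLLING[mode])
```

The keys match the poll_interval and max_wait_time arguments used in the Quick Start, so the result can be unpacked into the wait call, for example valyu.deepresearch.wait(task.deepresearch_id, **polling_params("heavy")).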

Cost Optimization

  • Use fast mode for quick lookups and simple questions
  • Use standard mode for moderate research needs
  • Filter sources with search config to focus on relevant content
  • Set start_date and end_date to limit scope

Research Quality

  • Provide clear, specific research queries
  • Use the strategy parameter to guide the research approach
  • Add context to file attachments

Limitations

| Limit | Value |
| --- | --- |
| Maximum files per request | 10 |
| Maximum URLs per request | 10 |
| Maximum MCP servers | 5 |
| Maximum previous reports for context | 3 |
| Recommended file size | < 10MB per file |
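
A request can be checked against these limits before it is sent. The helper below is a hypothetical sketch covering only the file and URL limits. It treats 10MB as 10 × 1024 × 1024 bytes, which is an assumption, since the table does not say whether decimal or binary megabytes are meant.

```python
from typing import List

MAX_FILES = 10
MAX_URLS = 10
RECOMMENDED_FILE_BYTES = 10 * 1024 * 1024  # assumed meaning of "10MB"


def check_limits(file_sizes: List[int], urls: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the request fits."""
    problems = []
    if len(file_sizes) > MAX_FILES:
        problems.append(f"{len(file_sizes)} files exceeds the limit of {MAX_FILES}")
    if len(urls) > MAX_URLS:
        problems.append(f"{len(urls)} URLs exceeds the limit of {MAX_URLS}")
    for index, size in enumerate(file_sizes):
        # The size figure is a recommendation, so this is a warning, not a hard limit.
        if size >= RECOMMENDED_FILE_BYTES:
            problems.append(f"file {index} is {size} bytes; under 10MB is recommended")
    return problems
```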
