The DeepResearch API performs comprehensive research by searching multiple sources, analyzing content, and generating detailed reports. Unlike synchronous APIs, DeepResearch runs tasks in the background, which allows thorough multi-step research that can take anywhere from a few minutes to about half an hour.
DeepResearch is ideal for complex research tasks. For quick answers to simple questions, use the Answer API instead.

Basic Usage

from valyu import Valyu

valyu = Valyu()

# Create a research task
task = valyu.deepresearch.create(
    input="What are the key differences between RAG and fine-tuning for LLMs?",
    model="lite"
)

if task.success:
    print(f"Task created: {task.deepresearch_id}")
    
    # Wait for completion with progress updates
    result = valyu.deepresearch.wait(
        task.deepresearch_id,
        on_progress=lambda s: print(f"Status: {s.status}")
    )
    
    if result.status == "completed":
        print(result.output)
        print(f"Cost: ${result.usage.total_cost:.4f}")

Research Modes

DeepResearch offers two modes optimized for different use cases:
Mode   Best For                                                  Typical Completion Time
lite   Quick research, fact-checking, straightforward questions  5-10 minutes
heavy  Complex analysis, multi-faceted topics, detailed reports  15-30 minutes

# Use heavy mode for complex research
task = valyu.deepresearch.create(
    input="Analyze the competitive landscape of cloud computing in 2024",
    model="heavy"
)

Parameters

Input (Required)

Parameter  Type  Description
input      str   Research query or task description

Options (Optional)

Parameter         Type              Description                                 Default
model             "lite" | "heavy"  Research mode                               "lite"
output_formats    list              Output formats (see below)                  ["markdown"]
strategy          str               Natural language strategy instructions      None
search            dict              Search configuration (filters, date range)  None
urls              list[str]         URLs to analyze (max 10)                    None
files             list[dict]        File attachments (max 10)                   None
mcp_servers       list[dict]        MCP server configurations (max 5)           None
code_execution    bool              Enable code execution                       True
previous_reports  list[str]         Previous task IDs for context (max 3)       None
webhook_url       str               HTTPS URL for completion notification       None
metadata          dict              Custom metadata for tracking                None
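
The limits in the table above can be checked on the client before a request is sent. The following is a minimal sketch, assuming only the maxima and the HTTPS requirement listed in the table. `check_create_limits` and `MAX_ITEMS` are hypothetical names, not part of the Valyu SDK, and the API itself may enforce further rules.

```python
from urllib.parse import urlparse

# Maximum list lengths taken from the Options table above.
MAX_ITEMS = {"urls": 10, "files": 10, "mcp_servers": 5, "previous_reports": 3}


def check_create_limits(**options):
    """Return a list of problems found; an empty list means none were found.

    Hypothetical client-side pre-check, not an SDK function.
    """
    problems = []
    for name, limit in MAX_ITEMS.items():
        value = options.get(name)
        if value is not None and len(value) > limit:
            problems.append(f"{name}: {len(value)} items exceeds the maximum of {limit}")

    model = options.get("model", "lite")
    if model not in ("lite", "heavy"):
        problems.append(f"model: expected 'lite' or 'heavy', got {model!r}")

    webhook_url = options.get("webhook_url")
    if webhook_url is not None and urlparse(webhook_url).scheme != "https":
        problems.append("webhook_url: must use HTTPS")
    return problems
```

Calling the helper with the same keyword arguments intended for `create` gives a chance to report all problems at once instead of discovering them one request at a time.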

Output Formats

Markdown (Default)

task = valyu.deepresearch.create(
    input="Explain quantum computing advancements in 2024",
    output_formats=["markdown"]
)

Markdown + PDF

Request both markdown and a downloadable PDF report:
task = valyu.deepresearch.create(
    input="Write a report on renewable energy trends",
    output_formats=["markdown", "pdf"]
)

result = valyu.deepresearch.wait(task.deepresearch_id)

if result.pdf_url:
    print(f"PDF available at: {result.pdf_url}")

Structured JSON

Get research results in a custom schema using JSON Schema specification:
task = valyu.deepresearch.create(
    input="Research competitor pricing in the SaaS market",
    output_formats=[{
        "type": "object",
        "properties": {
            "competitors": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "pricing_model": {"type": "string"},
                        "price_range": {"type": "string"},
                        "key_features": {
                            "type": "array",
                            "items": {"type": "string"}
                        }
                    },
                    "required": ["name", "pricing_model"]
                }
            },
            "market_summary": {"type": "string"},
            "recommendations": {
                "type": "array",
                "items": {"type": "string"}
            }
        },
        "required": ["competitors", "market_summary"]
    }]
)

result = valyu.deepresearch.wait(task.deepresearch_id)

if result.output_type == "json":
    data = result.output
    for competitor in data["competitors"]:
        print(f"{competitor['name']}: {competitor['pricing_model']}")
You cannot mix JSON Schema with markdown/pdf formats. Use one or the other.
The schema must be a valid JSON Schema. Use type, properties, required, items, and other standard JSON Schema keywords.
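
The no-mixing rule can be checked before a task is created. This is a sketch under the assumption that named formats are always strings and a JSON Schema is always a dict, as in the examples above. `classify_output_formats` is a hypothetical helper, not an SDK function, and it does not validate the schema itself.

```python
def classify_output_formats(output_formats):
    """Return "text" for markdown/pdf lists or "json" for schema lists.

    Raises ValueError if the list is empty, mixes the two styles, or
    names a format other than "markdown" or "pdf".
    """
    if not output_formats:
        raise ValueError("output_formats must not be empty")

    has_named = any(isinstance(item, str) for item in output_formats)
    has_schema = any(isinstance(item, dict) for item in output_formats)
    if has_named and has_schema:
        raise ValueError("cannot mix a JSON Schema with markdown/pdf formats")
    if has_schema:
        return "json"

    unknown = [item for item in output_formats if item not in ("markdown", "pdf")]
    if unknown:
        raise ValueError(f"unknown output formats: {unknown}")
    return "text"
```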

Search Configuration

Filter which sources the research uses:
task = valyu.deepresearch.create(
    input="Latest AI research in healthcare diagnostics",
    model="heavy",
    search={
        "search_type": "all",  # "all", "web", or "proprietary"
        "included_sources": ["pubmed", "arxiv", "nature.com"],
        "excluded_sources": ["wikipedia.org"],
        "start_date": "2024-01-01",
        "end_date": "2025-01-01"
    }
)

Search Types

Type         Description
all          Search web and proprietary sources (default)
web          Web sources only
proprietary  Academic and premium sources only
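
A small builder can catch typos in the search configuration before the request is made. The sketch below assumes the keys shown in the example above and assumes dates use the `YYYY-MM-DD` form seen there. `build_search_config` is a hypothetical helper, not part of the SDK.

```python
from datetime import date

SEARCH_TYPES = ("all", "web", "proprietary")


def build_search_config(search_type="all", included_sources=None,
                        excluded_sources=None, start_date=None, end_date=None):
    """Build a dict for the `search` option, validating type and dates."""
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"search_type must be one of {SEARCH_TYPES}")

    # date.fromisoformat raises ValueError on malformed dates.
    start = date.fromisoformat(start_date) if start_date else None
    end = date.fromisoformat(end_date) if end_date else None
    if start and end and start > end:
        raise ValueError("start_date must not be after end_date")

    config = {"search_type": search_type}
    if included_sources:
        config["included_sources"] = list(included_sources)
    if excluded_sources:
        config["excluded_sources"] = list(excluded_sources)
    if start:
        config["start_date"] = start.isoformat()
    if end:
        config["end_date"] = end.isoformat()
    return config
```

The returned dict can be passed as `search=...` when creating a task.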

File Attachments

Analyze documents as part of research:
import base64

# Read and encode a PDF
with open("report.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode()

task = valyu.deepresearch.create(
    input="Summarize the key findings and compare with market trends",
    model="heavy",
    files=[{
        "data": f"data:application/pdf;base64,{pdf_data}",
        "filename": "report.pdf",
        "mediaType": "application/pdf",
        "context": "Q4 2024 financial report"  # Optional context
    }]
)
Supported file types: PDFs, images (PNG, JPEG, WebP), and documents.
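
Building the attachment dict by hand is repetitive when several files are involved. Here is a minimal sketch that produces the same shape as the example above from raw bytes. `encode_attachment` is a hypothetical helper, and guessing the media type from the filename with the standard `mimetypes` module is an assumption of this sketch, not something the API requires.

```python
import base64
import mimetypes


def encode_attachment(filename, raw, context=None):
    """Return a file attachment dict with a base64 data URL."""
    media_type, _ = mimetypes.guess_type(filename)
    if media_type is None:
        raise ValueError(f"cannot determine media type for {filename!r}")

    encoded = base64.b64encode(raw).decode("ascii")
    attachment = {
        "data": f"data:{media_type};base64,{encoded}",
        "filename": filename,
        "mediaType": media_type,
    }
    if context:
        attachment["context"] = context  # Optional, as in the example above
    return attachment
```

With a file on disk, this would be called as `encode_attachment("report.pdf", open("report.pdf", "rb").read())`.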

URL Extraction

Include specific URLs to analyze alongside web research:
task = valyu.deepresearch.create(
    input="Compare the approaches described in these articles",
    urls=[
        "https://example.com/article-1",
        "https://example.com/article-2"
    ]
)

Waiting for Completion

Basic Wait

result = valyu.deepresearch.wait(task.deepresearch_id)

if result.status == "completed":
    print(result.output)

With Progress Callback

Track research progress in real-time:
def on_progress(status):
    # Skip the percentage when progress is missing or total_steps is 0
    if status.progress and status.progress.total_steps:
        pct = (status.progress.current_step / status.progress.total_steps) * 100
        print(f"Progress: {pct:.0f}% - Step {status.progress.current_step}/{status.progress.total_steps}")
    print(f"Status: {status.status}")

result = valyu.deepresearch.wait(
    task.deepresearch_id,
    poll_interval=5,      # Check every 5 seconds
    max_wait_time=900,    # Timeout after 15 minutes (lite mode)
    on_progress=on_progress
)

Polling Parameters

Parameter      Type      Description                    Default
poll_interval  int       Seconds between status checks  5
max_wait_time  int       Maximum wait time in seconds   3600
on_progress    Callable  Callback for progress updates  None
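
To make the three parameters concrete, the sketch below shows how a polling loop of this kind could behave. It is not the SDK's implementation. `poll_until_done` is a hypothetical function, statuses are plain strings rather than response objects, and the `sleep` and `clock` arguments exist only so the loop can be exercised without waiting.

```python
import time

# Statuses after which no further change is expected.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def poll_until_done(get_status, poll_interval=5, max_wait_time=3600,
                    on_progress=None, sleep=time.sleep, clock=time.monotonic):
    """Call get_status() every poll_interval seconds until a terminal status.

    Raises TimeoutError if the next check would fall after max_wait_time.
    """
    deadline = clock() + max_wait_time
    while True:
        status = get_status()
        if on_progress is not None:
            on_progress(status)
        if status in TERMINAL_STATES:
            return status
        if clock() + poll_interval > deadline:
            raise TimeoutError(f"still {status!r} after {max_wait_time} seconds")
        sleep(poll_interval)
```

Using a monotonic clock for the deadline keeps the timeout unaffected by system clock adjustments.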

Response Format

class DeepResearchStatusResponse:
    success: bool
    deepresearch_id: str
    status: str  # "queued" | "running" | "completed" | "failed" | "cancelled"
    query: str
    mode: str  # "lite" | "heavy"
    output_type: str  # "markdown" | "json"
    output: str | dict  # Research output
    sources: list[DeepResearchSource]  # Sources used
    usage: Usage  # Cost breakdown
    completed_at: int  # Unix timestamp
    pdf_url: str | None  # PDF download URL
    images: list[ImageMetadata]  # Generated images
    error: str | None  # Error message if failed

Source Object

class DeepResearchSource:
    title: str
    url: str
    snippet: str
    source: str  # web, pubmed, arxiv, etc.
    word_count: int
    doi: str | None  # For academic papers
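
As an illustration of working with these fields, the sketch below groups sources by origin and formats a citation line. The `Source` dataclass is a stand-in that mirrors the fields listed above so the example runs on its own. `group_by_origin` and `citation_line` are hypothetical helpers.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Source:
    """Stand-in with the same fields as DeepResearchSource."""
    title: str
    url: str
    snippet: str
    source: str
    word_count: int
    doi: Optional[str] = None


def group_by_origin(sources):
    """Map each origin (web, pubmed, arxiv, ...) to its list of sources."""
    grouped = defaultdict(list)
    for item in sources:
        grouped[item.source].append(item)
    return dict(grouped)


def citation_line(item):
    """Format one source, appending the DOI when present."""
    suffix = f" doi:{item.doi}" if item.doi else ""
    return f"{item.title} <{item.url}>{suffix}"
```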

Usage Object

class Usage:
    search_cost: float   # Search operations
    contents_cost: float # Content retrieval
    ai_cost: float       # AI processing 
    compute_cost: float  # Compute resources
    total_cost: float    # Total billed
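
A cost breakdown is easier to read with each component shown as a share of the total. The sketch below works on any object with the five attributes above. `cost_breakdown` is a hypothetical helper, and it does not assume that the components add up exactly to `total_cost`.

```python
from types import SimpleNamespace

COMPONENTS = ("search_cost", "contents_cost", "ai_cost", "compute_cost")


def cost_breakdown(usage):
    """Return printable lines: one per component, then the total."""
    total = usage.total_cost
    lines = []
    for name in COMPONENTS:
        amount = getattr(usage, name)
        share = (amount / total * 100) if total else 0.0
        lines.append(f"{name:<14} ${amount:.4f} ({share:.0f}%)")
    lines.append(f"{'total_cost':<14} ${total:.4f}")
    return lines


# Made-up numbers for illustration only; real values come from result.usage.
example = SimpleNamespace(search_cost=0.1, contents_cost=0.1,
                          ai_cost=0.2, compute_cost=0.1, total_cost=0.5)
print("\n".join(cost_breakdown(example)))
```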

Task Management

Check Status

status = valyu.deepresearch.status(task_id)

print(f"Status: {status.status}")
if status.progress:
    print(f"Step {status.progress.current_step}/{status.progress.total_steps}")

Add Follow-up Instructions

Add instructions to a running task:
response = valyu.deepresearch.update(
    task_id,
    instruction="Focus more on peer-reviewed sources from 2024"
)

if response.success:
    print("Instruction added")

Cancel a Task

response = valyu.deepresearch.cancel(task_id)

if response.success:
    print("Task cancelled")

Delete a Task

response = valyu.deepresearch.delete(task_id)

if response.success:
    print("Task deleted")

List All Tasks

tasks = valyu.deepresearch.list(
    api_key_id="your-api-key-id",
    limit=50
)

for task in tasks.data:
    print(f"{task['query'][:50]}... - {task['status']}")
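
When many tasks are listed, a per-status count is often more useful than printing every row. This sketch assumes each entry is a dict with a `status` key, as the loop above suggests. `summarize_tasks` and `unfinished` are hypothetical helpers.

```python
from collections import Counter


def summarize_tasks(task_dicts):
    """Count tasks per status; entries without a status count as "unknown"."""
    return Counter(task.get("status", "unknown") for task in task_dicts)


def unfinished(task_dicts):
    """Return the tasks that are still queued or running."""
    return [task for task in task_dicts if task.get("status") in ("queued", "running")]
```

With the response from the listing call, this would be used as `summarize_tasks(tasks.data)`.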

Webhooks

Get notified when research completes instead of polling. Webhooks are ideal for production systems and serverless architectures.

Setup

task = valyu.deepresearch.create(
    input="Research market trends in AI",
    webhook_url="https://your-app.com/webhooks/deepresearch"  # Must be HTTPS
)

# IMPORTANT: Save the secret immediately - it's only returned once
webhook_secret = task.webhook_secret
print(f"Store securely: {webhook_secret}")
The webhook_secret is returned only once. Store it securely; you cannot retrieve it later.

Verifying Signatures

Always verify webhook signatures to ensure authenticity:
import hmac
import hashlib

def verify_webhook(
    payload_body: str,
    signature_header: str,
    timestamp_header: str,
    secret: str
) -> bool:
    """Verify the webhook signature is valid."""
    # Reconstruct signed payload: timestamp.payload
    signed_payload = f"{timestamp_header}.{payload_body}"
    
    # Generate expected signature
    expected_signature = "sha256=" + hmac.new(
        secret.encode(),
        signed_payload.encode(),
        hashlib.sha256
    ).hexdigest()
    
    # Constant-time comparison prevents timing attacks
    return hmac.compare_digest(expected_signature, signature_header)
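
To exercise a verifier locally, it helps to produce signatures in the same scheme: HMAC-SHA256 over `timestamp.payload`, hex-encoded and prefixed with `sha256=`. The sketch below adds a signer for tests plus a freshness check on the timestamp. `sign_payload` and `is_fresh` are hypothetical helpers. The five-minute tolerance is an arbitrary choice for this sketch, not a documented requirement, and the timestamp is assumed to be in Unix milliseconds.

```python
import hashlib
import hmac
import time


def sign_payload(payload_body, timestamp_ms, secret):
    """Produce a signature header value for the given body and timestamp."""
    signed_payload = f"{timestamp_ms}.{payload_body}"
    digest = hmac.new(secret.encode(), signed_payload.encode(),
                      hashlib.sha256).hexdigest()
    return "sha256=" + digest


def is_fresh(timestamp_header, tolerance_ms=5 * 60 * 1000, now_ms=None):
    """Return True if the timestamp is within tolerance_ms of now.

    Rejecting stale timestamps limits replay of captured requests.
    """
    try:
        sent_ms = int(timestamp_header)
    except (TypeError, ValueError):
        return False  # Missing or non-numeric header
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return abs(now_ms - sent_ms) <= tolerance_ms
```

In a unit test, a request signed with `sign_payload` should pass verification, and the same request with a modified body or an old timestamp should be rejected.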

Handling Webhooks

from flask import Flask, request, jsonify

app = Flask(__name__)
WEBHOOK_SECRET = "your-stored-secret"  # Load from secure storage

@app.route("/webhooks/deepresearch", methods=["POST"])
def handle_deepresearch_webhook():
    signature = request.headers.get("X-Webhook-Signature")
    timestamp = request.headers.get("X-Webhook-Timestamp")
    payload = request.get_data(as_text=True)
    
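    # A missing signature header would make compare_digest raise TypeError
    if not signature or not timestamp:
        return jsonify({"error": "Missing signature headers"}), 401
    if not verify_webhook(payload, signature, timestamp, WEBHOOK_SECRET):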
        return jsonify({"error": "Invalid signature"}), 401
    
    data = request.json
    
    if data["status"] == "completed":
        # Process completed research
        save_research_result(data["deepresearch_id"], data["output"])
    elif data["status"] == "failed":
        # Handle failure
        log_error(data["deepresearch_id"], data["error"])
    
    return jsonify({"received": True}), 200

Webhook Headers

Header               Description
X-Webhook-Signature  HMAC-SHA256 signature: sha256=<hex>
X-Webhook-Timestamp  Unix timestamp (ms) when sent
Content-Type         application/json

Retry Behavior

  • 5 retry attempts with exponential backoff (1s → 2s → 4s → 8s → 16s)
  • 15 second timeout per request
  • 4xx errors: No retry (client error)
  • 5xx errors: Will retry (server error)
Return a 2xx status quickly and process the payload asynchronously to avoid timeouts.
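
One way to acknowledge quickly is to hand the verified payload to a background worker and return immediately. The sketch below uses an in-process queue from the standard library. `acknowledge`, `process_payload`, and the `processed` list are hypothetical names for illustration. An in-memory queue loses pending work if the process restarts, so a durable queue may be a better fit in production.

```python
import queue
import threading

jobs = queue.Queue()
processed = []  # Stands in for real storage in this sketch


def process_payload(data):
    """Placeholder for slow work such as storing the report."""
    processed.append(data["deepresearch_id"])


def worker():
    """Drain the queue forever, one payload at a time."""
    while True:
        data = jobs.get()
        try:
            process_payload(data)
        except Exception as exc:  # Keep the worker alive on bad payloads
            print(f"processing failed: {exc}")
        finally:
            jobs.task_done()


# Daemon thread: it does not block interpreter shutdown.
threading.Thread(target=worker, daemon=True).start()


def acknowledge(data):
    """Queue the payload and return a response body and status at once."""
    jobs.put(data)
    return {"received": True}, 200
```

A request handler would call `acknowledge(data)` after the signature check and return its result, leaving the slow work to the worker thread.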

Use Case Examples

Academic Research Assistant

task = valyu.deepresearch.create(
    input="Analyze recent advances in transformer architectures for NLP",
    model="heavy",
    search={
        "search_type": "proprietary",
        "included_sources": ["arxiv", "pubmed", "nature.com"],
        "start_date": "2024-01-01"
    },
    strategy="Focus on peer-reviewed sources. Include methodology comparisons and performance benchmarks."
)

result = valyu.deepresearch.wait(task.deepresearch_id)

# Access academic sources
for source in result.sources:
    if source.doi:
        print(f"📄 {source.title}")
        print(f"   DOI: {source.doi}")

Competitive Intelligence

task = valyu.deepresearch.create(
    input="Analyze the competitive landscape of the cloud computing market",
    model="heavy",
    output_formats=[{
        "type": "object",
        "properties": {
            "market_overview": {"type": "string"},
            "key_players": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "company": {"type": "string"},
                        "market_share": {"type": "string"},
                        "strengths": {"type": "array", "items": {"type": "string"}},
                        "weaknesses": {"type": "array", "items": {"type": "string"}}
                    }
                }
            },
            "trends": {"type": "array", "items": {"type": "string"}},
            "opportunities": {"type": "array", "items": {"type": "string"}}
        },
        "required": ["market_overview", "key_players", "trends"]
    }]
)

result = valyu.deepresearch.wait(task.deepresearch_id)

if result.output_type == "json":
    analysis = result.output
    print(f"Market Overview: {analysis['market_overview']}\n")
    
    for player in analysis["key_players"]:
        print(f"📊 {player['company']} ({player['market_share']})")
        print(f"   Strengths: {', '.join(player['strengths'][:3])}")

Document Analysis

import base64

# Load multiple documents
files = []
for filename in ["q1_report.pdf", "q2_report.pdf"]:
    with open(filename, "rb") as f:
        data = base64.b64encode(f.read()).decode()
        files.append({
            "data": f"data:application/pdf;base64,{data}",
            "filename": filename,
            "mediaType": "application/pdf"
        })

task = valyu.deepresearch.create(
    input="Compare performance across these quarterly reports. Identify trends and anomalies.",
    model="heavy",
    files=files,
    output_formats=["markdown", "pdf"]
)

result = valyu.deepresearch.wait(task.deepresearch_id)
print(f"Analysis complete. PDF: {result.pdf_url}")

Error Handling

task = valyu.deepresearch.create(input="Research query")

if not task.success:
    # Stop here: there is no task to wait for
    raise SystemExit(f"Failed to create task: {task.error}")

try:
    result = valyu.deepresearch.wait(
        task.deepresearch_id,
        max_wait_time=900  # 15 minutes for lite, use 2700 for heavy
    )
    
    if result.status == "completed":
        print(result.output)
    elif result.status == "failed":
        print(f"Research failed: {result.error}")
        
except TimeoutError:
    print("Task timed out - cancelling")
    valyu.deepresearch.cancel(task.deepresearch_id)
    
except ValueError as e:
    print(f"Task error: {e}")

Best Practices

Choose the Right Mode

# Lite: Quick research, simple questions
valyu.deepresearch.create(
    input="What is the current market cap of Apple?",
    model="lite"
)

# Heavy: Complex analysis, detailed reports
valyu.deepresearch.create(
    input="Analyze Apple's competitive position vs Microsoft in cloud services",
    model="heavy"
)

Optimize Polling

# Lite mode: poll every 5 seconds, 15 min timeout
result = valyu.deepresearch.wait(task_id, poll_interval=5, max_wait_time=900)

# Heavy mode: poll every 15 seconds, 45 min timeout
result = valyu.deepresearch.wait(task_id, poll_interval=15, max_wait_time=2700)

# Production: use webhooks instead
task = valyu.deepresearch.create(
    input="...",
    webhook_url="https://your-app.com/webhooks"
)
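
The per-mode settings above can be kept in one place so that callers do not repeat the numbers. A minimal sketch follows. `WAIT_SETTINGS` and `wait_settings` are hypothetical names, and the values are simply the ones used in the comments above.

```python
# Polling interval and timeout (seconds) per research mode.
WAIT_SETTINGS = {
    "lite": {"poll_interval": 5, "max_wait_time": 900},     # 15 min timeout
    "heavy": {"poll_interval": 15, "max_wait_time": 2700},  # 45 min timeout
}


def wait_settings(model="lite"):
    """Return a copy of the polling settings for the given mode."""
    try:
        return dict(WAIT_SETTINGS[model])
    except KeyError:
        raise ValueError(f"unknown model: {model!r}") from None
```

The result can be unpacked into the wait call, for example `valyu.deepresearch.wait(task_id, **wait_settings("heavy"))`.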

Provide Clear Context

# ❌ Vague
valyu.deepresearch.create(input="Tell me about AI")

# ✅ Specific with context
valyu.deepresearch.create(
    input="What are the practical applications of large language models in healthcare, focusing on diagnostic assistance and clinical documentation?",
    strategy="Focus on peer-reviewed studies and real-world deployments. Include both benefits and limitations.",
    search={"start_date": "2023-01-01"}
)

Handle Long-Running Tasks

def research_with_status(query: str):
    task = valyu.deepresearch.create(input=query, model="heavy")
    
    if not task.success:
        return None
    
    print(f"Started research: {task.deepresearch_id}")
    
    try:
        result = valyu.deepresearch.wait(
            task.deepresearch_id,
            poll_interval=10,
            max_wait_time=2700,  # 45 minutes for heavy mode
            on_progress=lambda s: print(f"  {s.status}...")
        )
        return result
        
    except TimeoutError:
        # Check if still running
        status = valyu.deepresearch.status(task.deepresearch_id)
        if status.status == "running":
            print("Still running - check back later")
            return status
        raise

result = research_with_status("Comprehensive market analysis of EV industry")