The DeepResearch API performs comprehensive research by searching multiple sources, analyzing content, and generating detailed reports. Unlike synchronous APIs, which return a result within a single request, DeepResearch tasks run in the background, so they can carry out thorough multi-step research that takes several minutes or longer.
DeepResearch is ideal for complex research tasks. For quick answers to simple questions, use the Answer API instead.
Basic Usage
from valyu import Valyu
valyu = Valyu()
# Create a research task
task = valyu.deepresearch.create(
    input="What are the key differences between RAG and fine-tuning for LLMs?",
    model="lite"
)

if task.success:
    print(f"Task created: {task.deepresearch_id}")

    # Wait for completion with progress updates
    result = valyu.deepresearch.wait(
        task.deepresearch_id,
        on_progress=lambda s: print(f"Status: {s.status}")
    )

    if result.status == "completed":
        print(result.output)
        print(f"Cost: ${result.usage.total_cost:.4f}")
Research Modes
DeepResearch offers two modes optimized for different use cases:
| Mode | Best For | Typical Completion Time |
|---|---|---|
| lite | Quick research, fact-checking, straightforward questions | 5-10 minutes |
| heavy | Complex analysis, multi-faceted topics, detailed reports | 15-30 minutes |
# Use heavy mode for complex research
task = valyu.deepresearch.create(
    input="Analyze the competitive landscape of cloud computing in 2024",
    model="heavy"
)
Parameters
| Parameter | Type | Description |
|---|---|---|
| input | str | Research query or task description |
Options (Optional)
| Parameter | Type | Description | Default |
|---|---|---|---|
| model | "lite" or "heavy" | Research mode | "lite" |
| output_formats | list | Output formats (see below) | ["markdown"] |
| strategy | str | Natural language strategy instructions | None |
| search | dict | Search configuration (filters, date range) | None |
| urls | list[str] | URLs to analyze (max 10) | None |
| files | list[dict] | File attachments (max 10) | None |
| mcp_servers | list[dict] | MCP server configurations (max 5) | None |
| code_execution | bool | Enable code execution | True |
| previous_reports | list[str] | Previous task IDs for context (max 3) | None |
| webhook_url | str | HTTPS URL for completion notification | None |
| metadata | dict | Custom metadata for tracking | None |
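The limits in the table (at most 10 URLs, 10 files, 5 MCP servers, 3 previous reports, and an HTTPS-only webhook URL) can be checked before a task is submitted. The sketch below is a hypothetical client-side helper, `validate_options`, not part of the Valyu SDK; it only mirrors the constraints listed above.

```python
from urllib.parse import urlparse

# Hypothetical client-side check (not part of the Valyu SDK).
# The limits mirror the options table above.
LIST_LIMITS = {
    "urls": 10,
    "files": 10,
    "mcp_servers": 5,
    "previous_reports": 3,
}


def validate_options(**options) -> list[str]:
    """Return human-readable problems with create() options; an empty list means OK."""
    problems = []

    model = options.get("model", "lite")
    if model not in ("lite", "heavy"):
        problems.append(f"model must be 'lite' or 'heavy', got {model!r}")

    for name, limit in LIST_LIMITS.items():
        count = len(options.get(name) or [])
        if count > limit:
            problems.append(f"{name} accepts at most {limit} items, got {count}")

    webhook_url = options.get("webhook_url")
    if webhook_url is not None and urlparse(webhook_url).scheme != "https":
        problems.append("webhook_url must use HTTPS")

    return problems


# Usage: check options locally, then pass the same keyword arguments to create()
options = {"model": "heavy", "urls": ["https://example.com/article-1"]}
print(validate_options(**options))  # []
```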
Output Formats
Markdown (Default)
task = valyu.deepresearch.create(
    input="Explain quantum computing advancements in 2024",
    output_formats=["markdown"]
)
Markdown + PDF
Request both markdown and a downloadable PDF report:
task = valyu.deepresearch.create(
    input="Write a report on renewable energy trends",
    output_formats=["markdown", "pdf"]
)

result = valyu.deepresearch.wait(task.deepresearch_id)

if result.pdf_url:
    print(f"PDF available at: {result.pdf_url}")
Structured JSON
Pass a JSON Schema object in output_formats to receive research results that follow your own structure:
task = valyu.deepresearch.create(
    input="Research competitor pricing in the SaaS market",
    output_formats=[{
        "type": "object",
        "properties": {
            "competitors": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "pricing_model": {"type": "string"},
                        "price_range": {"type": "string"},
                        "key_features": {
                            "type": "array",
                            "items": {"type": "string"}
                        }
                    },
                    "required": ["name", "pricing_model"]
                }
            },
            "market_summary": {"type": "string"},
            "recommendations": {
                "type": "array",
                "items": {"type": "string"}
            }
        },
        "required": ["competitors", "market_summary"]
    }]
)

result = valyu.deepresearch.wait(task.deepresearch_id)

if result.output_type == "json":
    data = result.output
    for competitor in data["competitors"]:
        print(f"{competitor['name']}: {competitor['pricing_model']}")
You cannot mix JSON Schema with markdown/pdf formats. Use one or the other.
The schema must be a valid JSON Schema. Use type, properties, required, items, and other standard JSON Schema keywords.
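Because a JSON Schema cannot be combined with `markdown` or `pdf`, it can help to check `output_formats` before calling `create()`. The hypothetical `check_output_formats` helper below enforces that rule and runs a light sanity check (every name in `required` is declared in `properties`) that catches typos. It is not a full JSON Schema validator, and it assumes `"markdown"` and `"pdf"` are the only string formats.

```python
STRING_FORMATS = {"markdown", "pdf"}


def check_output_formats(formats: list) -> str:
    """Return 'json' for a schema, otherwise the string formats joined with '+'."""
    if not formats:
        raise ValueError("output_formats must not be empty")
    schemas = [f for f in formats if isinstance(f, dict)]
    names = [f for f in formats if isinstance(f, str)]
    if len(schemas) + len(names) != len(formats):
        raise ValueError("entries must be format names or JSON Schema dicts")
    if schemas and names:
        raise ValueError("a JSON Schema cannot be mixed with 'markdown'/'pdf'")
    if names:
        unknown = set(names) - STRING_FORMATS
        if unknown:
            raise ValueError(f"unknown format(s): {sorted(unknown)}")
        return "+".join(names)
    for schema in schemas:
        _check_required(schema, "$")
    return "json"


def _check_required(node: dict, path: str) -> None:
    """Recursively check that every name in 'required' is declared in 'properties'."""
    properties = node.get("properties", {})
    missing = set(node.get("required", [])) - set(properties)
    if missing:
        raise ValueError(f"{path}: required but not declared: {sorted(missing)}")
    for name, child in properties.items():
        if isinstance(child, dict):
            _check_required(child, f"{path}.{name}")
    items = node.get("items")
    if isinstance(items, dict):
        _check_required(items, f"{path}[]")
```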
Search Configuration
Filter which sources the research uses:
task = valyu.deepresearch.create(
    input="Latest AI research in healthcare diagnostics",
    model="heavy",
    search={
        "search_type": "all",  # "all", "web", or "proprietary"
        "included_sources": ["pubmed", "arxiv", "nature.com"],
        "excluded_sources": ["wikipedia.org"],
        "start_date": "2024-01-01",
        "end_date": "2025-01-01"
    }
)
Search Types
| Type | Description |
|---|---|
| all | Search web and proprietary sources (default) |
| web | Web sources only |
| proprietary | Academic and premium sources only |
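The search options can be assembled in a small helper that rejects obvious mistakes before a request is made. This is a sketch: `build_search_config` is hypothetical, and the `YYYY-MM-DD` date format is inferred from the example above rather than from a published specification.

```python
from datetime import date
from typing import Optional

SEARCH_TYPES = {"all", "web", "proprietary"}


def build_search_config(
    search_type: str = "all",
    included_sources: Optional[list] = None,
    excluded_sources: Optional[list] = None,
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
) -> dict:
    """Assemble a `search` dict, omitting unset keys and rejecting inconsistent input."""
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"search_type must be one of {sorted(SEARCH_TYPES)}")
    config = {"search_type": search_type}

    overlap = set(included_sources or []) & set(excluded_sources or [])
    if overlap:
        raise ValueError(f"sources both included and excluded: {sorted(overlap)}")
    if included_sources:
        config["included_sources"] = list(included_sources)
    if excluded_sources:
        config["excluded_sources"] = list(excluded_sources)

    # date.fromisoformat rejects malformed dates; isoformat() normalizes to YYYY-MM-DD
    start = date.fromisoformat(start_date) if start_date else None
    end = date.fromisoformat(end_date) if end_date else None
    if start and end and start > end:
        raise ValueError("start_date must not be after end_date")
    if start:
        config["start_date"] = start.isoformat()
    if end:
        config["end_date"] = end.isoformat()
    return config


# Usage: the result can be passed as search=... to create()
search = build_search_config(
    included_sources=["pubmed", "arxiv", "nature.com"],
    excluded_sources=["wikipedia.org"],
    start_date="2024-01-01",
    end_date="2025-01-01",
)
```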
File Attachments
Analyze documents as part of research:
import base64

# Read and encode a PDF
with open("report.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode()

task = valyu.deepresearch.create(
    input="Summarize the key findings and compare with market trends",
    model="heavy",
    files=[{
        "data": f"data:application/pdf;base64,{pdf_data}",
        "filename": "report.pdf",
        "mediaType": "application/pdf",
        "context": "Q4 2024 financial report"  # Optional context
    }]
)
Supported file types: PDFs, images (PNG, JPEG, WebP), and documents.
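Each entry in `files` follows the shape shown in the example: a base64 data URL plus `filename`, `mediaType`, and an optional `context`. The hypothetical helper below builds that dictionary and guesses the media type from the file extension with Python's `mimetypes` module; for unusual extensions, set `mediaType` explicitly instead.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Optional


def make_file_attachment(filename: str, content: bytes, context: Optional[str] = None) -> dict:
    """Build one `files` entry: a base64 data URL plus filename and mediaType."""
    media_type, _ = mimetypes.guess_type(filename)
    if media_type is None:
        raise ValueError(f"cannot infer a media type for {filename!r}; set it explicitly")
    encoded = base64.b64encode(content).decode("ascii")
    attachment = {
        "data": f"data:{media_type};base64,{encoded}",
        "filename": Path(filename).name,
        "mediaType": media_type,
    }
    if context:
        attachment["context"] = context  # optional, as in the example above
    return attachment


def load_file_attachment(path: str, context: Optional[str] = None) -> dict:
    """Read a file from disk and wrap it with make_file_attachment()."""
    return make_file_attachment(path, Path(path).read_bytes(), context)


# Usage (assumes report.pdf exists):
# files = [load_file_attachment("report.pdf", "Q4 2024 financial report")]
```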
URL Analysis
Include specific URLs (up to 10) to analyze alongside web research:
task = valyu.deepresearch.create(
    input="Compare the approaches described in these articles",
    urls=[
        "https://example.com/article-1",
        "https://example.com/article-2"
    ]
)
Waiting for Completion
Basic Wait
result = valyu.deepresearch.wait(task.deepresearch_id)

if result.status == "completed":
    print(result.output)
With Progress Callback
Track research progress in real-time:
def on_progress(status):
    # Guard against missing progress data and a zero step count
    if status.progress and status.progress.total_steps:
        step = status.progress.current_step
        total = status.progress.total_steps
        pct = step / total * 100
        print(f"Progress: {pct:.0f}% - Step {step}/{total}")
    print(f"Status: {status.status}")

result = valyu.deepresearch.wait(
    task.deepresearch_id,
    poll_interval=5,  # Check every 5 seconds
    max_wait_time=900,  # Time out after 15 minutes (lite mode)
    on_progress=on_progress
)
Polling Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| poll_interval | int | Seconds between status checks | 5 |
| max_wait_time | int | Maximum wait time in seconds | 3600 |
| on_progress | Callable | Callback for progress updates | None |
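Conceptually, `wait()` repeatedly checks the task status until it reaches a terminal state (`completed`, `failed`, or `cancelled`) or `max_wait_time` elapses. The sketch below illustrates that loop with a caller-supplied `get_status` function. It is an assumption about the general behavior, not the SDK's actual implementation; it raises `TimeoutError` because the error-handling examples on this page expect one.

```python
import time
from types import SimpleNamespace
from typing import Callable, Optional

TERMINAL_STATES = {"completed", "failed", "cancelled"}


def poll_until_done(
    get_status: Callable[[], object],
    poll_interval: float = 5,
    max_wait_time: float = 3600,
    on_progress: Optional[Callable[[object], None]] = None,
):
    """Call get_status() until the task finishes or max_wait_time is exceeded."""
    deadline = time.monotonic() + max_wait_time
    while True:
        status = get_status()
        if on_progress is not None:
            on_progress(status)
        if status.status in TERMINAL_STATES:
            return status
        # Stop if the next check would fall after the deadline
        if time.monotonic() + poll_interval > deadline:
            raise TimeoutError(f"task not finished after {max_wait_time} seconds")
        time.sleep(poll_interval)


# Usage with simulated statuses instead of real API calls
states = iter(["queued", "running", "completed"])
final = poll_until_done(
    lambda: SimpleNamespace(status=next(states)),
    poll_interval=0,
    on_progress=lambda s: print(f"Status: {s.status}"),
)
```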
Response Object
class DeepResearchStatusResponse:
    success: bool
    deepresearch_id: str
    status: str  # "queued" | "running" | "completed" | "failed" | "cancelled"
    query: str
    mode: str  # "lite" | "heavy"
    output_type: str  # "markdown" | "json"
    output: str | dict  # Research output
    sources: list[DeepResearchSource]  # Sources used
    usage: Usage  # Cost breakdown
    completed_at: int  # Unix timestamp
    pdf_url: str | None  # PDF download URL
    images: list[ImageMetadata]  # Generated images
    error: str | None  # Error message if failed
Source Object
class DeepResearchSource:
    title: str
    url: str
    snippet: str
    source: str  # web, pubmed, arxiv, etc.
    word_count: int
    doi: str | None  # For academic papers
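The `doi` field makes it straightforward to build a reference list from `result.sources`. The hypothetical `format_citations` helper below prefers a `https://doi.org/` link when a DOI is present and falls back to the source URL otherwise; it accepts any objects that carry the attributes listed above.

```python
def format_citations(sources) -> list[str]:
    """Render DeepResearchSource-like objects as numbered citation lines."""
    lines = []
    for i, src in enumerate(sources, start=1):
        # Academic papers get a resolvable DOI link; everything else keeps its URL
        link = f"https://doi.org/{src.doi}" if getattr(src, "doi", None) else src.url
        lines.append(f"[{i}] {src.title} ({src.source}). {link}")
    return lines


# Usage (assumes `result` from a completed task):
# print("\n".join(format_citations(result.sources)))
```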
Usage Object
class Usage:
    search_cost: float  # Search operations
    contents_cost: float  # Content retrieval
    ai_cost: float  # AI processing
    compute_cost: float  # Compute resources
    total_cost: float  # Total billed
Task Management
Check Status
status = valyu.deepresearch.status(task_id)  # task_id is the deepresearch_id returned by create()
print(f"Status: {status.status}")

if status.progress:
    print(f"Step {status.progress.current_step}/{status.progress.total_steps}")
Add Follow-up Instructions
Add instructions to a running task:
response = valyu.deepresearch.update(
    task_id,
    instruction="Focus more on peer-reviewed sources from 2024"
)

if response.success:
    print("Instruction added")
Cancel a Task
response = valyu.deepresearch.cancel(task_id)

if response.success:
    print("Task cancelled")
Delete a Task
response = valyu.deepresearch.delete(task_id)

if response.success:
    print("Task deleted")
List All Tasks
tasks = valyu.deepresearch.list(
    api_key_id="your-api-key-id",
    limit=50
)

for task in tasks.data:
    print(f"{task['query'][:50]}... - {task['status']}")
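The entries in `tasks.data` are dictionaries (the example reads `task['query']` and `task['status']`), so they can be summarized with the standard library. This sketch counts tasks by status and collects the queries of failed tasks; `summarize_tasks` and `failed_queries` are hypothetical helpers, and only the `query` and `status` keys are assumed.

```python
from collections import Counter


def summarize_tasks(tasks: list[dict]) -> dict[str, int]:
    """Count tasks by status (for example completed, running, failed)."""
    return dict(Counter(task["status"] for task in tasks))


def failed_queries(tasks: list[dict]) -> list[str]:
    """Return the queries of failed tasks so they can be reviewed or retried."""
    return [task["query"] for task in tasks if task["status"] == "failed"]


# Usage (assumes `tasks` from the list() call above):
# print(summarize_tasks(tasks.data))
# print(failed_queries(tasks.data))
```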
Webhooks
Get notified when research completes instead of polling. Webhooks are ideal for production systems and serverless architectures.
Setup
task = valyu.deepresearch.create(
    input="Research market trends in AI",
    webhook_url="https://your-app.com/webhooks/deepresearch"  # Must be HTTPS
)

# IMPORTANT: Save the secret immediately - it's only returned once
webhook_secret = task.webhook_secret
print(f"Store securely: {webhook_secret}")
The webhook_secret is only returned once. Store it securely—you cannot retrieve it later.
Verifying Signatures
Always verify webhook signatures to ensure authenticity:
import hashlib
import hmac

def verify_webhook(
    payload_body: str,
    signature_header: str,
    timestamp_header: str,
    secret: str
) -> bool:
    """Verify the webhook signature is valid."""
    # Missing headers: reject instead of failing inside compare_digest
    if not signature_header or not timestamp_header:
        return False

    # Reconstruct signed payload: timestamp.payload
    signed_payload = f"{timestamp_header}.{payload_body}"

    # Generate expected signature
    expected_signature = "sha256=" + hmac.new(
        secret.encode(),
        signed_payload.encode(),
        hashlib.sha256
    ).hexdigest()

    # Constant-time comparison prevents timing attacks
    return hmac.compare_digest(expected_signature, signature_header)
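To test your endpoint locally, you can generate a signature the same way the verifier above reconstructs it: HMAC-SHA256 over `timestamp.payload`, prefixed with `sha256=`. Because `X-Webhook-Timestamp` is a Unix timestamp in milliseconds, it can also be used to reject stale deliveries. The five-minute tolerance below is an assumption, not a documented Valyu value (pick a window comfortably longer than the retry schedule), and `sign_payload` and `verify_fresh_webhook` are hypothetical names.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_payload(payload_body: str, timestamp_header: str, secret: str) -> str:
    """Build a signature in the documented 'sha256=<hex>' format."""
    signed_payload = f"{timestamp_header}.{payload_body}"
    digest = hmac.new(secret.encode(), signed_payload.encode(), hashlib.sha256).hexdigest()
    return "sha256=" + digest


def verify_fresh_webhook(
    payload_body: str,
    signature_header: Optional[str],
    timestamp_header: Optional[str],
    secret: str,
    tolerance_seconds: int = 300,  # assumption, not a documented value
    now_ms: Optional[int] = None,
) -> bool:
    """Check the HMAC signature and reject timestamps outside the tolerance window."""
    if not signature_header or not timestamp_header:
        return False
    try:
        sent_ms = int(timestamp_header)  # X-Webhook-Timestamp is in milliseconds
    except ValueError:
        return False
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    if abs(now_ms - sent_ms) > tolerance_seconds * 1000:
        return False  # too old (possible replay) or too far in the future
    expected = sign_payload(payload_body, timestamp_header, secret)
    return hmac.compare_digest(expected, signature_header)


# Usage: sign a sample payload and verify it as your endpoint would
ts = str(int(time.time() * 1000))
body = '{"deepresearch_id": "test", "status": "completed"}'
print(verify_fresh_webhook(body, sign_payload(body, ts, "test-secret"), ts, "test-secret"))  # True
```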
Handling Webhooks
from flask import Flask, request, jsonify

app = Flask(__name__)
WEBHOOK_SECRET = "your-stored-secret"  # Load from secure storage

@app.route("/webhooks/deepresearch", methods=["POST"])
def handle_deepresearch_webhook():
    signature = request.headers.get("X-Webhook-Signature")
    timestamp = request.headers.get("X-Webhook-Timestamp")
    payload = request.get_data(as_text=True)

    # verify_webhook is the function defined in the previous section
    if not verify_webhook(payload, signature, timestamp, WEBHOOK_SECRET):
        return jsonify({"error": "Invalid signature"}), 401

    data = request.json

    if data["status"] == "completed":
        # Process completed research (save_research_result is your own function)
        save_research_result(data["deepresearch_id"], data["output"])
    elif data["status"] == "failed":
        # Handle failure (log_error is your own function)
        log_error(data["deepresearch_id"], data["error"])

    return jsonify({"received": True}), 200
Webhook Headers
| Header | Description |
|---|---|
| X-Webhook-Signature | HMAC-SHA256 signature: sha256=<hex> |
| X-Webhook-Timestamp | Unix timestamp (ms) when sent |
| Content-Type | application/json |
Retry Behavior
- 5 retry attempts with exponential backoff (1s → 2s → 4s → 8s → 16s)
- 15 second timeout per request
- 4xx errors: No retry (client error)
- 5xx errors: Will retry (server error)
Return a 2xx status quickly and process the payload asynchronously to avoid timeouts.
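One way to follow that advice is to acknowledge the request immediately and hand the payload to a background worker. Because retries can redeliver an event your endpoint already accepted (for example, when the first response exceeded the 15-second timeout), the sketch below also ignores duplicate `(deepresearch_id, status)` pairs. `WebhookProcessor` is hypothetical, and its in-memory record of seen events is lost on restart; a production system would typically use a durable store or task queue instead.

```python
import logging
import queue
import threading


class WebhookProcessor:
    """Accept webhook payloads quickly and process them on a background thread."""

    def __init__(self, handle):
        self._handle = handle  # your own function, called as handle(payload_dict)
        self._queue = queue.Queue()
        self._seen = set()  # (deepresearch_id, status) pairs already accepted
        self._lock = threading.Lock()
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, payload: dict) -> bool:
        """Queue a payload for processing; return False if it is a duplicate."""
        key = (payload["deepresearch_id"], payload["status"])
        with self._lock:
            if key in self._seen:
                return False
            self._seen.add(key)
        self._queue.put(payload)
        return True

    def join(self) -> None:
        """Block until every queued payload has been handled (handy in tests)."""
        self._queue.join()

    def _run(self) -> None:
        while True:
            payload = self._queue.get()
            try:
                self._handle(payload)
            except Exception:
                # Keep the worker alive; log the failure and continue
                logging.exception("failed to process webhook %s", payload.get("deepresearch_id"))
            finally:
                self._queue.task_done()
```

In the Flask handler, call `processor.submit(data)` after the signature check and return `200` right away.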
Use Case Examples
Academic Research Assistant
task = valyu.deepresearch.create(
    input="Analyze recent advances in transformer architectures for NLP",
    model="heavy",
    search={
        "search_type": "proprietary",
        "included_sources": ["arxiv", "pubmed", "nature.com"],
        "start_date": "2024-01-01"
    },
    strategy="Focus on peer-reviewed sources. Include methodology comparisons and performance benchmarks."
)

result = valyu.deepresearch.wait(task.deepresearch_id)

# Access academic sources
for source in result.sources:
    if source.doi:
        print(f"📄 {source.title}")
        print(f"   DOI: {source.doi}")
Competitive Intelligence
task = valyu.deepresearch.create(
    input="Analyze the competitive landscape of the cloud computing market",
    model="heavy",
    output_formats=[{
        "type": "object",
        "properties": {
            "market_overview": {"type": "string"},
            "key_players": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "company": {"type": "string"},
                        "market_share": {"type": "string"},
                        "strengths": {"type": "array", "items": {"type": "string"}},
                        "weaknesses": {"type": "array", "items": {"type": "string"}}
                    }
                }
            },
            "trends": {"type": "array", "items": {"type": "string"}},
            "opportunities": {"type": "array", "items": {"type": "string"}}
        },
        "required": ["market_overview", "key_players", "trends"]
    }]
)

result = valyu.deepresearch.wait(task.deepresearch_id)

if result.output_type == "json":
    analysis = result.output
    print(f"Market Overview: {analysis['market_overview']}\n")

    # Player fields are not marked as required in the schema, so use .get()
    for player in analysis["key_players"]:
        print(f"📊 {player.get('company', 'Unknown')} ({player.get('market_share', 'n/a')})")
        print(f"   Strengths: {', '.join(player.get('strengths', [])[:3])}")
Document Analysis
import base64

# Load multiple documents
files = []
for filename in ["q1_report.pdf", "q2_report.pdf"]:
    with open(filename, "rb") as f:
        data = base64.b64encode(f.read()).decode()
    files.append({
        "data": f"data:application/pdf;base64,{data}",
        "filename": filename,
        "mediaType": "application/pdf"
    })

task = valyu.deepresearch.create(
    input="Compare performance across these quarterly reports. Identify trends and anomalies.",
    model="heavy",
    files=files,
    output_formats=["markdown", "pdf"]
)

result = valyu.deepresearch.wait(task.deepresearch_id)
print(f"Analysis complete. PDF: {result.pdf_url}")
Error Handling
def run_research(query: str):
    task = valyu.deepresearch.create(input=query)
    if not task.success:
        print(f"Failed to create task: {task.error}")
        return None

    try:
        result = valyu.deepresearch.wait(
            task.deepresearch_id,
            max_wait_time=900  # 15 minutes for lite; use 2700 for heavy
        )
        if result.status == "completed":
            print(result.output)
        elif result.status == "failed":
            print(f"Research failed: {result.error}")
        return result
    except TimeoutError:
        print("Task timed out - cancelling")
        valyu.deepresearch.cancel(task.deepresearch_id)
    except ValueError as e:
        print(f"Task error: {e}")
    return None

run_research("Research query")
Best Practices
Choose the Right Mode
# Lite: Quick research, simple questions
valyu.deepresearch.create(
    input="What is the current market cap of Apple?",
    model="lite"
)

# Heavy: Complex analysis, detailed reports
valyu.deepresearch.create(
    input="Analyze Apple's competitive position vs Microsoft in cloud services",
    model="heavy"
)
Optimize Polling
# Lite mode: poll every 5 seconds, 15 min timeout
result = valyu.deepresearch.wait(task_id, poll_interval=5, max_wait_time=900)

# Heavy mode: poll every 15 seconds, 45 min timeout
result = valyu.deepresearch.wait(task_id, poll_interval=15, max_wait_time=2700)

# Production: use webhooks instead
task = valyu.deepresearch.create(
    input="...",
    webhook_url="https://your-app.com/webhooks"
)
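The recommended settings can be kept in one place and unpacked into `wait()`. The mapping below uses the values from this section; `polling_settings` is a hypothetical helper. Both configurations amount to roughly 180 polls before timing out (900 / 5 and 2700 / 15).

```python
# Polling values recommended above; adjust to your own latency and cost needs.
POLLING_DEFAULTS = {
    "lite": {"poll_interval": 5, "max_wait_time": 900},
    "heavy": {"poll_interval": 15, "max_wait_time": 2700},
}


def polling_settings(model: str) -> dict:
    """Return a fresh copy of the poll_interval/max_wait_time pair for a mode."""
    try:
        return dict(POLLING_DEFAULTS[model])
    except KeyError:
        raise ValueError(f"unknown model {model!r}; expected 'lite' or 'heavy'") from None


# Usage (assumes `valyu` and `task` from the earlier examples):
# result = valyu.deepresearch.wait(task.deepresearch_id, **polling_settings("heavy"))
print(polling_settings("heavy"))  # {'poll_interval': 15, 'max_wait_time': 2700}
```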
Provide Clear Context
# ❌ Vague
valyu.deepresearch.create(input="Tell me about AI")

# ✅ Specific with context
valyu.deepresearch.create(
    input="What are the practical applications of large language models in healthcare, focusing on diagnostic assistance and clinical documentation?",
    strategy="Focus on peer-reviewed studies and real-world deployments. Include both benefits and limitations.",
    search={"start_date": "2023-01-01"}
)
Handle Long-Running Tasks
def research_with_status(query: str):
    task = valyu.deepresearch.create(input=query, model="heavy")
    if not task.success:
        return None

    print(f"Started research: {task.deepresearch_id}")

    try:
        return valyu.deepresearch.wait(
            task.deepresearch_id,
            poll_interval=10,
            max_wait_time=2700,  # 45 minutes for heavy mode
            on_progress=lambda s: print(f"  {s.status}...")
        )
    except TimeoutError:
        # The local wait timed out; check whether the task is still in progress
        status = valyu.deepresearch.status(task.deepresearch_id)
        if status.status in ("queued", "running"):
            print("Still running - check back later")
            return status
        raise

result = research_with_status("Comprehensive market analysis of EV industry")