> ## Documentation Index
> Fetch the complete documentation index at: https://docs.valyu.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# DeepResearch Documentation

> Build AI research assistants with Valyu's async deep research API

DeepResearch is an asynchronous research API that performs comprehensive research by searching multiple sources, analyzing content, and generating detailed reports. Unlike synchronous APIs, DeepResearch runs tasks in the background, allowing for thorough multi-step research that can take minutes to complete.

## When to Use DeepResearch

Use DeepResearch when you need:

* **In-depth analysis** - Complex research across multiple sources
* **Report generation** - Markdown or PDF output with citations
* **Structured data extraction** - Research results in custom JSON formats
* **Background processing** - Long-running research without blocking your application

For quick answers to simple questions, consider using the [Answer API](/guides/answer-api) instead.

## Features

<CardGroup cols={2}>
  <Card title="Multi-Source Research" icon="magnifying-glass">
    Searches web, academic, and proprietary sources in a single task.
  </Card>

  <Card title="Research Modes" icon="gauge">
    Choose fast for instant answers, standard for quick research, or heavy for complex analysis.
  </Card>

  <Card title="Multiple Outputs" icon="file-lines">
    Get results as markdown, PDF, or structured JSON.
  </Card>

  <Card title="File Analysis" icon="file-pdf">
    Attach PDFs, images, and documents for analysis.
  </Card>

  <Card title="URL Extraction" icon="link">
    Include specific URLs to analyze as part of research.
  </Card>

  <Card title="Webhooks" icon="bell">
    Get notified when research completes.
  </Card>
</CardGroup>

## Quick Start

### Create a Research Task

<CodeGroup>
  ```python Python theme={null}
  from valyu import Valyu

  valyu = Valyu()

  # Create a research task
  task = valyu.deepresearch.create(
      query="What are the key differences between RAG and fine-tuning for LLMs?",
      mode="standard"
  )

  print(f"Task created: {task.deepresearch_id}")
  print(f"Status: {task.status}")
  ```

  ```typescript TypeScript theme={null}
  import { Valyu } from "valyu-js";

  const valyu = new Valyu();

  // Create a research task
  const task = await valyu.deepresearch.create({
    query: "What are the key differences between RAG and fine-tuning for LLMs?",
    mode: "standard"
  });

  console.log(`Task created: ${task.deepresearch_id}`);
  console.log(`Status: ${task.status}`);
  ```

  ```bash cURL theme={null}
  curl -X POST "https://api.valyu.ai/v1/deepresearch/tasks" \
    -H "Content-Type: application/json" \
    -H "x-api-key: YOUR_API_KEY" \
    -d '{
      "query": "What are the key differences between RAG and fine-tuning for LLMs?",
      "mode": "standard"
    }'
  ```
</CodeGroup>

### Wait for Completion

<CodeGroup>
  ```python Python theme={null}
  # Wait for the task to complete
  result = valyu.deepresearch.wait(
      task.deepresearch_id,
      poll_interval=5,      # Check every 5 seconds
      max_wait_time=1800    # Timeout after 30 minutes (standard mode)
  )

  if result.status == "completed":
      print("Research completed!")
      print(result.output)
      
      # Access sources used
      for source in result.sources:
          print(f"- {source.title}: {source.url}")
      
      # Check cost
      print(f"Cost: ${result.cost}")
  ```

  ```typescript TypeScript theme={null}
  // Wait for the task to complete
  const result = await valyu.deepresearch.wait(task.deepresearch_id, {
    pollInterval: 5000,    // Check every 5 seconds
    maxWaitTime: 1800000   // Timeout after 30 minutes (standard mode)
  });

  if (result.status === "completed") {
    console.log("Research completed!");
    console.log(result.output);
    
    // Access sources used
    result.sources?.forEach(source => {
      console.log(`- ${source.title}: ${source.url}`);
    });
    
    // Check cost
    console.log(`Cost: $${result.cost}`);
  }
  ```
</CodeGroup>

## Task Statuses

When you create a task, it goes through the following statuses:

| Status           | Description                                             |
| ---------------- | ------------------------------------------------------- |
| `queued`         | Task is waiting to start due to rate limits or capacity |
| `running`        | Task is actively researching                            |
| `completed`      | Research finished successfully                          |
| `failed`         | Research failed (check `error` field)                   |
| `cancelled`      | Task was cancelled by user                              |
| `awaiting_input` | HITL checkpoint active, waiting for user response       |
| `paused`         | HITL checkpoint timed out, state saved to S3            |

### Queued Tasks

Tasks may be queued when:

* Your organization has multiple concurrent tasks running
* System capacity is temporarily limited

Queued tasks start automatically when capacity becomes available. No action is required.

<CodeGroup>
  ```python Python theme={null}
  task = valyu.deepresearch.create(query="Research query")

  if task.status == "queued":
      print(f"Task queued: {task.message}")
      # Task will start automatically - just wait for it
      
  result = valyu.deepresearch.wait(task.deepresearch_id)
  ```

  ```typescript TypeScript theme={null}
  const task = await valyu.deepresearch.create({
    query: "Research query"
  });

  if (task.status === "queued") {
    console.log(`Task queued: ${task.message}`);
    // Task will start automatically - just wait for it
  }

  const result = await valyu.deepresearch.wait(task.deepresearch_id!);
  ```
</CodeGroup>

<Note>
  The `wait` method handles queued tasks automatically. It continues polling until the task completes, fails, or is cancelled.
</Note>

## Research Modes

| Mode         | Price   | Best For                                   | Max Steps |
| ------------ | ------- | ------------------------------------------ | --------- |
| **fast**     | \$0.10  | Quick queries, batch processing            | 10        |
| **standard** | \$0.50  | Balanced research                          | 15        |
| **heavy**    | \$2.50  | Complex topics requiring fact verification | 15        |
| **max**      | \$15.00 | Exhaustive research with maximum quality   | 25        |

<CodeGroup>
  ```python Python theme={null}
  # Use fast mode for quick lookups
  task = valyu.deepresearch.create(
      query="What is quantum computing?",
      mode="fast"
  )

  # Use heavy mode for complex research
  task = valyu.deepresearch.create(
      query="Analyze the competitive landscape of the cloud computing market in 2024",
      mode="heavy"
  )

  # Use max mode for exhaustive research
  task = valyu.deepresearch.create(
      query="Comprehensive analysis of AI safety research with fact verification",
      mode="max"
  )
  ```

  ```typescript TypeScript theme={null}
  // Use fast mode for quick lookups
  const task = await valyu.deepresearch.create({
    query: "What is quantum computing?",
    mode: "fast"
  });

  // Use heavy mode for complex research
  const task = await valyu.deepresearch.create({
    query: "Analyze the competitive landscape of the cloud computing market in 2024",
    mode: "heavy"
  });

  // Use max mode for exhaustive research
  const task = await valyu.deepresearch.create({
    query: "Comprehensive analysis of AI safety research with fact verification",
    mode: "max"
  });
  ```
</CodeGroup>

## Output Formats

### Markdown (Default)

<CodeGroup>
  ```python Python theme={null}
  task = valyu.deepresearch.create(
      query="Explain quantum computing advancements",
      output_formats=["markdown"]
  )
  ```

  ```typescript TypeScript theme={null}
  const task = await valyu.deepresearch.create({
    query: "Explain quantum computing advancements",
    outputFormats: ["markdown"]
  });
  ```
</CodeGroup>

### Markdown + PDF

<CodeGroup>
  ```python Python theme={null}
  task = valyu.deepresearch.create(
      query="Write a report on renewable energy trends",
      output_formats=["markdown", "pdf"]
  )

  # After completion, access the PDF URL
  result = valyu.deepresearch.wait(task.deepresearch_id)
  if result.pdf_url:
      print(f"PDF available at: {result.pdf_url}")
  ```

  ```typescript TypeScript theme={null}
  const task = await valyu.deepresearch.create({
    query: "Write a report on renewable energy trends",
    outputFormats: ["markdown", "pdf"]
  });

  // After completion, access the PDF URL
  const result = await valyu.deepresearch.wait(task.deepresearch_id!);
  if (result.pdf_url) {
    console.log(`PDF available at: ${result.pdf_url}`);
  }
  ```
</CodeGroup>

### Structured JSON

Get research results in a custom schema using [JSON Schema](https://json-schema.org/understanding-json-schema) specification:

<CodeGroup>
  ```python Python theme={null}
  task = valyu.deepresearch.create(
      query="Research competitor pricing in the SaaS market",
      output_formats=[{
          "type": "object",
          "properties": {
              "competitors": {
                  "type": "array",
                  "items": {
                      "type": "object",
                      "properties": {
                          "name": {"type": "string"},
                          "pricing_model": {"type": "string"},
                          "price_range": {"type": "string"},
                          "key_features": {
                              "type": "array",
                              "items": {"type": "string"}
                          }
                      },
                      "required": ["name", "pricing_model"]
                  }
              },
              "market_summary": {"type": "string"},
              "recommendations": {
                  "type": "array",
                  "items": {"type": "string"}
              }
          },
          "required": ["competitors", "market_summary"]
      }]
  )
  ```

  ```typescript TypeScript theme={null}
  const task = await valyu.deepresearch.create({
    query: "Research competitor pricing in the SaaS market",
    outputFormats: [{
      type: "object",
      properties: {
        competitors: {
          type: "array",
          items: {
            type: "object",
            properties: {
              name: { type: "string" },
              pricing_model: { type: "string" },
              price_range: { type: "string" },
              key_features: {
                type: "array",
                items: { type: "string" }
              }
            },
            required: ["name", "pricing_model"]
          }
        },
        market_summary: { type: "string" },
        recommendations: {
          type: "array",
          items: { type: "string" }
        }
      },
      required: ["competitors", "market_summary"]
    }]
  });
  ```
</CodeGroup>

<Warning>
  You cannot mix JSON Schema with markdown/pdf formats. Use one or the other.
</Warning>

<Note>
  The schema must be a valid [JSON Schema](https://json-schema.org/understanding-json-schema). Use `type`, `properties`, `required`, `items`, and other standard JSON Schema keywords.
</Note>

## Guiding research and output

Two parameters let you control **how** the agent researches and **what** the final report looks like:

| Parameter           | Controls                                                                           |
| ------------------- | ---------------------------------------------------------------------------------- |
| `research_strategy` | The research phase — what to search for, which sources to prioritise, methodology  |
| `report_format`     | The output — structure, style, length, presentation (overrides default formatting) |

<Note>
  The older `strategy` parameter is deprecated. Use `research_strategy` instead. If both are provided, `research_strategy` takes precedence.
</Note>

<Warning>
  The combined length of `research_strategy` and `report_format` must not exceed **15,000 characters**. Requests exceeding this limit will be rejected.
</Warning>

### Research strategy

Use `research_strategy` to tell the agent *how* to conduct research — what angles to explore, which sources matter most, and what methodology to follow.

<CodeGroup>
  ```python Python theme={null}
  task = client.deepresearch.create(
      query="Evaluate the competitive landscape of AI chip manufacturers",
      mode="heavy",
      research_strategy="Focus on financial performance, market share data, and recent product launches. "
                        "Prioritise SEC filings, earnings call transcripts, and industry analyst reports. "
                        "Compare NVIDIA, AMD, Intel, and emerging startups.",
  )
  ```

  ```typescript TypeScript theme={null}
  const task = await client.deepresearch.create({
    query: "Evaluate the competitive landscape of AI chip manufacturers",
    mode: "heavy",
    researchStrategy: "Focus on financial performance, market share data, and recent product launches. " +
                      "Prioritise SEC filings, earnings call transcripts, and industry analyst reports. " +
                      "Compare NVIDIA, AMD, Intel, and emerging startups.",
  });
  ```

  ```bash cURL theme={null}
  curl -X POST https://api.valyu.ai/v1/deepresearch \
    -H "Content-Type: application/json" \
    -H "x-api-key: YOUR_API_KEY" \
    -d '{
      "query": "Evaluate the competitive landscape of AI chip manufacturers",
      "mode": "heavy",
      "research_strategy": "Focus on financial performance, market share data, and recent product launches. Prioritise SEC filings, earnings call transcripts, and industry analyst reports. Compare NVIDIA, AMD, Intel, and emerging startups."
    }'
  ```
</CodeGroup>

### Report format

Use `report_format` to control the structure and style of the final output. This overrides default formatting.

<CodeGroup>
  ```python Python theme={null}
  task = client.deepresearch.create(
      query="Impact of GLP-1 receptor agonists on cardiovascular outcomes",
      mode="standard",
      research_strategy="Focus on Phase 3 clinical trials published after 2022. "
                        "Compare semaglutide, tirzepatide, and liraglutide.",
      report_format="Write a 2-page executive summary structured as: "
                    "1) Key Findings (bullet points), "
                    "2) Comparison Table (drug, trial, primary endpoint, result), "
                    "3) Clinical Implications (2 paragraphs). "
                    "Use formal medical writing style.",
  )
  ```

  ```typescript TypeScript theme={null}
  const task = await client.deepresearch.create({
    query: "Impact of GLP-1 receptor agonists on cardiovascular outcomes",
    mode: "standard",
    researchStrategy: "Focus on Phase 3 clinical trials published after 2022. " +
                      "Compare semaglutide, tirzepatide, and liraglutide.",
    reportFormat: "Write a 2-page executive summary structured as: " +
                  "1) Key Findings (bullet points), " +
                  "2) Comparison Table (drug, trial, primary endpoint, result), " +
                  "3) Clinical Implications (2 paragraphs). " +
                  "Use formal medical writing style.",
  });
  ```

  ```bash cURL theme={null}
  curl -X POST https://api.valyu.ai/v1/deepresearch \
    -H "Content-Type: application/json" \
    -H "x-api-key: YOUR_API_KEY" \
    -d '{
      "query": "Impact of GLP-1 receptor agonists on cardiovascular outcomes",
      "mode": "standard",
      "research_strategy": "Focus on Phase 3 clinical trials published after 2022. Compare semaglutide, tirzepatide, and liraglutide.",
      "report_format": "Write a 2-page executive summary structured as: 1) Key Findings (bullet points), 2) Comparison Table (drug, trial, primary endpoint, result), 3) Clinical Implications (2 paragraphs). Use formal medical writing style."
    }'
  ```
</CodeGroup>

### Using both together

`research_strategy` and `report_format` work independently — you can use one or both. When combined, the agent follows your research strategy during the search phase, then formats the output according to your report format instructions.

<CodeGroup>
  ```python Python theme={null}
  # Research strategy guides WHAT to look for
  # Report format guides HOW to present it
  task = client.deepresearch.create(
      query="Renewable energy investment trends in Southeast Asia",
      mode="heavy",
      research_strategy="Analyse government policy incentives, private sector investments, "
                        "and infrastructure developments across Vietnam, Thailand, Indonesia, "
                        "and the Philippines. Include data from 2023-2025.",
      report_format="Create a country-by-country breakdown with: "
                    "investment figures in a table, key policy highlights as bullet points, "
                    "and a final section ranking countries by investment attractiveness. "
                    "Keep total length under 3000 words.",
  )
  ```

  ```typescript TypeScript theme={null}
  const task = await client.deepresearch.create({
    query: "Renewable energy investment trends in Southeast Asia",
    mode: "heavy",
    researchStrategy: "Analyse government policy incentives, private sector investments, " +
                      "and infrastructure developments across Vietnam, Thailand, Indonesia, " +
                      "and the Philippines. Include data from 2023-2025.",
    reportFormat: "Create a country-by-country breakdown with: " +
                  "investment figures in a table, key policy highlights as bullet points, " +
                  "and a final section ranking countries by investment attractiveness. " +
                  "Keep total length under 3000 words.",
  });
  ```
</CodeGroup>

## Search configuration

Search parameters control which data sources are queried, what content is included/excluded, and how results are filtered by date or category. These parameters are specified in the `search` object within the request.

### Search Type

Controls which backend search systems are queried:

* **`"all"`** (default): Searches both web and proprietary data sources
* **`"web"`**: Searches only web sources (general web search, news, articles)
* **`"proprietary"`**: Searches only proprietary data sources (academic papers, finance data, patents, etc.)

When set at the request level, this parameter **cannot be overridden** by the AI agent during research.

<CodeGroup>
  ```python Python theme={null}
  task = valyu.deepresearch.create(
      query="Recent advances in quantum computing",
      search={"search_type": "proprietary"}
  )
  ```

  ```typescript TypeScript theme={null}
  const task = await valyu.deepresearch.create({
    query: "Recent advances in quantum computing",
    search: { searchType: "proprietary" }
  });
  ```
</CodeGroup>

### Included Sources

Restricts search to only the specified source types. When specified, **only** these sources will be searched. If the AI agent attempts to use other sources, they will be ignored.

**Available source types:**

* **`"web"`**: General web search results (news, articles, websites)
* **`"academic"`**: Academic papers and research databases (ArXiv, PubMed, BioRxiv/MedRxiv, Clinical trials, FDA drug labels, WHO health data, NIH grants, Wikipedia)
* **`"finance"`**: Financial and economic data (Stock/crypto/FX prices, SEC filings, Company financial statements, Economic indicators, Prediction markets)
* **`"patent"`**: Patent and intellectual property data (USPTO patent database, Patent abstracts, claims, descriptions)
* **`"transportation"`**: Transit and transportation data (UK National Rail schedules, Maritime vessel tracking)
* **`"politics"`**: Government and parliamentary data (UK Parliament members, bills, votes)
* **`"legal"`**: Case law and legal data (UK court judgments, Legislation text)

<CodeGroup>
  ```python Python theme={null}
  task = valyu.deepresearch.create(
      query="Latest AI research",
      search={
          "search_type": "proprietary",
          "included_sources": ["academic", "web"]
      }
  )
  ```

  ```typescript TypeScript theme={null}
  const task = await valyu.deepresearch.create({
    query: "Latest AI research",
    search: {
      searchType: "proprietary",
      includedSources: ["academic", "web"]
    }
  });
  ```
</CodeGroup>

### Excluded Sources

Excludes specific source types from search results. Uses the same source type values as `included_sources`. Cannot be used simultaneously with `included_sources` (use one or the other).

<CodeGroup>
  ```python Python theme={null}
  task = valyu.deepresearch.create(
      query="Clinical trial results",
      search={
          "search_type": "proprietary",
          "excluded_sources": ["web", "patent"]
      }
  )
  ```

  ```typescript TypeScript theme={null}
  const task = await valyu.deepresearch.create({
    query: "Clinical trial results",
    search: {
      searchType: "proprietary",
      excludedSources: ["web", "patent"]
    }
  });
  ```
</CodeGroup>

### Source Biases

Soft ranking hints that influence (but don't hard-filter) which sources appear in results. Keys are domains or URL paths, values are integers from **-5** (strong demotion) to **+5** (strong boost). Unlike `included_sources`/`excluded_sources`, biases adjust ranking without removing any sources entirely.

<CodeGroup>
  ```python Python theme={null}
  task = valyu.deepresearch.create(
      query="Environmental policy analysis",
      search={
          "source_biases": {
              "epa.gov": 5,
              "nasa.gov": 3,
              "noaa.gov": 2,
              "example.com": -4
          }
      }
  )
  ```

  ```typescript TypeScript theme={null}
  const task = await valyu.deepresearch.create({
    query: "Environmental policy analysis",
    search: {
      sourceBiases: {
        "epa.gov": 5,
        "nasa.gov": 3,
        "noaa.gov": 2,
        "example.com": -4
      }
    }
  });
  ```
</CodeGroup>

### Start Date

**Format:** ISO date format (`YYYY-MM-DD`)

Filters search results to only include content published or dated on or after this date. Applied to both publication dates and event dates when available. Works across all source types.

<CodeGroup>
  ```python Python theme={null}
  task = valyu.deepresearch.create(
      query="Recent AI developments",
      search={"start_date": "2024-01-01"}
  )
  ```

  ```typescript TypeScript theme={null}
  const task = await valyu.deepresearch.create({
    query: "Recent AI developments",
    search: { startDate: "2024-01-01" }
  });
  ```
</CodeGroup>

### End Date

**Format:** ISO date format (`YYYY-MM-DD`)

Filters search results to only include content published or dated on or before this date. Applied to both publication dates and event dates when available. Works across all source types.

<CodeGroup>
  ```python Python theme={null}
  task = valyu.deepresearch.create(
      query="Historical market analysis",
      search={"end_date": "2020-12-31"}
  )
  ```

  ```typescript TypeScript theme={null}
  const task = await valyu.deepresearch.create({
    query: "Historical market analysis",
    search: { endDate: "2020-12-31" }
  });
  ```
</CodeGroup>

### Category

Filters results by a specific category. The exact categories available depend on the data source. Category values are source-dependent and may not be applicable to all source types.

<CodeGroup>
  ```python Python theme={null}
  task = valyu.deepresearch.create(
      query="Technology trends",
      search={"category": "technology"}
  )
  ```

  ```typescript TypeScript theme={null}
  const task = await valyu.deepresearch.create({
    query: "Technology trends",
    search: { category: "technology" }
  });
  ```
</CodeGroup>

### Country Code

**Format:** ISO 3166-1 alpha-2 code (e.g., `"US"`, `"GB"`, `"DE"`)

Filters web search results to prioritize content from a specific country or region. This affects web search results by biasing towards content relevant to the specified location.

<CodeGroup>
  ```python Python theme={null}
  task = valyu.deepresearch.create(
      query="Local business regulations",
      search={"country_code": "GB"}
  )
  ```

  ```typescript TypeScript theme={null}
  const task = await valyu.deepresearch.create({
    query: "Local business regulations",
    search: { countryCode: "GB" }
  });
  ```
</CodeGroup>

<Note>
  Country code filtering primarily affects web search results. Academic and proprietary data sources may not support location-based filtering.
</Note>

### Important Notes

#### Parameter Enforcement

Request-level parameters are enforced and cannot be overridden by the AI agent during research. This ensures consistent search behavior throughout the research process. Tool-level source specifications are ignored if request-level sources are specified.

#### Date Filtering

Dates are applied to both publication dates and event dates when available. ISO format (`YYYY-MM-DD`) is required. Date filtering works across all source types. If only `start_date` is provided, results include all content from that date forward. If only `end_date` is provided, results include all content up to that date. Both dates can be combined for a specific date range.

## File Attachments

Analyze documents as part of research:

<CodeGroup>
  ```python Python theme={null}
  import base64

  # Read and encode a PDF
  with open("report.pdf", "rb") as f:
      pdf_data = base64.b64encode(f.read()).decode()

  task = valyu.deepresearch.create(
      query="Summarize the key findings from this report and compare with current market trends",
      mode="heavy",
      files=[{
          "data": f"data:application/pdf;base64,{pdf_data}",
          "filename": "report.pdf",
          "mediaType": "application/pdf",
          "context": "Q4 2024 financial report"
      }]
  )
  ```

  ```typescript TypeScript theme={null}
  import * as fs from "fs";

  // Read and encode a PDF
  const pdfBuffer = fs.readFileSync("report.pdf");
  const pdfData = pdfBuffer.toString("base64");

  const task = await valyu.deepresearch.create({
    query: "Summarize the key findings from this report and compare with current market trends",
    mode: "heavy",
    files: [{
      data: `data:application/pdf;base64,${pdfData}`,
      filename: "report.pdf",
      mediaType: "application/pdf",
      context: "Q4 2024 financial report"
    }]
  });
  ```
</CodeGroup>

Supported file types include PDFs, images (PNG, JPEG, WebP), and documents.

## File Uploads

Deep Research accepts file attachments via the `files` array in the request body. Files are validated on upload and rejected with a `400` status if they violate any constraints.

### Supported File Types

| Type       | MIME Type                                                                   | Extensions      | Max Size |
| ---------- | --------------------------------------------------------------------------- | --------------- | -------- |
| PDF        | `application/pdf`                                                           | .pdf            | 50 MB    |
| PNG        | `image/png`                                                                 | .png            | 20 MB    |
| JPEG       | `image/jpeg`                                                                | .jpg, .jpeg     | 20 MB    |
| GIF        | `image/gif`                                                                 | .gif            | 20 MB    |
| WebP       | `image/webp`                                                                | .webp           | 20 MB    |
| Plain text | `text/plain`                                                                | .txt, .md, .log | 10 MB    |
| CSV        | `text/csv`                                                                  | .csv            | 10 MB    |
| Markdown   | `text/markdown`                                                             | .md             | 10 MB    |
| Word       | `application/vnd.openxmlformats-officedocument.wordprocessingml.document`   | .docx           | 50 MB    |
| Excel      | `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet`         | .xlsx           | 20 MB    |
| PowerPoint | `application/vnd.openxmlformats-officedocument.presentationml.presentation` | .pptx           | 50 MB    |

**Total size limit:** 100 MB across all files in a single request.

**Max files per request:** 10.

### How Files Are Processed

Most file types are passed directly to the LLM as native file content parts. The exception is **PPTX**, which is not natively supported by Claude/Gemini. PPTX files are automatically converted to markdown text (slide-by-slide) before being sent to the model.

Extracted text is truncated at 500K characters to prevent context overflow.

### Error Responses

All validation errors return HTTP `400` with a JSON body:

```json theme={null}
{ "error": "..." }
```

### Unsupported file type

Returned when the MIME type is not in the whitelist.

```json theme={null}
{ "error": "files[0]: Unsupported file type \"application/x-msdownload\". Supported types: .pdf, .png, .jpg, .jpeg, .gif, .webp, .txt, .md, .log, .csv, .docx, .xlsx, .pptx" }
```

### Extension mismatch

Returned when the file extension doesn't match the declared MIME type.

```json theme={null}
{ "error": "files[0]: Extension \".txt\" does not match MIME type \"application/pdf\". Expected: .pdf" }
```

### Per-file size exceeded

Returned when a single file exceeds the limit for its type.

```json theme={null}
{ "error": "files[2]: File \"huge.pdf\" is 62.3 MB, exceeds 50 MB limit for application/pdf" }
```

### Total size exceeded

Returned when the combined size of all files exceeds 100 MB.

```json theme={null}
{ "error": "Total file size 112.5 MB exceeds 100 MB limit" }
```

### Structural errors

Returned when the file object is malformed (missing fields, wrong types, invalid data URL format).

```json theme={null}
{ "error": "files[0].data is required and must be a string (data URL)" }
{ "error": "files[0].filename is required and must be a string" }
{ "error": "files[0].mediaType is required and must be a string" }
{ "error": "files[0].data must be a data URL (e.g., \"data:application/pdf;base64,...\")" }
```

### Too many files

```json theme={null}
{ "error": "Maximum 10 files allowed per request" }
```

## Tools

The `tools` parameter controls which optional capabilities the research agent can use during a task. All tools are **off by default** and must be explicitly enabled.

| Tool             | Description                                                                                                               |
| ---------------- | ------------------------------------------------------------------------------------------------------------------------- |
| `code_execution` | Run Python code in a sandboxed environment. Required for XLSX/PPTX/DOCX deliverable generation.                           |
| `screenshots`    | Capture visual screenshots of web pages. The agent decides when screenshots add value (charts, dashboards, infographics). |

See [Pricing](/pricing#tool-surcharges) for per-tool surcharge details.

<Note>
  You enable tools — the **agent decides when to use them**. For example, enabling `screenshots` does not screenshot every page; the agent selects pages where visual context is valuable.
</Note>

<CodeGroup>
  ```python Python theme={null}
  task = valyu.deepresearch.create(
      query="Analyse Tesla's Q3 2026 earnings. Screenshot their investor relations page and any revenue charts. Calculate YoY revenue growth rates and operating margins.",
      mode="heavy",
      tools={
          "code_execution": True,
          "screenshots": True,
      }
  )
  ```

  ```typescript TypeScript theme={null}
  const task = await valyu.deepresearch.create({
    query: "Analyse Tesla's Q3 2026 earnings. Screenshot their investor relations page and any revenue charts. Calculate YoY revenue growth rates and operating margins.",
    mode: "heavy",
    tools: {
      code_execution: true,
      screenshots: true,
    },
  });
  ```
</CodeGroup>

### Screenshots

When `screenshots` is enabled, the agent can capture visual screenshots of web pages during research. Screenshots appear in the `images` array alongside charts with `image_type: "screenshot"`.

The agent autonomously decides when to screenshot — typically for pages with charts, dashboards, infographics, or other visual content that adds context to the report. Screenshots are embedded in the markdown report and rendered in PDF/PPTX/DOCX deliverables.

**Screenshot-specific fields on `ImageMetadata`:**

| Field         | Type   | Description                                       |
| ------------- | ------ | ------------------------------------------------- |
| `source_url`  | string | The original web page URL that was screenshotted  |
| `captured_at` | int    | Unix timestamp (ms) when the screenshot was taken |

**Limits:**

| Limit                    | Value          |
| ------------------------ | -------------- |
| Max screenshots per task | 15             |
| Max download size        | 5 MB           |
| Report image cap         | 1280 × 4000 px |

### Code execution

When `code_execution` is enabled, the agent can run Python code in a sandboxed environment to process data, perform calculations, or generate deliverables.

**Limits:**

| Limit                 | Value                      |
| --------------------- | -------------------------- |
| Language              | Python only                |
| Timeout per execution | 5–60 seconds (default 30s) |
| Network access        | None (sandbox is isolated) |
| Output                | Text only via `print()`    |

<Warning>
  `code_execution` previously defaulted to `true`. It now defaults to `false`. If you were relying on the implicit default, explicitly opt in via `tools: { "code_execution": true }`.
</Warning>

## URL Extraction

Include specific URLs to analyze:

<CodeGroup>
  ```python Python theme={null}
  task = valyu.deepresearch.create(
      query="Compare the approaches described in these articles",
      urls=[
          "https://example.com/article-1",
          "https://example.com/article-2"
      ]
  )
  ```

  ```typescript TypeScript theme={null}
  const task = await valyu.deepresearch.create({
    query: "Compare the approaches described in these articles",
    urls: [
      "https://example.com/article-1",
      "https://example.com/article-2"
    ]
  });
  ```
</CodeGroup>

## Task Management

### Check Status

<CodeGroup>
  ```python Python theme={null}
  status = valyu.deepresearch.status(task_id)

  print(f"Status: {status.status}")
  if status.progress:
      print(f"Progress: {status.progress.current_step}/{status.progress.total_steps}")
  ```

  ```typescript TypeScript theme={null}
  const status = await valyu.deepresearch.status(taskId);

  console.log(`Status: ${status.status}`);
  if (status.progress) {
    console.log(`Progress: ${status.progress.current_step}/${status.progress.total_steps}`);
  }
  ```
</CodeGroup>

### Add Follow-up Instructions

While a task is running, you can add instructions to refine or adjust the scope of the research report to guide the research and report generation process.

<Warning>
  Follow-up instructions can only be added **before the writing phase starts**. Once research completes and report generation begins, new instructions are rejected.
</Warning>

<CodeGroup>
  ```python Python theme={null}
  # Add first instruction
  valyu.deepresearch.update(
      task_id,
      instruction="Focus more on peer-reviewed sources"
  )

  # Add another instruction
  valyu.deepresearch.update(
      task_id,
      instruction="Include a comparison table of major providers"
  )
  ```

  ```typescript TypeScript theme={null}
  // Add first instruction
  await valyu.deepresearch.update(
    taskId,
    "Focus more on peer-reviewed sources"
  );

  // Add another instruction
  await valyu.deepresearch.update(
    taskId,
    "Include a comparison table of major providers"
  );
  ```

  ```bash cURL theme={null}
  curl -X POST "https://api.valyu.ai/v1/deepresearch/tasks/{task_id}/update" \
    -H "Content-Type: application/json" \
    -H "x-api-key: YOUR_API_KEY" \
    -d '{
      "instruction": "Focus more on peer-reviewed sources"
    }'
  ```
</CodeGroup>

<Note>
  Submit instructions as early as possible during the research phase. Check task status to know when research has completed.
</Note>

### Cancel a Task

<CodeGroup>
  ```python Python theme={null}
  valyu.deepresearch.cancel(task_id)
  ```

  ```typescript TypeScript theme={null}
  await valyu.deepresearch.cancel(taskId);
  ```
</CodeGroup>

### Delete a Task

<CodeGroup>
  ```python Python theme={null}
  valyu.deepresearch.delete(task_id)
  ```

  ```typescript TypeScript theme={null}
  await valyu.deepresearch.delete(taskId);
  ```
</CodeGroup>

### List All Tasks

<CodeGroup>
  ```python Python theme={null}
  tasks = valyu.deepresearch.list(limit=50)

  for task in tasks.data:
      print(f"{task['query']} - {task['status']}")
  ```

  ```typescript TypeScript theme={null}
  const tasks = await valyu.deepresearch.list({ limit: 50 });

  tasks.data?.forEach(task => {
    console.log(`${task.query} - ${task.status}`);
  });
  ```
</CodeGroup>

## Webhooks

Webhooks provide real-time notifications when a DeepResearch task completes or fails, eliminating the need for polling.

```mermaid theme={null}
sequenceDiagram
    participant Client
    participant Valyu
    participant Webhook

    Client->>Valyu: POST /deepresearch/tasks (with webhook_url)
    Valyu-->>Client: task + webhook_secret
    Note over Client: Store secret securely
    Valyu->>Valyu: Research executes
    Valyu->>Webhook: POST with signature headers
    Webhook-->>Valyu: 200 OK
```

### When to Use Webhooks

| Approach | Best For                                    |
| -------- | ------------------------------------------- |
| Webhooks | Event-driven workflows                      |
| Polling  | Simple scripts, real-time progress tracking |

### Setting Up Webhooks

When you provide a `webhook_url`, the server generates a cryptographic secret for signature verification:

<CodeGroup>
  ```python Python theme={null}
  task = valyu.deepresearch.create(
      query="Research market trends",
      webhook_url="https://your-app.com/webhooks/deepresearch"
  )

  # IMPORTANT: Save the secret immediately - it's only returned once
  webhook_secret = task.webhook_secret
  print(f"Store this secret securely: {webhook_secret}")
  ```

  ```typescript TypeScript theme={null}
  const task = await valyu.deepresearch.create({
    query: "Research market trends",
    webhookUrl: "https://your-app.com/webhooks/deepresearch"
  });

  // IMPORTANT: Save the secret immediately - it's only returned once
  const webhookSecret = task.webhook_secret;
  console.log(`Store this secret securely: ${webhookSecret}`);
  ```
</CodeGroup>

<Warning>
  The `webhook_secret` is only returned in the initial task creation response. Store it securely in your database or secrets manager—you cannot retrieve it later.
</Warning>

<Note>
  Webhook URLs must use **HTTPS**. HTTP URLs are rejected for security.
</Note>

### Webhook Payload

When the task completes or fails, your endpoint receives a POST request with the full task data:

```json theme={null}
{
  "deepresearch_id": "f992a8ab-4c91-4322-905f-190107bd5a5b",
  "status": "completed",
  "mode": "standard",
  "query": "Research market trends",
  "output_formats": ["markdown"],
  "output": "# Market Trends Analysis\n\n## Overview...",
  "pdf_url": "https://storage.valyu.ai/reports/...",
  "sources": [
    {
      "title": "Market Analysis Report 2024",
      "url": "https://example.com/report",
      "snippet": "Key findings indicate...",
      "source": "web",
      "word_count": 2500
    }
  ],
  "images": [],
  "cost": 0.50,
  "cost_breakdown": {
    "task": 0.50
  },
  "tools": {
    "code_execution": false,
    "screenshots": false
  },
  "error": null,
  "created_at": 1759617800000,
  "updated_at": 1759617836483,
  "completed_at": 1759617836483,
  "search_params": {},
  "current_step": 5,
  "total_steps": 5
}
```

### Request Headers

Each webhook request includes headers for verification:

| Header                | Description                                              |
| --------------------- | -------------------------------------------------------- |
| `X-Webhook-Signature` | HMAC-SHA256 signature in format `sha256=<hex_signature>` |
| `X-Webhook-Timestamp` | Unix timestamp (milliseconds) when the request was sent  |
| `Content-Type`        | `application/json`                                       |
| `User-Agent`          | `Valyu-DeepResearch/1.0`                                 |

### Verifying Webhook Signatures

Always verify the signature to ensure the webhook is authentic:

<CodeGroup>
  ```python Python theme={null}
  import hmac
  import hashlib

  def verify_webhook(
      payload_body: str,
      signature_header: str,
      timestamp_header: str,
      secret: str
  ) -> bool:
      """Verify the webhook signature is valid."""
      # Reconstruct the signed payload: timestamp.payload
      signed_payload = f"{timestamp_header}.{payload_body}"
      
      # Generate expected signature
      expected_signature = "sha256=" + hmac.new(
          secret.encode(),
          signed_payload.encode(),
          hashlib.sha256
      ).hexdigest()
      
      # Use constant-time comparison to prevent timing attacks
      return hmac.compare_digest(expected_signature, signature_header)


  # Example Flask endpoint
  from flask import Flask, request, jsonify

  app = Flask(__name__)
  WEBHOOK_SECRET = "your-stored-secret"  # Load from secure storage

  @app.route("/webhooks/deepresearch", methods=["POST"])
  def handle_webhook():
      signature = request.headers.get("X-Webhook-Signature")
      timestamp = request.headers.get("X-Webhook-Timestamp")
      payload = request.get_data(as_text=True)
      
      if not verify_webhook(payload, signature, timestamp, WEBHOOK_SECRET):
          return jsonify({"error": "Invalid signature"}), 401
      
      data = request.json
      
      if data["status"] == "completed":
          print(f"Research completed: {data['deepresearch_id']}")
          print(f"Output: {data['output'][:200]}...")
      elif data["status"] == "failed":
          print(f"Research failed: {data['error']}")
      
      return jsonify({"received": True}), 200
  ```

  ```typescript TypeScript theme={null}
  import crypto from "crypto";
  import express from "express";

  function verifyWebhook(
    payloadBody: string,
    signatureHeader: string,
    timestampHeader: string,
    secret: string
  ): boolean {
    // Reconstruct the signed payload: timestamp.payload
    const signedPayload = `${timestampHeader}.${payloadBody}`;
    
    // Generate expected signature
    const expectedSignature = "sha256=" + crypto
      .createHmac("sha256", secret)
      .update(signedPayload)
      .digest("hex");
    
    // Use constant-time comparison to prevent timing attacks
    return crypto.timingSafeEqual(
      Buffer.from(expectedSignature),
      Buffer.from(signatureHeader)
    );
  }

  // Example Express endpoint
  const app = express();
  const WEBHOOK_SECRET = "your-stored-secret"; // Load from secure storage

  app.post("/webhooks/deepresearch", express.raw({ type: "application/json" }), (req, res) => {
    const signature = req.headers["x-webhook-signature"] as string;
    const timestamp = req.headers["x-webhook-timestamp"] as string;
    const payload = req.body.toString();
    
    if (!verifyWebhook(payload, signature, timestamp, WEBHOOK_SECRET)) {
      return res.status(401).json({ error: "Invalid signature" });
    }
    
    const data = JSON.parse(payload);
    
    if (data.status === "completed") {
      console.log(`Research completed: ${data.deepresearch_id}`);
      console.log(`Output: ${data.output.substring(0, 200)}...`);
    } else if (data.status === "failed") {
      console.log(`Research failed: ${data.error}`);
    }
    
    return res.status(200).json({ received: true });
  });
  ```
</CodeGroup>

### Retry Behavior

The webhook service automatically retries failed deliveries:

| Property            | Value                                |
| ------------------- | ------------------------------------ |
| Maximum retries     | 5 attempts                           |
| Timeout per request | 15 seconds                           |
| Backoff strategy    | Exponential: 1s → 2s → 4s → 8s → 16s |
| 4xx errors          | No retry (client error)              |
| 5xx errors          | Will retry (server error)            |

<Note>
  Return a `2xx` status code quickly to acknowledge receipt. Process the webhook payload asynchronously to avoid timeouts.
</Note>

### Webhook Events

Webhooks are triggered for:

| Event       | When                           |
| ----------- | ------------------------------ |
| `completed` | Research finished successfully |
| `failed`    | Research encountered an error  |

<Note>
  Webhooks are **not** sent for `cancelled` tasks. If you need to track cancellations, use the status endpoint or list endpoint to check task states.
</Note>

### Complete Webhook Flow

```
1. POST /deepresearch with webhook_url
   ↓
2. API returns task with webhook_secret (store this!)
   ↓
3. Research executes asynchronously
   ↓
4. On completion/failure → POST to your webhook_url
   • Headers: X-Webhook-Signature, X-Webhook-Timestamp
   • Body: Full task payload (output, sources, usage, etc.)
   ↓
5. Your server verifies signature and processes result
   ↓
6. Return 2xx to acknowledge receipt
```

## Progress Callbacks

Track progress in real-time:

<CodeGroup>
  ```python Python theme={null}
  def on_progress(status):
      if status.progress:
          print(f"Step {status.progress.current_step}/{status.progress.total_steps}")
      print(f"Status: {status.status}")

  result = valyu.deepresearch.wait(
      task_id,
      poll_interval=5,
      on_progress=on_progress
  )
  ```

  ```typescript TypeScript theme={null}
  const result = await valyu.deepresearch.wait(taskId, {
    pollInterval: 5000,
    onProgress: (status) => {
      if (status.progress) {
        console.log(`Step ${status.progress.current_step}/${status.progress.total_steps}`);
      }
      console.log(`Status: ${status.status}`);
    }
  });
  ```
</CodeGroup>

## Response Structure

### Completed Task

```json theme={null}
{
  "deepresearch_id": "f992a8ab-4c91-4322-905f-190107bd5a5b",
  "status": "completed",
  "query": "What are the key differences between RAG and fine-tuning?",
  "mode": "standard",
  "output_type": "markdown",
  "output": "# RAG vs Fine-tuning\n\n## Overview...",
  "sources": [
    {
      "title": "Understanding RAG Systems",
      "url": "https://example.com/rag-guide",
      "snippet": "Retrieval-Augmented Generation combines...",
      "source": "web",
      "word_count": 2500
    }
  ],
  "cost": 0.50,
  "completed_at": 1759617836483,
  "pdf_url": "https://s3.amazonaws.com/..."
}
```

## Error Handling

<CodeGroup>
  ```python Python theme={null}
  task = valyu.deepresearch.create(query="Research query")

  if not task.success:
      print(f"Failed to create task: {task.error}")
  else:
      try:
          result = valyu.deepresearch.wait(task.deepresearch_id)
      except TimeoutError:
          print("Task took too long")
          valyu.deepresearch.cancel(task.deepresearch_id)
      except ValueError as e:
          print(f"Task failed: {e}")
  ```

  ```typescript TypeScript theme={null}
  const task = await valyu.deepresearch.create({
    query: "Research query"
  });

  if (!task.success) {
    console.error(`Failed to create task: ${task.error}`);
  } else {
    try {
      const result = await valyu.deepresearch.wait(task.deepresearch_id!, {
        maxWaitTime: 1800000  // 30 minutes
      });
      
      if (result.status === "completed") {
        console.log(result.output);
      } else if (result.status === "failed") {
        console.error(`Task failed: ${result.error}`);
      }
    } catch (error: any) {
      if (error.message.includes("Maximum wait time")) {
        console.log("Task took too long");
        await valyu.deepresearch.cancel(task.deepresearch_id!);
      } else {
        console.error(`Task error: ${error.message}`);
      }
    }
  }
  ```
</CodeGroup>

## Human-in-the-loop

DeepResearch supports optional checkpoints that pause execution for human review. Enable planning questions, plan review, source filtering, or outline review to guide the research process.

<Card title="HITL Guide" icon="hand" href="/guides/deepresearch-hitl">
  Learn how to configure and respond to HITL checkpoints
</Card>

## Best Practices

### Polling strategy

* **Fast mode**: Poll every 2-5 seconds, timeout after 10 minutes
* **Standard mode**: Poll every 5-10 seconds, timeout after 30 minutes
* **Heavy mode**: Poll every 10-30 seconds, timeout after 120 minutes
* **Max mode**: Poll every 30-60 seconds, timeout after 180 minutes
* Use webhooks for production to avoid polling overhead

### Recommended timeouts

| Mode       | Recommended Timeout         |
| ---------- | --------------------------- |
| `fast`     | 10 minutes (600 seconds)    |
| `standard` | 30 minutes (1800 seconds)   |
| `heavy`    | 120 minutes (7200 seconds)  |
| `max`      | 180 minutes (10800 seconds) |

### Cost optimization

* Use `fast` mode for quick lookups and simple questions
* Use `standard` mode for moderate research needs
* Use `heavy` mode for complex topics requiring fact verification
* Use `max` mode only for exhaustive research requiring maximum quality
* Filter sources with `search` config to focus on relevant content
* Set `start_date` and `end_date` to limit scope

### Research quality

* Provide clear, specific research queries
* Use `research_strategy` to guide the research approach — tell the agent what sources, methodology, and angles to prioritise
* Use `report_format` to control the output — specify structure, length, style, and any tables or sections you want
* Add `context` to file attachments

## Limitations

| Limit                                          | Value                      |
| ---------------------------------------------- | -------------------------- |
| Query length                                   | 25,000 characters          |
| `research_strategy` + `report_format` combined | 15,000 characters          |
| File attachment `context`                      | 10,000 characters per file |
| Maximum files per request                      | 10                         |
| Maximum URLs per request                       | 10                         |
| Maximum MCP servers                            | 5                          |
| Maximum previous reports for context           | 3                          |
| Recommended file size                          | \< 10MB per file           |

## Next Steps

<CardGroup cols={2}>
  <Card title="Batch Processing" icon="layer-group" href="/guides/deepresearch-batching">
    Process multiple research tasks efficiently
  </Card>

  <Card title="API Reference" icon="code" href="/api-reference/endpoint/deepresearch-create">
    Complete endpoint documentation
  </Card>

  <Card title="Python SDK" icon="python" href="/sdk/python-sdk/deepresearch">
    Python SDK reference
  </Card>

  <Card title="TypeScript SDK" icon="js" href="/sdk/typescript-sdk/deepresearch">
    TypeScript SDK reference
  </Card>
</CardGroup>
