# Datasources API

> A tool manifest for AI agents to discover available data sources

Valyu is a search API. We have **state-of-the-art web search**, plus 36+ integrated data sources spanning research, finance, healthcare, and more. You can either:

1. **Let us find it** - Call `/search` and we'll find the most relevant information across all sources
2. **Filter by source** - Use `included_sources` or `excluded_sources` to target specific datasets, or `source_biases` to soft-rank results toward preferred domains

The **Datasources API** tells AI agents what's available. It acts as a **tool manifest** for dynamic discovery.

## Why This Exists

Modern AI agents face a scaling problem: loading 50+ tool definitions into context consumes 10-20K tokens and degrades tool-selection accuracy. Claude's [Tool Search](https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/tool-search-tool) addresses this with deferred loading: tools are discovered at runtime instead of being loaded upfront.

**This API is built for that paradigm.** Instead of hardcoding knowledge of available datasets, agents can:

1. Query `/datasources/categories` to understand the landscape
2. Filter to the relevant category
3. Fetch full schemas only for the datasources they need
4. Use `example_queries` for few-shot prompting
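
The four steps above can be sketched as a small discovery routine. This is a minimal sketch, not an official client: `discover`, `few_shot_lines`, the injected `fetch_json` callable, and the keyword-matching heuristic are all hypothetical. Only the endpoint paths, the `category` parameter, and the `categories` / `datasources` response keys come from this page. Passing the fetcher in as an argument keeps the logic testable without network access.

```python
from typing import Callable, Optional

# Hypothetical fetcher signature: fetch_json(path, params) -> parsed JSON dict.
FetchJson = Callable[[str, Optional[dict]], dict]


def discover(fetch_json: FetchJson, task_keywords: set[str]) -> list[dict]:
    """Return datasources from the categories that look relevant to a task."""
    # Step 1: survey the landscape.
    categories = fetch_json("/datasources/categories", None)["categories"]

    # Step 2: keep categories whose id or name mentions a task keyword.
    # Naive matching is an assumption; an agent would usually let the model choose.
    relevant = [
        c["id"]
        for c in categories
        if any(k in f'{c["id"]} {c.get("name", "")}'.lower() for k in task_keywords)
    ]

    # Step 3: fetch full records only for the relevant categories.
    selected: list[dict] = []
    for category_id in relevant:
        payload = fetch_json("/datasources", {"category": category_id})
        selected.extend(payload["datasources"])
    return selected


def few_shot_lines(datasources: list[dict]) -> list[str]:
    """Step 4: turn each datasource's example_queries into prompt lines."""
    return [
        f'{d["id"]}: {q}'
        for d in datasources
        for q in d.get("example_queries", [])
    ]
```

In practice, `fetch_json` could wrap a `requests.get` call against `https://api.valyu.ai/v1` with the `x-api-key` header, as in the Quick Start below.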

## Quick Start

<CodeGroup>
  ```python Python theme={null}
  import requests

  response = requests.get(
      "https://api.valyu.ai/v1/datasources",
      headers={"x-api-key": "YOUR_API_KEY"}
  )
  datasources = response.json()["datasources"]

  # Each datasource includes:
  # - id, name, description
  # - category, topics, modality
  # - example_queries (for few-shot prompting)
  # - pricing.cpm (cost per million tokens)
  # - response_schema (JSON schema for parsing)
  ```

  ```javascript JavaScript theme={null}
  const response = await fetch("https://api.valyu.ai/v1/datasources", {
    headers: { "x-api-key": "YOUR_API_KEY" }
  });
  const { datasources } = await response.json();
  ```

  ```bash cURL theme={null}
  curl https://api.valyu.ai/v1/datasources \
    -H "x-api-key: YOUR_API_KEY"
  ```
</CodeGroup>

## Filter by Category

<CodeGroup>
  ```python Python theme={null}
  import requests

  # Get only financial market datasources
  response = requests.get(
      "https://api.valyu.ai/v1/datasources",
      params={"category": "markets"},
      headers={"x-api-key": "YOUR_API_KEY"}
  )
  datasources = response.json()["datasources"]
  ```

  ```bash cURL theme={null}
  curl "https://api.valyu.ai/v1/datasources?category=markets" \
    -H "x-api-key: YOUR_API_KEY"
  ```
</CodeGroup>

**Available categories:**

| Category         | Description        | Example Sources                       |
| ---------------- | ------------------ | ------------------------------------- |
| `research`       | Academic papers    | arXiv, PubMed, bioRxiv                |
| `healthcare`     | Medical data       | Clinical trials, drug labels, WHO     |
| `markets`        | Financial data     | Stocks, crypto, forex, ETFs           |
| `company`        | Corporate data     | SEC filings, earnings, insider trades |
| `economic`       | Government stats   | FRED, BLS, World Bank                 |
| `predictions`    | Prediction markets | Polymarket, Kalshi                    |
| `transportation` | Transit data       | UK Rail, ship tracking                |
| `legal`          | Case law           | UK legislation, court cases           |
| `politics`       | Parliamentary data | UK Parliament                         |
| `patents`        | IP filings         | Global patents                        |

## List Categories

Get all categories with dataset counts:

<CodeGroup>
  ```python Python theme={null}
  import requests

  response = requests.get(
      "https://api.valyu.ai/v1/datasources/categories",
      headers={"x-api-key": "YOUR_API_KEY"}
  )
  categories = response.json()["categories"]
  # [{"id": "research", "name": "Research & Academic", "dataset_count": 4}, ...]
  ```

  ```bash cURL theme={null}
  curl https://api.valyu.ai/v1/datasources/categories \
    -H "x-api-key: YOUR_API_KEY"
  ```
</CodeGroup>

## What's in a Datasource

Each datasource includes everything an agent needs:

```json theme={null}
{
  "id": "valyu/valyu-arxiv",
  "name": "Arxiv",
  "description": "Over 1M pre-print research papers from physics, CS, math, and more",
  "category": "research",
  "type": "paper",
  "modality": ["text", "images"],
  "topics": ["Research Papers", "Computer Science", "Physics"],
  "example_queries": [
    "What are the latest advancements in self-healing materials?",
    "How have ML models improved financial risk assessment?"
  ],
  "pricing": {
    "cpm": 0.5
  },
  "response_schema": {
    "id": {"type": "string"},
    "title": {"type": "string"},
    "content": {"type": "string"},
    "authors": {"type": "array", "item_type": "string"},
    "publication_date": {"type": "string"}
  },
  "update_frequency": "Monthly",
  "size": 1000000
}
```
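
One way to use `response_schema` is to check a returned record before handing it to downstream code. The following is a minimal sketch under stated assumptions: `schema_mismatches` is a hypothetical helper, and the schema layout (a flat mapping of field name to `{"type": ..., "item_type": ...}`) is inferred from the example above. Only `string` and `array` appear in that example; the other type names in the mapping are guesses.

```python
# Maps schema type names to Python types. Only "string" and "array" appear in
# the example record; the remaining entries are assumptions.
_PY_TYPES = {
    "string": str,
    "array": list,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
}


def schema_mismatches(record: dict, response_schema: dict) -> list[str]:
    """List the fields of `record` that are missing or have the wrong type."""
    problems = []
    for field, spec in response_schema.items():
        if field not in record:
            problems.append(f"{field}: missing")
            continue

        value = record[field]
        expected = _PY_TYPES.get(spec.get("type"))
        # Unknown type names are skipped rather than treated as errors.
        if expected is not None and not isinstance(value, expected):
            problems.append(f"{field}: expected {spec['type']}")
            continue

        # For arrays, also check each element when item_type is given.
        item_type = _PY_TYPES.get(spec.get("item_type"))
        if isinstance(value, list) and item_type is not None:
            if not all(isinstance(v, item_type) for v in value):
                problems.append(f"{field}: expected items of type {spec['item_type']}")
    return problems
```

An empty list means the record matches the schema as far as this check can tell.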

## Using with Search

Once you know which datasources you want, use them with the Search API:

<CodeGroup>
  ```python Python theme={null}
  from valyu import Valyu

  valyu = Valyu()

  # Search only arXiv and PubMed
  results = valyu.search(
      query="latest transformer architecture improvements",
      included_sources=["valyu/valyu-arxiv", "valyu/valyu-pubmed"]
  )
  ```

  ```bash cURL theme={null}
  curl -X POST https://api.valyu.ai/v1/search \
    -H "x-api-key: YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "query": "latest transformer architecture improvements",
      "included_sources": ["valyu/valyu-arxiv", "valyu/valyu-pubmed"]
    }'
  ```
</CodeGroup>
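
To connect discovery to search, the `included_sources` list can be built from datasource records instead of being typed by hand. The sketch below assumes records shaped like the example datasource above; `build_search_payload` and its `required_modality` filter are hypothetical, and only the `query` and `included_sources` request fields come from this page.

```python
from typing import Optional


def build_search_payload(
    query: str,
    datasources: list[dict],
    required_modality: Optional[str] = None,
) -> dict:
    """Build a /search request body limited to the discovered datasources."""
    ids = [
        d["id"]
        for d in datasources
        # Keep every datasource when no modality is requested.
        if required_modality is None or required_modality in d.get("modality", [])
    ]
    if not ids:
        raise ValueError("no datasource matches the requested modality")
    return {"query": query, "included_sources": ids}
```

The returned dict can be sent as the JSON body of the `POST /v1/search` request shown above.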

## For AI Agent Developers

If you're building agents that use Valyu as a tool:

1. **Don't hardcode datasources** - Query this API to discover what's available
2. **Use `example_queries`** - They're optimized for few-shot prompting
3. **Check `response_schema`** - Know exactly what fields to expect
4. **Estimate costs with `pricing.cpm`** - Budget before making requests

This turns Valyu from "an API you need to know" into "an API that teaches itself to your agent."
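
As an illustration of budgeting with `pricing.cpm`, an agent could drop datasources above a price cap before searching. This is a minimal sketch: `within_budget` is a hypothetical helper, and it treats `cpm` as an opaque price figure to compare against a cap, without assuming anything about how the final charge is computed.

```python
def within_budget(datasources: list[dict], max_cpm: float) -> list[dict]:
    """Return datasources priced at or below max_cpm, cheapest first."""
    priced = [
        d
        for d in datasources
        # Records without a pricing.cpm value are skipped (an assumption).
        if d.get("pricing", {}).get("cpm") is not None
        and d["pricing"]["cpm"] <= max_cpm
    ]
    return sorted(priced, key=lambda d: d["pricing"]["cpm"])
```

The `id` values of the result can then be passed as `included_sources` in a search request.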

<CardGroup cols={2}>
  <Card title="Search API" icon="magnifying-glass" href="/api-reference/endpoint/search">
    Use discovered datasources with Search
  </Card>

  <Card title="API Reference" icon="code" href="/api-reference/endpoint/datasources-list">
    Full endpoint documentation
  </Card>
</CardGroup>
