Datasources API

Valyu is a search API. We have state-of-the-art web search, plus 36+ integrated data sources spanning research, finance, healthcare, and more. You can either:

Let us find it - Call /search and we’ll find the most relevant information across all sources
Filter by source - Use included_sources or excluded_sources to target specific datasets

The Datasources API tells AI agents what’s available—a tool manifest for dynamic discovery.

Why This Exists

Modern AI agents face a scaling problem: loading 50+ tool definitions into context consumes 10-20K tokens and degrades selection accuracy. Claude’s Tool Search solves this with deferred loading—tools are discovered at runtime, not loaded upfront. This API is built for that paradigm. Instead of hardcoding knowledge of available datasets, agents can:

Query /datasources/categories to understand the landscape
Filter to the relevant categories
Fetch full schemas only for the datasources they need
Use example_queries for few-shot prompting
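Here is a minimal sketch of this loop. It assumes only the response shapes shown on this page. The category index is small and cheap to load. Full datasource records, with their schemas, are fetched per category on first use and then cached. `DatasourceCatalog` and the injected `fetch` callable are hypothetical names. In practice `fetch` would wrap the HTTP calls shown in Quick Start, and the fake fetcher below stands in for the network.

```python
from typing import Callable, Dict, List

# fetch(path, params) -> decoded JSON body; hypothetical injection point for the HTTP layer
Fetch = Callable[[str, Dict[str, str]], dict]


class DatasourceCatalog:
    """Hypothetical helper: cheap category index up front, full records on demand."""

    def __init__(self, fetch: Fetch):
        self._fetch = fetch
        self._by_category: Dict[str, List[dict]] = {}  # cache of full records per category

    def categories(self) -> List[dict]:
        # Step 1: the small landscape view (id, name, dataset_count)
        return self._fetch("/datasources/categories", {})["categories"]

    def datasources(self, category: str) -> List[dict]:
        # Steps 2-3: load full records (schemas included) only for this category, once
        if category not in self._by_category:
            data = self._fetch("/datasources", {"category": category})
            self._by_category[category] = data["datasources"]
        return self._by_category[category]

    def example_queries(self, category: str) -> List[str]:
        # Step 4: collect example_queries for few-shot prompting
        return [q for d in self.datasources(category) for q in d.get("example_queries", [])]


calls = []


def fake_fetch(path: str, params: Dict[str, str]) -> dict:
    # Offline stand-in returning the shapes documented on this page
    calls.append((path, dict(params)))
    if path == "/datasources/categories":
        return {"categories": [{"id": "research", "name": "Research & Academic", "dataset_count": 4}]}
    return {"datasources": [{"id": "valyu/valyu-arxiv", "category": "research",
                             "example_queries": ["What are the latest advancements in self-healing materials?"]}]}


catalog = DatasourceCatalog(fake_fetch)
print([c["id"] for c in catalog.categories()])  # ['research']
print(catalog.example_queries("research"))
catalog.datasources("research")  # served from the cache, no extra request
print(len(calls))  # 2
```

Only the categories call and one datasource call reach `fetch`. Later lookups for the same category stay in memory, so repeated tool selection does not cost extra requests.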

Quick Start

import requests

response = requests.get(
    "https://api.valyu.ai/v1/datasources",
    headers={"x-api-key": "YOUR_API_KEY"}
)
datasources = response.json()["datasources"]

# Each datasource includes:
# - id, name, description
# - category, topics, modality
# - example_queries (for few-shot prompting)
# - pricing.cpm (cost per million tokens)
# - response_schema (JSON schema for parsing)
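The `pricing.cpm` field can be used to budget before searching. This rough helper takes the comment above at its word and treats `cpm` as a price per million tokens. Confirm the exact unit and currency on the pricing page before relying on it. `estimate_cost`, `cheapest_first`, and the `TOKENS_PER_CPM_UNIT` constant are illustrative names, not part of the API.

```python
from typing import List

# Assumption taken from the comment above: cpm = cost per million tokens
TOKENS_PER_CPM_UNIT = 1_000_000


def estimate_cost(datasource: dict, tokens: int) -> float:
    """Estimate the price of retrieving `tokens` tokens from one datasource."""
    cpm = datasource["pricing"]["cpm"]
    return cpm * tokens / TOKENS_PER_CPM_UNIT


def cheapest_first(datasources: List[dict]) -> List[str]:
    """Order datasource ids by ascending cpm, e.g. to try cheaper sources first."""
    return [d["id"] for d in sorted(datasources, key=lambda d: d["pricing"]["cpm"])]


arxiv = {"id": "valyu/valyu-arxiv", "pricing": {"cpm": 0.5}}
print(estimate_cost(arxiv, 200_000))  # 0.1
```

If the unit turns out to differ, only the `TOKENS_PER_CPM_UNIT` constant needs to change.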

Filter by Category

import requests

# Get only financial market datasources
response = requests.get(
    "https://api.valyu.ai/v1/datasources",
    params={"category": "markets"},
    headers={"x-api-key": "YOUR_API_KEY"}
)
market_datasources = response.json()["datasources"]
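You can avoid the `requests` dependency by making the same filtered call with the standard library. This sketch assumes only the endpoint, the `category` query parameter, and the `x-api-key` header shown above. `build_request` and `list_datasources` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional

BASE_URL = "https://api.valyu.ai/v1/datasources"


def build_request(api_key: str, category: Optional[str] = None) -> urllib.request.Request:
    # Append ?category=... only when a filter is requested
    url = BASE_URL
    if category:
        url += "?" + urllib.parse.urlencode({"category": category})
    return urllib.request.Request(url, headers={"x-api-key": api_key})


def list_datasources(api_key: str, category: Optional[str] = None) -> List[dict]:
    # Performs the network call; raises urllib.error.HTTPError on non-2xx responses
    with urllib.request.urlopen(build_request(api_key, category)) as resp:
        return json.load(resp)["datasources"]


req = build_request("YOUR_API_KEY", category="markets")
print(req.full_url)  # https://api.valyu.ai/v1/datasources?category=markets
```

Keeping request construction separate from the network call makes the URL and headers easy to check without hitting the API.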

Available categories:

Category	Description	Example Sources
`research`	Academic papers	arXiv, PubMed, bioRxiv
`healthcare`	Medical data	Clinical trials, drug labels, WHO
`markets`	Financial data	Stocks, crypto, forex, ETFs
`company`	Corporate data	SEC filings, earnings, insider trades
`economic`	Government stats	FRED, BLS, World Bank
`predictions`	Prediction markets	Polymarket, Kalshi
`transportation`	Transit data	UK Rail, ship tracking
`legal`	Case law	UK legislation, court cases
`politics`	Parliamentary data	UK Parliament
`patents`	IP filings	Global patents

Get all categories with dataset counts:

response = requests.get(
    "https://api.valyu.ai/v1/datasources/categories",
    headers={"x-api-key": "YOUR_API_KEY"}
)
categories = response.json()["categories"]
# [{"id": "research", "name": "Research & Academic", "dataset_count": 4}, ...]
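An agent can use the category list to drop empty categories and map ids to display names before deciding where to look. This small sketch works over the documented `id`, `name`, and `dataset_count` fields. `non_empty_categories` is a hypothetical helper, and the second sample entry is invented for illustration.

```python
from typing import Dict, List


def non_empty_categories(categories: List[dict]) -> Dict[str, str]:
    """Map category id -> display name, largest categories first, skipping empty ones."""
    ranked = sorted(categories, key=lambda c: c.get("dataset_count", 0), reverse=True)
    return {c["id"]: c["name"] for c in ranked if c.get("dataset_count", 0) > 0}


sample = [
    {"id": "research", "name": "Research & Academic", "dataset_count": 4},
    {"id": "example-empty", "name": "Hypothetical empty category", "dataset_count": 0},  # invented
]
print(non_empty_categories(sample))  # {'research': 'Research & Academic'}
```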

What’s in a Datasource

Each datasource includes everything an agent needs:

{
  "id": "valyu/valyu-arxiv",
  "name": "Arxiv",
  "description": "Over 1M pre-print research papers from physics, CS, math, and more",
  "category": "research",
  "type": "paper",
  "modality": ["text", "images"],
  "topics": ["Research Papers", "Computer Science", "Physics"],
  "example_queries": [
    "What are the latest advancements in self-healing materials?",
    "How have ML models improved financial risk assessment?"
  ],
  "pricing": {
    "cpm": 0.5
  },
  "response_schema": {
    "id": {"type": "string"},
    "title": {"type": "string"},
    "content": {"type": "string"},
    "authors": {"type": "array", "item_type": "string"},
    "publication_date": {"type": "string"}
  },
  "update_frequency": "Monthly",
  "size": 1000000
}
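`response_schema` maps each field to a type and uses `item_type` for array elements. This sketch checks a result against it before an agent relies on the result. It assumes the type names follow JSON Schema naming; only `string` and `array` appear in the example above. `find_schema_mismatches` and the `JSON_TYPES` table are illustrative, not part of the API.

```python
from typing import Dict, List

# Assumed mapping from schema type names to Python types (JSON Schema naming)
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "array": list,
    "object": dict,
}


def find_schema_mismatches(result: dict, schema: Dict[str, dict]) -> List[str]:
    """Return human-readable problems; an empty list means the result matches."""
    problems = []
    for field, spec in schema.items():
        if field not in result:
            problems.append(f"missing field: {field}")
            continue
        expected = JSON_TYPES.get(spec["type"])
        value = result[field]
        if expected and not isinstance(value, expected):
            problems.append(f"{field}: expected {spec['type']}")
        elif spec["type"] == "array" and "item_type" in spec:
            # Check each element against the declared item_type
            item_expected = JSON_TYPES.get(spec["item_type"])
            if item_expected and not all(isinstance(v, item_expected) for v in value):
                problems.append(f"{field}: expected items of type {spec['item_type']}")
    return problems


schema = {
    "id": {"type": "string"},
    "title": {"type": "string"},
    "content": {"type": "string"},
    "authors": {"type": "array", "item_type": "string"},
    "publication_date": {"type": "string"},
}
result = {"id": "example-id", "title": "Example", "content": "...", "authors": ["A. Author", 42]}
print(find_schema_mismatches(result, schema))
# ['authors: expected items of type string', 'missing field: publication_date']
```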

Using with Search

Once you know which datasources you want, use them with the Search API:

from valyu import Valyu

valyu = Valyu()

# Search only arXiv and PubMed
results = valyu.search(
    query="latest transformer architecture improvements",
    included_sources=["valyu/valyu-arxiv", "valyu/valyu-pubmed"]
)
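Discovery and search can be chained. Filter the records returned by `/datasources` and pass their `id` values as `included_sources`. The filter below keys on the documented `category` and `topics` fields. `pick_sources` is a hypothetical helper. The PubMed record is abbreviated sample data, and its topics are assumed.

```python
from typing import List, Optional


def pick_sources(datasources: List[dict], category: str, topic: Optional[str] = None) -> List[str]:
    """Return ids of datasources in `category`, optionally requiring a case-insensitive topic match."""
    ids = []
    for d in datasources:
        if d.get("category") != category:
            continue
        if topic and topic.lower() not in (t.lower() for t in d.get("topics", [])):
            continue
        ids.append(d["id"])
    return ids


catalog = [
    {"id": "valyu/valyu-arxiv", "category": "research",
     "topics": ["Research Papers", "Computer Science", "Physics"]},
    {"id": "valyu/valyu-pubmed", "category": "research", "topics": ["Medicine"]},  # topics assumed
]
print(pick_sources(catalog, "research", topic="computer science"))  # ['valyu/valyu-arxiv']

# Then, with the SDK client shown above:
# results = valyu.search(query="latest transformer architecture improvements",
#                        included_sources=pick_sources(catalog, "research"))
```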

For AI Agent Developers

If you’re building agents that use Valyu as a tool:

Don’t hardcode datasources - Query this API to discover what’s available
Use example_queries - They’re optimized for few-shot prompting
Check response_schema - Know exactly what fields to expect
Estimate costs with pricing.cpm - Budget before making requests
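The first three points can be combined into a compact tool description for an agent's prompt. It includes the datasource description, a few `example_queries` as few-shot hints, and the `response_schema` field names. The output format here is an assumption, so adapt it to the tool-definition format your agent framework expects. `describe_datasource` is a hypothetical name.

```python
from typing import List


def describe_datasource(d: dict, max_examples: int = 2) -> str:
    """Render one datasource record as a short prompt snippet."""
    lines: List[str] = [f"{d['id']} ({d['category']}): {d['description']}"]
    for q in d.get("example_queries", [])[:max_examples]:
        lines.append(f"  e.g. {q}")  # few-shot hint
    fields = ", ".join(d.get("response_schema", {}))  # field names the agent should expect
    if fields:
        lines.append(f"  returns: {fields}")
    return "\n".join(lines)


arxiv = {
    "id": "valyu/valyu-arxiv",
    "category": "research",
    "description": "Over 1M pre-print research papers from physics, CS, math, and more",
    "example_queries": ["What are the latest advancements in self-healing materials?"],
    "response_schema": {"id": {"type": "string"}, "title": {"type": "string"}},
}
print(describe_datasource(arxiv))
```

Capping `max_examples` keeps each entry short, which matters when many datasources are listed in the same context.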

This turns Valyu from “an API you need to know” into “an API that teaches itself to your agent.”

Search API

Use discovered datasources with Search

API Reference

Full endpoint documentation
