Valyu is a search API. We have state-of-the-art web search, plus 36+ integrated data sources spanning research, finance, healthcare, and more. You can either:
  1. Let us find it - Call /search and we’ll find the most relevant information across all sources
  2. Filter by source - Use included_sources or excluded_sources to target specific datasets
The Datasources API tells AI agents what’s available—a tool manifest for dynamic discovery.

Why This Exists

Modern AI agents face a scaling problem: loading 50+ tool definitions into context consumes 10-20K tokens and degrades selection accuracy. Claude’s Tool Search solves this with deferred loading—tools are discovered at runtime, not loaded upfront. This API is built for that paradigm. Instead of hardcoding knowledge of available datasets, agents can:
  1. Query /datasources/categories to understand the landscape
  2. Filter to the relevant category
  3. Fetch full schemas only for the datasources they need
  4. Use example_queries for few-shot prompting
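The four steps above can be sketched as one function. This is a minimal sketch, not an official client: `discover`, `make_get_json`, and the `get_json(path, params)` callable are hypothetical helpers. The base URL, paths, and field names (`categories`, `datasources`, `id`, `example_queries`) are the ones used elsewhere on this page. The HTTP call is passed in as a function so that the flow can be exercised offline. `requests` is imported only when you build the real fetcher.

```python
from typing import Any, Callable, Dict, List

# Hypothetical signature: GET a Valyu API path and return the parsed JSON body.
GetJson = Callable[[str, Dict[str, str]], Dict[str, Any]]


def make_get_json(api_key: str, base_url: str = "https://api.valyu.ai/v1") -> GetJson:
    """Build a GetJson backed by requests (imported lazily)."""
    import requests  # only needed when talking to the real API

    def get_json(path: str, params: Dict[str, str]) -> Dict[str, Any]:
        resp = requests.get(base_url + path, params=params,
                            headers={"x-api-key": api_key}, timeout=30)
        resp.raise_for_status()
        return resp.json()

    return get_json


def discover(get_json: GetJson, category: str, max_examples: int = 3) -> Dict[str, List[str]]:
    # Step 1: fetch the lightweight category listing (no schemas yet).
    categories = get_json("/datasources/categories", {})["categories"]
    known = {c["id"] for c in categories}
    # Step 2: narrow to the category the task needs.
    if category not in known:
        raise ValueError(f"unknown category {category!r}; available: {sorted(known)}")
    # Step 3: fetch full entries (with schemas) for that category only.
    sources = get_json("/datasources", {"category": category})["datasources"]
    # Step 4: collect example queries to reuse as few-shot examples.
    examples = [q for s in sources for q in s.get("example_queries", [])]
    return {"source_ids": [s["id"] for s in sources], "examples": examples[:max_examples]}
```

With a real key, `discover(make_get_json("YOUR_API_KEY"), "research")` would return the research datasource ids plus up to three example queries.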

Quick Start

import requests

response = requests.get(
    "https://api.valyu.ai/v1/datasources",
    headers={"x-api-key": "YOUR_API_KEY"},
    timeout=30,
)
response.raise_for_status()  # fail fast on auth or HTTP errors
datasources = response.json()["datasources"]

# Each datasource includes:
# - id, name, description
# - category, topics, modality
# - example_queries (for few-shot prompting)
# - pricing.cpm (cost per million tokens)
# - response_schema (JSON schema for parsing)
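The full list includes schemas, which is more than an agent needs on a first look. One option is a compact manifest with a single line per datasource, built from `id`, `category`, and `description`. This is a minimal sketch: `to_manifest`, the line format, and the 80-character truncation are arbitrary choices, not part of the API.

```python
from typing import Any, Dict, List


def to_manifest(datasources: List[Dict[str, Any]], max_desc: int = 80) -> str:
    """Render one compact line per datasource for an agent's context window."""
    lines = []
    # Sort by category, then id, so related sources sit together.
    for ds in sorted(datasources, key=lambda d: (d.get("category", ""), d["id"])):
        desc = ds.get("description", "")
        if len(desc) > max_desc:
            desc = desc[: max_desc - 3] + "..."  # keep the total at max_desc chars
        lines.append(f"{ds['id']} [{ds.get('category', '?')}]: {desc}")
    return "\n".join(lines)
```

Usage: `print(to_manifest(datasources))` after the request above.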

Filter by Category

import requests

# Get only financial market datasources
response = requests.get(
    "https://api.valyu.ai/v1/datasources",
    params={"category": "markets"},
    headers={"x-api-key": "YOUR_API_KEY"},
    timeout=30,
)
response.raise_for_status()
markets = response.json()["datasources"]
Available categories:
| Category | Description | Example sources |
|---|---|---|
| research | Academic papers | arXiv, PubMed, bioRxiv |
| healthcare | Medical data | Clinical trials, drug labels, WHO |
| markets | Financial data | Stocks, crypto, forex, ETFs |
| company | Corporate data | SEC filings, earnings, insider trades |
| economic | Government stats | FRED, BLS, World Bank |
| predictions | Prediction markets | Polymarket, Kalshi |
| transportation | Transit data | UK Rail, ship tracking |
| legal | Case law | UK legislation, court cases |
| politics | Parliamentary data | UK Parliament |
| patents | IP filings | Global patents |
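If you already have the unfiltered list from Quick Start, you can group it by category locally and skip the separate filtered request for each category. This sketch assumes every entry has the `id` and `category` fields listed in the Quick Start comments. `group_by_category` is a hypothetical helper, and the `uncategorized` fallback key is our own convention.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_by_category(datasources: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each category id to the ids of the datasources in it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for ds in datasources:
        groups[ds.get("category", "uncategorized")].append(ds["id"])
    return dict(groups)
```

The trade-off is simple. The server-side `category` parameter transfers less data, while local grouping saves round trips once you already hold the full list.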

List Categories

Get all categories with dataset counts:
response = requests.get(
    "https://api.valyu.ai/v1/datasources/categories",
    headers={"x-api-key": "YOUR_API_KEY"}
)
categories = response.json()["categories"]
# [{"id": "research", "name": "Research & Academic", "dataset_count": 4}, ...]
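The categories response is small enough to place in an agent's prompt as a menu. This sketch assumes the entry shape shown in the comment above (`id`, `name`, `dataset_count`). `category_menu` and its `min_count` cut-off are hypothetical.

```python
from typing import Any, Dict, List


def category_menu(categories: List[Dict[str, Any]], min_count: int = 1) -> List[str]:
    """Return 'id (N datasets): name' lines, largest categories first."""
    usable = [c for c in categories if c.get("dataset_count", 0) >= min_count]
    # Sort by descending count; ties break alphabetically by id.
    usable.sort(key=lambda c: (-c.get("dataset_count", 0), c["id"]))
    return [f"{c['id']} ({c.get('dataset_count', 0)} datasets): {c['name']}" for c in usable]
```

Usage: `"\n".join(category_menu(categories))` gives a short list the agent can choose from before it fetches any schemas.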

What’s in a Datasource

Each datasource includes everything an agent needs:
{
  "id": "valyu/valyu-arxiv",
  "name": "Arxiv",
  "description": "Over 1M pre-print research papers from physics, CS, math, and more",
  "category": "research",
  "type": "paper",
  "modality": ["text", "images"],
  "topics": ["Research Papers", "Computer Science", "Physics"],
  "example_queries": [
    "What are the latest advancements in self-healing materials?",
    "How have ML models improved financial risk assessment?"
  ],
  "pricing": {
    "cpm": 0.5
  },
  "response_schema": {
    "id": {"type": "string"},
    "title": {"type": "string"},
    "content": {"type": "string"},
    "authors": {"type": "array", "item_type": "string"},
    "publication_date": {"type": "string"}
  },
  "update_frequency": "Monthly",
  "size": 1000000
}
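The `response_schema` field maps field names to type descriptors, so an agent can check a record before relying on it. This sketch assumes you have a record as a dict whose fields are described by that schema; this page does not say where such records sit inside Search API results. Only `string` and `array` (with `item_type`) appear in the example above. The other type names in the mapping are assumptions, and `check_record` is a hypothetical helper.

```python
from typing import Any, Dict, List

# Assumed mapping from schema type names to Python types. Only "string" and
# "array" appear in the example above; the rest are guesses.
_TYPES = {
    "string": str,
    "array": list,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
}


def check_record(record: Dict[str, Any], schema: Dict[str, Dict[str, str]]) -> List[str]:
    """Return human-readable problems; an empty list means the record matches."""
    problems = []
    for field, spec in schema.items():
        if field not in record:
            problems.append(f"missing field {field!r}")
            continue
        value = record[field]
        expected = _TYPES.get(spec.get("type", ""))
        if expected is not None and not isinstance(value, expected):
            problems.append(f"{field!r}: expected {spec['type']}, got {type(value).__name__}")
            continue
        # For arrays, also check each element against item_type when present.
        item_type = _TYPES.get(spec.get("item_type", ""))
        if isinstance(value, list) and item_type is not None:
            bad = [v for v in value if not isinstance(v, item_type)]
            if bad:
                problems.append(f"{field!r}: {len(bad)} item(s) not {spec['item_type']}")
    return problems
```

Unknown type names are skipped rather than rejected, so a schema type this sketch does not know about never produces a false alarm.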
Once you know which datasources you want, use them with the Search API:
from valyu import Valyu

valyu = Valyu()

# Search only arxiv and pubmed
results = valyu.search(
    query="latest transformer architecture improvements",
    included_sources=["valyu/valyu-arxiv", "valyu/valyu-pubmed"]
)
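`included_sources` takes datasource ids, so an agent can build the list from discovery and avoid typing ids by hand. This sketch matches the `topics` field with a case-insensitive substring test. `sources_for_topic` is a hypothetical helper, and substring matching is a deliberately simple stand-in for whatever relevance logic your agent uses.

```python
from typing import Any, Dict, List


def sources_for_topic(datasources: List[Dict[str, Any]], topic: str) -> List[str]:
    """Ids of datasources whose `topics` list mentions `topic` (case-insensitive)."""
    needle = topic.casefold()
    return [
        ds["id"]
        for ds in datasources
        if any(needle in t.casefold() for t in ds.get("topics", []))
    ]
```

Usage: `valyu.search(query=..., included_sources=sources_for_topic(datasources, "computer science"))`. If the helper returns an empty list, consider calling search without `included_sources` rather than passing `[]`, because this page does not say how an empty list is treated.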

For AI Agent Developers

If you’re building agents that use Valyu as a tool:
  1. Don’t hardcode datasources - Query this API to discover what’s available
  2. Use example_queries - They’re optimized for few-shot prompting
  3. Check response_schema - Know exactly what fields to expect
  4. Estimate costs with pricing.cpm - Budget before making requests
This turns Valyu from “an API you need to know” into “an API that teaches itself to your agent.”
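Point 4 can be turned into a small budget filter. This sketch takes the Quick Start description of `pricing.cpm` (cost per million tokens) at face value and assumes you can estimate how many tokens a query will return. `estimate_cost`, `within_budget`, and `expected_tokens` are hypothetical, and costs come out in whatever currency `cpm` is quoted in.

```python
from typing import Any, Dict, List

TOKENS_PER_UNIT = 1_000_000  # per the Quick Start note: cpm = cost per million tokens


def estimate_cost(datasource: Dict[str, Any], expected_tokens: int) -> float:
    """Estimated spend for `expected_tokens` returned from one datasource."""
    cpm = datasource["pricing"]["cpm"]
    return cpm * expected_tokens / TOKENS_PER_UNIT


def within_budget(datasources: List[Dict[str, Any]], expected_tokens: int,
                  budget: float) -> List[str]:
    """Ids of datasources whose estimate fits the budget, cheapest first."""
    priced = [(estimate_cost(ds, expected_tokens), ds["id"])
              for ds in datasources if "pricing" in ds]
    return [ds_id for cost, ds_id in sorted(priced) if cost <= budget]
```

Under that assumption, the arXiv entry shown earlier (`cpm` of 0.5) would cost about 0.1 for 200,000 tokens.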