# Datasources API

> Discover available data sources and categories with the Valyu Python SDK

The Datasources API provides a tool manifest that AI agents can use to discover available data sources. Call it before making search requests to find out which datasource IDs you can pass to `included_sources` or `excluded_sources`.

## Basic Usage

```python
from valyu import Valyu

valyu = Valyu()

# List all available datasources
response = valyu.datasources()

print(f"Found {len(response.datasources)} datasources")
for ds in response.datasources:
    print(f"{ds.id}: {ds.name} ({ds.category})")
```

## Methods

### datasources()

List all available datasources with optional category filtering.

```python
response = valyu.datasources(category="research")
```

#### Parameters

| Parameter  | Type        | Description                               | Default |
| ---------- | ----------- | ----------------------------------------- | ------- |
| `category` | str \| None | Filter by category (see categories below) | None    |

#### Available Categories

| Category         | Description                              |
| ---------------- | ---------------------------------------- |
| `research`       | Academic papers (arXiv, PubMed, bioRxiv) |
| `healthcare`     | Clinical trials, drug info, health data  |
| `markets`        | Stocks, crypto, forex, ETFs              |
| `company`        | SEC filings, earnings, insider trades    |
| `economic`       | FRED, BLS, World Bank data               |
| `predictions`    | Polymarket, Kalshi                       |
| `transportation` | UK Rail, ship tracking                   |
| `legal`          | Case law, legislation                    |
| `politics`       | Parliamentary data                       |
| `patents`        | Global patent filings                    |

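The categories above are the documented filter values. If you accept category names from users or from an LLM, a small client-side guard can catch typos before any request is made. The sketch below is hypothetical: `KNOWN_CATEGORIES` is copied by hand from the table and `validate_category` is not part of the SDK; for an up-to-date list, prefer `datasources_categories()`.

```python
from typing import Optional

# Hypothetical: category ids copied by hand from the documented table.
# In production, prefer fetching them via datasources_categories().
KNOWN_CATEGORIES = frozenset({
    "research", "healthcare", "markets", "company", "economic",
    "predictions", "transportation", "legal", "politics", "patents",
})


def validate_category(category: Optional[str]) -> Optional[str]:
    """Return a normalised category id, or None meaning 'no filter'.

    Raises ValueError for ids that are not in KNOWN_CATEGORIES.
    """
    if category is None:
        return None
    # Assumes category ids are always lowercase, as in the table
    normalised = category.strip().lower()
    if normalised not in KNOWN_CATEGORIES:
        raise ValueError(
            f"Unknown category {category!r}; expected one of "
            f"{sorted(KNOWN_CATEGORIES)}"
        )
    return normalised
```

For example, `valyu.datasources(category=validate_category("Research"))` would request the `research` category, while a misspelling such as `"reserch"` fails locally with a clear message.
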
### datasources\_categories()

List all available categories with dataset counts.

```python
response = valyu.datasources_categories()

for cat in response.categories:
    print(f"{cat.id}: {cat.name} ({cat.dataset_count} datasets)")
```
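
To rank or prune categories by size (for example, to skip empty ones before querying), you can turn `response.categories` into a lookup. This sketch uses a stand-in `Category` dataclass that mirrors the documented `DatasourceCategory` fields so it runs without the SDK; `categories_by_size` is a hypothetical helper. Because it only reads attributes, the real response objects should work in its place.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Category:
    """Stand-in mirroring the documented DatasourceCategory fields."""
    id: str
    name: str
    dataset_count: int
    description: Optional[str] = None


def categories_by_size(categories: List[Category], min_count: int = 1) -> Dict[str, int]:
    """Map category id -> dataset_count, largest first, dropping small categories."""
    ranked = sorted(categories, key=lambda c: c.dataset_count, reverse=True)
    # Dicts preserve insertion order, so the result stays ranked
    return {c.id: c.dataset_count for c in ranked if c.dataset_count >= min_count}
```

With the SDK you would call `categories_by_size(valyu.datasources_categories().categories)`.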

## Response Format

### DatasourcesResponse

```python
class DatasourcesResponse:
    success: bool
    error: Optional[str]
    datasources: List[Datasource]

class Datasource:
    id: str                              # e.g., "valyu/valyu-arxiv"
    name: str                            # e.g., "Arxiv"
    description: str                     # Full description
    category: str                        # e.g., "research"
    type: Optional[str]                  # e.g., "paper", "dataset"
    modality: Optional[List[str]]        # e.g., ["text", "images"]
    topics: Optional[List[str]]          # e.g., ["Research Papers", "Physics"]
    languages: Optional[List[str]]       # e.g., ["English"]
    source: Optional[str]                # Data provider
    example_queries: Optional[List[str]] # Sample queries for few-shot prompting
    pricing: Optional[DatasourcePricing] # Cost information
    response_schema: Optional[dict]      # JSON schema for responses
    update_frequency: Optional[str]      # e.g., "Monthly", "Quarterly"
    size: Optional[int]                  # Number of records
    coverage: Optional[DatasourceCoverage] # Date range coverage

class DatasourcePricing:
    cpm: float  # Cost per million tokens

class DatasourceCoverage:
    start_date: Optional[str]
    end_date: Optional[str]
```
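
The `coverage` field makes it possible to pick datasources that include a particular date before searching. A minimal sketch, assuming `start_date` and `end_date` are ISO 8601 strings (only the first ten characters, `YYYY-MM-DD`, are parsed) and that a missing bound means open-ended; neither assumption is confirmed by the schema above. `Coverage`, `Source`, `covers`, and `sources_covering` are hypothetical stand-ins.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Coverage:
    """Stand-in for DatasourceCoverage; dates assumed to be ISO 'YYYY-MM-DD...'."""
    start_date: Optional[str] = None
    end_date: Optional[str] = None


@dataclass
class Source:
    """Stand-in with the subset of Datasource fields used here."""
    id: str
    category: str
    coverage: Optional[Coverage] = None


def covers(ds: Source, day: date) -> bool:
    """True if ds.coverage includes `day`; a missing bound counts as open-ended."""
    if ds.coverage is None:
        return False  # no coverage metadata: treat as unknown and exclude
    start, end = ds.coverage.start_date, ds.coverage.end_date
    if start and date.fromisoformat(start[:10]) > day:
        return False
    if end and date.fromisoformat(end[:10]) < day:
        return False
    return True


def sources_covering(sources: List[Source], day: date) -> List[str]:
    """Ids of the datasources whose coverage includes `day`."""
    return [ds.id for ds in sources if covers(ds, day)]
```

Datasources with no `coverage` at all are excluded here; change that `return False` if you would rather keep them.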

### DatasourceCategoriesResponse

```python
class DatasourceCategoriesResponse:
    success: bool
    error: Optional[str]
    categories: List[DatasourceCategory]

class DatasourceCategory:
    id: str              # e.g., "research"
    name: str            # e.g., "Research & Academic"
    description: Optional[str]
    dataset_count: int   # Number of datasources in category
```

## Use Case Examples

### Dynamic Source Discovery for AI Agents

Build agents that discover relevant datasources at runtime:

```python
from typing import List

from valyu import Valyu

valyu = Valyu()


def find_relevant_sources(query_domain: str) -> List[str]:
    """Find datasource IDs relevant to a query domain."""
    # Map query domains to datasource categories
    domain_to_category = {
        "academic": "research",
        "medical": "healthcare",
        "financial": "markets",
        "corporate": "company",
        "economic": "economic",
    }

    # Unmapped domains give None, i.e. no category filter
    category = domain_to_category.get(query_domain)
    response = valyu.datasources(category=category)

    if response.success:
        return [ds.id for ds in response.datasources]
    return []


# Use discovered sources in search
sources = find_relevant_sources("academic")
search_response = valyu.search(
    "transformer architecture improvements",
    included_sources=sources,
)
```
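
Agents usually need the discovered sources described in their prompt, not just a list of IDs. The sketch below renders a compact, one-line-per-source manifest from `id`, `name`, and `description`. The output format is an illustrative choice, not something the API returns, and `Source` and `render_manifest` are hypothetical stand-ins for the SDK's `Datasource` objects.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Source:
    """Stand-in with the Datasource fields an agent prompt needs."""
    id: str
    name: str
    description: str


def render_manifest(sources: List[Source], max_desc: int = 80) -> str:
    """Render one line per datasource, truncating long descriptions to max_desc chars."""
    lines = []
    for ds in sources:
        desc = " ".join(ds.description.split())  # collapse whitespace and newlines
        if len(desc) > max_desc:
            desc = desc[: max_desc - 3].rstrip() + "..."
        lines.append(f"- {ds.id} ({ds.name}): {desc}")
    return "\n".join(lines)
```

With the SDK you would pass `response.datasources` directly, since only those three attributes are read.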

### Few-Shot Prompting with Example Queries

Use `example_queries` from datasources to improve search quality:

```python
from typing import List

from valyu import Valyu

valyu = Valyu()


def get_example_queries(category: str) -> List[str]:
    """Get example queries for a category to use in few-shot prompting."""
    response = valyu.datasources(category=category)

    examples = []
    if response.success:
        for ds in response.datasources:
            # example_queries is optional; take at most two per datasource
            if ds.example_queries:
                examples.extend(ds.example_queries[:2])
    return examples


# Get examples for research queries
research_examples = get_example_queries("research")
print("Example research queries:")
for example in research_examples:
    print(f"  - {example}")
```
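
The collected examples still need to be formatted into a prompt. This sketch deduplicates them case-insensitively, caps how many are used, and appends the task; the template text and the `build_few_shot_prompt` name are hypothetical, so adapt the wording to your model.

```python
from typing import List


def build_few_shot_prompt(examples: List[str], task: str, limit: int = 5) -> str:
    """Format up to `limit` unique, non-blank example queries as a few-shot block."""
    seen = set()
    unique: List[str] = []
    for q in examples:
        key = q.strip().lower()  # case-insensitive duplicate check
        if key and key not in seen:
            seen.add(key)
            unique.append(q.strip())
        if len(unique) == limit:
            break
    shots = "\n".join(f"Example query: {q}" for q in unique)
    return f"{shots}\n\nWrite a similar search query for: {task}"
```

For instance, `build_few_shot_prompt(get_example_queries("research"), "CRISPR delivery methods")`.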

### Cost Estimation

Estimate costs before making search requests:

```python
from valyu import Valyu

valyu = Valyu()


def estimate_search_cost(category: str) -> dict:
    """Summarise CPM pricing across the datasources in a category."""
    response = valyu.datasources(category=category)

    if not response.success:
        return {"error": response.error}

    # Skip datasources without pricing information
    costs = [
        {"source": ds.id, "name": ds.name, "cpm": ds.pricing.cpm}
        for ds in response.datasources
        if ds.pricing
    ]
    cpms = [c["cpm"] for c in costs]

    return {
        "sources": len(costs),
        "average_cpm": sum(cpms) / len(cpms) if cpms else 0.0,
        "min_cpm": min(cpms, default=0.0),
        "max_cpm": max(cpms, default=0.0),
        "details": costs,
    }


# Check costs for financial data
costs = estimate_search_cost("markets")
if "error" in costs:
    print(f"Error: {costs['error']}")
else:
    print(f"Average CPM for markets: ${costs['average_cpm']:.2f}")
```
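
Averages hide the spread between sources. If you have a price ceiling, you can instead keep only the sources at or below it and pass those IDs as `included_sources`. A minimal sketch using stand-in `Pricing` and `Source` dataclasses that mirror the documented fields; `sources_within_budget` is hypothetical, and sources without `pricing` are conservatively excluded.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pricing:
    """Stand-in for DatasourcePricing."""
    cpm: float


@dataclass
class Source:
    """Stand-in with the Datasource fields used for budgeting."""
    id: str
    pricing: Optional[Pricing] = None


def sources_within_budget(sources: List[Source], max_cpm: float) -> List[str]:
    """Ids of priced sources whose CPM is at or below max_cpm, cheapest first."""
    priced = [ds for ds in sources if ds.pricing is not None]  # unpriced: excluded
    affordable = [ds for ds in priced if ds.pricing.cpm <= max_cpm]
    affordable.sort(key=lambda ds: ds.pricing.cpm)
    return [ds.id for ds in affordable]
```

With the SDK you would pass `response.datasources` from `valyu.datasources(category=...)`.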

### List All Sources by Category

Get a complete overview of available data:

```python
from valyu import Valyu


def list_all_sources():
    """List all datasources organized by category."""
    valyu = Valyu()

    # Get categories first
    categories = valyu.datasources_categories()

    if not categories.success:
        print(f"Error: {categories.error}")
        return

    for cat in categories.categories:
        print(f"\n{cat.name} ({cat.dataset_count} sources)")
        print("-" * 40)

        # One request per category to fetch its datasources
        sources = valyu.datasources(category=cat.id)
        if sources.success:
            for ds in sources.datasources:
                pricing = f"${ds.pricing.cpm:.2f} CPM" if ds.pricing else "N/A"
                print(f"  {ds.id}: {ds.name} [{pricing}]")


list_all_sources()
```
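
The function above issues one `datasources()` request per category. Since every `Datasource` already carries a `category`, an alternative is a single unfiltered `valyu.datasources()` call grouped on the client. The sketch below assumes that the unfiltered call returns every datasource; `Source` and `group_by_category` are hypothetical stand-ins.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Source:
    """Stand-in with the Datasource fields used for grouping."""
    id: str
    category: str


def group_by_category(sources: List[Source]) -> Dict[str, List[str]]:
    """Group datasource ids under their category id, sorted for stable output."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for ds in sources:
        groups[ds.category].append(ds.id)
    return {cat: sorted(ids) for cat, ids in sorted(groups.items())}
```

With the SDK: `group_by_category(valyu.datasources().datasources)`. The groups are keyed by category ID (e.g. `research`), not by the display name that `datasources_categories()` returns.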

## Error Handling

```python
response = valyu.datasources(category="research")

if not response.success:
    print(f"Error fetching datasources: {response.error}")
else:
    print(f"Found {len(response.datasources)} research datasources")
    for ds in response.datasources:
        print(f"  - {ds.id}: {ds.name}")
```
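
If you expect occasional transient failures, you can retry while `success` is false. This is a hypothetical sketch: these docs do not say which errors are transient, and `fetch_with_retry` only reacts to `success=False`, so any raised exception propagates unchanged. The `sleep` parameter is injectable so the exponential backoff can be tested without waiting.

```python
import time
from typing import Any, Callable


def fetch_with_retry(
    fetch: Callable[[], Any],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call `fetch` until it returns a response with success=True.

    Waits base_delay * 2**i seconds between attempts (exponential backoff)
    and returns the last response if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    response = None
    for i in range(attempts):
        response = fetch()
        if response.success:
            return response
        if i < attempts - 1:  # no pointless wait after the final attempt
            sleep(base_delay * 2 ** i)
    return response
```

Usage: `response = fetch_with_retry(lambda: valyu.datasources(category="research"))`.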

## Using with Search API

Once you've discovered relevant datasources, use them with the Search API:

```python
# Discover research datasources
datasources = valyu.datasources(category="research")
research_sources = [ds.id for ds in datasources.datasources]

# Use them in a search
results = valyu.search(
    "latest transformer architecture improvements",
    included_sources=research_sources,
    max_num_results=10
)
```
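
If you merge IDs from several `datasources()` calls or from a hand-maintained list, you may want to drop duplicates and honour an exclusion list before calling `search`. A minimal, hypothetical helper (`build_included_sources` is not part of the SDK) that keeps the first occurrence of each ID in order:

```python
from typing import Iterable, List


def build_included_sources(discovered: Iterable[str], exclude: Iterable[str] = ()) -> List[str]:
    """Deduplicate discovered ids (first occurrence kept) and drop excluded ones."""
    excluded = set(exclude)
    seen = set()
    result: List[str] = []
    for source_id in discovered:
        if source_id in excluded or source_id in seen:
            continue
        seen.add(source_id)
        result.append(source_id)
    return result
```

For example, `included_sources=build_included_sources(research_sources, exclude=["valyu/valyu-arxiv"])`.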

<Tip>
  For more information on filtering by sources, see the [Source Filtering Guide](/search/filtering/sources).
</Tip>
