# arXiv API for AI Agents - Search 2.5M+ Research Papers

Search arXiv's 2.5M+ preprints through Valyu's unified API. It provides full-text semantic search across physics, computer science, mathematics, quantitative finance, and economics, for AI agents and research applications.

## Dataset Overview

| Property      | Value                                                     |
| ------------- | --------------------------------------------------------- |
| **Source ID** | `valyu/valyu-arxiv`                                       |
| **Size**      | 2.5M+ papers                                              |
| **Coverage**  | Physics, CS, mathematics, quantitative finance, economics |
| **Updates**   | Monthly                                                   |
| **Data Type** | Unstructured (full-text)                                  |

## What You Get

* **Full-text search** - Search across entire papers, not just abstracts
* **Author metadata** - Author names and affiliations
* **Citation data** - DOIs and arXiv IDs for proper attribution
* **Category filtering** - Filter by arXiv categories (cs.AI, quant-ph, etc.)
* **Semantic ranking** - Results ranked by relevance to your query
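
For attribution, it is often useful to reduce an arXiv link to its bare identifier. The following is a minimal sketch, not part of the Valyu SDK: `extract_arxiv_id` is a hypothetical helper, and it assumes the URL follows arXiv's public `arxiv.org/abs/...` or `arxiv.org/pdf/...` pattern. Whether a given result's `url` takes that form is an assumption to verify against real responses.

```python
import re
from typing import Optional

# Matches new-style IDs (e.g. 1706.03762) and old-style IDs (e.g. hep-th/9711200),
# with an optional version suffix such as "v5".
_ARXIV_ID = re.compile(
    r"arxiv\.org/(?:abs|pdf)/"
    r"(?P<id>\d{4}\.\d{4,5}|[a-z\-]+(?:\.[A-Z]{2})?/\d{7})"
    r"(?P<version>v\d+)?",
    re.IGNORECASE,
)


def extract_arxiv_id(url: str, keep_version: bool = False) -> Optional[str]:
    """Return the arXiv ID embedded in a URL, or None if no ID is found."""
    match = _ARXIV_ID.search(url)
    if match is None:
        return None
    arxiv_id = match.group("id")
    if keep_version and match.group("version"):
        arxiv_id += match.group("version")
    return arxiv_id


print(extract_arxiv_id("https://arxiv.org/abs/1706.03762v5"))        # 1706.03762
print(extract_arxiv_id("https://arxiv.org/pdf/1706.03762v5", True))  # 1706.03762v5
print(extract_arxiv_id("https://example.com/not-a-paper"))           # None
```

Dropping the version suffix by default keeps citations stable across revisions; pass `keep_version=True` when a specific revision matters.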

## Quick Start

<CodeGroup>
  ```python Python
  from valyu import Valyu

  valyu = Valyu()

  response = valyu.search(
      "transformer architecture attention mechanism improvements",
      search_type="proprietary",
      included_sources=["valyu/valyu-arxiv"],
      max_num_results=10
  )

  for result in response.results:
      print(f"Title: {result.title}")
      print(f"Authors: {', '.join(result.authors) if result.authors else 'N/A'}")
      print(f"URL: {result.url}")
      print(f"Content: {result.content[:300]}...")
  ```

  ```javascript JavaScript
  import { Valyu } from "valyu-js";

  const valyu = new Valyu();

  const response = await valyu.search(
    "transformer architecture attention mechanism improvements",
    {
      searchType: "proprietary",
      includedSources: ["valyu/valyu-arxiv"],
      maxNumResults: 10
    }
  );

  response.results.forEach((result) => {
    console.log(`Title: ${result.title}`);
    console.log(`Authors: ${result.authors?.join(", ") || "N/A"}`);
    console.log(`URL: ${result.url}`);
    console.log(`Content: ${result.content.slice(0, 300)}...`);
  });
  ```
</CodeGroup>
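
For agent use, the printed fields above can instead be packed into a single context string for a prompt. The sketch below is one possible approach under stated assumptions: `build_context` is a hypothetical helper, it relies only on the `title`, `authors`, `url`, and `content` attributes used in the Quick Start, and the `SimpleNamespace` objects are placeholder stand-ins rather than real API output.

```python
from types import SimpleNamespace
from typing import Iterable


def build_context(results: Iterable, max_chars: int = 300) -> str:
    """Format search results as numbered blocks suitable for an LLM prompt."""
    blocks = []
    for index, result in enumerate(results, start=1):
        authors = getattr(result, "authors", None)
        author_line = ", ".join(authors) if authors else "N/A"
        content = str(getattr(result, "content", "") or "")
        snippet = content[:max_chars]
        if len(content) > max_chars:
            snippet += "..."  # mark truncated content
        blocks.append(
            f"[{index}] {result.title}\n"
            f"Authors: {author_line}\n"
            f"URL: {result.url}\n"
            f"{snippet}"
        )
    return "\n\n".join(blocks)


# Placeholder results for illustration only; in practice pass response.results.
example_results = [
    SimpleNamespace(
        title="Example paper A",
        authors=["A. Author", "B. Author"],
        url="https://arxiv.org/abs/0000.00000",
        content="x" * 400,
    ),
    SimpleNamespace(
        title="Example paper B",
        authors=None,
        url="https://arxiv.org/abs/0000.00001",
        content="short abstract",
    ),
]

context = build_context(example_results)
print(context)
```

Numbering each block lets the model cite sources as `[1]`, `[2]`, and so on, and the `max_chars` cap keeps prompt size predictable.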

## Use Cases

* **ML/AI research** - Find the latest papers on models, architectures, and techniques
* **Physics research** - Access preprints across all physics subdisciplines
* **Mathematics** - Search pure and applied mathematics literature
* **Quantitative finance** - Find papers on pricing, risk, and market microstructure
* **AI training data** - Build research datasets for AI applications

## Key Research Areas

| Category     | Examples                                                     |
| ------------ | ------------------------------------------------------------ |
| **cs.AI**    | Artificial intelligence, reasoning, knowledge representation |
| **cs.LG**    | Machine learning, deep learning, neural networks             |
| **cs.CL**    | Natural language processing, computational linguistics       |
| **cs.CV**    | Computer vision, image processing                            |
| **quant-ph** | Quantum computing, quantum information                       |
| **stat.ML**  | Statistical machine learning                                 |

## Combine with Other Sources

Pass several source IDs in `included_sources` to search arXiv alongside other academic datasets in a single query:

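```python
from valyu import Valyu

# Same client setup as in the Quick Start
valyu = Valyu()

response = valyu.search(
    "large language model scaling laws",
    search_type="proprietary",
    # Search arXiv together with PubMed and bioRxiv
    included_sources=[
        "valyu/valyu-arxiv",
        "valyu/valyu-pubmed",
        "valyu/valyu-biorxiv"
    ],
    max_num_results=20
)
```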
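
When results come back from several sources at once, it can help to see which site each one came from. The following is a minimal sketch that groups results by the hostname of their `url`. `group_by_host` is a hypothetical helper, and it assumes only that each result exposes a `url` attribute, as in the Quick Start. The stand-in results and their URLs are placeholders, not real API output.

```python
from collections import defaultdict
from types import SimpleNamespace
from urllib.parse import urlparse


def group_by_host(results):
    """Group results by the hostname of their URL."""
    grouped = defaultdict(list)
    for result in results:
        host = urlparse(result.url).netloc.lower() or "unknown"
        grouped[host].append(result)
    return dict(grouped)


# Placeholder results for illustration only; in practice pass response.results.
mixed_results = [
    SimpleNamespace(title="Example arXiv paper", url="https://arxiv.org/abs/0000.00000"),
    SimpleNamespace(title="Example PubMed paper", url="https://pubmed.ncbi.nlm.nih.gov/00000000/"),
    SimpleNamespace(title="Another arXiv paper", url="https://arxiv.org/abs/0000.00001"),
]

by_host = group_by_host(mixed_results)
for host, items in by_host.items():
    print(f"{host}: {len(items)} result(s)")
```

Grouping on the URL avoids depending on any response field not shown on this page; if the response exposes an explicit source field, that would likely be a more direct key.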

## Related Data Sources

<CardGroup cols={2}>
  <Card title="PubMed" icon="book-medical" href="/sources/pubmed">
    37M+ biomedical papers
  </Card>

  <Card title="Patents" icon="scroll" href="/sources/patents">
    8M+ US patent filings
  </Card>
</CardGroup>
