Search arXiv’s 2.5M+ preprints through Valyu’s unified API. Get full-text semantic search across physics, computer science, mathematics, quantitative finance, and economics.

Dataset Overview

| Property | Value |
|---|---|
| Source ID | valyu/valyu-arxiv |
| Size | 2.5M+ papers |
| Coverage | Physics, CS, mathematics, quantitative finance, economics |
| Updates | Monthly |
| Data type | Unstructured (full-text) |

What You Get

  • Full-text search - Search across entire papers, not just abstracts
  • Author metadata - Author names and affiliations
  • Citation data - DOIs and arXiv IDs for proper attribution
  • Category filtering - Filter by arXiv categories (cs.AI, quant-ph, etc.)
  • Semantic ranking - Results ranked by relevance to your query
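If a result's URL is an arxiv.org link, it already carries the paper's arXiv ID, and from that ID you can derive the DOI arXiv registers for the paper (prefix `10.48550/arXiv.`). The sketch below assumes result URLs point at `arxiv.org/abs/...` or `arxiv.org/pdf/...` pages and only handles new-style IDs (`YYMM.NNNN` or `YYMM.NNNNN`); old-style IDs such as `hep-th/9901001` are not matched. The helper names are hypothetical, not part of the Valyu SDK.

```python
import re
from typing import Optional

# Matches new-style arXiv IDs (YYMM.NNNN or YYMM.NNNNN) in arxiv.org abs/ or pdf/ URLs.
# Any version suffix such as "v7" is left out of the captured ID.
# Assumption: result URLs are arxiv.org links; old-style IDs are not handled.
_ARXIV_URL = re.compile(r"arxiv\.org/(?:abs|pdf)/(?P<id>\d{4}\.\d{4,5})")


def arxiv_id_from_url(url: str) -> Optional[str]:
    """Return the version-less arXiv ID found in an arxiv.org URL, or None."""
    match = _ARXIV_URL.search(url)
    return match.group("id") if match else None


def arxiv_doi(arxiv_id: str) -> str:
    """Build the DataCite DOI that arXiv registers for a paper."""
    return f"10.48550/arXiv.{arxiv_id}"


if __name__ == "__main__":
    paper_id = arxiv_id_from_url("https://arxiv.org/abs/1706.03762v7")
    print(paper_id)             # 1706.03762
    print(arxiv_doi(paper_id))  # 10.48550/arXiv.1706.03762
```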

Quick Start

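from valyu import Valyu

# Assumes your API key is available to the client (for example via the VALYU_API_KEY environment variable)
valyu = Valyu()

# Restrict the search to Valyu's proprietary datasets, and to arXiv specifically
response = valyu.search(
    "transformer architecture attention mechanism improvements",
    search_type="proprietary",
    included_sources=["valyu/valyu-arxiv"],
    max_num_results=10
)

# Print a short summary of each result
for result in response.results:
    print(f"Title: {result.title}")
    print(f"Authors: {', '.join(result.authors) if result.authors else 'N/A'}")
    print(f"URL: {result.url}")
    print(f"Content: {result.content[:300]}...")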
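The Quick Start loop prints `...` after every snippet, even short ones, and assumes each result has an `authors` attribute and string content. A slightly more defensive formatter might look like the sketch below: it falls back to `N/A` when authors are missing or empty, coerces content to text, and only appends `...` when the content is actually truncated. `format_result` is a hypothetical helper, and the demo uses a `SimpleNamespace` stand-in instead of a real SDK result so it runs offline.

```python
from types import SimpleNamespace


def format_result(result, max_chars: int = 300) -> str:
    """Render one search result as a short, printable summary (hypothetical helper)."""
    # Fall back to "N/A" if the result has no authors attribute or it is empty.
    authors = getattr(result, "authors", None)
    author_text = ", ".join(authors) if authors else "N/A"
    # Coerce content to text; None becomes an empty string.
    content = str(result.content or "")
    snippet = content[:max_chars]
    if len(content) > max_chars:
        snippet += "..."  # only mark snippets that were actually cut
    return "\n".join([
        f"Title: {result.title}",
        f"Authors: {author_text}",
        f"URL: {result.url}",
        f"Content: {snippet}",
    ])


# Offline stand-in for an SDK result object (hypothetical values, no API call).
fake = SimpleNamespace(title="Example paper", authors=None,
                       url="https://example.org/paper", content="Short abstract.")
print(format_result(fake))
```

With a real response, you would call `print(format_result(result))` inside the `for result in response.results:` loop.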

Use Cases

  • ML/AI research - Find the latest papers on models, architectures, and techniques
  • Physics research - Access preprints across all physics subdisciplines
  • Mathematics - Search pure and applied mathematics literature
  • Quantitative finance - Find papers on pricing, risk, and market microstructure
  • AI training data - Build research datasets for AI applications
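For the AI training data use case, one simple option is to write each result to a JSON Lines file, one record per line. This is a sketch, not a Valyu feature: `write_jsonl` is a hypothetical helper, its record fields mirror the `title`, `url`, `authors`, and `content` attributes used in the Quick Start, and the demo runs on stand-in objects so no API call is made.

```python
import io
import json
from types import SimpleNamespace
from typing import Iterable, TextIO


def write_jsonl(results: Iterable, out: TextIO) -> int:
    """Write one JSON object per result to `out`; return the number of records written."""
    count = 0
    for result in results:
        record = {
            "title": result.title,
            "url": result.url,
            # Missing or None authors become an empty list.
            "authors": list(getattr(result, "authors", None) or []),
            "content": str(result.content or ""),
        }
        out.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count


# Offline demo with stand-in results (hypothetical data, no API call).
demo = [
    SimpleNamespace(title="Paper A", url="https://example.org/a", authors=["Ada"], content="Text A"),
    SimpleNamespace(title="Paper B", url="https://example.org/b", authors=None, content="Text B"),
]
buffer = io.StringIO()
print(write_jsonl(demo, buffer))  # 2
```

With a real response, pass `response.results` and an open file, for example `with open("arxiv.jsonl", "w", encoding="utf-8") as f: write_jsonl(response.results, f)`.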

Key Research Areas

| Category | Examples |
|---|---|
| cs.AI | Artificial intelligence, reasoning, knowledge representation |
| cs.LG | Machine learning, deep learning, neural networks |
| cs.CL | Natural language processing, computational linguistics |
| cs.CV | Computer vision, image processing |
| quant-ph | Quantum computing, quantum information |
| stat.ML | Statistical machine learning |
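arXiv category codes generally take the form `archive.SUBJECT` (for example `cs.LG`), while some archives, such as `quant-ph`, have no subject subdivision. The sketch below keeps the categories from this table in a dictionary and splits a code into its archive and optional subject class, which can help when grouping or post-filtering results by field. `KEY_CATEGORIES` and `split_category` are hypothetical names; this page does not document how category filtering is exposed in the API.

```python
from typing import Optional, Tuple

# Category codes and descriptions taken from the Key Research Areas table.
KEY_CATEGORIES = {
    "cs.AI": "Artificial intelligence, reasoning, knowledge representation",
    "cs.LG": "Machine learning, deep learning, neural networks",
    "cs.CL": "Natural language processing, computational linguistics",
    "cs.CV": "Computer vision, image processing",
    "quant-ph": "Quantum computing, quantum information",
    "stat.ML": "Statistical machine learning",
}


def split_category(code: str) -> Tuple[str, Optional[str]]:
    """Split an arXiv category into (archive, subject class); undivided archives get None."""
    archive, _, subject = code.partition(".")
    return archive, subject or None


for code in ("cs.LG", "quant-ph"):
    print(code, split_category(code), KEY_CATEGORIES[code])
```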

Combine with Other Sources

Combine arXiv with other academic sources, such as PubMed and bioRxiv, to search all of them in a single query:
from valyu import Valyu

valyu = Valyu()

# Query arXiv, PubMed, and bioRxiv together
response = valyu.search(
    "large language model scaling laws",
    search_type="proprietary",
    included_sources=[
        "valyu/valyu-arxiv",
        "valyu/valyu-pubmed",
        "valyu/valyu-biorxiv"
    ],
    max_num_results=20
)
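When several sources are combined, it can help to see which dataset each hit came from. This sketch assumes each result exposes a `source` attribute holding the source ID (for example `valyu/valyu-arxiv`); that attribute name is an assumption, so the helper falls back to `"unknown"` when it is absent. `group_by_source` is a hypothetical helper, and the demo uses stand-in results rather than a live response.

```python
from types import SimpleNamespace
from typing import Dict, Iterable, List


def group_by_source(results: Iterable) -> Dict[str, List[str]]:
    """Map each source ID to the titles of the results it contributed."""
    groups: Dict[str, List[str]] = {}
    for result in results:
        # Assumption: results expose a `source` attribute; fall back to "unknown" if not.
        source = getattr(result, "source", None) or "unknown"
        groups.setdefault(source, []).append(result.title)
    return groups


# Offline demo with stand-in results (hypothetical titles, no API call).
demo = [
    SimpleNamespace(source="valyu/valyu-arxiv", title="Scaling paper 1"),
    SimpleNamespace(source="valyu/valyu-pubmed", title="Clinical LLM study"),
    SimpleNamespace(source="valyu/valyu-arxiv", title="Scaling paper 2"),
]
for source, titles in group_by_source(demo).items():
    print(f"{source}: {len(titles)} result(s)")
```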