Dataset Overview
| Property | Value |
|---|---|
| Source ID | valyu/valyu-arxiv |
| Size | 2.5M+ papers |
| Coverage | Physics, CS, mathematics, quantitative finance, economics |
| Updates | Monthly |
| Data Type | Unstructured (full-text) |
What You Get
- Full-text search - Search across entire papers, not just abstracts
- Author metadata - Author names and affiliations
- Citation data - DOIs and arXiv IDs for proper attribution
- Category filtering - Filter by arXiv categories (cs.AI, quant-ph, etc.)
- Semantic ranking - Results ranked by relevance to your query
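
The citation identifiers listed above can be turned into attribution links. The sketch below assumes each result exposes an arXiv ID and, optionally, a DOI; the helper name `attribution_links` and its parameters are hypothetical and not part of any documented API. The URL patterns themselves (`arxiv.org/abs/` and the `doi.org` resolver) are the standard public ones.

```python
def attribution_links(arxiv_id=None, doi=None):
    """Return canonical URLs for whichever identifiers a result carries.

    Hypothetical helper: the parameter names are illustrative only.
    """
    links = {}
    if arxiv_id:
        # arXiv abstract pages are served at https://arxiv.org/abs/<id>
        links["arxiv"] = f"https://arxiv.org/abs/{arxiv_id}"
    if doi:
        # DOIs resolve through the doi.org proxy
        links["doi"] = f"https://doi.org/{doi}"
    return links


# Example call with a sample arXiv ID and no DOI
print(attribution_links(arxiv_id="1706.03762"))
```

Returning a dictionary keyed by identifier type lets the caller cite whichever link is available without checking for missing values first.
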
Quick Start
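
A minimal sketch of a search request restricted to this dataset. Only the source ID `valyu/valyu-arxiv` comes from the Dataset Overview table; the helper `build_search_request` and the payload field names (`query`, `included_sources`, `max_num_results`) are assumptions for illustration and should be checked against the API reference before use. The code only builds and prints the payload; it does not send a request.

```python
import json

SOURCE_ID = "valyu/valyu-arxiv"  # Source ID from the Dataset Overview table


def build_search_request(query, max_results=5):
    """Build a hypothetical search payload limited to the arXiv dataset."""
    if not query.strip():
        raise ValueError("query must not be empty")
    return {
        "query": query,                   # assumed field name
        "included_sources": [SOURCE_ID],  # assumed field name
        "max_num_results": max_results,   # assumed field name
    }


payload = build_search_request("transformer attention mechanisms")
print(json.dumps(payload, indent=2))
```

Keeping the source ID in a single constant means the same request can be pointed at another dataset by changing one value.
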
Use Cases
- ML/AI research - Find the latest papers on models, architectures, and techniques
- Physics research - Access preprints across all physics subdisciplines
- Mathematics - Search pure and applied mathematics literature
- Quantitative finance - Find papers on pricing, risk, and market microstructure
- AI training data - Build research datasets for AI applications
Key Research Areas
| Category | Examples |
|---|---|
| cs.AI | Artificial intelligence, reasoning, knowledge representation |
| cs.LG | Machine learning, deep learning, neural networks |
| cs.CL | Natural language processing, computational linguistics |
| cs.CV | Computer vision, image processing |
| quant-ph | Quantum computing, quantum information |
| stat.ML | Statistical machine learning |
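
The category IDs in the table follow arXiv's `archive.subject` pattern (for example `cs.AI`), while some archives such as `quant-ph` have no subject class. The sketch below shows one way to split an ID and to filter records by category on the client side. The record shape (a `categories` list on each result) and both helper names are assumptions, and the sample records are placeholders rather than real search results.

```python
def split_category(category):
    """Split an arXiv category ID into (archive, subject_class).

    IDs without a dot, such as "quant-ph", have no subject class.
    """
    archive, _, subject = category.partition(".")
    return archive, subject or None


def filter_by_category(records, wanted):
    """Keep records that list at least one of the wanted categories.

    Assumes each record is a dict with a "categories" list (hypothetical shape).
    """
    wanted = set(wanted)
    return [r for r in records if wanted & set(r.get("categories", []))]


# Placeholder records for illustration only
records = [
    {"title": "Paper A", "categories": ["cs.LG", "stat.ML"]},
    {"title": "Paper B", "categories": ["quant-ph"]},
    {"title": "Paper C", "categories": ["cs.CV"]},
]

ml_papers = filter_by_category(records, ["cs.LG", "stat.ML"])
print([r["title"] for r in ml_papers])
print(split_category("cs.AI"), split_category("quant-ph"))
```

Matching on any overlap rather than the first listed category accounts for papers that are cross-listed in several categories.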

