
Overview

Valyu integrates seamlessly with LlamaIndex as a comprehensive tool spec, allowing you to enhance your AI agents and RAG applications with real-time web search and proprietary data sources. The integration provides LLM-ready context from multiple sources including web pages, academic journals, financial data, and more. The package includes two main functions:
  • search(): Deep search operations with comprehensive parameter control
  • get_contents(): Extract clean content from specific URLs

Installation

Install the official LlamaIndex Valyu package:
pip install llama-index-tools-valyu
Configure your credentials by setting the following environment variable:
export VALYU_API_KEY="your-api-key-here"

Free Credits

Get your API key with $10 credit from the Valyu Platform.

Basic Usage

import os
from llama_index.tools.valyu import ValyuToolSpec

# Set your API key
os.environ["VALYU_API_KEY"] = "your-api-key-here"

# Initialize the tool with comprehensive configuration
valyu_tool = ValyuToolSpec(
    api_key=os.environ["VALYU_API_KEY"],
    verbose=True,
    # Search API parameters
    max_price=100,  # Default maximum cost
    relevance_threshold=0.5,  # Minimum relevance score
    fast_mode=False,  # Quality vs speed trade-off
    included_sources=None,  # Optional source filtering
    excluded_sources=None,  # Optional source exclusion
    response_length=None,  # Content length control
    country_code=None,  # Geographic bias
)

# Perform a search
search_results = valyu_tool.search(
    query="What are agentic search-enhanced large reasoning models?",
    search_type="all",  # "all", "web", or "proprietary"
    max_num_results=5,
    start_date=None,  # Optional time filtering
    end_date=None,
    fast_mode=None  # Uses tool default if None
)

print("Search Results:")
for doc in search_results:
    print(f"Title: {doc.metadata['title']}")
    print(f"Content: {doc.text[:200]}...")
    print(f"Source: {doc.metadata['url']}")
    print(f"Relevance: {doc.metadata['relevance_score']}")
    print("---")
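Each result is a LlamaIndex Document carrying a relevance_score in its metadata, so you can post-filter or re-rank results in plain Python before handing them to an LLM. A minimal sketch — `_Doc` and `top_results` are our own stand-ins for illustration, not part of the package:

```python
class _Doc:
    """Stand-in for a LlamaIndex Document; only the metadata dict matters here."""
    def __init__(self, title: str, score: float):
        self.metadata = {"title": title, "relevance_score": score}

def top_results(docs, threshold: float = 0.7):
    """Keep documents at or above the relevance threshold, best first."""
    kept = [d for d in docs if d.metadata["relevance_score"] >= threshold]
    return sorted(kept, key=lambda d: d.metadata["relevance_score"], reverse=True)

docs = [_Doc("a", 0.9), _Doc("b", 0.4), _Doc("c", 0.7)]
print([d.metadata["title"] for d in top_results(docs)])  # → ['a', 'c']
```

The same list comprehension works unchanged on real `search()` results, since they expose the same `metadata` dict.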

Using ValyuToolSpec for Content Extraction

import os
from llama_index.tools.valyu import ValyuToolSpec

# Initialize tool with content extraction configuration
valyu_tool = ValyuToolSpec(
    api_key=os.environ["VALYU_API_KEY"],
    verbose=True,
    contents_summary=True,  # Enable AI summarization
    contents_extract_effort="high",  # Thorough extraction
    contents_response_length="medium",  # More detailed content
)

# Extract content from URLs
urls = [
    "https://arxiv.org/abs/1706.03762",  # Attention is All You Need paper
    "https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)"
]

content_results = valyu_tool.get_contents(urls=urls)

print("Extracted Content:")
for doc in content_results:
    print(f"URL: {doc.metadata['url']}")
    print(f"Title: {doc.metadata['title']}")
    print(f"Content: {doc.text[:300]}...")
    if 'summary' in doc.metadata:
        print(f"Summary: {doc.metadata['summary']}")
    print("---")

Using with LlamaIndex OpenAI Agents

The most powerful way to use Valyu is within LlamaIndex agents, where the AI can dynamically decide when and how to search:
import os
from llama_index.agent.openai import OpenAIAgent
from llama_index.tools.valyu import ValyuToolSpec

# Set API keys
os.environ["VALYU_API_KEY"] = "your-valyu-api-key"
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"

# Initialize Valyu tool with comprehensive configuration
valyu_tool = ValyuToolSpec(
    api_key=os.environ["VALYU_API_KEY"],
    max_price=100,  # Default maximum cost
    fast_mode=True,  # Enable fast mode for quicker responses
    # Contents API configuration
    contents_summary=True,  # Enable AI summarization for content extraction
    contents_extract_effort="normal",  # Extraction thoroughness
    contents_response_length="medium",  # Content length per URL
)

# Create OpenAI agent with Valyu tools
agent = OpenAIAgent.from_tools(
    valyu_tool.to_tool_list(),
    verbose=True,
)

# Example 1: Deep search query
print("=== Search Example ===")
search_response = agent.chat(
    "What are the key considerations and empirical evidence for implementing statistical arbitrage strategies using cointegrated pairs trading, specifically focusing on the optimal lookback period for calculating correlation coefficients and the impact of transaction costs on strategy profitability in high-frequency trading environments?"
)
print(search_response)

# Example 2: URL content extraction
print("\n=== URL Content Extraction Example ===")
content_response = agent.chat(
    "Please extract and summarize the content from these URLs: https://arxiv.org/abs/1706.03762 and https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)"
)
print(content_response)

Advanced Configuration

Comprehensive Parameter Configuration

The ValyuToolSpec supports extensive configuration during initialization:
from llama_index.tools.valyu import ValyuToolSpec

# Initialize with comprehensive configuration
valyu_tool = ValyuToolSpec(
    api_key="your-api-key",
    verbose=True,
    # Search API parameters (set at initialization)
    max_price=100,  # Maximum cost in dollars for search operations
    relevance_threshold=0.5,  # Minimum relevance score (0.0-1.0)
    fast_mode=False,  # Quality vs speed trade-off
    included_sources=["arxiv.org", "pubmed.ncbi.nlm.nih.gov"],  # Include specific sources
    excluded_sources=["reddit.com", "twitter.com"],  # Exclude sources
    response_length="medium",  # "short", "medium", "large", "max", or int
    country_code="US",  # 2-letter ISO country code for geo-bias
    # Contents API parameters
    contents_summary=True,  # Enable AI summarization
    contents_extract_effort="high",  # "normal", "high", or "auto"
    contents_response_length="large",  # Content length per URL
)

# Search with time filtering (parameters set per search)
results = valyu_tool.search(
    query="quantum computing breakthroughs 2024",
    search_type="proprietary",  # Focus on academic sources
    max_num_results=10,
    start_date="2024-01-01",  # Time-filtered search
    end_date="2024-12-31",
    fast_mode=None  # Uses tool default (False in this case)
)

Source Filtering Examples

import os

from llama_index.tools.valyu import ValyuToolSpec

# Academic-focused configuration
academic_tool = ValyuToolSpec(
    api_key=os.environ["VALYU_API_KEY"],
    included_sources=[
        "arxiv.org", 
        "pubmed.ncbi.nlm.nih.gov", 
        "ieee.org",
        "nature.com",
        "sciencedirect.com"
    ],
    response_length="large",  # More detailed academic content
    relevance_threshold=0.7  # Higher quality threshold
)

# News and current events configuration
news_tool = ValyuToolSpec(
    api_key=os.environ["VALYU_API_KEY"],
    excluded_sources=["reddit.com", "twitter.com", "facebook.com"],
    fast_mode=True,  # Faster for current events
    country_code="US",  # US-focused news
    response_length="short"  # Concise news summaries
)

Multi-Agent Workflows

Use Valyu in specialized agent configurations:
import os
from llama_index.agent.openai import OpenAIAgent
from llama_index.tools.valyu import ValyuToolSpec

# Create specialized research agent
research_tool = ValyuToolSpec(
    api_key=os.environ["VALYU_API_KEY"],
    max_price=100,
    included_sources=["arxiv.org", "pubmed.ncbi.nlm.nih.gov", "ieee.org"],
    response_length="large",
    relevance_threshold=0.7,
    fast_mode=False  # Prioritize quality for research
)

research_agent = OpenAIAgent.from_tools(
    research_tool.to_tool_list(),
    verbose=True,
    system_prompt="You are a research specialist. Use Valyu to find authoritative academic sources and provide well-cited answers. Focus on peer-reviewed papers and scholarly articles."
)

# Create analysis agent for current events
analysis_tool = ValyuToolSpec(
    api_key=os.environ["VALYU_API_KEY"],
    max_price=100,
    excluded_sources=["reddit.com", "twitter.com", "facebook.com"],
    fast_mode=True,  # Faster for current data
    response_length="medium",
    country_code="US"
)

analysis_agent = OpenAIAgent.from_tools(
    analysis_tool.to_tool_list(),
    verbose=True,
    system_prompt="You are a market analyst. Use current data to provide insights and recommendations. Focus on authoritative news sources and financial data."
)

# Use agents for different purposes
print("=== Research Agent Example ===")
research_response = research_agent.chat(
    "Find recent papers on transformer architecture improvements and summarize key innovations"
)
print(research_response)

print("\n=== Analysis Agent Example ===")
analysis_response = analysis_agent.chat(
    "Analyze current market trends in AI chip demand and semiconductor industry"
)
print(analysis_response)

Example Applications

Financial Research Assistant

import os
from llama_index.agent.openai import OpenAIAgent
from llama_index.tools.valyu import ValyuToolSpec

# Create financial research agent with tailored configuration
financial_tool = ValyuToolSpec(
    api_key=os.environ["VALYU_API_KEY"],
    max_price=100,
    fast_mode=True,  # Financial data changes rapidly
    excluded_sources=["reddit.com", "twitter.com"],  # Exclude social media
    response_length="medium",
    country_code="US",  # US financial markets focus
    # Content extraction for financial reports
    contents_summary=True,
    contents_extract_effort="high",
    contents_response_length="large"
)

financial_agent = OpenAIAgent.from_tools(
    financial_tool.to_tool_list(),
    verbose=True,
    system_prompt="""You are a financial research assistant. Use Valyu to search for:
    - Real-time market data and news
    - Academic research on financial models
    - Economic indicators and analysis
    - Financial reports and regulatory filings

    Always cite your sources and provide context about data recency."""
)

# Query financial markets
response = financial_agent.chat(
    "What are the latest developments in cryptocurrency regulation and their impact on institutional adoption? Include both recent news and academic research on the topic."
)
print(response)

Academic Research Agent

import os

from llama_index.tools.valyu import ValyuToolSpec

# Configure for academic research
academic_tool = ValyuToolSpec(
    api_key=os.environ["VALYU_API_KEY"],
    max_price=100,
    included_sources=["arxiv.org", "pubmed.ncbi.nlm.nih.gov", "nature.com"],
    response_length="large",  # Detailed academic content
    relevance_threshold=0.7,  # Higher quality threshold
    fast_mode=False  # Prioritize quality over speed
)

# Search academic sources specifically
academic_results = academic_tool.search(
    query="CRISPR gene editing safety protocols",
    search_type="proprietary",  # Focus on academic datasets
    max_num_results=8,
    start_date="2020-01-01",  # Recent research
    end_date="2024-12-31"
)

print("Academic Sources Found:", len(academic_results))
for doc in academic_results:
    print(f"Title: {doc.metadata['title']}")
    print(f"Source: {doc.metadata['source']}")
    print(f"Relevance: {doc.metadata['relevance_score']}")
    print(f"Data Type: {doc.metadata['data_type']}")
    print("---")

Best Practices

1. Search Type Selection

# Web search for current events
web_results = valyu_tool.context(
    query="latest AI policy developments",
    search_type="web",
    max_num_results=5
)

# Proprietary search for academic research
academic_results = valyu_tool.context(
    query="machine learning interpretability methods",
    search_type="proprietary",
    max_num_results=8
)

# Combined search for comprehensive coverage
all_results = valyu_tool.context(
    query="climate change economic impact",
    search_type="all",
    max_num_results=10
)

2. Error Handling and Fallbacks

import os

from llama_index.tools.valyu import ValyuToolSpec

def robust_search(query: str, fallback_query: str = None):
    tool = ValyuToolSpec(
        api_key=os.environ["VALYU_API_KEY"],
        max_price=30.0  # max_price is set at initialization, not per search
    )

    try:
        # Primary search
        results = tool.search(
            query=query,
            max_num_results=5
        )
        return results
    except Exception as e:
        print(f"Primary search failed: {e}")

        if fallback_query:
            try:
                # Fallback with a simpler query restricted to web sources
                results = tool.search(
                    query=fallback_query,
                    max_num_results=3,
                    search_type="web"
                )
                return results
            except Exception as e2:
                print(f"Fallback search also failed: {e2}")
                return []

        return []

# Usage
results = robust_search(
    "complex quantum entanglement applications",
    "quantum entanglement basics"
)

3. Agent System Messages

import os

from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.llms.openai import OpenAI
from llama_index.tools.valyu import ValyuToolSpec

valyu_tool = ValyuToolSpec(api_key=os.environ["VALYU_API_KEY"])
llm = OpenAI(model="gpt-4o")

# Optimize agent behavior with a clear system message
system_message = """You are an AI research assistant with access to Valyu search.

SEARCH GUIDELINES:
- Use search_type="proprietary" for academic/scientific queries
- Use search_type="web" for current events and news
- Use search_type="all" for comprehensive research
- Always cite sources from search results

RESPONSE FORMAT:
- Provide direct answers based on search results
- Include source citations with URLs when available
- Mention publication dates for time-sensitive information
- Indicate if information might be outdated"""

agent = AgentWorkflow.from_tools_or_functions(
    tools_or_functions=valyu_tool.to_tool_list(),
    llm=llm,
    system_prompt=system_message
)
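AgentWorkflow is async-first: its `run()` method is a coroutine and must be awaited. The snippet below sketches that calling pattern with a stub agent standing in for the real workflow, so it stays self-contained; `_StubAgent` and `ask` are our own illustrative names:

```python
import asyncio

class _StubAgent:
    """Stand-in for the AgentWorkflow above; the real agent's run() is also async."""
    async def run(self, user_msg: str):
        return f"answer to: {user_msg}"

async def ask(agent, question: str) -> str:
    # A coroutine must be awaited; with a real AgentWorkflow the call shape
    # is the same: `await agent.run(user_msg=question)`.
    return str(await agent.run(user_msg=question))

result = asyncio.run(ask(_StubAgent(), "latest AI policy developments"))
print(result)  # → answer to: latest AI policy developments
```

Inside an already-running event loop (e.g. a notebook), await `agent.run(...)` directly instead of using `asyncio.run`.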

Integration with Other LlamaIndex Components

Custom Query Engines

import os

from llama_index.core.query_engine import CustomQueryEngine
from llama_index.tools.valyu import ValyuToolSpec

class ValyuQueryEngine(CustomQueryEngine):
    """Query engine that answers directly from Valyu search results."""

    valyu_tool: ValyuToolSpec  # declared as a field; CustomQueryEngine is pydantic-based

    def custom_query(self, query_str: str):
        results = self.valyu_tool.search(
            query=query_str,
            search_type="all",
            max_num_results=5
        )

        # Process results into a single response string
        response_text = "\n\n".join([
            f"**{doc.metadata.get('title', 'Source')}**\n{doc.text}"
            for doc in results
        ])

        return response_text

# Use the custom query engine
valyu_tool = ValyuToolSpec(api_key=os.environ["VALYU_API_KEY"])
query_engine = ValyuQueryEngine(valyu_tool=valyu_tool)

response = query_engine.query("What is LlamaIndex?")
print(response)

Integration with Retrievers

import os
from typing import List

from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle
from llama_index.tools.valyu import ValyuToolSpec

class ValyuRetriever(BaseRetriever):
    def __init__(self, valyu_tool: ValyuToolSpec, search_type="all", max_results=5):
        self.valyu_tool = valyu_tool
        self.search_type = search_type
        self.max_results = max_results
        super().__init__()  # initializes the base retriever (callback manager, etc.)

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        results = self.valyu_tool.search(
            query=query_bundle.query_str,
            search_type=self.search_type,
            max_num_results=self.max_results
        )

        # Wrap each Document in a NodeWithScore, using its relevance score
        nodes = []
        for doc in results:
            node = NodeWithScore(
                node=doc,
                score=doc.metadata.get('relevance_score', 0.5)
            )
            nodes.append(node)

        return nodes

# Use the custom retriever
valyu_tool = ValyuToolSpec(api_key=os.environ["VALYU_API_KEY"])
retriever = ValyuRetriever(valyu_tool, search_type="proprietary", max_results=8)

nodes = retriever.retrieve("machine learning safety")
print(f"Retrieved {len(nodes)} nodes")

API Reference

For complete parameter documentation, see the Valyu API Reference.

ValyuToolSpec Initialization Parameters

  • api_key (required): Valyu API key
  • verbose: Enable verbose logging (default: False)
  • max_price: Maximum cost in dollars for search operations (default: 100)
  • relevance_threshold: Minimum relevance score 0.0-1.0 (default: 0.5)
  • fast_mode: Enable fast mode for faster results (default: False)
  • included_sources: List of URLs/domains to include (optional)
  • excluded_sources: List of URLs/domains to exclude (optional)
  • response_length: Content length - int, "short", "medium", "large", "max" (optional)
  • country_code: 2-letter ISO country code for geo-bias (optional)
  • contents_summary: AI summary config - bool, str, or dict (optional)
  • contents_extract_effort: "normal", "high", or "auto" (default: "normal")
  • contents_response_length: Content length per URL (default: "short")

search() Method Parameters

  • query (required): Natural language search query
  • search_type: "all", "web", or "proprietary" (default: "all")
  • max_num_results: 1-20 results (default: 5)
  • start_date/end_date: Time filtering in YYYY-MM-DD format (optional)
  • fast_mode: Override tool default fast mode setting (optional)
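Since start_date and end_date must be YYYY-MM-DD strings, a small validator can catch malformed dates before they reach the API. The helper below is our own convenience function, not part of the package:

```python
from datetime import datetime

def valid_date(s: str) -> bool:
    """Return True if s is a well-formed YYYY-MM-DD date string."""
    try:
        datetime.strptime(s, "%Y-%m-%d")
        return True
    except ValueError:
        return False

print(valid_date("2024-01-01"))  # → True
print(valid_date("01-01-2024"))  # → False
```

Using `strptime` rather than a regex also rejects impossible dates such as "2024-13-01".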

get_contents() Method Parameters

  • urls (required): List of URLs to extract content from (max 10 per request)
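Because get_contents() accepts at most 10 URLs per request, longer URL lists need to be split into batches. A simple chunking helper (our own, not part of the package):

```python
def chunk_urls(urls, batch_size=10):
    """Split a list of URLs into batches no larger than batch_size."""
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]

# Usage with an initialized ValyuToolSpec:
# docs = []
# for batch in chunk_urls(all_urls):
#     docs.extend(valyu_tool.get_contents(urls=batch))

print(chunk_urls(list(range(25))))  # three batches: 10, 10, 5
```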

Additional Resources
