> ## Documentation Index
> Fetch the complete documentation index at: https://docs.valyu.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Contents

> Reference for the Valyu Contents endpoint that extracts clean, structured content from any URL via POST /v1/contents.



## OpenAPI

````yaml POST /v1/contents
openapi: 3.0.3
info:
  title: Valyu API
  version: 2.2.0
  description: >
    The search API built for AI agents. One API that gives your AI unified
    access to web search, academic papers, financial data, SEC filings, clinical
    trials, patents, and 36+ proprietary data sources.


    ## Products


    | Product | Endpoint | Description |

    | --- | --- | --- |

    | **Search** | `POST /v1/search` | Multi-source search across web and
    proprietary datasets |

    | **Contents** | `POST /v1/contents` | Extract clean, structured content
    from URLs |

    | **Answer** | `POST /v1/answer` | AI-powered answers with real-time search
    |

    | **DeepResearch** | `POST /v1/deepresearch/tasks` | Async research agents
    for comprehensive analysis |

    | **Datasources** | `GET /v1/datasources` | Discover available data sources
    and their schemas |


    ## Authentication


    All endpoints require an API key passed via the `X-API-Key` header. Get your
    key at [platform.valyu.ai](https://platform.valyu.ai).


    ```

    X-API-Key: your_api_key_here

    ```


    ## Pricing


    Valyu uses transparent, pay-per-use pricing:

    - **Search**: CPM-based (cost per mille tokens retrieved)

    - **Contents**: $0.001 per URL extracted, +$0.001 with AI features

    - **Answer**: $0.10 per request + variable search and AI costs

    - **DeepResearch**: Fixed per-task pricing by mode ($0.10 - $15.00)


    See [docs.valyu.ai](https://docs.valyu.ai) for full pricing details.


    ## SDKs


    - [Python SDK](https://pypi.org/project/valyu/) - `pip install valyu`

    - [TypeScript SDK](https://www.npmjs.com/package/valyu) - `npm install
    valyu`
  contact:
    name: Valyu Support
    url: https://valyu.ai
    email: support@valyu.ai
  x-logo:
    url: https://valyu.ai/logo.png
    altText: Valyu
servers:
  - url: https://api.valyu.ai
    description: Production
security:
  - ApiKeyAuth: []
tags:
  - name: Search
    description: >-
      Search across web, academic, financial, and proprietary data sources.
      Returns results ready for RAG pipelines, AI agents, and applications.
    externalDocs:
      description: Search API Guide
      url: https://docs.valyu.ai/search/quickstart
  - name: Contents
    description: >-
      Extract clean, structured content from web pages at scale. Supports batch
      processing, AI-powered summaries, structured extraction via JSON schemas,
      and async jobs for large URL sets.
    externalDocs:
      description: Contents API Guide
      url: https://docs.valyu.ai/guides/content-extraction
  - name: Answer
    description: >-
      Get AI-powered answers grounded in real-time search results. The Answer
      API searches across web, academic, and financial sources, then uses AI to
      generate a readable response via Server-Sent Events streaming.
    externalDocs:
      description: Answer API Guide
      url: https://docs.valyu.ai/guides/answer-api
  - name: DeepResearch
    description: >-
      Async research agents that perform comprehensive, multi-step research.
      DeepResearch searches multiple sources, analyzes content, and generates
      detailed reports with citations. Tasks run in the background and can take
      minutes to complete.
    externalDocs:
      description: DeepResearch Guide
      url: https://docs.valyu.ai/guides/deepresearch
  - name: Batches
    description: >-
      Run multiple DeepResearch tasks in parallel with shared configuration,
      unified monitoring, and aggregated cost tracking.
    externalDocs:
      description: Batch API Guide
      url: https://docs.valyu.ai/guides/deepresearch-batch-quickstart
  - name: Datasources
    description: >-
      Discover available data sources and their schemas. A tool manifest for AI
      agents - instead of hardcoding knowledge of available datasets, agents can
      query this API to discover sources, filter by category, and use
      `example_queries` for few-shot prompting.
    externalDocs:
      description: Datasources Guide
      url: https://docs.valyu.ai/guides/datasources
externalDocs:
  description: Complete API Documentation
  url: https://docs.valyu.ai
paths:
  /v1/contents:
    post:
      tags:
        - Contents
      summary: Extract clean, structured content from URLs
      description: >-
        Turn any web page into clean, structured data. Extract content from up
        to 50 URLs with optional AI-powered summaries, structured extraction via
        JSON schemas, and screenshot capture.


        - **1-10 URLs**: Processed synchronously (default) or asynchronously

        - **11-50 URLs**: Requires `async: true`, returns a `job_id` for polling

        - **Pay-per-success**: Only charged for URLs that are successfully
        processed ($0.001/URL, +$0.001 with AI features)
      operationId: extractContents
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/ContentsRequest'
            examples:
              basic:
                summary: Basic content extraction
                value:
                  urls:
                    - https://en.wikipedia.org/wiki/Quantum_computing
              With Summary:
                summary: Extraction with AI summary
                value:
                  urls:
                    - https://arxiv.org/abs/2401.00001
                  summary: Summarize the key findings and methodology
                  response_length: medium
              structured:
                summary: Structured data extraction with JSON schema
                value:
                  urls:
                    - https://example.com/product-page
                  summary:
                    type: object
                    properties:
                      product_name:
                        type: string
                      price:
                        type: number
                      features:
                        type: array
                        items:
                          type: string
              Async Batch:
                summary: Async batch processing with webhook
                value:
                  urls:
                    - https://example.com/page1
                    - https://example.com/page2
                  async: true
                  webhook_url: https://your-app.com/webhooks/valyu
      responses:
        '200':
          description: All URLs processed successfully (synchronous).
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ContentsResponse'
              examples:
                basic:
                  summary: Basic extraction
                  value:
                    success: true
                    error: null
                    tx_id: tx_c1d2e3f4-a5b6-7890-cdef-123456789abc
                    urls_requested: 2
                    urls_processed: 2
                    urls_failed: 0
                    results:
                      - url: https://en.wikipedia.org/wiki/Quantum_computing
                        status: success
                        title: Quantum computing - Wikipedia
                        content: >-
                          # Quantum computing\n\nQuantum computing is a type of
                          computation that harnesses quantum mechanical
                          phenomena...
                        description: >-
                          Quantum computing is a type of computation that
                          harnesses quantum mechanical phenomena.
                        source: web
                        price: 0.001
                        length: 24500
                        data_type: unstructured
                        source_type: website
                        image_url:
                          '0': https://upload.wikimedia.org/quantum.png
                        screenshot_url: null
                    total_cost_dollars: 0.002
                    total_characters: 49000
                structured:
                  summary: Structured data extraction
                  value:
                    success: true
                    error: null
                    tx_id: tx_struct-1234-5678-abcd-ef0123456789
                    urls_requested: 1
                    urls_processed: 1
                    urls_failed: 0
                    results:
                      - url: https://example.com/product-page
                        status: success
                        title: Product Page
                        content:
                          product_name: Widget Pro
                          price: 29.99
                          features:
                            - Feature A
                            - Feature B
                        source: web
                        price: 0.002
                        length: 150
                        data_type: structured
                        summary_success: true
                    total_cost_dollars: 0.002
                    total_characters: 150
        '202':
          description: Async job created. Poll the `poll_url` for results.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ContentsAsyncResponse'
              examples:
                example:
                  summary: Async job created
                  value:
                    success: true
                    job_id: cj_a1b2c3d4e5f6g7h8
                    status: pending
                    urls_total: 25
                    poll_url: /contents/jobs/cj_a1b2c3d4e5f6g7h8
                    tx_id: tx_async-1234-5678-abcd-ef0123456789
                    webhook_secret: >-
                      a1b2c3d4e5f6789012345678901234567890abcdef1234567890abcdef12345678
        '206':
          description: Partial success - some URLs failed processing.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ContentsResponse'
        '400':
          $ref: '#/components/responses/BadRequest'
        '401':
          $ref: '#/components/responses/Unauthorized'
        '402':
          $ref: '#/components/responses/PaymentRequired'
        '403':
          $ref: '#/components/responses/Forbidden'
        '429':
          $ref: '#/components/responses/RateLimited'
        '500':
          $ref: '#/components/responses/InternalServerError'
components:
  schemas:
    ContentsRequest:
      type: object
      required:
        - urls
      additionalProperties: false
      description: Content extraction request parameters.
      properties:
        urls:
          type: array
          description: >-
            URLs to extract content from. Must be valid HTTP/HTTPS URLs.
            Duplicates are silently removed.
          items:
            type: string
            format: uri
          minItems: 1
          maxItems: 50
          example:
            - https://en.wikipedia.org/wiki/Quantum_computing
            - https://arxiv.org/abs/2401.00001
        response_length:
          description: |-
            Maximum character length of extracted content per URL.

            - `"short"` - 25,000 characters (default)
            - `"medium"` - 50,000 characters
            - `"large"` - 100,000 characters
            - `"max"` - No limit
            - Any positive integer for a custom character limit
          oneOf:
            - type: string
              enum:
                - short
                - medium
                - large
                - max
            - type: integer
              minimum: 1
          default: short
        max_price_dollars:
          type: number
          description: >-
            Maximum budget in USD for this request. If estimated cost exceeds
            this, the request is rejected. Defaults to 2x the maximum possible
            cost.
          minimum: 0
          exclusiveMinimum: true
          example: 0.05
        summary:
          description: >-
            Controls AI-powered content processing. Adds $0.001 per URL when
            enabled.


            - `false` - No AI processing, raw extracted content only (default)

            - `true` - Generate a text summary of each page

            - `"string"` - Generate a summary guided by custom instructions (max
            500 characters)

            - `{json_schema}` - Extract structured data matching the provided
            JSON schema
          oneOf:
            - type: boolean
            - type: string
              maxLength: 500
            - type: object
              description: >-
                JSON Schema for structured extraction. Supports types: object,
                array, string, number, integer, boolean.
          default: false
        extract_effort:
          type: string
          description: |-
            Controls how pages are rendered for extraction.

            - `"auto"` - Automatically selects the best method per URL (default)
            - `"normal"` - HTTP-only extraction, fastest
            - `"high"` - Full Chrome rendering, best for JavaScript-heavy pages
          enum:
            - auto
            - normal
            - high
          default: auto
        screenshot:
          type: boolean
          description: >-
            Capture a screenshot of each page. Screenshot URL returned in
            `screenshot_url` field of each result.
          default: false
        async:
          type: boolean
          description: >-
            Process URLs asynchronously. **Required** when submitting more than
            10 URLs. Returns a `job_id` for polling.
          default: false
        webhook_url:
          type: string
          format: uri
          description: >-
            HTTPS URL to receive a webhook POST when async processing completes.
            Only used when `async` is `true`. The webhook payload is signed with
            HMAC-SHA256.
          example: https://your-app.com/webhooks/valyu
    ContentsResponse:
      type: object
      description: Synchronous content extraction response.
      required:
        - success
        - tx_id
        - urls_requested
        - urls_processed
        - urls_failed
        - results
        - total_cost_dollars
        - total_characters
      properties:
        success:
          type: boolean
          description: Whether the extraction completed successfully.
          example: true
        error:
          type: string
          nullable: true
          description: Error message if some URLs failed. `null` on complete success.
          example: null
        tx_id:
          type: string
          description: Transaction ID for tracking.
          example: tx_a1b2c3d4-e5f6-7890-abcd-ef1234567890
        urls_requested:
          type: integer
          description: Number of deduplicated URLs submitted.
          example: 3
        urls_processed:
          type: integer
          description: Number of URLs successfully processed.
          example: 3
        urls_failed:
          type: integer
          description: Number of URLs that failed processing.
          example: 0
        results:
          type: array
          description: Extraction results for each URL.
          items:
            $ref: '#/components/schemas/ContentResult'
        total_cost_dollars:
          type: number
          description: Total cost in USD (only successful URLs are charged).
          example: 0.003
        total_characters:
          type: integer
          description: Sum of all result content lengths.
          example: 45230
    ContentsAsyncResponse:
      type: object
      description: Response returned when async processing is initiated.
      required:
        - success
        - job_id
        - status
        - urls_total
        - poll_url
        - tx_id
      properties:
        success:
          type: boolean
          description: Whether the async job was created successfully.
          example: true
        job_id:
          type: string
          description: Job identifier for polling status.
          example: cj_a1b2c3d4e5f6g7h8
        status:
          type: string
          description: Initial job status.
          enum:
            - pending
          example: pending
        urls_total:
          type: integer
          description: Total number of URLs queued for processing.
          example: 25
        poll_url:
          type: string
          description: Relative URL to poll for job status.
          example: /contents/jobs/cj_a1b2c3d4e5f6g7h8
        tx_id:
          type: string
          description: Transaction ID.
          example: tx_a1b2c3d4-e5f6-7890-abcd-ef1234567890
        webhook_secret:
          type: string
          description: >-
            HMAC-SHA256 secret for verifying webhook payloads. Only present when
            `webhook_url` was provided.
          example: a1b2c3d4e5f6...64_hex_characters
    ContentResult:
      type: object
      description: Extracted content from a single URL.
      required:
        - url
        - status
      properties:
        url:
          type: string
          format: uri
          description: The URL that was processed.
          example: https://en.wikipedia.org/wiki/Quantum_computing
        status:
          type: string
          description: Processing status for this URL.
          enum:
            - success
            - failed
          example: success
        title:
          type: string
          description: Page title. Present when `status` is `"success"`.
          example: Quantum computing - Wikipedia
        content:
          description: >-
            Extracted content. Markdown string when `data_type` is
            `"unstructured"`, JSON object when `data_type` is `"structured"`.
            Present when `status` is `"success"`.
          oneOf:
            - type: string
              description: Markdown-formatted content.
            - type: object
              description: Structured data matching the provided JSON schema.
          example: |-
            # Quantum computing

            Quantum computing is a type of computation...
        description:
          type: string
          description: Meta description of the page.
          example: >-
            Quantum computing is a type of computation that harnesses quantum
            mechanical phenomena.
        source:
          type: string
          description: Always `"web"` for content extraction.
          example: web
        price:
          type: number
          description: Cost in USD for extracting this URL.
          example: 0.001
        length:
          type: integer
          description: Character count of the `content` field.
          example: 24500
        data_type:
          type: string
          description: >-
            `"unstructured"` for text content, `"structured"` when JSON schema
            extraction succeeded.
          enum:
            - unstructured
            - structured
          example: unstructured
        source_type:
          type: string
          description: Classification of the content type.
          enum:
            - website
            - pdf
            - txt
            - json
            - xml
            - sitemap
            - csv
          example: website
        publication_date:
          type: string
          description: Publication date in `YYYY-MM-DD` format, or empty string if unknown.
          example: '2024-03-15'
        id:
          type: string
          description: Identifier for this result (same as `url`).
          example: https://en.wikipedia.org/wiki/Quantum_computing
        image_url:
          type: object
          description: Image URLs found on the page, keyed by index.
          additionalProperties:
            type: string
            format: uri
          example:
            '0': https://upload.wikimedia.org/wikipedia/commons/quantum.png
        screenshot_url:
          type: string
          format: uri
          nullable: true
          description: Screenshot URL if `screenshot` was enabled. `null` otherwise.
          example: null
        summary:
          description: >-
            AI-generated summary or structured extraction result. Only present
            when `summary` was enabled in the request.
          oneOf:
            - type: string
            - type: object
              additionalProperties: true
        summary_success:
          type: boolean
          description: >-
            Whether AI processing succeeded. Only present when `summary` was
            enabled.
        structured_metadata:
          type: object
          description: >-
            Metadata from structured data extraction. Only present when
            `data_type` is `"structured"`.
          additionalProperties: true
        error:
          type: string
          description: Error message. Only present when `status` is `"failed"`.
          example: 'Failed to fetch URL: connection timeout'
    Error:
      type: object
      description: Standard error response.
      required:
        - success
        - error
      properties:
        success:
          type: boolean
          description: Always `false` for error responses.
          example: false
        error:
          type: string
          description: Human-readable error message describing what went wrong.
          example: Invalid request parameters
  responses:
    BadRequest:
      description: The request was malformed or contained invalid parameters.
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/Error'
          examples:
            example:
              summary: Invalid parameters
              value:
                success: false
                error: >-
                  Invalid search_type. Must be 'all', 'web', 'proprietary', or
                  'news'
    Unauthorized:
      description: Missing or invalid API key.
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/Error'
          examples:
            example:
              summary: Invalid API key
              value:
                success: false
                error: Invalid API key
    PaymentRequired:
      description: Insufficient credits or monthly limit exceeded.
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/Error'
          examples:
            example:
              summary: Insufficient credits
              value:
                success: false
                error: >-
                  Insufficient credits. Please top up your account at
                  platform.valyu.ai
    Forbidden:
      description: The API key does not have permission for this operation.
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/Error'
          examples:
            example:
              summary: Permission denied
              value:
                success: false
                error: API key does not have inference permission
    RateLimited:
      description: Too many requests. Slow down and retry after the indicated period.
      headers:
        Retry-After:
          description: Seconds to wait before retrying.
          schema:
            type: integer
        X-RateLimit-Limit:
          description: Maximum requests allowed per window.
          schema:
            type: integer
        X-RateLimit-Remaining:
          description: Remaining requests in the current window.
          schema:
            type: integer
        X-RateLimit-Reset:
          description: Unix timestamp when the rate limit window resets.
          schema:
            type: integer
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/Error'
          examples:
            example:
              summary: Rate limit exceeded
              value:
                success: false
                error: >-
                  Rate limit exceeded. Please retry after the Retry-After
                  period.
    InternalServerError:
      description: An unexpected error occurred on the server.
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/Error'
          examples:
            example:
              summary: Internal error
              value:
                success: false
                error: Internal server error. Please try again later.
  securitySchemes:
    ApiKeyAuth:
      type: apiKey
      in: header
      name: X-API-Key
      description: >-
        API key for authentication. Get yours at
        [platform.valyu.ai](https://platform.valyu.ai).

````