Getting Started

Comprehensive guide to NextRows API data extraction

This guide walks you through everything you need to know to start extracting data effectively with the NextRows API.

Understanding NextRows API

NextRows API is designed to solve the complex challenge of web data extraction by combining:

  • AI-powered understanding of web page structure and content
  • Natural language processing to interpret your extraction requirements
  • Smart content extraction to handle modern, dynamically rendered websites
  • Schema validation to ensure data quality and consistency

When to Use NextRows

NextRows excels in scenarios where traditional scraping approaches fall short:

  • Dynamic websites with JavaScript-rendered content
  • Complex data structures that require intelligent parsing
  • One-time or periodic data extraction tasks
  • Websites that change structure frequently
  • Data that requires semantic understanding

Core Concepts

Extraction Types

NextRows supports two main extraction approaches:

1. URL Extraction

Extract data directly from web pages by providing URLs:

{
  "type": "url",
  "data": ["https://example.com/page1", "https://example.com/page2"],
  "prompt": "Extract all product information including name, price, and rating"
}

2. Text Extraction

Extract data from raw text content you provide:

{
  "type": "text", 
  "data": ["Product: iPhone 14\nPrice: $999\nRating: 4.5/5"],
  "prompt": "Extract product details in a structured format"
}

Natural Language Prompts

The key to effective extraction is crafting clear, specific prompts:

Good prompts:

  • "Extract company name, job title, salary range, and location from each job posting"
  • "Get article title, author, publication date, and full text content"
  • "Find product name, price, availability status, and customer ratings"

Avoid vague prompts:

  • "Get all data"
  • "Extract everything important"
  • "Find product info"

Schema Validation

For consistent, reliable data, define a schema using JSON Schema:

{
  "type": "url",
  "data": ["https://jobs.example.com"],
  "prompt": "Extract job postings",
  "schema": {
    "type": "array",
    "items": {
      "type": "object",
      "properties": {
        "title": {"type": "string"},
        "company": {"type": "string"},
        "salary": {"type": "string"},
        "location": {"type": "string"},
        "posted_date": {"type": "string"}
      },
      "required": ["title", "company"]
    }
  }
}
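
Because the API returns extracted rows as JSON, you can also re-validate them on your side to catch drift early. A minimal sketch in Python using the jsonschema package (the client-side check is a suggestion, not part of the API):

# pip install jsonschema
from jsonschema import validate, ValidationError

# Same schema as in the request above, reused for a client-side check.
job_schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "company": {"type": "string"},
        },
        "required": ["title", "company"],
    },
}

def check_rows(rows):
    """Raise if the extracted rows drift from the expected shape."""
    try:
        validate(instance=rows, schema=job_schema)
    except ValidationError as err:
        print(f"Schema mismatch: {err.message}")
        raise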

Authentication

All requests require authentication using your API key in the Authorization header:

Authorization: Bearer sk-nr-your-api-key-here

Keep your API key secure! Never expose it in client-side code or public repositories.
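
One simple way to keep the key out of source control is to read it from an environment variable at runtime. A minimal sketch in Python (the NEXTROWS_API_KEY variable name is just a convention, not required by the API):

import os

# Fails fast with a KeyError if the key is not set in the environment.
API_KEY = os.environ["NEXTROWS_API_KEY"]

HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}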

Making Requests

Basic Request Structure

curl -X POST https://api.nextrows.com/v1/extract \
  -H "Authorization: Bearer sk-nr-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "url",
    "data": ["https://example.com"],
    "prompt": "Your extraction prompt here"
  }'
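
The same request in Python using the requests library, in case that fits your stack better than curl:

import requests

response = requests.post(
    "https://api.nextrows.com/v1/extract",
    headers={
        "Authorization": "Bearer sk-nr-your-api-key",
        "Content-Type": "application/json",
    },
    json={
        "type": "url",
        "data": ["https://example.com"],
        "prompt": "Your extraction prompt here",
    },
    timeout=120,  # extractions can take a while; adjust to taste
)
response.raise_for_status()
print(response.json())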

Response Format

Successful responses return structured data:

{
  "success": true,
  "data": [
    {
      "field1": "value1",
      "field2": "value2"
    }
  ]
}

Error responses include details for troubleshooting:

{
  "success": false,
  "error": "Failed to extract data from URLs"
}
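
In application code, branch on the success flag before reading data. A minimal sketch, assuming the response shapes shown above:

def handle_response(payload):
    """Return extracted rows, or raise with the API's error message."""
    if payload.get("success"):
        return payload["data"]
    raise RuntimeError(payload.get("error", "Unknown extraction error"))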

Advanced Features

Multiple URL Processing

Process multiple URLs in a single request:

{
  "type": "url",
  "data": [
    "https://site1.com/page1",
    "https://site1.com/page2", 
    "https://site2.com/products"
  ],
  "prompt": "Extract product information from each page"
}

Error Handling

NextRows handles various error scenarios gracefully:

  • Partial failures: If some URLs fail, successful extractions are still returned
  • Timeout protection: Long-running extractions are automatically managed
  • Rate limiting: Built-in backoff for respectful scraping
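
On your side of the connection, it is still worth retrying transient failures. A minimal sketch with exponential backoff, assuming the API signals rate limiting with a standard HTTP 429 status (that status code is an assumption; adjust to what you actually observe):

import time
import requests

def extract_with_retry(payload, headers, max_attempts=4):
    """POST to /v1/extract, backing off on transient failures."""
    for attempt in range(max_attempts):
        resp = requests.post(
            "https://api.nextrows.com/v1/extract",
            headers=headers, json=payload, timeout=120,
        )
        # Retry on rate limiting (429) and server errors (5xx); assumption:
        # the API uses standard HTTP status codes for these cases.
        if resp.status_code == 429 or resp.status_code >= 500:
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...
            continue
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError("Extraction failed after retries")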

Best Practices

Crafting Effective Prompts

  1. Be specific about data fields:

    ❌ "Get product data"
    ✅ "Extract product name, price in USD, star rating, and number of reviews"

  2. Specify data format when needed:

    ✅ "Extract publication date in YYYY-MM-DD format"
    ✅ "Get price as a number without currency symbols"

  3. Handle edge cases:

    ✅ "Extract salary range, use 'Not specified' if salary is not mentioned"

Managing Credits Efficiently

  • Use specific prompts to avoid processing data you don't need
  • Monitor your credit usage and API performance
  • Consider batch processing for large datasets

Handling Different Website Types

E-commerce sites:

{
  "prompt": "Extract product name, current price, original price if on sale, rating score, number of reviews, and availability status"
}

Job boards:

{
  "prompt": "Extract job title, company name, location, salary range, experience level, and application deadline"
}

News sites:

{
  "prompt": "Extract article headline, author name, publication date, article text, and tags or categories"
}

Development Workflow

  1. Start with simple prompts to test your extractions
  2. Refine prompts based on initial results
  3. Add schema validation for production consistency
  4. Implement error handling in your application
  5. Monitor performance and optimize as needed
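
For steps 4 and 5, even lightweight instrumentation goes a long way. A minimal sketch that times each call and reports row counts (extract_with_retry is the helper sketched under Error Handling above, not an official client):

import time

def timed_extract(payload, headers):
    """Run one extraction and log how long it took and what came back."""
    start = time.monotonic()
    result = extract_with_retry(payload, headers)
    elapsed = time.monotonic() - start
    rows = result.get("data", []) if result.get("success") else []
    print(f"extracted {len(rows)} rows in {elapsed:.1f}s")
    return result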

Rate Limits and Scaling

Current Limits

  • Requests per minute: 20
  • Maximum URLs per request: 20

Scaling Strategies

For high-volume use cases:

  1. Implement request queuing to handle bursts
  2. Use batch processing to maximize throughput
  3. Optimize request patterns to stay within limits
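
A minimal sketch tying these strategies to the documented limits: URLs are split into batches of at most 20 (the per-request cap) and requests are paced to stay under 20 per minute (extract_with_retry is the earlier sketch; a production queue would be more sophisticated):

import time

MAX_URLS_PER_REQUEST = 20   # documented cap on URLs per request
MIN_SECONDS_BETWEEN = 3.0   # 20 requests/minute -> one every 3 seconds

def extract_many(urls, prompt, headers):
    """Process a large URL list in rate-limit-friendly batches."""
    results = []
    for i in range(0, len(urls), MAX_URLS_PER_REQUEST):
        batch = urls[i : i + MAX_URLS_PER_REQUEST]
        payload = {"type": "url", "data": batch, "prompt": prompt}
        results.append(extract_with_retry(payload, headers))
        time.sleep(MIN_SECONDS_BETWEEN)  # simple pacing between batches
    return results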

Next Steps