# Getting Started

Comprehensive guide to NextRows API data extraction

This guide walks you through everything you need to know to start extracting data effectively with the NextRows API.
## Understanding NextRows API
NextRows API is designed to solve the complex challenge of web data extraction by combining:
- AI-powered understanding of web page structure and content
- Natural language processing to interpret your extraction requirements
- Smart content extraction to handle modern websites
- Schema validation to ensure data quality and consistency
## When to Use NextRows
NextRows excels in scenarios where traditional scraping approaches fall short:
- Dynamic websites with JavaScript-rendered content
- Complex data structures that require intelligent parsing
- One-time or periodic data extraction tasks
- Websites that change structure frequently
- Data that requires semantic understanding
## Core Concepts

### Extraction Types
NextRows supports two main extraction approaches:
#### 1. URL Extraction
Extract data directly from web pages by providing URLs:
```json
{
  "type": "url",
  "data": ["https://example.com/page1", "https://example.com/page2"],
  "prompt": "Extract all product information including name, price, and rating"
}
```
#### 2. Text Extraction
Extract data from raw text content you provide:
```json
{
  "type": "text",
  "data": ["Product: iPhone 14\nPrice: $999\nRating: 4.5/5"],
  "prompt": "Extract product details in a structured format"
}
```
### Natural Language Prompts
The key to effective extraction is crafting clear, specific prompts:
**Good prompts:**
- "Extract company name, job title, salary range, and location from each job posting"
- "Get article title, author, publication date, and full text content"
- "Find product name, price, availability status, and customer ratings"
**Avoid vague prompts:**
- "Get all data"
- "Extract everything important"
- "Find product info"
### Schema Validation
For consistent, reliable data, define a schema using JSON Schema:
```json
{
  "type": "url",
  "data": ["https://jobs.example.com"],
  "prompt": "Extract job postings",
  "schema": {
    "type": "array",
    "items": {
      "type": "object",
      "properties": {
        "title": {"type": "string"},
        "company": {"type": "string"},
        "salary": {"type": "string"},
        "location": {"type": "string"},
        "posted_date": {"type": "string"}
      },
      "required": ["title", "company"]
    }
  }
}
```
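If you also want to verify the returned rows client-side, for example before writing them to a database, you can reuse the same schema definition with a validator library. A minimal sketch in Python, assuming the third-party `jsonschema` package is installed (`pip install jsonschema`); the schema is the one from the request above:

```python
# Client-side double-check of extracted data against the same JSON Schema
# that was sent in the request. This is an optional sketch, not part of the API.
from jsonschema import ValidationError, validate

job_schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "company": {"type": "string"},
            "salary": {"type": "string"},
            "location": {"type": "string"},
            "posted_date": {"type": "string"},
        },
        "required": ["title", "company"],
    },
}

def check_rows(rows: list) -> bool:
    """Return True if the extracted rows match the schema."""
    try:
        validate(instance=rows, schema=job_schema)
        return True
    except ValidationError as err:
        print(f"Schema mismatch: {err.message}")
        return False
```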
## Authentication
All requests require authentication using your API key in the Authorization header:
```
Authorization: Bearer sk-nr-your-api-key-here
```
Keep your API key secure! Never expose it in client-side code or public repositories.
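One common way to keep the key out of source control is to read it from an environment variable at runtime. A minimal sketch in Python; the variable name `NEXTROWS_API_KEY` is a convention chosen for this example, not something the API requires:

```python
import os

# Read the key from the environment rather than hard-coding it.
API_KEY = os.environ["NEXTROWS_API_KEY"]  # raises KeyError if unset

HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
```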
## Making Requests

### Basic Request Structure
```bash
curl -X POST https://api.nextrows.com/v1/extract \
  -H "Authorization: Bearer sk-nr-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "url",
    "data": ["https://example.com"],
    "prompt": "Your extraction prompt here"
  }'
```
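The same request as a small, reusable helper in Python. This is a sketch using the `requests` package, with the API key read from an environment variable as shown in the Authentication section:

```python
import os

import requests

API_URL = "https://api.nextrows.com/v1/extract"
API_KEY = os.environ["NEXTROWS_API_KEY"]  # see Authentication above

def extract(payload: dict, timeout: int = 120) -> dict:
    """POST an extraction request and return the parsed JSON response."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=payload,  # requests sets Content-Type: application/json for us
        timeout=timeout,
    )
    response.raise_for_status()  # surface HTTP-level errors early
    return response.json()
```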
### Response Format
Successful responses return structured data:
```json
{
  "success": true,
  "data": [
    {
      "field1": "value1",
      "field2": "value2"
    }
  ]
}
```
Error responses include details for troubleshooting:
```json
{
  "success": false,
  "error": "Failed to extract data from URLs"
}
```
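Using the `extract()` helper sketched above, your application can branch on the `success` flag, for example:

```python
result = extract({
    "type": "url",
    "data": ["https://example.com"],
    "prompt": "Extract all product information including name, price, and rating",
})

if result["success"]:
    for row in result["data"]:
        print(row)
else:
    # The error string is the place to start troubleshooting.
    print(f"Extraction failed: {result['error']}")
```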
## Advanced Features

### Multiple URL Processing
Process multiple URLs in a single request:
```json
{
  "type": "url",
  "data": [
    "https://site1.com/page1",
    "https://site1.com/page2",
    "https://site2.com/products"
  ],
  "prompt": "Extract product information from each page"
}
```
### Error Handling
NextRows handles various error scenarios gracefully:

- **Partial failures:** if some URLs fail, successful extractions are still returned
- **Timeout protection:** long-running extractions are automatically managed
- **Rate limiting:** built-in backoff for respectful scraping (a client-side retry sketch follows below)
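Even with this handling, your client can still hit the requests-per-minute limit under load. A minimal retry-with-backoff sketch; the assumption that rate limiting is signaled with HTTP 429 is ours, not something documented here:

```python
import os
import time

import requests

API_KEY = os.environ["NEXTROWS_API_KEY"]

def extract_with_retry(payload: dict, max_attempts: int = 5) -> dict:
    """Retry on rate limiting with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(max_attempts):
        response = requests.post(
            "https://api.nextrows.com/v1/extract",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json=payload,
            timeout=120,
        )
        # Assumption: HTTP 429 signals rate limiting; other errors raise immediately.
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()
        time.sleep(2 ** attempt)  # 1s, 2s, 4s, 8s, ...
    raise RuntimeError("Still rate limited after all attempts; reduce request volume")
```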
## Best Practices

### Crafting Effective Prompts
- **Be specific about data fields:**

  ❌ "Get product data"
  ✅ "Extract product name, price in USD, star rating, and number of reviews"

- **Specify data format when needed:**

  ✅ "Extract publication date in YYYY-MM-DD format"
  ✅ "Get price as a number without currency symbols"

- **Handle edge cases:**

  ✅ "Extract salary range, use 'Not specified' if salary is not mentioned"
### Managing Credits Efficiently
- Use specific prompts to avoid re-processing unnecessary data
- Monitor your credit usage and API performance
- Consider batch processing for large datasets (see the sketch below)
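Because a single request can carry up to 20 URLs (see Rate Limits below), grouping pages into full batches stretches your request budget further. A minimal chunking sketch, building on the `extract()` helper from the Basic Request Structure section:

```python
def batched(urls: list[str], size: int = 20):
    """Yield lists of URLs no larger than the 20-per-request maximum."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]

def extract_all(urls: list[str], prompt: str) -> list:
    """Extract rows from many pages, one full batch per request."""
    rows = []
    for batch in batched(urls):
        # extract() is the helper sketched under "Basic Request Structure".
        result = extract({"type": "url", "data": batch, "prompt": prompt})
        if result["success"]:
            rows.extend(result["data"])
    return rows
```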
### Handling Different Website Types

**E-commerce sites:**

```json
{
  "prompt": "Extract product name, current price, original price if on sale, rating score, number of reviews, and availability status"
}
```

**Job boards:**

```json
{
  "prompt": "Extract job title, company name, location, salary range, experience level, and application deadline"
}
```

**News sites:**

```json
{
  "prompt": "Extract article headline, author name, publication date, article text, and tags or categories"
}
```
## Development Workflow
1. Start with simple prompts to test your extractions
2. Refine prompts based on initial results
3. Add schema validation for production consistency
4. Implement error handling in your application
5. Monitor performance and optimize as needed
## Rate Limits and Scaling

### Current Limits

- **Requests per minute:** 20
- **Maximum URLs per request:** 20
### Scaling Strategies
For high-volume use cases:

- Implement request queuing to handle bursts (a minimal pacing sketch follows below)
- Use batch processing to maximize throughput
- Optimize request patterns to stay within limits
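One simple way to stay inside the 20-requests-per-minute limit is to pace submissions client-side with a queue and a rolling window. A minimal sketch reusing the `extract()` helper from earlier; a production system would more likely use a real job queue:

```python
import time
from collections import deque

REQUESTS_PER_MINUTE = 20

def run_paced(payloads: list[dict]) -> list[dict]:
    """Send queued payloads, allowing at most 20 requests per rolling minute."""
    queue = deque(payloads)
    sent_at: deque[float] = deque()  # timestamps of recently sent requests
    results = []
    while queue:
        now = time.monotonic()
        # Drop timestamps that have left the 60-second window.
        while sent_at and now - sent_at[0] > 60:
            sent_at.popleft()
        if len(sent_at) >= REQUESTS_PER_MINUTE:
            time.sleep(60 - (now - sent_at[0]))  # wait for the window to open
            continue
        sent_at.append(time.monotonic())
        results.append(extract(queue.popleft()))  # extract() from earlier sketch
    return results
```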