Troubleshooting

Solutions for common NextRows issues and error messages

This guide helps you diagnose and resolve common issues when using NextRows. Most problems can be solved quickly with the right approach.

Common Error Messages

Authentication Errors

401: The API key was not provided

Cause: Missing or incorrect Authorization header.

Solution:

# Missing Authorization header (incorrect)
curl -X POST https://api.nextrows.com/v1/extract

# Correct Authorization header
curl -X POST https://api.nextrows.com/v1/extract \
  -H "Authorization: Bearer sk-nr-your-api-key"

401: Invalid API key

Cause: API key is incorrect or has been revoked.

Solutions:

  1. Verify your API key is correctly formatted
  2. Ensure you're using the complete key (starts with sk-nr-)
  3. Generate a new API key if the current one is compromised
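A quick local check can catch malformed keys before any request is sent. This is only a sketch: it verifies the sk-nr- prefix mentioned above and cannot confirm the key is actually valid server-side.

```python
def looks_like_valid_key(api_key: str) -> bool:
    """Cheap local sanity check; it cannot verify the key server-side."""
    return (
        isinstance(api_key, str)
        and api_key.startswith("sk-nr-")
        and len(api_key) > len("sk-nr-")
    )

print(looks_like_valid_key("sk-nr-abc123"))  # well-formed prefix -> True
print(looks_like_valid_key("sk_nr_abc123"))  # wrong separators -> False
```

Running this check at startup turns a confusing 401 at request time into an immediate, obvious configuration error.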

Credit and Billing Errors

402: Credits exhausted

Cause: Your account has run out of available credits.

Solutions:

  1. Check your remaining credit balance
  2. Purchase additional credits
  3. Wait for credit renewal if on a subscription plan

429: Rate limit exceeded

Cause: Too many requests in a short time period.

Solutions:

  1. Implement request throttling in your code:
import time
import requests

def make_request_with_retry(url, data, max_retries=3):
    for attempt in range(max_retries):
        response = requests.post(url, json=data, timeout=30)
        
        if response.status_code == 429:
            # Wait before retrying
            wait_time = 2 ** attempt  # Exponential backoff
            time.sleep(wait_time)
            continue
            
        return response
    
    raise Exception("Max retries exceeded")
  2. Contact support for higher rate limits if needed

Extraction Errors

400: Failed to extract data from URLs

Cause: The target website is inaccessible or blocked the request.

Troubleshooting steps:

  1. Verify URL accessibility:

    curl -I https://target-website.com
  2. Check if the site blocks automated requests:

    • Try accessing the URL in an incognito browser
    • Look for CAPTCHA or bot detection messages
  3. Test with a simpler URL:

    {
      "type": "url",
      "data": ["https://httpbin.org/html"],
      "prompt": "Extract any text content"
    }

400: No structured data found

Cause: The AI couldn't identify relevant data matching your prompt.

Solutions:

  1. Make your prompt more specific:

    // ❌ Too vague
    {"prompt": "Extract data"}
    
    // ✅ Specific
    {"prompt": "Extract product name, price, and rating from product listings"}
  2. Check if the page contains the expected data:

    • Manually inspect the page source
    • Ensure JavaScript hasn't changed the content structure
  3. Try extracting simpler data first:

    {"prompt": "Extract all text content from the page"}

Data Quality Issues

Incomplete or Missing Data

Symptoms: Some fields are empty or missing in the extracted data.

Diagnosis:

  1. Check the source page:

    • Manually verify the data exists on the page
    • Look for data that loads dynamically with JavaScript
  2. Review your prompt:

    // ❌ Doesn't handle missing data
    {"prompt": "Extract price and rating"}
    
    // ✅ Handles missing data explicitly
    {"prompt": "Extract price and rating, use 'N/A' if not available"}
  3. Use schema validation to catch issues:

    {
      "schema": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "title": {"type": "string", "minLength": 1},
            "price": {"type": "string"}
          },
          "required": ["title"]
        }
      }
    }
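If you prefer not to pull in a full JSON Schema library, the required/minLength checks from the schema above can be approximated in a few lines of plain Python. This is a sketch of client-side validation, not the NextRows schema feature itself:

```python
def find_invalid_rows(rows, required=("title",)):
    """Return indices of rows missing a required, non-empty string field."""
    bad = []
    for i, row in enumerate(rows):
        for field in required:
            value = row.get(field)
            if not isinstance(value, str) or not value.strip():
                bad.append(i)
                break
    return bad

rows = [
    {"title": "Widget", "price": "$9.99"},
    {"title": "", "price": "$4.50"},  # empty title -> flagged
    {"price": "$2.00"},               # missing title -> flagged
]
print(find_invalid_rows(rows))  # [1, 2]
```

Flagging rows by index makes it easy to log exactly which extracted items need a prompt adjustment or a retry.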

Incorrect Data Types

Symptoms: Numbers returned as strings, dates in wrong format, etc.

Solutions:

  1. Specify expected formats in your prompt:

    {"prompt": "Extract price as a number without currency symbols, and date in YYYY-MM-DD format"}
  2. Use schema validation for type conversion:

    {
      "schema": {
        "properties": {
          "price": {"type": "number"},
          "date": {"type": "string", "format": "date"}
        }
      }
    }
  3. Post-process the data in your application:

    def clean_price(price_str):
        if isinstance(price_str, str):
            return float(price_str.replace('$', '').replace(',', ''))
        return price_str
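The same post-processing idea applies to dates. This sketch tries a few common input formats and normalizes them to YYYY-MM-DD; the specific formats listed are assumptions about what a site might return, so adjust them to your data:

```python
from datetime import datetime

def clean_date(date_str):
    """Normalize common date formats to YYYY-MM-DD; return input unchanged if none match."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y"):
        try:
            return datetime.strptime(date_str, fmt).strftime("%Y-%m-%d")
        except (ValueError, TypeError):
            continue
    return date_str

print(clean_date("03/15/2024"))      # 2024-03-15
print(clean_date("March 15, 2024"))  # 2024-03-15
```

Returning the input unchanged on failure keeps unexpected values visible downstream instead of silently dropping them.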

Performance Issues

Slow Response Times

Causes and Solutions:

  1. Complex websites: Handle JavaScript-heavy sites with smart content processing
  2. Large pages: Extract only the data you need
  3. Multiple URLs: Process in smaller batches
# Process all URLs at once (incorrect)
urls = [f"https://site.com/page{i}" for i in range(100)]
response = requests.post(api_url, json={"data": urls})

# Process in batches (correct)
import itertools  # itertools.batched requires Python 3.12+

def batch_process(urls, batch_size=10):
    results = []
    for batch in itertools.batched(urls, batch_size):
        response = requests.post(api_url, json={"data": list(batch)})
        results.extend(response.json()["data"])
    return results

High Credit Usage

Optimization strategies:

  1. Use more specific prompts:
    // ❌ Processes entire page
    {"prompt": "Extract all information"}
    
    // ✅ Targets specific data
    {"prompt": "Extract only product name and price from the product info section"}

Website-Specific Issues

JavaScript-Heavy Websites

Symptoms: Content that is visible in the browser is missing from the extracted data.

Solution: The website likely loads content dynamically. Try these approaches:

  1. Check if NextRows already handles it (most modern frameworks are supported automatically)

  2. Wait for specific elements to load before extraction

  3. Allow extra time for content to fully load

Anti-Bot Protection

Symptoms: Different data than what you see in the browser, or blocked requests.

Indicators:

  • CAPTCHA challenges
  • "Access Denied" messages
  • Significantly different content

Solutions:

  1. Respect website policies when extracting data

  2. Respect rate limits and implement delays:

    import time
    
    def respectful_extraction(urls, delay=2):
        results = []
        for url in urls:
            result = extract_data(url)
            results.append(result)
            time.sleep(delay)  # Be respectful
        return results
  3. Contact the website owner for API access if available

Login-Required Content

Symptoms: Extraction returns login pages instead of actual content.

Current limitations: NextRows doesn't support authenticated sessions.

Workarounds:

  1. Look for public versions of the data
  2. Use the website's official API if available
  3. Use publicly accessible content for extraction

Debugging Strategies

Step-by-Step Debugging

  1. Start simple:

    {"prompt": "Extract page title"}
  2. Gradually increase complexity:

    {"prompt": "Extract title and main headings"}
  3. Add specific requirements:

    {"prompt": "Extract title, headings, and any price information"}

Testing Different Approaches

def debug_extraction(url):
    test_prompts = [
        "Extract all text content",
        "Extract any structured data",
        "List all links and their text",
        "Find any price or number information"
    ]
    
    for prompt in test_prompts:
        print(f"Testing: {prompt}")
        try:
            result = extract_data(url, prompt)
            print(f"Success: {len(result.get('data', []))} items")
        except Exception as e:
            print(f"Error: {e}")
        print("-" * 50)

API Testing Strategies

Use these approaches for effective debugging:

  1. Start with simple URLs: Test with basic pages first
  2. Validate your prompts: Ensure prompts are specific and clear
  3. Check response formats: Verify the structure matches your expectations
  4. Monitor error patterns: Look for common failure points
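For step 3, a small shape check catches unexpected response structures early. This assumes the parsed response body has a top-level "data" list of objects, as in the examples earlier in this guide:

```python
def has_expected_shape(payload):
    """Check that a parsed response contains a list of dict rows under 'data'."""
    if not isinstance(payload, dict):
        return False
    rows = payload.get("data")
    return isinstance(rows, list) and all(isinstance(r, dict) for r in rows)

print(has_expected_shape({"data": [{"title": "x"}]}))  # True
print(has_expected_shape({"data": "oops"}))            # False
```

Running this before any field access turns a vague KeyError deep in your pipeline into a clear, immediate failure.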

Additional Resources

Self-Help Guide

  1. Check this troubleshooting guide
  2. Test with a simpler example
  3. Review the API documentation
  4. Try different prompt variations

Debugging Checklist

When encountering issues:

  • Verify request format: Ensure all required parameters are included
  • Check error messages: Look for specific guidance in the response
  • Test with minimal examples: Start simple and build complexity
  • Validate API key format: Ensure it follows the correct pattern

Documentation Resources

  • API Reference: Complete parameter and response documentation
  • Examples: Real-world use cases and code samples
  • Features Guide: Advanced capabilities and best practices

Most issues can be resolved by adjusting your prompts or breaking complex extractions into simpler steps. The AI works best with clear, specific instructions.

Prevention Best Practices

Robust Code Patterns

import requests
import time
from typing import Optional, Dict, Any

class NextRowsClient:
    def __init__(self, api_key: str):
        self.api_key = api_key
        self.base_url = "https://api.nextrows.com/v1"
        
    def extract_with_retry(
        self, 
        url: str, 
        prompt: str, 
        max_retries: int = 3,
        retry_delay: int = 1
    ) -> Optional[Dict[str, Any]]:
        
        for attempt in range(max_retries):
            try:
                response = requests.post(
                    f"{self.base_url}/extract",
                    headers={"Authorization": f"Bearer {self.api_key}"},
                    json={
                        "type": "url",
                        "data": [url],
                        "prompt": prompt
                    },
                    timeout=30  # avoid hanging indefinitely on slow responses
                )
                
                if response.status_code == 200:
                    return response.json()
                elif response.status_code == 429:
                    # Rate limited, wait and retry
                    time.sleep(retry_delay * (2 ** attempt))
                    continue
                else:
                    print(f"Error {response.status_code}: {response.text}")
                    return None
                    
            except requests.exceptions.RequestException as e:
                print(f"Request failed: {e}")
                if attempt < max_retries - 1:
                    time.sleep(retry_delay)
                    
        return None

Monitoring and Alerting

Set up monitoring for production use:

import logging
from datetime import datetime

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def log_extraction_result(url, prompt, result):
    logger.info("Extraction completed", extra={
        'url': url,
        'prompt': prompt,
        'success': result is not None,
        'timestamp': datetime.now().isoformat()
    })
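Building on the logging above, a simple rolling failure counter can flag when too many recent extractions fail. The window size and threshold here are placeholders to adapt; wiring the alert to email or paging is left to your stack:

```python
from collections import deque

class FailureMonitor:
    """Track recent extraction outcomes and flag sustained failure streaks."""

    def __init__(self, window=20, threshold=0.5):
        self.results = deque(maxlen=window)  # True = success, False = failure
        self.threshold = threshold

    def record(self, success: bool) -> bool:
        """Record one outcome; return True if the failure rate warrants an alert."""
        self.results.append(success)
        failures = self.results.count(False)
        # Require a few samples before alerting to avoid noise on startup
        return len(self.results) >= 5 and failures / len(self.results) > self.threshold
```

Calling `monitor.record(result is not None)` after each extraction gives you an early signal when a target site starts blocking requests or changes its layout.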

This comprehensive troubleshooting guide should help you resolve most issues quickly. Remember that clear, specific prompts and proper error handling are key to successful data extraction with NextRows.