Troubleshooting

Solutions for common NextRows issues and error messages

This guide helps you diagnose and resolve common issues when using NextRows. Most problems can be solved quickly with the right approach.

Common Error Messages

Authentication Errors

401: The API key was not provided

Cause: Missing or incorrect Authorization header.

Solution:

# Missing Authorization header (incorrect)
curl -X POST https://api.nextrows.com/v1/extract

# Correct Authorization header
curl -X POST https://api.nextrows.com/v1/extract \
  -H "Authorization: Bearer sk-nr-your-api-key"

401: Invalid API key

Cause: API key is incorrect or has been revoked.

Solutions:

  1. Verify your API key is correctly formatted
  2. Ensure you're using the complete key (starts with sk-nr-)
  3. Generate a new API key if the current one is compromised
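A quick local check can catch malformed keys before any request is sent. This is only a sketch: it verifies the sk-nr- prefix mentioned above and cannot confirm the key is actually valid server-side.

```python
def looks_like_valid_key(api_key: str) -> bool:
    """Cheap local sanity check; it cannot verify the key server-side."""
    return (
        isinstance(api_key, str)
        and api_key.startswith("sk-nr-")
        and len(api_key) > len("sk-nr-")
    )

print(looks_like_valid_key("sk-nr-abc123"))  # well-formed prefix -> True
print(looks_like_valid_key("sk_nr_abc123"))  # wrong separators -> False
```

Running this check at startup turns a confusing 401 at request time into an immediate, obvious configuration error.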

Credit and Billing Errors

402: Credits exhausted

Cause: Your account has run out of available credits.

Solutions:

  1. Check your remaining credit balance
  2. Purchase additional credits
  3. Wait for credit renewal if on a subscription plan

429: Rate limit exceeded

Cause: Too many requests in a short time period.

Solutions:

  1. Implement request throttling in your code:
import time
import requests

def make_request_with_retry(url, data, max_retries=3):
    for attempt in range(max_retries):
        response = requests.post(url, json=data, timeout=30)
        
        if response.status_code == 429:
            # Wait before retrying
            wait_time = 2 ** attempt  # Exponential backoff
            time.sleep(wait_time)
            continue
            
        return response
    
    raise Exception("Max retries exceeded")
  2. Contact support for higher rate limits if needed

Extraction Errors

400: Failed to extract data from URLs

Cause: The target website is inaccessible or blocked the request.

Troubleshooting steps:

  1. Verify URL accessibility:

    curl -I https://target-website.com
  2. Check if the site blocks automated requests:

    • Try accessing the URL in an incognito browser
    • Look for CAPTCHA or bot detection messages
  3. Test with a simpler URL:

    {
      "type": "url",
      "data": ["https://httpbin.org/html"],
      "prompt": "Extract any text content"
    }

400: No structured data found

Cause: The AI couldn't identify relevant data matching your prompt.

Solutions:

  1. Make your prompt more specific:

    // ❌ Too vague
    {"prompt": "Extract data"}
    
    // ✅ Specific
    {"prompt": "Extract product name, price, and rating from product listings"}
  2. Check if the page contains the expected data:

    • Manually inspect the page source
    • Ensure JavaScript hasn't changed the content structure
  3. Try extracting simpler data first:

    {"prompt": "Extract all text content from the page"}

Data Quality Issues

Incomplete or Missing Data

Symptoms: Some fields are empty or missing in the extracted data.

Diagnosis:

  1. Check the source page:

    • Manually verify the data exists on the page
    • Look for data that loads dynamically with JavaScript
  2. Review your prompt:

    // ❌ Doesn't handle missing data
    {"prompt": "Extract price and rating"}
    
    // ✅ Handles missing data explicitly
    {"prompt": "Extract price and rating, use 'N/A' if not available"}
  3. Use schema validation to catch issues:

    {
      "schema": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "title": {"type": "string", "minLength": 1},
            "price": {"type": "string"}
          },
          "required": ["title"]
        }
      }
    }
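If you prefer not to pull in a full JSON Schema library, the required/minLength checks from the schema above can be approximated in a few lines of plain Python. This is a sketch of client-side validation, not the NextRows schema feature itself:

```python
def find_invalid_rows(rows, required=("title",)):
    """Return indices of rows missing a required, non-empty string field."""
    bad = []
    for i, row in enumerate(rows):
        for field in required:
            value = row.get(field)
            if not isinstance(value, str) or not value.strip():
                bad.append(i)
                break
    return bad

rows = [
    {"title": "Widget", "price": "$9.99"},
    {"title": "", "price": "$4.50"},  # empty title -> flagged
    {"price": "$2.00"},               # missing title -> flagged
]
print(find_invalid_rows(rows))  # [1, 2]
```

Flagging rows by index makes it easy to log exactly which extracted items need a prompt adjustment or a retry.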

Incorrect Data Types

Symptoms: Numbers returned as strings, dates in wrong format, etc.

Solutions:

  1. Specify expected formats in your prompt:

    {"prompt": "Extract price as a number without currency symbols, and date in YYYY-MM-DD format"}
  2. Use schema validation for type conversion:

    {
      "schema": {
        "properties": {
          "price": {"type": "number"},
          "date": {"type": "string", "format": "date"}
        }
      }
    }
  3. Post-process the data in your application:

    def clean_price(price_str):
        if isinstance(price_str, str):
            return float(price_str.replace('$', '').replace(',', ''))
        return price_str
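The same post-processing idea applies to dates. This sketch tries a few common input formats and normalizes them to YYYY-MM-DD; the specific formats listed are assumptions about what a site might return, so adjust them to your data:

```python
from datetime import datetime

def clean_date(date_str):
    """Normalize common date formats to YYYY-MM-DD; return input unchanged if none match."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y"):
        try:
            return datetime.strptime(date_str, fmt).strftime("%Y-%m-%d")
        except (ValueError, TypeError):
            continue
    return date_str

print(clean_date("03/15/2024"))      # 2024-03-15
print(clean_date("March 15, 2024"))  # 2024-03-15
```

Returning the input unchanged on failure keeps unexpected values visible downstream instead of silently dropping them.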

Performance Issues

Slow Response Times

Causes and Solutions:

  1. Complex websites: Handle JavaScript-heavy sites with smart content processing
  2. Large pages: Extract only the data you need
  3. Multiple URLs: Process in smaller batches
# Process all URLs at once (incorrect)
urls = [f"https://site.com/page{i}" for i in range(100)]
response = requests.post(api_url, json={"data": urls})

# Process in batches (correct)
import itertools  # itertools.batched requires Python 3.12+

def batch_process(urls, batch_size=10):
    results = []
    for batch in itertools.batched(urls, batch_size):
        response = requests.post(api_url, json={"data": list(batch)})
        results.extend(response.json()["data"])
    return results

High Credit Usage

Optimization strategies:

  1. Use more specific prompts:
    // ❌ Processes entire page
    {"prompt": "Extract all information"}
    
    // ✅ Targets specific data
    {"prompt": "Extract only product name and price from the product info section"}

Website-Specific Issues

JavaScript-Heavy Websites

Symptoms: Content that is visible in the browser is missing from the extracted data.

Solution: The website likely loads content dynamically. Try these approaches:

  1. Check if NextRows already handles it (most modern frameworks are supported automatically)

  2. Wait for specific elements to load before extraction

  3. Allow extra time for content to fully load

Anti-Bot Protection

Symptoms: Different data than what you see in the browser, or blocked requests.

Indicators:

  • CAPTCHA challenges
  • "Access Denied" messages
  • Significantly different content

Solutions:

  1. Respect website policies when extracting data

  2. Respect rate limits and implement delays:

    import time
    
    def respectful_extraction(urls, delay=2):
        results = []
        for url in urls:
            result = extract_data(url)
            results.append(result)
            time.sleep(delay)  # Be respectful
        return results
  3. Contact the website owner for API access if available

Login-Required Content

Symptoms: Extraction returns login pages instead of actual content.

Current limitations: NextRows doesn't support authenticated sessions.

Workarounds:

  1. Look for public versions of the data
  2. Use the website's official API if available
  3. Use publicly accessible content for extraction

Debugging Strategies

Step-by-Step Debugging

  1. Start simple:

    {"prompt": "Extract page title"}
  2. Gradually increase complexity:

    {"prompt": "Extract title and main headings"}
  3. Add specific requirements:

    {"prompt": "Extract title, headings, and any price information"}

Testing Different Approaches

def debug_extraction(url):
    test_prompts = [
        "Extract all text content",
        "Extract any structured data",
        "List all links and their text",
        "Find any price or number information"
    ]
    
    for prompt in test_prompts:
        print(f"Testing: {prompt}")
        try:
            result = extract_data(url, prompt)
            print(f"Success: {len(result.get('data', []))} items")
        except Exception as e:
            print(f"Error: {e}")
        print("-" * 50)

API Testing Strategies

Use these approaches for effective debugging:

  1. Start with simple URLs: Test with basic pages first
  2. Validate your prompts: Ensure prompts are specific and clear
  3. Check response formats: Verify the structure matches your expectations
  4. Monitor error patterns: Look for common failure points
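For step 3, a small shape check catches unexpected response structures early. This assumes the parsed response body has a top-level "data" list of objects, as in the examples earlier in this guide:

```python
def has_expected_shape(payload):
    """Check that a parsed response contains a list of dict rows under 'data'."""
    if not isinstance(payload, dict):
        return False
    rows = payload.get("data")
    return isinstance(rows, list) and all(isinstance(r, dict) for r in rows)

print(has_expected_shape({"data": [{"title": "x"}]}))  # True
print(has_expected_shape({"data": "oops"}))            # False
```

Running this before any field access turns a vague KeyError deep in your pipeline into a clear, immediate failure.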

Additional Resources

Self-Help Guide

  1. Check this troubleshooting guide
  2. Test with a simpler example
  3. Review the API documentation
  4. Try different prompt variations

Debugging Checklist

When encountering issues:

  • Verify request format: Ensure all required parameters are included
  • Check error messages: Look for specific guidance in the response
  • Test with minimal examples: Start simple and build complexity
  • Validate API key format: Ensure it follows the correct pattern

Documentation Resources

  • API Reference: Complete parameter and response documentation
  • Examples: Real-world use cases and code samples
  • Features Guide: Advanced capabilities and best practices

Most issues can be resolved by adjusting your prompts or breaking complex extractions into simpler steps. The AI works best with clear, specific instructions.

Prevention Best Practices

Robust Code Patterns

import requests
import time
from typing import Optional, Dict, Any

class NextRowsClient:
    def __init__(self, api_key: str):
        self.api_key = api_key
        self.base_url = "https://api.nextrows.com/v1"
        
    def extract_with_retry(
        self, 
        url: str, 
        prompt: str, 
        max_retries: int = 3,
        retry_delay: int = 1
    ) -> Optional[Dict[str, Any]]:
        
        for attempt in range(max_retries):
            try:
                response = requests.post(
                    f"{self.base_url}/extract",
                    headers={"Authorization": f"Bearer {self.api_key}"},
                    json={
                        "type": "url",
                        "data": [url],
                        "prompt": prompt
                    },
                    timeout=30  # avoid hanging indefinitely on slow responses
                )
                
                if response.status_code == 200:
                    return response.json()
                elif response.status_code == 429:
                    # Rate limited, wait and retry
                    time.sleep(retry_delay * (2 ** attempt))
                    continue
                else:
                    print(f"Error {response.status_code}: {response.text}")
                    return None
                    
            except requests.exceptions.RequestException as e:
                print(f"Request failed: {e}")
                if attempt < max_retries - 1:
                    time.sleep(retry_delay)
                    
        return None

Monitoring and Alerting

Set up monitoring for production use:

import logging
from datetime import datetime

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def log_extraction_result(url, prompt, result):
    logger.info("Extraction completed", extra={
        'url': url,
        'prompt': prompt,
        'success': result is not None,
        'timestamp': datetime.now().isoformat()
    })
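Building on the logging above, a simple rolling failure counter can flag when too many recent extractions fail. The window size and threshold here are placeholders to adapt; wiring the alert to email or paging is left to your stack:

```python
from collections import deque

class FailureMonitor:
    """Track recent extraction outcomes and flag sustained failure streaks."""

    def __init__(self, window=20, threshold=0.5):
        self.results = deque(maxlen=window)  # True = success, False = failure
        self.threshold = threshold

    def record(self, success: bool) -> bool:
        """Record one outcome; return True if the failure rate warrants an alert."""
        self.results.append(success)
        failures = self.results.count(False)
        # Require a few samples before alerting to avoid noise on startup
        return len(self.results) >= 5 and failures / len(self.results) > self.threshold
```

Calling `monitor.record(result is not None)` after each extraction gives you an early signal when a target site starts blocking requests or changes its layout.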

This comprehensive troubleshooting guide should help you resolve most issues quickly. Remember that clear, specific prompts and proper error handling are key to successful data extraction with NextRows.