Basic Scraping
This guide covers the fundamentals of web scraping with the WebscrapingHQ API. You'll learn the core concepts, basic usage patterns, and how to handle common scenarios.
Understanding Web Scraping
Web scraping is the process of extracting data from websites programmatically. Modern websites can be challenging to scrape due to:
- Dynamic Content: JavaScript-rendered content
- Anti-Bot Measures: Rate limiting, CAPTCHAs, IP blocking
- Complex Interactions: Forms, authentication, multi-step flows
- Varying Formats: Different HTML structures and data patterns
The WebscrapingHQ API handles these challenges automatically and provides a simple interface for complex scraping tasks.
Basic Request Structure
Every scraping request follows this basic structure:
{
"url": "https://example.com",
"renderJs": false,
"screenshot": false,
"waitFor": 0
}
Required Parameters
| Parameter | Type | Description |
|---|---|---|
| url | string | The target URL to scrape (must be a valid HTTP/HTTPS URL) |
Optional Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| renderJs | boolean | false | Enable JavaScript rendering for dynamic content |
| screenshot | boolean | false | Capture a screenshot of the page |
| waitFor | number | 0 | Wait time in milliseconds before capturing content |
| deviceType | string | "desktop" | Device type: "desktop" or "mobile" |
| country_code | string | - | Country code for geolocation (e.g., "US", "UK") |
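The parameter tables above can be turned into a small request builder. This is a minimal sketch rather than an official client: the function name `build_payload` and its snake_case arguments are hypothetical, and the validation only mirrors the constraints stated in the tables (HTTP/HTTPS URL, "desktop" or "mobile" device type). The non-negative check on `waitFor` is an added assumption.

```python
from urllib.parse import urlparse


def build_payload(url, render_js=False, screenshot=False, wait_for=0,
                  device_type="desktop", country_code=None):
    """Build a request body using the parameter names from the tables above."""
    parsed = urlparse(url)
    # The docs require a valid HTTP/HTTPS URL.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"url must be a valid HTTP/HTTPS URL, got {url!r}")
    if device_type not in ("desktop", "mobile"):
        raise ValueError("deviceType must be 'desktop' or 'mobile'")
    # Assumption: a negative wait time is never meaningful.
    if wait_for < 0:
        raise ValueError("waitFor must be a non-negative number of milliseconds")

    payload = {
        "url": url,
        "renderJs": render_js,
        "screenshot": screenshot,
        "waitFor": wait_for,
        "deviceType": device_type,
    }
    # country_code has no documented default, so it is only sent when set.
    if country_code is not None:
        payload["country_code"] = country_code
    return payload
```

For example, `build_payload("https://spa-example.com", render_js=True, wait_for=3000)` produces a body with `renderJs` enabled and a three-second wait.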
Content Types
Static HTML Pages
For simple websites with static content, a basic request is sufficient:
curl -X POST https://app.webscrapinghq.com/api/v1/scrape \
-H "X-API-KEY: your-api-key" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/article"
}'
Use cases:
- News articles
- Blog posts
- Product catalogs
- Static content pages
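The curl call above can also be issued from Python with only the standard library. The sketch below reuses the endpoint and headers from the curl example. The function names `make_request` and `scrape` and the 60-second timeout are illustrative assumptions, not part of the API.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_ENDPOINT = "https://app.webscrapinghq.com/api/v1/scrape"


def make_request(payload, api_key):
    """Build (but do not send) the POST request shown in the curl example."""
    return urllib.request.Request(
        API_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def scrape(payload, api_key, timeout=60):
    """Send the request and decode the JSON response body.

    The timeout value is an assumption; adjust it to your waitFor settings.
    """
    request = make_request(payload, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `scrape({"url": "https://example.com/article"}, "your-api-key")` would then return the parsed JSON response as a dictionary.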
JavaScript-Rendered Content
Many modern websites require JavaScript to load content. Set renderJs to true, and use waitFor to give the page time to finish loading:
{
"url": "https://spa-example.com",
"renderJs": true,
"waitFor": 3000
}
Use cases:
- Single Page Applications (SPAs)
- React/Vue/Angular applications
- Dynamic content loading
- Infinite scroll pages
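When it is unclear whether a page needs JavaScript rendering, one possible approach is to try a static request first and retry with `renderJs` only if the expected content is missing. The sketch below illustrates that idea and is not a documented feature of the API: `fetch_with_js_fallback`, the `fetch` callable (any function that takes a request body and returns the parsed response), and the `marker` string are all hypothetical.

```python
def fetch_with_js_fallback(fetch, url, marker, wait_for=3000):
    """Try a static request first; retry with renderJs if `marker` is missing.

    `fetch` is any callable that accepts a request body (dict) and returns
    the parsed response (dict) containing a "body" field.
    """
    result = fetch({"url": url, "renderJs": False})
    if marker in result.get("body", ""):
        return result
    # The static HTML lacked the expected content, so render JavaScript
    # and give the page some time to load.
    return fetch({"url": url, "renderJs": True, "waitFor": wait_for})
```

The marker should be a string that only appears once the real content is present, such as a CSS class or element id from the target page.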
Mobile-Optimized Content
Some websites serve different content to mobile devices. Set deviceType to "mobile" to request the mobile version:
{
"url": "https://mobile-site.com",
"deviceType": "mobile",
"renderJs": true
}
Response Structure
Understanding the response structure helps you extract the data you need:
{
"creditsLeft": 995,
"cost": 5,
"initial-status-code": 200,
"resolved-url": "https://example.com/final-url",
"type": "html",
"body": "<!DOCTYPE html>...",
"features_used": {
"javascript": true,
"screenshot": false,
"geolocation": null
}
}
Key Response Fields
Status Information
- initial-status-code: HTTP status code from the target website
- resolved-url: Final URL after any redirects
- type: Content type (usually "html")
Usage Information
- creditsLeft: Remaining credits in your account
- cost: Credits consumed by this request
- features_used: Object showing which features were used
Content
- body: The HTML content of the scraped page
- screenshot: Base64-encoded screenshot (if requested)
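Because several response keys contain hyphens, it can be convenient to map the response onto a typed structure before using it. The following sketch assumes the response has been decoded into a Python dictionary with the fields listed above. `ScrapeResult` and `parse_response` are hypothetical names, and `screenshot` is treated as optional since it is only present when requested.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScrapeResult:
    status_code: int      # "initial-status-code"
    resolved_url: str     # "resolved-url"
    content_type: str     # "type"
    body: str
    cost: int
    credits_left: int     # "creditsLeft"
    features_used: dict
    screenshot: Optional[str] = None  # Base64 string, only if requested


def parse_response(data: dict) -> ScrapeResult:
    """Map the documented response fields onto attribute-friendly names."""
    return ScrapeResult(
        status_code=data["initial-status-code"],
        resolved_url=data["resolved-url"],
        content_type=data["type"],
        body=data["body"],
        cost=data["cost"],
        credits_left=data["creditsLeft"],
        features_used=data.get("features_used", {}),
        screenshot=data.get("screenshot"),
    )
```

If a screenshot is present, `base64.b64decode(result.screenshot)` from the standard library yields the raw image bytes.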
Common Scraping Patterns
1. News Articles
{
"url": "https://news-site.com/article/123",
"renderJs": false
}
News sites typically use static HTML, so JavaScript rendering is usually unnecessary.
2. E-commerce Products
{
"url": "https://shop.com/product/456",
"renderJs": true,
"waitFor": 2000
}
Product pages often load prices and availability dynamically.
3. Social Media Posts
{
"url": "https://social-platform.com/post/789",
"renderJs": true,
"waitFor": 5000,
"deviceType": "mobile"
}
Social media platforms require JavaScript rendering and often serve better content to mobile devices.
4. Search Results
{
"url": "https://search-engine.com/search?q=web+scraping",
"renderJs": true,
"waitFor": 3000
}
Search results are typically JavaScript-rendered and may take time to load.
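The four patterns above can be collected into a lookup table so that each request starts from settings suited to its page type. This is a sketch only: the `PATTERNS` keys and the `payload_for` helper are hypothetical names, and the values are copied from the examples in this section as starting points to tune, not as guaranteed settings.

```python
# Starting-point settings copied from the four patterns above.
PATTERNS = {
    "news": {"renderJs": False},
    "ecommerce": {"renderJs": True, "waitFor": 2000},
    "social": {"renderJs": True, "waitFor": 5000, "deviceType": "mobile"},
    "search": {"renderJs": True, "waitFor": 3000},
}


def payload_for(pattern, url):
    """Return a request body for `url` using the named pattern's settings."""
    if pattern not in PATTERNS:
        raise KeyError(f"unknown pattern {pattern!r}; expected one of {sorted(PATTERNS)}")
    # Copy the preset so callers can modify the result without touching PATTERNS.
    return {"url": url, **PATTERNS[pattern]}
```

For example, `payload_for("ecommerce", "https://shop.com/product/456")` reproduces the e-commerce request body shown earlier.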