How to Scrape Any Website for Free Using AI & N8N

In this tutorial, I’ll show you how to create a powerful AI web scraping workflow that can:
- Process website sitemap XML files
- Extract structured data using Google Gemini AI
- Save results automatically to Google Sheets
- Work completely free (no paid APIs required)
IMPORTANT DISCLAIMER
Only scrape websites you have explicit permission to access. Respect robots.txt files and terms of service.
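If you want to check robots.txt rules before pointing the workflow at a site, here is a minimal sketch using Python's standard `urllib.robotparser`. The sample robots.txt content and the example.com URLs are made up for illustration. This check is not part of the N8N workflow itself.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    print(is_allowed(SAMPLE_ROBOTS, "https://example.com/blog/post"))
    print(is_allowed(SAMPLE_ROBOTS, "https://example.com/private/data"))
```

A robots.txt check only covers crawler rules. The site's terms of service still need to be read by a human.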
Step 1: Set Up N8N with Docker
- Install Docker Desktop for your OS
- Open Docker Desktop and search for the “n8nio/n8n” image
- Pull the latest image and run a container with port 5678 published
- Access N8N at http://localhost:5678
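If you prefer the command line over the Docker Desktop UI, the same setup can be sketched as a `docker run` command. The snippet below only builds and prints that command. The container name, the `n8n_data` volume, and the `/home/node/.n8n` data path are assumptions based on the usual n8n Docker setup, not something this tutorial specifies.

```python
import shlex


def n8n_docker_command(host_port: int = 5678, volume: str = "n8n_data") -> list[str]:
    """Build a `docker run` argument list that starts n8n on the given host port."""
    return [
        "docker", "run", "-it", "--rm",
        "--name", "n8n",                    # assumed container name
        "-p", f"{host_port}:5678",          # publish n8n's port on the host
        "-v", f"{volume}:/home/node/.n8n",  # assumed volume so workflows persist
        "n8nio/n8n",
    ]


if __name__ == "__main__":
    # Print the command so it can be copied into a terminal.
    print(shlex.join(n8n_docker_command()))
```

Without the volume, workflows and credentials are lost when the container is removed.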
Step 2: Configure Google Gemini AI
- Get free API key from Google AI Studio
- In N8N, create new Google Gemini credentials
- Select “gemini-1.5-flash” model for best performance
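To confirm your API key works before wiring it into N8N, you can call the Gemini REST API directly. The sketch below builds a `generateContent` request and reads the text out of a response. The helper names are hypothetical, and it assumes the model named above is available to your account.

```python
import json
import urllib.request

GEMINI_ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"
)


def build_gemini_request(api_key: str, prompt: str,
                         model: str = "gemini-1.5-flash") -> urllib.request.Request:
    """Build (but do not send) a generateContent request for a text prompt."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        GEMINI_ENDPOINT.format(model=model),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def extract_text(response: dict) -> str:
    """Join the text parts of the first candidate in a generateContent response."""
    parts = response["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)


# To actually send the request (needs a real key and network access):
# with urllib.request.urlopen(build_gemini_request("YOUR_KEY", "Say hello")) as resp:
#     print(extract_text(json.load(resp)))
```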
Step 3: Connect Google Sheets
- Enable Google Sheets API in Cloud Console
- Create OAuth 2.0 credentials with redirect URI from N8N
- Connect your Google account in N8N’s Google Sheets node
Step 4: Import the Scraping Workflow
- Download the workflow JSON file
- In N8N, create new workflow → Import from file
- Configure nodes:
  - Google Gemini: Paste API key
  - Google Sheets: Link your spreadsheet
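Before importing a workflow file you did not write yourself, it can help to see which nodes it contains. This is a minimal sketch, assuming the export is a JSON object with a top-level `nodes` array whose entries have `name` and `type` fields. The sample workflow is hypothetical and is not the file from this tutorial.

```python
import json


def summarize_workflow(workflow_json: str) -> list[tuple[str, str]]:
    """Return (node name, node type) pairs from an exported workflow JSON string."""
    workflow = json.loads(workflow_json)
    return [(node["name"], node["type"]) for node in workflow.get("nodes", [])]


# Hypothetical, heavily trimmed workflow export for illustration.
SAMPLE_WORKFLOW = json.dumps({
    "name": "AI Scraper",
    "nodes": [
        {"name": "AI Agent", "type": "@n8n/n8n-nodes-langchain.agent"},
        {"name": "Google Sheets", "type": "n8n-nodes-base.googleSheets"},
    ],
    "connections": {},
})

if __name__ == "__main__":
    for name, node_type in summarize_workflow(SAMPLE_WORKFLOW):
        print(f"{name}: {node_type}")
```

With a real file, read it first with `open(path, encoding="utf-8").read()` and pass the string in.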
Step 5: Run Your Web Scraper
- Create Google Sheet with “URL” and “Scraped Data” columns
- Paste website sitemap URL in the chat trigger
- Watch N8N automatically:
  - Process XML sitemap
  - Scrape each page
  - Extract data with AI
  - Save markdown results to Sheets
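The "Process XML sitemap" step boils down to pulling every `<loc>` URL out of the sitemap. The sketch below does that outside N8N and shapes the result to match the two sheet columns. It assumes a standard sitemaps.org sitemap. The sample XML is made up, and the workflow's own nodes may handle this differently.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> URL found in a sitemap XML document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


def to_sheet_rows(urls: list[str]) -> list[dict[str, str]]:
    """Shape URLs as rows matching the "URL" and "Scraped Data" columns."""
    return [{"URL": url, "Scraped Data": ""} for url in urls]


# Hypothetical sitemap for illustration.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/first-post</loc></url>
</urlset>"""

if __name__ == "__main__":
    for row in to_sheet_rows(sitemap_urls(SAMPLE_SITEMAP)):
        print(row)
```

Counting the URLs this way before a run also tells you how many pages, and roughly how many AI calls, the workflow will make.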
Pro Tip: Customize AI Extraction
Modify the prompt in the “AI Agent” node to extract specific data:
"Extract all article content in markdown format with headers.
Include metadata like publish date and author if available."
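If you maintain several prompt variants, one option is to generate them from a list of fields instead of editing the text by hand. This is only a sketch: the helper name, the field names, and the "not found" convention are assumptions for illustration, and the result still gets pasted into the “AI Agent” node.

```python
BASE_PROMPT = (
    "Extract all article content in markdown format with headers. "
    "Include metadata like publish date and author if available."
)


def prompt_with_fields(fields: list[str], base: str = BASE_PROMPT) -> str:
    """Append a bulleted list of required fields to the base extraction prompt."""
    if not fields:
        return base
    wanted = "\n".join(f"- {field}" for field in fields)
    return (
        f"{base}\n"
        "Also return these fields, writing 'not found' when one is missing:\n"
        f"{wanted}"
    )


if __name__ == "__main__":
    # Hypothetical field names for a product-page scrape.
    print(prompt_with_fields(["product name", "price", "availability"]))
```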
Why This Rocks for Developers
This workflow gives you:
- No scraping API costs
- Fully automated pipeline
- AI-powered data extraction
- Scalable for large websites
Remember to respect website owners and use this power responsibly!