How to Scrape Any Website for Free Using AI & N8N

In this tutorial, I’ll show you how to create a powerful AI web scraping workflow that can:

  • 🕸️ Process website sitemap XML files
  • 🤖 Extract structured data using Google Gemini AI
  • 📊 Save results automatically to Google Sheets
  • 💸 Work completely free (no paid APIs required)

⚠️ IMPORTANT DISCLAIMER

Only scrape websites you have explicit permission to access. Respect robots.txt files and terms of service.
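
If you want to check robots.txt rules yourself before feeding URLs into the workflow, here is a minimal Python sketch using the standard library's `urllib.robotparser`. It is not part of the N8N workflow; the function name `allowed_urls` and the sample rules and URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_txt, urls, user_agent="*"):
    """Return only the URLs that the given robots.txt text permits for user_agent."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request happens here.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    # Hypothetical robots.txt content and URLs, for illustration only.
    sample_rules = "User-agent: *\nDisallow: /private/\n"
    candidates = [
        "https://example.com/blog/post-1",
        "https://example.com/private/admin",
    ]
    print(allowed_urls(sample_rules, candidates))
```

In practice you would download the site's real `/robots.txt` first and pass its text in. This only covers robots.txt; the site's terms of service still need a human read.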

HERE ARE THE LINKS YOU NEED 👇

Step 1: Set Up N8N with Docker

  1. Install Docker Desktop for your OS
  2. Open Docker Desktop and search for the “n8nio/n8n” image
  3. Pull the latest image and run a container with port 5678 mapped
  4. Access N8N at http://localhost:5678
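
If you prefer the terminal over the Docker Desktop UI, the same steps boil down to a single `docker run` command. The Python sketch below only builds and prints that command so you can copy it. The volume name `n8n_data` and the helper name `n8n_docker_command` are my assumptions, not something the steps above require.

```python
import shlex


def n8n_docker_command(port=5678, volume="n8n_data"):
    """Build the argument list for running the n8n container on a host port."""
    return [
        "docker", "run", "-it", "--rm",
        "--name", "n8n",
        # Map the chosen host port to n8n's default container port 5678.
        "-p", f"{port}:5678",
        # Named volume (assumed name) so workflows and credentials survive restarts.
        "-v", f"{volume}:/home/node/.n8n",
        "n8nio/n8n",
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command; run it in a terminal with Docker running.
    print(shlex.join(n8n_docker_command()))
```

Without the volume, anything you build in N8N is lost when the container is removed, which is why the sketch includes one.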

Step 2: Configure Google Gemini AI

  1. Get a free API key from Google AI Studio
  2. In N8N, create new Google Gemini credentials
  3. Select the “gemini-1.5-flash” model, which is fast and available on the free tier
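
To sanity-check your API key outside N8N, you can call the Gemini REST API directly. This is a rough sketch using only the Python standard library; the helper names `build_gemini_request` and `send` are mine, and N8N's Gemini node may talk to the API differently under the hood.

```python
import json
import urllib.request

# Public REST endpoint for the Gemini API; {model} is filled in below.
GEMINI_ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    "{model}:generateContent"
)


def build_gemini_request(api_key, prompt, model="gemini-1.5-flash"):
    """Build (but do not send) a generateContent request for the given prompt."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        GEMINI_ENDPOINT.format(model=model),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def send(request, timeout=30.0):
    """Send the request and return the text of the first candidate."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["candidates"][0]["content"]["parts"][0]["text"]
```

Calling `send(build_gemini_request("YOUR_API_KEY", "Say hello"))` should return a short reply if the key works; a rejected key shows up as a `urllib.error.HTTPError`.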

Step 3: Connect Google Sheets

  1. Enable the Google Sheets API in Google Cloud Console
  2. Create OAuth 2.0 credentials using the redirect URI shown in N8N
  3. Connect your Google account in N8N’s Google Sheets node

Step 4: Import the Scraping Workflow

  1. Download the workflow JSON file
  2. In N8N, create a new workflow → Import from file
  3. Configure the nodes:
    • Google Gemini: Paste your API key
    • Google Sheets: Link your spreadsheet
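
Before importing a workflow file from the internet, it is worth a quick look at what nodes it contains. The sketch below assumes the usual shape of an exported N8N workflow (a top-level `nodes` array whose entries have `name` and `type`). The sample nodes are hypothetical and are not the contents of the actual workflow file from this tutorial.

```python
import json


def summarize_workflow(workflow_json):
    """Return (node name, node type) pairs from an exported workflow JSON string."""
    workflow = json.loads(workflow_json)
    return [
        (node.get("name", ""), node.get("type", ""))
        for node in workflow.get("nodes", [])
    ]


if __name__ == "__main__":
    # Hypothetical, heavily trimmed export used only to show the output shape.
    sample = json.dumps({
        "name": "Example scraper",
        "nodes": [
            {"name": "AI Agent", "type": "@n8n/n8n-nodes-langchain.agent"},
            {"name": "Google Sheets", "type": "n8n-nodes-base.googleSheets"},
        ],
        "connections": {},
    })
    for name, node_type in summarize_workflow(sample):
        print(f"{name}: {node_type}")
```

For a real file, read it with `open(path, encoding="utf-8").read()` and pass the text in. Any node type you do not recognize is a good reason to inspect the workflow in the editor before running it.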

Step 5: Run Your Web Scraper

  1. Create a Google Sheet with “URL” and “Scraped Data” columns
  2. Paste the website’s sitemap URL into the chat trigger
  3. Watch N8N automatically:
    • Process the XML sitemap
    • Scrape each page
    • Extract data with AI
    • Save the Markdown results to Google Sheets
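
To make the steps above concrete, here is roughly what the pipeline does, sketched in plain Python. This is my approximation, not the workflow's actual implementation: `extract` is a hypothetical stand-in for the "scrape the page and ask the AI" step, and the sample sitemap is invented.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Return every <loc> URL listed in a <urlset> sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def build_rows(sitemap_xml, extract):
    """Build one [URL, Scraped Data] row per page, matching the sheet's columns."""
    return [[url, extract(url)] for url in sitemap_urls(sitemap_xml)]


if __name__ == "__main__":
    # Invented sitemap for illustration only.
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/a</loc></url>"
        "<url><loc>https://example.com/b</loc></url>"
        "</urlset>"
    )
    # The lambda fakes the scrape + AI step so the sketch runs offline.
    for row in build_rows(sample, lambda url: f"# Content of {url}"):
        print(row)
```

Note that this sketch only handles a plain `<urlset>` sitemap. Large sites often publish a sitemap index that points at further sitemaps, which would need one more level of looping.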

Pro Tip: Customize AI Extraction

Modify the prompt in the “AI Agent” node to extract specific data:

"Extract all article content in markdown format with headers. 
Include metadata like publish date and author if available."
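
If you find yourself editing that prompt often, it can help to think of it as a template: a base instruction, an optional list of fields, and the page text. The Python sketch below illustrates the idea only. The function `build_extraction_prompt`, the "Return only these fields" wording, and the `max_chars` limit are my assumptions, not settings from the "AI Agent" node.

```python
from typing import Optional

# The base instruction quoted in the tutorial.
BASE_PROMPT = (
    "Extract all article content in markdown format with headers. "
    "Include metadata like publish date and author if available."
)


def build_extraction_prompt(page_text, fields: Optional[list] = None,
                            max_chars=20000):
    """Combine the base instruction, optional field list, and page text."""
    parts = [BASE_PROMPT]
    if fields:
        # Narrow the extraction to specific fields (assumed wording).
        parts.append("Return only these fields: " + ", ".join(fields) + ".")
    # Truncate long pages so the prompt stays a manageable size (assumed limit).
    parts.append("Page content:\n" + page_text[:max_chars])
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_extraction_prompt("Example page text", ["title", "price"]))
```

Being specific about the fields you want usually gives more consistent rows in the sheet than a general "extract everything" instruction.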

Video Walkthrough

https://youtu.be/NgSJjOWJuXY

Why This Rocks for Developers

This workflow gives you:

  • 🚫 No scraping API costs
  • 🔁 Fully automated pipeline
  • 🤖 AI-powered data extraction
  • 📈 Scalable for large websites

Remember to respect website owners and use this power responsibly!