Tutorial · Beginner Friendly · 15 min read

Website Crawler Tutorial

Learn how to use the SEO by the Hour website crawler to analyze any website for SEO insights, technical issues, and content opportunities.


1. What is the Website Crawler?

The SEO by the Hour website crawler is a tool that automatically visits and analyzes web pages on any website. It extracts valuable SEO data including page titles, meta descriptions, headings, content length, and technical information (see the code sketch after the feature list below).

SEO Analysis

Analyze titles, meta descriptions, headings, and content structure

Technical Insights

Check status codes, response times, and technical issues

Content Audit

Review content length, word counts, and content gaps
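To make the extraction concrete, here is a minimal sketch of the per-page analysis described above, written in Python with the third-party requests and beautifulsoup4 packages. The SEO by the Hour crawler's actual implementation is not public, so the function name and returned fields here are illustrative assumptions.

```python
# Illustrative sketch only -- not the SEO by the Hour crawler's actual code.
import requests
from bs4 import BeautifulSoup

def analyze_page(url: str) -> dict:
    """Fetch one page and extract the SEO fields the tutorial describes."""
    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.text, "html.parser")

    title = soup.title.get_text(strip=True) if soup.title else "No title"
    meta = soup.find("meta", attrs={"name": "description"})
    h1 = soup.find("h1")

    return {
        "url": url,
        "status": response.status_code,              # technical insight
        "response_time": response.elapsed.total_seconds(),
        "title": title,                              # SEO analysis
        "meta_description": meta["content"] if meta and meta.has_attr("content") else None,
        "h1": h1.get_text(strip=True) if h1 else None,
        "word_count": len(soup.get_text().split()),  # content audit
    }

print(analyze_page("https://example.com"))
```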

2. Getting Started

Accessing the Crawler

Navigate to the crawler tool using the sidebar navigation or by visiting /crawler directly.

Navigation Path

Sidebar → Crawler → Website Crawler Dashboard

System Requirements

  • Modern web browser (Chrome, Firefox, Safari, Edge)
  • Stable internet connection
  • JavaScript enabled

3. Running Your First Crawl

Step 1: Enter the Website URL

In the URL input field, enter the website you want to crawl. You can enter:

  • Full URL: https://example.com
  • Domain only: example.com
  • Subdomain: blog.example.com
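Whichever form you enter, crawlers typically normalize the input to a full URL before fetching. A hedged sketch of that normalization (the tool's exact rules are not documented, so normalize_url and its behavior are assumptions):

```python
from urllib.parse import urlparse

def normalize_url(raw: str) -> str:
    """Turn 'example.com' or 'blog.example.com' into a full https:// URL.
    Illustrative only; the crawler's real normalization rules may differ."""
    raw = raw.strip()
    if not raw.startswith(("http://", "https://")):
        raw = "https://" + raw  # assume HTTPS when no scheme is given
    if not urlparse(raw).netloc:
        raise ValueError(f"Not a usable URL: {raw!r}")
    return raw

for candidate in ["https://example.com", "example.com", "blog.example.com"]:
    print(normalize_url(candidate))
```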

Step 2: Set Page Limit

Choose how many pages you want to crawl (1-500 pages). For your first crawl, we recommend starting with 10-20 pages.

Recommended Limits:

  • Small sites: 10-50 pages
  • Medium sites: 50-150 pages
  • Large sites: 150-500 pages

Step 3: Start the Crawl

Click the "Start Crawl" button to begin the analysis. The crawler will:

  1. Connect to the website
  2. Discover pages to crawl
  3. Extract content and metadata
  4. Analyze SEO elements
  5. Present results
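Conceptually, those five steps amount to a breadth-first walk of the site's internal links, capped at your page limit. A simplified, self-contained sketch in Python (again using requests and beautifulsoup4; this illustrates the idea, not the tool's code):

```python
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

def crawl(start_url: str, page_limit: int = 20) -> list[dict]:
    """Breadth-first crawl of internal links, capped at page_limit.
    Illustrative sketch only -- not the tool's actual implementation."""
    domain = urlparse(start_url).netloc
    queue, seen, results = deque([start_url]), {start_url}, []

    while queue and len(results) < page_limit:
        url = queue.popleft()
        try:
            response = requests.get(url, timeout=10)   # 1. connect
        except requests.RequestException:
            continue
        soup = BeautifulSoup(response.text, "html.parser")
        results.append({                               # 3-4. extract and analyze
            "url": url,
            "status": response.status_code,
            "title": soup.title.get_text(strip=True) if soup.title else "No title",
        })
        for link in soup.find_all("a", href=True):     # 2. discover pages
            absolute = urljoin(url, link["href"]).split("#")[0]
            if urlparse(absolute).netloc == domain and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return results                                     # 5. present results

for row in crawl("https://example.com", page_limit=10):
    print(row["status"], row["url"], "-", row["title"])
```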

4. Advanced Settings

Crawl Type Options

URL Crawl (Default)

Starts from the entered URL and follows internal links to discover pages.

Recommended for most sites

Sitemap Crawl

Uses the website's XML sitemap to find pages to crawl.

Best for large sites
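A sitemap crawl generally means fetching the site's XML sitemap and reading page URLs from its <loc> entries. A minimal sketch, assuming a standard sitemap lives at /sitemap.xml (real sitemaps may be nested index files or gzipped, which this ignores):

```python
import xml.etree.ElementTree as ET

import requests

def urls_from_sitemap(site: str) -> list[str]:
    """Fetch /sitemap.xml and return the page URLs it lists.
    Sketch only; assumes a single flat sitemap at the default path."""
    response = requests.get(site.rstrip("/") + "/sitemap.xml", timeout=10)
    root = ET.fromstring(response.content)
    # Sitemap tags are namespaced, so match on the local tag name instead.
    return [el.text.strip() for el in root.iter() if el.tag.endswith("loc") and el.text]

print(urls_from_sitemap("https://example.com"))
```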
Respect Robots.txt

When enabled, the crawler will check and follow the website's robots.txt file rules.
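Python's standard library ships a robots.txt parser, which shows what respecting these rules means in practice: before fetching a URL, a crawler asks whether the site's rules permit it. The /admin/ path below is just an example:

```python
from urllib.robotparser import RobotFileParser

robots = RobotFileParser("https://example.com/robots.txt")
robots.read()  # download and parse the rules

# can_fetch() answers: may this user agent request this URL?
for path in ["https://example.com/", "https://example.com/admin/"]:
    print(path, "->", "allowed" if robots.can_fetch("*", path) else "blocked")
```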

5. Analyzing Results

Understanding the Results Table

Once the crawl completes, you'll see a comprehensive table with the following columns:

URL Column
  • Full page URL
  • Clickable external link
  • Truncated for readability

Title Column
  • Page title tag content
  • Shows "No title" if missing
  • Critical for SEO ranking

H1 Column
  • Main heading content
  • Should match title theme
  • Important for content structure

Status Column
  • HTTP response codes
  • Green: 200 (OK)
  • Red: 400/500 (Errors)

Word Count
  • Total words on page
  • Indicates content depth
  • Helps identify thin content
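If you post-process crawl rows yourself, the same checks the table surfaces are easy to script. A hedged sketch; the 300-word thin-content cutoff is an assumed threshold, not a rule from the tool:

```python
def flag_issues(row: dict, thin_words: int = 300) -> list[str]:
    """Flag the problems the results table surfaces.
    thin_words is an assumed cutoff, not the tool's own threshold."""
    issues = []
    if row["status"] >= 400:
        issues.append(f"HTTP error {row['status']}")
    if not row.get("title"):
        issues.append("missing title tag")
    if not row.get("h1"):
        issues.append("missing H1")
    if row.get("word_count", 0) < thin_words:
        issues.append("possible thin content")
    return issues

sample = {"url": "https://example.com/about", "status": 200,
          "title": "About Us", "h1": None, "word_count": 120}
print(flag_issues(sample))  # ['missing H1', 'possible thin content']
```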

6. Saving and Exporting Data

Save to Project

Click the "Save Results" button to save your crawl data to a project for future reference.

  1. Click "Save Results" button
  2. Enter a descriptive name
  3. Select or create a project
  4. Confirm to save

Project Benefits

  • Access saved crawls anytime
  • Compare crawls over time
  • Share results with team members
  • Build a knowledge base
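If you also want a local copy outside a project, rows shaped like the results table export naturally to CSV. A sketch, assuming the field names used in the earlier examples:

```python
import csv

def export_csv(rows: list[dict], path: str = "crawl_results.csv") -> None:
    """Write crawl rows (like those in the results table) to a CSV file."""
    fields = ["url", "status", "title", "h1", "word_count"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)

export_csv([{"url": "https://example.com", "status": 200,
             "title": "Example Domain", "h1": "Example Domain", "word_count": 30}])
```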

7. Best Practices

Do's

  • Start with smaller page limits for testing
  • Keep "Respect Robots" enabled
  • Save important crawls to projects
  • Use descriptive names when saving
  • Crawl during off-peak hours for large sites
  • Review results systematically

Don'ts

  • Don't crawl the same site repeatedly in short periods
  • Don't set extremely high page limits unnecessarily
  • Don't ignore robots.txt restrictions
  • Don't crawl sites you don't have permission to analyze
  • Don't rely solely on automated analysis
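Several of the points above come down to rate limiting. If you script follow-up checks of your own, a fixed pause between requests is the simplest way to stay polite; the one-second delay below is an assumed courtesy value, not a documented requirement:

```python
import time

import requests

def polite_fetch(urls: list[str], delay: float = 1.0) -> list[int]:
    """Fetch URLs one at a time, pausing between requests."""
    statuses = []
    for url in urls:
        statuses.append(requests.get(url, timeout=10).status_code)
        time.sleep(delay)  # give the server breathing room
    return statuses
```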

8. Troubleshooting

Common Issues & Solutions

Crawl Fails to Start

  • Check if the URL is valid and accessible
  • Ensure the website is online
  • Try with "https://" prefix
  • Check your internet connection

No Pages Found

  • Website may block crawlers
  • Check robots.txt restrictions
  • Try switching to sitemap crawl mode
  • Verify the site has internal links

Crawl Takes Too Long

  • Reduce the page limit
  • Website may have slow response times
  • Try crawling during off-peak hours
  • Cancel and retry with fewer pages

Missing Data in Results

  • Some pages may lack title tags or H1s
  • JavaScript-heavy sites may not render fully
  • Check if pages require authentication
  • Verify page structure is standard HTML

Ready to Start Crawling?

Now that you've worked through the detailed instructions, try the crawler on your own website or a competitor's site to discover SEO opportunities and technical issues.
