Website Crawler Tutorial
Learn how to use the SEO by the Hour website crawler to analyze any website for SEO insights, technical issues, and content opportunities.
1. What is the Website Crawler?
The SEO by the Hour website crawler automatically visits and analyzes the pages of a website. For each page it extracts SEO data, including the page title, meta description, headings, content length, and technical information such as the HTTP status code.
SEO Analysis
Analyze titles, meta descriptions, headings, and content structure
Technical Insights
Check status codes, response times, and technical issues
Content Audit
Review content length, word counts, and content gaps
2. Getting Started
Accessing the Crawler
Navigate to the crawler tool using the sidebar navigation or by visiting /crawler directly.
Navigation Path
Sidebar → Crawler → Website Crawler Dashboard
System Requirements
- Modern web browser (Chrome, Firefox, Safari, Edge)
- Stable internet connection
- JavaScript enabled
3. Running Your First Crawl
Step 1: Enter the Website URL
In the URL input field, enter the website you want to crawl. You can enter:
- Full URL: https://example.com
- Domain only: example.com
- Subdomain: blog.example.com
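The three accepted input forms suggest that the crawler fills in a missing scheme before it starts. The tutorial does not describe how, so the following is only a minimal sketch of one way to do it; the function name `normalize_start_url` and the choice of `https://` as the default scheme are assumptions, not the crawler's documented behavior.

```python
from urllib.parse import urlsplit


def normalize_start_url(raw: str) -> str:
    """Turn user input such as 'example.com' into a full URL with a scheme.

    Hypothetical helper: the real crawler may normalize input differently.
    """
    candidate = raw.strip()
    if not candidate:
        raise ValueError("URL is empty")
    # Domain-only and subdomain-only inputs have no scheme, so prepend one.
    if "://" not in candidate:
        candidate = "https://" + candidate
    parts = urlsplit(candidate)
    # Only http(s) URLs with a host name can be crawled.
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"Not a crawlable URL: {raw!r}")
    return candidate
```

With this sketch, `normalize_start_url("blog.example.com")` yields `https://blog.example.com`, while an input such as `ftp://example.com` is rejected.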
Step 2: Set Page Limit
Choose how many pages you want to crawl (1-500 pages). For your first crawl, we recommend starting with 10-20 pages.
Recommended Limits:
- Small sites: 10-50 pages
- Medium sites: 50-150 pages
- Large sites: 150-500 pages
Step 3: Start the Crawl
Click the "Start Crawl" button to begin the analysis. The crawler will:
- 1. Connect to the website
- 2. Discover pages to crawl
- 3. Extract content and metadata
- 4. Analyze SEO elements
- 5. Present results
4. Advanced Settings
URL Crawl (Default)
Starts from the entered URL and follows internal links to discover pages.
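Following internal links from a start URL is usually done as a breadth-first traversal that stops at the page limit. The sketch below illustrates that idea with the Python standard library. It is not the crawler's actual implementation: `discover_pages`, `LinkCollector`, and the `fetch` callable (which returns a page's HTML, or `None` on failure) are hypothetical names introduced for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def discover_pages(start_url, fetch, page_limit=20):
    """Breadth-first discovery of internal pages, stopping at page_limit.

    `fetch` is a hypothetical callable: URL in, HTML string (or None) out.
    """
    host = urlsplit(start_url).hostname
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue and len(visited) < page_limit:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page could not be fetched; skip it
        visited.append(url)
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments.
            absolute, _fragment = urldefrag(urljoin(url, href))
            # Follow only links on the same host (internal links).
            if urlsplit(absolute).hostname == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

Because `fetch` is passed in, the traversal can be tried against an in-memory dictionary of pages (`site.get`) before pointing it at a real HTTP client.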
Sitemap Crawl
Uses the website's XML sitemap to find pages to crawl.
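In sitemap mode the page list comes from the `<loc>` entries of the sitemap rather than from links. As a rough sketch of that step, assuming a standard `urlset` sitemap in the sitemaps.org namespace (sitemap index files are not handled here), the URLs could be read like this; `urls_from_sitemap` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str, page_limit: int = 20) -> list[str]:
    """Return up to page_limit page URLs listed in a urlset sitemap."""
    root = ET.fromstring(xml_text)
    urls = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        if len(urls) >= page_limit:
            break
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls
```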
Respect Robots
When enabled, the crawler checks the website's robots.txt file and follows its rules.
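To see what following robots.txt rules means in practice, the sketch below uses Python's standard `urllib.robotparser` to decide whether a URL may be fetched. The user agent string `ExampleCrawler` and the helper name `build_robots_checker` are placeholders; the tutorial does not state which user agent the SEO by the Hour crawler sends.

```python
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_txt: str, user_agent: str = "ExampleCrawler"):
    """Return a function that reports whether a URL is allowed by robots.txt.

    `user_agent` is a placeholder value, not the crawler's real user agent.
    """
    parser = RobotFileParser()
    # parse() accepts the lines of an already downloaded robots.txt file.
    parser.parse(robots_txt.splitlines())

    def allowed(url: str) -> bool:
        return parser.can_fetch(user_agent, url)

    return allowed
```

For a file containing `User-agent: *` and `Disallow: /admin/`, the returned checker allows `https://example.com/blog` and rejects `https://example.com/admin/login`.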
5. Analyzing Results
Understanding the Results Table
Once the crawl completes, you'll see a comprehensive table with the following columns:
URL
- Full page URL
- Clickable external link
- Truncated for readability
Title
- Page title tag content
- Shows "No title" if missing
- Critical for SEO ranking
H1
- Main heading content
- Should match title theme
- Important for content structure
Status
- HTTP response codes
- Green: 200 (OK)
- Red: 400/500 (Errors)
Word Count
- Total words on page
- Indicates content depth
- Helps identify thin content
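The title, main heading, and word count in the results table can all be derived from a page's HTML. The sketch below shows one way to compute them with the standard library. It is an illustration under stated assumptions, not the crawler's own extraction logic: `PageSummary` and `summarize` are hypothetical names, words are counted by splitting visible text on whitespace, and text inside `<script>` and `<style>` is ignored.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Collects the <title>, the first <h1>, and a visible-text word count."""

    SKIPPED = {"script", "style"}  # content that is not visible text

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.h1_parts = []
        self.word_count = 0
        self._in_title = False
        self._in_h1 = False
        self._h1_seen = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "h1" and not self._h1_seen:
            self._in_h1 = True
        elif tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "h1" and self._in_h1:
            self._in_h1 = False
            self._h1_seen = True  # only the first <h1> is reported
        elif tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif self._skip_depth == 0:
            if self._in_h1:
                self.h1_parts.append(data)
            self.word_count += len(data.split())


def summarize(html: str) -> dict:
    """Return title, first H1, and word count for one page of HTML."""
    parser = PageSummary()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.title_parts).split())
    h1 = " ".join("".join(parser.h1_parts).split())
    # Mirror the table's behavior of showing "No title" when the tag is missing.
    return {"title": title or "No title", "h1": h1, "word_count": parser.word_count}
```

A low `word_count` from a function like this is the kind of signal the table uses to flag thin content; the threshold for "thin" is left to the reviewer.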
6. Saving and Exporting Data
Save to Project
Click the "Save Results" button to save your crawl data to a project for future reference.
- 1. Click the "Save Results" button
- 2. Enter a descriptive name
- 3. Select or create a project
- 4. Confirm to save
Project Benefits
- Access saved crawls anytime
- Compare crawls over time
- Share results with team members
- Build a knowledge base
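Comparing crawls over time comes down to diffing two result sets keyed by URL. The tutorial does not document how saved crawls are stored or exported, so the sketch below assumes each crawl is a dictionary mapping a URL to a row with `title`, `status`, and `word_count` fields; that layout and the name `compare_crawls` are hypothetical.

```python
def compare_crawls(old: dict, new: dict) -> dict:
    """Report pages added, removed, or changed between two crawls.

    Assumes each crawl maps URL -> {"title": ..., "status": ..., "word_count": ...}.
    """
    old_urls, new_urls = set(old), set(new)
    changed = {}
    for url in sorted(old_urls & new_urls):
        # Record (old value, new value) for every field that differs.
        diffs = {
            field: (old[url].get(field), new[url].get(field))
            for field in ("title", "status", "word_count")
            if old[url].get(field) != new[url].get(field)
        }
        if diffs:
            changed[url] = diffs
    return {
        "added": sorted(new_urls - old_urls),
        "removed": sorted(old_urls - new_urls),
        "changed": changed,
    }
```

A page whose status moved from 200 to 404 between two crawls would appear under `changed` with `{"status": (200, 404)}`.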
7. Best Practices
- Start with smaller page limits for testing
- Keep "Respect Robots" enabled
- Save important crawls to projects
- Use descriptive names when saving
- Crawl during off-peak hours for large sites
- Review results systematically
- Don't crawl the same site repeatedly in short periods
- Don't set extremely high page limits unnecessarily
- Don't ignore robots.txt restrictions
- Don't crawl sites you don't have permission to analyze
- Don't rely solely on automated analysis
8. Troubleshooting
Crawl Fails to Start
- Check whether the URL is valid and accessible
- Ensure the website is online
- Try adding the "https://" prefix
- Check your internet connection
No Pages Found
- Website may block crawlers
- Check robots.txt restrictions
- Try switching to sitemap crawl mode
- Verify the site has internal links
Crawl Takes Too Long
- Reduce the page limit
- Website may have slow response times
- Try crawling during off-peak hours
- Cancel and retry with fewer pages
Missing Data in Results
- Some pages may lack title tags or H1s
- JavaScript-heavy sites may not render fully
- Check whether pages require authentication
- Verify that the page structure is standard HTML