What is the best tool to crawl a site with millions of pages?

iCrossing_UK

I want to crawl a site with so many pages that Xenu and Screaming Frog both crash somewhere past 200,000 pages.

What tools will allow me to crawl a site with millions of pages without crashing?

McCannSEO

Don't forget to exclude pages that don't contain the information you are looking for: query parameters that only produce duplicate content, system files, etc. That may help bring the page count down.
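To make the exclusion idea concrete, here is a minimal Python sketch of a URL filter. The parameter names and file extensions are hypothetical placeholders, not taken from any particular site or tool, so adjust them to whatever actually produces duplicates on the site in question.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical lists: adjust them to the site being crawled.
IGNORED_PARAMS = {"sessionid", "sort", "utm_source", "utm_medium", "utm_campaign"}
SKIPPED_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".gif", ".pdf", ".zip")

def normalize(url):
    """Drop ignored query parameters and the fragment so duplicates collapse."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key.lower() not in IGNORED_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path,
                       urlencode(kept), ""))

def should_crawl(url):
    """Skip static assets and other files that carry no page data."""
    return not urlsplit(url).path.lower().endswith(SKIPPED_EXTENSIONS)
```

Running every discovered link through normalize() before checking whether it has been seen means /shoes?sort=price and /shoes count as one page instead of two.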

iCrossing_UK

Only basic stuff: URL, Title, Description, and a few HTML elements.

I am aware that building a crawler would be fairly easy, but is there one out there that already does it without consuming too many resources?
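For reference, pulling those basic fields needs very little. This is a rough sketch using only Python's standard library; the PageFields class and extract_fields function are names made up for illustration, and H1 stands in for "a few HTML elements".

```python
from html.parser import HTMLParser

class PageFields(HTMLParser):
    """Collects the title, meta description, and H1 text of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.h1 = []
        self._current = None  # tag whose text is currently being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("title", "h1"):
            self._current = tag
            if tag == "h1":
                self.h1.append("")
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "h1":
            self.h1[-1] += data

def extract_fields(url, html):
    """Return the basic fields for one page as a plain dict."""
    parser = PageFields()
    parser.feed(html)
    parser.close()
    return {"url": url,
            "title": parser.title.strip(),
            "description": parser.description.strip(),
            "h1": [text.strip() for text in parser.h1]}
```

Writing each dict to a CSV file as soon as it is produced, rather than keeping every row in memory, is one way to keep resource use flat on a very large site.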

YannickVeys

For what purpose do you want to crawl the site?

A web crawler isn't really hard to write. You can probably write one in about 100 lines of code. The real question is: what do you want to get out of the crawl?
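As a rough illustration of how small such a crawler can be, here is a Python sketch of a breadth-first crawl limited to one host. It is not a production tool: the regex link extraction is crude, it ignores robots.txt and rate limiting, and the set of seen URLs lives in memory, which is an assumption that may not hold for many millions of pages. The crawl and fetch names and the max_pages parameter are made up for this example.

```python
import re
from collections import deque
from urllib.parse import urljoin, urldefrag, urlsplit
from urllib.request import urlopen

LINK_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)

def fetch(url):
    """Default fetcher (assumption: pages are small enough to read whole)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")

def crawl(start_url, fetch=fetch, max_pages=1000):
    """Breadth-first crawl of one host, yielding (url, title) as pages arrive."""
    host = urlsplit(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        fetched += 1
        match = TITLE_RE.search(html)
        title = match.group(1).strip() if match else ""
        yield url, title
        for href in LINK_RE.findall(html):
            # Resolve relative links and drop the #fragment part.
            link = urldefrag(urljoin(url, href)).url
            if urlsplit(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
```

Because crawl() is a generator, results can be written out one at a time, for example with `for url, title in crawl("http://example.com/"): print(url, title)`, so only the queue and the seen set grow with the size of the site.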
