How to extract URLs from a site (without bringing the server down!)

neooptic

Hi everybody.

One of my clients is migrating to a new ecommerce platform, and we need a list of URLs from the existing site to start mapping out the 301 redirects. Usually, I'd use a tool like Xenu or Integrity to crawl the site and output a list.

However, the database and server setup is so bad that it can't handle the requests from these tools, and they take the site down. This, unsurprisingly, is one of the reasons for the migration.

Does anybody know of a way to get a full list of URLs without making a bunch of HTTP requests that will kill the site? Any advice would be much appreciated!

Dr-Pete

Just a follow-up to my endorsement. It looks like Screaming Frog will let you control the number of pages crawled per second, but to do a full crawl you'll need to get the paid version (the free version only crawls 500 URLs):

http://www.screamingfrog.co.uk/seo-spider/

It's a good tool, and nice to have around, IMO.

Dan-Petrovic

Copy the site, set it up on a staging server and run http://www.xml-sitemaps.com/ on it?
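
If the sitemap generator produces a standard sitemap.xml, the URL list can be pulled out of that file offline, with no further requests to the live server. A minimal Python sketch, assuming the file follows the sitemaps.org 0.9 schema; the function name and the sample URLs are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files (sitemaps.org protocol 0.9).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text):
    """Return every <loc> value found in a sitemap document, in file order."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


# Illustrative sample only; a real run would read the generated sitemap.xml.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/products/widget</loc></url>
</urlset>"""

for url in urls_from_sitemap(sample):
    print(url)
```

Since a sitemap index file also uses `<loc>` elements, the same function would return the child sitemap URLs for an index, which then need to be processed one by one.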

AlanMosley

Why not find the links pointing to the site? You only need to 301 the URLs that have external links; let the rest 404. I use Bing WMT, as it has the most complete collection, IMO. It also exports to a CSV.
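
Assuming the export is a CSV with one linked-to URL per row, a short Python sketch can reduce it to a de-duplicated list of pages that need redirects. The column name "Target URL" is a placeholder, not the actual header from Bing's export, so adjust it to whatever the file uses:

```python
import csv
import io


def linked_urls(csv_file, column="Target URL"):
    """Return the unique, non-empty URLs from one CSV column, in first-seen order.

    `column` is a hypothetical header name; check the real export for the
    correct one.
    """
    seen = set()
    ordered = []
    for row in csv.DictReader(csv_file):
        url = (row.get(column) or "").strip()
        if url and url not in seen:
            seen.add(url)
            ordered.append(url)
    return ordered


# Made-up rows standing in for an exported inbound-links report.
sample_export = io.StringIO(
    "Source URL,Target URL\n"
    "http://blog.example.org/post,http://www.example.com/products/widget\n"
    "http://forum.example.net/t/1,http://www.example.com/products/widget\n"
    "http://news.example.org/a,http://www.example.com/about\n"
)

for url in linked_urls(sample_export):
    print(url)
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object in place of `sample_export`.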

neooptic

Thanks Yannick, I don't know why I didn't think of using a scraper! Can you recommend any good code (PHP perhaps)?

YannickVeys

Scrape Google?
Make your own scraper and keep the requests per second really low?
Maybe the site has an automated sitemap somewhere?
Google Webmaster Tools -> download the "Internal Links" table
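
For the home-made scraper option, here is a rough Python sketch (not PHP) of a crawler that fetches one page at a time and pauses between requests. It is only an illustration: the function names, the five-second default delay, and the page cap are assumptions to tune for the server in question, and it only follows plain `<a href>` links on the same host.

```python
import time
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download one page and return its body as text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, delay_seconds=5.0, max_pages=1000,
          fetch_page=fetch, sleep=time.sleep):
    """Breadth-first crawl of a single host, one request at a time.

    Sleeps `delay_seconds` before every request after the first, and
    stops after `max_pages` fetches. Returns the sorted list of
    same-host URLs discovered.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        if fetched:
            sleep(delay_seconds)  # throttle: never more than one request per delay
        fetched += 1
        try:
            html = fetch_page(url)
        except OSError:
            continue  # network/HTTP errors: skip the page, keep crawling
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before comparing.
            absolute = urldefrag(urljoin(url, href)).url
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return sorted(seen)
```

Calling `crawl("http://www.example.com/", delay_seconds=10)` would make at most one request every ten seconds. The slow pace means a large site could take hours, so running it overnight and writing the result to a file is probably the practical approach.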
