How to extract URLs from a site (without bringing the server down!)

neooptic

Hi everybody.

One of my clients is migrating to a new ecommerce platform, and we need to get a list of URLs from the existing site to start mapping out the 301 redirects. Usually, I'd use a tool like Xenu or Integrity to crawl the site and output a list.

However, the database and server setup is so bad that it can't handle the requests from these tools, and they take the site down. This, unsurprisingly, is one of the reasons for the migration.

Does anybody know of a way to get a full list of URLs without having to make a bunch of HTTP requests which will kill the site? Any advice would be much appreciated!

Dr-Pete

Just a follow-up to my endorsement. It looks like Screaming Frog will let you control the number of pages crawled per second, but to do a full crawl you'll need to get the paid version (the free version only crawls 500 URLs):

http://www.screamingfrog.co.uk/seo-spider/

It's a good tool, and nice to have around, IMO.

Dan-Petrovic

Copy the site, set it up on a staging server and run http://www.xml-sitemaps.com/ on it?
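
If a sitemap generator is run against a staging copy, the resulting file still has to be turned into a plain URL list. A minimal Python sketch, assuming the output follows the standard sitemaps.org XML format (the function name and sample URLs are made up for illustration):

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_data):
    """Return every <loc> value from a sitemap (or sitemap index), in file order."""
    root = ET.fromstring(xml_data)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


# Hypothetical sample; in practice, read the generated sitemap.xml from disk.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/category/widgets</loc></url>
</urlset>"""

for url in urls_from_sitemap(sample):
    print(url)
```

Parsing a file that already exists costs the live server nothing, which is the point of the staging approach.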

AlanMosley

Why not find the links to the site? You will only need to 301 the URLs that have external links; let the rest 404. I use Bing WMT, as it has the most complete collection IMO. It also exports to a CSV.
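
A CSV export of inbound links can be reduced to a de-duplicated list of linked-to URLs with a few lines of Python. This is only a sketch: the column name "Target URL" and the sample rows are assumptions, so check the header row of the real export and adjust.

```python
import csv
import io


def linked_urls(csv_text, column="Target URL"):
    """Return the unique URLs found in one column of a link export, in first-seen order.

    The default column name is an assumption, not the actual export header.
    """
    seen = set()
    urls = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        url = (row.get(column) or "").strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Hypothetical export contents for illustration.
export = """Source URL,Target URL
http://blog.example.org/post,http://www.example.com/widgets
http://news.example.net/item,http://www.example.com/widgets
http://forum.example.org/t/1,http://www.example.com/about
"""

print(linked_urls(export))
```

The output is the set of pages that would need a 301 under this approach, and producing it makes no requests to the site at all.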

neooptic

Thanks Yannick, I don't know why I didn't think of using a scraper! Can you recommend any good code (PHP perhaps)?

YannickVeys

Scrape Google?
Make your own scraper and keep the requests per second really low?
Maybe the site has an automated sitemap somewhere?
Google Webmaster Tools: download the "internal links" table
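
The "make your own scraper" option could look something like the Python sketch below (PHP was asked about, but the idea carries over). It fetches one page at a time and waits a fixed delay between requests. The 5-second delay, the page cap, and all function names are assumptions to tune for the server in question, not tested recommendations.

```python
import time
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch(url):
    """Download a single page as text (one request, 30 second timeout)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, delay=5.0, max_pages=1000, fetch=fetch, sleep=time.sleep):
    """Breadth-first crawl of one host, pausing `delay` seconds between requests.

    Returns every same-host URL discovered, including any that were queued
    but not fetched because max_pages was reached.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        if fetched:
            sleep(delay)  # throttle: never more than one request per `delay` seconds
        fetched += 1
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that error out instead of retrying
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            link = urldefrag(urljoin(url, href)).url  # absolute URL, no #fragment
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return sorted(seen)


if __name__ == "__main__":
    # Hypothetical start URL; replace with the real site.
    for found in crawl("http://www.example.com/", delay=5.0):
        print(found)
```

Because requests are strictly sequential, the load on the server never exceeds one page per delay interval. The crawl just takes a long time, so it is best left running overnight.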
