How to extract URLs from a site (without bringing the server down!)
-
Hi everybody.
One of my clients is migrating to a new ecommerce platform, and we need to get a list of urls from the existing site to start mapping out the 301 redirects. Usually, I'd use a tool like Xenu or Integrity to crawl and output a list.
However, the database and server setup is so bad that it can't handle the requests from these tools and it sends the site down. This, unsurprisingly, is one of the reasons for the migration.
Does anybody know of a way to get a full list of urls without having to make a bunch of http requests which will kill the site? Any advice would be much appreciated!
-
Just a follow-up to my endorsement. It looks like Screaming Frog will let you control the number of pages crawled per second, but to do a full crawl you'll need to get the paid version (the free version only crawls 500 URLs):
http://www.screamingfrog.co.uk/seo-spider/
It's a good tool, and nice to have around, IMO.
-
Copy the site, set it up on a staging server and run http://www.xml-sitemaps.com/ on it?
-
why not find the links to the site, becauase you will only need to 301 the urls with extenal links. let teh rest 404. i use Bing WMT as it has a most complete collection IMO. they also export to a csv
-
Thanks Yannick, I don't know why I didn't think of using a scraper! Can you recommend any good code (PHP perhaps)?
-
-
Scrape Google?
-
Make your own scraper and keep the requests per second really low ?
-
Maybe the site has an automated sitemap somewhere ?
-
Google webmaster tools -> download "internal links" table
-
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Should I make a new URL just so it can include a target keyword, then 301 redirect the old URL?
This is for an ecommerce site, and the company I'm working with has started selling a new line of products they want to promote.Should I make a new URL just so it can include a target keyword, then 301 redirect the old URL? One of my concerns is losing a little bit of link value from redirecting. Thank you for reading!
Technical SEO | | DA20130 -
What is the best practice to seperate different locations and languages in an URL? At the moment the URL is www.abc.com/ch/de. Is there a better way to structure the URL from an SEO perspective?
I am looking for a solution for using a new URL structure without using www.abc.com**/ch/de** in the URL to deliver the right languages in specific countries where more than one language are spoken commonly. I am looking forward to your ideas!
Technical SEO | | eviom0 -
Will sitemap generated in Yoast for a combined wordpress/magento site map entire site ?
Hi For an ecommerce site thats been developed via a combination of wordpress and magento and has yoast installed, will the sitemap (& other yoast features) map (& apply to) the entire site or just wordpress aspects ? In other words does one need to do anything else to have a full sitemap for a combined magento/wordpress site or will Yoast cover it all ? This link seems to suggest should be fine but seeing if anyone else encountered this and had problems or if straightforward ? http://fishpig.co.uk/wordpress-integration/docs/plugins.html cheers dan
Technical SEO | | Dan-Lawrence0 -
Why my site is not indexing in google
In google webmaster i have updated my sitemap in Mar 6th..There is around 22000 links..But google fetched only 5300 links for long time...
Technical SEO | | Rajesh.Chandran
I waited for 1 month till no improvement in google index..So apr6th we have uploaded new sitemap (1200 links totally)..,But only 4 links indexed in google ..
why google not indexing my urls? Is this affect our ranking in SERP? How many links are advisable to submit in sitemap for a website?0 -
How to link site.com/blog or site.com/blog/
Hello friends, I have a very basic question but I can not find the right answer... I have made my blog linkbuilding using the adress "mysite.com/blog" but now im not sure if is better to do the linkbuilding to "mysite.com**/blog/ "** Is there any diference? Thanks...
Technical SEO | | lans27870 -
Redirect old URL's from referring sites?
Hi I have just came across some URL's from the previous web designer and the site structure has now changed. There are some links on the web however that are still pointing at the old deep weblinks. Without having to contact each site it there a way to automatically sort the links from the old structure www.mydomain.com/show/english/index.aspx to just www.mydomain.com Many Thanks
Technical SEO | | ocelot0 -
Young site trying hard, but banging head against the wall -- Site Review
Hi All New to PRO but we're seriously committed to getting this working. And firstly thank you to anyone who offers any useful thoughts and insights. We've launched a new site, unfortunately late to the market for the season and are really struggling to get search engine recognition. Site: http://www.ignitehats.co.uk/ We're continuously adding new content, slowly gathering more links and working hard to promote socially. But even on our clearest search terms like "Ignite hats" we're down on page 4. Both GWT and the Seomoz tools highlight no big problems (a few titles that are too long) but otherwise nothing. Maybe wrongly we requested that the Google spam team review our site incase it was being penalised, but got a template response saying the site was not in their spam system (phew, there wasn't a reason it should be we believe). We're wondering if this is just that our site is just too young? It's been live for 6 weeks. But worry maybe this is not the case. We've had success with another site we run much sooner than this. Any help or pointers would be really appreciated. Similar stories and what others have done, at least to give us some confidence to carry on would be great. Thanks for reading.
Technical SEO | | JHill0 -
Recently revamped site structure - now not even ranking for brand name, but lots of content - what happened? (Yup, the site has been crawled a few times since) Any ideas? Did I make a classic mistake? Any advise appreciated :)
I've completely disappeared off Google - what happened? Even my brand name keyword does not bring up my website - I feel lost, confused and baffled on what my next steps should be. ANY advice would be welcome, since there's no going back to the way the site was set up.
Technical SEO | | JeanieWalker0