How to extract URLs from a site (without bringing the server down!)
-
Hi everybody.
One of my clients is migrating to a new ecommerce platform, and we need to get a list of URLs from the existing site to start mapping out the 301 redirects. Usually, I'd use a tool like Xenu or Integrity to crawl the site and output a list.
However, the database and server setup is so bad that it can't handle the requests from these tools and it sends the site down. This, unsurprisingly, is one of the reasons for the migration.
Does anybody know of a way to get a full list of URLs without having to make a bunch of HTTP requests that will kill the site? Any advice would be much appreciated!
-
Just a follow-up to my endorsement. It looks like Screaming Frog will let you control the number of pages crawled per second, but to do a full crawl you'll need to get the paid version (the free version only crawls 500 URLs):
http://www.screamingfrog.co.uk/seo-spider/
It's a good tool, and nice to have around, IMO.
-
Copy the site, set it up on a staging server and run http://www.xml-sitemaps.com/ on it?
-
Why not find the links to the site? You will only need to 301 the URLs with external links; let the rest 404. I use Bing WMT, as it has the most complete collection, IMO. They also export to a CSV.
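If you go the export route, something like this rough Python sketch could pull the unique linked-to URLs out of the CSV so you have a starting list for the redirect map. The function name `linked_urls` and the column name "Target Url" are my own assumptions, not the real export format; check the header row of your file and pass whatever column actually holds the linked page.

```python
import csv
import io


def linked_urls(csv_text, url_column):
    """Return the unique, non-empty values of one CSV column, in first-seen order.

    url_column is whatever header your export uses for the linked page;
    nothing here is specific to any particular tool's export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    urls = ((row.get(url_column) or "").strip() for row in reader)
    # dict.fromkeys drops duplicates while keeping the original order
    return list(dict.fromkeys(u for u in urls if u))


if __name__ == "__main__":
    # Hypothetical export with made-up column names and URLs
    export = (
        "Target Url,Source Url\n"
        "http://example.test/a,http://blog.test/1\n"
        "http://example.test/a,http://blog.test/2\n"
        "http://example.test/b,http://news.test/3\n"
    )
    for url in linked_urls(export, "Target Url"):
        print(url)
```

This only reads a file you have already downloaded, so it puts no load on the live site.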
-
Thanks Yannick, I don't know why I didn't think of using a scraper! Can you recommend any good code (PHP perhaps)?
-
Scrape Google?
-
Make your own scraper and keep the requests per second really low?
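For what it's worth, here is a minimal sketch of that idea in Python (standard library only), assuming a single-threaded crawl with a fixed pause between requests is gentle enough for the server. The names `crawl`, `fetch_html` and `LinkParser`, the 5-second default delay and the 1000-page cap are all my own placeholders, not anything from a real tool; tune them to what the site can take.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Default fetcher: one plain GET, body returned as text."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, delay_seconds=5.0, max_pages=1000,
          fetch=fetch_html, sleep=time.sleep):
    """Breadth-first crawl of one host, one request at a time.

    Sleeps delay_seconds between requests so the server only ever sees
    a slow trickle of traffic. Returns every same-host URL discovered.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        if fetched:
            sleep(delay_seconds)  # throttle: pause before every request but the first
        fetched += 1
        try:
            html = fetch(url)
        except OSError:
            continue  # network or HTTP error: skip this page, keep going
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return sorted(seen)
```

You would call it as `crawl("http://www.example.com/", delay_seconds=10)` and write the returned list to a file. It ignores robots.txt and JavaScript-generated links, so treat it as a starting point rather than a complete crawler.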
-
Maybe the site has an automated sitemap somewhere?
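If there is one, a single request for the sitemap file gets you the whole list. A rough Python sketch for pulling the URLs out of it, assuming the file follows the standard sitemaps.org XML format (the function name `urls_from_sitemap` is just my own):

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap and sitemap index files
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text):
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


if __name__ == "__main__":
    # Made-up sample; in practice, save the site's sitemap file and read it in
    sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>http://example.test/</loc></url>
      <url><loc>http://example.test/shoes</loc></url>
    </urlset>"""
    for url in urls_from_sitemap(sample):
        print(url)
```

If the file turns out to be a sitemap index, the `<loc>` values will be the addresses of further sitemap files, so you would fetch each of those (slowly) and run them through the same function.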
-
Google Webmaster Tools -> download the "internal links" table
-