How to extract URLs from a site (without bringing the server down!)
-
Hi everybody.
One of my clients is migrating to a new ecommerce platform, and we need to get a list of URLs from the existing site to start mapping out the 301 redirects. Usually, I'd use a tool like Xenu or Integrity to crawl the site and output a list.
However, the database and server setup is so bad that it can't handle the requests from these tools, and the site goes down. This, unsurprisingly, is one of the reasons for the migration.
Does anybody know of a way to get a full list of URLs without having to make a bunch of HTTP requests that will kill the site? Any advice would be much appreciated!
-
Just a follow-up to my endorsement. It looks like Screaming Frog will let you control the number of pages crawled per second, but to do a full crawl you'll need to get the paid version (the free version only crawls 500 URLs):
http://www.screamingfrog.co.uk/seo-spider/
It's a good tool, and nice to have around, IMO.
-
Copy the site, set it up on a staging server and run http://www.xml-sitemaps.com/ on it?
-
Why not find the links to the site? You will only need to 301 the URLs that have external links; let the rest 404. I use Bing WMT, as it has the most complete collection IMO. It also exports to CSV.
-
Thanks Yannick, I don't know why I didn't think of using a scraper! Can you recommend any good code (PHP perhaps)?
-
Scrape Google?
-
Make your own scraper and keep the requests per second really low?
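For anyone who wants to try this, here is a minimal sketch of such a throttled scraper. It is in Python rather than the PHP asked about above, but the idea ports directly. The names `crawl`, `fetch_html` and `LinkParser`, the 5 second default delay and the page cap are all my own assumptions, not anything from a particular tool. It makes one request at a time, waits between requests, and only follows links on the same host.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_html(url):
    """Default fetcher (hypothetical helper): one plain GET, body as text."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, delay_seconds=5.0, max_pages=10000,
          fetch=fetch_html, sleep=time.sleep):
    """Breadth-first crawl of one host, strictly one request at a time.

    delay_seconds and max_pages are assumed defaults; tune them to
    whatever the server can tolerate.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        if fetched:
            sleep(delay_seconds)  # pause before every request after the first
        fetched += 1
        try:
            html = fetch(url)
        except Exception:
            continue  # page failed to load; its URL stays in the list
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop any #fragment
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return sorted(seen)
```

Calling `crawl("http://www.example.com/", delay_seconds=10)` would return a sorted list of every internal URL it discovered. The crawl is slow by design: at one request every 10 seconds, 5,000 pages take roughly 14 hours, so it is best left running overnight.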
-
Maybe the site has an automated sitemap somewhere?
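If a sitemap does exist, a single request can return the whole URL list. A rough Python sketch, assuming the file follows the standard sitemaps.org format; the `/sitemap.xml` location in the usage note is only the conventional guess, and `sitemap_locs` and `fetch_sitemap` are hypothetical helper names:

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Namespace used by standard sitemap files (sitemaps.org protocol)
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_locs(xml_data):
    """Parse sitemap XML (str or bytes) and return (kind, locs).

    kind is the root element name: 'urlset' for a list of pages,
    'sitemapindex' for a list of further sitemap files.
    """
    root = ET.fromstring(xml_data)
    kind = root.tag.replace(SITEMAP_NS, "")
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    return kind, locs


def fetch_sitemap(url):
    """One GET for the sitemap file, then parse it."""
    with urlopen(url, timeout=30) as response:
        return sitemap_locs(response.read())
```

For example, `kind, locs = fetch_sitemap("http://www.example.com/sitemap.xml")`. If `kind` comes back as `sitemapindex`, each entry in `locs` is itself a sitemap file that needs fetching the same way, which is still only a handful of requests.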
-
Google Webmaster Tools -> download the "Internal links" table.