How to extract URLs from a site (without bringing the server down!)
-
Hi everybody.
One of my clients is migrating to a new ecommerce platform, and we need a list of URLs from the existing site so we can start mapping out the 301 redirects. Usually, I'd use a tool like Xenu or Integrity to crawl the site and output a list.
However, the database and server setup is so bad that it can't handle the requests from these tools, and a crawl takes the site down. This, unsurprisingly, is one of the reasons for the migration.
Does anybody know of a way to get a full list of URLs without making a bunch of HTTP requests that will kill the site? Any advice would be much appreciated!
-
Just a follow-up to my endorsement. It looks like Screaming Frog will let you control the number of pages crawled per second, but to do a full crawl you'll need to get the paid version (the free version only crawls 500 URLs):
http://www.screamingfrog.co.uk/seo-spider/
It's a good tool, and nice to have around, IMO.
-
Copy the site, set it up on a staging server and run http://www.xml-sitemaps.com/ on it?
-
Why not find the links to the site? You will only need to 301 the URLs that have external links; let the rest 404. I use Bing WMT, as it has the most complete collection IMO. It also exports to a CSV.
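If the export is a plain CSV, a few lines of Python can boil it down to the list of pages that actually need a 301. This is only a sketch: the column name `Target URL`, the host `shop.test` and the file name are made up for illustration, so check the header row of whatever your tool really exports.

```python
import csv
from urllib.parse import urlparse


def linked_urls(csv_file, column, host):
    """Return the unique URLs on `host` found in one column of a CSV export.

    `column` is whatever the export calls the linked-to page. The name
    used in the usage example below is an assumption, not a documented header.
    """
    urls = set()
    for row in csv.DictReader(csv_file):
        url = (row.get(column) or "").strip()
        # Keep only links that point at the site being migrated.
        if url and urlparse(url).netloc.lower() == host.lower():
            urls.add(url)
    return sorted(urls)


# Usage (hypothetical file and column names):
# with open("backlinks.csv", newline="", encoding="utf-8") as f:
#     for url in linked_urls(f, "Target URL", "shop.test"):
#         print(url)
```

No requests hit the old server at all, since everything comes from the export.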
-
Thanks Yannick, I don't know why I didn't think of using a scraper! Can you recommend any good code (PHP perhaps)?
-
Scrape Google?
-
Make your own scraper and keep the requests per second really low?
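A rough sketch of what that could look like, in Python rather than PHP and using only the standard library. It fetches one page at a time and sleeps between requests. The 5 second delay and the page cap are guesses, not values tested against this server, and `crawl`, `fetch_html` and `LinkParser` are just names picked for the example.

```python
import time
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_html(url):
    """Default fetcher: a single GET, body returned as text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, delay_seconds=5.0, max_pages=10000,
          fetch=fetch_html, sleep=time.sleep):
    """Breadth-first crawl of a single host, one request at a time."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        if fetched:
            sleep(delay_seconds)  # pause before every request after the first
        fetched += 1
        try:
            html = fetch(url)
        except Exception:
            continue  # page errored out; its URL stays in the list
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop any #fragment.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return sorted(seen)


# Usage (placeholder domain):
# for url in crawl("http://www.example.com/", delay_seconds=5.0):
#     print(url)
```

At one request every 5 seconds a few thousand pages will take hours, so it is something to leave running overnight and to stop if the server starts to struggle.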
-
Maybe the site has an automated sitemap somewhere?
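If there is one, it costs a single request and the URLs can be pulled out locally. A minimal Python sketch, assuming the sitemap follows the standard sitemaps.org `urlset` format. The `/sitemap.xml` path in the usage comment is only the conventional location, not something confirmed for this site, and the function names are made up.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def fetch_sitemap(url):
    """Download the sitemap with a single GET and return the raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def urls_from_sitemap(xml_data):
    """Return every <url><loc> value from a sitemap document (str or bytes)."""
    root = ET.fromstring(xml_data)
    locs = root.findall(f"{SITEMAP_NS}url/{SITEMAP_NS}loc")
    return [loc.text.strip() for loc in locs if loc.text]


# Usage (placeholder domain and assumed path):
# for url in urls_from_sitemap(fetch_sitemap("http://www.example.com/sitemap.xml")):
#     print(url)
```

A sitemap index file (one that lists other sitemaps instead of pages) would need one extra request per child sitemap, which this sketch does not handle.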
-
Google webmaster tools -> download "internal links" table
-