New CMS - 100,000 old URLs - use robots.txt to block?
-
Hello.
My website has recently switched to a new CMS.
Over the last 10 years or so, we've used three different CMS platforms on our current domain. As expected, this has left us with a lot of legacy URLs.
Until this most recent iteration, we were unable to 301 redirect or use any page-level indexation techniques such as rel="canonical".
Using SEOmoz's tools and Google Webmaster Tools, I've been able to locate and redirect all pertinent, PageRank-bearing "older" URLs to their new counterparts. However, according to the 'Not Found' report in Google Webmaster Tools, there are literally over 100,000 additional URLs it's still trying to find.
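To make a 100K-row 'Not Found' export manageable, one rough approach is to group the URLs by top-level directory so the biggest legacy folders stand out as candidates for blocking or bulk 301 rules. A quick sketch (the URLs below are hypothetical examples, not our real paths):

```python
from collections import Counter
from urllib.parse import urlparse

def group_by_directory(urls):
    """Count 404 URLs by their first path segment, so the largest
    legacy directories stand out as candidates for robots.txt
    blocks or pattern-based 301 rules."""
    counts = Counter()
    for url in urls:
        path = urlparse(url).path
        segments = [s for s in path.split("/") if s]
        top = "/" + segments[0] + "/" if segments else "/"
        counts[top] += 1
    return counts

# Hypothetical legacy URLs from a 'Not Found' export:
not_found = [
    "http://example.com/old-cms/page1.asp",
    "http://example.com/old-cms/page2.asp",
    "http://example.com/archive/2005/news.html",
]
print(group_by_directory(not_found).most_common(1))
# → [('/old-cms/', 2)]
```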
My question is: is there an advantage to using robots.txt to stop search engines from crawling some of these older directories? Currently we allow everything, only using page-level robots meta tags to disallow where necessary.
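For context, blocking those legacy directories in robots.txt would look something like this (the directory names here are hypothetical, not our real paths):

```text
User-agent: *
Disallow: /old-cms/
Disallow: /archive-2005/
```

Worth noting that robots.txt only stops crawling; it doesn't by itself remove URLs that are already indexed.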
Thanks!
-
Great stuff. Thanks again for your advice, much appreciated!
-
It can be really tough to gauge the impact - it depends on how suddenly the 404s popped up, how many you're seeing (webmaster tools, for Google and Bing, is probably the best place to check) and how that number compares to your overall index. In most cases, it's a temporary problem and the engines will sort it out and de-index the 404'ed pages.
I'd just make sure that all of these 404s are intentional and none are valuable pages or occurring because of issues with the new CMS itself. It's easy to overlook something when you're talking about 100K pages, and it could be more than just a big chunk of 404s.
-
Thanks for the advice! The previous website did have a robots.txt file with a few wildcards declared. A lot of the URLs I'm seeing are NOT indexed anymore and haven't been for many years.
So, I think the 'stop the bleeding' method will work, and I'll just have to proceed with investigating and applying 301s as necessary.
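For anyone following along, the pattern-based 301s can live in the site's .htaccess or vhost config. A sketch using Apache's mod_alias (the paths below are hypothetical examples, not our real structure):

```apache
# Redirect an individual legacy page to its new counterpart
Redirect 301 /old-cms/about.asp /about/

# Or catch a whole legacy extension in one pattern rule
RedirectMatch 301 ^/old-cms/(.*)\.asp$ /$1/
```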
Any idea what kind of an impact this is having on our rankings? I submitted a valid sitemap, crawl paths are good, and major 301s are in place. We've been hit particularly hard in Bing.
Thanks!
-
I've honestly had mixed luck with using robots.txt to block pages that have already been indexed. It tends to be unreliable at a large scale (good for prevention, poor for cures). I endorsed @Optimize, though, because if robots.txt is your only option, it can help "stop the bleeding". Sometimes you use the best you have.
It's a bit trickier with 404s ("Not Found"). Technically, there's nothing wrong with having 404s (it's a perfectly valid signal), but if you create 100,000 all at once, that can sometimes raise red flags with Google. Some kind of mass removal may prevent problems caused by Google crawling thousands of Not Founds all at once.
If these pages are isolated in a folder, then you can use Google Webmaster Tools to remove the entire folder (after you block it). This is MUCH faster than robots.txt alone, but you need to make sure everything in the folder can be dumped out of the index.
-
Absolutely. Not Founds and missing content are a concern, and cleaning them up will help your rankings.
-
Thanks a lot! I should have been a little more specific. My exact question is: if I move the crawlers' attention away from these 'Not Found' pages, will that benefit the indexation of the now-valid pages? Are the 'Not Founds' really a concern? Will this help my indexation and/or rankings?
Thanks!
-
That's a loaded question without knowing exactly what you're doing, but let me offer this advice: stop the bleeding with robots.txt. This is the easiest way to quickly resolve that many "Not Founds".
Then you can slowly pick away at the issue and figure out whether some of the "Not Founds" actually have content and are just pointing to the wrong place.
On a recent project we had over 200,000 additional URLs reported as "Not Found". We stopped the bleeding, and then slowly over the course of a month, spending a couple of hours a week, found another 5,000 pages of content that we redirected correctly and removed from the robots.txt blocks.
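If it helps anyone doing the same, a lookup-table approach keeps a few thousand one-off redirects manageable as you recover pages. A sketch using Apache's RewriteMap (this must go in the server or vhost config, not .htaccess; the filenames are hypothetical):

```apache
RewriteEngine On

# Plain-text map: one "old-path new-url" pair per line,
# e.g.  /old-cms/page1.asp /new-section/page1/
RewriteMap legacymap txt:/etc/apache2/legacy-redirects.txt

# If the requested path has an entry in the map, 301 to it
RewriteCond ${legacymap:$1} !=""
RewriteRule ^(.*)$ ${legacymap:$1} [R=301,L]
```

As you rescue each batch of pages, you just append lines to the map file instead of touching the server config again.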
Good luck.