New CMS - 100,000 old URLs - use robots.txt to block?
-
Hello.
My website recently switched to a new CMS.
Over the last 10 years or so, we've used three different CMS platforms on our current domain. As expected, this has left us with a lot of legacy URLs.
Up until this most recent iteration, we were unable to 301 redirect or use any page-level indexation techniques like rel="canonical".
Using SEOmoz's tools and Google Webmaster Tools, I've been able to locate and redirect all of the pertinent, PageRank-bearing "older" URLs to their new counterparts. However, according to the GWMT 'Not Found' report, there are literally over 100,000 additional URLs out there that Google is still trying to find.
My question is: is there an advantage to using robots.txt to stop search engines from looking for some of these older directories? Currently we allow everything, and only use page-level robots meta tags to disallow where necessary.
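For example, the kind of robots.txt rules I'm considering would look something like this (the directory names below are just placeholders, not our real paths):

    User-agent: *
    Disallow: /old-cms-one/
    Disallow: /old-cms-two/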
Thanks!
-
Great stuff... thanks again for your advice, much appreciated!
-
It can be really tough to gauge the impact - it depends on how suddenly the 404s popped up, how many you're seeing (Webmaster Tools, for both Google and Bing, is probably the best place to check), and how that number compares to your overall index. In most cases it's a temporary problem, and the engines will sort it out and de-index the 404'ed pages.
I'd just make sure that all of these 404s are intentional and none are valuable pages or occurring because of issues with the new CMS itself. It's easy to overlook something when you're talking about 100K pages, and it could be more than just a big chunk of 404s.
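If it helps, one quick way to sanity-check is to run the exported 'Not Found' URLs through a small script and eyeball the status codes. A minimal sketch in Python, assuming the URLs are saved to a plain text file, one per line (the filename is hypothetical):

    import urllib.request
    import urllib.error

    # Print the HTTP status for each URL, so you can spot pages that
    # unexpectedly resolve (or that should be 301s rather than 404s).
    with open("not_found_urls.txt") as f:
        for line in f:
            url = line.strip()
            if not url:
                continue
            try:
                status = urllib.request.urlopen(url, timeout=10).getcode()
            except urllib.error.HTTPError as e:
                status = e.code  # e.g. 404 or 410
            except urllib.error.URLError:
                status = "unreachable"
            print(status, url)

Anything that comes back 200 is worth a second look before you write it off as an intentional 404.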
-
Thanks for the advice! The previous website did have a robots.txt file with a few wildcards declared. A lot of the URLs I'm seeing are NOT indexed anymore and haven't been for many years.
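For reference, the wildcard lines looked roughly like this (the patterns below are illustrative, not our real ones):

    User-agent: *
    Disallow: /*?sessionid=
    Disallow: /*.php$

(Googlebot and Bingbot both support the * and $ wildcards, even though they're not part of the original robots.txt standard.)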
So, I think the 'stop the bleeding' method will work, and I'll just have to proceed with investigating and applying 301s as necessary.
Any idea what kind of impact this is having on our rankings? I've submitted a valid sitemap, crawl paths are good, and the major 301s are in place. We've been hit particularly hard in Bing.
Thanks!
-
I've honestly had mixed luck with using Robots.txt to block pages that have already been indexed. It tends to be unreliable at a large scale (good for prevention, poor for cures). I endorsed @Optimize, though, because if Robots.txt is your only option, it can help "stop the bleeding". Sometimes, you use the best you have.
It's a bit trickier with 404s ("Not Found"). Technically there's nothing wrong with having 404s (a 404 is a perfectly valid signal), but if you create 100,000 of them all at once, that can sometimes raise red flags with Google. Some kind of mass removal may prevent the problems that come from Google crawling thousands of Not Founds all at once.
If these pages are isolated in a folder, then you can use Google Webmaster Tools to remove the entire folder (after you block it). This is MUCH faster than Robots.txt alone, but you need to make sure everything in the folder can be dumped out of the index.
-
Absolutely. Not Founds and no-content pages are a concern. Getting them cleaned up will help your ranking...
-
Thanks a lot! I should have been a little more specific... my exact question is: if I move the crawlers' attention away from these 'Not Found' pages, will that benefit the indexation of the now-valid pages? Are the 'Not Founds' really a concern? Will this help my indexation and/or rankings?
Thanks!
-
Loaded question without knowing exactly what you are doing... but let me offer this advice: stop the bleeding with robots.txt. This is the easiest way to quickly resolve that many 'Not Founds'.
Then you can slowly pick away at the issue and figure out whether some of the 'Not Founds' really do have content and the site is just sending visitors to the wrong place...
On a recent project we had over 200,000 additional URLs coming up 'Not Found'. We stopped the bleeding, and then slowly, over the course of a month, spending a couple of hours a week, we found another 5,000 pages of real content that we redirected correctly and removed from the robots.txt block...
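For the pages you do rescue, the redirect itself is simple. A sketch in Apache .htaccess form, with made-up paths (use whatever your server supports):

    # Permanently redirect an old CMS URL to its new counterpart
    Redirect 301 /old-cms/widget-page.html http://www.example.com/products/widget/

Once a page is redirected, remember to pull it back out of any robots.txt block, or the engines will never get to see the 301.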
Good luck.