New CMS - 100,000 old URLs - use robots.txt to block them?
-
Hello.
My website recently switched to a new CMS.
Over the last 10 years or so, we've used three different CMSs on our current domain. As expected, this has left us with lots of legacy URLs.
Up until this most recent iteration, we were unable to 301 redirect or use any page-level indexation controls like rel="canonical".
Using SEOmoz's tools and GWMT, I've been able to locate and redirect all of the pertinent, PageRank-bearing "older" URLs to their new counterparts. However, according to the Google Webmaster Tools 'Not Found' report, there are literally over 100,000 additional URLs out there that it's still trying to find.
My question is: is there an advantage to using robots.txt to stop search engines from looking for some of these older directories? Currently we allow everything, only using page-level robots meta tags to disallow where necessary.
Thanks!
-
Great stuff - thanks again for your advice, much appreciated!
-
It can be really tough to gauge the impact - it depends on how suddenly the 404s popped up, how many you're seeing (Webmaster Tools, for both Google and Bing, is probably the best place to check), and how that number compares to your overall index. In most cases, it's a temporary problem and the engines will sort it out and de-index the 404'd pages.
I'd just make sure that all of these 404s are intentional and none are valuable pages or occurring because of issues with the new CMS itself. It's easy to overlook something when you're talking about 100K pages, and it could be more than just a big chunk of 404s.
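If you want a quick way to run that sanity check on a sample, something like this rough sketch can help - it only uses the Python standard library, and the input file ("sample_old_urls.txt", one old URL per line) is just an assumption:

```python
# Rough sketch: spot-check a sample of the old URLs to confirm each one
# either redirects to a real new page or 404s on purpose. Standard library
# only; "sample_old_urls.txt" (one URL per line) is an assumed input file.
import urllib.request
import urllib.error

def check(url):
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            # Redirects are followed, so a 200 here with a different final
            # URL means the 301 chain is working.
            return resp.status, resp.geturl()
    except urllib.error.HTTPError as e:
        # 404/410 etc. -- fine as long as the page really is meant to be gone.
        return e.code, url

with open("sample_old_urls.txt") as f:
    for url in (line.strip() for line in f if line.strip()):
        status, final_url = check(url)
        print(status, url, "->", final_url)
```

A 200 that lands on a different final URL means the redirect is in place; a 404 is fine as long as that page really is meant to be gone.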
-
Thanks for the advice! The previous website did have a robots.txt file with a few wildcards declared. A lot of the URLs I'm seeing are NOT indexed anymore and haven't been for many years.
So, I think the 'stop the bleeding' method will work, and I'll just have to proceed with investigating and applying 301s as necessary.
Any idea what kind of an impact this is having on our rankings? I submitted a valid sitemap, crawl paths are good, and major 301s are in place. We've been hit particularly hard in Bing.
Thanks!
-
I've honestly had mixed luck with using Robots.txt to block pages that have already been indexed. It tends to be unreliable at a large scale (good for prevention, poor for cures). I endorsed @Optimize, though, because if Robots.txt is your only option, it can help "stop the bleeding". Sometimes, you use the best you have.
It's a bit trickier with 404s ("Not Found"). Technically, there's nothing wrong with having 404s (a 404 is a perfectly valid signal), but if you create 100,000 of them all at once, that can sometimes raise red flags with Google. Some kind of mass removal may head off problems caused by Google crawling thousands of 'not founds' all at once.
If these pages are isolated in a folder, then you can use Google Webmaster Tools to remove the entire folder (after you block it). This is MUCH faster than Robots.txt alone, but you need to make sure everything in the folder can be dumped out of the index.
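As a rough sketch of that idea (the directory names below are placeholders, not your actual structure), you can even dry-run a proposed robots.txt block with Python's standard library before you request the folder removal, just to be sure it catches the dead directories without touching the new ones:

```python
# Minimal sketch: dry-run a proposed robots.txt block before requesting
# folder removal in Webmaster Tools. Directory names are placeholders.
from urllib.robotparser import RobotFileParser

proposed_robots_txt = """\
User-agent: *
Disallow: /old-cms/
Disallow: /archive/
"""

rp = RobotFileParser()
rp.parse(proposed_robots_txt.splitlines())

test_urls = [
    "http://www.example.com/old-cms/widgets.asp",  # old page - should be blocked
    "http://www.example.com/products/widgets/",    # new page - must stay crawlable
]
for url in test_urls:
    print("BLOCKED" if not rp.can_fetch("*", url) else "ALLOWED", url)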
-
Absolutely. 'Not founds' and no-content pages are a concern, so cleaning them up should help your rankings.
-
Thanks a lot! I should have been a little more specific. My exact question is: if I move the crawlers' attention away from these 'Not Found' pages, will that benefit the indexation of the now-valid pages? Are the 'Not Founds' really a concern? Will this help my indexation and/or rankings?
Thanks!
-
That's a loaded question without knowing exactly what you're doing, but let me offer this advice: stop the bleeding with robots.txt. That's the easiest way to quickly deal with that many 'not founds'.
Then you can slowly pick away at the issue and figure out whether some of the 'not founds' really do have content and are just being sent to the wrong place.
On a recent project we had over 200,000 additional URLs reported as 'not found'. We stopped the bleeding and then slowly, over the course of a month, spending a couple of hours a week, found another 5,000 pages of content that we redirected correctly and removed from the robots.txt block.
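If it helps, here's a rough sketch of the kind of triage you can run for that (not the exact process we used - the export file name and the "URL" column header are assumptions, so match them to whatever your Webmaster Tools download actually contains):

```python
# Rough sketch: bucket a "Not Found" export by top-level directory so you
# can see which old sections account for most of the 404s and tackle the
# biggest ones first. File name and column header are assumptions.
import csv
from collections import Counter
from urllib.parse import urlparse

buckets = Counter()
with open("crawl_errors_not_found.csv", newline="") as f:
    for row in csv.DictReader(f):
        path = urlparse(row["URL"]).path.strip("/")
        top_dir = "/" + path.split("/")[0] if path else "/"
        buckets[top_dir] += 1

# Biggest offenders first: start your 301 mapping (or robots.txt blocks) here.
for directory, count in buckets.most_common(20):
    print(count, directory)
```

Once you can see where the bulk of them live, it's much easier to decide which folders get a proper 301 map and which ones just get blocked and removed.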
Good luck.