New CMS - 100,000 old URLs - use robots.txt to block them?
-
Hello.
My website has recently switched to a new CMS.
Over the last 10 years or so, we've used three different CMS platforms on our current domain. As expected, this has resulted in a lot of legacy URLs.
Up until this most recent iteration, we were unable to 301 redirect or use any page-level indexation techniques like rel="canonical".
Using SEOmoz's tools and Google Webmaster Tools (GWMT), I've been able to locate and redirect all of the pertinent, PageRank-bearing older URLs to their new counterparts. However, according to GWMT's 'Not Found' report, there are literally over 100,000 additional URLs out there that it's still trying to find.
My question is: is there an advantage to using robots.txt to stop search engines from looking for some of these older directories? Currently we allow everything, using only page-level robots tags to disallow where necessary.
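(For reference, here's the difference I mean - a rough sketch with a made-up legacy directory, not our actual structure:)

```
# Page-level (what we do now): a meta robots tag in each page's <head>
#   <meta name="robots" content="noindex, follow">

# Site-wide (what I'm asking about): a robots.txt rule covering a whole
# legacy directory - /old-cms/ is a hypothetical path
User-agent: *
Disallow: /old-cms/
```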
Thanks!
-
Great stuff. Thanks again for your advice - much appreciated!
-
It can be really tough to gauge the impact - it depends on how suddenly the 404s popped up, how many you're seeing (Webmaster Tools, for both Google and Bing, is probably the best place to check), and how that number compares to your overall index. In most cases, it's a temporary problem and the engines will sort it out and de-index the 404'ed pages.
I'd just make sure that all of these 404s are intentional, and that none are valuable pages or occurring because of issues with the new CMS itself. It's easy to overlook something when you're talking about 100K pages, and it could be more than just a big chunk of 404s.
-
Thanks for the advice! The previous website did have a robots.txt file with a few wildcards declared. A lot of the URLs I'm seeing are NOT indexed anymore and haven't been for many years.
So, I think the 'stop the bleeding' method will work, and I'll just have to proceed with investigating and applying 301s as necessary.
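In case it's useful to anyone else reading this, here's roughly how I'm auditing the leftovers - a small Python sketch (using the requests library) that reads a plain-text list of old URLs (e.g. exported from the 'Not Found' report) and shows which still 404 versus which already redirect. The file name is just a placeholder:

```python
# audit_old_urls.py - check what each legacy URL returns now.
# Assumes old_urls.txt is a plain-text export, one URL per line.
import requests

with open("old_urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

for url in urls:
    try:
        # Don't follow redirects, so we see the raw status code
        resp = requests.head(url, allow_redirects=False, timeout=10)
    except requests.RequestException as exc:
        print(f"ERR {url} ({exc.__class__.__name__})")
        continue
    if resp.status_code in (301, 302):
        # Already redirected - show where it points
        print(f"{resp.status_code} {url} -> {resp.headers.get('Location')}")
    else:
        # 404s here are candidates for a 301 or for blocking/removal
        print(f"{resp.status_code} {url}")
```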
Any idea what kind of impact this is having on our rankings? I've submitted a valid sitemap, crawl paths are good, and the major 301s are in place. We've been hit particularly hard in Bing.
Thanks!
-
I've honestly had mixed luck with using robots.txt to block pages that have already been indexed. It tends to be unreliable at large scale (good for prevention, poor for cures). I endorsed @Optimize's answer, though, because if robots.txt is your only option, it can help "stop the bleeding". Sometimes you use the best option you have.
It's a bit trickier with 404s ("Not Found"). Technically, there's nothing wrong with having 404s (it's a perfectly valid signal for SEO), but if you create 100,000 of them all at once, that can raise red flags with Google. Some kind of mass removal may prevent problems caused by Google crawling thousands of Not Founds all at once.
If these pages are isolated in a folder, then you can use Google Webmaster Tools to remove the entire folder (after you block it in robots.txt). This is MUCH faster than robots.txt alone, but you need to make sure everything in that folder can be dumped out of the index.
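To make that concrete, the block itself is just a robots.txt rule like the one below (the directory name here is hypothetical); once it's in place, you can request removal of the whole directory via the Remove URLs tool:

```
User-agent: *
Disallow: /old-cms-directory/
```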
-
Absolutely. Not Founds and pages with no content are a concern. Cleaning them up will help your rankings.
-
Thanks a lot! I should have been a little more specific. My exact question is: if I move the crawlers' attention away from these 'Not Found' pages, will that benefit the indexation of the now-valid pages? Are the 'Not Founds' really a concern? Will this help my indexation and/or rankings?
Thanks!
-
It's a loaded question without knowing exactly what you're doing, but let me offer this advice: stop the bleeding with robots.txt. That's the easiest way to quickly resolve that many 'Not Found' errors.
Then you can slowly pick away at the issue and figure out whether some of the 'Not Founds' really do have content and are just being sent to the wrong place.
On a recent project we had over 200,000 additional 'Not Found' URLs. We stopped the bleeding, then slowly, over the course of a month - spending a couple of hours a week - found another 5,000 pages of real content, redirected them correctly, and removed them from the robots.txt block.
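For anyone doing similar cleanup: the redirects themselves were just standard 301s. On Apache, for example, they'd look something like this in .htaccess (the paths are made up for illustration):

```
# One-off redirects for old pages that still have real content
Redirect 301 /old-cms/widgets.html /products/widgets/

# Pattern-based redirect for a whole family of old URLs
RedirectMatch 301 ^/old-news/(.*)$ /blog/$1
```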
Good luck.