New CMS - 100,000 old URLs - use robots.txt to block?
-
Hello.
My website has recently switched to a new CMS.
Over the last 10 years or so, we've used three different CMSes on our current domain. As you'd expect, this has left behind a lot of legacy URLs.
Up until this most recent iteration, we were unable to 301 redirect or use any page-level indexation techniques like rel="canonical".
Using SEOmoz's tools and Google Webmaster Tools, I've been able to locate and redirect all of the pertinent, PageRank-bearing "older" URLs to their new counterparts. However, according to the Google Webmaster Tools 'Not Found' report, there are over 100,000 additional URLs out there that it's still trying to crawl.
My question is: is there an advantage to using robots.txt to stop search engines from looking for some of these older directories? Currently we allow everything, using only page-level robots meta tags to disallow where necessary.
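For illustration, here's roughly what I'm picturing - the directory names below are just placeholders for whatever our retired CMS folders are actually called:

```
# robots.txt -- hypothetical example; /legacy-cms1/ and /legacy-cms2/
# stand in for the directories left behind by the old CMSes
User-agent: *
Disallow: /legacy-cms1/
Disallow: /legacy-cms2/
```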
Thanks!
-
Great stuff. Thanks again for your advice - much appreciated!
-
It can be really tough to gauge the impact - it depends on how suddenly the 404s popped up, how many you're seeing (Webmaster Tools, for both Google and Bing, is probably the best place to check), and how that number compares to your overall index. In most cases it's a temporary problem, and the engines will sort it out and de-index the 404'ed pages.
I'd just make sure that all of these 404s are intentional, and that none are valuable pages or are occurring because of issues with the new CMS itself. It's easy to overlook something when you're talking about 100K pages, and the problem could be more than just a big chunk of 404s.
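If it helps, one way to spot-check is a small script that takes a sample of the 'Not Found' URLs and reports what each one actually returns. This is just a sketch - the file name and one-URL-per-row format are assumptions; adjust them to match your actual Webmaster Tools export:

```python
import csv
import requests

# Sample of URLs taken from the Webmaster Tools "Not Found" export.
# The file name and one-URL-per-row format are assumptions.
with open("not_found_sample.csv", newline="") as f:
    urls = [row[0] for row in csv.reader(f) if row]

for url in urls:
    try:
        # Don't follow redirects -- we want the raw status code.
        resp = requests.head(url, allow_redirects=False, timeout=10)
    except requests.RequestException as exc:
        print(f"{url}\tERROR: {exc}")
        continue

    if resp.status_code == 404:
        note = "404 as expected -- confirm nothing valuable lived here"
    elif resp.status_code in (301, 302):
        note = f"redirects to {resp.headers.get('Location', '?')}"
    else:
        note = "unexpected -- possibly a new-CMS issue worth a look"
    print(f"{url}\t{resp.status_code}\t{note}")
```

Anything in that output returning 200 or a redirect probably shouldn't be in the 'Not Found' bucket at all, and a 404 on a page that used to carry value is a candidate for a 301 instead.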
-
Thanks for the advice! The previous website did have a robots.txt file with a few wildcards declared. A lot of the URLs I'm seeing are NOT indexed anymore and haven't been for many years.
So, I think the 'stop the bleeding' method will work, and I'll just have to proceed with investigating and applying 301s as necessary.
Any idea what kind of impact this is having on our rankings? I submitted a valid sitemap, crawl paths are good, and major 301s are in place. We've been hit particularly hard in Bing.
Thanks!
-
I've honestly had mixed luck using robots.txt to block pages that have already been indexed. It tends to be unreliable at large scale (good for prevention, poor for cures). I endorsed @Optimize, though, because if robots.txt is your only option, it can help "stop the bleeding". Sometimes you use the best tool you have.
It's a bit trickier with 404s ("Not Found"). Technically, there's nothing wrong with having 404s (it's a perfectly valid signal), but if you create 100,000 of them all at once, that can raise red flags with Google. Some kind of mass removal may head off problems from Google crawling thousands of 'Not Founds' at once.
If these pages are isolated in a folder, then you can use Google Webmaster Tools to remove the entire folder (after you block it). This is MUCH faster than Robots.txt alone, but you need to make sure everything in the folder can be dumped out of the index.
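For example, if the old pages really are isolated under one folder (the folder name here is just an illustration), the block is a one-liner before you file the removal request:

```
# robots.txt -- hypothetical folder name; block the retired folder first,
# then request removal of the whole directory in Webmaster Tools
User-agent: *
Disallow: /old-site/
```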
-
Absolutely. 'Not Founds' and no-content pages are a concern, and cleaning them up will help your ranking.
-
Thanks a lot! I should have been a little more specific. My exact question is: if I move the crawlers' attention away from these 'Not Found' pages, will that benefit the indexation of the now-valid pages? Are the 'Not Founds' really a concern? Will this help my indexation and/or ranking?
Thanks!
-
That's a loaded question without knowing exactly what you're doing, but let me offer this advice: stop the bleeding with robots.txt. That's the easiest way to quickly resolve that many 'Not Founds'.
Then you can slowly pick away at the issue and figure out whether some of the 'Not Founds' really do have content and are just pointing to the wrong place.
On a recent project we had over 200,000 additional URLs showing as 'Not Found'. We stopped the bleeding, then slowly over the course of a month, spending a couple of hours a week, found another 5,000 pages of real content that we redirected correctly and removed from the robots.txt block.
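For what it's worth, on Apache those per-page redirects can be as simple as the following - the paths are invented for the example:

```
# .htaccess -- example only; real paths will differ
# One-off redirect for a single recovered page
Redirect 301 /old-cms/widgets.html /products/widgets/

# Pattern redirect for a whole recovered section
RedirectMatch 301 ^/old-cms/news/(.*)$ /blog/$1
```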
Good luck.