New CMS - 100,000 old URLs - use robots.txt to block?
-
Hello.
My website recently switched to a new CMS.
Over the last 10 years or so, we've used 3 different CMS platforms on our current domain. As expected, this has resulted in lots of URLs.
Up until this most recent iteration, we were unable to 301 redirect or use any page-level indexation techniques like rel="canonical".
Using SEOmoz's tools and Google Webmaster Tools, I've been able to locate and redirect all pertinent, PageRank-bearing, "older" URLs to their new counterparts. However, according to Google Webmaster Tools' 'Not Found' report, there are literally over 100,000 additional URLs out there it's trying to find.
My question is: is there an advantage to using robots.txt to stop search engines from looking for some of these older directories? Currently, we allow everything, only using page-level robots meta tags to disallow where necessary.
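For context, the page-level blocking we use today is a standard robots meta tag, and what I'm considering is a directory-level block in robots.txt. Roughly like this (a simplified sketch; the directory name is just a placeholder):

<!-- current approach: page-level robots meta tag in each page's head -->
<meta name="robots" content="noindex, follow">

# proposed approach: block a whole legacy directory in robots.txt
User-agent: *
Disallow: /old-cms-directory/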
Thanks!
-
Great stuff. Thanks again for your advice - much appreciated!
-
It can be really tough to gauge the impact - it depends on how suddenly the 404s popped up, how many you're seeing (Webmaster Tools, for both Google and Bing, is probably the best place to check), and how that number compares to your overall index. In most cases, it's a temporary problem and the engines will sort it out and de-index the 404'd pages.
I'd just make sure that all of these 404s are intentional, and that none are valuable pages or occurring because of issues with the new CMS itself. It's easy to overlook something when you're talking about 100K pages, and it could be more than just a big chunk of 404s.
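One way to spot-check at that scale is to export the 'Not Found' list from Webmaster Tools and run the URLs through a quick script. A minimal sketch in Python, assuming a plain-text file with one URL per line ("urls.txt" is just a placeholder name):

import urllib.request
import urllib.error

# report the live HTTP status for each URL in the exported list
with open("urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

for url in urls:
    try:
        # a HEAD request is enough to get the status code
        req = urllib.request.Request(url, method="HEAD")
        resp = urllib.request.urlopen(req, timeout=10)
        status = resp.getcode()
    except urllib.error.HTTPError as e:
        status = e.code  # 404, 410, 500, etc.
    except urllib.error.URLError:
        status = "no response"
    print(status, url)

Anything that comes back 200, or redirects somewhere unexpected, is a page the new CMS is still serving and deserves a closer look before you write it off as an intentional 404.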
-
Thanks for the advice! The previous website did have a robots.txt file with a few wildcards declared. A lot of the URLs I'm seeing are NOT indexed anymore and haven't been for many years.
So, I think the 'stop the bleeding' method will work, and I'll just have to proceed with investigating and applying 301s as necessary.
Any idea what kind of impact this is having on our rankings? I submitted a valid sitemap, crawl paths are good, and major 301s are in place. We've been hit particularly hard in Bing.
Thanks!
-
I've honestly had mixed luck with using robots.txt to block pages that have already been indexed. It tends to be unreliable at a large scale (good for prevention, poor for cures). I endorsed @Optimize, though, because if robots.txt is your only option, it can help "stop the bleeding". Sometimes, you use the best you have.
It's a bit trickier with 404s ("Not Found"). Technically, there's nothing wrong with having 404s (a 404 is a perfectly valid signal), but if you create 100,000 of them all at once, that can sometimes raise red flags with Google. Some kind of mass removal may prevent problems that come from Google crawling thousands of Not Founds all at once.
If these pages are isolated in a folder, then you can use Google Webmaster Tools to remove the entire folder (after you block it). This is MUCH faster than robots.txt alone, but you need to make sure everything in the folder can be dumped out of the index.
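For example, if the old pages all live under one directory, the block that has to be in place before the removal request would look something like this (the path is just a placeholder):

User-agent: *
Disallow: /old-directory/

Once that's live, the URL removal tool in Google Webmaster Tools can take a directory-level request that covers everything under that path.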
-
Absolutely. 'Not Founds' and no-content pages are a concern, and cleaning them up will help your rankings.
-
Thanks a lot! I should have been a little more specific. My exact question would be: if I move the crawlers' attention away from these 'Not Found' pages, will that benefit the indexation of the now-valid pages? Are the 'Not Founds' really a concern? Will this help my indexation and/or ranking?
Thanks!
-
That's a loaded question without knowing exactly what you are doing, but let me offer this advice: stop the bleeding with robots.txt. This is the easiest way to quickly resolve that many 'Not Found' errors.
Then you can slowly pick away at the issue and figure out whether some of the 'Not Founds' really have content behind them and are just pointing to the wrong area.
On a recent project we had over 200,000 additional URLs coming up 'Not Found'. We stopped the bleeding, and then slowly over the course of a month, spending a couple of hours a week, found another 5,000 pages of content that we redirected correctly and then removed from the robots.txt block.
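As a rough illustration of that last step, on an Apache server (just an assumption - IIS and nginx have their own equivalents) the recovered pages can be redirected with mod_alias rules in .htaccess. The paths here are placeholders:

# one-off permanent redirect for a recovered page
Redirect 301 /old-cms/widgets.html /products/widgets/

# pattern-based redirect for a whole recovered section
RedirectMatch 301 ^/old-cms/articles/(.*)$ /articles/$1

Once a page is redirected, remember to drop its path from the robots.txt Disallow rules so the engines can actually follow the 301.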
Good luck.