New CMS - 100,000 old URLs - use robots.txt to block?
-
Hello.
My website has recently switched to a new CMS.
Over the last 10 years or so, we've used three different CMS platforms on our current domain. As expected, this has left behind a lot of legacy URLs.
Up until this most recent iteration, we were unable to 301 redirect or use any page-level indexation techniques like rel="canonical".
Using SEOmoz's tools and Google Webmaster Tools, I've been able to locate and redirect all pertinent, PageRank-bearing older URLs to their new counterparts. However, according to the 'Not Found' report in Google Webmaster Tools, there are literally over 100,000 additional URLs out there that it's trying to find.
My question is: is there an advantage to using robots.txt to stop search engines from looking for some of these older directories? Currently, we allow everything, only using page-level robots meta tags to disallow where necessary.
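For context, the kind of directory-level block I'm considering would look something like this in robots.txt (the directory names here are just placeholders for our old CMS paths):

```
User-agent: *
Disallow: /old-cms-one/
Disallow: /old-cms-two/
```

My understanding is that this only stops crawling going forward; it wouldn't, by itself, remove anything that's already indexed.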
Thanks!
-
Great stuff! Thanks again for your advice; much appreciated!
-
It can be really tough to gauge the impact - it depends on how suddenly the 404s popped up, how many you're seeing (webmaster tools, for Google and Bing, is probably the best place to check) and how that number compares to your overall index. In most cases, it's a temporary problem and the engines will sort it out and de-index the 404'ed pages.
I'd just make sure that all of these 404s are intentional and none are valuable pages or occurring because of issues with the new CMS itself. It's easy to overlook something when you're talking about 100K pages, and it could be more than just a big chunk of 404s.
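One way to sanity-check that is to bucket the 'Not Found' URLs by their leading directory, which makes CMS-generated patterns easy to spot. A minimal sketch, assuming you've exported the 404 URLs from Webmaster Tools to a list (the URLs below are hypothetical):

```python
from collections import Counter
from urllib.parse import urlparse

def bucket_404s(urls, depth=1):
    """Count 404 URLs by their leading path segment(s) to reveal patterns."""
    counts = Counter()
    for url in urls:
        path = urlparse(url).path.strip("/")
        segments = path.split("/")
        # Group by the first `depth` path segments; bare domain roots fall into "/"
        key = "/" + "/".join(segments[:depth]) if segments[0] else "/"
        counts[key] += 1
    return counts

# Example with a handful of hypothetical URLs from an export
sample = [
    "http://example.com/old-cms/page1.html",
    "http://example.com/old-cms/page2.html",
    "http://example.com/archive/2004/story.html",
    "http://example.com/index.html",
]
for folder, n in bucket_404s(sample).most_common():
    print(folder, n)
```

If one or two old directories account for most of the 100K, that tells you where a folder-level block (or a folder-level redirect rule) is worth the effort, versus chasing URLs one at a time.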
-
Thanks for the advice! The previous website did have a robots.txt file with a few wildcards declared. A lot of the URLs I'm seeing are NOT indexed anymore and haven't been for many years.
So, I think the 'stop the bleeding' method will work, and I'll just have to proceed with investigating and applying 301s as necessary.
Any idea what kind of an impact this is having on our rankings? I submitted a valid sitemap, crawl paths are good, and major 301s are in place. We've been hit particularly hard in Bing.
Thanks!
-
I've honestly had mixed luck with using Robots.txt to block pages that have already been indexed. It tends to be unreliable at a large scale (good for prevention, poor for cures). I endorsed @Optimize, though, because if Robots.txt is your only option, it can help "stop the bleeding". Sometimes, you use the best you have.
It's a bit trickier with 404s ("Not Found"). Technically, there's nothing wrong with having 404s (it's a perfectly valid signal for SEO), but if you create 100,000 of them all at once, that can raise red flags with Google. Some kind of mass removal may prevent problems from Google crawling thousands of Not Founds all at once.
If these pages are isolated in a folder, then you can use Google Webmaster Tools to remove the entire folder (after you block it). This is MUCH faster than Robots.txt alone, but you need to make sure everything in the folder can be dumped out of the index.
-
Absolutely. Not Founds and no-content pages are a concern. Moving crawl attention away from them will help your rankings.
-
Thanks a lot! I should have been a little more specific. My exact question is: if I move the crawlers' attention away from these 'Not Found' pages, will that benefit the indexation of the now-valid pages? Are the 'Not Founds' really a concern? Will this help my indexation and/or rankings?
Thanks!
-
That's a loaded question without knowing exactly what you are doing, but let me offer this advice: stop the bleeding with robots.txt. This is the easiest way to quickly resolve that many "Not Founds".
Then you can slowly pick away at the issue and figure out whether some of the "Not Founds" really do have content and are just being sent to the wrong area.
On a recent project, we had over 200,000 additional URLs returning "Not Found". We stopped the bleeding, and then slowly, over the course of a month, spending a couple of hours a week, we found another 5,000 pages of content that we redirected correctly and removed from the robots.txt blocks.
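If it helps, the pattern redirects we used looked roughly like this in Apache (the paths are made up for illustration; this assumes mod_alias is available, and your server setup may differ):

```apache
# Send an entire legacy section to its new home, preserving the slug
RedirectMatch 301 ^/old-cms/articles/(.*)$ /articles/$1

# One-off 301 for a single legacy page
Redirect 301 /old-cms/about.html /about/
```

A handful of pattern rules like the first one can often cover thousands of legacy URLs at once, which is much faster than mapping each URL individually.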
Good luck.