New CMS - 100,000 old URLs - use robots.txt to block?
-
Hello.
My website recently switched to a new CMS.
Over the last 10 years or so, we've used three different CMS platforms on our current domain. As you'd expect, this has left us with lots of legacy URLs.
Up until this most recent iteration, we were unable to 301 redirect or use any page-level indexation techniques like rel=canonical.
Using SEOmoz's tools and GWMT, I've been able to locate and redirect all of the pertinent, PageRank-bearing "older" URLs to their new counterparts. However, according to the Google Webmaster Tools 'Not Found' report, there are literally over 100,000 additional URLs out there that it's still trying to find.
My question is: is there an advantage to using robots.txt to stop search engines from looking for some of these older directories? Currently, we allow everything and only use page-level robots tags to disallow where necessary.
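(By page-level robots tags I mean a tag in the head of each page we want kept out of the index; a minimal illustration of our current approach, with the values shown here as an example only:)

    <meta name="robots" content="noindex, follow">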
Thanks!
-
Great stuff. Thanks again for your advice; much appreciated!
-
It can be really tough to gauge the impact. It depends on how suddenly the 404s popped up, how many you're seeing (Webmaster Tools, for both Google and Bing, is probably the best place to check), and how that number compares to your overall index. In most cases, it's a temporary problem, and the engines will sort it out and de-index the 404'd pages.
I'd just make sure that all of these 404s are intentional and none are valuable pages or occurring because of issues with the new CMS itself. It's easy to overlook something when you're talking about 100K pages, and it could be more than just a big chunk of 404s.
-
Thanks for the advice! The previous website did have a robots.txt file with a few wildcards declared. A lot of the URLs I'm seeing are NOT indexed anymore and haven't been for many years.
So, I think the 'stop the bleeding' method will work, and I'll just have to proceed with investigating and applying 301s as necessary.
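For reference, the 301s are being applied along these lines; this is only a minimal Apache .htaccess sketch, and the old/new paths are hypothetical stand-ins for our real URLs:

    # Redirect a single legacy page to its new counterpart
    Redirect 301 /old-cms/widgets.html http://www.example.com/products/widgets/
    # Redirect a whole legacy directory pattern in one rule
    RedirectMatch 301 ^/old-cms/(.*)$ http://www.example.com/products/$1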
Any idea what kind of impact this is having on our rankings? I submitted a valid sitemap, crawl paths are good, and the major 301s are in place. We've been hit particularly hard in Bing.
Thanks!
-
I've honestly had mixed luck with using Robots.txt to block pages that have already been indexed. It tends to be unreliable at a large scale (good for prevention, poor for cures). I endorsed @Optimize, though, because if Robots.txt is your only option, it can help "stop the bleeding". Sometimes, you use the best you have.
It's a bit trickier with 404s ("Not Found"). Technically, there's nothing wrong with having 404s (it's a perfectly valid signal for SEO), but if you create 100,000 of them all at once, that can sometimes raise red flags with Google. Some kind of mass removal may prevent problems from Google crawling thousands of Not Founds all at once.
If these pages are isolated in a folder, then you can use Google Webmaster Tools to remove the entire folder (after you block it). This is MUCH faster than Robots.txt alone, but you need to make sure everything in the folder can be dumped out of the index.
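As a quick sketch of that approach (the folder name here is hypothetical), the robots.txt block would look like this, and the matching directory-removal request in Google Webmaster Tools would then cover everything under it:

    User-agent: *
    Disallow: /old-cms/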
-
Absolutely. 'Not Founds' and no-content pages are a concern, and cleaning them up will help your ranking.
-
Thanks a lot! I should have been a little more specific. My exact question is: if I move the crawlers' attention away from these 'Not Found' pages, will that benefit the indexation of the now-valid pages? Are the 'Not Founds' really a concern? Will this help my indexation and/or ranking?
Thanks!
-
Loaded question without knowing exactly what you are doing, but let me offer this advice: stop the bleeding with robots.txt. This is the easiest way to quickly resolve that many 'not founds'.
Then you can slowly pick away at the issue and figure out whether some of the 'not founds' really do have content and are simply being sent to the wrong area.
On a recent project, we had over 200,000 additional URLs coming up 'not found'. We stopped the bleeding, and then slowly, over the course of a month and a couple of hours a week, found another 5,000 pages of real content that we redirected correctly and removed from the robots.txt block.
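To make the 'stop the bleeding' step concrete, a few wildcard rules like these can block whole families of legacy URLs at once; the patterns are hypothetical, and both Google and Bing honor the * and $ wildcards:

    User-agent: *
    # Block old CMS query-string URLs in bulk
    Disallow: /*?sessionid=
    # Block every legacy .asp page ($ anchors the match to the end of the URL)
    Disallow: /*.asp$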
Good luck.