New CMS - 100,000 old URLs - use robots.txt to block?
-
Hello.
My website recently switched to a new CMS.
Over the last 10 years or so, we've used three different CMSes on our current domain. As expected, this has resulted in lots of URLs.
Up until this most recent iteration, we were unable to 301 redirect or use any page-level indexation techniques like rel='canonical'.
Using SEOmoz's tools and GWMT, I've been able to locate and redirect all pertinent, PageRank-bearing "older" URLs to their new counterparts. However, according to Google Webmaster Tools' 'Not Found' report, there are literally over 100,000 additional URLs out there it's trying to find.
My question is: is there an advantage to using robots.txt to stop search engines from looking for some of these older directories? Currently we allow everything, only using page-level robots tags to disallow where necessary.
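For example, something like this is what I have in mind (the directory names are just hypothetical placeholders for our old CMS paths):

    User-agent: *
    Disallow: /old-cms-2004/
    Disallow: /old-cms-2008/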
Thanks!
-
Great stuff... thanks again for your advice, much appreciated!
-
It can be really tough to gauge the impact - it depends on how suddenly the 404s popped up, how many you're seeing (Webmaster Tools, for both Google and Bing, is probably the best place to check), and how that number compares to your overall index. In most cases, it's a temporary problem and the engines will sort it out and de-index the 404'ed pages.
I'd just make sure that all of these 404s are intentional and none are valuable pages or occurring because of issues with the new CMS itself. It's easy to overlook something when you're talking about 100K pages, and it could be more than just a big chunk of 404s.
-
Thanks for the advice! The previous website did have a robots.txt file with a few wildcards declared. A lot of the URLs I'm seeing are NOT indexed anymore and haven't been for many years.
So I think the 'stop the bleeding' method will work, and I'll just have to proceed with investigating and applying 301s as necessary.
Any idea what kind of impact this is having on our rankings? I've submitted a valid sitemap, crawl paths are good, and the major 301s are in place. We've been hit particularly hard in Bing.
Thanks!
-
I've honestly had mixed luck with using robots.txt to block pages that have already been indexed. It tends to be unreliable at a large scale (good for prevention, poor for cures). I endorsed @Optimize, though, because if robots.txt is your only option, it can help "stop the bleeding". Sometimes you use the best tool you have.
It's a bit trickier with 404s ("Not Found"). Technically, there's nothing wrong with having 404s (it's a perfectly valid signal for SEO), but if you create 100,000 of them all at once, that can raise red flags with Google. Some kind of mass removal may head off problems caused by Google crawling thousands of Not Founds all at once.
If these pages are isolated in a folder, then you can use Google Webmaster Tools to remove the entire folder (after you block it). This is MUCH faster than robots.txt alone, but you need to make sure everything in the folder can be dumped out of the index.
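As a rough sketch, assuming the dead URLs all sit under one legacy folder (the path here is made up), the block would just be:

    User-agent: *
    Disallow: /old-folder/

With that in place, you can request removal of /old-folder/ in Google Webmaster Tools. Double-check that nothing you still want indexed lives under that path, because the removal applies to everything in it.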
-
Absolutely. Not Founds and no-content pages are a concern, so moving the crawlers' attention away from them will help your rankings.
-
Thanks a lot! I should have been a little more specific... my exact question is: if I move the crawlers' attention away from these 'Not Found' pages, will that benefit the indexation of the now-valid pages? Are the 'Not Founds' really a concern? Will this help my indexation and/or rankings?
Thanks!
-
It's a loaded question without knowing exactly what you're doing, but let me offer this advice: stop the bleeding with robots.txt. That's the easiest way to quickly resolve that many 'Not Founds'.
Then you can slowly pick away at the issue and figure out whether some of the 'Not Founds' really do have content and are just pointing to the wrong area.
On a recent project we had over 200,000 additional URLs reported as 'Not Found'. We stopped the bleeding, and then slowly, over the course of a month, spending a couple of hours a week, we found another 5,000 pages of content that we redirected correctly and removed from robots.txt.
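For what it's worth, if you're on Apache, the redirects themselves can be very simple. A minimal sketch (the paths and domain are hypothetical - swap in your own URL patterns):

    Redirect 301 /old-page.html http://www.example.com/new-page/
    RedirectMatch 301 ^/old-cms/(.*)$ http://www.example.com/articles/$1

The first line maps a single old URL to its new home; the second uses a regex to map an entire legacy directory onto the new structure.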
Good luck.