New CMS - 100,000 old URLs - use robots.txt to block?
-
Hello.
My website recently switched to a new CMS.
Over the last 10 years or so, we've used three different CMS platforms on our current domain. As you'd expect, this has left behind lots of URLs.
Until this most recent iteration, we were unable to 301 redirect or use any page-level indexation techniques like rel="canonical".
Using SEOmoz's tools and Google Webmaster Tools, I've been able to locate and redirect all of the pertinent, PageRank-bearing "older" URLs to their new counterparts. However, according to the Google Webmaster Tools 'Not Found' report, there are literally over 100,000 additional URLs out there that it's still trying to find.
My question is: is there an advantage to using robots.txt to stop search engines from looking for some of these older directories? Currently we allow everything, only using page-level robots tags to disallow where necessary.
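For clarity, the page-level techniques I'm referring to are the standard tags, along these lines (the URL is just an example):

    <!-- the canonical tag we couldn't set before -->
    <link rel="canonical" href="https://www.example.com/products/widgets/">
    <!-- the page-level robots tag we use to disallow where necessary -->
    <meta name="robots" content="noindex, follow">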
Thanks!
-
Great stuff... thanks again for your advice, much appreciated!
-
It can be really tough to gauge the impact - it depends on how suddenly the 404s popped up, how many you're seeing (Webmaster Tools, for both Google and Bing, is probably the best place to check), and how that number compares to your overall index. In most cases it's a temporary problem, and the engines will sort it out and de-index the 404'ed pages.
I'd just make sure that all of these 404s are intentional and that none of them are valuable pages or the result of issues with the new CMS itself. It's easy to overlook something when you're talking about 100K pages, and it could be more than just a big chunk of 404s.
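If it helps, one way to spot-check them in bulk is a small script - just a rough sketch in Python, assuming you've exported the 'Not Found' URLs (one per line) to a file called not_found.txt:

    # Spot-check URLs exported from the Webmaster Tools "Not Found" report.
    import requests

    with open("not_found.txt") as f:
        urls = [line.strip() for line in f if line.strip()]

    for url in urls[:500]:  # check a sample first; 100K takes a while
        try:
            resp = requests.head(url, allow_redirects=False, timeout=10)
        except requests.RequestException as exc:
            print("ERROR", url, exc)
            continue
        # An intentional 404/410 or an existing 301/302 is expected here;
        # anything else (200, 500, etc.) deserves a closer look.
        if resp.status_code not in (301, 302, 404, 410):
            print(resp.status_code, url)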
-
Thanks for the advice! The previous website did have a robots.txt file with a few wildcards declared. A lot of the URLs I'm seeing are NOT indexed anymore and haven't been for many years.
So, I think the 'stop the bleeding' method will work, and I'll just have to proceed with investigating and applying 301s as necessary.
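(For anyone reading this later: on Apache, for example, those one-off 301s look something like this in .htaccess - the paths here are made up:)

    # Hypothetical examples - adjust to your own legacy paths.
    # Redirect a single old URL to its new counterpart:
    Redirect 301 /old-cms/widgets.asp /products/widgets/

    # Redirect a whole retired directory in one rule:
    RedirectMatch 301 ^/legacy-catalog/(.*)$ /products/$1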
Any idea what kind of impact this is having on our rankings? I've submitted a valid sitemap, crawl paths are good, and the major 301s are in place. We've been hit particularly hard in Bing.
Thanks!
-
I've honestly had mixed luck with using robots.txt to block pages that have already been indexed. It tends to be unreliable at large scale (good for prevention, poor for cures). I endorsed @Optimize, though, because if robots.txt is your only option, it can help "stop the bleeding". Sometimes you use the best tool you have.
It's a bit trickier with 404s ("Not Found"). Technically, there's nothing wrong with having 404s (it's a perfectly valid signal for SEO), but if you create 100,000 of them all at once, that can sometimes raise red flags with Google. Some kind of mass removal may prevent problems caused by Google crawling thousands of 'Not Founds' all at once.
If these pages are isolated in a folder, then you can use Google Webmaster Tools to remove the entire folder (after you block it). This is MUCH faster than robots.txt alone, but you need to make sure everything in the folder can be dumped out of the index.
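As a sketch, the block itself is just a directory rule in robots.txt (the folder name here is hypothetical), put in place before you submit the removal request:

    User-agent: *
    Disallow: /old-cms/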
-
Absolutely. 'Not Founds' and pages with no content are a concern, so moving the crawlers' attention away from them will help your ranking.
-
Thanks a lot! I should have been a little more specific... my exact question is: if I move the crawlers' attention away from these 'Not Found' pages, will that benefit the indexation of the now-valid pages? Are the 'Not Founds' really a concern? Will this help my indexation and/or ranking?
Thanks!
-
That's a loaded question without knowing exactly what you are doing, but let me offer this advice: stop the bleeding with robots.txt. It's the easiest way to quickly resolve that many 'Not Founds'.
Then you can slowly pick away at the issue and figure out whether some of the 'Not Founds' really have content and are just being sent to the wrong place.
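To illustrate, "stopping the bleeding" is usually just a handful of robots.txt rules. The patterns below are hypothetical, and note that the * and $ wildcards aren't part of the original robots.txt standard, though Google and Bing both honor them:

    User-agent: *
    Disallow: /old-cms/
    Disallow: /*.asp$
    Disallow: /*?session=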
On a recent project we had over 200,000 additional URLs coming up 'Not Found'. We stopped the bleeding, and then slowly, over the course of a month at a couple of hours a week, we found another 5,000 pages of real content that we redirected correctly and removed from the robots.txt block.
Good luck.