Blocked by robots
-
My client's GWT has a number of notices for "blocked by meta-robots" - these are all either blog posts, categories, or tags.
His former SEO told him this: "We've activated the following settings:
- Use noindex for Categories
- Use noindex for Archives
- Use noindex for Tag Archives
to reduce keyword stuffing & duplicate post tags
Disabling all 3 noindex settings above may remove the Google blocks, but it will also send too many similar tag, post archive, and category pages."
Is this guy correct?
What would be the problem with indexing these?
Am I correct in thinking they should be indexed?
Thanks
-
As far as the upgrading of PHP on a server - this is for a different client, I seem to recall?
I would have a real problem with a developer saying they weren't going to upgrade because it might break things. Of course it might break things, but there are industry-standard approaches to dealing with this.
For example: create a duplicate version of the site on a server instance running the newer version of PHP, do a full Quality Assurance analysis on that dev site to find and fix anything that has issues with the new PHP version, then deploy back to the live site with the PHP upgrade. (A rough sketch of the cloning step is below.)
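This is only a minimal, illustrative sketch of one common way to do that cloning - the hostnames, paths, and database names are placeholders, not anything from the client's actual setup:

# Illustrative only: hostnames, paths, and DB names are placeholders.
# 1. Copy the live site's files to a staging server running the newer PHP.
rsync -az /var/www/example.com/ staging.example.com:/var/www/example.com/

# 2. Clone the live database into the staging environment.
mysqldump -u dbuser -p example_db > example_db.sql
scp example_db.sql staging.example.com:/tmp/
ssh staging.example.com "mysql -u dbuser -p example_db < /tmp/example_db.sql"

# 3. QA the staging site under the new PHP, fix anything that breaks,
#    then apply the same PHP upgrade (plus fixes) to the live server.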
This is standard operating procedure, and it's necessary because there will come a time when any older server software is no longer supported, at which point it becomes a security risk because it will be unpatched. Planning for these kinds of upgrades should be included in any website operational plan.
Also, their solution of moving WordPress to a subdomain is no protection whatsoever, given they are still running an extremely vulnerable version.
First, the site is just as vulnerable to being hacked again, since it is still unpatched; being on a subdomain has no effect on this. They have also ruined the SEO value of that blog by moving it to a subdomain instead of fixing the issue and keeping it as a subdirectory of the prime site. And depending on the type of vulnerability exploited, it may still be possible for a hacker to get into the server via the vulnerable WP install, then traverse from the subdomain to the prime site and cause harm there as well.
In the short term, if there truly aren't the resources to properly do QA (Quality Assurance) on a dev site running an updated version of PHP, the alternative would be to move the WordPress install to its own server or VPS running a current version of PHP, upgrade and security-patch it, then use a reverse proxy setup to have it show up as blog.domain.com (or even move it back to domain.com/blog).
This would at least allow for a properly secured WordPress install that could also use current and new plugins. It would, however, come at the expense of a slightly more complicated reverse proxy setup; a minimal example of that piece is sketched below.
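To make that last point concrete, here is a minimal sketch of what the reverse proxy piece might look like in nginx. The upstream address and paths are assumptions for illustration, not details from the actual setup:

# On the main site's server: pass /blog/ requests through to the
# separately hosted, fully patched WordPress install.
# (10.0.0.5 is a placeholder for the new WP server/VPS.)
location /blog/ {
    proxy_pass http://10.0.0.5/blog/;
    proxy_set_header Host $host;   # preserve the public hostname for WordPress
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
}

The WordPress install on the backend would also need its site URL set to the public domain.com/blog address, so the links it generates point at the proxy rather than at the backend server.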
Hope that answers your question?
Paul
-
Sorry, Erik - I didn't forget about you, but I was dealing with an ethical dilemma.
Unfortunately, the business of the site you're dealing with is so completely against the search engines' terms of service, and against what I believe to be good, sustainable SEO, that I've decided I can't, in good conscience, do anything to help them.
Sorry this leaves you without assistance, but I would strongly suggest you not rely heavily on this client for ongoing revenue. They are just begging to get hammered by Google, if that's not what's happening already.
Paul
-
I'm happy for all the help, so I'm not complaining here, but I think you forgot about me, Paul.
Also, I need to know why my client is so adamant about not wanting to upgrade his PHP from 5.1.6 to 5.2.4, saying it could hinder his site's overall functionality. Any idea why?
I want to update his WP to the newest version, and that requires PHP to be updated, so we are running old plugins and old WP. His blog was hacked, so his web guys moved its location from site.com/blog to blog.site.com.
I feel handcuffed - no reason to run WP if you can't use plugins, right?
-
Sorry I missed this, Erik. Happy to have a look in the next day or two.
Paul
-
First, to be clear, the Webmaster Tools notifications are just that: notifications. Google isn't indicating any kind of problem, Erik. It's just reporting what it found in the site's robots meta tags.
There's no way to give a definitive answer without seeing the actual website structure, but in general it is VERY common, and good practice, to noindex the categories and tags on CMS-based websites. Usually you want some form of the archives to be indexed, but it's the individual pages that are most important, not the date-based archives.
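For context, those plugin settings typically work by adding a robots meta tag to the head of each category, tag, and archive page. A minimal sketch of what that output looks like (the exact markup varies by SEO plugin):

<!-- Emitted on category/tag/archive pages when the noindex settings are on.
     "noindex,follow" keeps the page itself out of the index while still
     letting crawlers follow its links through to the individual posts. -->
<meta name="robots" content="noindex,follow" />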
The problem with allowing all of these to be indexed is that, to a search engine, they will all look like duplicate content of other pages on the website. This forces the search engine crawler to work much harder to find all the content on your site, and as a result it may quit partway through.
In addition, much of the content it does find it will consider duplicative of other pages on the website, so it will have a hard time knowing which version is actually the most valuable result to return. As a result, it will split the authority of each of those pages, making them MUCH harder to rank.
This is a standard challenge for any CMS-based website, because a CMS displays the same content organized by different taxonomies (different ways of categorizing or linking the same information).
Again, without seeing the actual site I can't say for sure, but the short answer is that those three directives are very common for CMS-based websites and are very likely correct.
Hope that helps?
Paul
-
Related Questions
-
Adding directories to robots disallow causes pages to have Blocked Resources
In order to eliminate duplicate/missing title tag errors for a directory (and its sub-directories) under www that contains our third-party chat scripts, I added the parent directory to the robots disallow list. We are now receiving a blocked resource error (in Webmaster Tools) on all of the pages that link to a JavaScript file (for live chat) in the parent directory. My host is suggesting that the warning is only a notice and that we can leave things as is without worrying about the pages being de-ranked/penalized. I am wondering if this is true, or if we should remove the one directory that contains the JS from the robots file and find another way to resolve the duplicate title tags?
Technical SEO | miamiman1000
-
Accidentally blocked Googlebot for 14 days
Today, after I noticed a huge drop in organic traffic to the inner pages of my sites, I looked into the code and realized a bug in the last commit caused the server to show captcha pages to all Googlebot requests from Apr 24. My site has more than 4,000,000 pages in the index. Before the last code change, Googlebot was exempt from being shown the captcha, so every inner page was crawled and indexed perfectly with no problem. The bug broke the whitelisting mechanism and treated requests from Google's IP addresses the same as those from regular users. This led to the captcha page being crawled whenever Googlebot visited thousands of my site's inner pages, which makes Google think all my inner pages are identical to each other. Google removed all the inner pages from the SERPs starting May 5th, and before that many of those inner pages had good rankings. I formerly thought this was a manual or algorithmic penalty, but 1. I did not receive a warning message in GWT, and 2. the ranking for the main URL is good. I tried "Fetch as Google" in GWT and realized that all Googlebot saw in the past 14 days was the same captcha page for every one of my inner pages. Now I have fixed the bug and updated the production site. I just wanted to ask: 1. How long will it take for Google to remove the "duplicated content" flag on my inner pages and show them in the SERPs again? From my experience, Googlebot revisits URLs quite often, but once a URL is flagged as "contains similar content", it can be difficult to recover. Is that correct? 2. Besides waiting for Google to update its index, what else can I do right now? Thanks in advance for your answers.
Technical SEO | Bull135
-
Site blocked by robots.txt and 301 redirected still in SERPs
I have a vanity URL domain that 301 redirects to my main site. That domain also has a robots.txt that disallows the entire site. However, for a branded-enough search, that vanity domain still shows up in the SERPs with the new Google message: "A description for this result is not available because of this site's robots.txt". I get why the message is there - that's not my issue. My question is: shouldn't a 301 redirect trump this domain showing in SERPs, ever? The client isn't happy about it showing at all. How can I get the vanity domain out of the SERPs? THANKS in advance!
Technical SEO | VMLYRDiscoverability
-
Block /tag/ or not?
I've asked this question in another area, but now I want to ask it as a bigger question: do we block /tag/ with robots.txt or not? Here's why I ask: my WordPress site does not block /tag/, and I have many /tag/ results in the top 10 results on Google, and have for months. The question is, does Google see /tag/ on WordPress as duplicate content? SEOmoz says it's duplicate content, but it's a tag; it's not really content per se. I'm all for optimizing my site, but Google is not penalizing me for /tag/ results. I don't want to block /tag/ if Google is not seeing it as duplicate content, for one simple reason: I have many results in the top 10 on G. So, can someone who knows more about this weigh in on the subject? I really would like an accurate answer. Thanks in advance...
Technical SEO | MyAllenMedia
-
Robots.txt query
Quick question: if this appears in a client's robots.txt file, what does it mean? Disallow: /*/_/ Does it mean no pages can be indexed? I have checked and there are no pages in the index, but it's a new site too, so I'm not sure if this is the problem. Thanks, Karen
Technical SEO | Karen_Dauncey
-
I am trying to block robots from indexing parts of my site...
I have a few websites that I mocked up for clients to check out my work and get a feel for the style I produce, but I don't want them indexed as they have lorem ipsum placeholder text and are not really optimized... I am in the process of optimizing them, but for the time being I would like to block them. Most of my warnings and errors on my SEOmoz dashboard are from these sites, and I was going to add the following to the robots.txt file, but I want to make sure this is correct:
User-agent: *
Disallow: /salondemo/
Disallow: /salondemo3/
Disallow: /cafedemo/
Disallow: /portfolio1/
Disallow: /portfolio2/
Disallow: /portfolio3/
Disallow: /salondemo2/
Is this all I need to do? Thanks, Donny
Technical SEO | Smurkcreative
-
Client accidentally blocked entire site with robots.txt for a week
Our client was having a design firm do some website development work for them. The work was done on a staging server that was blocked with a robots.txt file to prevent duplicate content issues. Unfortunately, when the design firm pushed the changes live, they also moved over the robots.txt file, which blocked the good, live site from search for a full week. We saw the error (!) as soon as the latest crawl report came in. The error has been corrected, but... Does anyone have any experience with a snafu like this? Any idea how long it will take for the damage to be reversed and the site to get back into the good graces of the search engines? Are there any steps we should take in the meantime that would help rectify the situation more quickly? Thanks for all of your help.
Technical SEO | pixelpointpress