Blocked by robots

Ezpro9

my client GWT has a number of notices for "blocked by meta-robots" - these are all either blog posts/categories/or tags

his former seo told him this: "We've activated following settings:

Use noindex for Categories
Use noindex for Archives
Use noindex for Tag Archives

to reduce keyword stuffing & duplicate post tags
Disabling all 3 noindex settings above may remove google blocks but also will send too many similar tags, post archives/category. "

is this guy correct?

what would be the problem with indexing these?

am i correct in thinking they should be indexed?

thanks

ThompsonPaul

As far as the upgrading of php on a server - this is for a different client, I seem to recall?

I would have a real problem with a developer saying they weren't going to upgrade because it might break things. Of course it might break things, but there are industry-standard approaches to dealing with this

For example, create a duplicate version of the site on a server instance that is using the newer version of php, and do a full Quality Assurance analysis on the dev site to find and fix anything that has issues with the new php version. Then deploy back to the live site with the php upgrade.

This is standard operating procedure and is necessary because there will come a time when any older server software will no longer be supported and therefore becomes a security risk as it will be unpatched. Planning for these kinds of upgrades should be included in any website operational plan.

Also, their solution to move WordPress to a subdomain is no protection whatsoever for the fact they have an extremely vulnerable, version.

First, the site is just as vulnerable to being hacked again as it is still unpatched. Being on a subdomain has no effect on this. Also, they have ruined the SEO value of that blog by moving it to a subdomain instead of fixing the issue and keeping it as a subdirectory of the prime site. And depending on the type of vulnerability exploited, it may still be possible for a hacker to get into the server via the vulnerable WP, then traverse from the subdomain to the prime site and cause harm there as well.

In the short term, if there truly aren't resources to properly do QA (Quality Assurance) on a dev site running an updated version of PHP, the alternative would be to move the WordPress install to it's own server or VPS running a current version of PHP, upgrade it and security patch it, then use a reverse proxy setup to have it show up as blog.domain.com (or even move it back to domain,com/blog).

This would at least allow for a properly secured WordPress that could also use current and new plugins. This would, however be at the expense of a slightly more complicated setup of the reverse proxy.

Hope that answers your question?

Paul

ThompsonPaul

Sorry, Erik - I didn't' forget about you, but was dealing with an ethical dilemma.

Unfortunately, the business of the site you're dealing with is so completely against the terms of service of the Search Engines and against what I believe to be good, sustainable SEO, that I've decided I can't, in good conscience, do anything to help them.

Sorry this leaves you no assistance, but I would suggest strongly you not rely heavily on this client for ongoing revenues. They are just begging to get hammered by Google, if that's not what's happening already.

Paul

Ezpro9

i'm happy for all the help so i'm not complaining here but i think you forgot about me paul.

also i need to know why my client is so adamant about not wanting to upgrade his php from 5.1.6 to 5..2.4 saying it could hinder his sites overall functionality. any idea why?

i want to update his WP to newest version and it requires php to be updated so we are running old plugins and old WP - his blog was hacked so his webguys moved the location from site.com/blog to blog.site.com

i feel handcuffed - no reason to run WP if you cant use plugins right?

ThompsonPaul

Sorry I missed this, Erik. Happy to have a look in the next day or two.

Paul

ThompsonPaul

First, to be clear, the Webmaster Tools notifications are just that. Google isn't indicating any kind of a problem, Erik. It's just declaring what it has found in the site's robot.txt file.

There's no way to give a definitive answer without seeing the actual website structure, but in general, it is VERY common and good practice to no-index the categories and tags on CMS-based websites. Usually, you want some form of the archives to be indexed, but it's usually the individual pages that are most important. (e.g. not date-based archives.)

The problem with allowing all of these to be indexed is that to a search engine, they will all look like duplicate content of other pages on the website. This will cause the search engine crawler to have to work much harder to find all the content on your website, and ad a result may quit part way though.

In addition,much of the content it finds it will consider to be duplicative of other pages on the website, and therefore will have a hard time knowing which version is actually the most valuable result to return. And as a result will split the authority of each of the pages, making them MUCH harder to rank.

This is a standard challenge of any CMS based website, because they display the same content organized by what are referred to as different taxonomies (different ways of categorizing or linking the same information).

Again, without seeing the actual site I can't say for sure, but short answer is that those three directives are very common for CMS- based websites and are very likely correct.

Hope that helps?

Paul

Welcome to the Q&A Forum

Browse the forum for helpful insights and fresh discussions about all things SEO.

Blocked by robots

Got a burning SEO question?

Browse Questions

Explore more categories

Related Questions

Do I need a separate robots.txt file for my shop subdomain?

Ajax Crawling | Blocked URLs Spike

Removal request for entire catalog. Can be done without blocking in robots?

How can I make Google Webmaster Tools see the robots.txt file when I am doing a .htacces redirec?

Block or remove pages using a robots.txt

Should I block robots from URLs containing query strings?

How to block "print" pages from indexing

How to recover after blocking all the search engine spiders?