Blocked by robots
-
My client's GWT (Google Webmaster Tools) account has a number of notices for "blocked by meta-robots". These are all blog posts, categories, or tags.
His former SEO told him this: "We've activated the following settings:
- Use noindex for Categories
- Use noindex for Archives
- Use noindex for Tag Archives
to reduce keyword stuffing & duplicate post tags.
Disabling all 3 noindex settings above may remove the Google blocks, but it will also send too many similar tags, post archives, and categories."
Is this guy correct?
What would be the problem with indexing these?
Am I correct in thinking they should be indexed?
Thanks
-
As far as the upgrading of PHP on a server - this is for a different client, I seem to recall?
I would have a real problem with a developer saying they weren't going to upgrade because it might break things. Of course it might break things, but there are industry-standard approaches to dealing with this.
For example, create a duplicate version of the site on a server instance that is using the newer version of PHP, and do a full Quality Assurance analysis on the dev site to find and fix anything that has issues with the new PHP version. Then deploy back to the live site with the PHP upgrade.
This is standard operating procedure and is necessary because there will come a time when any older server software will no longer be supported and therefore becomes a security risk as it will be unpatched. Planning for these kinds of upgrades should be included in any website operational plan.
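Part of that QA pass can be automated. Here is a minimal sketch of a first smoke check in Python, assuming the dev copy is reachable at its own hostname and that comparing HTTP status codes for a handful of representative paths is a useful starting point. The hostnames, paths, and function names are hypothetical, and this is no substitute for a full QA review:

```python
from typing import Callable, Dict, Iterable, Tuple
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_status(base_url: str, path: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for base_url + path, or 0 if unreachable."""
    try:
        with urlopen(base_url.rstrip("/") + path, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code
    except OSError:
        return 0  # DNS failure, refused connection, timeout, etc.


def find_regressions(
    paths: Iterable[str],
    live_status: Callable[[str], int],
    dev_status: Callable[[str], int],
) -> Dict[str, Tuple[int, int]]:
    """Return {path: (live_code, dev_code)} for every path whose status differs."""
    regressions = {}
    for path in paths:
        live, dev = live_status(path), dev_status(path)
        if live != dev:
            regressions[path] = (live, dev)
    return regressions


# Hypothetical usage (hostnames and paths are placeholders):
#   paths = ["/", "/blog/", "/contact/"]
#   report = find_regressions(
#       paths,
#       lambda p: fetch_status("http://www.example.com", p),
#       lambda p: fetch_status("http://dev.example.com", p),
#   )
#   for path, (live, dev) in report.items():
#       print(path, "live:", live, "dev:", dev)
```

A page that returns 200 on the live site but 500 on the dev copy is a good first candidate for something the newer PHP version broke. Matching status codes only prove the page loads, not that it renders correctly.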
Also, their solution of moving WordPress to a subdomain offers no protection whatsoever against the fact that they are running an extremely vulnerable version.
First, the site is just as vulnerable to being hacked again as it is still unpatched. Being on a subdomain has no effect on this. Also, they have ruined the SEO value of that blog by moving it to a subdomain instead of fixing the issue and keeping it as a subdirectory of the prime site. And depending on the type of vulnerability exploited, it may still be possible for a hacker to get into the server via the vulnerable WP, then traverse from the subdomain to the prime site and cause harm there as well.
In the short term, if there truly aren't resources to properly do QA (Quality Assurance) on a dev site running an updated version of PHP, the alternative would be to move the WordPress install to its own server or VPS running a current version of PHP, upgrade it and security patch it, then use a reverse proxy setup to have it show up as blog.domain.com (or even move it back to domain.com/blog).
This would at least allow for a properly secured WordPress that could also use current and new plugins. It would, however, come at the expense of a slightly more complicated setup because of the reverse proxy.
Hope that answers your question?
Paul
-
Sorry, Erik - I didn't forget about you, but was dealing with an ethical dilemma.
Unfortunately, the business of the site you're dealing with is so completely against the terms of service of the Search Engines and against what I believe to be good, sustainable SEO, that I've decided I can't, in good conscience, do anything to help them.
Sorry this leaves you no assistance, but I would suggest strongly you not rely heavily on this client for ongoing revenues. They are just begging to get hammered by Google, if that's not what's happening already.
Paul
-
I'm happy for all the help, so I'm not complaining here, but I think you forgot about me, Paul.
Also, I need to know why my client is so adamant about not wanting to upgrade his PHP from 5.1.6 to 5.2.4, saying it could hinder his site's overall functionality. Any idea why?
I want to update his WP to the newest version, and that requires PHP to be updated, so we are running old plugins and old WP. His blog was hacked, so his web guys moved the location from site.com/blog to blog.site.com.
I feel handcuffed - no reason to run WP if you can't use plugins, right?
-
Sorry I missed this, Erik. Happy to have a look in the next day or two.
Paul
-
First, to be clear, the Webmaster Tools notifications are just that. Google isn't indicating any kind of a problem, Erik. It's just declaring what it has found in the site's robots directives (in this case, the meta robots tags on those pages rather than the robots.txt file).
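To confirm what Google is reporting, you can check whether a given page carries a noindex directive in a robots meta tag. The following is a minimal sketch using only Python's standard library. The class and function names are my own, and it only looks at `<meta name="robots">`, not at bot-specific tags or the X-Robots-Tag HTTP header:

```python
from html.parser import HTMLParser
from typing import List


class RobotsMetaParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() == "robots":
            content = attr_map.get("content", "")
            self.directives.extend(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def is_noindexed(html: str) -> bool:
    """Return True if the page's robots meta tag includes 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Example with an inline HTML snippet (a stand-in for a fetched page):
sample = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(is_noindexed(sample))
```

Running this against the HTML of a category or tag page should show whether the plugin's noindex setting is what triggers the notice.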
There's no way to give a definitive answer without seeing the actual website structure, but in general, it is VERY common and good practice to noindex the categories and tags on CMS-based websites. Usually you want some form of the archives to be indexed (though not date-based archives), but it's the individual pages that are most important.
The problem with allowing all of these to be indexed is that to a search engine, they will all look like duplicate content of other pages on the website. This will cause the search engine crawler to have to work much harder to find all the content on your website, and as a result it may quit partway through.
In addition, much of the content it finds it will consider to be duplicative of other pages on the website, and it will therefore have a hard time knowing which version is actually the most valuable result to return. As a result, it will split the authority among those pages, making them MUCH harder to rank.
This is a standard challenge of any CMS-based website, because they display the same content organized by what are referred to as different taxonomies (different ways of categorizing or linking the same information).
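To make the taxonomy point concrete, here is a rough Python sketch that groups URLs whose body text is identical once whitespace and case are normalized. The URLs and text are made-up placeholders. Real category and tag archives usually overlap heavily rather than match exactly, so exact matching is a simplification I'm using to keep the example short:

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def group_by_content(pages: Dict[str, str]) -> List[List[str]]:
    """Return groups of URLs whose normalized body text is identical."""
    groups = defaultdict(list)
    for url, body in pages.items():
        # Collapse whitespace and lowercase so trivial differences are ignored.
        normalized = " ".join(body.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical pages: the same excerpt surfaces under a category and a tag.
pages = {
    "/category/news/": "Our new widget is out.  Read more",
    "/tag/widgets/": "Our new widget is out. Read more",
    "/about/": "About the company",
}
for group in group_by_content(pages):
    print("Same content at:", ", ".join(group))
```

Each group in the output is a set of URLs competing with each other for the same content, which is the situation the noindex settings are meant to avoid.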
Again, without seeing the actual site I can't say for sure, but the short answer is that those three directives are very common for CMS-based websites and are very likely correct.
Hope that helps?
Paul