Blocked by robots
-
My client's GWT (Google Webmaster Tools) has a number of notices for "blocked by meta-robots". These are all blog posts, categories, or tags.
His former SEO told him this: "We've activated following settings:
- Use noindex for Categories
- Use noindex for Archives
- Use noindex for Tag Archives
to reduce keyword stuffing & duplicate post tags
Disabling all 3 noindex settings above may remove google blocks but also will send too many similar tags, post archives/category."
Is this guy correct?
What would be the problem with indexing these?
Am I correct in thinking they should be indexed?
Thanks
-
As far as the upgrading of PHP on a server - this is for a different client, I seem to recall?
I would have a real problem with a developer saying they weren't going to upgrade because it might break things. Of course it might break things, but there are industry-standard approaches to dealing with this.
For example, create a duplicate version of the site on a server instance that is using the newer version of PHP, and do a full Quality Assurance analysis on the dev site to find and fix anything that has issues with the new PHP version. Then deploy back to the live site with the PHP upgrade.
This is standard operating procedure and is necessary because there will come a time when any older server software will no longer be supported and therefore becomes a security risk as it will be unpatched. Planning for these kinds of upgrades should be included in any website operational plan.
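As a rough illustration of one small part of that QA pass, here is a minimal Python sketch that requests the same paths from the live site and from the dev copy and reports any path whose HTTP status differs. The base URLs, the path list, and the helper names `fetch_status` and `compare_sites` are hypothetical; a real QA pass would also check page content, forms, plugins, and server logs.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_status(base_url, path):
    """Return the final HTTP status for base_url + path, or None if unreachable."""
    try:
        with urlopen(base_url.rstrip("/") + path, timeout=10) as response:
            return response.status
    except HTTPError as err:
        # 4xx and 5xx responses are raised as HTTPError; keep the code.
        return err.code
    except OSError:
        # Covers URLError (DNS failure, refused connection) and timeouts.
        return None


def compare_sites(paths, live_base, dev_base, fetch=fetch_status):
    """Return (path, live_status, dev_status) for every path that differs."""
    differences = []
    for path in paths:
        live = fetch(live_base, path)
        dev = fetch(dev_base, path)
        if live != dev:
            differences.append((path, live, dev))
    return differences
```

Calling something like `compare_sites(["/", "/blog/"], "https://www.example.com", "https://dev.example.com")` (both hosts are placeholders) would return an empty list if every path responds with the same status on both servers.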
Also, their solution of moving WordPress to a subdomain is no protection whatsoever against the fact that they have an extremely vulnerable version.
First, the site is just as vulnerable to being hacked again, as it is still unpatched. Being on a subdomain has no effect on this. Also, they have ruined the SEO value of that blog by moving it to a subdomain instead of fixing the issue and keeping it as a subdirectory of the prime site. And depending on the type of vulnerability exploited, it may still be possible for a hacker to get into the server via the vulnerable WP, then traverse from the subdomain to the prime site and cause harm there as well.
In the short term, if there truly aren't resources to properly do QA (Quality Assurance) on a dev site running an updated version of PHP, the alternative would be to move the WordPress install to its own server or VPS running a current version of PHP, upgrade it and security patch it, then use a reverse proxy setup to have it show up as blog.domain.com (or even move it back to domain.com/blog).
This would at least allow for a properly secured WordPress that could also use current and new plugins. This would, however, be at the expense of a slightly more complicated setup because of the reverse proxy.
Hope that answers your question?
Paul
-
Sorry, Erik - I didn't forget about you, but was dealing with an ethical dilemma.
Unfortunately, the business of the site you're dealing with is so completely against the terms of service of the Search Engines and against what I believe to be good, sustainable SEO, that I've decided I can't, in good conscience, do anything to help them.
Sorry this leaves you no assistance, but I would suggest strongly you not rely heavily on this client for ongoing revenues. They are just begging to get hammered by Google, if that's not what's happening already.
Paul
-
I'm happy for all the help, so I'm not complaining here, but I think you forgot about me, Paul.
Also, I need to know why my client is so adamant about not wanting to upgrade his PHP from 5.1.6 to 5.2.4, saying it could hinder his site's overall functionality. Any idea why?
I want to update his WP to the newest version, and it requires PHP to be updated, so we are running old plugins and old WP. His blog was hacked, so his web guys moved the location from site.com/blog to blog.site.com.
I feel handcuffed - no reason to run WP if you can't use plugins, right?
-
Sorry I missed this, Erik. Happy to have a look in the next day or two.
Paul
-
First, to be clear, the Webmaster Tools notifications are just that: notifications. Google isn't indicating any kind of a problem, Erik. It's just reporting what it has found in the pages' meta robots tags.
There's no way to give a definitive answer without seeing the actual website structure, but in general, it is VERY common and good practice to noindex the categories and tags on CMS-based websites. Usually, you want some form of the archives to be indexed, but it's the individual pages that are most important (e.g. not date-based archives).
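To confirm which pages actually carry a noindex directive, the page HTML can be inspected for a meta robots tag. The following is a minimal sketch using only the Python standard library; the class name `RobotsMetaParser` and the function `is_noindexed` are made up for illustration, and the sketch only looks for the literal `noindex` directive in a `<meta name="robots">` tag (it ignores `X-Robots-Tag` headers and crawler-specific tags such as `googlebot`).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. a bare attribute), so default to "".
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        for directive in attributes.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def is_noindexed(html):
    """Return True if a meta robots tag in the HTML includes noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

Running `is_noindexed` over the HTML of a category or tag archive page and over a single post should show whether the three settings are applied only to the archive pages, as intended.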
The problem with allowing all of these to be indexed is that, to a search engine, they will all look like duplicate content of other pages on the website. This will cause the search engine crawler to have to work much harder to find all the content on your website, and as a result it may quit partway through.
In addition, much of the content it finds it will consider to be duplicative of other pages on the website, and it will therefore have a hard time knowing which version is actually the most valuable result to return. As a result, it will split the authority among those pages, making them MUCH harder to rank.
This is a standard challenge of any CMS-based website, because they display the same content organized by what are referred to as different taxonomies (different ways of categorizing or linking the same information).
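To make the taxonomy point concrete, here is a small Python sketch that counts how many archive pages would repeat each post. The post data and the URL patterns (`/category/…/`, `/tag/…/`, `/YYYY/MM/`) are assumptions chosen to resemble a typical blog setup, not taken from the site in question.

```python
from collections import defaultdict


def archive_listings(posts):
    """Map each archive URL to the slugs of the posts it would list."""
    listings = defaultdict(list)
    for post in posts:
        for category in post["categories"]:
            listings[f"/category/{category}/"].append(post["slug"])
        for tag in post["tags"]:
            listings[f"/tag/{tag}/"].append(post["slug"])
        # Date-based archive, e.g. "2012-03-14" -> "/2012/03/".
        month = post["date"][:7].replace("-", "/")
        listings[f"/{month}/"].append(post["slug"])
    return dict(listings)


def copies_per_post(posts):
    """Count how many archive pages repeat each post's content."""
    counts = defaultdict(int)
    for slugs in archive_listings(posts).values():
        for slug in slugs:
            counts[slug] += 1
    return dict(counts)
```

A post with one category and two tags already shows up on four archive pages in addition to its own URL, which is the duplication the noindex settings are meant to keep out of the index.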
Again, without seeing the actual site I can't say for sure, but the short answer is that those three directives are very common for CMS-based websites and are very likely correct.
Hope that helps?
Paul