Indexed, though blocked by robots.txt: Need to bother?
-
Hi,
We have intentionally blocked some website files that had been indexed for years, and now we get the message "Indexed, though blocked by robots.txt" in GSC. As far as I know we can ignore this? Or are any actions required? We thought of blocking them with noindex meta tags, but these are PDF files.
Thanks
-
Hi there!
What Google is telling you is that you are indexing URLs you probably don't want indexed, or, the other way around, that important pages are blocked but still being indexed for other reasons.
If I might ask, why did you block those files through robots.txt?
The two most likely answers are:
1. You wanted to remove them from search results. If this is your case, you've only solved part of the problem. What you should have done is apply a noindex rule first, while still allowing robots to crawl those URLs (for PDFs and other non-HTML files, which can't carry a meta robots tag, the rule can be set in the HTTP header via X-Robots-Tag), and then, after enough time has passed, block them in robots.txt.
2. You wanted to optimize Googlebot's crawl time. If this is your case, then you've done it correctly and there is nothing to worry about. Hope this helps.
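For example, on an Apache server, a minimal sketch of that header approach for case 1 might look like this in .htaccess (assuming mod_headers is enabled and the files in question are PDFs):

    # Send a noindex header for every PDF on the site
    <FilesMatch "\.pdf$">
      Header set X-Robots-Tag "noindex"
    </FilesMatch>

Once the PDFs have dropped out of the index, you can reinstate the robots.txt block if you still want to save crawl budget.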
Best of luck.
GR
Related Questions
-
Do we need to Disallow profiles from discussions or forums?
Hi, We have a forum where users create different threads, like any other community (e.g., Moz), and thousands of pages are getting created. New threads and comments are okay as they have relevant content. We are planning to "Disallow" all profile pages, as they do not help with content relevancy and may dilute the link juice across thousands of such profile pages. Is this the right way to proceed? Thanks
-
Google indexing https sites by default now, where's the Moz blog about it!
Hello and good morning / happy Friday! Last night an article from, of all places, Venture Beat titled "Google Search starts indexing and letting users stream Android apps without matching web content" was sent to me, and as I read it I got a bit giddy, since we had just implemented a full sitewide HTTPS cert rather than a cart-only SSL. I then quickly searched for other sources to see if this was indeed true, and the writing on the wall seems to indicate so:
http://googlewebmastercentral.blogspot.in/2015/12/indexing-https-pages-by-default.html (Google Webmaster Blog)
http://www.searchenginejournal.com/google-to-prioritize-the-indexing-of-https-pages/147179/
http://www.tomshardware.com/news/google-indexing-https-by-default,30781.html
https://hacked.com/google-will-begin-indexing-httpsencrypted-pages-default/
https://www.seroundtable.com/google-app-indexing-documentation-updated-21345.html
I found it a bit ironic to read about this on mostly unsecured sites. I wanted to hear about the eight key rules that Google will factor in when ranking/indexing HTTPS pages from now on, and see what you all felt about this. Google will now begin to index HTTPS equivalents of HTTP web pages, even when the former don't have any links to them. However, Google will only index an HTTPS URL if it meets these conditions:
1. It doesn't contain insecure dependencies.
2. It isn't blocked from crawling by robots.txt.
3. It doesn't redirect users to or through an insecure HTTP page.
4. It doesn't have a rel="canonical" link to the HTTP page.
5. It doesn't contain a noindex robots meta tag.
6. It doesn't have on-host outlinks to HTTP URLs.
7. The sitemap lists the HTTPS URL, or doesn't list the HTTP version of the URL.
8. The server has a valid TLS certificate.
One rule that confuses me a bit is: "It doesn't redirect users to or through an insecure HTTP page." Does this mean that if you just moved over to HTTPS from HTTP, your site won't pick up the HTTPS boost, since most sites have HTTP redirects to HTTPS? Thank you!
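For context, the kind of redirect I mean is the standard HTTP-to-HTTPS 301 that most migrated sites run; on Apache it looks something like this (a sketch, assuming mod_rewrite is enabled):

    # Send any plain-HTTP request to the HTTPS equivalent
    RewriteEngine On
    RewriteCond %{HTTPS} off
    RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

My reading is that condition 3 is about the HTTPS URL itself redirecting back to (or through) HTTP, not about this kind of redirect, but I'd like confirmation.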
-
Adding the link masking directory to robots.txt?
Hey guys, Just want to know if you have any experience with this. Is it worthwhile blocking search engines from crawling the link masking directory? What I mean by this is the directory that holds the link redirectors to an affiliate site, for example:
mydomain.com/go/thislink goes to amazon.com/affiliatelink
I want to know if blocking the 'go' directory from getting crawled in robots.txt is a good idea or a bad idea. I am not using WordPress but rather a custom-built PHP site where I need to manually decide on these things. I specifically want to know if this in any way violates Google's guidelines. It doesn't change the customer experience, because visitors know exactly where they will end up if they click on the link. Any advice would be much appreciated.
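For clarity, the block I'm considering would be a minimal robots.txt along these lines (a sketch of my setup, assuming all the redirect scripts live under /go/):

    # Keep crawlers out of the affiliate redirect directory
    User-agent: *
    Disallow: /go/

-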
Homepage Index vs Home vs Default?
Should your home page be www.yoursite.com/index.htm, home.htm, or default.htm on an Apache server? Someone asked me this, and I have no idea. On our WordPress site I have never even seen this come up, but according to my friend, every homepage HAS to be one of those three. So my question is: which one is best for an Apache server site, AND does it actually have to be one of those three? Thanks, Ruben
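For reference, on Apache the file served for a directory URL is controlled by the DirectoryIndex directive (a sketch, assuming the default mod_dir module):

    # Serve index.htm for directory URLs, falling back to index.html
    DirectoryIndex index.htm index.html

With that in place the homepage can simply be linked as www.yoursite.com/ and Apache serves the file behind the scenes, so the specific filename shouldn't matter.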
-
Panda, Negative SEO and now Penguin - help needed
Hi,
We are small business owners who've been running a website for 5 years that provides our income. We've done very little backlinking ourselves, and never used paid directories or anything like that - usually just occasional forum or blog responses, plus a few articles here and there with some of our keyword phrases for internal pages. I admit we've done some keyword-phrase backlinks on some blogs, but our anchor text profile is largely brand names, our domain name, and non-keywords (except for some "bad" backlinks). Our DA is 34, PA 45 for our home page.
We were doing great until last Sept 27, when we got hit by Panda. Since then we have been de-optimizing our site for keywords, we rebuilt the site in WordPress for good architecture and ease of use for our customers, and we're deleting/repurposing low-quality pages and making our content more robust. We haven't yet recovered, and now it appears we got hit May 22 by Penguin... ARGH!
I recently discovered (it's hard to find time for everything with just two of us) that others can "negative SEO" a site now, and I feel this has happened to us based on the results below. I signed up for linkdetox.com yesterday and it gives a grim picture of our backlinks (it says we are in "deadly risk" territory). We have 83 "toxic" links and 600-some "suspicious" links - many on malware/malicious-listed sites, many on .pl domains from Poland, others on what I believe are foreign domains, domains that are a jumble of letters, or spammy-sounding EMD domains. This makes up 80% of our links.
As this is our only business, our income is now 1/3 of what it was, even with PPC ads running; we've been hit hard by all of this and are wondering if we can survive fixing it. We do have an SEO firm minimally helping us with guidance on recovering, but with income so low we are doing the work ourselves and can't afford much. Needless to say, we are quite distressed and, from reading around, not sure we'll be able to recover, which is deeply saddening, especially from negative SEO. We want to make sure we are on the right path to recovery if possible, hence my questions. We haven't contacted Google for reconsideration - again, no penalty messages from them.
1. First of all, if we don't have a manual penalty, would you still contact all the toxic/malicious/possibly porn-looking sites and ask for a link removal, wait, ask again, wait, then disavow? Or go straight to the Google disavow tool?
2. For backlinks coming from sites that are "gone" (a message saying the account has been suspended, or no website there anymore), do I try to contact them too? Go straight to disavow? Or do nothing?
3. For the sites flagged as malicious (by linkdetox, my browser, or Google), I don't want to open them in my browser to see if they are legitimate. If linkdetox doesn't have contact info for these, what are we supposed to do?
4. For "suspicious" foreign sites where I can't read the page, would you still disavow them? (I've seen many here say links from foreign sites should be disavowed.)
5. How do you keep up with all this if someone is negative SEOing you? We're really frustrated that Google's change has made it possible for competitors to tank your business (arguably, if we had a stronger backlink profile this may not have hurt us, or not as much - not sure). When you are small biz owners and can't hire a group to constantly monitor backlinks, get quality backlinks, content, site optimization, etc., it seems an almost impossible task.
6. Are WordPress left-nav and footer link anchor texts an issue for Penguin? I would think Google would realize these internal links will repeat the same anchor text on WordPress (I know Matt Cutts said not to use the same anchor text more than once for internal linking, but obviously nav and footer menus will do this).
7. What would you do if this was you? Try to fix it all? Start over with a new domain and 301 it (some say this has been working)? Just start over with a new domain and don't redirect?
Thanks for your input and advice. We appreciate it.
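For reference, my understanding is that the disavow file is just a plain-text list uploaded through Search Console - one entry per line, with domain: entries covering every link from that host. A sketch of what I'd submit (the domains here are hypothetical placeholders):

    # Hypothetical example - contacted twice, no response
    domain:spammy-links-example.pl
    domain:malware-site-example.com
    # Individual bad URL rather than a whole domain
    http://suspicious-example.com/some/page.html

-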
Urgent input needed on huge drop in Google
As of today we see huge drops in the SERPs across all our pages - between 10% and 80% on most pages of this domain: http://www.meresverige.dk
Some background info:
- We have never bought any links.
- Yes, we did optimize the site, but only in a fair way, using the SEOmoz On-Page Optimization tool; most pages get an A grade.
- No cloaking; all pages look exactly the same to visitors and to Google.
Any input on what this could be? We are hugely grateful for anything that might point us in the right direction. Have a nice day, Fredrik
-
Any ideas why our category pages got de-indexed?
Hi all, I work for eVenues, a directory website that provides listings of meeting rooms and event spaces. Things seemed to be chugging along nicely with our link-building effort (mostly guest blogging using a variety of anchor text), but we woke up on Monday morning to find that our city pages have been de-indexed. This page: http://www.evenues.com/Meeting-Spaces/Seattle/Washington used to be at the top of page 2 in the SERPs for the keyword "meeting rooms in Seattle". I doubt we got de-indexed because of our link-building efforts, as it was only a few blog posts and links from profile pages on community websites. My guess is that our recent 2.0 release is the cause: there are now several "filter" or subcategory pages with latitude and longitude parameters in the URL, plus different page titles based on the categories, like:
- "Meeting Rooms and Event Spaces in Seattle" (the main page)
- "Meeting Rooms in Seattle"
- "Classroom Venues in Seattle"
- "Party Venues in Seattle"
There was a bit of pushback when I suggested we add a rel="canonical" on these, because ideally we'd like to rank for all four queries (meeting rooms, party venues, classrooms, in city). These are new changes, and I have a sneaking suspicion this is why we got de-indexed. We're presenting generally the same content. Thoughts?
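For reference, the canonical I suggested would just be one line in the head of each subcategory page, pointing at the main city page - e.g., for the Seattle filters:

    <!-- Consolidate the filtered variants onto the main city page -->
    <link rel="canonical" href="http://www.evenues.com/Meeting-Spaces/Seattle/Washington" />

The trade-off is exactly the one my colleagues raised: canonicalized pages consolidate signals onto the main page but generally stop ranking on their own.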