Disallow doesn't disallow anything!?
-
I've been trying to exclude a ton of backed-up files on our domain that have to stay public while people transition content over. I have tried everything: searching a subdirectory by name, updating robots.txt with disallow and noindex directives (both with and without a trailing / or /*), and I still get almost triple the number of 'actual' pages.
Is there any way to get cleaner results aside from manually sorting and cutting rows from the CSV?
-
In this case you would do well to read this documentation:
https://moz.com/help/moz-procedures/crawlers/rogerbot
This forum sees a lot of general SEO queries and is a bit of a broader community, not just for Moz products (though we do see lots of Moz product questions as well!).
Could you deploy the noindex directive through the X-Robots-Tag HTTP header instead of through a meta tag in the HTML?
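For instance, on an Apache server a single X-Robots-Tag rule in the server config (or an .htaccess file) can cover an entire directory of backed-up files without editing any HTML. This is only a sketch under assumed conditions: the /backups/ path is hypothetical, and it presumes Apache 2.4 with mod_headers enabled.

```apache
# Hypothetical example: send "noindex" via the X-Robots-Tag HTTP header
# for everything under a /backups/ directory. Assumes Apache 2.4 with
# mod_headers enabled; adjust the path to match your actual setup.
<If "%{REQUEST_URI} =~ m#^/backups/#">
    Header set X-Robots-Tag "noindex, nofollow"
</If>
```

Because the header is applied by the server, it also covers non-HTML files (PDFs, images) that a meta tag could never reach.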
-
Hey,
Would you be able to reach out to our Help Team at help@moz.com with any product-related queries?
Looking forward to hearing from you!
Eli
-
Thanks everyone for chiming in, but I was specifically talking about the Moz Pro crawl and the results here in Moz Pro. I thought that was implied by asking on this forum.
The suggestion to 'Add a meta robots noindex tag to each of the pages you want removed from the index' is quite impossible, as there are over 10,000 pages (some outside our CMS footprint, but on our domain). I wish it were as easy as adding a single line to a header include!
-
Full agreement with this
-
effectdigital is correct. If you're blocking pages via robots.txt and still seeing them in the index, it's likely that Google is encountering links to these pages and indexing them that way, without updating its crawl (since your robots.txt says not to). Your best bet is to:
- Add a meta robots noindex tag to each of the pages you want removed from the index;
- Remove the disallow directive from robots.txt;
- Wait for Google to re-crawl the pages (using "Fetch as Googlebot" in Google Search Console may speed this process along);
- Once the pages are no longer in Google's index, re-add the disallow directive to your robots.txt file.
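To verify the first step at scale, a short script can confirm that a page's HTML actually carries the meta robots noindex tag before you flip the robots.txt directive back on. A minimal sketch using only the Python standard library (how you fetch each page's HTML is up to you):

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the content of any <meta name="robots" ...> tags."""
    def __init__(self):
        super().__init__()
        self.robots_content = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.robots_content.append(attrs.get("content", "").lower())

def has_noindex(html_text):
    """Return True if the HTML contains a meta robots noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return any("noindex" in content for content in parser.robots_content)
```

Feeding it each page's HTML (fetched with urllib.request, for example) turns a 10,000-page audit into a simple loop.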
-
In addition to what Dalerio-Consulting has said, be wary of your deployment. Robots.txt doesn't affect indexation; it affects crawling. If Google can't crawl the pages, how can it find the noindex tags they contain?
-
Google generally takes some time to re-fetch the robots.txt file, and even more time to update the SERPs and remove the disallowed links from the results. After you have properly blocked the URLs from the bots, they should be removed from the results within a few days.
To check how the robots.txt file affects your URLs, you can use Google's robots.txt Tester tool. There you can see how Google has stored the file, and you can enter a URL to test whether it is allowed or blocked.
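You can also approximate that check locally: Python's standard-library robotparser applies the same basic matching rules, so you can sanity-check a directive before deploying it. A sketch with a hypothetical /backups/ rule and example URLs:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; swap in your real file.
robots_txt = """
User-agent: *
Disallow: /backups/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# True means the URL may be crawled; False means the rule blocks it.
print(parser.can_fetch("*", "https://example.com/backups/old-page.html"))
print(parser.can_fetch("*", "https://example.com/live-page.html"))
```

Note that this only tells you whether a URL is crawlable, not whether it is indexed; a blocked URL can still appear in the SERPs if other pages link to it.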
Daniel Rika - Dalerio Consulting
[Signature links removed by moderator.]