Disallow doesn't disallow anything!?
-
I've been trying to exclude a ton of backed-up files on our domain that have to stay public while people transition content over. I have tried everything: searching a subdirectory by name, updating robots.txt with disallow and noindex directives (both with and without a trailing / or /*), and I still get almost triple the number of 'actual' pages.
Is there any way to get cleaner results aside from manually sorting and cutting rows from the CSV?
-
In this case you would do well to read this documentation:
https://moz.com/help/moz-procedures/crawlers/rogerbot
This forum sees a lot of general SEO queries and is a bit of a broader community, not just for Moz products (though we do see lots of Moz product questions as well!).
Could you deploy the noindex directive through the X-Robots-Tag HTTP header instead of through a meta tag in the HTML?
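For instance, on an Apache server a single X-Robots-Tag rule in the server config (or an .htaccess file) can cover an entire directory of backed-up files without editing any HTML. This is only a sketch under assumed conditions: the /backups/ path is hypothetical, and it presumes Apache 2.4 with mod_headers enabled.

```apache
# Hypothetical example: send "noindex" via the X-Robots-Tag HTTP header
# for everything under a /backups/ directory. Assumes Apache 2.4 with
# mod_headers enabled; adjust the path to match your actual setup.
<If "%{REQUEST_URI} =~ m#^/backups/#">
    Header set X-Robots-Tag "noindex, nofollow"
</If>
```

Because the header is applied by the server, it also covers non-HTML files (PDFs, images) that a meta tag could never reach.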
-
Hey,
Would you be able to reach out to our Help Team at help@moz.com with any product-related queries?
Looking forward to hearing from you!
Eli
-
Thanks everyone for chiming in, but I was specifically talking about the Moz Pro crawl and the results here in Moz Pro. I thought that was implied by asking on this forum.
The suggestion to 'Add a meta robots noindex tag to each of the pages you want removed from the index' is quite impossible, as there are over 10,000 pages (some outside our CMS footprint, but on our domain). I wish it were as easy as adding a single line to a header include!
-
Full agreement with this
-
effectdigital is correct. If you're blocking pages via robots.txt and still seeing them in the index, it's likely that Google is encountering links to these pages and indexing them that way, without updating its crawl (since your robots.txt says not to). Your best bet is to:
- Add a meta robots noindex tag to each of the pages you want removed from the index;
- Remove the disallow directive from robots.txt;
- Wait for Google to re-crawl the pages (using "Fetch as Googlebot" in Google Search Console may speed this process along);
- Once the pages are no longer in Google's index, re-add the disallow directive to your robots.txt file.
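To verify the first step at scale, a short script can confirm that a page's HTML actually carries the meta robots noindex tag before you flip the robots.txt directive back on. A minimal sketch using only the Python standard library (how you fetch each page's HTML is up to you):

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the content of any <meta name="robots" ...> tags."""
    def __init__(self):
        super().__init__()
        self.robots_content = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.robots_content.append(attrs.get("content", "").lower())

def has_noindex(html_text):
    """Return True if the HTML contains a meta robots noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return any("noindex" in content for content in parser.robots_content)
```

Feeding it each page's HTML (fetched with urllib.request, for example) turns a 10,000-page audit into a simple loop.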
-
In addition to what Dalerio-Consulting has said, be wary of your deployment. Robots.txt doesn't affect indexation; it affects crawling. If Google can't crawl the pages, how can it find the noindex tags they contain?
-
Google generally takes some time to re-fetch the robots.txt file, and even more time to update the SERPs and remove the disallowed links from the results. After you have properly blocked the URLs from the bots, they should be removed from the results within a few days.
To check how the robots.txt file affects your URLs, you can use Google's robots.txt Tester tool. There you can see how Google has stored the file, and you can enter a URL to test whether it is allowed or blocked.
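You can also approximate that check locally: Python's standard-library robotparser applies the same basic matching rules, so you can sanity-check a directive before deploying it. A sketch with a hypothetical /backups/ rule and example URLs:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; swap in your real file.
robots_txt = """
User-agent: *
Disallow: /backups/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# True means the URL may be crawled; False means the rule blocks it.
print(parser.can_fetch("*", "https://example.com/backups/old-page.html"))
print(parser.can_fetch("*", "https://example.com/live-page.html"))
```

Note that this only tells you whether a URL is crawlable, not whether it is indexed; a blocked URL can still appear in the SERPs if other pages link to it.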
Daniel Rika - Dalerio Consulting
[Signature links removed by moderator.]