"Extremely high number of URLs" warning for robots.txt blocked pages
-
I have a section of my site that is exclusively for tracking redirects for paid ads. All URLs under this path do a 302 redirect through our ad tracking system:
http://www.mysite.com/trackingredirect/blue-widgets?ad_id=1234567 --302--> http://www.mysite.com/blue-widgets
This path of the site is blocked by our robots.txt, and none of the pages show up for a site: search.
User-agent: *
Disallow: /trackingredirect
However, I keep receiving messages in Google Webmaster Tools about an "extremely high number of URLs", and the URLs listed are in my redirect directory, which is ostensibly not indexed.
If not by robots.txt, how can I keep Googlebot from wasting crawl time on these millions of /trackingredirect/ links?
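One way to sanity-check the rule locally is Python's standard-library robots.txt parser. This is just a verification sketch using the rule and URLs from the question above, not anything Google-specific:

```python
from urllib.robotparser import RobotFileParser

# The same rules as the live robots.txt described above.
rules = """User-agent: *
Disallow: /trackingredirect""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# The tracking URL is blocked; the redirect destination is still crawlable.
print(parser.can_fetch("Googlebot", "http://www.mysite.com/trackingredirect/blue-widgets?ad_id=1234567"))  # False
print(parser.can_fetch("Googlebot", "http://www.mysite.com/blue-widgets"))  # True
```

A compliant crawler applies `Disallow: /trackingredirect` as a prefix match, so every URL under that path is covered by the single rule.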
-
Awesome, good to know things are all okay!
-
Yes, Google does not appear to be crawling or indexing any of the pages in question, and GWT doesn't note any issues with crawl budget.
-
And everything looks okay in your GWT?
-
This is what my other research has suggested, as well. Google is "discovering" millions of URLs that go into a queue to get crawled, and they're reporting the extremely high number of URLs in Webmaster Tools before they actually attempt to crawl, and see that all these URLs are blocked by robots.txt.
-
Hi Ehren,
Google has said that they send those warnings before they actually crawl your site (why they would bother you with a warning so quickly, I don't know), so I wouldn't worry about this if the warning is the only sign you're getting that Google might be crawling disallowed pages.
What is your Google Webmaster Tools account saying? If Google isn't reporting to you that it's spending too long crawling your site, and the correct number of pages are indexed, you should be fine.
Let me know if this is a bigger problem!
Kristina
-
Federico, my concern is how to get Google to stop spending so much crawl time on those pages. I don't want Google to waste time crawling pages that are blocked in my robots.txt.
-
There's nothing you need to do. If you don't want those pages to be indexed, leaving the robots.txt as it is is fine.
You can mark that in your Webmaster Tools as fixed and Google won't notify you again.
Related Questions
-
Quick Fix to "Duplicate page without canonical tag"?
When we pull up Google Search Console, in the Index Coverage section, under the category of Excluded, there is a sub-category called ‘Duplicate page without canonical tag’. The majority of the 665 pages in that section are from a test environment. If we were to include in the robots.txt file a wildcard to cover every URL that starts with the particular root URL ("www.domain.com/host/"), could we eliminate the majority of these errors? That solution is not one of the 5 or 6 recommended solutions that the Google Search Console Help section text suggests. It seems like a simple, effective solution. Are we missing something?
Technical SEO | | CREW-MARKETING1 -
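If every test-environment URL really does live under one path, a single robots.txt group covers it; the `/host/` path below is taken from the question, so adjust it to the real directory. One caveat: Disallow stops crawling, not indexing, so pages Google has already discovered may need a noindex tag or a removal request to drop out of the report.

```
User-agent: *
Disallow: /host/
```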
How to explain "No Return Tags" Error from non-existing page?
In the Search Console of our Google Webmaster account we see 3 "no return tags" errors. The attached screenshot shows the detail of one of these errors. I know that annotations must be confirmed from the pages they are pointing to. If page A links to page B, page B must link back to page A, otherwise the annotations may not be interpreted correctly. However, the originating URL (/#!/public/tutorial/website/joomla) doesn't exist anymore. How could these errors still show up?
Technical SEO | | Maximuxxx0 -
Robots.txt: crawler visiting URLs we don't want it to
Hello, We run a number of websites, and underneath them we have testing websites (sub-domains); on those sites we have robots.txt disallowing everything. When I logged into MOZ this morning, I could see the MOZ spider had crawled our test sites even though we have said not to. Does anyone have any ideas how we can stop this happening?
Technical SEO | | ShearingsGroup0 -
How do I add "noindex" or "nofollow" to a link in Wordpress
It's been a while since I've SEOed a Wordpress site. How do I add "nofollow" or "noindex" to specific links? I highlight the anchor text in the text editor, I click the "link" button. I could have sworn that there used to be an option in the dialogue box that pops up.
Technical SEO | | CsmBill0 -
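Whatever the editor's link dialog offers, switching to the Text (HTML) view and adding the rel attribute by hand always works. A minimal sketch, with a placeholder URL and anchor text:

```html
<a href="https://example.com/" rel="nofollow">anchor text</a>
```

Note that noindex is a page-level directive (a meta tag or HTTP header), not a link attribute; only nofollow applies to an individual link.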
Sitemaps and "noindex" pages
Experimenting a little bit to recover from Panda and added "noindex" tag for quite a few pages. Obviously now we need Google to re-crawl them ASAP and de-index. Should we leave these pages in sitemaps (with updated "lastmod") for that? Or just patiently wait? 🙂 What's the common/best way?
Technical SEO | | LocalLocal0 -
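For reference, the page-level tag being discussed goes in the `<head>` of each page to be de-indexed. Google has to be able to crawl the page to see the tag, so these pages must not also be blocked in robots.txt:

```html
<meta name="robots" content="noindex">
```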
Getting home page content at top of what robots see
When I click on the text-only cache of nlpca(dot)com on the home page http://webcache.googleusercontent.com/search?q=cache:UIJER7OJFzYJ:www.nlpca.com/&hl=en&gl=us&strip=1 our H1 and body content are at the very bottom. How do we get the h1 and content at the top of what the robots see? Thanks!
Technical SEO | | BobGW0 -
Robots.txt and 301
Hi Mozzers, Can you answer something for me please? I have a client and they have 301 redirected the homepage '/' to '/home.aspx'. Therefore all or most of the link juice is being passed, which is great. They have also marked the '/' as nofollow/noindex in the robots.txt file, so it's not being crawled. My question is: if the '/' is blocked from robots, does it still pass on the authority for the links that go into this page? It is a 301 and not a 302, so it would work under normal circumstances, but as the page is not being crawled, do I need to change the robots.txt to allow crawling of the '/'? Thanks Bush
Technical SEO | | Bush_JSM0 -
How do I use the Robots.txt "disallow" command properly for folders I don't want indexed?
Today's sitemap webinar made me think about the disallow feature, seems opposite of sitemaps, but it also seems both are kind of ignored in varying ways by the engines. I don't need help semantically, I got that part. I just can't seem to find a contemporary answer about what should be blocked using the robots.txt file. For example, I have folders containing site comps for clients that I really don't want showing up in the SERPS. Is it better to not have these folders on the domain at all? There are also security issues I've heard of that make sense, simply look at a site's robots file to see what they are hiding. It makes it easier to hunt for files when they know the directory the files are contained in. Do I concern myself with this? Another example is a folder I have for my xml sitemap generator. I imagine google isn't going to try to index this or count it as content, so do I need to add folders like this to the disallow list?
Technical SEO | | SpringMountain0
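The syntax itself is just one Disallow line per directory; the folder names below are hypothetical stand-ins for the client-comp and sitemap-generator directories described above. Since robots.txt is publicly readable, anyone can see these paths, so genuinely sensitive material should be protected with authentication rather than (or in addition to) a disallow rule.

```
User-agent: *
Disallow: /client-comps/
Disallow: /sitemap-tools/
```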