"Extremely high number of URLs" warning for robots.txt blocked pages
-
I have a section of my site that is exclusively for tracking redirects for paid ads. All URLs under this path do a 302 redirect through our ad tracking system:
http://www.mysite.com/trackingredirect/blue-widgets?ad_id=1234567 --302--> http://www.mysite.com/blue-widgets
This path of the site is blocked by our robots.txt, and none of the pages show up for a site: search.
User-agent: *
Disallow: /trackingredirect
However, I keep receiving messages in Google Webmaster Tools about an "extremely high number of URLs", and the URLs listed are in my redirect directory, which is ostensibly not indexed.
If not by robots.txt, how can I keep Googlebot from wasting crawl time on these millions of /trackingredirect/ links?
-
Awesome, good to know things are all okay!
-
Yes, Google does not appear to be crawling or indexing any of the pages in question, and GWT doesn't note any issues with crawl budget.
-
And everything looks okay in your GWT?
-
This is what my other research has suggested, as well. Google is "discovering" millions of URLs that go into a queue to get crawled, and they're reporting the extremely high number of URLs in Webmaster Tools before they actually attempt to crawl, and see that all these URLs are blocked by robots.txt.
-
Hi Ehren,
Google has said that they send those warnings before they actually crawl your site (why they would bother you with a warning so quickly, I don't know), so I wouldn't worry about this if the warning is the only sign you're getting that Google might be crawling disallowed pages.
What is your Google Webmaster Tools account saying? If Google isn't reporting to you that it's spending too long crawling your site, and the correct number of pages are indexed, you should be fine.
Let me know if this is a bigger problem!
Kristina
-
Federico, my concern is how do I get Google to spend spending so much crawl time on those pages. I don't want Google to waste time crawling pages that are blocked in my robots.txt.
-
There's nothing you need to do. If you don't want those pages to be indexed leaving the robots.txt as it is is fine.
You can mark that in your Webmaster Tools as fixed and Google won't notify you again.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Does "?" in my URL have a negative effect?
I am having a difficult time finding specific information about the effect, if any, having a ? within the URL structure. We have the descriptive keyword phrase followed by the ? location id as in this example: http://www.adventuresonly.com/adventure-locations/things-to-do-in-arizona?stateid=124 Any feedback on effect and a corrective process to improve if necessary would be appreciated!
Technical SEO | | RBBonds0 -
Robots.txt & Mobile Site
Background - Our mobile site is on the same domain as our main site. We use a folder approach for our mobile site abc.com/m/home.html We are re-directing traffic to our mobile site vie device detection and re-direction exists for a handful of pages of our site ie most of our pages do not redirect the user to a mobile equivalent page. Issue – Our mobile pages are being indexed in desktop Google searches Input Required – How should we modify our robots.txt so that the desktop google index does not index our mobile pages/urls User-agent: Googlebot-Mobile Disallow: /m User-agent: `YahooSeeker/M1A1-R2D2` Disallow: /m User-agent: `MSNBOT_Mobile` Disallow: /m Many thanks
Technical SEO | | CeeC-Blogger0 -
A few misc Webmaster tools questions & Robots.txt etc
Hi I have a few general misc questions re Robots.tx & GWT: 1) In the Robots.txt file what do the below lines block, internal search ? Disallow: /?
Technical SEO | | Dan-Lawrence
Disallow: /*? 2) Also the sites feeds are blocked in robots.txt, why would you want to block a sites feeds ? **3) **What's the best way to deal with the below: - old removed page thats returning a 500 response code ? - a soft 404 for an old removed page that has no current replacement old removed pages returning a 404 The old pages didn't have any authority or inbound links hence is it best/ok to simply create a url removal request in GWT ? Cheers Dan0 -
Having both <title>and <meta name="title"...> on a web page?</title>
Hi All, Client of mine using reversed Meta Tags format in their website and Honestly i never saw such Meta Tags formats. In my opinion having 2 Title tags and wrong reversed description tag is not correct and the needs to be removed, and other tags need to be changed,too But they said that it probably doesn't make a difference because weird thing is Search Engines are apparently able to index them ,So they don't think it affects search engine results and won't remove it just based on opinion. So should i persist in correcting them or just hope for the best and ignore it?!?!?! Thanks!
Technical SEO | | DigitalJungle0 -
How does robots.txt affect aliased domains?
Several of my sites are aliased (hosted in subdirectories off the root domain on a single hosting account, but visible at www.theSubDirectorySite.com) Not ideal, I know, but that's a different issue. I want to block bots from viewing those files that are accessible in subdirectories on the main hosting account, www.RootDomain.com/SubDirectorySite/, and force the bots to look at www.SubDirectorySite.com instead. I utilized the canonical meta tag to point bots away from the sub directory site, but I am wondering what will happen if I use robots.txt to block those files from within the root domain. Will the bots, specifically Google bot, still index the site at its own URL, www.AnotherSite.com even if I've blocked that directory with Disallow: /AnotherSite/ ? THANK YOU!!!
Technical SEO | | michaelj_me0 -
URL Structure "-" vs "/"? Are there any advantages to one over the other?
An example would be domain.com/keyword/keyword2 vs domain.com/keyword-keyword2 Are there any advantages / disadvantages to one over the other?
Technical SEO | | nicole.healthline0 -
Should I Block Tag, Category, Author Pages
Just finished reviewing the first crawl of my first SEOmoz campaign for a site that I am working on. The site I"m working on uses Wordpress as a CMS, and most if not all of the warnings and notices have to do with author, category, and tag pages. Should I block these from being indexed? Why or why not?
Technical SEO | | Falconberg0 -
What should i do with the links for "Login", "Register", "My Trolley" links on every page.
My website ommrudraksha has 3 links on every page. 1. Login 2. Register 3. My trolley My doubt is i do not want to give any weightage to these links. does these links will be calculated when page links are calculated ? Should i remove these as links and place these as buttons ? ( with look a like of link visually ? )
Technical SEO | | Ommrudraksha0