"Extremely high number of URLs" warning for robots.txt blocked pages
-
I have a section of my site that is exclusively for tracking redirects for paid ads. All URLs under this path do a 302 redirect through our ad tracking system:
http://www.mysite.com/trackingredirect/blue-widgets?ad_id=1234567 --302--> http://www.mysite.com/blue-widgets
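The redirect behavior above can be sketched roughly like this (the function name and return shape are hypothetical illustrations, not the actual ad-tracking system; only the URL pattern comes from the question):

```python
from urllib.parse import urlparse, parse_qs

def resolve_tracking_redirect(url):
    """Map a /trackingredirect/ URL to its 302 target and the ad_id to log."""
    parsed = urlparse(url)
    if not parsed.path.startswith("/trackingredirect/"):
        raise ValueError("not a tracking-redirect URL")
    slug = parsed.path[len("/trackingredirect/"):]
    ad_id = parse_qs(parsed.query).get("ad_id", [None])[0]
    # A real handler would record ad_id in the ad-tracking system,
    # then respond with: 302 Found, Location: /<slug>
    return f"/{slug}", ad_id

target, ad_id = resolve_tracking_redirect(
    "http://www.mysite.com/trackingredirect/blue-widgets?ad_id=1234567")
# target is "/blue-widgets", ad_id is "1234567"
```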
This path of the site is blocked by our robots.txt, and none of the pages show up for a site: search.
User-agent: *
Disallow: /trackingredirect
However, I keep receiving messages in Google Webmaster Tools about an "extremely high number of URLs", and the URLs listed are in my redirect directory, which is ostensibly not indexed.
If not by robots.txt, how can I keep Googlebot from wasting crawl time on these millions of /trackingredirect/ links?
-
Awesome, good to know things are all okay!
-
Yes, Google does not appear to be crawling or indexing any of the pages in question, and GWT doesn't note any issues with crawl budget.
-
And everything looks okay in your GWT?
-
This is what my other research has suggested, as well. Google is "discovering" millions of URLs that go into a queue to get crawled, and they're reporting the extremely high number of URLs in Webmaster Tools before they actually attempt to crawl, and see that all these URLs are blocked by robots.txt.
-
Hi Ehren,
Google has said that they send those warnings before they actually crawl your site (why they would bother you with a warning so quickly, I don't know), so I wouldn't worry about this if the warning is the only sign you're getting that Google might be crawling disallowed pages.
What is your Google Webmaster Tools account saying? If Google isn't reporting to you that it's spending too long crawling your site, and the correct number of pages are indexed, you should be fine.
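If you want to double-check that the Disallow rule really covers those URLs, Python's standard library can test it for you. A quick sketch (the rule text is the one from the question; I'm assuming it matches your live robots.txt):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /trackingredirect
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Googlebot falls under the "*" group, so this URL is blocked from crawling
blocked = not rp.can_fetch(
    "Googlebot",
    "http://www.mysite.com/trackingredirect/blue-widgets?ad_id=1234567")
print(blocked)  # True
```

Keep in mind this only confirms the crawl block; robots.txt doesn't by itself prevent indexing of URLs Google discovers through links.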
Let me know if this is a bigger problem!
Kristina
-
Federico, my concern is how I can get Google to stop spending so much crawl time on those pages. I don't want Google to waste time crawling pages that are blocked in my robots.txt.
-
There's nothing you need to do. If you don't want those pages to be indexed, leaving the robots.txt as-is is fine.
You can mark that in your Webmaster Tools as fixed and Google won't notify you again.