Large robots.txt file
-
We're considering a robots.txt with around 1450 lines in it. It would block 100k+ old pages from being crawled (I know the ideal would be to delete or noindex them, but unfortunately that isn't viable).
My concern is that a robots.txt that large will either not be followed at all or will slow down our crawl rate.
Does anybody have any experience with a robots.txt of that size?
-
Answered my own question:
https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt?csw=1#file-format
"A maximum file size may be enforced per crawler. Content which is after the maximum file size may be ignored. Google currently enforces a size limit of 500kb."