Massive URL blockage by robots.txt
-
Hello people,
In May there was a dramatic increase in the number of URLs blocked by robots.txt, even though we don't have that many URLs or crawl errors. You can view the attachment to see how it went up. The thing is, the company hasn't touched the robots.txt file since 2012. What might be causing the problem? Can this result in any penalties? Can indexation be lowered because of this?
-
Even though fewer pages are indexed than blocked, you still have a significant increase in indexed pages as well. That is a good thing! You technically have more pages indexed than before. It looks like you possibly relaunched the site or something? More blocked pages could be an indexing problem, or it might be a good thing; it all depends on which pages are being blocked.
If you relaunched the site and used this great new whiz-bang CMS that created an online catalog that gave your users 54 ways to sort your product catalog, then the number of "pages" could increase with each sort. Just imagine: sort your widgets by color, or by size, or by price, or by price and size, or by size and color, or by color and price. You get the idea. Very quickly you have a bunch of duplicate versions of a single page. If your SEO was on his or her toes, they would account for this using a rel="canonical" tag, or possibly a meta noindex, or a change to robots.txt, etc. That would be good, as you are not going to confuse Google with all the different versions of the same page.
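To make the sorting scenario concrete, here is a minimal sketch of how many sorted views can collapse to one canonical URL. The parameter names (`sort`, `order`, `dir`), the `example.com` URLs, and the `canonical_url` helper are all assumptions for illustration; a real CMS may name its parameters differently.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical sort-only parameter names; check what your own CMS emits.
SORT_PARAMS = {"sort", "order", "dir"}

def canonical_url(url: str) -> str:
    """Drop sort-only query parameters so every sorted view maps to one URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in SORT_PARAMS
    ]
    # Rebuild the URL without the sort parameters and without a fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

# Four "pages" that all show the same widget list in a different order.
variants = [
    "http://example.com/widgets?sort=color",
    "http://example.com/widgets?sort=size&dir=asc",
    "http://example.com/widgets?sort=price&order=desc",
    "http://example.com/widgets",
]

unique_targets = {canonical_url(u) for u in variants}
print(len(variants), "crawlable URLs ->", len(unique_targets), "canonical URL")
```

The value returned by `canonical_url` is what you would put in each variant's rel="canonical" tag. Parameters that change the content, such as a page number, are deliberately kept.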
Ultimately, Shailendra has the approach that you need to take. Look in robots.txt, look at the code on your pages. What happened around 5/26/2013? All those things need to be looked at to try and answer your question.
-
Le Fras,
The robots.txt file doesn't have to change for Google to report that more URLs are being blocked by it. The robots.txt file tells the search engines not to crawl the given URLs, but they may still keep those URLs in the index and display them in the search results.
So the search engines do know of the URLs that are being blocked and they are able to indicate that more are being blocked as you add pages to your site that are restricted by the robots.txt file.
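As a rough illustration of that point, the sketch below keeps one robots.txt rule fixed and counts how many known URLs it blocks before and after new pages are added. The rule, the `example.com` URLs, and the `count_blocked` helper are made up for the example. It uses Python's standard `urllib.robotparser`, which does simple prefix matching and does not understand the wildcard extensions Google supports.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that has not changed in years.
ROBOTS_TXT = """\
User-agent: *
Disallow: /catalog/sort/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def count_blocked(urls, agent="Googlebot"):
    """Count the URLs the given user agent is not allowed to crawl."""
    return sum(1 for url in urls if not parser.can_fetch(agent, url))

# URLs the search engine knew about before the site grew.
before = [
    "http://example.com/",
    "http://example.com/catalog/sort/color",
]

# The same site after new sorted views were added under the blocked path.
after = before + [
    "http://example.com/catalog/sort/size",
    "http://example.com/catalog/sort/price",
]

print("blocked before:", count_blocked(before))
print("blocked after: ", count_blocked(after))
```

The file is identical in both runs; only the set of discovered URLs grew, which is enough to push the blocked count up.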
-
Check your robots.txt file. Are there entries that block crawling? If you can share the URL, that would be helpful.
Regards