Massive URL blockage by robots.txt
-
Hello people,
In May there has been a dramatic increase in blocked URLs by robots.txt, even though we don't have so many URLs or crawl errors. You can view the attachment to see how it went up. The thing is the company hasn't touched the text file since 2012. What might be causing the problem? Can this result any penalties? Can indexation be lowered because of this?
-
Even though there are less pages indexed compared to those that are blocked, you still have a significant increase in indexed pages as well. That is a good thing! You technically have more pages that are indexed than before. It looks like you possibly relaunched the site or something? More pages blocked could be an indexing problem, or it might be a good thing - it all depends on what pages are being blocked.
If you relaunched the site and used this great new whiz-bang CMS that created an online catalog that gave your users 54 ways to sort your product catalog, then the number of "pages" could increase with each sort. Just imagine, sort your widgets by color, or by size or by price, or by price and size, or by size and color, or by color and price - you get the idea. Very quickly you have a bunch of duplicate pages of a single page. If your SEO was on his or her toes, they would account for this using a canonical approach or possibly a meta noindex or changing the robots.txt etc. That would be good as you are not going to confuse Google with all the different versions of the same page.
Ultimately, Shailendra has the approach that you need to take. Look in robots.txt, look at the code on your pages. What happened around 5/26/2013? All those things need to be looked at to try and answer your question.
-
Le Fras,
You don't only have to change the robots.txt file for Google to indicate that more URLs are being blocked by it. The robots.txt file tells the search engines not to crawl given URLs, but that they may keep them in the index and display the URLs in the search results.
So the search engines do know of the URLs that are being blocked and they are able to indicate that more are being blocked as you add pages to your site that are restricted by the robots.txt file.
-
Check you robots file. Are there entries to block the crawling? If you can give the url then it would be helpful/
Regards
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Only Indexing Canonical Root URL Instead of Specified URL Parameters
We just launched a website about 1 month ago and noticed that Google was indexing, but not displaying, URLs with "?location=" parameters such as: http://www.castlemap.com/local-house-values/?location=great-falls-virginia and http://www.castlemap.com/local-house-values/?location=mclean-virginia. Instead, Google has only been displaying our root URL http://www.castlemap.com/local-house-values/ in its search results -- which we don't want as the URLs with specific locations are more important and each has its own unique list of houses for sale. We have Yoast setup with all of these ?location values added in our sitemap that has successfully been submitted to Google's Sitemaps: http://www.castlemap.com/buy-location-sitemap.xml I also tried going into the old Google Search Console and setting the "location" URL Parameter to Crawl Every URL with the Specifies Effect enabled... and I even see the two URLs I mentioned above in Google's list of Parameter Samples... but the pages are still not being added to Google. Even after Requesting Indexing again after making all of these changes a few days ago, these URLs are still displaying as Allowing Indexing, but Not On Google in the Search Console and not showing up on Google when I manually search for the entire URL. Why are these pages not showing up on Google and how can we get them to display? Only solution I can think of would be to set our main /local-house-values/ page to noindex in order to have Google favor all of our other URL parameter versions... but I'm guessing that's probably not a good solution for multiple reasons.
Intermediate & Advanced SEO | | Nitruc0 -
Redirecting to Modal URLs
Hi everyone! Long time no chat - hope you're all well! I have a question that for some reason is causing me some trouble. I have a client that is creating a new website, the process was a mess and I am doing a last minute redirect file for them (long story, for another time). They have different teams for different business categories, so there are multiple staff pages with a list of staffers, and a link to their individual pages. Currently they have a structure like this for their staff bios... www.example.com/category-staff/bob-johnson/ But now, to access the staffers bio, a modal pops up. For instance... www.example.com/category-staff/#bob-johnson Should I redirect current staffers URLs to the staff category, or the modal URL? Unfortunately, we are late in the game and this is the way the bio pages are set up. Would love thoughts, thanks so much guys!!
Intermediate & Advanced SEO | | PatrickDelehanty0 -
Url design for automobile parts
Hi All, Im designing the url and im confused, need your experts advice engine-oil is a category I will display car truck, bike oils only
Intermediate & Advanced SEO | | Rahim119
Car > in this page I will display engine oils only related to car
Hyundia> in this page I will display engine oils only related to hyundia
i30 > in this page I will display engine oils only related to i30 models
Petrol > in this page I will display engine oils only related to petrol So im planning for www.xyz.com/engine-oil/car/Hyundia/i30/Petrol or should I write like this below xyz.com/c-engine-oil.html
xyz.com/c-car-engine-oil.html
xyz.com/c-hyundia--car-engine-oil.html
xyz.com/c-hyundia-i30-car-engine-oil.html
xyz.com/c-hyundia-i30-Petrol-car-engine-oil.html and also i heard i should keep 3 folders max.. so confused..
i have lot of car parts like engine oil, gear oil, tyres, battery,etc(categories)0 -
URL Changes Twice in the Same Year
I've got a new client with a great site, great off-page optimization and some scars and a hangover from a bad developer relationship. I'd be so grateful for your thoughts on this situation: Some time in the not-too-distant-past, the website is established and new content is posted. We'll call this Alpha. In April 2015, the client migrates to WordPress, implementing 301 redirects on every content page because of the capitalization issues of the old CMS. That means Alpha URLs are redirecting to Betas. Problem is, the new Beta WordPress URLs are the the permalink structure: /%year%/%monthnum%/%postname%/ and update by default when the page content is updated meaning that any updates to existing content cause another 301. It's my belief that for evergreen content, dates in the URL do nothing to help you and might even hurt from a user-experience standpoint, if not a search engine one. So, naturally, I'd like to move to the simple/%postname%/ structure, which would be Gamma. So, here's how I think we should fix it. Step 1: Update the sitemap and navigation and make the desired URL (Gamma) structure the default and the canonical. Step 2: Change the Alpha -> Beta redirects to Alpha -> Gamma Step 3: Add Beta -> Gamma redirects Anyone done this in the past? Anyone have any problems with it?
Intermediate & Advanced SEO | | LindsayDayton0 -
Redirect to url with parameter
I have a wiki (wiki 1) where many of the pages are well index in google. Because of a product change I had to create a new wiki (wiki 2) for the new version of my product. Now that most of my customers are using the new version of my product I like to redirect the user from wiki 1 to wiki 2. An example of a redirect could be from wiki1.website.com/how_to_build_kitchen to wiki2.website.com/how_to_build_kitchen. Because of a technical issue the url I redirect to, needs to have a parameter like "?" so the example will be wiki2.website.com/how_to_build_kitchen? Will the search engines see it as I have two pages with same content?
Intermediate & Advanced SEO | | Debitoor
wiki2.website.com/how_to_build_kitchen
and
wiki2.website.com/how_to_build_kitchen? And will the SEO juice from wiki1.website.com/how_to_build_kitchen be transfered to wiki2.website.com/how_to_build_kitchen?0 -
Ecommerce URL's
I'm a bit divided about the URL structure for ecommerce sites. I'm using Magento and I have Canonical URLs plugin installed. My question is about the URL structure and length. 1st Way: If I set up Product to have categories in the URL it will appear like this mysite.com/category/subcategory/product/ - and while the product can be in multiple places , the Canonical URL can be either short or long. The advantage of having this URL is that it shows all the categories in the breadcrumbs ( and a whole lot more links over the site ) . The disadvantage is the URL Length 2nd Way: Setting up the product to have no category in the URL URL will be mysite.com/product/ Advantage: short URL. disadvantage - doesn't show the categories in the breadcrumbs if you link direct. Thoughts?
Intermediate & Advanced SEO | | s_EOgi_Bear1 -
Robots.txt & Duplicate Content
In reviewing my crawl results I have 5666 pages of duplicate content. I believe this is because many of the indexed pages are just different ways to get to the same content. There is one primary culprit. It's a series of URL's related to CatalogSearch - for example; http://www.careerbags.com/catalogsearch/result/index/?q=Mobile I have 10074 of those links indexed according to my MOZ crawl. Of those 5349 are tagged as duplicate content. Another 4725 are not. Here are some additional sample links: http://www.careerbags.com/catalogsearch/result/index/?dir=desc&order=relevance&p=2&q=Amy
Intermediate & Advanced SEO | | Careerbags
http://www.careerbags.com/catalogsearch/result/index/?color=28&q=bellemonde
http://www.careerbags.com/catalogsearch/result/index/?cat=9&color=241&dir=asc&order=relevance&q=baggallini All of these links are just different ways of searching through our product catalog. My question is should we disallow - catalogsearch via the robots file? Are these links doing more harm than good?0 -
Best way to find all url parameters?
In reference to http://googlewebmastercentral.blogspot.com/2011/07/improved-handling-of-urls-with.html, what is the best way to find all of the parameters that need to be addressed? Thanks!
Intermediate & Advanced SEO | | nicole.healthline0