Robots.txt Blocked Most Site URLs Because of Canonical
-
Had a bit of a "Gotcha" in Magento.
We had Yoast Canonical Links extension which worked well , but then we installed Mageworx SEO Suite.. which broke Canonical Links.
Unfortunately it started putting www.mysite.com/catalog/product/view/id/516/ as the Canonical Link - and all URLs with /catalog/productview/* is blocked in Robots.txt
So unfortunately We told Google that the correct page is also a blocked page. they haven't been removed as far as I can see but traffic has certainly dropped.
We have also , at the same time had some Site changes grouping some pages & having 301 redirects.
Resubmitted site map & did a fetch as google.
Any other ideas? And Idea how long it will take to become unblocked?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Search Results Pages Blocked in Robots.txt?
Hi I am reviewing our robots.txt file. I wondered if search results pages should be blocked from crawling? We currently have this in the file /searchterm* Is it a good thing for SEO?
Intermediate & Advanced SEO | | BeckyKey0 -
Google Ignoring Canonical Tag for Hundreds of Sites
Bazaar Voice provides a pretty easy-to-use product review solution for websites (especially sites on Magento): https://www.magentocommerce.com/magento-connect/bazaarvoice-conversations-1.html If your product has over a certain number of reviews/questions, the plugin cuts off the number of reviews/questions that appear on the page. To see the reviews/questions that are cut off, you have to click the plugin's next or back function. The next/back buttons' URLs have a parameter of "bvstate....." I have noticed Google is indexing this "bvstate..." URL for hundreds of sites, even with the proper rel canonical tag in place. Here is an example with Microsoft: http://webcache.googleusercontent.com/search?q=cache:zcxT7MRHHREJ:www.microsoftstore.com/store/msusa/en_US/pdp/Surface-Book/productID.325716000%3Fbvstate%3Dpg:8/ct:r+&cd=2&hl=en&ct=clnk&gl=us My website is seeing hundreds of these "bvstate" urls being indexed even though we have a proper rel canonical tag in place. It seems that Google is ignoring the canonical tag. In Webmaster Console, the main source of my duplicate titles/metas in the HTML improvements section is the "bvstate" URLs. I don't necessarily want to block "bvstate" in the robots.txt as it will prohibit Google from seeing the reviews that were cutoff. Same response for prohibiting Google from crawling "bvstate" in Paramters section of Webmaster Console. Should I just keep my fingers crossed that Google honors the rel canonical tag? Home Depot is another site that has this same issue: http://webcache.googleusercontent.com/search?q=cache:k0MBLFcu2PoJ:www.homedepot.com/p/DUROCK-Next-Gen-1-2-in-x-3-ft-x-5-ft-Cement-Board-172965/202263276%23!bvstate%3Dct:r/pg:2/st:p/id:202263276+&cd=1&hl=en&ct=clnk&gl=us
Intermediate & Advanced SEO | | redgatst1 -
Yoast & rel canonical for paginated Wordpress URLs
Hello, our Wordpress blog at http://www.jobs.ca/career-resources has a rel canonical issue since we added pagination to the front page and category-pages. We're using Yoast and it's incorrectly applying a rel-canonical meta tag referencing page 1 on page 2, 3, etc. This is a known misuse of the rel-canonical tag (per Google's Webmaster Blog - http://googlewebmastercentral.blogspot.ca/2013/04/5-common-mistakes-with-relcanonical.html, which says rel-canonical should be replaced with rel-prev and rel-next for page 2, 3, etc.). We don't see a way to specify anywhere in Yoast's options to correct this behaviour for page 2, 3, etc. Yoast allows you to override a page's canonical URL, otherwise it automatically uses the Wordpress permalink. My question is, does anyone know how to configure Yoast to properly replace rel-canonical tags with rel-prev and rel-next for paginated URLs, or do I need to look at another plugin or customize the behavior directly in my child theme code? This issue was brought up here as well: http://moz.com/community/q/canonical-help, but the only response did not relate to Yoast. (We're using Wordpress 3.6.1 and Yoast "Wordpress SEO" 1.4.18)
Intermediate & Advanced SEO | | aactive0 -
After Receiving a "Googlebot can't access your site" would this stop your site from being crawled?
Hi Everyone,
Intermediate & Advanced SEO | | AMA-DataSet
A few weeks ago now I received a "Googlebot can't access your site..... connection failure rate is 7.8%" message from the webmaster tools, I have since fixed the majority of these issues but iv noticed that all page except the main home page now have a page rank of N/A while the home page has a page rank of 5 still. Has this connectivity issues reduced the page ranks to N/A? or is it something else I'm missing? Thanks in advance.0 -
Blog URL Canonical
Hi Guy's, I would like to know your thoughts on the following set-up for blog canonical. Option 1 domain.com/blog = <link rel="canonical" href="domin.com/blog"> domain.com/blog-category/general = <link rel="canonical" href="domain.com/blog"> domain.com/blog-article/how-to-set-canonical = no canonical option 2 domain.com/blog = <link rel="canonical" href="domin.com blog"="">(as option 1)</link rel="canonical" href="domin.com> domain.com/blog-category/general = <link rel="canonical" href="domain.com blog-category="" general"="">(this time has the canonical of the category)</link rel="canonical" href="domain.com> domain.com/blog-article/how-to-set-canonical = <link rel="canonical" href="domain.com blog-article="" how-to-set-canonical"="">(this time has the canonical of the article full URL)</link rel="canonical" href="domain.com> Just not sure which is the best option, or even if it is any of the above! Thanks Dan
Intermediate & Advanced SEO | | Dan1e10 -
About robots.txt for resolve Duplicate content
I have a trouble with Duplicate content and title, i try to many way to resolve them but because of the web code so i am still in problem. I decide to use robots.txt to block contents that are duplicate. The first Question: How do i use command in robots.txt to block all of URL like this: http://vietnamfoodtour.com/foodcourses/Cooking-School/
Intermediate & Advanced SEO | | magician
http://vietnamfoodtour.com/foodcourses/Cooking-Class/ ....... User-agent: * Disallow: /foodcourses ( Is that right? ) And the parameter URL: h
ttp://vietnamfoodtour.com/?mod=vietnamfood&page=2
http://vietnamfoodtour.com/?mod=vietnamfood&page=3
http://vietnamfoodtour.com/?mod=vietnamfood&page=4 User-agent: * Disallow: /?mod=vietnamfood ( Is that right? i have folder contain module, could i use: disallow:/module/*) The 2nd question is: Which is the priority " robots.txt" or " meta robot"? If i use robots.txt to block URL, but in that URL my meta robot is "index, follow"0 -
Best Format for URLs on large Ecommerce Site?
I saw this article, http://www.distilled.net/blog/seo/common-ecommerce-technical-seo-problems/, and noticed that Geoff mentioned that product URLs format should be in one of the following ways: Product Page: site.com/product-name Product Page: site.com/category/sub-category/product-name However, for SEO, is there a preferred way? I understand that the top one may be better to prevent duplicate page issues, but I would imagine that the bottom would be better for conversion (maybe the user backtracks to site.com/category/sub-category/ to see other products that he may be interested in). Also, I'd imagine that the top URL would not be a great way to distribute link juice since everything would be attached to the root, right?
Intermediate & Advanced SEO | | eTundra0 -
No index, follow vs. canonical url
We have a site that consists almost entirely as a directory of videos. Example here: http://realtree.tv/channels/realtreeoutdoorsclassics We're trying to figure out the best way to handle pagination and utility features such as sort for most recent, most viewed, etc. We've been reading countless articles on this topic, but so far have been unable to determine what might be considered the industry standard. Two solutions seem to stand out... Using the canonical url on all the sorted and paginated pages. However, after reading many blog posts, it seems that you should NEVER use the canonical url to solve the issue of paginated, and thus duplicated content because the search bots will never crawl past the first page leaving many results not in the index. (We are considering ruling this method out.) Another solution seems to be using the meta tag for noindex, follow so that a search engine like Google will crawl your directory pages but not add them to the index themselves. All links are followed so content is crawled and any passing link juice remains unchanged. However, I did see a few articles skeptical of this solution as well saying that there are always better alternatives, or that there is no verification that search engines obey this meta tag. This has placed some doubt in our minds. I was hoping to get some expert advice on these methods as it would pertain to our site. Thank you.
Intermediate & Advanced SEO | | grayloon0