How to use robots.txt to block areas on page?
-
Hi,
Across the categories/product pages on out site there are archives/shipping info section and the texts are always the same. Would this be treated as duplicated content and harmful for seo?
How can I alter robots.txt to tell google not to crawl those particular text
Thanks for any advice!
-
Thanks for the info above. I think I'll find out if I can cut the text and try to put popup link.
-
Hi Laura
I have not used lazy loading except with images, however I did some reading around and it might be a solution. There is a large section in Google Webmasters that talks about how to make AJAX readable by a crawler/bot so obviously it is not normally readable (Google Webmaster on AJAX crawling).
The other option is to provide a summary on the product page for shipping info and link to a larger shipping info page (as suggested earlier) and get it to open on a new page/tab. At least this keeps the product page open too.
(Note good UX practice recommends you tell the user they will open a new page if they click on the link - this could be as simple as using the anchor text: "More Detailed Shipping Information (opens new page)".
cheers
Neil
-
Here is a tip that I use for my clients and I would recommend. Most CMS / Ecommerce platforms allow for you to put a category description in the page. But, what they do is when the page paginates is they use the same category description and just different products on the page (some use a querystring on the url, others use a shebang, others use other things).
What I recommend to my clients to escape any thin content issues is to point the canonical url of all of the paginated pages back to the 1st category page. At the same time I will add a noindex, follow tag to the header of the paginated pages. This is counter to what a lot of people do I think, but the reason I do it is because of thin content. Also you don't want your page 3 results cannibalizing your main category landing page results. Since no CMS that I know of lets you specify different category descriptions for each pagination of a category it seems like the only real choice. It also makes it where you do not really need to add rel=next and rel=previous to the paginated pages too.
-
Thanks, the info above is quite detailed.
We are not a shipping company those text are just to ensure visitors accordingly. The shipping info is quite long as we want to prompt as mush as we could to avoid customer leaving current page to search.
-
Hi Laura
I am not sure that you can use robots.txt to prevent a search engine bot from crawling a part of a page. Robots.txt is usually used to exclude a whole page.
The effect of the duplicate content on your search engine optimisation depends in part on how extensive is the duplication. In many cases it seems that Google won't penalise the duplicate content (it understands that some content will of necessity be duplicated) - see this video by Matt Cutts from Google.
Duplicate Content is Small (Short Paragraph)
From your question it sounds like you are talking about part of page and it sounds like a relatively small part - I assume you are not a shipping company so the shipping info would be a small part of the page.
In which case it may not affect your search engine optimisation at all (assuming you are not trying to rank for the shipping info).
As long as the content on the rest of the page is unique or different from other pages on the site.
Duplicate Content is Large (but not a page)
If the shipping info is substantial (say a couple of paragraphs or half the content on the page) then Google suggests you create a separate page with the substantial info on it and use a brief summary on other pages with a link to the separate page:
- Minimize boilerplate repetition: For instance, instead of including lengthy copyright text on the bottom of every page, include a very brief summary and then link to a page with more details. In addition, you can use the Parameter Handling tool to specify how you would like Google to treat URL parameters.
(from Google Webmaster: Duplicate Content)
Duplicated Pages
Much of the discussion about duplicated content is more about whole pages of duplicated content. The risk with these pages are that search engines may not know which to rank (or more to the point rank the one you don't want to rank). This is where you might use a rel=canonical tag or a 301 redirect to direct or hint to the search engine which page to use.
Moz has a good article on Duplicate Content.
All the best
Neil
-
Hiya,
First off the main answer is here - http://moz.com/learn/seo/robotstxt
an alternative solution might be use of the canonical tag meaning you're getting all the link juice rather than letting it fall off the radar. I wouldn't be overly worried about duplicate content its not a big bad wolf that will annihilate your website.
Best idea if you're worried about duplicate content is the canonical tag it has the benefit of keeping link juice where as the robots tends to mean you loose some link juice. One thing to remember those is the canonical tag means the pages will not be indexed (same as robots tag in the end) so if they are ranking (or getting page views) something to remember.
hope that helps.
Good luck.
-
Google smart enough to recognize what it is, it won't get you penalized for duplicate content.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Using one robots.txt for two websites
I have two websites that are hosted in the same CMS. Rather than having two separate robots.txt files (one for each domain), my web agency has created one which lists the sitemaps for both websites, like this: User-agent: * Disallow: Sitemap: https://www.siteA.org/sitemap Sitemap: https://www.siteB.com/sitemap Is this ok? I thought you needed one robots.txt per website which provides the URL for the sitemap. Will having both sitemap URLs listed in one robots.txt confuse the search engines?
Technical SEO | | ciehmoz0 -
Pages with Duplicate Page Content Crawl Diagnostics
I have Pages with Duplicate Page Content in my Crawl Diagnostics Tell Me How Can I solve it Or Suggest Me Some Helpful Tools. Thanks
Technical SEO | | nomyhot0 -
Many Pages Being Combined Into One Long Page
Hi All, In talking with my internal developers, UX, and design team there has been a big push to move from a "tabbed" page structure (where as each tab is it's own page) to combining everything into one long page. It looks great from a user experience standpoint, but I'm concerned that we'll decrease in rankings for the tabbed pages that will be going away, even with a 301 in place. I initially recommending#! or pushstate for each "page section" on the long form content. However there are technical limitations with this in our CMS. The next idea I had was to still leave those pages out there and to link to them in the source code, but this approach may get shot down as well. Has anyone else had to solve for this issue? If so, how did you do it?
Technical SEO | | AllyBank1 -
Can you noindex a page, but still index an image on that page?
If a blog is centered around visual images, and we have specific pages with high quality content that we plan to index and drive our traffic, but we have many pages with our images...what is the best way to go about getting these images indexed? We want to noindex all the pages with just images because they are thin content... Can you noindex,follow a page, but still index the images on that page? Please explain how to go about this concept.....
Technical SEO | | WebServiceConsulting.com0 -
Best use of robots.txt for "garbage" links from Joomla!
I recently started out on Seomoz and is trying to make some cleanup according to the campaign report i received. One of my biggest gripes is the point of "Dublicate Page Content". Right now im having over 200 pages with dublicate page content. Now.. This is triggerede because Seomoz have snagged up auto generated links from my site. My site has a "send to freind" feature, and every time someone wants to send a article or a product to a friend via email a pop-up appears. Now it seems like the pop-up pages has been snagged by the seomoz spider,however these pages is something i would never want to index in Google. So i just want to get rid of them. Now to my question I guess the best solution is to make a general rule via robots.txt, so that these pages is not indexed and considered by google at all. But, how do i do this? what should my syntax be? A lof of the links looks like this, but has different id numbers according to the product that is being send: http://mywebshop.dk/index.php?option=com_redshop&view=send_friend&pid=39&tmpl=component&Itemid=167 I guess i need a rule that grabs the following and makes google ignore links that contains this: view=send_friend
Technical SEO | | teleman0 -
2 links on home page to each category page ..... is page rank being watered down?
I am working on a site that has a home page containing 2 links to each category page. One of the links is a text link and one link is an image link. I think I'm right in thinking that Google will only pay attention to the anchor text/alt text of the first link that it spiders with the anchor text/alt text of the second being ignored. This is not my question however. My question is about the page rank that is passed to each category page..... Because of the double links on the home page, my reckoning is that PR is being divided up twice as many times as necessary. Am I also right in thinking that if Google ignore the 2nd identical link on a page only one lot of this divided up PR will be passed to each category page rather than 2 lots ..... hence horribly watering down the 'link juice' that is being passed to each category page?? Please help me win this argument with a developer and improve the ranking potential of the category pages on the site 🙂
Technical SEO | | QubaSEO0 -
Diagnostic says too many links on a page and most of the pages are from blog entries. Are tags considered links? How do I decrease links?
I just ran my first diagnostic on my site and the results came back were negative in the area of too many links one a page. There were also quite a few 404 errors. What is the best way to fix these problems? Most of the pages with too many links are from blog posts, are the tags counted as well and is this the reason for too many links?
Technical SEO | | Newport10300 -
Is robots.txt a must-have for 150 page well-structured site?
By looking in my logs I see dozens of 404 errors each day from different bots trying to load robots.txt. I have a small site (150 pages) with clean navigation that allows the bots to index the whole site (which they are doing). There are no secret areas I don't want the bots to find (the secret areas are behind a Login so the bots won't see them). I have used rel=nofollow for internal links that point to my Login page. Is there any reason to include a generic robots.txt file that contains "user-agent: *"? I have a minor reason: to stop getting 404 errors and clean up my error logs so I can find other issues that may exist. But I'm wondering if not having a robots.txt file is the same as some default blank file (or 1-line file giving all bots all access)?
Technical SEO | | scanlin0