How to use robots.txt to block areas on a page?
-
Hi,
Across the category/product pages on our site there is an archive/shipping info section, and the text is always the same. Would this be treated as duplicate content and be harmful for SEO?
How can I alter robots.txt to tell Google not to crawl those particular blocks of text?
Thanks for any advice!
-
Thanks for the info above. I think I'll find out if I can cut the text down and try using a popup link instead.
-
Hi Laura
I have not used lazy loading except with images; however, I did some reading around and it might be a solution. There is a large section in Google Webmasters that talks about how to make AJAX readable by a crawler/bot, which suggests it is not normally readable (see Google Webmaster on AJAX crawling).
The other option is to provide a summary on the product page for shipping info and link to a larger shipping info page (as suggested earlier) and get it to open on a new page/tab. At least this keeps the product page open too.
(Note: good UX practice recommends telling the user that a new page will open when they click the link - this could be as simple as using the anchor text "More Detailed Shipping Information (opens new page)".)
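For illustration, such a link could look something like this in the page HTML (the URL is just a placeholder):

    <a href="/shipping-information" target="_blank" rel="noopener">More Detailed Shipping Information (opens new page)</a>

The rel="noopener" is simply good practice whenever a link opens in a new tab.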
cheers
Neil
-
Here is a tip that I use for my clients and would recommend. Most CMS / ecommerce platforms allow you to put a category description on the page. But when the category paginates, they reuse the same category description and just change the products on the page (some use a query string on the URL, others use a hashbang (#!), others use other schemes).
What I recommend to my clients, to escape any thin content issues, is to point the canonical URL of all of the paginated pages back to the first category page. At the same time I add a noindex, follow tag to the header of the paginated pages. This is counter to what a lot of people do, I think, but the reason I do it is thin content. Also, you don't want your page 3 results cannibalizing your main category landing page results. Since no CMS that I know of lets you specify a different category description for each pagination of a category, it seems like the only real choice. It also means you do not really need to add rel=next and rel=prev to the paginated pages.
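To illustrate, here is a rough sketch of what the head of a paginated page (say page 3 of a category) might contain - the URL is a placeholder and the exact markup will depend on your platform:

    <link rel="canonical" href="https://www.example.com/category/widgets/" />
    <meta name="robots" content="noindex, follow" />

The canonical points back to the first category page, while noindex, follow keeps the paginated page out of the index but still lets the product links on it be crawled.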
-
Thanks, the info above is quite detailed.
We are not a shipping company; the text is just there to inform visitors. The shipping info is quite long because we want to provide as much as we can up front, to avoid customers leaving the current page to go searching for it.
-
Hi Laura
I am not sure that you can use robots.txt to prevent a search engine bot from crawling a part of a page. Robots.txt is usually used to exclude a whole page.
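For example, robots.txt can block a whole directory or an individual page, but there is no directive for hiding a section within a page. A minimal sketch (the paths are just placeholders):

    User-agent: *
    Disallow: /shipping-info/
    Disallow: /archive-page.html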
The effect of the duplicate content on your search engine optimisation depends in part on how extensive the duplication is. In many cases it seems that Google won't penalise the duplicate content (it understands that some content will of necessity be duplicated) - see this video by Matt Cutts from Google.
Duplicate Content is Small (Short Paragraph)
From your question it sounds like you are talking about part of a page, and a relatively small part at that - I assume you are not a shipping company, so the shipping info would be a small part of the page.
In which case it may not affect your search engine optimisation at all (assuming you are not trying to rank for the shipping info), as long as the content on the rest of the page is unique or different from other pages on the site.
Duplicate Content is Large (but not a page)
If the shipping info is substantial (say a couple of paragraphs or half the content on the page) then Google suggests you create a separate page with the substantial info on it and use a brief summary on other pages with a link to the separate page:
- Minimize boilerplate repetition: For instance, instead of including lengthy copyright text on the bottom of every page, include a very brief summary and then link to a page with more details. In addition, you can use the Parameter Handling tool to specify how you would like Google to treat URL parameters.
(from Google Webmaster: Duplicate Content)
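On a product page, that brief summary plus link might look something like this (hypothetical copy and URL):

    <p>Standard delivery takes 3-5 working days. <a href="/shipping-information">Read our full shipping policy</a>.</p>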
Duplicated Pages
Much of the discussion about duplicated content is really about whole pages of duplicated content. The risk with these pages is that search engines may not know which to rank (or, more to the point, rank the one you don't want to rank). This is where you might use a rel=canonical tag or a 301 redirect to direct, or hint to, the search engine which page to use.
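For reference, a 301 redirect is set at the server level rather than in the page. On an Apache server, for example, it could be as simple as this line in .htaccess (the paths are placeholders, and other servers do it differently):

    Redirect 301 /duplicate-page/ https://www.example.com/preferred-page/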
Moz has a good article on Duplicate Content.
All the best
Neil
-
Hiya,
First off, the main answer is here - http://moz.com/learn/seo/robotstxt
An alternative solution might be the canonical tag, meaning you keep all the link juice rather than letting it fall off the radar. I wouldn't be overly worried about duplicate content; it's not a big bad wolf that will annihilate your website.
The best idea if you're worried about duplicate content is the canonical tag: it has the benefit of keeping link juice, whereas the robots approach tends to mean you lose some of it. One thing to remember, though, is that the canonical tag means those pages will not be indexed (same as a robots tag in the end), so bear that in mind if they are currently ranking or getting page views.
hope that helps.
Good luck.
-
Google is smart enough to recognize what it is; it won't get you penalized for duplicate content.