How to use robots.txt to block areas on a page?
-
Hi,
Across the category/product pages on our site there is an archive/shipping info section, and the text is always the same. Would this be treated as duplicate content and be harmful for SEO?
How can I alter robots.txt to tell Google not to crawl that particular text?
Thanks for any advice!
-
Thanks for the info above. I think I'll find out if I can cut the text and try using a popup link instead.
-
Hi Laura
I have not used lazy loading except with images; however, I did some reading around and it might be a solution. There is a large section in Google Webmasters that explains how to make AJAX readable by a crawler/bot, so evidently it is not normally readable (see Google Webmaster on AJAX crawling).
The other option is to provide a summary on the product page for shipping info and link to a larger shipping info page (as suggested earlier) and get it to open on a new page/tab. At least this keeps the product page open too.
(Note: good UX practice recommends telling the user that clicking the link will open a new page - this could be as simple as using the anchor text: "More Detailed Shipping Information (opens new page)".)
cheers
Neil
-
Here is a tip that I use for my clients and would recommend. Most CMS / ecommerce platforms allow you to put a category description on the page. But when the category paginates, they reuse the same category description and just show different products on each page (some use a query string on the URL, others use a shebang, others use other things).
What I recommend to my clients to escape any thin content issues is to point the canonical URL of all of the paginated pages back to the first category page. At the same time I add a noindex, follow tag to the header of the paginated pages. This is counter to what a lot of people do, I think, but I do it because of thin content - and you don't want your page 3 results cannibalizing your main category landing page. Since no CMS that I know of lets you specify a different category description for each pagination of a category, it seems like the only real choice. It also means you don't really need to add rel=next and rel=prev to the paginated pages.
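As a rough sketch of what that looks like in practice, the head of a paginated category page (say page 2) would carry something along these lines - the URLs below are made-up placeholders and the exact markup depends on your platform:
<!-- <head> of a hypothetical paginated URL such as https://www.example.com/mens-ski-clothing?page=2 -->
<!-- canonical points back to the first category page -->
<link rel="canonical" href="https://www.example.com/mens-ski-clothing" />
<!-- keep the paginated page out of the index, but let bots follow the product links -->
<meta name="robots" content="noindex, follow" />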
-
Thanks, the info above is quite detailed.
We are not a shipping company; that text is just there to inform visitors. The shipping info is quite long because we want to provide as much as we can up front, to avoid customers leaving the current page to search for it.
-
Hi Laura
I am not sure that you can use robots.txt to prevent a search engine bot from crawling a part of a page. Robots.txt is usually used to exclude a whole page.
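For reference, robots.txt works at the level of whole URLs and paths rather than sections of a page. A quick sketch of the kind of rules it supports (the paths below are purely hypothetical):
User-agent: *
# block everything under a directory
Disallow: /shipping-info/
# block a single page
Disallow: /delivery-terms.html
There is no directive that tells a crawler to skip just one block of text within a page.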
The effect of the duplicate content on your search engine optimisation depends in part on how extensive the duplication is. In many cases it seems that Google won't penalise duplicate content (it understands that some content will of necessity be duplicated) - see this video by Matt Cutts from Google.
Duplicate Content is Small (Short Paragraph)
From your question it sounds like you are talking about part of a page, and a relatively small part at that - I assume you are not a shipping company, so the shipping info would be a small part of the page.
In that case it may not affect your search engine optimisation at all (assuming you are not trying to rank for the shipping info), as long as the content on the rest of the page is unique or different from other pages on the site.
Duplicate Content is Large (but not a page)
If the shipping info is substantial (say a couple of paragraphs or half the content on the page) then Google suggests you create a separate page with the substantial info on it and use a brief summary on other pages with a link to the separate page:
- Minimize boilerplate repetition: For instance, instead of including lengthy copyright text on the bottom of every page, include a very brief summary and then link to a page with more details. In addition, you can use the Parameter Handling tool to specify how you would like Google to treat URL parameters.
(from Google Webmaster: Duplicate Content)
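In practice that could be as simple as a one-line summary plus a link on each product page - the copy and URL below are placeholders:
<p>Standard delivery takes 3-5 working days.
  <a href="https://www.example.com/shipping-information" target="_blank">More Detailed Shipping Information (opens new page)</a>
</p>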
Duplicated Pages
Much of the discussion about duplicated content is about whole pages of duplicated content. The risk with these pages is that search engines may not know which one to rank (or, more to the point, may rank the one you don't want to rank). This is where you might use a rel=canonical tag or a 301 redirect to direct, or hint to, the search engine which page to use.
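For example, if the same content is reachable at two addresses, the duplicate version can carry a canonical link pointing at the preferred URL (the URLs below are placeholders); a 301 redirect, set up on the server, is the stronger alternative when the duplicate URL shouldn't be served at all:
<!-- in the <head> of the duplicate page, e.g. https://www.example.com/product?ref=newsletter -->
<link rel="canonical" href="https://www.example.com/product" />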
Moz has a good article on Duplicate Content.
All the best
Neil
-
Hiya,
First off the main answer is here - http://moz.com/learn/seo/robotstxt
An alternative solution might be the canonical tag, which means you keep all the link juice rather than letting it fall off the radar. I wouldn't be overly worried about duplicate content - it's not a big bad wolf that will annihilate your website.
The best idea if you're worried about duplicate content is the canonical tag: it has the benefit of keeping link juice, whereas robots.txt tends to mean you lose some. One thing to remember, though, is that the canonical tag means the canonicalised pages will not be indexed (much the same end result as the robots tag), so if those pages are ranking or getting page views, bear that in mind.
hope that helps.
Good luck.
-
Google is smart enough to recognize what it is; it won't get you penalized for duplicate content.
Related Questions
-
If I want clean up my URLs and take the "www.site.com/page.html" and make it "www.site.com/page" do I need a redirect?
If this scenario requires a 301 redirect no matter what, I might as well update the URL to be a little more keyword rich for the page while I'm at it. However, since these pages are ranking well I'd rather not lose any authority in the process, and just keep the URL stripped of the ".html" (if that's possible). Thanks for your help! [edited for formatting]
Technical SEO | Booj
-
Does adding subcategory pages to an commerce site limit the link juice to the product pages?
I have a client who has an online outdoor gear company. He mostly sells high end outdoor gear (like ski jackets, vests, boots, etc.) at a deep discount. His store currently only resides on eBay, so we're building him an online store from scratch. I'm trying to determine the best site architecture and wonder if we should include subcategory pages. My issue is that I think the subcategory pages might be good for user experience, but they'll add an additional layer between the homepage and the product pages. The problem is that I think a lot of users might be searching for the product name to see if they can find a better deal, and my client's site would be perfect for them. So I really want to rank well for the product pages, but I'm nervous that the subcategory pages will limit the link juice of the product pages.
Home --> SubCategory --> Product List --> Product Detail
Home --> Men's Ski Clothing --> Men's Ski Jack --> North Face Mt Everest Jacket
Should I keep the SubCategory page "Men's Ski Clothing" if it helps usability? On a separate note, the subcategory pages would have some head keyword terms, but I don't think that he could rank well for these terms anytime soon. However, they would be great pages / terms to rank for in the long term. Should this influence the decision?
Technical SEO | Santaur
-
Robots.txt anomaly
Hi, I'm monitoring a site that's had a new design relaunch and a new robots.txt added. Over the period of a week (since launch) Webmaster Tools has shown a steadily increasing number of blocked URLs (now at 14). In the robots.txt file, though, there are only 12 lines with the disallow command - could this be occurring because a single line could refer to more than one page/URL? They all look like single URLs, for example:
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
etc, etc. And is it normal for Webmaster Tools' reporting of robots.txt blocked URLs to steadily increase in number over time, as opposed to being identified straight away? Thanks in advance for any help/advice/clarity on why this may be happening. Cheers, Dan
Technical SEO | Dan-Lawrence
-
Duplicate Page Content Lists the same page twice?
When checking my crawl diagnostics this morning I see that I have the error "Duplicate page content". It lists the exact same URL twice, though, and I don't understand how to fix this. It's also listed under duplicate page title:
Personal Assistant | Virtual Assistant | Charlotte, NC http://charlottepersonalassistant.com/110
Personal Assistant | Virtual Assistant | Charlotte, NC http://charlottepersonalassistant.com/110
Does this have anything to do with a 301 redirect here? Why does it have http:// twice? Thanks all! | http://www.charlottepersonalassistant.com/ | http://http://charlottepersonalassistant.com/ |
Technical SEO | eidna22
-
What are your thoughts on security of placing CMS-related folders in a robots.txt file?
So I was just about to add a whole heap of CMS-related folders to my robots.txt file to exclude them from search, and thought "hey, I'm publicly telling people where my admin folders are"...surely that's not right?! Should I leave them out of the robots.txt file, and hope for the best that they never get indexed? Should I use noindex meta data on every page? What are people's thoughts? Thanks, James PS. I know this is similar to lots of other discussions around meta noindex vs. robots.txt, but I'm after specific thoughts around the security aspect of listing your admin folders in a robots.txt file...
Technical SEO | James-Distinction
-
Robots.txt questions...
All, My site is rather complicated, but I will try to break down my question as simply as possible. I have a robots.txt document in the root level of my site to disallow robot access to /_system/, my CMS. This looks like this:
# /robots.txt file for http://webcrawler.com/
# mail webmaster@webcrawler.com for constructive criticism
User-agent: *
Disallow: /_system/
I have another robots.txt file in another level down, which is my holiday database - www.mysite.com/holiday-database/ - this is to disallow access to /holiday-database/ControlPanel/, my database CMS. This looks like this:
User-agent: *
Disallow: /ControlPanel/
Am I correct in thinking that this file must also be in the root level, and not in the /holiday-database/ level? If so, should my new robots.txt file look like this:
# /robots.txt file for http://webcrawler.com/
# mail webmaster@webcrawler.com for constructive criticism
User-agent: *
Disallow: /_system/
Disallow: /holiday-database/ControlPanel/
Or, like this:
# /robots.txt file for http://webcrawler.com/
# mail webmaster@webcrawler.com for constructive criticism
User-agent: *
Disallow: /_system/
Disallow: /ControlPanel/
Thanks in advance. Matt
Technical SEO | Horizon
-
Micro formats to block HTML text portions of pages
I have a client that wants to use micro formatting to keep a portion of their page (the disclaimer) from being read by the search engines. They want to do this because it will help with their keyword density on the rest of the page and block the “bad keywords” that come from their legally required disclaimer. We have suggested alternate methods to resolve this problem, but they do not want to implement those, they just want a POV from us explaining how this micro formatting process will work. And that’s where the problem is. I’ve never heard of this use case and can’t seem to find anyone who has. I'm posting the question to the Moz Community to see if anyone knows how microformats can keep copy from being crawled by the bots. Please include any links to sites that you know that are using micro formatting in this way. Have you implemented it and seen results? Do you know of a website that is using it now? We're looking for use cases please!
Technical SEO | Merkle-Impaqt
-
Quick robots.txt check
We're working on an SEO update for http://www.gear-zone.co.uk at the moment, and I was wondering if someone could take a quick look at the new robots file (http://gearzone.affinitynewmedia.com/robots.txt) to make sure we haven't missed anything? Thanks
Technical SEO | neooptic