Using Meta Header vs Robots.txt
-
Hey Mozzers,
I am working on a site that has search-friendly parameters for its faceted navigation; however, this makes it difficult to identify the parameters in a robots.txt file. I know that using the robots.txt file is highly recommended and powerful, but I am not sure how to do this when facets use common words such as sizes.
For example, a filtered URL may look like www.website.com/category/brand/small.html. Brand and size are both facets. Brand is a great filter, and size is very relevant for shoppers, but many products include "small" in the URL, so it is tough to isolate that filter in the robots.txt. (I hope that makes sense).
I am able to identify problematic pages and edit the Meta Head, so I can add a NOINDEX tag on any page that is causing these duplicate issues. My question is, is this a good idea? I want bots to crawl the facets, but indexing all of the facets causes duplicate issues.
Thoughts?
-
"There is no penalty for having duplicates of your own content"
Alan,
I must respectfully disagree with this statement. Perhaps Google will not penalize you directly, but it is easy to self-cannibalize key terms if one has many facets that only differ slightly. I have seen this on a site where they don't rank on the first page, but they have 3-4 pages on the second page of the SERPs. This is the exact issue that I am trying to resolve.
Evan
ps. sorry I hit the wrong button, but you got a good answer out of it
-
Hey Craig,
I agree with you regarding the robots.txt. However, how does one isolate parameters that are commonly used within product names, and therefore appear in the product URL as well? We are using a plugin that makes the URLs more user friendly, but it makes it tough to isolate "small" or "blue" because the parameters don't include a "?sort=" or "color=" prefix anymore.
This is why I am considering using the meta header to help with the duplicate content and crawl allowance issues.
Since I can edit the meta headers on a variety of pages, is it a viable option to use NOINDEX,FOLLOW?
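One way to confirm that the directive actually made it onto a flagged page is to parse the rendered HTML and read back the robots meta tag. The following is a minimal sketch using only the Python standard library; the helper names and the sample markup are made up for illustration and are not part of any CMS or plugin mentioned here.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. a bare attribute), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for part in attr_map.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page source for a faceted URL that should stay out of the index.
sample = (
    '<html><head><meta name="robots" content="NOINDEX,FOLLOW"></head>'
    "<body></body></html>"
)
print(sorted(robots_directives(sample)))
```

A page with no robots meta tag returns an empty set, which search engines treat as the default of index and follow.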
-
As mentioned initially, the CMS doesn't allow me to edit canonicals for individual pages that are created via facets. The other question I had about canonicals is that a rel canonical is meant to help bots understand that different variations of the same page are, in fact, the same page: example.com = example.com/. But, for the user (which ultimately bots care about), example.com/sony/50 may not always be the same as example.com/sony.
Anyways, that is a little beside the point. I have visited the options of canonicals, but I am not sure it can be done.
-
This sounds like a job for a canonical tag.
-
Hey Craig,
Thanks for your response. This is the common answer that I have found. Here is the challenge I am having (I will use your example above):
Let's say that example.com/tv/sony is the main category page for this brand, but I only carry a few Sony tvs. Therefore, the only difference between that page and this page: example.com/tv/sony/50 is a category description that disappears when further facets are chosen.
When I search in the SERPS for "Sony TVs", rather than ranking well for one of these pages, both rank moderately well, but not well enough for first page results, and I would think this is confusing to customers as well to find two very closely related pages side by side.
So, while I agree that robots.txt is a tool that I can apply for limiting search engines from getting dizzy with the facets by limiting them to (say) 4, is NOINDEX the best solution for controlling duplicate content issues that are not that deep, and more case-by-case?
One more thing I might add is that these issues don't happen site-wide. If I carry many products from Samsung, then example.com/tv/samsung and example.com/tv/samsung/50 and even example.com/tv/samsung/50/HD will produce very different results. The issue usually occurs where there are few products for a brand, and they filter the same way with many facets.
Does that make sense? I agree with you wholeheartedly, I am just trying to figure out how to deal with the shallow duplicate issues.
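The case-by-case rule described here could be sketched roughly as follows: if a faceted page lists exactly the same products as its parent category page, treat it as a shallow duplicate and give it NOINDEX,FOLLOW; otherwise leave it indexable. The function name, the product IDs, and the "identical product set" test are all assumptions for illustration, not something the CMS or plugin is known to provide.

```python
def robots_meta_for_facet(parent_products, facet_products):
    """Return a robots meta tag for a faceted page.

    Assumption: a faceted page showing exactly the same products as its
    parent category page adds nothing new for searchers, so it is kept
    out of the index while its links are still followed.
    """
    if set(facet_products) == set(parent_products):
        return '<meta name="robots" content="noindex,follow">'
    return '<meta name="robots" content="index,follow">'


# Hypothetical catalogue data: few Sony TVs, many Samsung TVs.
sony_all = ["sony-a", "sony-b"]
sony_50 = ["sony-a", "sony-b"]  # the size filter changes nothing
samsung_all = ["sam-a", "sam-b", "sam-c", "sam-d"]
samsung_50 = ["sam-b", "sam-d"]  # the size filter narrows the list

print(robots_meta_for_facet(sony_all, sony_50))
print(robots_meta_for_facet(samsung_all, samsung_50))
```

With this data the Sony facet page gets noindex,follow and the Samsung facet page stays indexable, which matches the pattern of the problem only showing up for brands with few products.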
Cheers,
-
They will be linked to by internal links.
There is no penalty for having duplicates of your own content, but having links pouring away link juice is a self-imposed penalty.
-
Hi Alan, I understand that, but the problem Evan is describing seems to be related to duplicate content and crawl allowance. There's no perfect answer but in my experience the types of pages that Evan is describing aren't often linked to. Taking that into consideration, IMO robots.txt is the correct solution.
-
The problem with robots.txt is that any link pointing to a blocked page is passing link juice that will never be returned; it is wasted. robots.txt is the last resort; IMO it should never be used.
-
Hi Evan, this is quite a common problem. There are a couple of things to consider when deciding if Noindex is the solution rather than robots.txt.
Unless there is a reason the pages need to be crawled (like there are pages on the site that are only linked to from those pages) I would use robots.txt. Noindex doesn't stop search engines crawling those pages, only from putting them in the index. So in theory, search engines could spend all their time crawling pages that you don't want to be in the index.
Here's what I'd do:
Decide on a reasonable number of facets, for example, if you're selling TVs people might search for:
- Sony TV (Brand search)
- 50 inch sony tv (size + brand)
- Sony 50 inch HD TV (brand + size + specification)
But past 3 facets tends to get very little search volume (do keyword research for your own market)
In this case I'd create a rule that adds something to the URL after 3 facets that would make it easy to block in robots.txt. For example I might make my structure:
- example.com/tv/sony
- example.com/tv/sony/50
- example.com/tv/sony/50/HD
But as soon as I add a 4th facet, for example 'colour', I add in the filter subfolder:
- example.com/filter/tv/sony/50/HD/white
I can then easily block all these pages in robots.txt using:
Disallow: /filter/
I hope this helps.
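The scheme above can be sketched in a few lines of Python: a hypothetical helper builds the path and adds the /filter/ prefix once more than three facets are selected, and the standard library's robots.txt parser then checks which paths the Disallow rule blocks. The function name, the threshold constant, and the "User-agent: *" line are assumptions added for illustration.

```python
from urllib.robotparser import RobotFileParser

MAX_INDEXABLE_FACETS = 3  # threshold from the post; adjust after keyword research


def facet_url(category, facets):
    """Build a URL path, prefixing /filter/ when more than three facets are selected."""
    path = "/".join([category, *facets])
    if len(facets) > MAX_INDEXABLE_FACETS:
        return "/filter/" + path
    return "/" + path


# Parse the rule in memory instead of fetching a live robots.txt file.
rules = RobotFileParser()
rules.parse(["User-agent: *", "Disallow: /filter/"])

shallow = facet_url("tv", ["sony", "50", "HD"])
deep = facet_url("tv", ["sony", "50", "HD", "white"])

for path in (shallow, deep):
    allowed = rules.can_fetch("*", "https://example.com" + path)
    print(path, "crawlable" if allowed else "blocked")
```

The three-facet URL stays crawlable while the four-facet URL falls under /filter/ and is blocked, without needing to match words like "small" or "blue" in the path.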
-
It is a problem in the SERPS because if I run a query for the brand, I can see a faceted variation of that brand (say "brand" "blue") ranking right below it, but neither of them is ranking on the first page. I won't NOINDEX all pages, just those that don't provide value for customers searching, and those that compete for competitive terms and cause the preferred page to rank lower.
It was brought to my attention through Moz analytics, and once I began to investigate it further, I found many sources mentioning that this is very common for e-commerce. Common practice is robots.txt and a plugin, but we are using a different plugin. So, for this reason, I am trying to figure out if NOINDEX meta headers are a good option.
Does that make sense?
-
I'm not sure you have a problem, why not let them all get indexed?
-
Hey Alan,
Again, I thank you for your feedback. Unfortunately rel prev/next are not relevant in this circumstance. Also, it is all unique content on my client's own site, and I know that it is a duplicate content problem because I have 2 similar pages with slightly different facets ranking 14 and 15 in SERPS. If search engines were to choose one over the other, they would not rank them back to back.
For clarification, this is an e-commerce application with faceted navigation. Not a pagination issue.
Thanks for your input.
-
I would look at canonical and rel prev/next.
Also, I would establish whether you actually have a problem.
Duplicate content is not always a problem. If it is duplicate content on your own site then there is not a lot to worry about; Google will rank just one page. There is no penalty for DC itself. If you are screen scraping then you may have a problem.
-
Hey Alan,
Thanks for your feedback. I guess I am not sure what other solutions there are for this circumstance. The CMS doesn't allow me to use rel=canonicals for individual pages with facets, and I definitely don't think 301s are the way to go. I figured NOINDEX, FOLLOW is best because it keeps bots from being confused by duplicate content, while still taking advantage of some link juice. Mind you, these are faceted pages, not top level pages.
Thoughts?
-
robots.txt is a bad way to do things, because any link pointing to a blocked page wastes its link juice. Using noindex,follow is a better way, as it allows the links to be followed and link juice to return to your indexed pages.
But best not to noindex at all, and find another solution if possible.