Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to block "print" pages from indexing
-
I have a fairly large FAQ section and every article has a "print" button. Unfortunately, this is creating a page for every article which is muddying up the index - especially on my own site using Google Custom Search.
Can you recommend a way to block this from happening?
Example Article:
Example "Print" page:
http://www.knottyboy.com/lore/article.php?id=052&action=print
-
Donnie, I agree. However, we had the same problem on a website and here's what we did the canonical tag:
Over a period of 3-4 weeks, all those print pages disappeared from the SERP. Now if I take a print URL and do a cache: for that page, it shows me the web version of that page.
So yes, I agree the question was about blocking the pages from getting indexed. There's no real recipe here, it's about getting the right solution. Before canonical tag, robots.txt was the only solution. But now with canonical there (provided one has the time and resources available to implement it vs adding one line of text to robots.txt), you can technically 301 the pages and not have to stop/restrict the spiders from crawling them.
Absolutely no offence to your solution in any way. Both are indeed workable solutions. The best part is that your robots.txt solution takes 30 seconds to implement since you provided the actually disallow code :), so it's better.
-
Thanks Jennifer, will do! So much good information.
-
Sorry, but I have to jump in - do NOT use all of those signals simultaneously. You'll make a mess, and they'll interfere with each other. You can try Robots.txt or NOINDEX on the page level - my experience suggests NOINDEX is much more effective.
Also, do not nofollow the links yet - you'll block the crawl, and then the page-level cues (like NOINDEX) won't work. You can nofollow later. This is a common mistake and it will keep your fixes from working.
-
Josh, please read my and Dr. Pete's comments below. Don't nofollow the links, but do use the meta noindex,follow on the page.
-
Rel-canonical, in practice, does essentially de-index the non-canonical version. Technically, it's not a de-indexation method, but it works that way.
-
You are right Donnie. I've "good answered" you too.
I've gone ahead and updated my robots.txt file. As soon as I am able, I will use no indexon the page, no follow on the links, and rel=canonical.
This is just what I needed, a quick fix until I can make a more permanent solution.
-
Your welcome : )
-
Although you are correct... there is still more then one way to skin a chicken.
-
But the spiders still run on the page and read the canonical link, however with the robot text the spiders will not.
-
Yes, but Rel=Canonical does not block a page it only tells google which page to follow out of two pages.The question was how to block, not how to tell google which link to follow. I believe you gave credit to the wrong answer.
http://en.wikipedia.org/wiki/Canonical_link_element
This is not fair. lol
-
I have to agree with Jen - Robots.txt isn't great for getting indexed pages out. It's good for prevention, but tends to be unreliable as a cure. META NOINDEX is probably more reliable.
One trick - DON'T nofollow the print links, at least not yet. You need Google to crawl and read the NOINDEX tags. Once the ?print pages are de-indexed, you could nofollow the links, too.
-
Yes, it's strongly recommended. It should be fairly simple to populate this tag with the "full" URL of the article based on the article ID. This approach will not only help you get rid of the duplicate content issue, but a canonical tag essentially works like a 301 redirect. So from all search engine perspective you are 301'ing your print pages to the real web urls without redirecting the actual user's who are browsing the print pages if they need to.
-
Ya it is actually really useful. Unfortunately they are out of business now - so I'm hacking it on my own.
I will take your advice. I've shamefully never used rel= canonical before - so now is a good time to start.
-
True but using robots.txt does not keep them out of the index. Only using "noindex" will do that.
-
Thanks Donnie. Much appreciated!
-
I actually remember Lore from a while ago. It's an interesting, easy to use FAQ CMS.
Anyways, I would also recommend implementing Canonical Tags for any possible duplicate content issues. So whether it's the print or the web version, each one of them will contain a canonical tag pointing to the web url of that article in the section of your website.
rel="canonical" href="http://www.knottyboy.com/lore/idx.php/11/183/Maintenance-of-Mature-Locks-6-months-/article/How-do-I-get-sand-out-of-my-dreads.html" /> -
-
Try This.
User-agent: *
Disallow: /*&action=print
-
Theres more then one way to skin a chicken.
-
Rather than using robots.txt I'd use a noindex,follow tag instead to the page. This code goes into the tag for each print page. And it will ensure that the pages don't get indexed but that the links are followed.
-
That would be great. Do you mind giving me an example?
-
you can block in .robot text, every page that ends in action=print
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
I want to move some pages of my website to a folder and nav menu in those pages should only show inner page links, will it hurt SEO?
Hi, My website has a few SaaS products, to make my website simple i want to move my website some pages to its specific folder structure , so eg website.com/product1/features
Technical SEO | | webbeemoz
website.com/product1/pricing
website.com/product1/information and same for product2 and so on, the website.com/product1/.. menu will only show the links of product1 and only one link to homepage (possibly in footer). Please share your opinion will it be a good idea, from UI perspective it will be simple , but i am not sure about SEO perspective, please help thanks1 -
How to index e-commerce marketplace product pages
Hello! We are an online marketplace that submitted our sitemap through Google Search Console 2 weeks ago. Although the sitemap has been submitted successfully, out of ~10000 links (we have ~10000 product pages), we only have 25 that have been indexed. I've attached images of the reasons given for not indexing the platform. gsc-dashboard-1 gsc-dashboard-2 How would we go about fixing this?
Technical SEO | | fbcosta0 -
Discrepancy in actual indexed pages vs search console
Hi support, I checked my search console. It said that 8344 pages from www.printcious.com/au/sitemap.xml are indexed by google. however, if i search for site:www.printcious.com/au it only returned me 79 results. See http://imgur.com/a/FUOY2 https://www.google.com/search?num=100&safe=off&biw=1366&bih=638&q=site%3Awww.printcious.com%2Fau&oq=site%3Awww.printcious.com%2Fau&gs_l=serp.3...109843.110225.0.110430.4.4.0.0.0.0.102.275.1j2.3.0....0...1c.1.64.serp..1.0.0.htlbSGrS8p8 Could you please advise why there is discrepancy? Thanks.
Technical SEO | | Printcious0 -
Home Page Ranking Instead of Service Pages
Hi everyone! I've noticed that many of our clients have pages addressing specific queries related to specific services on their websites, but that the Home Page is increasingly showing as the "ranking" page. For example, a plastic surgeon we work with has a page specifically talking about his breast augmentation procedure for Miami, FL but instead of THAT page showing in the search results, Google is using his home page. Noticing this across the board. Any insights? Should we still be optimizing these specific service pages? Should I be spending time trying to make sure Google ranks the page specifically addressing that query because it SHOULD perform better? Thanks for the help. Confused SEO :/, Ricky Shockley
Technical SEO | | RickyShockley0 -
How to block text on a page to be indexed?
I would like to block the spider indexing a block of text inside a page , however I do not want to block the whole page with, for example , a noindex tag. I have tried already with a tag like this : chocolate pudding chocolate pudding However this is not working for my case, a travel related website. thanks in advance for your support. Best regards Gianluca
Technical SEO | | CharmingGuy0 -
New "Static" Site with 302s
Hey all, Came across a bit of an interesting challenge recently, one that I was hoping some of you might have had experience with! We're currently in the process of a website rebuild, for which I'm really excited. The new site is using Markdown to create an entirely static site. Load-times are fantastic, and the code is clean. Life is good, apart from the 302s. One of the weird quirks I've realized is that with oldschool, non-server-generated page content is that every page of the site is an Index.html file in a directory. The resulting in a www.website.com/page-title will 302 to www.website.com/page-title/. My solution off the bat has been to just be super diligent and try to stay on top of the link profile and send lots of helpful emails to the staff reminding them about how to build links, but I know that even the best laid plans often fail. Has anyone had a similar challenge with a static site and found a way to overcome it?
Technical SEO | | danny.wood1 -
Should I put meta descriptions on pages that are not indexed?
I have multiple pages that I do not want to be indexed (and they are currently not indexed, so that's great). They don't have meta descriptions on them and I'm wondering if it's worth my time to go in and insert them, since they should hypothetically never be shown. Does anyone have any experience with this? Thanks! The reason this is a question is because one member of our team was linking to this page through Facebook to send people to it and noticed random text on the page being pulled in as the description.
Technical SEO | | Viewpoints0 -
Best way to handle pages with iframes that I don't want indexed? Noindex in the header?
I am doing a bit of SEO work for a friend, and the situation is the following: The site is a place to discuss articles on the web. When clicking on a link that has been posted, it sends the user to a URL on the main site that is URL.com/article/view. This page has a large iframe that contains the article itself, and a small bar at the top containing the article with various links to get back to the original site. I'd like to make sure that the comment pages (URL.com/article) are indexed instead of all of the URL.com/article/view pages, which won't really do much for SEO. However, all of these pages are indexed. What would be the best approach to make sure the iframe pages aren't indexed? My intuition is to just have a "noindex" in the header of those pages, and just make sure that the conversation pages themselves are properly linked throughout the site, so that they get indexed properly. Does this seem right? Thanks for the help...
Technical SEO | | jim_shook0