PDF on financial site that duplicates ~50% of site content
-
I have a financial advisor client who has a downloadable PDF on his site that contains about 9 pages of good info. Problem is much of the content can also be found on individual pages of his site.
Is it best to noindex/follow the PDF? It would be great to let the few pages of original content be crawlable, but I'm concerned about the duplicate content aspect.
Thanks --
-
This is what we have done with PDFs: assign rel="canonical" in .htaccess.
We did this with a few hundred files and it took Google a LONG time to find and credit them.
-
You could send a noindex directive in the HTTP response header (X-Robots-Tag) rather than rel=canonical.
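A minimal sketch of that noindex header, assuming Apache with mod_headers enabled and an .htaccess file that covers the PDF. The file name my-file.pdf is a placeholder, not the actual file:

```apache
# Assumption: Apache with mod_headers enabled; my-file.pdf is a placeholder name.
<FilesMatch "my-file\.pdf$">
    # Ask search engines not to index the PDF, while still following its links.
    Header set X-Robots-Tag "noindex, follow"
</FilesMatch>
```

The trade-off is that noindex also keeps the PDF's few pages of original content out of the search results.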
-
Personally I think it would be better not to index it, but if necessary, the index folder root seems like a good option.
-
Thanks. Anybody want to weigh in on where to rel=canonical to? Home page?
-
If you are using Apache, you can put this in your .htaccess (mod_headers must be enabled):
<FilesMatch "my-file\.pdf">
Header set Link '<http://misite/my-file.html>; rel="canonical"'
</FilesMatch>
-
I think the right way here is to send the rel=canonical in the PDF's HTTP header: http://googlewebmastercentral.blogspot.com/2011/06/supporting-relcanonical-http-headers.html
-
I thought the idea was to put rel=canonical on the duplicated page, to signal that "hey, this page may look like duplicate content, but please refer to this canonical URL"?
Looks like there is a PDF option for rel=canonical. I guess the question is which page on the site to make canonical.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
"Indicate the canonical version of a URL by responding with the Link rel="canonical" HTTP header. Adding rel="canonical" to the head section of a page is useful for HTML content, but it can't be used for PDFs and other file types indexed by Google Web Search. In these cases you can indicate a canonical URL by responding with the Link rel="canonical" HTTP header, like this (note that to use this option, you'll need to be able to configure your server):
Link: <http://www.example.com/downloads/white-paper.pdf>; rel="canonical""
-
Hi Keith,
I'm sorry, I should have clarified. The rel=canonical tags would be on your Web pages, not the PDF (they are irrelevant in a PDF document). Then Google will attribute your Web page as the original source of the content and will understand that the PDF just contains bits of content from those pages. In this instance I would include a rel=canonical tag on every page of your site, just to cover your bases. Hope that helps!
Dana
-
Not sure which page I would mark as canonical, since the PDF contains content from several different pages on the site. I don't think it's possible to assign different rel=canonical tags to separate portions of a PDF, is it?
-
As long as you have rel=canonical tags properly in place, you don't need to worry about the PDF causing duplicate content problems. That way, any original content should be picked up and any duplicate can be attributed to your existing Web pages. Hope that's helpful!
Dana