Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Dynamically-generated .PDF files, instead of normal pages, indexed by and ranking in Google
-
Hi,
I come across a tough problem. I am working on an online-store website which contains the functionlaity of viewing products details in .PDF format (by the way, the website is built on Joomla CMS), now when I search my site's name in Google, the SERP simply displays my .PDF files in the first couple positions (shown in normal .PDF files format: [PDF]...)and I cannot find the normal pages there on SERP #1 unless I search the full site domain in Google. I really don't want this! Would you please tell me how to figure the problem out and solve it. I can actually remove the corresponding component (Virtuemart) that are in charge of generating the .PDF files. Now I am trying to redirect all the .PDF pages ranking in Google to a 404 page and remove the functionality, I plan to regenerate a sitemap of my site and submit it to Google, will it be working for me? I really appreciate that if you could help solve this problem. Thanks very much.
Sincerely
SEOmoz Pro Member
-
Recently discovered this:
Indicate the canonical version of a URL by responding with the
Link rel="canonical"
HTTP header. Addingrel="canonical"
to thehead
section of a page is useful for HTML content, but it can't be used for PDFs and other file types indexed by Google Web Search. In these cases you can indicate a canonical URL by responding with theLink rel="canonical"
HTTP header, like this (note that to use this option, you'll need to be able to configure your server).Link: <http: www.example.com="" downloads="" white-paper.pdf="">; rel="canonical"</http:>
Google currently supports these link header elements for Web Search only.
-http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
-
I would consider either excluding the PDFs from the index with your robots.txt in conjunction with resubmitting your sitemap (which you're all over), or placing a text link at the bottom of each PDF pointing back to the HTML version of that page (which, all things being equal, should cause the HTML version of the page to rank instead). I am not sure about serving 404 headers to Google instead of the PDFs that are currently in the index. Why not 301 to the HTML version of each PDF? Obviously that can't be a permanent solution, as you will eventually want to restore the functionality to users, right? But it will tell Googlebot that the content of each PDF is to be found from here on out at the URL containing the HTML version. This is a case where it would be handy to serve one thing to the bots and another to the human viewers, but I am afraid that doing so could get you into trouble.
I am interested in your case though—let us know what, if anything besides the 404s and sitemap resubmittal, you end up trying and what happens with it. I'm also curious to know what other mozzers suggest.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
My WP website got attack by malware & now my website site:www.example.ca shows about 43000 indexed page in google.
Hi All My wordpress website got attack by malware last week. It affected my index page in google badly. my typical site:example.ca shows about 130 indexed pages on google. Now it shows about 43000 indexed pages. I had my server company tech support scan my site and clean the malware yesterday. But it still shows the same number of indexed page on google.
Technical SEO | Jan 11, 2021, 4:43 PM | ChophelDoes anybody had ever experience such situation and how did you fixed it. Looking for help. Thanks FILE HIT LIST:
{YARA}Spam_PHP_WPVCD_ContentInjection : /home/example/public_html/wp-includes/wp-tmp.php
{YARA}Backdoor_PHP_WPVCD_Deployer : /home/example/public_html/wp-includes/wp-vcd.php
{YARA}Backdoor_PHP_WPVCD_Deployer : /home/example/public_html/wp-content/themes/oceanwp.zip
{YARA}webshell_webshell_cnseay02_1 : /home/example2/public_html/content.php
{YARA}eval_post : /home/example2/public_html/wp-includes/63292236.php
{YARA}webshell_webshell_cnseay02_1 : /home/example3/public_html/content.php
{YARA}eval_post : /home/example4/public_html/wp-admin/28855846.php
{HEX}php.generic.malware.442 : /home/example5/public_html/wp-22.php
{HEX}php.generic.cav7.421 : /home/example5/public_html/SEUN.php
{HEX}php.generic.malware.442 : /home/example5/public_html/Webhook.php0 -
Does Page speed matter for google ranking?
We are not sure that page does matter or not for google ranking as I am working for this keyword "flower delivery in Bangalore" from last few months and I saw some website's google first page who have low page speed but still ranking so I am really worried about my page that has also low page speed. will my Bangalore page rank on google the first page if the speed is low and kindly suggest me more tips for the ranking best factors which really works in 2020 and one more thing that domain authority really matters in this year? as I also saw some websites with low domain authority and ranking on google's first page. Home page: Flowerportal Bangalore page: https://flowerportal.in/flower-delivery/bangalore/ focus Keyword is: Flower delivery in Bangalore, send flowers to Bangalore
Technical SEO | Oct 13, 2021, 4:38 AM | vidi34231 -
Should search pages be indexed?
Hey guys, I've always believed that search pages should be no-indexed but now I'm wondering if there is an argument to index them? Appreciate any thoughts!
Technical SEO | Apr 16, 2019, 7:57 AM | RebekahVP0 -
Does Google read dynamic canonical tags?
Does Google recognize rel=canonical tag if loaded dynamically via javascript? Here's what we're using to load: <script> //Inject canonical link into page head if (window.location.href.indexOf("/subdirname1") != -1) { canonicalLink = window.location.href.replace("/kapiolani", ""); } if (window.location.href.indexOf("/subdirname2") != -1) { canonicalLink = window.location.href.replace("/straub", ""); } if (window.location.href.indexOf("/subdirname3") != -1) { canonicalLink = window.location.href.replace("/pali-momi", ""); } if (window.location.href.indexOf("/subdirname4") != -1) { canonicalLink = window.location.href.replace("/wilcox", ""); } if (canonicalLink != window.location.href) { var link = document.createElement('link'); link.rel = 'canonical'; link.href = canonicalLink; document.head.appendChild(link); } script>
Technical SEO | Aug 15, 2017, 3:53 PM | SoulSurfer80 -
How to check if an individual page is indexed by Google?
So my understanding is that you can use site: [page url without http] to check if a page is indexed by Google, is this 100% reliable though? Just recently Ive worked on a few pages that have not shown up when Ive checked them using site: but they do show up when using info: and also show their cached versions, also the rest of the site and pages above it (the url I was checking was quite deep) are indexed just fine. What does this mean? thank you p.s I do not have WMT or GA access for these sites
Technical SEO | Aug 25, 2015, 6:45 PM | linklander0 -
Wordpress versus html and google ranking
My current SEO has always recommended that I take my site to wordpress. I really don't want to move to wordpress. I don't like it... I just like writing code in raw html, css, and script. I feel like I have more control that way. Wordpress just seems like a platform for blogs (I have my blog in wordpress). My question is, do wordpress websites typically rank better? Is there benefit to moving to it?
Technical SEO | May 31, 2014, 10:50 AM | CalicoKitty20000 -
How to Stop Google from Indexing Old Pages
We moved from a .php site to a java site on April 10th. It's almost 2 months later and Google continues to crawl old pages that no longer exist (225,430 Not Found Errors to be exact). These pages no longer exist on the site and there are no internal or external links pointing to these pages. Google has crawled the site since the go live, but continues to try and crawl these pages. What are my next steps?
Technical SEO | Jun 11, 2013, 4:56 PM | rhoadesjohn0 -
How to tell if PDF content is being indexed?
I've searched extensively for this, but could not find a definitive answer. We recently updated our website and it contains links to about 30 PDF data sheets. I want to determine if the text from these PDFs is being archived by search engines. When I do this search http://bit.ly/rRYJPe (google - site:www.gamma-sci.com and filetype:pdf) I can see that the PDF urls are getting indexed, but does that mean that their content is getting indexed? I have read in other posts/places that if you can copy text from a PDF and paste it that means Google can index the content. When I try this with PDFs from our site I cannot copy text, but I was told that these PDFs were all created from Word docs, so they should be indexable, correct? Since WordPress has you upload PDFs like they are an image could this be causing the problem? Would it make sense to take the time and extract all of the PDF content to html? Thanks for any assistance, this has been driving me crazy.
Technical SEO | Dec 14, 2011, 8:06 PM | zazo0