Dynamically-generated .PDF files, instead of normal pages, indexed by and ranking in Google
-
Hi,
I come across a tough problem. I am working on an online-store website which contains the functionlaity of viewing products details in .PDF format (by the way, the website is built on Joomla CMS), now when I search my site's name in Google, the SERP simply displays my .PDF files in the first couple positions (shown in normal .PDF files format: [PDF]...)and I cannot find the normal pages there on SERP #1 unless I search the full site domain in Google. I really don't want this! Would you please tell me how to figure the problem out and solve it. I can actually remove the corresponding component (Virtuemart) that are in charge of generating the .PDF files. Now I am trying to redirect all the .PDF pages ranking in Google to a 404 page and remove the functionality, I plan to regenerate a sitemap of my site and submit it to Google, will it be working for me? I really appreciate that if you could help solve this problem. Thanks very much.
Sincerely
SEOmoz Pro Member
-
Recently discovered this:
Indicate the canonical version of a URL by responding with the
Link rel="canonical"
HTTP header. Addingrel="canonical"
to thehead
section of a page is useful for HTML content, but it can't be used for PDFs and other file types indexed by Google Web Search. In these cases you can indicate a canonical URL by responding with theLink rel="canonical"
HTTP header, like this (note that to use this option, you'll need to be able to configure your server).Link: <http: www.example.com="" downloads="" white-paper.pdf="">; rel="canonical"</http:>
Google currently supports these link header elements for Web Search only.
-http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
-
I would consider either excluding the PDFs from the index with your robots.txt in conjunction with resubmitting your sitemap (which you're all over), or placing a text link at the bottom of each PDF pointing back to the HTML version of that page (which, all things being equal, should cause the HTML version of the page to rank instead). I am not sure about serving 404 headers to Google instead of the PDFs that are currently in the index. Why not 301 to the HTML version of each PDF? Obviously that can't be a permanent solution, as you will eventually want to restore the functionality to users, right? But it will tell Googlebot that the content of each PDF is to be found from here on out at the URL containing the HTML version. This is a case where it would be handy to serve one thing to the bots and another to the human viewers, but I am afraid that doing so could get you into trouble.
I am interested in your case though—let us know what, if anything besides the 404s and sitemap resubmittal, you end up trying and what happens with it. I'm also curious to know what other mozzers suggest.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Page missing from Google index
Hi all, One of our most important pages seems to be missing from the Google index. A number of our collections pages (e.g., http://perfectlinens.com/collections/size-king) are thin, so we've included a canonical reference in all of them to the main collection page (http://perfectlinens.com/collections/all). However, I don't see the main collection page in any Google search result. When I search using "info:http://perfectlinens.com/collections/all", the page displayed is our homepage. Why is this happening? The main collection page has a rel=canonical reference to itself (auto-generated by Shopify so I can't control that). Thanks! WUKeBVB
Technical SEO | | leo920 -
Why Google ranks a page with Meta Robots: NO INDEX, NO FOLLOW?
Hi guys, I was playing with the new OSE when I found out a weird thing: if you Google "performing arts school london" you will see w w w . mountview . org. uk at the 3rd position. The point is that page has "Meta Robots: NO INDEX, NO FOLLOW", why Google indexed it? Here you can see the robots.txt allows Google to index the URL but not the content, in article they also say the meta robots tag will properly avoid Google from indexing the URL either. Apparently, in my case that page is the only one has the tag "NO INDEX, NO FOLLOW", but it's the home page. so I said to myself: OK, perhaps they have just changed that tag therefore Google needs time to re-crawl that page and de-index following the no index tag. How long do you think it will take to don't see that page indexed? Do you think it will effect the whole website, as I suppose if you have that tag on your home page (the root domain) you will lose a lot of links' juice - it's totally unnatural a backlinks profile without links to a root domain? Cheers, Pierpaolo
Technical SEO | | madcow780 -
Keyword Targeting with Dynamic Pages
We have a large e-commerce website made with .net. so all of our category and item pages are made dynamic. Most things like title, some of the words and a few other things are done with scripts. I want to be able to target certain words and have more customized words on certain pages. Has anyone dealt with this? I know .net is pretty common so I can't be a unique case.
Technical SEO | | EcommerceSite0 -
Why is our page not visible in Google-ranking? www.loseweight.com.
using Wordpress as platform. Using the URL gets into the site,- but seems to be non-existent for public... No comments at all, seems to be "invisible"?
Technical SEO | | gewi0 -
Wrong page ranking even after fixes. Why?
Howdy, I've read a number of posts on this issue and followed all the advice I could find. My issue is this: several of our targeted keywords have the wrong page ranking which results in a loss of conversion and high bounce rate. I use the site:mydomain.com operator to see how the pages are ranking which I find really useful. One of our targeted keywords is 'wedding stationery' and the page which should rank is shop.confetti.co.uk/branch/wedding-stationery however this page ranks 3rd on our domain. I've done the following: Optimised the correct page De-optimised the ranking pages Added links from the incorrectly ranking page to the page I want to rank using keyword anchor text Added internal links to the page I want to rank from high Page Authority pages using keyword anchor text Here is a link to the SERP which shows the ranking for our domain for the 'wedding stationery' search term. I made the changes a couple of weeks ago. Any advice or explanations as to what is causing this would be greatly received. Thanks in advance, Brendan.
Technical SEO | | Confetti_Wedding0 -
Pages not Indexed after a successful Google Fetch
I am trying to understand why google isn't indexing key content on my site. www.BeyondTransition.com is indexed and new pages show up in a couple of hours. My key content is 6 pages of information for each of 3000 events (driven by mySQL on a wordpress platform). These pages are reached via a search page, but no direct navigation from the home page. When I link to an event page from an indexed page it doesn't show up in search results. When I use fetch on webmaster tools the fetch is successful but is then not indexed - or if it does appear in results it's directed to the internal search page e.g. http://www.beyondtransition.com/site/races/course/race110003/ has been fetched and submitted with links but when I search for BeyondTransition Ironman Cozumel I get these results.... So what have I done wrong and how do I go about fixing it? All thoughts and advice appreciated Thanks Denis
Technical SEO | | beyondtransition0 -
Getting more pages indexed by yahoo and bing
Anyone has a reliable way to get more pages indexed in yahoo and bing. Please dont say to get more inner page quality links.
Technical SEO | | mickey110 -
Are Google now indexing iFrames?
A client is pulling content through an iFrame, and when searching for a snippet of that exact content the page that is pulling the data is being indexed and not the iFrame page. Seen this before?
Technical SEO | | White.net0