Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Dynamically-generated .PDF files, instead of normal pages, indexed by and ranking in Google
-
Hi,
I come across a tough problem. I am working on an online-store website which contains the functionlaity of viewing products details in .PDF format (by the way, the website is built on Joomla CMS), now when I search my site's name in Google, the SERP simply displays my .PDF files in the first couple positions (shown in normal .PDF files format: [PDF]...)and I cannot find the normal pages there on SERP #1 unless I search the full site domain in Google. I really don't want this! Would you please tell me how to figure the problem out and solve it. I can actually remove the corresponding component (Virtuemart) that are in charge of generating the .PDF files. Now I am trying to redirect all the .PDF pages ranking in Google to a 404 page and remove the functionality, I plan to regenerate a sitemap of my site and submit it to Google, will it be working for me? I really appreciate that if you could help solve this problem. Thanks very much.
Sincerely
SEOmoz Pro Member
-
Recently discovered this:
Indicate the canonical version of a URL by responding with the
Link rel="canonical"
HTTP header. Addingrel="canonical"
to thehead
section of a page is useful for HTML content, but it can't be used for PDFs and other file types indexed by Google Web Search. In these cases you can indicate a canonical URL by responding with theLink rel="canonical"
HTTP header, like this (note that to use this option, you'll need to be able to configure your server).Link: <http: www.example.com="" downloads="" white-paper.pdf="">; rel="canonical"</http:>
Google currently supports these link header elements for Web Search only.
-http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
-
I would consider either excluding the PDFs from the index with your robots.txt in conjunction with resubmitting your sitemap (which you're all over), or placing a text link at the bottom of each PDF pointing back to the HTML version of that page (which, all things being equal, should cause the HTML version of the page to rank instead). I am not sure about serving 404 headers to Google instead of the PDFs that are currently in the index. Why not 301 to the HTML version of each PDF? Obviously that can't be a permanent solution, as you will eventually want to restore the functionality to users, right? But it will tell Googlebot that the content of each PDF is to be found from here on out at the URL containing the HTML version. This is a case where it would be handy to serve one thing to the bots and another to the human viewers, but I am afraid that doing so could get you into trouble.
I am interested in your case though—let us know what, if anything besides the 404s and sitemap resubmittal, you end up trying and what happens with it. I'm also curious to know what other mozzers suggest.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
My video sitemap is not being index by Google
Dear friends, I have a videos portal. I created a video sitemap.xml and submit in to GWT but after 20 days it has not been indexed. I have verified in bing webmaster as well. All videos are dynamically being fetched from server. My all static pages have been indexed but not videos. Please help me where am I doing the mistake. There are no separate pages for single videos. All the content is dynamically coming from server. Please help me. your answers will be more appreciated................. Thanks
Technical SEO | | docbeans0 -
How Does Google's "index" find the location of pages in the "page directory" to return?
This is my understanding of how Google's search works, and I am unsure about one thing in specific: Google continuously crawls websites and stores each page it finds (let's call it "page directory") Google's "page directory" is a cache so it isn't the "live" version of the page Google has separate storage called "the index" which contains all the keywords searched. These keywords in "the index" point to the pages in the "page directory" that contain the same keywords. When someone searches a keyword, that keyword is accessed in the "index" and returns all relevant pages in the "page directory" These returned pages are given ranks based on the algorithm The one part I'm unsure of is how Google's "index" knows the location of relevant pages in the "page directory". The keyword entries in the "index" point to the "page directory" somehow. I'm thinking each page has a url in the "page directory", and the entries in the "index" contain these urls. Since Google's "page directory" is a cache, would the urls be the same as the live website (and would the keywords in the "index" point to these urls)? For example if webpage is found at wwww.website.com/page1, would the "page directory" store this page under that url in Google's cache? The reason I want to discuss this is to know the effects of changing a pages url by understanding how the search process works better.
Technical SEO | | reidsteven750 -
How long does it take for Google for deindexing pages?
Hi mozzers, We just launched a mobile website(parallel) and realized that it created many duplicate content with desktop URLs. I decided to add name="robots" content="No index, No follow" /> to the entire mobile site. My only concern is that I am still seeing the mobile site indexed when it's been almost a week I added these tags. Does anyone know how long it takes google to deindex your content? Thanks
Technical SEO | | Ideas-Money-Art0 -
Home Page .index.htm and .com Duplicate Page Content/Title
I have been whittling away at the duplicate content on my clients' sites, thanks to SEOmoz's pro report, and have been getting push back from the account manager at register.com (the site was built here and the owner doesn't want to move it). He says these are the exact same page and he can't access one to redirect to the other. Any suggestions? The SEOmoz report says there is duplicate content on both these urls: Durango Mountain Biking | Durango Mountain Resort - Cascade Village http://www.cascadevillagehotel.com/index.htm Durango Mountain Biking | Durango Mountain Resort - Cascade Village http://www.cascadevillagehotel.com/ Your help is greatly appreciated! Sheryl
Technical SEO | | TOMMarketingLtd.0 -
Getting Pages Indexed That Are Not In The Main Navigation
Hi All, Hoping you can help me out with a couple of questions I have. I am looking to create SEO friendly landing pages optimized for long tail keywords to increase site traffic and conversions. These pages will not live on the main navigation. I am wondering what the best way to get these pages indexed is? Internal text linking, adding to the sitemap? What have you done in this situation? I know that these pages cannot be orphaned pages and they need to be linked to somewhere. Looking for some tips to do this properly and to ensure that they can become indexed. Thanks! Pat
Technical SEO | | PatBausemer0 -
How to remove a sub domain from Google Index!
Hello, I have a website having many subdomains having same copy of content i think its harming my SEO for that site since abc and xyz sub domains do have same contents. Thus i require to know i have already deleted required subdomain DNS RECORDS now how to have those pages removed from Google index as well ? The DNS Records no more exists for those subdomains already.
Technical SEO | | anand20100 -
Best Dynamic Sitemap Generator
Hello Mozers, Could you please share the best Dynamic Sitemap Generator you are using. I have found this place: http://www.seotools.kreationstudio.com/xml-sitemap-generator/free_dynamic_xml_sitemap_generator.php Thanks in advanced for your help.
Technical SEO | | SEOPractices0