Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Dynamically-generated .PDF files, instead of normal pages, indexed by and ranking in Google
-
Hi,
I've come across a tough problem. I am working on an online-store website that lets visitors view product details in .PDF format (by the way, the website is built on the Joomla CMS). Now, when I search for my site's name in Google, the SERP simply displays my .PDF files in the first couple of positions (shown in the normal .PDF result format: [PDF]...), and I cannot find the normal pages on SERP #1 unless I search for the full site domain in Google. I really don't want this! Would you please tell me how to track the problem down and solve it? I can actually remove the corresponding component (Virtuemart) that is in charge of generating the .PDF files. For now I am trying to redirect all the .PDF pages ranking in Google to a 404 page and remove the functionality. I also plan to regenerate a sitemap of my site and submit it to Google. Will that work for me? I would really appreciate it if you could help solve this problem. Thanks very much.
Sincerely
SEOmoz Pro Member
-
Recently discovered this:
Indicate the canonical version of a URL by responding with the Link rel="canonical" HTTP header. Adding rel="canonical" to the head section of a page is useful for HTML content, but it can't be used for PDFs and other file types indexed by Google Web Search. In these cases you can indicate a canonical URL by responding with the Link rel="canonical" HTTP header, like this (note that to use this option, you'll need to be able to configure your server):
Link: <http://www.example.com/downloads/white-paper.pdf>; rel="canonical"
Google currently supports these link header elements for Web Search only.
-http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
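For what it's worth, here is a minimal Python sketch of how the header quoted above could be attached to a PDF response. It is only an illustration, not how Joomla or Virtuemart does it: the `PDF_TO_HTML` mapping, the paths, and the `app` function are all hypothetical, and it assumes you want the canonical to point at the HTML product page rather than at the PDF itself.

```python
# Hypothetical mapping from each PDF path to the HTML page it duplicates.
PDF_TO_HTML = {
    "/downloads/white-paper.pdf": "http://www.example.com/white-paper.html",
}


def canonical_link_header(canonical_url):
    """Build a Link header tuple naming canonical_url as the canonical version."""
    return ("Link", '<%s>; rel="canonical"' % canonical_url)


def app(environ, start_response):
    """Minimal WSGI app: serve a PDF and declare the HTML page as canonical."""
    path = environ.get("PATH_INFO", "")
    html_url = PDF_TO_HTML.get(path)
    if html_url is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found"]
    headers = [
        ("Content-Type", "application/pdf"),
        canonical_link_header(html_url),
    ]
    start_response("200 OK", headers)
    # Placeholder body; a real handler would stream the generated PDF bytes.
    return [b"%PDF-1.4 placeholder"]
```

On a typical shared host the same header would more likely be set in the web server configuration than in application code; the sketch just shows what the response needs to carry.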
-
I would consider either excluding the PDFs from the index with your robots.txt in conjunction with resubmitting your sitemap (which you're all over), or placing a text link at the bottom of each PDF pointing back to the HTML version of that page (which, all things being equal, should cause the HTML version of the page to rank instead). I am not sure about serving 404 headers to Google instead of the PDFs that are currently in the index. Why not 301 to the HTML version of each PDF? Obviously that can't be a permanent solution, as you will eventually want to restore the functionality to users, right? But it will tell Googlebot that the content of each PDF is to be found from here on out at the URL containing the HTML version. This is a case where it would be handy to serve one thing to the bots and another to the human viewers, but I am afraid that doing so could get you into trouble.
I am interested in your case though—let us know what, if anything besides the 404s and sitemap resubmittal, you end up trying and what happens with it. I'm also curious to know what other mozzers suggest.
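To make the 301 idea concrete, here is a rough Python sketch under stated assumptions: the `PDF_REDIRECTS` table and its URLs are invented for illustration, and a real Joomla site would probably do this with server rewrite rules instead. Each known PDF URL answers with a permanent redirect to its HTML counterpart, and anything else falls through to a 404.

```python
# Hypothetical table of PDF paths and the HTML pages that replace them.
PDF_REDIRECTS = {
    "/products/widget.pdf": "http://www.example.com/products/widget.html",
}


def redirect_app(environ, start_response):
    """Minimal WSGI app: 301 each known PDF path to its HTML version."""
    path = environ.get("PATH_INFO", "")
    target = PDF_REDIRECTS.get(path)
    if target is not None:
        # Permanent redirect tells crawlers the content now lives at target.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]
```

Compared with sending every PDF URL to a 404, this keeps visitors who click an old PDF result on a useful page while the index catches up.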