Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content (many posts will still be viewable), we have locked both new posts and new replies.
No Index PDFs
-
Our products have about 4 PDFs apiece, which really inflates our indexed page count. I was wondering if I could add rel="noindex" to the PDF links. All of the files are on a file server and are linked from our product pages. I know I could add a nofollow attribute, but I was wondering if anyone knew whether noindex would work the same way, or whether that is even possible. Thanks!
-
The files aren't duplicates. I am familiar with using the X-Robots-Tag header; I was really just curious whether my theory would work.
Thanks for all your input.
-
Hi Monica,
I presume you already checked all the options before posting this question; I concluded this from your other posts and replies in this community.

Here is my answer:
To prevent your PDF file (or any non-HTML file) from being listed in search results, the only way is to use the HTTP X-Robots-Tag response header, e.g.:
X-Robots-Tag: noindex
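As a minimal sketch of sending that header, here is a Python static file server built on the standard library's http.server that adds X-Robots-Tag: noindex to every response whose path ends in .pdf. The helper names robots_header_for and serve, the port, and the policy "every PDF gets noindex" are assumptions for illustration; on a real site the header would normally be set in the web server configuration instead.

```python
import http.server
import posixpath
from typing import Optional


def robots_header_for(path: str) -> Optional[str]:
    """Return the X-Robots-Tag value to send for a request path, or None.

    Hypothetical policy: every file ending in .pdf is kept out of search
    results; everything else is served without the header.
    """
    # Ignore any query string and compare the extension case-insensitively.
    clean_path = path.split("?", 1)[0].lower()
    if posixpath.splitext(clean_path)[1] == ".pdf":
        return "noindex"
    return None


class NoIndexPDFHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that adds X-Robots-Tag to PDF responses."""

    def end_headers(self) -> None:
        # Runs just before the blank line that terminates the headers,
        # so this is the last point at which a header can be added.
        value = robots_header_for(self.path)
        if value is not None:
            self.send_header("X-Robots-Tag", value)
        super().end_headers()


def serve(port: int = 8000) -> None:
    """Serve the current directory until interrupted (hypothetical setup)."""
    server = http.server.ThreadingHTTPServer(("", port), NoIndexPDFHandler)
    server.serve_forever()
```

To try it, call serve() from the directory holding the PDFs and request one with curl -I; the response should include the extra header, while other files are served unchanged.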
robots.txt does not prevent your page from being listed in search results.
What it does is stop the bot from crawling your page, but if a third party links to your PDF file from their website, the URL can still be listed.
If you stop the bot from crawling your page using robots.txt, it will never see the X-Robots-Tag: noindex response header. Therefore, never, ever, ever disallow a page in robots.txt if you rely on the X-Robots-Tag header.
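To illustrate that pitfall, the sketch below uses Python's urllib.robotparser to check whether robots.txt lets a crawler fetch a PDF at all; if it cannot, any noindex header on that file goes unseen. The function name, the /files/ folder and the example.com URLs are hypothetical.

```python
from urllib import robotparser


def crawler_can_see_headers(robots_txt: str, url: str,
                            user_agent: str = "Googlebot") -> bool:
    """Return True if robots.txt allows user_agent to fetch url.

    If this returns False, the crawler never requests the file, so it
    never receives an X-Robots-Tag: noindex header sent with it.
    """
    parser = robotparser.RobotFileParser()
    # parse() takes the robots.txt content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical robots.txt that blocks the folder holding the PDFs.
    robots = "User-agent: *\nDisallow: /files/\n"
    pdf = "https://www.example.com/files/widget-manual.pdf"
    if not crawler_can_see_headers(robots, pdf):
        print("Blocked by robots.txt: a noindex header on this PDF will never be seen.")
```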
I hope this helps, though I'm not completely sure.

Thanks
-
-
If you want to deindex all PDF files, I recommend setting the X-Robots-Tag header in .htaccess: https://yoast.com/x-robots-tag-play/
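In Apache with mod_headers enabled, a commonly used form is a FilesMatch block matching "\.pdf$" that contains Header set X-Robots-Tag "noindex, nofollow". Once a rule like that is in place, one way to confirm it took effect is to request a PDF and inspect the header. Below is a rough Python sketch; the function names are hypothetical, and the parser is deliberately simplified (it handles comma-separated directives, the crawler-prefixed form such as "googlebot: noindex", and "none", but not every edge case).

```python
import urllib.request
from typing import Iterable, List

# Directives whose own syntax contains a colon, so they must not be
# mistaken for a "useragent: directives" prefix.
_VALUED_DIRECTIVES = {"unavailable_after", "max-snippet",
                      "max-image-preview", "max-video-preview"}


def has_noindex(header_values: Iterable[str], user_agent: str = "googlebot") -> bool:
    """Simplified check: do these X-Robots-Tag values tell user_agent not to index?"""
    agent = user_agent.lower()
    for raw in header_values:
        value = raw.strip().lower()
        prefix, sep, rest = value.partition(":")
        prefix = prefix.strip()
        is_agent_prefix = (sep and prefix not in _VALUED_DIRECTIVES
                           and "," not in prefix and " " not in prefix)
        if is_agent_prefix:
            # A value like "bingbot: noindex" only applies to that crawler.
            if prefix != agent:
                continue
            value = rest
        directives = {token.strip() for token in value.split(",")}
        # "none" is shorthand for "noindex, nofollow".
        if "noindex" in directives or "none" in directives:
            return True
    return False


def fetch_x_robots_tag(url: str, timeout: float = 10.0) -> List[str]:
    """Send a HEAD request and return every X-Robots-Tag value in the response.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so a
    missing file surfaces as an exception rather than an empty list.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get_all("X-Robots-Tag") or []
```

For example, has_noindex(fetch_x_robots_tag("https://www.example.com/files/widget-manual.pdf")) (a hypothetical URL) should return True once the server sends the header for PDFs.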
-
If the PDFs are PDF versions of existing pages, I would set canonicals pointing to the URL you do want indexed (#2 on http://moz.com/blog/htaccess-file-snippets-for-seos).
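Because a PDF has no HTML head to hold a link element, the canonical is sent as an HTTP Link header instead, for example Link: <https://www.example.com/products/widget>; rel="canonical". A minimal Python sketch that builds that header value from a mapping of PDF paths to HTML pages follows; the mapping, the paths and the function name are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical mapping from PDF paths to the HTML page that should be
# indexed in their place.
PDF_TO_PAGE: Dict[str, str] = {
    "/files/widget-manual.pdf": "https://www.example.com/products/widget",
}


def canonical_link_header(path: str,
                          mapping: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return a Link header value declaring the canonical page for path.

    Returns None for paths with no HTML equivalent, which should then be
    served without a canonical header.
    """
    target = (PDF_TO_PAGE if mapping is None else mapping).get(path)
    if target is None:
        return None
    # Same semantics as an HTML rel="canonical" link, sent as an HTTP header.
    return f'<{target}>; rel="canonical"'
```

In a Python http.server handler, the returned value would be passed to send_header("Link", value) for the matching PDF responses.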
-
-
If the PDFs are in a separate folder on your site, you could mark that folder as noindex in robots.txt.
As far as I know, it's not possible to add a noindex to a link.
rgds
Dirk
Related Questions
-
Google is indexing bad URLs
Hi all, the site I am working on is built on WordPress. The Revolution Slider plugin was installed and, although it was no longer used, it remained on the site for some time. The plugin began creating hundreds of URLs containing nothing but code. I noticed these URLs were being indexed by Google. The URLs follow the structure: www.mysite.com/wp-content/uploads/revslider/templates/this-part-changes/ I have done the following to prevent these URLs from being created and indexed: 1. Added a directive to my .htaccess to 404 all of these URLs. 2. Blocked /wp-content/uploads/revslider/ in my robots.txt. 3. Manually de-indexed each URL using the GSC tool. 4. Deleted the plugin. However, new URLs still appear in Google's index, despite being blocked by robots.txt and resolving to a 404. Can anyone suggest any next steps? Thanks!
Technical SEO | | Tom3_150 -
How to block text on a page from being indexed?
I would like to stop the spider from indexing a block of text inside a page; however, I do not want to block the whole page with, for example, a noindex tag. I have already tried a tag like this: chocolate pudding chocolate pudding However, this is not working in my case, a travel-related website. Thanks in advance for your support. Best regards, Gianluca
Technical SEO | | CharmingGuy0 -
Question on noscript tags and indexing
If I have a noscript tag on every page of my website with the same sentence over and over, saying something to the effect of "Sorry, our site uses JavaScript; please enable JavaScript for the full site experience.", Webmaster Tools tells me that one of the most common words on my site is "Javascript". Is this something to be concerned about from an SEO perspective? My site is obviously not about JavaScript, and I don't want to dilute my page's topic or authority by repeating words that are not relevant to the topic of my site. Thanks!
Technical SEO | | IrvCo_Interactive0 -
301 Redirect with index.asp
I am very new to all of this, so forgive the newbie questions; I will get better. OK, so after starting a campaign I see that I have many issues, including some pages being flagged as duplicate content. 1. The report says http://lucid8.com has duplicate content on 2 other pages. 2. When I look at them, it shows that http://lucid8.com/index.asp and http://www.lucid8.com are duplicates. 3. Really, these are exactly the same page, because the default page that opens for www.lucid8.com, http://www.lucid8.com, etc. is always index.asp. 4. Now, I read that I should use permanent redirects, and how to set them up in IIS, so I tried to redirect from index.asp to www.lucid8.com, but that does not work because www.lucid8.com points to index.asp, so we end up in a loop. So the question is: how do I get rid of these duplicate page references without causing problems? Thanks
Technical SEO | | TroyW0 -
What to do with 302 redirects being indexed
Hi there, our site's forums include permalinks that, for some reason, use an intermediary URL that 302 redirects to the URL with the permalink anchor. For example, in the comments on http://en.tradimo.com/learn/chart-analysis/time-frames/ there is a permalink to the following URL: en.tradimo.com/co/50c450005f2b949e3200001b/ (there is no content here, and never has been). This URL 302 redirects to the following final URL: http://en.tradimo.com/learn/chart-analysis/time-frames/?offset=0&limit=20#50c450005f2b949e3200001b The problem is that Google is indexing the redirect URL (en.tradimo.com/co/50c450005f2b949e3200001b/) and showing duplicate content, even though we are using the nofollow tag on these links. Ideally, we would link directly to the final URL rather than redirecting. Alternatively, I'd say a 301 redirect would be preferable. But if neither is available, is there a way to get these pages out of the index? Is the canonical tag the best way? I really wish I could just add /co/ to the robots.txt file, but I think they would still be in the index, right? Thanks for your help!
Technical SEO | | etruvian0 -
Instant Indexing
I've been working on a site for a while now, methodically building content, trust and authority. Lately I've noticed that anything I publish there appears to be indexed by Google instantly, which surprises me. This hasn't happened to me before, so I'm curious. I'd be interested to hear about others' experiences.
Technical SEO | | waynekolenchuk0 -
How to get Google to index another page
Hi, I will try to make my question clear, although it is a bit complex. For my site, the most important keyword is "Insurance", or at least the Danish variation of it. My problem is that Google isn't indexing my front page for this keyword, but is indexing a subpage, www.mydomain.dk/insurance, instead of www.mydomain.dk. My link building will be to subpages and to my main domain, but I won't be able to get that many links to www.mydomain.dk/insurance. So I'm interested in making my front page the main page for the keyword insurance, but without losing the traffic I'm getting from the subpage at the moment. Are there any solutions for this? Thanks in advance.
Technical SEO | | Petersen110 -
Struggling to get my lyrics website fully indexed
Hey guys, I've been a longtime SEOmoz user but am only now getting heavily into SEO, and this is my first question; apologies if it's simple to answer, but I have been doing my research! My website is http://www.lyricstatus.com; basically, it's a lyrics website. Rightly or wrongly, I'm using Google Custom Search Engine on my website for search, as well as jQuery auto-suggest (please ignore the latter for now). My problem is that when I launched the site, I had a complex AJAX Browse page, so Google couldn't see static links to all my pages, and it only indexed the pages that did have static links. This made searching my site with Google CSE useless, as very few pages were indexed. I've since dropped the complex AJAX links and replaced them with simple static links. However, that was a few weeks ago, and Google still won't fully index my site. Try searching for "Justin Timberlake" (don't use the auto-suggest, just click the "Search" button) and it's clear that the site still hasn't been fully indexed! I'm really not sure what else to do, other than wait and hope, which doesn't seem very proactive! My only other suspicion is that Google sees my site as more duplicate content, but surely it must be OK with indexing multiple lyrics sites, since there are plenty of different ones ranking in Google. Any help or advice is greatly appreciated!
Technical SEO | | SEOed0