Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
No Index PDFs
-
Our products have about 4 PDFs a piece, which really inflates our indexed pages. I was wondering if I could add REL=No Index to the PDF's URL? All of the files are on a file server, so they are embedded with links on our product pages. I know I could add a No Follow attribute, but I was wondering if any one knew if the No Index would work the same or if that is even possible. Thanks!
-
The files aren't duplicate. I am familiar with using the XRobots tag. I was really just curious if my theory would work.
Thanks for all your input.
-
Hi Monica,
I presume you already check all the options before posting this question. I have concluded this by seeing your others posts/reply in this community.

Now here is my answer
To prevent your PDF file (or any non HTML file) from being listed in search results, the only way is to use the HTTP X-Robots-Tag response header, e.g.:
X-Robots-Tag: noindex
robots.txt does not prevent your page from being listed in search results.
What it does is stop the bot from crawling your page, but if a third party links to your PDF file from their website, your page will still be listed.
If you stop the bot from crawling your page using robots.txt, it will not have the chance to see the X-Robots-Tag: noindex response tag. Therefore, never ever ever disallow a page in robots.txt if you employ the X-Robots-Tag header.
I hope it helps but not very sure.

Thanks
-
-
If you want to deindex all PDF files, I recommend using the x-robots-tag in .htaccess - https://yoast.com/x-robots-tag-play/
-
If the PDFs are pdf versions of existing pages, I would set canonicals to point to the URL you do want indexed (#2 on http://moz.com/blog/htaccess-file-snippets-for-seos )
-
-
If the pdf's are in a separate folder on your site - you could mark that folder as noindex in robots.txt
As far as I know, it's not possible to add a noindex to a link.
rgds
Dirk
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Page Indexing without content
Hello. I have a problem of page indexing without content. I have website in 3 different languages and 2 of the pages are indexing just fine, but one language page (the most important one) is indexing without content. When searching using site: page comes up, but when searching unique keywords for which I should rank 100% nothing comes up. This page was indexing just fine and the problem arose couple of days ago after google update finished. Looking further, the problem is language related and every page in the given language that is newly indexed has this problem, while pages that were last crawled around one week ago are just fine. Has anyone ran into this type of problem?
Technical SEO | | AtuliSulava1 -
Pages are Indexed but not Cached by Google. Why?
Hello, We have magento 2 extensions website mageants.com since 1 years google every 15 days cached my all pages but suddenly last 15 days my websites pages not cached by google showing me 404 error so go search console check error but din't find any error so I have cached manually fetch and render but still most of pages have same 404 error example page : - https://www.mageants.com/free-gift-for-magento-2.html error :- http://webcache.googleusercontent.com/search?q=cache%3Ahttps%3A%2F%2Fwww.mageants.com%2Ffree-gift-for-magento-2.html&rlz=1C1CHBD_enIN803IN804&oq=cache%3Ahttps%3A%2F%2Fwww.mageants.com%2Ffree-gift-for-magento-2.html&aqs=chrome..69i57j69i58.1569j0j4&sourceid=chrome&ie=UTF-8 so have any one solutions for this issues
Technical SEO | | vikrantrathore0 -
Why is my blog disappearing from Google index?
My Google blogger blog is about 10 months old. In that time i have worked really hard with adding unique content, building relationships with other bloggers in the same niche, and done some inbound marketing. 2 weeks ago I updated the template to something cleaner, with a little more "wordpress" feel to it. This means i've messed about with the code a lot in these weeks, adding social buttons etc. The problem is that from some point late last week thurs/fri my pages started disappearing from Googles index. I have checked webmaster tools and have no manual actions. My link profile is pretty clean as its a new site, and i have manually checked every piece of content published for plagiarism etc. So what is going on? Did i break my blog? Or is something else amiss? Impressions are down 96% comparing Nov 1-5th to previous 5 days. site is here: http://bit.ly/174beVm Thanks for any help in advance.
Technical SEO | | Silkstream0 -
Google is indexing my directories
I'm sure this has been asked before, but I was looking at all of Google's results for my site and I found dozens of results for directories such as: Index of /scouting/blog/wp-includes/js/swfupload/plugins Obviously I don't want those indexed. How do I prevent Google from indexing those? Also, it only seems to be doing it with Wordpress, not any of the directories on my main site. (We have a wordpress blog, which is only a portion of the site)
Technical SEO | | UnderRugSwept0 -
Index.php and 301 redirect with Joomla
Hi, I'm running Joomla 1.7 with SEF on and I'm trying to do a htaccess redirect which fails. I have approximately 100 in effect so far and all working fine, but I have one snag. Index.php is not working as I need it to when it's redirected to www.myurl.com/ If I turn on index.php redirect to root using this code #index.php to root
Technical SEO | | NaescentAdam
RewriteCond %{HTTP_HOST} ^myurl.com$ [OR]
RewriteCond %{HTTP_HOST} ^www.myurl.com$
RewriteRule ^index.php$ "http://www.myurl.com/" [R=301,L] And then go to www.myurl.com/test.html I'm redirected to the homepage. I think this is because all pages are index.php in joomla. SEOMOZ and Google both think that index.php and root are duplicate pages. Does anyone have any advice for overcoming this? Thanks, Adam0 -
Block a sub-domain from being indexed
This is a pretty quick and simple (i'm hoping) question. What is the best way to completely block a sub domain from getting indexed from all search engines? One item i cannot use is the meta "no follow" tag. Thanks! - Kyle
Technical SEO | | kchandler0 -
How do I redirect index.html to the root / ?
The site I've inherited had operated on index.html at one point, and now uses index.php for the home page, which goes to the / page. The index.html was lost in migrating server hosts. How do I redirect the index.html to the / page? I've tried different options that keep giving ending up with the same 404 error. I tried a redirect from index.html to index.php which ended in an infinite loop. Because the index.html no longer exists in the root, should I created it and then add a redirect to it? Can I avoid this by editing the .htaccess? Any help is appreciated, thanks in advance!
Technical SEO | | NetPicks0 -
Struggling to get my lyrics website fully indexed
Hey guys, been a longtime SEOmoz user, only just getting heavily into SEO now and this is my first query, apologies if it's simple to answer but I have been doing my research! My website is http://www.lyricstatus.com - basically it's a lyrics website. Rightly or wrongly, I'm using Google Custom Search Engine on my website for search, as well as jQuery auto-suggest - please ignore the latter for now. My problem is that when I launched the site I had a complex AJAX Browse page, so Google couldn't see static links to all my pages, thus it only indexed certain pages that did have static links. This led to my searches on my site using the Google CSE being useless as very few pages were indexed. I've since dropped the complex AJAX links and replaced it with easy static links. However, this was a few weeks ago now and still Google won't fully index my site. Try doing a search for "Justin Timberlake" (don't use the auto-suggest, just click the "Search" button) and it's clear that the site still hasn't been fully indexed! I'm really not too sure what else to do, other than wait and hope, which doesn't seem like a very proactive thing to do! My only other suspicion is that Google sees my site as more duplicate content, but surely it must be ok with indexing multiple lyrics sites since there are plenty of different ones ranking in Google. Any help or advice greatly appreciated guys!
Technical SEO | | SEOed0