Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
No Index PDFs
-
Our products have about 4 PDFs a piece, which really inflates our indexed pages. I was wondering if I could add REL=No Index to the PDF's URL? All of the files are on a file server, so they are embedded with links on our product pages. I know I could add a No Follow attribute, but I was wondering if any one knew if the No Index would work the same or if that is even possible. Thanks!
-
The files aren't duplicate. I am familiar with using the XRobots tag. I was really just curious if my theory would work.
Thanks for all your input.
-
Hi Monica,
I presume you already check all the options before posting this question. I have concluded this by seeing your others posts/reply in this community.
Now here is my answer
To prevent your PDF file (or any non HTML file) from being listed in search results, the only way is to use the HTTP X-Robots-Tag response header, e.g.:
X-Robots-Tag: noindex
robots.txt does not prevent your page from being listed in search results.
What it does is stop the bot from crawling your page, but if a third party links to your PDF file from their website, your page will still be listed.
If you stop the bot from crawling your page using robots.txt, it will not have the chance to see the X-Robots-Tag: noindex response tag. Therefore, never ever ever disallow a page in robots.txt if you employ the X-Robots-Tag header.
I hope it helps but not very sure.
Thanks
-
-
If you want to deindex all PDF files, I recommend using the x-robots-tag in .htaccess - https://yoast.com/x-robots-tag-play/
-
If the PDFs are pdf versions of existing pages, I would set canonicals to point to the URL you do want indexed (#2 on http://moz.com/blog/htaccess-file-snippets-for-seos )
-
-
If the pdf's are in a separate folder on your site - you could mark that folder as noindex in robots.txt
As far as I know, it's not possible to add a noindex to a link.
rgds
Dirk
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Page Indexing without content
Hello. I have a problem of page indexing without content. I have website in 3 different languages and 2 of the pages are indexing just fine, but one language page (the most important one) is indexing without content. When searching using site: page comes up, but when searching unique keywords for which I should rank 100% nothing comes up. This page was indexing just fine and the problem arose couple of days ago after google update finished. Looking further, the problem is language related and every page in the given language that is newly indexed has this problem, while pages that were last crawled around one week ago are just fine. Has anyone ran into this type of problem?
Technical SEO | | AtuliSulava1 -
Discovered - currently not indexed issue
Hello all, We have a sitemap with URLs that have mostly user generated content. Profile Overview section. Where users write about their services and some other things. Out of 46K URLs, only 14K are valid according to search console and 32K URLs are excluded. Out of these 32K, 28K are "Discovered - currently not indexed". We can't really update these pages as they have user generated content. However we do want to leverage all these pages to help us in our SEO. So the question is how do we make all of these pages indexable? If anyone can help in the regard, please let me know. Thanks!
Technical SEO | | akashkandari0 -
Why images are not getting indexed and showing in Google webmaster
Hi, I would like to ask why our website images not indexing in Google. I have shared the following screenshot of the search console. https://www.screencast.com/t/yKoCBT6Q8Upw Last week (Friday 14 Sept 2018) it was showing 23.5K out 31K were submitted and indexed by Google. But now, it is showing only 1K 😞 Can you please let me know why might this happen, why images are not getting indexed and showing in Google webmaster.
Technical SEO | | 21centuryweb0 -
Indexed, but not shown in search result
Hi all We face this problem for www.residentiebosrand.be, which is well programmed, added to Google Search Console and indexed. Web pages are shown in Google for site:www.residentiebosrand.be. Website has been online for 7 weeks, but still no search results. Could you guys look at the update below? Thanks!
Technical SEO | | conversal0 -
Google indexing despite robots.txt block
Hi This subdomain has about 4'000 URLs indexed in Google, although it's blocked via robots.txt: https://www.google.com/search?safe=off&q=site%3Awww1.swisscom.ch&oq=site%3Awww1.swisscom.ch This has been the case for almost a year now, and it does not look like Google tends to respect the blocking in http://www1.swisscom.ch/robots.txt Any clues why this is or what I could do to resolve it? Thanks!
Technical SEO | | zeepartner0 -
How to block text on a page to be indexed?
I would like to block the spider indexing a block of text inside a page , however I do not want to block the whole page with, for example , a noindex tag. I have tried already with a tag like this : chocolate pudding chocolate pudding However this is not working for my case, a travel related website. thanks in advance for your support. Best regards Gianluca
Technical SEO | | CharmingGuy0 -
Question on noscript tags and indexing
If I have a <noscript>tag on every page of my website with the same sentence over and over saying something to the effect of "Sorry our site uses Javascript, please enable javascript for the full site experience.", Webmaster Tools will tell me that one of the most common words on my site is "Javascript".</p> <p>Is this something to be concerned about from an SEO perspective? My site is obviously not about Javascript and I don't want to dilute my page's topic or authority by repeating words that are not relevant to the topic of my site.</p> <p>Thanks!</p></noscript>
Technical SEO | | IrvCo_Interactive0 -
Struggling to get my lyrics website fully indexed
Hey guys, been a longtime SEOmoz user, only just getting heavily into SEO now and this is my first query, apologies if it's simple to answer but I have been doing my research! My website is http://www.lyricstatus.com - basically it's a lyrics website. Rightly or wrongly, I'm using Google Custom Search Engine on my website for search, as well as jQuery auto-suggest - please ignore the latter for now. My problem is that when I launched the site I had a complex AJAX Browse page, so Google couldn't see static links to all my pages, thus it only indexed certain pages that did have static links. This led to my searches on my site using the Google CSE being useless as very few pages were indexed. I've since dropped the complex AJAX links and replaced it with easy static links. However, this was a few weeks ago now and still Google won't fully index my site. Try doing a search for "Justin Timberlake" (don't use the auto-suggest, just click the "Search" button) and it's clear that the site still hasn't been fully indexed! I'm really not too sure what else to do, other than wait and hope, which doesn't seem like a very proactive thing to do! My only other suspicion is that Google sees my site as more duplicate content, but surely it must be ok with indexing multiple lyrics sites since there are plenty of different ones ranking in Google. Any help or advice greatly appreciated guys!
Technical SEO | | SEOed0