No Index PDFs
-
Our products have about 4 PDFs a piece, which really inflates our indexed pages. I was wondering if I could add REL=No Index to the PDF's URL? All of the files are on a file server, so they are embedded with links on our product pages. I know I could add a No Follow attribute, but I was wondering if any one knew if the No Index would work the same or if that is even possible. Thanks!
-
The files aren't duplicate. I am familiar with using the XRobots tag. I was really just curious if my theory would work.
Thanks for all your input.
-
Hi Monica,
I presume you already check all the options before posting this question. I have concluded this by seeing your others posts/reply in this community.
Now here is my answer
To prevent your PDF file (or any non HTML file) from being listed in search results, the only way is to use the HTTP X-Robots-Tag response header, e.g.:
X-Robots-Tag: noindex
robots.txt does not prevent your page from being listed in search results.
What it does is stop the bot from crawling your page, but if a third party links to your PDF file from their website, your page will still be listed.
If you stop the bot from crawling your page using robots.txt, it will not have the chance to see the X-Robots-Tag: noindex response tag. Therefore, never ever ever disallow a page in robots.txt if you employ the X-Robots-Tag header.
I hope it helps but not very sure.
Thanks
-
-
If you want to deindex all PDF files, I recommend using the x-robots-tag in .htaccess - https://yoast.com/x-robots-tag-play/
-
If the PDFs are pdf versions of existing pages, I would set canonicals to point to the URL you do want indexed (#2 on http://moz.com/blog/htaccess-file-snippets-for-seos )
-
-
If the pdf's are in a separate folder on your site - you could mark that folder as noindex in robots.txt
As far as I know, it's not possible to add a noindex to a link.
rgds
Dirk
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Regarding Mobile first Indexing
My Site name GiftaLove.com Desktop version - https://www.giftalove.com/
Technical SEO | | Packersmove
Mobile version - https://m.giftalove.com/ How to enable mobile first Indexing in Desktop and Mobile version sites. Not found any message from both sites desktop and mobile version. Please resolve my Issue.0 -
URL Change, Old URLs Still In Index
Recently changed URLs on a website to remove dynamic parameters. We 301'd the old dynamic links (canonical version) to the cleaner parameter-free URLs. We then updated the canonical tags to reflect these changes. All pages dropped at least a few ranking positions and now Moz shows both the new page ranking slightly lower in results pages and the old page still in the index. I feel like I'm splitting value between the two page versions until the old one disappears... is there a way to consolidate this quickly?
Technical SEO | | ShawnW0 -
How can I tell Google not to index a portion of a webpage?
I'm working with an ecommerce site that has many product descriptions for various brands that are important to have but are all straight duplicates. I'm looking for some type of tag tht can be implemented to prevent Google from seeing these as duplicates while still allowing the page to rank in the index. I thought I had found it with Googleoff, googleon tag but it appears that this is only used with the google appliance hardware.
Technical SEO | | bradwayland0 -
Carwling and indexing problems
hi, i have noticed since my site was upgraded that google is taking a long time to publish my articles. before the upgrade google would publish the article straight away, but now it takes an average of around 4 days. the article i am talking about at the moment is here http://www.in2town.co.uk/celebrities-in-the-news/stuart-hall-has-his-prison-sentence-for-sex-crimes-doubled-to-30-months now i have a blog here on blogger and the article was picked up within six mins http://showbizgossipandnews.blogspot.co.uk/2013/07/stuart-hall-has-his-prison-sentence-for.html so i am just wondering what the problem is and what i need to solve this my problem is, my site is mostly a news site so it is no good to me if google is publishing new stories every four days, any help would be great.
Technical SEO | | ClaireH-1848860 -
Find all 404 links in my site that are indexed
Hi All, Find all 404 links in my site that are indexed. We deleted a lot of URl's from site but now i dont have the track of all we deleted. Any site/Tool can scan the index and give me the exact URL's so I can use https://www.google.com/webmasters/tools/removals?hl=en&rlf=all Regards Martin
Technical SEO | | mtthompsons0 -
Should I index my search result pages?
I have a job site and I am planning to introduce a search feature. The question I have is, is it a good idea to index search results even if the query parameters are not there? Example: A user searches for "marketing jobs in New York that pay more than 50000$". A random page will be generated like example.com/job-result/marketing-jobs-in-new-york-that-pay-more-than-50000/ For any search that gets executed, the same procedure would be followed. This would result in a large number of search result pages automatically set up for long tail keywords. Do you think this is a good idea? Or is it a bad idea based on all the recent Google algorithm updates?
Technical SEO | | jombay0 -
Number of Indexed Pages in Webmaster Tools
My # of indexed pages in Webmaster Tools fluctuates greatly. Compared to the # of URLs submitted (4700), we have 3000 indexed. The other day, all 4700 were indexed. Why does it keep changing? I obviously want all of them indexed right? What can I do to make that happen?
Technical SEO | | kylesuss0 -
Google refuses to index our domain. Any suggestions?
A very similar question was asked previously. (http://www.seomoz.org/q/why-google-did-not-index-our-domain) We've done everything in that post (and comments) and then some. The domain is http://www.miwaterstewardship.org/ and, so far, we have: put "User-agent: * Allow: /" in the robots.txt (We recently removed the "allow" line and included a Sitemap: directive instead.) built a few hundred links from various pages including multiple links from .gov domains properly set up everything in Webmaster Tools submitted site maps (multiple times) checked the "fetch as googlebot" display in Webmaster Tools (everything looks fine) submitted a "request re-consideration" note to Google asking why we're not being indexed Webmaster Tools tells us that it's crawling the site normally and is indexing everything correctly. Yahoo! and Bing have both indexed the site with no problems and are returning results. Additionally, many of the pages on the site have PR0 which is unusual for a non-indexed site. Typically we've seen those sites have no PR at all. If anyone has any ideas about what we could do I'm all ears. We've been working on this for about a month and cannot figure this thing out. Thanks in advance for your advice.
Technical SEO | | NetvantageMarketing0