Moz Q&A is closed.
After more than 13 years and tens of thousands of questions, Moz Q&A closed on 12th December 2024. We're not removing the content completely, and many posts will remain viewable, but both new posts and new replies are locked.
No Index PDFs
-
Our products have about 4 PDFs apiece, which really inflates our indexed pages. I was wondering if I could add rel="noindex" to the links to the PDFs? All of the files are on a file server, and our product pages link to them. I know I could add a nofollow attribute to the links, but I was wondering if anyone knew whether a noindex would work the same way, or if that is even possible. Thanks!
-
The files aren't duplicates. I am familiar with using the X-Robots-Tag header. I was really just curious whether my theory would work.
Thanks for all your input.
-
Hi Monica,
I presume you already checked all the options before posting this question, judging by your other posts and replies in this community.
Here is my answer:
To keep a PDF file (or any other non-HTML file) out of search results, use the HTTP X-Robots-Tag response header, since such files cannot carry a robots meta tag, e.g.:
X-Robots-Tag: noindex
robots.txt does not prevent your page from being listed in search results.
What it does is stop the bot from crawling your page, but if a third party links to your PDF file from their website, the URL can still be listed.
If you stop the bot from crawling your page using robots.txt, it never gets the chance to see the X-Robots-Tag: noindex response header. Therefore, never disallow a URL in robots.txt if you rely on the X-Robots-Tag header to keep it out of the index.
I hope this helps, though I'm not completely sure.
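The interaction between robots.txt and the noindex header can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `noindex_is_visible`, the example URL and the sample robots.txt are all hypothetical, and the check ignores user-agent-specific header values such as `googlebot: noindex`.

```python
from urllib.robotparser import RobotFileParser


def noindex_is_visible(robots_txt, url, headers, user_agent="Googlebot"):
    """Return True only if the crawler may fetch `url` AND the response says noindex.

    Hypothetical helper for illustration; `headers` is a dict of response headers.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(user_agent, url):
        # Crawling is disallowed, so the X-Robots-Tag header is never seen.
        return False
    # Header names are case-insensitive, so normalise them before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    directives = [d.strip().lower() for d in normalised.get("x-robots-tag", "").split(",")]
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in directives or "none" in directives


if __name__ == "__main__":
    pdf_url = "https://www.example.com/pdfs/manual.pdf"  # assumed example URL
    blocking = "User-agent: *\nDisallow: /pdfs/"
    print(noindex_is_visible(blocking, pdf_url, {"X-Robots-Tag": "noindex"}))
    print(noindex_is_visible("", pdf_url, {"X-Robots-Tag": "noindex"}))
```

With the Disallow rule in place the function reports that the noindex is not visible to the crawler, which is the trap described above.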
Thanks
-
If you want to deindex all PDF files, I recommend setting the X-Robots-Tag header in .htaccess: https://yoast.com/x-robots-tag-play/
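If the PDFs happen to be served by a Python application rather than directly by Apache, the same header can be attached in code. The sketch below is a swapped-in alternative, not the .htaccess approach from the linked article; the wrapper name `noindex_pdfs` is hypothetical, and it assumes a standard WSGI app and that PDF URLs end in `.pdf`.

```python
def noindex_pdfs(app):
    """Wrap a WSGI app so every response for a .pdf path carries X-Robots-Tag: noindex.

    Hypothetical middleware for illustration only.
    """
    def wrapped(environ, start_response):
        is_pdf = environ.get("PATH_INFO", "").lower().endswith(".pdf")

        def patched_start_response(status, headers, exc_info=None):
            if is_pdf:
                # Drop any existing X-Robots-Tag so the header is not duplicated.
                headers = [(k, v) for k, v in headers if k.lower() != "x-robots-tag"]
                headers.append(("X-Robots-Tag", "noindex"))
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return wrapped
```

Responses for any other path pass through unchanged, so HTML pages keep whatever indexing rules they already have.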
-
If the PDFs are PDF versions of existing pages, I would set canonicals pointing to the URL you do want indexed (#2 on http://moz.com/blog/htaccess-file-snippets-for-seos).
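Because a PDF has no HTML head, a canonical for it is normally sent as an HTTP Link response header rather than a link element. As a rough sketch of the header's format (the helper name `canonical_link_header` and the example URL are assumptions, not taken from the linked post):

```python
from urllib.parse import urlsplit


def canonical_link_header(canonical_url):
    """Build the (name, value) pair for a rel="canonical" HTTP Link header.

    Hypothetical helper; raises ValueError unless the URL is absolute.
    """
    parts = urlsplit(canonical_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError("canonical URL must be absolute, e.g. https://www.example.com/page")
    return ("Link", f'<{canonical_url}>; rel="canonical"')


if __name__ == "__main__":
    # Header to send with the PDF, pointing at the HTML page you want indexed.
    print(canonical_link_header("https://www.example.com/products/widget"))
```

The server would send this header with the PDF response, pointing at the HTML page that should be indexed instead.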
-
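If the PDFs are in a separate folder on your site, you could block that folder in robots.txt with a Disallow rule. Note that robots.txt has no noindex setting: a Disallow only stops crawling, so PDFs that are already indexed or linked from elsewhere can still show up.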
As far as I know, it's not possible to add a noindex to a link.
rgds
Dirk