Does Bing ignore robots.txt files?
-
Hello from "It's a miracle it's not raining" Wetherby, UK
OK, here goes... Why, despite a robots.txt file excluding indexing of the site http://lewispr.netconstruct-preview.co.uk/, is the site URL being indexed in Bing but not in Google?
Does Bing ignore robots.txt files, or is there something missing from http://lewispr.netconstruct-preview.co.uk/robots.txt that I need to add to stop Bing indexing a preview site, as illustrated below?
http://i216.photobucket.com/albums/cc53/zymurgy_bucket/preview-bing-indexed.jpg
Any insights welcome
-
Thanks, Clever PHD, we are now adding your recommendations to our preview sites.
-
I know this does not sound related, but Matt Cutts explains this same situation for Google. The same reasoning probably applies to Bing.
http://www.mattcutts.com/blog/robots-txt-remove-url/
Looking at your screenshot, it looks as if Bing is showing just the URL: no title tag, no description, no other information.
What Matt says is that they did not technically crawl the URL, but they are aware that it exists. For example, another page may link to it with related content, or the anchor text on the link may relate to the keyword search you are performing.
You are searching for the URL specifically, so it makes sense that they would show the URL as it relates to that search. But they are not showing any information from the page, because they do not have it: they did not spider it. Again, they are just aware of the URL. Kind of like talking to a lawyer, eh?
If you search for any other keywords, does this excluded site show up? Probably not. If it does, they are probably only showing the URL, as in the example above.
The video has more details. Here are the solutions he gives; I will outline them as well.
-
Use the Bing URL removal tool. Bing, bang, boom. Done.
-
(My new favorite) Let the page/site be crawled, but serve a noindex, nofollow meta tag on the page/site. There is a subtle but important difference between the meta tag and the robots.txt file: the spiders have to be able to crawl the page to see what they are supposed to do with it.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=93710
"When we see the noindex meta tag on a page, Google will completely drop the page from our search results, even if other pages link to it."
The thing is, if you have a robots.txt file that says don't crawl the site, the spider never gets to the noindex meta tag that tells it to drop the page from the index. It sounds a little backwards, but when the page is already in the search index, you have to let the spider crawl it so that it sees the noindex tag and the search engine knows to remove the page from the index.
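The crawl-versus-index distinction above can be summed up as a tiny decision table. This is a deliberately simplified Python model of the behavior described in this thread, not how Bing or Google actually decide. It assumes the engine already knows the URL exists (for example, from a link on another page), and `Listing` and `expected_listing` are hypothetical names for illustration.

```python
from enum import Enum


class Listing(Enum):
    NOT_LISTED = "not listed"
    URL_ONLY = "bare URL, no title or description"
    FULL = "full listing"


def expected_listing(blocked_by_robots_txt: bool, has_noindex_tag: bool) -> Listing:
    """Simplified model of the thread's explanation (hypothetical, not an
    engine's real logic). Assumes the engine already knows the URL exists,
    e.g. from a link elsewhere."""
    if blocked_by_robots_txt:
        # The spider never fetches the page, so it never sees the noindex tag;
        # it can still show the bare URL it learned about from links.
        return Listing.URL_ONLY
    if has_noindex_tag:
        # The page was crawled and the tag was seen, so the page is dropped,
        # "even if other pages link to it."
        return Listing.NOT_LISTED
    return Listing.FULL
```

Note that `expected_listing(True, True)` still returns `Listing.URL_ONLY`: adding noindex to a page that robots.txt blocks has no effect in this model, because the tag is never read.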
Here is what you can do, as this seems to be an issue only with Bing and only with the home page. Open up robots.txt to allow Bing to crawl the site, but restrict crawling to the home page only and exclude all the other pages.
On the home page that you allow Bing to crawl, add the noindex, nofollow meta tag and you should be set.
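To make the home-page-only setup concrete, here is a minimal Python sketch. It assumes the preview site's home page is served at `/`, that Bing's crawler uses the `bingbot` user-agent token, and that it honours `Allow`, longest-match precedence and the `$` end-of-path anchor as described in RFC 9309. `ROBOTS_TXT`, `NOINDEX_META`, `parse_groups`, `pattern_to_regex` and `is_allowed` are hypothetical names; the matcher is a simplified checker for sanity-testing the file, not Bing's actual parser.

```python
import re

# Hypothetical robots.txt for the preview site: bingbot may fetch only the
# home page "/" (so it can see the noindex tag); everything else is blocked.
ROBOTS_TXT = """\
User-agent: bingbot
Allow: /$
Disallow: /

User-agent: *
Disallow: /
"""

# The tag the home page would carry, per the advice above.
NOINDEX_META = '<meta name="robots" content="noindex, nofollow">'


def parse_groups(text):
    """Parse robots.txt into {user_agent: [(directive, pattern), ...]}."""
    groups, agents, in_rules = {}, [], False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = [part.strip() for part in line.split(":", 1)]
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow") and agents:
            in_rules = True
            for agent in agents:
                groups[agent].append((field, value))
    return groups


def pattern_to_regex(pattern):
    """Translate '*' wildcards and a trailing '$' anchor into a regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(text, user_agent, path):
    """Longest matching rule wins; Allow wins a tie; no match means allowed."""
    groups = parse_groups(text)
    rules = groups.get(user_agent.lower(), groups.get("*", []))
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern:  # an empty Disallow blocks nothing
            continue
        if pattern_to_regex(pattern).match(path):
            longer = len(pattern) > best_len
            tie_allow = len(pattern) == best_len and directive == "allow"
            if longer or tie_allow:
                best_len, allowed = len(pattern), directive == "allow"
    return allowed
```

With this file, `is_allowed(ROBOTS_TXT, "bingbot", "/")` is `True`, `is_allowed(ROBOTS_TXT, "bingbot", "/about-us/")` is `False`, and every path is blocked for other crawlers. The `$` anchor is what separates the bare home page from every other URL; plain prefix matching cannot express "only `/`".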
All of that said, if you have a single URL listed in Bing with no metadata, it may not be worth all the above effort, as you are not ranking for any valuable keywords. But that is your call.
It is always interesting to see how the spiders and engines think so I wanted to pass this along.
Cheers!
PS: If you have a ton of pages like this, you would just allow Bing to crawl them all and add the noindex, nofollow tag to all of them.
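If you take the site-wide route, it helps to confirm that every page actually carries the tag. Below is a minimal sketch using Python's standard-library `html.parser`, assuming you already have each page's HTML as a string (fetching is left out). `RobotsMetaFinder` and `missing_noindex` are hypothetical names, and the check only looks at `<meta name="robots">` tags.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from any <meta name="robots"> tags on a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def missing_noindex(pages):
    """Return the URLs (keys of `pages`) whose HTML lacks noindex and nofollow."""
    missing = []
    for url, html in pages.items():
        finder = RobotsMetaFinder()
        finder.feed(html)
        if not {"noindex", "nofollow"} <= finder.directives:
            missing.append(url)
    return missing
```

Self-closing tags such as `<meta ... />` are also handled, because `HTMLParser` routes them through `handle_starttag` by default.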