Robots.txt query
-
Quick question, if this appears in a clients robots.txt file, what does it mean?
Disallow: /*/_/
Does it mean no pages can be indexed? I have checked and there are no pages in the index but it's a new site too so not sure if this is the problem.
Thanks
Karen
-
Thank you so much, that is a great help!
-
That blocks all spiders from viewing those pages. I am not sure what and who did the /* /_/, but unless there is something there they don't want indexed then it is not necessary to keep it.
One thing you mind want to keep in mind as well, just because you block it on robots txt, doesn't mean a spider can't still go there.
Sometimes they don't listen to the robots txt(looking at you baidu)
-
User-agent: *
Thanks for your response.
-
What is the user agent?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Should I add my html sitemap to Robots?
I have already added the .xml to Robots. But should I also add the html version?
Technical SEO | | Trazo0 -
Robots.txt Syntax for Dynamic URLs
I want to Disallow certain dynamic pages in robots.txt and am unsure of the proper syntax. The pages I want to disallow all include the string ?Page= Which is the proper syntax?
Technical SEO | | btreloar
Disallow: ?Page=
Disallow: ?Page=*
Disallow: ?Page=
Or something else?0 -
Duplicate content: using the robots meta tag in conjunction with the canonical tag?
We have a WordPress instance on an Apache subdomain (let's say it's blog.website.com) alongside our main website, which is built in Angular. The tech team is using Akamai to do URL rewrites so that the blog posts appear under the main domain (website.com/more-keywords/here). However, due to the way they configured the WordPress install, they can't do a wildcard redirect under htaccess to force all the subdomain URLs to appear as subdirectories, so as you might have guessed, we're dealing with duplicate content issues. They could in theory do manual 301s for each blog post, but that's laborious and a real hassle given our IT structure (we're a financial services firm, so lots of bureaucracy and regulation). In addition, due to internal limitations (they seem mostly political in nature), a robots.txt file is out of the question. I'm thinking the next best alternative is the combined use of the robots meta tag (no index, follow) alongside the canonical tag to try to point the bot to the subdirectory URLs. I don't think this would be unethical use of either feature, but I'm trying to figure out if the two would conflict in some way? Or maybe there's a better approach with which we're unfamiliar or that we haven't considered?
Technical SEO | | prasadpathapati0 -
Do robot.txts permanently affect websites even after they have been removed?
A client has a Wordpress blog to sit alongside their company website. They kept it hidden whilst they were developing what it looked like, keeping it un-searchable by Search Engines. It was still live, but Wordpress put a robots.txt in place. When they were ready they removed the robots.txt by clicking the "allow Search Engines to crawl this site" button. It took a month and a half for their blog to show in Search Engines once the robot.txt was removed. Google is now recognising the site (as a "site:" test has shown) however, it doesn't rank well for anything. This is despite the fact they are targeting keywords with very little organic competition. My question is - could the fact that they developed the site behind a robot.txt (rather than offline) mean the site is permanently affected by the robot.txt in the eyes of the Search Engines, even after that robot.txt has been removed? Thanks in advance for any light you can shed on the situation.
Technical SEO | | Driver720 -
2 sitemaps on my robots.txt?
Hi, I thought that I just could link one sitemap from my site's robots.txt but... I may be wrong. So, I need to confirm if this kind of implementation is right or wrong: robots.txt for Magento Community and Enterprise ...
Technical SEO | | Webicultors
Sitemap: http://www.mysite.es/media/sitemap/es.xml
Sitemap: http://www.mysite.pt/media/sitemap/pt.xml Thanks in advance,0 -
Will a robots.txt disallow apply to a 301ed URL?
Hi there, I have a robots.txt query which I haven't tried before and as we're nearing a big time for sales, I'm hesitant to just roll out to live! Say for example, in my robots.txt I disallow the URL 'example1.html'. In reality, 'example1.html' 301s/302s to 'example2.html'. Would the robots.txt directive also apply to 'example2.html' (disallow) or as it's a separate URL, would the directive be ignored as it's not valid? I have a feeling that as it's a separate URL, the robots disallow directive won't apply. However, just thought I'd sense-check with the community.
Technical SEO | | ecommercebc0 -
Are robots.txt wildcards still valid? If so, what is the proper syntax for setting this up?
I've got several URL's that I need to disallow in my robots.txt file. For example, I've got several documents that I don't want indexed and filters that are getting flagged as duplicate content. Rather than typing in thousands of URL's I was hoping that wildcards were still valid.
Technical SEO | | mkhGT0 -
Rel canonical with index follow on query string URLs
Hi guys, Quick question regarding the rel canonical tag. I have lots of links pointing at me with query strings and previously used some code to determine if query strings were in the URL and if they were then not to index that page. If there weren't query strings then the page would be indexed and followed. I assume I can now use the rel canonical tag on each of these pages so the value goes to the proper URL minus any query string. However do I need to have the rel canonical tag above the index, follow tag on the page? So URL is site.com/page.html?ref=ABC meta robots is "index, follow" Rel canonical is "site.com/page.html" Does the order of the meta robots and canonical tag matter? Thanks in advance!
Technical SEO | | panini0