Can I use a "no index, follow" command in a robot.txt file for a certain parameter on a domain?
-
I have a site that produces thousands of pages via file uploads. These pages are then linked to by users for others to download what they have uploaded.
Naturally, the client has blocked the parameter which precedes these pages in an attempt to keep them from being indexed. What they did not consider, was they these pages are attracting hundreds of thousands of links that are not passing any authority to the main domain because they're being blocked in robots.txt
Can I allow google to follow, but NOT index these pages via a robots.txt file --- or would this have to be done on a page by page basis?
-
Since you have those pages blocked via robots.txt, the bots would never even crawl these pages in theory...which means the Noindex,follow is not helping.
Also, if you do a report on the domain on opensiteexplorer and dig, you should be able to find tons of those links already showing up. So if my site is linking to a page on that site, that page may not be cached/indexed because of the robots.txt exclusion, but that as long as my site is follow, your domain is still getting the credit for the link.
Does that make sense ?
-
Answered my own question.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google robots.txt test - not picking up syntax errors?
I just ran a robots.txt file through "Google robots.txt Tester" as there was some unusual syntax in the file that didn't make any sense to me... e.g. /url/?*
Intermediate & Advanced SEO | | McTaggart
/url/?
/url/* and so on. I would use ? and not ? for example and what is ? for! - etc. Yet "Google robots.txt Tester" did not highlight the issues... I then fed the sitemap through http://www.searchenginepromotionhelp.com/m/robots-text-tester/robots-checker.php and that tool actually picked up my concerns. Can anybody explain why Google didn't - or perhaps it isn't supposed to pick up such errors? Thanks, Luke0 -
Differences between "casas rusticas" and "casas rústicas"
Hi All, I've a client with this website: http://www.e-rustica.com/casas-rusticas It's a spanish realtor for special houses (rustic). We wanto it to be good posited as "casas rústicas" that it's the correct keyword and asl "casas rusticas" that it's like lot of people write it. Do you know if google see this two keywords as the same? Even we've done SEO for "casas rústicas" it's much better posited for "casas rusticas". Regards,
Intermediate & Advanced SEO | | lbenzo_aficiona0 -
"No Index" Extensions
Hi there, We run an e-commerce website and we are aware of our duplicate page content/title problems. We know about the "rel canonical" tag and the "no index" tag but I am more interested in the latter. We use a CMS called Magento. Now, Magento has an extension that allows you to use the "no follow" and "no index" tag on products. Google has indexed many of our pages and I wanted to know if applying the "no index" tag on duplicate pages will instruct Google to remove the duplicate url's it has already indexed. I know the tag will tell Google not to index a page but what if I apply it to a product already indexed?
Intermediate & Advanced SEO | | iBags0 -
After Receiving a "Googlebot can't access your site" would this stop your site from being crawled?
Hi Everyone,
Intermediate & Advanced SEO | | AMA-DataSet
A few weeks ago now I received a "Googlebot can't access your site..... connection failure rate is 7.8%" message from the webmaster tools, I have since fixed the majority of these issues but iv noticed that all page except the main home page now have a page rank of N/A while the home page has a page rank of 5 still. Has this connectivity issues reduced the page ranks to N/A? or is it something else I'm missing? Thanks in advance.0 -
Our Robots.txt and Reconsideration Request Journey and Success
We have asked a few questions related to this process on Moz and wanted to give a breakdown of our journey as it will likely be helpful to others! A couple of months ago, we updated our robots.txt file with several pages that we did not want to be indexed. At the time, we weren't checking WMT as regularly as we should have been and in a few weeks, we found that apparently one of the robots.txt files we were blocking was a dynamic file that led to the blocking of over 950,000 of our pages according to webmaster tools. Which page was causing this is still a mystery, but we quickly removed all of the entries. From research, most people say that things normalize in a few weeks, so we waited. A few weeks passed and things did not normalize. We searched, we asked and the number of "blocked" pages in WMT which had increased at a rate of a few hundred thousand a week were decreasing at a rate of a thousand a week. At this rate it would be a year or more before the pages were unblocked. This did not change. Two months later and we were still at 840,000 pages blocked. We posted on the Google Webmaster Forum and one of the mods there said that it would just take a long time to normalize. Very frustrating indeed considering how quickly the pages had been blocked. We found a few places on the interwebs that suggested that if you have an issue/mistake with robots.txt that you can submit a reconsideration request. This seemed to be our only hope. So, we put together a detailed reconsideration request asking for help with our blocked pages issue. A few days later, to our horror, we did not get a message offering help with our robots.txt problem. Instead, we received a message saying that we had received a penalty for inbound links that violate Google's terms of use. Major backfire. We used an SEO company years ago that posted a hundred or so blog posts for us. To our knowledge, the links didn't even exist anymore. They did.... So, we signed up for an account with removeem.com. We quickly found many of the links posted by the SEO firm as they were easily recognizable via the anchor text. We began the process of using removem to contact the owners of the blogs. To our surprise, we got a number of removals right away! Others we had to contact another time and many did not respond at all. Those we could not find an email for, we tried posting comments on the blog. Once we felt we had removed as many as possible, we added the rest to a disavow list and uploaded it using the disavow tool in WMT. Then we waited... A few days later, we already had a response. DENIED. In our request, we specifically asked that if the request were to be denied that Google provide some example links. When they denied our request, they sent us an email and including a sample link. It was an interesting example. We actually already had this blog in removem. The issue in this case was, our version was a domain name, i.e. www.domainname.com and the version google had was a wordpress sub domain, i.e. www.subdomain.wordpress.com. So, we went back to the drawing board. This time we signed up for majestic SEO and tied it in with removem. That added a few more links. We also had records from the old SEO company we were able to go through and locate a number of new links. We repeated the previous process, contacting site owners and keeping track of our progress. We also went through the "sample links" in WMT as best as we could (we have a lot of them) to try to pinpoint any other potentials. We removed what we could and again, disavowed the rest. A few days later, we had a message in WMT. DENIED AGAIN! This time it was very discouraging as it just didn't seem there were any more links to remove. The difference this time, was that there was NOT an email from Google. Only a message in WMT. So, while we didn't know if we would receive a response, we responded to the original email asking for more example links, so we could better understand what the issue was. Several days passed we received an email back saying that THE PENALTY HAD BEEN LIFTED! This was of course very good news and it appeared that our email to Google was reviewed and received well. So, the final hurdle was the reason that we originally contacted Google. Our robots.txt issue. We did not receive any information from Google related to the robots.txt issue we originally filed the reconsideration request for. We didn't know if it had just been ignored, or if there was something that might be done about it. So, as a last ditch final effort, we responded to the email once again and requested help as we did the other times with the robots.txt issue. The weekend passed and on Monday we checked WMT again. The number of blocked pages had dropped over the weekend from 840,000 to 440,000! Success! We are still waiting and hoping that number will continue downward back to zero. So, some thoughts: 1. Was our site manually penalized from the beginning, yet without a message in WMT? Or, when we filed the reconsideration request, did the reviewer take a closer look at our site, see the old paid links and add the penalty at that time? If the latter is the case then... 2. Did our reconsideration request backfire? Or, was it ultimately for the best? 3. When asking for reconsideration, make your requests known? If you want example links, ask for them. It never hurts to ask! If you want to be connected with Google via email, ask to be! 4. If you receive an email from Google, don't be afraid to respond to it. I wouldn't over do this or spam them. Keep it to the bare minimum and don't pester them, but if you have something pertinent to say that you have not already said, then don't be afraid to ask. Hopefully our journey might help others who have similar issues and feel free to ask any further questions. Thanks for reading! TheCraig
Intermediate & Advanced SEO | | TheCraig5 -
Avoiding 301 on purpose; Landing homepage linking to another domain with "Click here to go" and 5 sec meta refresh
Hello, Some users when they search for our site by using "ourbrand" keyword that ignore the first result (we will call it here ourbrand.de -not real name-) and they look for ourbrand.com . Even though we have that domain name also registered (indeed it also has a high ranking power) we are doing a 301 from the dot com to the dot.de . What we want to do is to index the homepage of the dot com, that is http://www.ourband.com as a secondary result while doing a 301 to any other internal URL of the dot com to the dot .de. Yes, we will loose link juice for the main domain but at least we will not loose visits from the brand traffic (which is our main traffic). So the question is, would Google index ourbrand.com if we show just a landing page that just show our logo, a "Click here to go to ourbrand.de" with a link to http://www.ourbrand.de and a meta refresh of 6 seconds to that URL? Additionally a cookie would be sent to the first time visitors, so in the next time they would be automatically redirected. PS: The 6 seconds is to avoid search engine consider it a "301" like it do with short meta refresh (not sure what time is the minimum to avoid be considered a 301). Any other suggestions on how to deal with this problem are welcomed
Intermediate & Advanced SEO | | Zillo0 -
Rel="prev" and rel="next" implementation
Hi there since I've started using semoz I have a problem with duplicate content so I have implemented on all the pages with pagination rel="prev" and rel="next" in order to reduce the number of errors but i do something wrong and now I can't figure out what it is. the main page url is : alegesanatos.ro/ingrediente/ and for the other pages : alegesanatos.ro/ingrediente/p2/ - for page 2 alegesanatos.ro/ingrediente/p3/ - for page 3 and so on. We've implemented rel="prev" and rel="next" according to google webmaster guidelines without adding canonical tag or base link in the header section and we still get duplicate meta title error messages for this pages. Do you think there is a problem because we create another url for each page instead of adding parameters (?page=2 or ?page=3 ) to the main url alegesanatos.ro/ingrediente?page=2 thanks
Intermediate & Advanced SEO | | dan_panait0 -
Not using a robot command meta tag
Hi SEOmoz peeps. Was doing some research on robot commands and found a couple major sites that are not using them. If you check out the code for these: http://www.amazon.com http://www.zappos.com http://www.zappos.com/product/7787787/color/92100 http://www.altrec.com/ You fill not find a meta robot command line. Of course you need the line for any noindex, nofollow, noarchive pages. However for pages you want crawled and indexed, is there any benefit for not having the line at all? Thanks!
Intermediate & Advanced SEO | | STPseo0