Can I rely on just robots.txt?
-
We have a test version of a client's website on a separate server before it goes onto the live server.
Some code from the test site has somehow managed to get Google to index the test site, which isn't great!
Would simply adding a robots.txt file to the root of the test site, blocking everything, be good enough, or will I also have to put the noindex and nofollow meta tags on all pages of the test site?
-
You can do the inbound link check right here using SEOMoz's Open Site Explorer tool to check for links to the dev site, whether it's in a subdomain, subfolder or a separate site.
Good luck!
Paul
-
That's a great help, cheers.
Where's the best place to do an inbound link check?
-
You're actually up against a bit of a sticky wicket here, SS. You do need the no-index, no-follow meta tags on each page as Irving mentions.
HOWEVER! If you also add a robots.txt directive blocking the site, the search crawlers will not crawl your pages and therefore will never see the noindex meta tag that tells them to remove the incorrectly indexed pages from their index.
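To see why a blanket block hides the meta tags, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `dev.example.com` URL are made-up placeholders, and the helper name `crawler_may_fetch` is just for illustration. It only shows how a well-behaved crawler would interpret the file, not how any particular search engine is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the dev site that blocks every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL standing in for an indexed dev page.
    page = "https://dev.example.com/some-page.html"
    if not crawler_may_fetch(ROBOTS_TXT, "Googlebot", page):
        print("Blocked: the crawler never downloads the page,")
        print("so it never reads a noindex meta tag inside it.")
```

If the function returns False for a page, a compliant crawler will not request that page at all, which is exactly why the noindex tag on it goes unseen.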
My recommendation is for a belt & suspenders approach.
- Implement the meta noindex, nofollow tags throughout the dev site, but do NOT immediately implement the robots.txt exclusion. Wait a day or two until the pages get recrawled and the bots discover the noindex meta tags.
- Use the Remove URL tools in both Google and Bing Webmaster Tools to request removal of all the dev pages you know have been indexed.
- Then add the exclusion directive to the robots.txt file to keep the crawlers out from then on (leaving the noindex, nofollow tags in place).
- Check back in the SERPs periodically to confirm that no other dev pages have been indexed. If they have, do another manual removal request.
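Before moving on from the first step, it may be worth confirming that the dev pages really carry the tag. The following is a rough sketch, not a full audit tool: it parses a page's HTML with Python's standard `html.parser` and reports the robots meta directives it finds. The sample page, the class name `RobotsMetaParser` and the function `robots_directives` are made up for illustration; in practice you would feed it the HTML of each dev page.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value can be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for part in attr_map.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in the HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Made-up sample page carrying the tag described in the steps above.
SAMPLE_PAGE = """<html><head>
<meta name="robots" content="noindex, nofollow">
<title>Dev page</title></head><body>Test</body></html>"""

if __name__ == "__main__":
    found = robots_directives(SAMPLE_PAGE)
    print("noindex present:", "noindex" in found)
```

A page that reports no `noindex` directive still needs the tag added before the robots.txt exclusion goes in.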
Does that make sense?
Paul
P.S. As a last measure, run an inbound links check on the dev pages that got indexed to find out which external pages are linking to the dev pages. Get those inbound links removed ASAP so the search engines aren't getting any signals to index the dev site. Last option would be to simply password-protect the directory the dev site is in. A little less convenient, but guaranteed to keep the crawlers out.
-
Cheers, I thought as much.
-
You cannot rely on robots.txt alone. You need to add the meta noindex tag to the pages as well to ensure that they will not get indexed.