Can I rely on just robots.txt?
-
We have a test version of a client's website on a separate server before it goes onto the live server.
Some code from the test site has somehow managed to get Google to index the test site, which isn't great!
Would simply adding a robots.txt file to the root of the test site that blocks everything be good enough, or will I also have to put the noindex and nofollow meta tags on all pages of the test site?
-
You can do the inbound link check right here using SEOMoz's Open Site Explorer tool to check for links to the dev site, whether it's in a subdomain, subfolder or a separate site.
Good luck!
Paul
-
That's a great help, cheers.
Where's the best place to do an inbound link check?
-
You're actually up against a bit of a sticky wicket here, SS. You do need the noindex, nofollow meta tags on each page, as Irving mentions.
HOWEVER! If you also add a robots.txt directive blocking crawlers from the site, the search crawlers will not crawl your pages and therefore will never see the noindex meta tag telling them to remove the incorrectly indexed pages from their index.
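To make the point above concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `can_crawl` helper, and the `dev.example.com` URL are all hypothetical, chosen only for illustration. It shows that a blanket `Disallow: /` tells a compliant crawler not to fetch the page at all, so any noindex tag inside that page goes unread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the dev site that blocks every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical dev URL; a compliant crawler would skip it entirely,
# so it would never read the noindex meta tag in the page's HTML.
allowed = can_crawl(ROBOTS_TXT, "Googlebot", "http://dev.example.com/page.html")
print("Crawler may fetch page:", allowed)
```

This only models how a well-behaved crawler interprets robots.txt; it says nothing about what a search engine already holds in its index.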
My recommendation is for a belt & suspenders approach.
- Implement the meta noindex, nofollow tags throughout the dev site, but do NOT immediately implement the robots.txt exclusion. Wait a day or two until the pages get recrawled and the bots discover the noindex meta tags.
- Use the Remove URL tools in both Google and Bing Webmaster Tools to request removal of all the dev pages you are aware have been indexed.
- Then add the exclusion directive to the robots.txt file to keep the crawlers out from then on (leaving the noindex, nofollow tags in place).
- Check back in the SERPs periodically to confirm that no other dev pages have been indexed. If they have, do another manual removal request.
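Before moving from the first step to the later ones, it may help to confirm that each dev page really carries the tag. The following is a rough sketch using only Python's standard `html.parser`; the `RobotsMetaParser` class, the `has_noindex` helper, and the sample HTML are hypothetical names and data made up for this example, not part of any Moz or Google tool.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # Content is a comma-separated list such as "noindex, nofollow".
            for directive in attr.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical dev page markup used only for demonstration.
SAMPLE = (
    '<html><head><meta name="robots" content="noindex, nofollow"></head>'
    "<body>Dev page</body></html>"
)
print("noindex present:", has_noindex(SAMPLE))
```

In practice you would feed it the HTML of each dev page you fetch yourself; pages where it reports no noindex tag are the ones to fix before adding the robots.txt exclusion.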
Does that make sense?
Paul
P.S. As a last measure, run an inbound links check on the dev pages that got indexed to find out which external pages are linking to them. Get those inbound links removed ASAP so the search engines aren't getting any signals to index the dev site. Another option would be to simply password-protect the directory the dev site is in. A little less convenient, but guaranteed to keep the crawlers out.
-
Cheers, I thought as much.
-
You cannot rely on robots.txt alone; you need to add the meta noindex tag to the pages as well to ensure that they do not get indexed.