Can I rely on just robots.txt?
-
We have a test version of a client's website on a separate server before it goes onto the live server.
Some code from the test site has somehow managed to get Google to index the test site, which isn't great!
Would simply adding a robots.txt file to the root of the test site, blocking everything, be good enough, or will I also have to put the noindex, nofollow (etc.) meta tags on every page of the test site?
-
You can run the inbound link check right here with SEOMoz's Open Site Explorer tool. It will find links pointing to the dev site, whether it lives on a subdomain, in a subfolder or on a separate site.
Good luck!
Paul
-
That's a great help, cheers.
Where's the best place to do an inbound link check?
-
You're actually up against a bit of a sticky wicket here, SS. You do need the noindex, nofollow meta tags on each page, as Irving mentions.
HOWEVER! If you also add a robots.txt directive disallowing the site, the search crawlers will stop crawling your pages and therefore will never see the noindex meta tag telling them to remove the incorrectly indexed pages from their index.
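The catch described above can be sketched with Python's standard-library `urllib.robotparser`, which models how a robots.txt-compliant crawler decides whether it may download a URL. This is a minimal illustration, not how Googlebot itself is implemented; the `dev.example.com` host, the page path and the `crawler_may_fetch` helper are hypothetical placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the dev server: block every crawler from every path.
DEV_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def crawler_may_fetch(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if a robots.txt-compliant crawler may download `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# dev.example.com stands in for the real test server.
page = "http://dev.example.com/about.html"

if crawler_may_fetch(DEV_ROBOTS_TXT, page):
    print("allowed: the crawler can download the page and read its meta tags")
else:
    # The crawler stops before requesting the HTML, so a
    # <meta name="robots" content="noindex"> tag on the page is never read.
    print("blocked: the noindex tag on", page, "will never be seen")
```

Leaving the `Disallow:` value empty flips the result to allowed, which is why it makes sense to let the pages be recrawled with the noindex tag in place before adding the exclusion.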
My recommendation is a belt-and-suspenders approach:
- Implement the meta noindex, nofollow tags throughout the dev site, but do NOT add the robots.txt exclusion yet. Wait a day or two until the pages get recrawled and the bots discover the noindex meta tags.
- Use the Remove URL tools in both Google and Bing Webmaster Tools to request removal of all the dev pages you know have been indexed.
- Then add the exclusion directive to the robots.txt file to keep the crawlers out from then on (leaving the noindex, nofollow tags in place).
- Check back in the SERPs periodically to make sure no other dev pages have been indexed. If they have, submit another manual removal request.
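Before the robots.txt exclusion step, it may help to confirm that every dev page really carries the tag. The sketch below uses Python's standard `html.parser` to look for a `<meta name="robots">` tag containing both `noindex` and `nofollow`. The class and function names are hypothetical, and fetching each page's HTML (for example with `urllib.request`) is left out.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from every <meta name="robots" content="..."> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").strip().lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex_nofollow(html: str) -> bool:
    """Return True if the page's robots meta tags include noindex and nofollow."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return {"noindex", "nofollow"} <= finder.directives


# Example: a dev page that is ready for the robots.txt exclusion step.
sample_page = """<html><head>
<meta name="robots" content="noindex, nofollow">
<title>Dev copy</title>
</head><body>...</body></html>"""
print(has_noindex_nofollow(sample_page))
```

Self-closing tags such as `<meta ... />` are also handled, because `HTMLParser` routes them through `handle_starttag`.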
Does that make sense?
Paul
P.S. One more measure: run an inbound link check on the dev pages that got indexed to find out which external pages are linking to them. Get those inbound links removed ASAP so the search engines aren't getting any signals to index the dev site. The last option would be to simply password-protect the directory the dev site is in. It's a little less convenient, but guaranteed to keep the crawlers out.
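If the dev site happens to be served by a Python WSGI application (an assumption; on Apache or nginx this is normally configured in the web server instead), the password protection mentioned above could look roughly like this HTTP Basic authentication middleware. `require_basic_auth`, `dev_site` and the credentials are hypothetical, and Basic auth should only be used over HTTPS because the credentials are merely base64-encoded.

```python
import base64
import hmac


def require_basic_auth(app, username: str, password: str, realm: str = "Dev site"):
    """Wrap a WSGI app so every request must carry the given Basic credentials."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    expected = f"Basic {token}".encode("latin-1")

    def protected(environ, start_response):
        supplied = environ.get("HTTP_AUTHORIZATION", "").encode("latin-1", "replace")
        # Constant-time comparison avoids leaking how much of the header matched.
        if hmac.compare_digest(supplied, expected):
            return app(environ, start_response)
        # Crawlers (and anyone else without credentials) get a 401 and no page content.
        start_response("401 Unauthorized", [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("WWW-Authenticate", f'Basic realm="{realm}"'),
        ])
        return [b"Authentication required"]

    return protected


def dev_site(environ, start_response):
    """Stand-in for the real test site."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Test version of the client site"]


application = require_basic_auth(dev_site, "client", "change-me")
```

For a quick local check, `wsgiref.simple_server.make_server("localhost", 8000, application).serve_forever()` serves it with the standard library.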
-
Cheers, I thought as much.
-
You cannot rely on robots.txt alone; you need to add the meta noindex tag to the pages as well to ensure that they will not get indexed.
Related Questions
-
Should I add my HTML sitemap to robots.txt?
I have already added the .xml sitemap to robots.txt. But should I also add the HTML version?
Technical SEO | Trazo
-
Robots.txt vs. meta noindex, follow
Hi guys, I wonder what your opinion is concerning exclusion via the robots.txt file. Would you advise continuing to use it? For example:
User-agent: *
Disallow: /sale/*
Disallow: /cart/*
Disallow: /search/
Disallow: /account/
Disallow: /wishlist/*
Or do you prefer using the meta tag 'noindex, follow' instead?
I keep hearing different suggestions.
I'm just curious what your opinion / suggestion is. Regards,
Tom Vledder
Technical SEO | AdenaSEO
-
How can my homepage have 2 meta descriptions?
Hi all, when googling our company, I see our main page pop up with two different meta descriptions, depending on the search query. The situation: the search query 'nha' (on google.nl) returns the main page with a meta description that looks like a random snippet Google grabbed from the code itself, starting with 'Ik volg een cursus bij de NHA...' ("I'm taking a course at NHA..."). The search query 'nha.nl' (on google.nl) returns the main page with the proper meta description, starting with 'Aanbieder van thuisstudies met onder meer MBO-opleidingen...' ("Provider of home-study courses including MBO programmes..."). So yeah, I'd like the main page to appear only with the proper meta description, the latter one. We did have a redirect issue (duplicate homepages) a few weeks ago, and our programmers fixed it. Could this have something to do with a redirect? I'd love to hear your thoughts. Thanks!
Technical SEO | NHA_DistanceLearning
-
Blocked jQuery in robots.txt: any SEO impact?
I've heard that Google now indexes links and other content available in JavaScript and jQuery. My Webmaster Tools shows that some jQuery links are blocked by robots.txt. Sorry, I'm not a developer or designer. Is there any impact of this on my SEO? And how can I unblock it for the robots? Check this screenshot: http://i.imgur.com/3VDWikC.png
Technical SEO | hammadrafique
-
Can I have both an HTTP and an HTTPS site in Google Webmaster Tools?
My website is HTTPS, but the default property configured in Google WMT was HTTP, and it wasn't showing me any information because of that. I added an HTTPS property, but my question is: do I need to delete the original HTTP property, or can I leave both?
Technical SEO | Onboard.com
-
Can I mark up breadcrumbs without showing them? (responsive design)
I am working on a site that has a responsive design. We use faceted search for the desktop version but implemented a style of breadcrumbs for the mobile version, as sidebars take up too much screen real estate. On the desktop design we are putting a display:none on the breadcrumbs. If we mark up those breadcrumbs and they are hidden with display:none, can we still get the rich snippets? Will Google see this as cloaking? As a follow-up, is there a way to mark up breadcrumbs in the or somewhere else that is constant?
Technical SEO | MarloSchneider
-
Are robots.txt wildcards still valid? If so, what is the proper syntax for setting this up?
I've got several URLs that I need to disallow in my robots.txt file. For example, I've got several documents that I don't want indexed and filters that are getting flagged as duplicate content. Rather than typing in thousands of URLs, I was hoping that wildcards were still valid.
Technical SEO | mkhGT
-
Robots.txt file
How do I get Google to stop indexing my old pages and start indexing my new pages, even months down the line? Do I need to install a robots.txt file on each page?
Technical SEO | gimes