Google Indexing Development Site Despite Robots.txt Block
-
Hi,
A development site that has been set-up has the following Robots.txt file:
User-agent: *
Disallow: /
In an attempt to block Google indexing the site, however this isn't the case and the development site has since been indexed.
Any clues why this is or what I could do to resolve it?
Thanks!
-
Hi so I'm assuming your on IIS (I'm no expert on ISS I think you will need to configure the web.config) and I'm just going to step back now and get my coat as I only have experience with Apache
-
Thanks for your help! Much appreciated
-
It's generally best to noindex/nofollow using the meta robots tag in the header. If it's not too much of a stretch for you, you can also password protect the test site. The over-so-lovely and charming Googles will still display results blocked by robots.txt - though it won't generally cache the content. If you would like, you can hookup the test site with Webmaster Tools and remove the URL(s) from the index.
-
Its my understanding that htaccess is PHP based and as we code in .net we don't have a htaccess file.
Do you know of this this happening before because its not something that I've heard of.
-
You would need to block access via htaccess rather than robots file as the robots.txt is only advisory
If you are using wordpress I use this simple plugin JF3 Maintenance Redirect
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How ask Google to de index scrapper sites?
While doing text Google searches for various keywords I have found two sites that have scrapped pages from my site which goes by an old URL of www.tpxcnex.com and a new URL of www.tpxonline.com www.folder.com is one of the sites and if you try to visit that site or any of the scrapped Google index listing, Chrome warns you not to. How can I ask Chrome to deindex www.folder.com or another scrapper site, or atleast deindex the URLs which have clearly scrapped my content?
Technical SEO | | DougHartline0 -
How can I get a photo album indexed by Google?
We have a lot of photos on our website. Unfortunately most of them don't seem to be indexed by Google. We run a party website. One of the things we do, is take pictures at events and put them on the site. An event page with a photo album, can have anywhere between 100 and 750 photo's. For each foto's there is a thumbnail on the page. The thumbnails are lazy loaded by showing a placeholder and loading the picture right before it comes onscreen. There is no pagination of infinite scrolling. Thumbnails don't have an alt text. Each thumbnail links to a picture page. This page only shows the base HTML structure (menu, etc), the image and a close button. The image has a src attribute with full size image, a srcset with several sizes for responsive design and an alt text. There is no real textual content on an image page. (Note that when a user clicks on the thumbnail, the large image is loaded using JavaScript and we mimic the page change. I think it doesn't matter, but am unsure.) I'd like that full size images should be indexed by Google and found with Google image search. Thumbnails should not be indexed (or ignored). Unfortunately most pictures aren't found or their thumbnail is shown. Moz is giving telling me that all the picture pages are duplicate content (19,521 issues), as they are all the same with the exception of the image. The page title isn't the same but similar for all images of an album. Example: On the "A day at the park" event page, we have 136 pictures. A site search on "a day at the park" foto, only reveals two photo's of the albums. 3QolbbI.png QTQVxqY.jpg mwEG90S.jpg
Technical SEO | | jasny0 -
Videos not indexed yet in google analytics
Hi I submitted a video stemap to google webmasters a month ago and it shows submitted videos in webmasters account but it is not showing that any video been indexed by google yet. I am not sure whats the issue because its taking long time or may be some error is there. Amit
Technical SEO | | expertvillagemedia0 -
Question about Robot.txt
I just started my own e-commerce website and I hosted it to one of the popular e-commerce platform Pinnacle Cart. It has a lot of functions like, page sorting, mobile website, etc. After adjusting the URL parameters in Google webmaster last 3 weeks ago, I still get the same duplicate errors on meta titles and descriptions based from Google Crawl and SEOMOZ crawl. I am not sure if I made a mistake of choosing pinnacle cart because it is not that flexible in terms of editing the core website pages. There is now way to adjust the canonical, to insert robot.txt on every pages etc. however it has a function to submit just one page of robot.txt. and edit the .htcaccess. The website pages is in PHP format. For example this URL: www.mycompany.com has a duplicate title and description with www.mycompany.com/site-map.html (there is no way of editing the title and description of my sitemap) Another error is www.mycompany.com has a duplicate title and description with http://www.mycompany.com/brands?url=brands Is it possible to exclude those website with "url=" and my "sitemap.html" in the robot.txt? or the URL parameters from Google is enough and it just takes a lot of time. Can somebody help me on the format of Robot.txt. Please? thanks
Technical SEO | | paumer800 -
Site being indexed by Google before it has launched
We are currently coming towards the end of a site migration, and are at the final stage of testing redirects etc. However, to our horror we've just discovered Google has started indexing the new site. Any ideas on how this could have happened? I have most recently asked for robots.txt to exclude anything with a certain parameter in URL. Is there a chance this, wrongly implemented, could have caused this?
Technical SEO | | Sayers0 -
Will Google index a site with white text? Will it give it bad ratings?
Will google not rank a site bc pretty much all the copy is white (and the background is all white)? Here's the site in question: https://www.dropbox.com/s/6w24f6h5p0zaxhg/Garrison_PLAY.vs2-static.pdf https://www.dropbox.com/sh/fwudppvwy2khpau/t43NozpG3E/Garrison_PLAY.vs3.jpg thanks--if you need me to clarify more let me know TM Humphries LocalSearched.com
Technical SEO | | CloudGuys0 -
How to Find all the Pages Index by Google?
I'm planning on moving my online store, http://www.filtrationmontreal.com/ to a new platform, http://www.corecommerce.com/ To reduce the SEO impact, I want to redirect 301 all the pages index by Google to the new page I will create in the new platform. I will keep the same domaine name, but all the URL will be customize on the new platform for better SEO. Also, is there a way or tool to create CSV file from those page index. Can Webmaster tool help? You can read my question about this subject here, http://www.seomoz.org/q/impacts-on-moving-online-store-to-new-platform Thank you, BigBlaze
Technical SEO | | BigBlaze2050 -
Will Google Continue to Index the Page with NoIndex Tag Upon Google +1 Button Impression or Click?
The FAQs for Google +1 button suggests as follows: "+1 is a public action, so you should add the button only to public, crawlable pages on your site. Once you add the button, Google may crawl or recrawl the page, and store the page title and other content, in response to a +1 button impression or click." If my page has NoIndex tag, while at the same time inserted with Google +1 button on the page, will Google recognise the NoIndex Tag on the page (and will not index the page) despite the +1 button's impression or clicks send signals to Google spiders?
Technical SEO | | globalsources.com0