Robots.txt on http vs. https
-
We recently changed our domain from http to https. When a user enters any URL on http, there is a global 301 redirect to the same page on https.
I cannot find instructions about what to do with robots.txt. Now that https is the canonical version, should I block the http version with robots.txt?
Strangely, I cannot find a single resource about this...
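For anyone reading along, here is a rough Python sketch of what a "global 301 to the same page on https" maps each URL to. It is only an illustration of the mapping, not our actual server configuration; the function name `https_target` and the example.com URLs are made up for this post.

```python
from urllib.parse import urlsplit, urlunsplit


def https_target(url: str) -> str:
    """Return the https URL that a global http -> https 301 would point to.

    Hypothetical helper for illustration. Host, path, query string and
    fragment are kept unchanged; only the scheme is swapped. An explicit
    port in the host (e.g. ":80") is left as-is in this sketch.
    """
    parts = urlsplit(url)
    if parts.scheme != "http":
        # Already https (or some other scheme): nothing to redirect.
        return url
    return urlunsplit(("https", parts.netloc, parts.path, parts.query, parts.fragment))


print(https_target("http://www.example.com/page?x=1"))
```

The point is that every http URL has exactly one https counterpart, so crawlers that are allowed to request the http URL can follow the redirect to it.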
-
Glad to be of help. Check out this Google link to confirm you picked up the 180-day crawl:
https://support.google.com/webmasters/answer/83106?hl=en
This second URL is helpful as well:
http://blog.raventools.com/moving-site-from-http-to-ssl/
all the best,
tom
-
Good point about the backlinks! Currently, both robots.txt files are open and Google does not seem to have canonicalization problems so far. So it makes sense to leave it this way anyway... Thanks Thomas!
-
"Now that https is the canonical version, should I block the http-Version with robots.txt?"
Absolutely not. GWT (Google Webmaster Tools) will handle all of it. Think about your backlinks: they point to both https:// and http:// URLs, and you do not want to lose the flow of link juice that blocking the http version would cut off.
Remake robots.txt with
http://www.internetmarketingninjas.com/seo-tools/robots-txt-generator/
But use https:// for the XML sitemap.
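A minimal Python sketch of what I mean, assuming an open robots.txt and a sitemap at a made-up location (`https://www.example.com/sitemap.xml`). The helper names `build_robots_txt` and `is_crawlable` are hypothetical; the check uses the standard library's `urllib.robotparser`, which is not Google's parser, so treat it as a sanity check only.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(sitemap_url: str) -> str:
    """Return an open robots.txt (nothing disallowed) that lists one sitemap."""
    return "\n".join([
        "User-agent: *",
        "Disallow:",              # empty value: no path is blocked
        f"Sitemap: {sitemap_url}",
    ]) + "\n"


def is_crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Check a URL against robots.txt text using the stdlib parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Assumed sitemap location, for illustration only.
robots_txt = build_robots_txt("https://www.example.com/sitemap.xml")
print(robots_txt)

# Serve the same open file on both protocols so the old http URLs
# stay crawlable and their 301 redirects can be followed.
print(is_crawlable(robots_txt, "http://www.example.com/some-page"))
print(is_crawlable(robots_txt, "https://www.example.com/some-page"))
```

By contrast, a file containing `Disallow: /` on the http host would stop crawlers from requesting those URLs at all, so they would never see the redirects.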