Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content (many posts will still be viewable), we have locked both new posts and new replies.
Robots.txt on http vs. https
-
We recently changed our domain from http to https. When a user enters any URL on http, there is a global 301 redirect to the same page on https.
I cannot find instructions about what to do with robots.txt. Now that https is the canonical version, should I block the http version with robots.txt?
Strangely, I cannot find a single resource about this...
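For reference, the global redirect described above amounts to swapping the scheme and keeping everything else in the URL. The snippet below is only a rough sketch of that mapping, not the actual server configuration; the function name `https_redirect_target` and the `www.example.com` URLs are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def https_redirect_target(url):
    """Return the https URL that an http URL would be 301-redirected to.

    Returns None when the URL is not plain http, i.e. no redirect applies.
    (Hypothetical helper, written for illustration only.)
    """
    parts = urlsplit(url)
    if parts.scheme != "http":
        return None
    # Same host, path, query and fragment; only the scheme changes.
    return urlunsplit(("https", parts.netloc, parts.path, parts.query, parts.fragment))


# Hypothetical example URL.
target = https_redirect_target("http://www.example.com/products?id=7")
print(target)
```

Under a redirect that really is global, a request for /robots.txt on http would be mapped the same way as any other path.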
-
Glad to be of help. Check out this Google link to confirm you have picked up on the 180-day crawl:
https://support.google.com/webmasters/answer/83106?hl=en
The second URL is helpful as well:
http://blog.raventools.com/moving-site-from-http-to-ssl/
All the best,
Tom
-
Good point about the backlinks! Currently, both robots.txt files are open, and Google does not seem to have canonicalization problems so far. So it makes sense to leave it this way anyway... Thanks, Thomas!
-
"Now that https is the canonical version, should I block the http-Version with robots.txt?"
Absolutely not. GWT will handle all of it. Think about the backlinks pointing to both the https:// and http:// URLs: if you block the http version, crawlers cannot follow the 301 redirects from the old URLs, and you would cut off the flow of link juice to the https pages.
Remake robots.txt with
http://www.internetmarketingninjas.com/seo-tools/robots-txt-generator/
But use https:// for the XML sitemap.
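To make the advice above concrete, here is a minimal sketch that parses two candidate robots.txt files with Python's standard `urllib.robotparser` and compares them. The file contents, the `www.example.com` domain and the sitemap path are assumptions for illustration, not the poster's real files. The open file leaves every URL crawlable and lists the sitemap on https; the blocking file shows what a blanket disallow on the http host would do.

```python
from urllib.robotparser import RobotFileParser

# Assumed example content: nothing is disallowed, sitemap listed on https.
OPEN_ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap.xml
"""

# What blocking the http version would look like (not recommended above).
BLOCKING_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def parse_robots(text):
    """Parse robots.txt content from a string, without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


open_rules = parse_robots(OPEN_ROBOTS_TXT)
blocking_rules = parse_robots(BLOCKING_ROBOTS_TXT)

# Hypothetical old http URL that still has backlinks pointing at it.
old_url = "http://www.example.com/some-page"

open_allows = open_rules.can_fetch("Googlebot", old_url)
blocked_allows = blocking_rules.can_fetch("Googlebot", old_url)
sitemaps = open_rules.site_maps()  # requires Python 3.8+

print(open_allows, blocked_allows, sitemaps)
```

With the open file, a crawler may request the old http URL and so gets to see its 301 redirect; with the blocking file it may not request that URL at all.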