Google Indexing Of Pages As HTTPS vs HTTP
-
We recently updated our site to be mobile-optimized. As part of the update, we had also planned on adding SSL security to the site. However, on a lot of our pages we use an iframe from a third-party vendor for real estate listings, and that iframe was not SSL-friendly; the vendor does not have a solution for that yet. So those iframes weren't displaying their content.
As a result, we had to shift gears and go back to plain HTTP rather than the new HTTPS we had been hoping for.
However, Google seems to have indexed a lot of our pages as HTTPS, and any visitor who clicks those results gets a security error. The new site was launched about a week ago, and there was code in the .htaccess file that was forcing www and HTTPS. I have changed the .htaccess file so it no longer forces HTTPS.
My question is: will Google "reindex" the site once it recognizes the new .htaccess rules over the next couple of weeks?
-
That's not going to solve your problem, vikasnwu. Your immediate issue is that you have HTTPS URLs in the index, and searchers who click on them won't reach your site because of the security error warnings. The only way to fix that quickly is to get the SSL certificate back in place along with a redirect to HTTP.
You've sent the search engines a number of very conflicting signals. Waiting while they try to work out which URLs they're supposed to use, and then waiting while they reindex them, is likely to cause significant traffic issues and ongoing ranking harm before the SEs figure it out for themselves. The whole point of what I recommended is that it doesn't depend on the SEs figuring anything out: you will have provided directives that force them to do what you need.
Paul
-
Remember that you can force indexing using Google Search Console.
-
Nice answer!
But you forgot to mention:
- Updating the sitemap files with the correct URLs
- Uploading them to Google Search Console
- Even forcing indexing in Google Search Console
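A minimal Python sketch of the first two points, assuming a hypothetical host `www.example.com` and placeholder page paths: it writes a sitemap in the sitemaps.org `urlset` format in which every `<loc>` uses the HTTP scheme, ready to upload and submit in Google Search Console.

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(host: str, paths: list[str], scheme: str = "http") -> str:
    """Return sitemap XML listing each path under scheme://host.

    host and paths are placeholders for illustration, not real site URLs.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in paths:
        url = ET.SubElement(urlset, "url")
        loc = ET.SubElement(url, "loc")
        # Force every <loc> onto the chosen scheme (HTTP by default).
        loc.text = f"{scheme}://{host}{path}"
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical pages; a real site would list all of its URLs here.
    xml_text = build_sitemap("www.example.com", ["/", "/listings/", "/contact/"])
    print('<?xml version="1.0" encoding="UTF-8"?>')
    print(xml_text)
```

Listing only the HTTP URLs keeps the sitemap consistent with the redirects, so it sends one signal instead of adding to the conflicting ones.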
Thanks,
Roberto
-
Paul,
I just provided a solution to de-index the HTTPS version, since I understood that's what was wanted while they wait for their client to fix things on their end. And of course, there is no way to noindex by protocol. I do agree with what you are saying.
Thanks a lot for explaining further and providing other ways to help solve the issue. I'm inspired by users like you to help others and build a great community.
GR.
-
I'm first going to see what happens if I just upload a sitemap with HTTP URLs, since there wasn't a sitemap in Webmaster Tools before. I'll give you an update then.
-
Great! I'd really like to hear how it goes when you get the switch back in.
P.
-
Paul, that does make sense. I'll add the SSL certificate back and then redirect from HTTPS to HTTP via the .htaccess file.
-
You can't noindex a URL by protocol, Gaston. Adding noindex would eliminate the page from search results regardless of whether it's requested over HTTP or HTTPS, essentially making those important pages invisible and wasting whatever link equity they may have. (In my experience, you can't block by protocol in robots.txt either.)
-
There's a very simple solution to this issue - and no, you absolutely do NOT want to artificially force removal of those HTTPS pages from the index.
You need to make sure the SSL certificate is still in place, then re-add the 301 redirect in the site's .htaccess file, but this time redirect all HTTPS URLs back to their HTTP equivalents.
You don't want to forcibly "remove" those URLs from the SERPs, because they are what Google now understands to be the correct pages. If you remove them, you'll have to wait however long it takes for Google and other search engines to completely re-understand the conflicting signals you've sent them about your site. And traffic will inevitably suffer in that process. Instead, you need to provide standard directives that the search engines don't have to interpret and can't ignore. Once the search engines have seen the new redirects for long enough, they'll start reverting the SERP listings back to the HTTP URLs naturally.
The key here is the SSL cert must stay in place. As it stands now, a visitor clicking a page in the search engine is trying to make an HTTPS connection to your site. If there is no certificate in place, they will get the harmful security warning. BUT! You can't just put in a 301-redirect in that case. The reason for this is that the initial connection from the SERP is coming in over the "secure channel". That connection must be negotiated securely first, before the redirect can even be read. If that first connection isn't secure, the browser will return the security warning without ever trying to read the redirect.
Having the SSL cert in place even though you're not running all pages under HTTPS means that first connection can still be made securely, then the redirect can be read back to the HTTP URL, and the visitor will get to the page they expect in a seamless manner. And search engines will be able to understand and apply authority without misunderstandings/confusion.
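On Apache this redirect is normally a mod_rewrite rule in .htaccess. To make the decision logic explicit, here is a minimal Python WSGI sketch of the same idea; it is an illustration under assumptions (a Python app server, a hypothetical `demo_app` standing in for the real site), not the poster's Apache setup. Any request that arrives over HTTPS gets a 301 to the identical HTTP URL. As explained above, the code only ever runs after the TLS handshake has succeeded, which is why the certificate has to stay installed.

```python
from urllib.parse import urlsplit, urlunsplit
from wsgiref.util import request_uri


def https_to_http_redirect(app):
    """Wrap a WSGI app so requests arriving over HTTPS get a 301 to HTTP.

    The server only sees these requests after a successful TLS handshake,
    so a valid certificate must stay installed for the redirect to work.
    """

    def wrapper(environ, start_response):
        if environ.get("wsgi.url_scheme") == "https":
            # Rebuild the full requested URL (host, path, query string).
            parts = urlsplit(request_uri(environ, include_query=True))
            host = parts.netloc
            if host.endswith(":443"):
                host = host[: -len(":443")]  # drop the HTTPS default port
            target = urlunsplit(parts._replace(scheme="http", netloc=host))
            start_response("301 Moved Permanently", [("Location", target)])
            return [b""]
        # Plain HTTP requests pass through to the real application.
        return app(environ, start_response)

    return wrapper


def demo_app(environ, start_response):
    """Hypothetical stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


application = https_to_http_redirect(demo_app)
```

Using a permanent (301) status rather than a temporary one is what tells search engines to transfer the listing to the HTTP URL over time.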
Hope that all makes sense?
Paul
-
Nope, robots.txt works at the website level, which means there has to be one file for the HTTP website and another for the HTTPS website.
And there is no need to wait until the whole site is indexed. Just to clarify, robots.txt itself does not remove pages that are already indexed; it just blocks bots from crawling a website and/or specific pages within it.
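GR's point that each protocol has its own robots.txt can be illustrated with Python's standard `urllib.robotparser`. In this sketch (`www.example.com` and both rule sets are placeholders), the robots.txt location is derived from the scheme and host of a page URL, so the HTTP and HTTPS versions of the same path are governed by two different files. In practice the two files only differ if the server is configured to return different content on each protocol, and a Disallow rule still only blocks crawling; it does not remove already-indexed pages.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url.

    robots.txt applies per scheme + host (+ port), so the HTTP and
    HTTPS versions of a page point at different files.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def parser_from_text(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without fetching anything."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


if __name__ == "__main__":
    page = "/listings/"
    print(robots_url_for("http://www.example.com" + page))
    print(robots_url_for("https://www.example.com" + page))

    # Hypothetical files: HTTPS blocks everything, HTTP allows everything.
    https_rules = parser_from_text("User-agent: *\nDisallow: /")
    http_rules = parser_from_text("User-agent: *\nDisallow:")
    print(https_rules.can_fetch("Googlebot", "https://www.example.com" + page))
    print(http_rules.can_fetch("Googlebot", "http://www.example.com" + page))
```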
-
GR - thanks for the response.
Given our site is just 65 pages, would it make sense to just put all of the site's "https" URLs in the robots.txt file as "noindex" now rather than waiting for all the pages to get indexed as "https" and then remove them?
And then upload a sitemap to Webmaster Tools with the URLs as "http://"?
VW
-
Hello vikasnwu,
Since what you are looking for is to remove the pages from the index, follow these steps:
- Allow the whole website to be crawlable in robots.txt
- Add the robots meta tag with the "noindex,follow" parameters
- Wait several weeks (6 to 8 weeks is a reasonable timeframe), or just follow up on those pages
- Once you have the results (all your desired pages de-indexed), block those pages again with robots.txt
- DO NOT remove the meta robots tag.
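For the follow-up step, a small Python sketch built on the standard `html.parser` module can confirm which robots directives a page's HTML actually carries. The sample markup is a placeholder; fetching the live pages, and checking any `X-Robots-Tag` HTTP header (which this sketch ignores), is left out.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            # "noindex, follow" -> {"noindex", "follow"}
            for token in attr_map.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html: str) -> set[str]:
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    sample = '<html><head><meta name="robots" content="noindex,follow"></head></html>'
    print(sorted(robots_directives(sample)))
```

Self-closing `<meta ... />` tags are handled too, because `HTMLParser` routes them through `handle_starttag` by default.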
Remember that http://site.com and https://site.com are different websites to Google.
When your client's website is fixed to work with HTTPS, follow these steps:
- Allow the whole website (or the parts you want indexed) to be crawlable in robots.txt
- Remove the robots meta tag
- 301-redirect HTTP to HTTPS
- Sit and wait.
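While you sit and wait, one way to check the 301 step is to audit crawl results. This Python sketch assumes you already have (URL, status code, Location header) tuples from a crawler or server log; that input format is hypothetical. It flags any HTTP URL that does not answer with a 301 pointing at its exact HTTPS equivalent, and it assumes default ports.

```python
from urllib.parse import urlsplit, urlunsplit


def expected_https(url: str) -> str:
    """Return the HTTPS equivalent of an HTTP URL (same host, path, query)."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(scheme="https"))


def audit_redirects(results):
    """Yield (url, problem) for HTTP URLs whose redirect looks wrong.

    results: iterable of (url, status, location) tuples, a hypothetical
    format such as an export from a crawler.
    """
    for url, status, location in results:
        if urlsplit(url).scheme != "http":
            continue  # only the HTTP versions should be redirecting
        if status != 301:
            yield url, f"status {status}, expected 301"
        elif location != expected_https(url):
            yield url, f"redirects to {location!r}, expected {expected_https(url)!r}"


if __name__ == "__main__":
    rows = [
        ("http://www.example.com/", 301, "https://www.example.com/"),
        ("http://www.example.com/a/", 302, "https://www.example.com/a/"),
    ]
    for url, problem in audit_redirects(rows):
        print(url, "->", problem)
```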
Information about redirecting to HTTPS, plus some useful checklists:
- The Big List of SEO Tips and Tricks for Using HTTPS on Your Website - Moz Blog
- The HTTP to HTTPs Migration Checklist in Google Docs to Share, Copy & Download - AleydaSolis
- Google SEO HTTPS Migration Checklist - SERoundtable
Hope this helps.
Best of luck.
GR.
Related Questions
-
Domain Authority Dropped and Indexed Pages Went Down on Google?
Hi there,
We run an e-commerce site on Shopify. Our Domain Authority was 28 at the start of our campaign in May of this year, and we had 610 indexed pages on Google. We did some SEO work, which included:
- Renaming images for SEO
- Adding alt tags
- Optimizing product meta titles to "Product Name - Keyword - Brand Name"
- Optimizing meta descriptions
- Moving our HubSpot blog to Shopify (it was previously on a HubSpot subdomain)
- Fixing some 404s
- Resubmitting the sitemap after the changes
Now we're almost at the 3-month mark, and our Domain Authority has gone down 4 points to 24. The number of indexed pages has gone down to 555. We made sure none of our SEO updates were spammy or keyword-stuffed and took a natural, helpful-sounding approach. We followed guidelines, so there shouldn't be any penalty, right?
I checked site traffic, and it does not coincide with the drop; our traffic remains steady. I also looked at "site:" results and ran some test searches for the important pages (i.e. main pages, blog pages, and product pages), and they still come up on Google. So could it only be non-important pages being deindexed?
My questions are:
- Why did both the Domain Authority and the number of indexed pages go down?
- Is there any way to see which pages were deindexed? I checked Google Search Console but couldn't find it.
Thank you!
Intermediate & Advanced SEO | kindalpaca7
-
We are redirecting the HTTP and non-www versions of our website. Should all versions (HTTP non-www, HTTP www, and HTTPS non-www) have just one redirect to the HTTPS www version?
We are redirecting the HTTP and non-www versions of our website. Should all versions (HTTP non-www, HTTP www, and HTTPS non-www) have just one redirect to the HTTPS www version, so that all forms of the website point to one version?
Intermediate & Advanced SEO | Caffeine_Marketing
-
Old pages STILL indexed...
Our new website has been live for around 3 months, and the URL structure has completely changed. We weren't able to dynamically create 301 redirects for over 5,000 of our products because the URLs were so different, so we've been redirecting them as and when we can. 3 months on, we're still getting hundreds of 404 errors daily in our Webmaster Tools account. I've checked the server logs, and it looks like Bingbot still wants to crawl our old /product/ URLs. Also, if I perform a "site:example.co.uk/product" search on Google or Bing, lots of results are still returned, indicating that both still haven't dropped them from their index. Should I ignore the 404 errors and continue to wait for them to drop off, or should I just block /product/ in my robots.txt? After 3 months I'd have thought they'd have naturally dropped off by now! I'm half-debating this:
User-agent: *
Disallow: /some-directory-for-all/*

User-agent: Bingbot
User-agent: MSNBot
Disallow: /product/

Sitemap: http://www.example.co.uk/sitemap.xml
Intermediate & Advanced SEO | LiamMcArthur
-
Redirect HTTP to HTTPS
Hello, simple question: should we be redirecting our HTTP pages to HTTPS? If yes, why? If not, why not? Thanks!
Intermediate & Advanced SEO | HB17
-
Google Is Indexing My Internal Search Results - What Should I Do?
Hello, we are using a CMS/e-commerce platform that isn't really built with SEO in mind, and this has led to the following problem: a large number of internal product search result pages, which aren't "search engine friendly" or "user friendly", are being indexed by Google and are driving traffic to the site, generating revenue for our client. We want to remove these pages and stop them from being indexed, replacing them with static category pages, essentially moving the traffic from the search results to static pages. We feel this is necessary because our current situation is a short-term (accidental) win, and later down the line, as more pages become indexed, we don't want to incur a penalty. We're hesitant to do a blanket de-indexation of all ?search results pages because we would lose revenue and traffic in the short term while trying to improve the rankings of our optimised static pages. The idea is to move our static pages up in Google's index and, when their performance is strong enough, de-index all of the internal search results pages. Our main focus is to improve user experience and not have customers enter the site through unexpected pages. All thoughts or recommendations are welcome. Thanks
Intermediate & Advanced SEO | iThinkMedia
-
Why are some pages indexed but not cached by Google?
The question is simple but I don't understand the answer. I found a webpage that was linking to my personal site. The page was indexed in Google. However, there was no cache option and I received a 404 from Google when I tried using cache:www.thewebpage.com/link/. What exactly does this mean? Also, does it have any negative implication on the SEO value of the link that points to my personal website?
Intermediate & Advanced SEO | mRELEVANCE
-
Google Is Indexing The Wrong Page For My Keyword
For a long time (almost 3 months), Google has been indexing the wrong page for my main keyword.
The problem is that Google indexes a different page each time, for a period of 4-7 days: sometimes I see the home page, sometimes a category page, and sometimes a product page.
It seems as though Google has not yet decided which page is its favorite / best page for this keyword. These are the pages Google indexes (in most cases you can find the site on the second or third page):
- Main Page: http://bit.ly/19fOqDh
- Category Page: http://bit.ly/1ebpiRn
- Another Category: http://bit.ly/K3MZl4
- Product Page: http://bit.ly/1c73B1s
All links I get to the website are natural links, so in most cases the anchor text is the website name. In addition, I have many links from bloggers who asked to review one of my products. I'm very careful about that: I always check the blogger and their website, and I allow it only if it is something good. Also, I never ask for a link back (most of the time I receive one without asking), and as I said, most of these links use my website name as the anchor. Here are some examples of links I received from bloggers: http://bit.ly/1hF0pQb http://bit.ly/1a8ogT1 http://bit.ly/1bqqRr8 http://bit.ly/1c5QeC7 http://bit.ly/1gXgzXJ Can I please get a recommendation on what I should do?
Should I try to change the anchor text of the links?
Do I need to stop allowing bloggers to review my products? I'd love to hear what you recommend.
Thanks for the help!
Intermediate & Advanced SEO | Tiedemann_Anselm
-
Google is Really Slow to Index my New Website
(Sorry for my English!) A quick background: I had a website at thewebhostinghero.com that had been slapped left and right by Google (both Panda & Penguin). It also had a manual penalty for unnatural links, which was lifted in late April / early May this year. I also had another domain, webhostinghero.com, which was redirecting to thewebhostinghero.com.
When I realized I would be better off starting a new website than trying to salvage thewebhostinghero.com, I removed the redirection from webhostinghero.com and started building a new website. I waited about 5 or 6 weeks before putting any content on webhostinghero.com so Google had time to notice that the domain wasn't redirecting anymore.
About a month ago, I launched http://www.webhostinghero.com with 100% new content, but I left thewebhostinghero.com online because it still brings in a little (necessary) income. There are no links between the websites except on one page (www.thewebhostinghero.com/speed/), which is set to "noindex,nofollow" and is disallowed to search engines in robots.txt. I made sure that page was deindexed before adding a "nofollow" link from thewebhostinghero.com/speed => webhostinghero.com/speed.
Since the new website launched, I've been publishing new content (2 to 5 posts) daily. It's getting some traction from social networks, but it gets barely any clicks from Google search. It seems to take at least a week before Google indexes new posts, and not all posts are indexed. The cached copy of the homepage is 12 days old. In Google Webmaster Tools, it looks like Google isn't getting the latest sitemap version unless I resubmit it manually; it's always 4 or 5 days old.
So is my website just too young, or could it have some kind of penalty related to the old website? The domain has 4 or 5 really old spammy links from the previous domain owner that I couldn't get rid of, but otherwise I don't think there's anything tragic.
Intermediate & Advanced SEO | sbrault74