Google Indexing Of Pages As HTTPS vs HTTP
-
We recently updated our site to be mobile optimized. As part of the update, we had also planned on adding SSL security to the site. However, a lot of our site pages use an iframe from a third-party vendor for real estate listings. That iframe was not SSL friendly, and the vendor does not have a solution for that yet. So, those iframes weren't displaying the content.
As a result, we had to shift gears and go back to just being http and not the new https that we were hoping for.
However, Google seems to have indexed a lot of our pages as https, and visitors who click those results get a security error. The new site was launched about a week ago and there was code in the htaccess file that was pushing to www and https. I have fixed the htaccess file to no longer have https.
My question is: will Google "reindex" the site once it recognizes the new htaccess commands in the next couple of weeks?
-
That's not going to solve your problem, vikasnwu. Your immediate issue is that you have URLs in the index that are HTTPS and will cause searchers who click on them not to reach your site due to the security error warnings. The only way to fix that quickly is to get the SSL certificate and redirect to HTTP in place.
You've sent the search engines a number of very conflicting signals. Waiting while they try to work out what URLs they're supposed to use and then waiting while they reindex them is likely to cause significant traffic issues and ongoing ranking harm before the SEs figure it out for themselves. The whole point of what I recommended is it doesn't depend on the SEs figuring anything out - you will have provided directives that force them to do what you need.
Paul
-
Remember, you can force indexing using Google Search Console.
-
Nice answer!
But you forgot to mention:
- Updating the sitemap files with the correct URLs
- Uploading them to Google Search Console
- Forcing the indexing in Google Search Console, if needed
Thanks,
Roberto
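As a rough illustration of the sitemap step, here is a minimal Python sketch that rewrites a list of page URLs to their http:// form and builds a sitemap from them. The example.com URLs and the function names to_http and build_sitemap are assumptions made for illustration, not anything taken from the site in question.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def to_http(url: str) -> str:
    """Return the same URL with its scheme forced to http."""
    parts = urlsplit(url)
    return urlunsplit(("http",) + tuple(parts[1:]))


def build_sitemap(urls) -> str:
    """Build a minimal sitemap XML string listing the HTTP form of each URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
        loc.text = to_http(url)
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages; replace with the site's real URL list.
pages = [
    "https://www.example.com/",
    "https://www.example.com/listings?id=7",
]
sitemap_xml = build_sitemap(pages)
```

The resulting string can be saved as sitemap.xml and submitted in Google Search Console.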
-
Paul,
I just provided the solution to de-index the https version. I understood that to be what's wanted, as they need their client to fix their end. And of course, there is no way to noindex by protocol. I do agree with what you are saying.
Thanks a lot for explaining further and providing other ways to help solve the issue. I'm inspired by users like you to help others and make a great community.
GR.
-
I'm first going to see what happens if I just upload a sitemap with http URLs, since there wasn't a sitemap in webmaster tools before. I will give you an update then.
-
Great! I'd really like to hear how it goes when you get the switch back in.
P.
-
Paul, that does make sense - I'll add the SSL certificate back, and then redirect from https to http via the htaccess file.
-
You can't noindex a URL by protocol, Gaston - adding noindex would eliminate the page from being returned as a search result regardless of whether it is HTTP or HTTPS, essentially making those important pages invisible and wasting whatever link equity they may have. (You also can't block in robots.txt by protocol, in my experience.)
-
There's a very simple solution to this issue - and no, you absolutely do NOT want to artificially force removal of those HTTPS pages from the index.
You need to make sure the SSL certificate is still in place, then re-add the 301-redirect in the site's htaccess file, but this time redirecting all HTTPS URLs back to their HTTP equivalents.
You don't want to forcibly "remove" those URLs from the SERPs, because they are what Google now understands to be the correct pages. If you remove them, you'll have to wait however long it takes for Google and other search engines to completely re-understand the conflicting signals you've sent them about your site. And traffic will inevitably suffer in that process. Instead, you need to provide standard directives that the search engines don't have to interpret and can't ignore. Once the search engines have seen the new redirects for long enough, they'll start reverting the SERP listings back to the HTTP URLs naturally.
The key here is the SSL cert must stay in place. As it stands now, a visitor clicking a page in the search engine is trying to make an HTTPS connection to your site. If there is no certificate in place, they will get the harmful security warning. BUT! You can't just put in a 301-redirect in that case. The reason for this is that the initial connection from the SERP is coming in over the "secure channel". That connection must be negotiated securely first, before the redirect can even be read. If that first connection isn't secure, the browser will return the security warning without ever trying to read the redirect.
Having the SSL cert in place even though you're not running all pages under HTTPS means that first connection can still be made securely, then the redirect can be read back to the HTTP URL, and the visitor will get to the page they expect in a seamless manner. And search engines will be able to understand and apply authority without misunderstandings/confusion.
Hope that all makes sense?
Paul
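One way to sanity-check the redirect once it is back in place is to compare each response against the HTTP equivalent of the requested URL. The sketch below is a minimal Python illustration under stated assumptions: the function names and example.com URLs are hypothetical, and the mod_rewrite lines in the comment are a common pattern for this kind of redirect, not the poster's actual htaccess file.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the htaccess rule being checked looks roughly like this
# common mod_rewrite pattern (the real file may differ):
#   RewriteEngine On
#   RewriteCond %{HTTPS} on
#   RewriteRule ^ http://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]


def http_equivalent(https_url: str) -> str:
    """Return the HTTP URL that an HTTPS URL should redirect to."""
    parts = urlsplit(https_url)
    if parts.scheme != "https":
        raise ValueError("expected an https:// URL")
    return urlunsplit(("http",) + tuple(parts[1:]))


def is_correct_redirect(requested_url: str, status: int, location: str) -> bool:
    """True if the response is a 301 pointing at the HTTP equivalent."""
    return status == 301 and location == http_equivalent(requested_url)
```

The status code and Location header would come from requesting each indexed HTTPS URL with redirects disabled. If the certificate is missing or invalid, that request fails during the secure handshake and there is no response to check, which is the situation described above.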
-
Nope, robots.txt works at the website level. This means that there has to be one file for the http website and another for the https website.
And there is no need to wait until the whole site is indexed. Just to clarify, robots.txt itself does not remove pages that are already indexed. It just blocks bots from crawling a website and/or specific pages within it.
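To make the "one robots.txt per website" point concrete, here is a small Python sketch using the standard library's urllib.robotparser. The robots.txt contents, the ROBOTS_BY_ORIGIN table, and the may_crawl helper are all hypothetical; the sketch only models the idea that the http and https versions of site.com are looked up separately.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def make_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical robots.txt contents, one per protocol + host combination.
ROBOTS_BY_ORIGIN = {
    ("http", "site.com"): make_parser("User-agent: *\nDisallow:\n"),
    ("https", "site.com"): make_parser("User-agent: *\nDisallow: /\n"),
}


def may_crawl(url: str, user_agent: str = "*") -> bool:
    """Check the URL against the robots.txt of its own origin only."""
    parts = urlsplit(url)
    parser = ROBOTS_BY_ORIGIN[(parts.scheme, parts.netloc)]
    return parser.can_fetch(user_agent, url)
```

With these sample files, the same path is crawlable over http and blocked over https. Note that a blocked URL is only shielded from crawling; as stated above, that alone does not remove it from the index.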
-
GR - thanks for the response.
Given that our site is just 65 pages, would it make sense to put all of the site's "https" URLs in the robots.txt file as "noindex" now, rather than waiting for all the pages to get indexed as "https" and then removing them?
And then upload a sitemap to webmaster tools with the URLs as "http://"?
VW
-
Hello vikasnwu,
Since what you are looking for is to remove the pages from the index, follow these steps:
- Allow the whole website to be crawlable in robots.txt
- Add the robots meta tag with the "noindex,follow" parameters
- Wait several weeks; 6 to 8 weeks is usually enough. Or just follow up on those pages periodically
- Once you have the results (all your desired pages de-indexed), block those pages again with robots.txt
- DO NOT remove the meta robots tag.
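To follow up on whether the robots meta tag from the steps above is actually present on a page, a check along these lines could be used. This is a minimal Python sketch using the standard library's html.parser; the class name RobotsMetaParser, the robots_directives helper, and the sample HTML are assumptions for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex,follow".
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in the HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page source.
sample = '<html><head><meta name="robots" content="noindex,follow"></head></html>'
found = robots_directives(sample)
```

A page is set up as described when "noindex" appears in the returned set. The page must stay crawlable while the tag is in place, otherwise the bot never sees it.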
Remember that http://site.com and https://site.com are different websites to Google.
When your client's website is fixed with https, follow these steps:
- Allow the whole website (or the parts you want indexed) to be crawlable in robots.txt
- Remove the robots meta tag
- Redirect 301 http to https
- Sit and wait.
Information about the redirection to HTTPS and a cool checklist:
The Big List of SEO Tips and Tricks for Using HTTPS on Your Website - Moz Blog
The HTTP to HTTPs Migration Checklist in Google Docs to Share, Copy & Download - AleydaSolis
Google SEO HTTPS Migration Checklist - SERoundtable
Hope this is helpful.
Best of luck.
GR.