Why does Google stubbornly keep indexing my http URLs instead of the https ones?
-
I moved everything to https in November, but plenty of pages are still indexed by Google as http instead of https, and I am wondering why.
Example: http://www.gomme-auto.it/pneumatici/barum correctly redirects permanently to https://www.gomme-auto.it/pneumatici/barum
Nevertheless if you search for pneumatici barum: https://www.google.it/search?q=pneumatici+barum&oq=pneumatici+barum
The third organic result listed is still http.
Since we moved to https, the Google crawler has visited that page tens of times, most recently two days ago, but it doesn't seem to care to update the protocol in the Google index.
Does anyone know why?
My concern is that when I use APIs like SEMrush and Ahrefs I have to query each URL twice, once for http and once for https. With around 65k URLs in total, I waste a lot of my quota.
-
Thanks again Dirk! In the end I used Xenu's Link Sleuth and I am happy with the result.
-
Hi Massimiliano,
In Screaming Frog there is the option Bulk Export > All Inlinks, which generates the full list of all your internal links with both source & destination. In Excel you just have to put a filter on the "Destination" column to show only the URLs starting with "http://" and you get all the info you need. This will probably not solve the issues with the images; for those, the alternative below could be used.
The list can be quite long depending on the total number of URLs on your site. An alternative would be to add a custom filter under 'Configuration > Custom' that only includes URLs containing "http://www.gomme-auto.it" or "http://blog.gomme-auto.it" in the source, but in your case this wouldn't be very helpful, as all the pages on your site contain this URL in the JavaScript part. If you change the URLs in the JavaScript to https, this filter could be used to find references to non-https images.
If you want to do it manually, that's also an option: in the 'Internal' view of the crawler, put "http://" in the search field, which shows you the list of all the http:// URLs. You have to select the http URLs one by one. For each URL you can select "Inlinks" at the bottom of the screen, and then you see all the URLs linking to the http version. This works for both the HTML pages & the images.
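The Excel filter on the inlinks export could also be scripted. This is only a rough sketch: it assumes the export is a CSV whose header row has columns named "Source" and "Destination" (the real export may have extra header lines or different column names), and the second sample row is made up for illustration.

```python
import csv
import io


def http_inlinks(csv_file):
    """Yield (source, destination) pairs whose destination still starts with http://."""
    reader = csv.DictReader(csv_file)
    for row in reader:
        # Column names are an assumption based on the export described above.
        destination = (row.get("Destination") or "").strip()
        if destination.lower().startswith("http://"):
            yield (row.get("Source") or "").strip(), destination


# Small inline sample standing in for the exported file.
sample = io.StringIO(
    "Source,Destination\n"
    "https://www.gomme-auto.it/pneumatici/continental,http://www.gomme-auto.it/pneumatici/barum\n"
    "https://www.gomme-auto.it/pneumatici/continental,https://www.gomme-auto.it/pneumatici/pirelli\n"
)

for source, destination in http_inlinks(sample):
    print(f"{source} links to {destination}")
```

For the real file you would pass `open("all_inlinks.csv", newline="", encoding="utf-8")` instead of the inline sample; that file name is just a placeholder.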
Hope this helps,
rgds
Dirk
-
Forgot to mention: yes, I checked the scheme of the SERP results for those pages. It is not just Google not displaying it; it really still has the http version indexed.
-
Hi DC,
In Screaming Frog I can see the old http links. Usually they are manually inserted links and images in WordPress posts. I am more than eager to edit them; my problem is how to find all the pages containing them. In Screaming Frog I can see the links, but I don't see the referrer, i.e. which page contains them. Is there a way to see that in Screaming Frog, or in some other crawling software?
-
Hi,
First of all, are you sure that Google didn't take the migration into account? I just did a quick check on other https sites. Example: when I search for "Google Analytics" in Google, the first 3 results all point to the Google Analytics site, but https is shown only for the 3rd result, even though all three are https. So it's possible it is just a display issue rather than a real issue.
Second, I did a quick crawl of your site and I noticed that on some pages you still have links to the http version of your site (they are redirected but it's better to keep your internal links clean - without redirections).
When I checked one of these pages (https://www.gomme-auto.it/pneumatici/pneumatici-cinesi) I noticed that it has some issues, as it seems to load elements which are not served over https. Possibly there are other pages like this as well.
example: /pneumatici/pneumatici-cinesi:1395 Mixed Content: The page at 'https://www.gomme-auto.it/pneumatici/pneumatici-cinesi' was loaded over HTTPS, but requested an insecure image 'http://www.gomme-auto.it/i/pneumatici-cinesi.jpg'. This content should also be served over HTTPS.
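A warning like this can also be caught before the browser sees it by scanning the page HTML for sub-resources requested over plain http. The following is a minimal sketch using only the Python standard library; the class and function names are made up for illustration, it only looks at `img`, `script` and `iframe` tags, and it will not see URLs that are built inside JavaScript. The `/i/logo.png` image in the sample is hypothetical.

```python
from html.parser import HTMLParser


class InsecureResourceFinder(HTMLParser):
    """Collect src attributes that request a sub-resource over plain http."""

    # Tags checked here; a deliberately small, non-exhaustive selection.
    RESOURCE_ATTRS = {"img": "src", "script": "src", "iframe": "src"}

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        wanted = self.RESOURCE_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value and value.lower().startswith("http://"):
                self.insecure.append((tag, value))


def find_insecure_resources(html):
    """Return a list of (tag, url) pairs for http:// resources found in html."""
    finder = InsecureResourceFinder()
    finder.feed(html)
    return finder.insecure


page = (
    '<p><img src="http://www.gomme-auto.it/i/pneumatici-cinesi.jpg">'
    '<img src="/i/logo.png"></p>'
)
for tag, url in find_insecure_resources(page):
    print(f"insecure <{tag}>: {url}")
```

Relative URLs such as `/i/logo.png` are not reported, since they inherit the scheme of the page that contains them.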
The page you mention as example: the http version still receives two internal links from https://www.gomme-auto.it/blog/pneumatici-barum-gli-economici-che-assicurano-ottime-prestazioni and https://www.gomme-auto.it/pneumatici/continental with anchor texts 'pneumatici Barmum' & 'Barum'
I guess Google reasons: if the owner of the site is not updating his internal links, I'm not going to update my index.
On all your pages there is a part of the source which contains calls to the http version. It's inside a script, so I'm not sure if it's really important, but you could try to change it to https as well.
My advice would be to crawl your site with Screaming Frog, check where links to the http versions exist, and update these links to https (or use relative links, which is advised by Google: https://support.google.com/webmasters/answer/6073543?hl=en see part 'common pitfalls').
rgds
Dirk
-
Mhhh, you are right, theoretically it could be the crawl budget. But if that were the case I should see it in the logs: crawler visits to those pages would be missing. Instead the crawler is happily visiting them.
By the way, how would you "force" the crawler to parse these pages?
I am going to check the sitemap now to remove that port number and try to split them. Thanks.
-
Darn it, you are right, we added a new site, not a change of address, sorry about that. Apparently my coffee is no longer effective!
-
As far as I know, the change of address from http to https doesn't work: the protocol is not accepted when you do a change of address. And somewhere I read Google itself saying that when moving to https you should not do a change of address.
But they suggest to add a new site for the https version in GWT, which I did, and in fact the traffic slowly transitioned from the http site to the https site in GWT in the weeks following the move.
-
Are you sure? On https://support.google.com/webmasters/answer/6033080?hl=en&ref_topic=6033084 it says: "No need to submit a change of address if you are only moving your site from HTTP to HTTPS."
I don't think you are given the option to select the same domain for a change of address in GWT.
-
Looks like you are doing everything right (set up 301 redirects, updated all links on the site, updated canonical URLs). You just need to force the crawlers to parse those pages more. Perhaps the crawler is hitting its budget before it gets to recrawl all of your new URLs?
You should also update your sitemap as it contains a bunch of links that look like: https://www.gomme-auto.it:443/pneumatici/estivi/pirelli/cinturato-p1-verde/145/65/15/h/72
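Cleaning such entries could look roughly like the sketch below, which drops an explicit port only when it is the default one for the URL's scheme. The function name is made up for illustration, and it assumes plain host names like the one above (no user:password part and no IPv6 literals).

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def strip_default_port(url):
    """Return url without an explicit port if that port is the scheme's default."""
    parts = urlsplit(url)
    if parts.port is not None and parts.port == DEFAULT_PORTS.get(parts.scheme):
        # Rebuild the network location from the host name alone.
        parts = parts._replace(netloc=parts.hostname)
    return urlunsplit(parts)


print(strip_default_port(
    "https://www.gomme-auto.it:443/pneumatici/estivi/pirelli/cinturato-p1-verde/145/65/15/h/72"
))
```

A non-default port is left untouched, so the function is safe to run over every URL in the sitemap.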
I recommend creating several sitemaps for different sections of the site and seeing how they are indexed via GWT.
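One way to split the URL list, sketched here under the assumption that the first path segment is a sensible "section" for this site, is to group the URLs by that segment and then write each group to its own sitemap file. The function name is hypothetical and the file-writing step is left out.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_section(urls):
    """Group URLs by their first path segment ('root' when there is none)."""
    sections = defaultdict(list)
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        sections[segments[0] if segments else "root"].append(url)
    return dict(sections)


urls = [
    "https://www.gomme-auto.it/pneumatici/barum",
    "https://www.gomme-auto.it/pneumatici/continental",
    "https://www.gomme-auto.it/blog/pneumatici-barum-gli-economici-che-assicurano-ottime-prestazioni",
]
for section, members in group_by_section(urls).items():
    print(f"{section}: {len(members)} URLs")
```

Each group can then be submitted as a separate sitemap, which makes it easier to see in GWT which section is lagging behind in indexing.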
-
Did you do a change of address in Google Webmaster Tools? Http and Https are considered different URLs, and you will have to do a change of address if you switched to a full https site.