WordPress - How to stop both http:// and https:// pages being indexed?
-
Just published a static page 2 days ago on WordPress site but noticed that Google has indexed both http:// and https:// url's. Usually I only get http:// indexed though.
Could anyone please explain why this may have happened and how I can fix? Thanks!
-
Just one adjustment to this - although I think David's right that the canonical tag can be a good solution. Although Google can index https: fine, the issue is whether you're creating duplicates. If you have duplicates, then it's possible that the https: version could be the one you want as canonical. In this case, it doesn't sound like it, but I just wanted to point that out.
Of course, long-term, you should sort out why these are being created. A desktop crawler like Xenu or Screaming Frog may be the best bet, but I'd hit the WordPress forums, too. Odds are it's a common issue. Typically, it happens when some deeper page (like a shopping cart) on a site is secure, and then the links are all relative ("/about.php", for example). Then, those links get crawled as both secure and non-secure.
Unfortunately, I'm not a WordPress expert, so I can only speak in generalities.
-
Thanks David, I feel like going out to buy some Swedish Fish for some reason now.
-
I actually just did a wealth of research on this topic a few days ago. Without going into the nitty gritty details, if the https is site-wide Google recommends a Rel="canonical" attribute (http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394) pointing to the non-secure http version. Google claims it can index https fine, but Matt Cutts said he would "lean towards pointing the canonical to the http version." Also, on the Rel="canonical" page Google says:
If you publish content on both http://www.example.com/product.php?item=swedish-fish and https://www.example.com/product.php?item=swedish-fish, you can specify the canonical version of the page. Create the element:
Add this link to the section of https://www.example.com/product.php?item=swedish-fish.
Make sure the canonical is on every page of your site.
Not sure why this may have happened, but it is creating duplicate content, which is why the canonical is necessary.
Hope that helps!
Thanks
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Sudden Indexation of "Index of /wp-content/uploads/"
Hi all, I have suddenly noticed a massive jump in indexed pages. After performing a "site:" search, it was revealed that the sudden jump was due to the indexation of many pages beginning with the serp title "Index of /wp-content/uploads/" for many uploaded pieces of content & plugins. This has appeared approximately one month after switching to https. I have also noticed a decline in Bing rankings. Does anyone know what is causing/how to fix this? To be clear, these pages are **not **normal /wp-content/uploads/ but rather "index of" pages, being included in Google. Thank you.
Technical SEO | | Tom3_150 -
Home Page Being Indexed / Referral URLs /
I have a few questions related to home page URLs being indexed, canonicalization, and GA reporting... 1. I can view the home page by typing in domain.com , domain.com/ and domain.com/index.htm There are no redirects and it's canonicalized to point to domain.com/index.htm -- how important is it to have redirects? I don't want unnecessary redirects or canonical tags, but I noticed the trailing slash can sometimes be typed in manually on other pages, sometimes not. 2. When I do a site search (site:domain.com), sometimes the HP shows up as "domain.com/", never "domain.com/index.htm" or "domain.com", and sometimes the HP doesn't show up period. This seems to change several times a day, sometimes within 15 minutes. I have no idea what is causing it and I don't know if it has anything to do with #1. In a perfect world, I would ask for the /index.htm to be dropped and redirected to .com/, and the canonical to point to .com/ 3. I've noticed in GA I see / , /index.htm, and a weird Google referral URL (/index.htm?referrer=https://www.google.com/) all showing up as top pages. I think the / and /index.htm is because I haven't setup a default URL in GA, but I'm not sure what would cause the referrer. I tracked back when the referrer URL started to show up in the top pages, and it was right around the time they moved over to https://, so I'm not sure what the best option is to remove that. I know this is a lot - I appreciate any insight anyone can provide.
Technical SEO | | DigMS0 -
Drupal, http/https, canonicals and Google Search Console
I’m fairly new in an in-house role and am currently rooting around our Drupal website to improve it as a whole. Right now on my radar is our use of http / https, canonicals, and our use of Google Search Console. Initial issues noticed: We serve http and https versions of all our pages Our canonical tags just refer back to the URL it sits on (apparently a default Drupal thing, which is not much use) We don’t actually have https properties added in Search Console/GA I’ve spoken with our IT agency who migrated our old site to the current site, who have recommended forcing all pages to https and setting canonicals to all https pages, which is fine in theory, but I don’t think it’s as simple as this, right? An old Moz post I found talked about running into issues with images/CSS/javascript referencing http – is there anything else to consider, especially from an SEO perspective? I’m assuming that the appropriate certificates are in place, as the secure version of the site works perfectly well. And on the last point – am I safe to assume we have just never tracked any traffic for the secure version of the site? 😞 Thanks John
Technical SEO | | joberts0 -
How do I redirect the Author archive page in Wordpress?
If you do a search for my name on Google, the first result is the author archive page of my Wordpress blog. I would like to redirect the author page to my "about me" page but cannot add a 301 as the author page is created dynamically in Wordpress. Anyone know how I can do this?
Technical SEO | | richdan0 -
23,000 pages indexed, I think bad
Thank you Thank you Moz People!! I have a successful vacation rental company that has terrible seo but getting better. When I first ran Moz crawler and page grader, I had 35,000 errors and all f's.... tons of problem with duplicate page content and titles because not being consistent with page names... mainly capitalization and also rel canonical errors... with that said, I have now maybe 2 or 3 errors from time to time, but I fix every other day. Problem Maybe My site map shows in Google Webmaster submitted 1155
Technical SEO | | nickcargill
1541 indexed But google crawl shows 23,000 pages probably because of duplicate errors or possibly database driven url parameters... How bad is this and how do I get this to be accurate, I have seen google remove tool but I do not think this is right? 2) I have hired a full time content writer and I hope this works My site in google was just domain.com but I had put a 301 in to www.domain.com becauses www. had a page authority where the domain.com did not. But in webmasters I had domain.com just listed. So I changed that to www.domain.com (as preferred domain name) and ask for the first time to crawl. www.domain.com . Anybody see any problems with this? THank you MOZ people, Nick0 -
https v http is there any difference in rankings what is the best for a online chemist store?
Have a client that has a https site do you think its better than http for this kind of site and is there any studies done regarding any difference in rankings?
Technical SEO | | ReSEOlve0 -
Duplicate website with http & https
I have a website that only in a specific state in the USA we had to add a certificate for it to appear with https. my question is how to prevent from the website to be penalized on duplicate content with the http version on that specific state. please advise. thanks!
Technical SEO | | taly0 -
Differing numbers of pages indexed with and without the trailing slash
I noticed today that a site: query in Google (UK) for a certain domain I'm looking at returns different numbers depending on whether or not the trailing slash is added at the end. With the trailing slash the numbers are significantly different. This is a domain with a few duplicate content issues. It seems very rare but I've managed to replicate it for a couple of other well known domains, so this is the phenomenon I'm referring to: site:travelsupermarket.com - 16'300 results
Technical SEO | | ianmcintosh
site:travelsupermarket.com/ - 45'500 results site:guardian.co.uk - 120'000'000 results
site:guardian.co.uk/ - 121'000'000 results For the particular domain I'm looking at the numbers are 19'000 without the trailing slash and 800'000 with it! As mentioned, there are a few duplicate content issues at the moment that I'm trying to tidy up, but how should I interpret this? Has anyone seen this before and can advise what it could indicate? Thanks in advance for any answers.0