Blocking https from being crawled
-
I have an ecommerce site where the https versions of some pages are being crawled. I'm wondering if the solution below will fix the issue.
Say www.example.com is my domain.
In the nav there is a login link, www.example.com/login, which redirects to https://www.example.com/login.
If I just disallowed /login in the robots file, wouldn't that stop the crawler from following the redirect and indexing that stuff?
The redirect part is what I am questioning.
-
Correct. Once /login gets redirected to https://www.example.com/login, all nav links etc. are https.
What I ended up doing was blocking /login in robots.txt, adding canonicals on the https pages, and nofollowing the /login link in the nav that redirects.
Will see what happens now.
-
So, the "/login" page gets redirected to https: and then every link on that page goes secure and Google crawls them all? I think blocking the "/login" page is a perfectly good way to go here - cut the crawl path, and you'll cut most of the problem.
You could request removal of "/login" in Google Webmaster Tools, too. I sometimes find that robots.txt isn't great at removing pages that are already indexed. I would definitely add the canonical as well, if it's feasible. Cutting the path may not remove the pages that have already been indexed with https:.
Sorry, I'd actually reverse that:
(1) Add the canonicals, and let Google sweep up the duplicates
(2) A few weeks later, block the "/login" page
Sounds counter-intuitive, but if you block the crawl path to the https: pages first, then Google won't crawl the canonical tags on those versions. Use canonical to clean up the index, and then block the page to prevent future problems.
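On the redirect question, here is a rough Python sketch of how a robots.txt-compliant crawler would treat the rule. It is only an illustration using the standard-library urllib.robotparser, not a description of how Googlebot is implemented, and the rule set and the /products path are made up. The point is that the check is made against the requested URL before anything is fetched, so a blocked /login is never requested and its redirect is never seen.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served at http://www.example.com/robots.txt
ROBOTS_TXT = """\
User-agent: *
Disallow: /login
"""


def is_allowed(url: str, robots_txt: str = ROBOTS_TXT, agent: str = "*") -> bool:
    """Return True if a compliant crawler may request the given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/login",     # blocked, so the redirect is never followed
        "http://www.example.com/products",  # hypothetical page, still crawlable
    ):
        print(url, "->", "allowed" if is_allowed(url) else "blocked")
```

Two caveats. The rule is a path-prefix match, so it also covers anything under /login/. And robots.txt files are read per protocol and host, so this file on http:// says nothing about https:// URLs that Google already knows about, which is why the canonical cleanup should come first.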
-
Gotcha. Yeah, I commented above that I was going to add a canonical as well as a noindex in the meta, but I was curious how robots.txt handled the redirect that was happening.
Thanks for your help.
-
Yeah, I was going to nofollow the link in the nav and add a meta tag, but I was curious how the robots file would handle this since the URL is a redirect.
Thanks for your input.
-
Are the pages that are being crawled under https the same pages that are available under http? If yes, can you just add a canonical tag on these pages pointing to the http version? That should fix it. And if your login page is the entry point, your fix will help as well. But then, as Rebekah said, what if somebody is linking to your https page? I would suggest you look into adding a canonical tag to http on these pages, if that makes sense and is doable.
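To check that the https copies really carry a canonical pointing at http, something like the following sketch could be run over saved page source. It uses only Python's standard html.parser; the sample markup and the /widgets URL are invented for illustration, and fetching the live page is left out on purpose.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        is_canonical = (attr.get("rel") or "").lower() == "canonical"
        if tag == "link" and is_canonical and self.canonical is None:
            self.canonical = attr.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical source of https://www.example.com/widgets
SAMPLE_HTTPS_PAGE = """<html><head>
<link rel="canonical" href="http://www.example.com/widgets">
</head><body>...</body></html>"""

if __name__ == "__main__":
    print(find_canonical(SAMPLE_HTTPS_PAGE))
```

If find_canonical returns None, or a URL starting with https://, for a page that should consolidate to http, the tag is missing or pointing the wrong way.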
-
You can disallow the https portion in robots.txt, but remember that robots.txt isn't always a surefire way of keeping an area of your site from being crawled. If there is other important content to crawl from the secured page, be careful that you are not blocking robots from it.
If the page is linked from other places on the web and those links don't include nofollow, search engines may still crawl it. Can you change the link in your navigation to nofollow as well? I would also add a meta noindex tag to the page itself, and a canonical tag to the https version.
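For the nofollow and noindex pieces, a small audit along these lines could confirm the markup is actually in place. It is only a sketch using Python's standard html.parser; the nav and login-page snippets, including the /cart link, are made-up examples rather than the real site's markup.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class RobotsHintAudit(HTMLParser):
    """Collects the meta robots noindex flag and links lacking rel="nofollow"."""

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False
        self.followed_links: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr.get("name", "").lower() == "robots":
            directives = [d.strip().lower() for d in attr.get("content", "").split(",")]
            if "noindex" in directives:
                self.noindex = True
        elif tag == "a" and "href" in attr:
            if "nofollow" not in attr.get("rel", "").lower().split():
                self.followed_links.append(attr["href"])


def audit(html: str) -> Tuple[bool, List[str]]:
    """Return (page has meta noindex, hrefs of links without nofollow)."""
    parser = RobotsHintAudit()
    parser.feed(html)
    return parser.noindex, parser.followed_links


# Hypothetical snippets: the site nav, and the head of the login page
NAV_HTML = '<nav><a href="/login" rel="nofollow">Log in</a><a href="/cart">Cart</a></nav>'
LOGIN_HTML = '<html><head><meta name="robots" content="noindex, follow"></head></html>'

if __name__ == "__main__":
    print(audit(NAV_HTML))
    print(audit(LOGIN_HTML))
```

Keep in mind that a crawler only sees a meta noindex if it is allowed to fetch the page, so a noindex on a URL that is also disallowed in robots.txt will not be read.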