Blocking https from being crawled
-
I have an ecommerce site where https is being crawled for some pages. I'm wondering if the solution below will fix the issue.
Say www.example.com is my domain.
In the nav there is a login link, www.example.com/login, which redirects to https://www.example.com/login.
If I just disallowed /login in the robots file, wouldn't that stop the crawler from following the redirect and indexing that stuff?
The redirect part is what I am questioning.
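One way to sanity-check the robots rule before deploying it is a quick sketch with Python's standard `urllib.robotparser`. The robots.txt content below is a hypothetical file for the example domain, not the poster's actual one. The point it illustrates: a crawler that honors robots.txt checks the rule before requesting the URL, so a disallowed /login should never be fetched and the redirect should never be seen through that path.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for http://www.example.com/ (an assumption for
# illustration, not the poster's real file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /login
"""


def is_allowed(url: str, user_agent: str = "*") -> bool:
    """Return True if a robots.txt-compliant crawler may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# The login URL is blocked before any request is made, so the redirect
# to https is never followed from this link.
login_allowed = is_allowed("http://www.example.com/login")
home_allowed = is_allowed("http://www.example.com/")
print(login_allowed, home_allowed)
```

Keep in mind that this only covers the crawl path through /login. If the https URLs are linked from anywhere else, the rule above does nothing for them.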
-
Correct, once /login gets redirected to https://www.example.com/login, all nav links etc. are https.
What I ended up doing was blocking /login in robots, adding canonicals on the https pages, and nofollowing the /login link in the nav that redirects.
Will see what happens now.
-
So, the "/login" page gets redirected to https: and then every link on that page goes secure and Google crawls them all? I think blocking the "/login" page is a perfectly good way to go here - cut the crawl path, and you'll cut most of the problem.
You could request removal of "/login" in Google Webmaster Tools, too. Sometimes, I find that robots.txt isn't great at removing pages that are already indexed. I would definitely add the canonical as well, if it's feasible. Cutting the path may not cut the pages that have already been indexed with https:.
Sorry, I'd actually reverse that:
(1) Add the canonicals, and let Google sweep up the duplicates
(2) A few weeks later, block the "/login" page
Sounds counter-intuitive, but if you block the crawl path to the https: pages first, then Google won't crawl the canonical tags on those versions. Use canonical to clean up the index, and then block the page to prevent future problems.
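For step (1), here is a minimal sketch of building the canonical tag, assuming the http version is the one you want indexed and that the tag is emitted in the head of each https page. The function name `canonical_link_tag` and the /widgets URL are made up for illustration, and keeping the query string is also an assumption you may want to change.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link_tag(page_url: str) -> str:
    """Build a canonical link tag pointing at the http version of page_url.

    Assumption: the http URL is the preferred version and the query string
    should be kept as-is. The fragment is dropped.
    """
    parts = urlsplit(page_url)
    http_url = urlunsplit(("http", parts.netloc, parts.path or "/", parts.query, ""))
    return f'<link rel="canonical" href="{escape(http_url, quote=True)}">'


# Hypothetical https page that duplicates an http page.
tag = canonical_link_tag("https://www.example.com/widgets?page=2")
print(tag)
```

The https pages have to stay crawlable while this is in place, otherwise the tag is never read, which is the reason for doing the robots block second.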
-
Gotcha. Yeah, I commented above that I was going to add a canonical as well as a noindex in the meta, but I was curious how it handled the redirect that was happening.
Thanks for your help.
-
Yeah, I was going to nofollow the link in the nav and add a meta tag, but I was curious how the robots file would handle this since the URL is a redirect.
Thanks for your input.
-
Are the pages that are being crawled under https the same pages that are available under http as well? If yes, can you just add a canonical tag on these pages pointing to the http version? That should fix it. And if your login page is the entry point, your fix will help as well. But then, as Rebekah said, what if somebody is linking to your https page? I would suggest you look into adding a canonical tag on these pages pointing to http, if that makes sense and is doable.
-
You can disallow the https portion in robots.txt, but remember robots.txt isn't always a surefire way of keeping an area of your site from being crawled. If you have other important content to crawl from the secured page, be careful you are not blocking robots from there.
If this is linked to from other places on the web, and the link doesn't include nofollow, search engines may still crawl the page. Can you change the link in your navigation to nofollow as well? I would also add a meta noindex tag to the page itself, and a canonical tag to the https version.
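On disallowing the https portion: crawlers fetch robots.txt separately for each scheme and host, so https://www.example.com/robots.txt can carry different rules from the http one. A rough sketch of that idea follows, assuming your server can tell which scheme a request came in on. The function names and both rule sets are hypothetical. Note the point made elsewhere in this thread, that blocking https outright stops crawlers from reading canonical tags on those pages.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule sets (assumptions for illustration only).
HTTP_ROBOTS = """\
User-agent: *
Disallow: /login
"""
HTTPS_ROBOTS = """\
User-agent: *
Disallow: /
"""


def robots_txt_for(scheme: str) -> str:
    """Pick the robots.txt body to serve for the scheme of the request."""
    return HTTPS_ROBOTS if scheme.lower() == "https" else HTTP_ROBOTS


def can_crawl(scheme: str, path: str) -> bool:
    """Check a path against the rules served for the given scheme."""
    parser = RobotFileParser()
    parser.parse(robots_txt_for(scheme).splitlines())
    return parser.can_fetch("*", path)


http_home = can_crawl("http", "/")
https_home = can_crawl("https", "/")
print(http_home, https_home)
```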