Getting Pages Requiring Login Indexed
-
Somehow certain newspapers' webpages show up in the index but require login. My client has a whole section of the site that requires a login (registration is free), and we'd love to get that content indexed. The developer offered to remove the login requirement for specific user agents (eg Googlebot, et al.). I am afraid this might get us penalized.
Any insight?
-
My guess: It's possible, but it would be an uphill battle. The reason being Google would likely see the page as a duplicate of all the other pages on your site with a login form. Not only does Google tend to drop duplicate pages from it's index (especially if it has a duplicate title tag - more leeway is giving the more unique elements you can place on a page) but now you face a situation where you have lots of duplicate or "thin" pages, which is juicy meat for a Panda-like penalty. Generally, you want to keep this pages out of the index, so it's a catch 22.
-
That makes sense. I am looking into whether any portion of our content can be made public in a way that would still comply with industry regulations. I am betting against it.
Does anyone know whether a page requiring login like this could feasibly rank with a strong backlink profile or a lot of quality social mentions?
-
The reason Google likes the "first click free" method is because they want the user to have a good result. They don't want users to click on a search result, then see something else on that page entirely, such as a login form.
So technically showing one set of pages to Google and another to users is considered cloaking. It's very likely that Google will figure out what's happening - either through manual review, human search quality raters, bounce rate, etc - and take appropriate actions against your site.
Of course, there's no guarantee this will happen, and you could argue that the cloaking wasn't done to deceive users, but the risk is high enough to warrant major consideration.
Are there any other options for displaying even part of the content, other than "first-click-free"? For example, can you display a snippet or few paragraphs of the information, then require login to see the rest? This at least would give Google something to index.
Unfortunately, most other methods for getting anything indexed without actually showing it to users would likely be considered blackhat.
Cyrus
-
Should have read the target:
"Subscription designation, snippets only: If First Click Free isn't a feasible option for you, we will display the "subscription" tag next to the publication name of all sources that greet our users with a subscription or registration form. This signals to our users that they may be required to register or subscribe on your site in order to access the article. This setting will only apply to Google News results.
If you prefer this option, please display a snippet of your article that is at least 80 words long and includes either an excerpt or a summary of the specific article. Since we do not permit "cloaking" -- the practice of showing Googlebot a full version of your article while showing users the subscription or registration version -- we will only crawl and display your content based on the article snippets you provide. If you currently cloak for Googlebot-news but not for Googlebot, you do not need to make any changes; Google News crawls with Googlebot and automatically uses the 80-word snippet.
NOTE: If you cloak for Googlebot, your site may be subject to Google Webmaster penalties. Please review Webmaster Guidelines to learn about best practices."
-
"In order to successfully crawl your site, Google needs to be able to crawl your content without filling out a registration form. The easiest way to do this is to configure your webservers not to serve the registration page to our crawlers (when the user-agent is "Googlebot") so that Googlebot can crawl these pages successfully. You can choose to allow Googlebot access to some restricted pages but not others. More information about technical requirements."
-http://support.google.com/webmasters/bin/answer.py?hl=en&answer=74536
Any harm in doing this while not implementing the rest of First Click Free??
-
What would you guys think about programming the login requirement behavior in such a way that only Google can't execute it--so Google wouldn't know that it is the only one getting through?
Not sure whether this is technically possible, but if it were, would it be theoretically likely to incur a penalty? Or is it foolish for other reasons?
-
Good idea--I'll have to determine precisely what I can and cannot show publicly and see if there isn't something I can do to leverage that.
I've heard about staying away from agent-specific content, but I wonder what the data are and whether there are any successful attempts?
-
First click free unfortunately won't work for us.
How might I go about determining how adult content sites handle this issue?
-
Have you considered allowing only a certain proportion of each page to show to any visitors including search engines. This way your pages will have some specific content that can be indexed and help you rank in the SERPs.
I have seen it done where publications behind a pay wall only allow the first paragraph or two to show - just enough to get them ranked appropriately but not enough to stop user wanting to register to access the full articles when they find them either through the SERPs, other sites or directly.
However for this to work it all depends on what the regualtions you mention require - would a proportion of the content being shown to all be ok??
I would definitely stay away from serving up different content to different users if I were you as this is likely to end up causing you trouble in the search engines..
-
I believe newspapers use a feature called "first click free" that enables this to work. I don't know if that will work with your industry regulations or not, however. You may also want to see how sites that deal with adult content, such as liquor sites, have a restriction for viewing let allow indexing.
-
Understood. The login requirement is necessary for compliance with industry regulations. My questions is whether I will be penalized for serving agent-specific content and/or whether there is a better way to get these pages in the index.
-
Search engines aren't good at completing online forms (such as a login), and thus any content contained behind them may remain hidden, so the developers option sounds like a good solution.
You may want to read:
http://www.seomoz.org/beginners-guide-to-seo/why-search-engine-marketing-is-necessary
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Trying to get Google to stop indexing an old site!
Howdy, I have a small dilemma. We built a new site for a client, but the old site is still ranking/indexed and we can't seem to get rid of it. We setup a 301 from the old site to the new one, as we have done many times before, but even though the old site is no longer live and the hosting package has been cancelled, the old site is still indexed. (The new site is at a completely different host.) We never had access to the old site, so we weren't able to request URL removal through GSC. Any guidance on how to get rid of the old site would be very appreciated. BTW, it's been about 60 days since we took these steps. Thanks, Kirk
Intermediate & Advanced SEO | | kbates0 -
Google slow to index pages
Hi We've recently had a product launch for one of our clients. Historically speaking Google has been quick to respond, i.e when the page for the product goes live it's indexed and performing for branded terms within 10 minutes (without 'Fetch and Render'). This time however, we found that it took Google over an hour to index the pages. we found initially that press coverage ranked until we were indexed. Nothing major had changed in terms of the page structure, content, internal linking etc; these were brand new pages, with new product content. Has anyone ever experienced Google having an 'off' day or being uncharacteristically slow with indexing? We do have a few ideas what could have caused this, but we were interested to see if anyone else had experienced this sort of change in Google's behaviour, either recently or previously? Thanks.
Intermediate & Advanced SEO | | punchseo0 -
Google does not want to index my page
I have a site that is hundreds of page indexed on Google. But there is a page that I put in the footer section that Google seems does not like and are not indexing that page. I've tried submitting it to their index through google webmaster and it will appear on Google index but then after a few days it's gone again. Before that page had canonical meta to another page, but it is removed now.
Intermediate & Advanced SEO | | odihost0 -
How long to re-index a page after being blocked
Morning all! I am doing some research at the moment and am trying to find out, just roughly, how long you have ever had to wait to have a page re-indexed by Google. For this purpose, say you had blocked a page via meta noindex or disallowed access by robots.txt, and then opened it back up. No right or wrong answers, just after a few numbers 🙂 Cheers, -Andy
Intermediate & Advanced SEO | | Andy.Drinkwater0 -
If I only Link to Page via Sitemap, can it still get indexed?
Hi there! I am creating a ton of content for specific geographies. Is it possible for these pages to get indexed if I only put them in my sitemap and don't link to them through my actual site (though the pages will be live). Thanks!
Intermediate & Advanced SEO | | Travis-W
Travis0 -
Huge google index with un-relevant pages
Hi, i run a site about sport matches, every match has a page and the pages are generated automatically from the DB. pages are not duplicated, but over time some look a little bit similar. after a match finishes it has no internal links or sitemap entry, but it's reachable by direct URL and continues to be on google index. so over time we have more than 100,000 indexed pages. since past matches have no significance and they're not linked and a match can repeat and it may look like duplicate content....what you suggest us to do: when a match is finished - not linked, but appears on the index and SERP 301 redirect the match Page to the match Category which is a higher hierarchy and is always relevant? use rel=canonical to the match Category do nothing.... *301 redirect will shrink my index status, some say a high index status is good... *is it safe to 301 redirect 100,000 pages at once - wouldn't it look strange to google? *would canonical remove the past matches pages from the index? what do you think? Thanks, Assaf.
Intermediate & Advanced SEO | | stassaf0 -
Why are so many pages indexed?
We recently launched a new website and it doesn't consist of that many pages. When you do a "site:" search on Google, it shows 1,950 results. Obviously we don't want this to be happening. I have a feeling it's effecting our rankings. Is this just a straight up robots.txt problem? We addressed that a while ago and the number of results aren't going down. It's very possible that we still have it implemented incorrectly. What are we doing wrong and how do we start getting pages "un-indexed"?
Intermediate & Advanced SEO | | MichaelWeisbaum0 -
Is 404'ing a page enough to remove it from Google's index?
We set some pages to 404 status about 7 months ago, but they are still showing in Google's index (as 404's). Is there anything else I need to do to remove these?
Intermediate & Advanced SEO | | nicole.healthline0