Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be viewable - we have locked both new posts and new replies.
How to allow googlebot past paywall
-
Does anyone know of any ways or ideas to allow Google/Bing etc. to index your content, but have it behind a paywall for users?
-
Thanks Mark,
I have been researching this idea from Google, but it is only for Google News and not Google Web Search.
Also, users would be able to jump the paywall by returning to Google News to search for more links through to the site.
-
Google has a program called First Click Free. Basically, you need to allow Googlebot, along with users, to view the full first article they land on. So if you have multi-page articles, you need to give them access to the entire article. After that, though, the rest of the content can be behind a paywall.
You can read more about it here - http://support.google.com/webmasters/bin/answer.py?hl=en&answer=74536
And here are the technical guidelines for implementation - http://support.google.com/news/publisher/bin/answer.py?hl=en&answer=40543
Hope this helps,
Mark
-
Not possible. Google's not going to index something that is not accessible to everyone.
Related Questions
-
GoogleBot still crawling HTTP/1.1 years after website moved to HTTP/2
The whole website moved to the https://www. HTTP/2 version 3 years ago. When we review log files, it is clear that, for the home page, GoogleBot continues to access it only via the HTTP/1.1 protocol. The robots file is correct (simply allowing all and referring to the https://www. sitemap). The sitemap references https://www. pages, including the homepage. The hosting provider has confirmed the server is correctly configured to support HTTP/2 and provided evidence of access via HTTP/2 working. 301 redirects are set up for the non-secure and non-www versions of the website, all pointing to the https://www. version. We are not using a CDN or proxy. GSC reports the home page as correctly indexed (with the https://www. version canonicalised) but still shows the non-secure version of the website as the referring page in the Discovery section. GSC also reports the homepage as being crawled every day or so.
Totally understand it can take time to update the index, but we are at a complete loss to understand why GoogleBot continues to go only through HTTP/1.1 and not HTTP/2. A possibly related issue - and of course what is causing concern - is that new pages of the site seem to index and perform well in SERPs ... except the home page. It never makes it to page 1 (other than for the brand name) despite rating many times higher in terms of content, speed, etc. than other pages, which still get indexed in preference to the home page. Any thoughts, further tests, ideas, direction or anything will be much appreciated!
Technical SEO | | AKCAC1 -
SEO + Structured Data for Metered Paywall
I have a site that will have 90% of the content behind a metered paywall. So all content is accessible in a metered way. All users who aren't logged in will have access to 3 articles (of any kind) in a 30-day period. If they try to access more in a 30-day period, they will hit a paywall. I was reading this article on how to handle structured data with Google for content behind a paywall: https://www.searchenginejournal.com/paywalls-seo-strategy/311359/ However, the content is not ALWAYS behind a paywall, since it is metered. So if a new user comes to the site, they can see the article (regardless of what it is). Is there a different way to handle content that will be SOMETIMES behind a paywall because of a metered strategy? Theoretically I want 100% of the content indexed and accessible in SERPs; it will just be accessible depending on the user's history (cookies) with the site. I hope that makes sense.
Technical SEO | | triveraseo0 -
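For reference, the paywall markup that the question refers to is usually expressed as schema.org JSON-LD with the isAccessibleForFree property. The snippet below is only a minimal sketch of generating it: the function name, the NewsArticle type, and the ".paywall" CSS class are assumptions for illustration, and it does not settle whether a metered site should mark every article this way.

```python
import json


def paywalled_article_jsonld(headline, paywall_css_selector):
    """Build JSON-LD that flags part of an article as paywalled.

    The article type and the selector are illustrative assumptions;
    adapt them to the real page template.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "isAccessibleForFree": False,
        "hasPart": {
            "@type": "WebPageElement",
            "isAccessibleForFree": False,
            # CSS selector of the element that wraps the gated content.
            "cssSelector": paywall_css_selector,
        },
    }
    return json.dumps(data, indent=2)


if __name__ == "__main__":
    print(paywalled_article_jsonld("Example headline", ".paywall"))
```

The output would go inside a script tag of type application/ld+json on the article page, with the selector pointing at the section that is hidden once the meter runs out.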
How to allow bots to crawl all but WP-content
Hello, I would like my website to remain crawlable to bots, but to block my wp content and media. Does the following robots.txt work? I worry that the * user agent may conflict with the others.

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/

User-agent: GoogleBot
Allow: /

User-agent: GoogleBot-Mobile
Allow: /

User-agent: GoogleBot-Image
Allow: /

User-agent: Bingbot
Allow: /

User-agent: Slurp
Allow: /

Technical SEO | | Tom3_150 -
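One way to sanity-check a file like this is to run it through a parser and ask what each crawler may fetch. The sketch below uses Python's standard urllib.robotparser on a shortened copy of the rules; the helper name and the test paths are made up for illustration, and Python's matching is only an approximation of how Google or Bing evaluate robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the rules from the question (two groups only).
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/

User-agent: GoogleBot
Allow: /
"""


def allowed(user_agent, path):
    """Return True if the parser thinks user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent in ("Googlebot", "SomeOtherBot"):
        for path in ("/wp-content/uploads/a.jpg", "/blog/"):
            print(agent, path, allowed(agent, path))
```

Under this parser the groups do not conflict, but they also do not combine: a crawler that has its own group follows only that group, so Googlebot is allowed into /wp-content/ here, while crawlers without a group of their own fall back to the * rules. If the goal is to keep every bot out of /wp-content/, the Disallow lines would need to be repeated in each named group.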
Why Can't Googlebot Fetch Its Own Map on Our Site?
I created a custom map using Google Maps creator and embedded it on our site. However, when I ran the fetch and render through Search Console, it said it was blocked by our robots.txt file. I read in the Search Console Help section that: "For resources blocked by robots.txt files that you don't own, reach out to the resource site owners and ask them to unblock those resources to Googlebot." I did not set up our robots.txt file. However, I can't imagine it would be set up to block Google from crawling a map. I will look into that, but before I go messing with it (since I'm not familiar with it), does Google automatically block their maps from their own Googlebot? Has anyone encountered this before? Here is what the robots.txt file says in Search Console:

User-agent: *
Allow: /maps/api/js?
Allow: /maps/api/js/DirectionsService.Route
Allow: /maps/api/js/DistanceMatrixService.GetDistanceMatrix
Allow: /maps/api/js/ElevationService.GetElevationForLine
Allow: /maps/api/js/GeocodeService.Search
Allow: /maps/api/js/KmlOverlayService.GetFeature
Allow: /maps/api/js/KmlOverlayService.GetOverlays
Allow: /maps/api/js/LayersService.GetFeature
Disallow: /

Any assistance would be greatly appreciated. Thanks, Ruben
Technical SEO | | KempRugeLawGroup1 -
Some URLs were not accessible to Googlebot due to an HTTP status error.
Hello, I'm an SEO newbie and some help from the community here would be greatly appreciated. I have submitted the sitemap of my website in Google Webmaster Tools and now I get this warning: "When we tested a sample of the URLs from your Sitemap, we found that some URLs were not accessible to Googlebot due to an HTTP status error. All accessible URLs will still be submitted." How do I fix this? What should I do? Many thanks in advance.
Technical SEO | | GoldenRanking140 -
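A common first step with this warning is to request every URL listed in the sitemap and see which ones do not come back with a 200. The following is a rough sketch using only the Python standard library; the function names and the User-Agent string are invented for illustration, and a HEAD request from your own machine will not necessarily see the same response that Googlebot gets.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Extract every <loc> value from a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip()
            for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


def http_status(url, timeout=10):
    """Return the HTTP status for url (redirects are followed by urlopen)."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "sitemap-check/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses are raised as exceptions; keep the code.
        return error.code


def failing_urls(xml_text, status_of=http_status):
    """Map each sitemap URL whose status is not 200 to that status."""
    statuses = {url: status_of(url) for url in sitemap_urls(xml_text)}
    return {url: code for url, code in statuses.items() if code != 200}
```

Calling failing_urls on the downloaded sitemap text returns the URLs worth fixing or removing from the sitemap. The status_of parameter exists so the lookup can be swapped out, for example with a cached table of results.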
How does a search engine bot navigate past a .PDF link?
We have a large number of product pages that contain links to a .pdf of the technical specs for that product. These are all set up to open in a new window when the end user clicks. If these pages are being crawled, and a bot follows the link for the .pdf, is there any way for that bot to continue to crawl the site, or does it get stuck on that dangling page because it doesn't contain any links back to the site (it's a .pdf) and the "back" button doesn't work because the page opened in a new window? If this situation effectively stops the bot in its tracks and it can't crawl any further, what's the best way to fix this?
1. Add a rel="nofollow" attribute
2. Don't open the link in a new window so the back button remains functional
3. Both 1 and 2
4. Create specs on the page instead of relying on a .pdf
Here's an example page: http://www.ccisolutions.com/StoreFront/product/mackie-cfx12-mkii-compact-mixer - The technical spec .pdf is located under the "Downloads" tab [the content is all on one page in the source code - the tabs are just a design element]. Thoughts and suggestions would be greatly appreciated. Dana
Technical SEO | | danatanseo0 -
Allow or Disallow First in Robots.txt
If I want to override a Disallow directive in robots.txt with an Allow command, do I put the Allow command before or after the Disallow command? Example:
Allow: /models/ford///page*
Disallow: /models////page
Technical SEO | | irvingw0 -
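Under the robots.txt standard (RFC 9309) and Google's documentation, the order of Allow and Disallow lines within a group does not decide the outcome: the most specific (longest) matching rule wins, and Allow wins a tie. The sketch below illustrates that precedence. The rule patterns are hypothetical stand-ins rather than the ones from the question, and some simpler parsers, Python's urllib.robotparser among them, apply the first matching rule instead.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern ('*' wildcard, optional '$')."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """rules is a list of (directive, pattern) pairs from one group.

    The longest matching pattern wins; on equal length, allow wins.
    A path that matches no rule is allowed.
    """
    best = None  # (pattern length, is_allow)
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


# Hypothetical rules, listed in both orders to show order is irrelevant.
RULES = [("disallow", "/models/*/page"), ("allow", "/models/ford/*/page")]

if __name__ == "__main__":
    for rules in (RULES, RULES[::-1]):
        print(is_allowed(rules, "/models/ford/focus/page2"),
              is_allowed(rules, "/models/bmw/x5/page2"))
```

With longest-match precedence the Allow can sit before or after the Disallow; what matters is that the Allow pattern is the longer, more specific one.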
Blocking URL's with specific parameters from Googlebot
Hi, I've discovered that Googlebots are voting on products listed on our website and, as a result, are creating negative ratings by placing votes from 1 to 5 for every product. The voting function is handled using JavaScript, as shown below, and the script prevents multiple votes, so most products end up with a vote of 1, which translates to "poor". How do I go about using robots.txt to block a URL with specific parameters only? I'm worried that I might end up blocking the whole product listing, which would result in de-listing from Google and the loss of many highly ranked pages.
DON'T want to block: http://www.mysite.com/product.php?productid=1234
WANT to block: http://www.mysite.com/product.php?mode=vote&productid=1234&vote=2
JavaScript button code: onclick="javascript: document.voteform.submit();"
Thanks in advance for any advice given. Regards, Asim
Technical SEO | | aethereal0
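robots.txt rules are matched against the path plus query string, so a rule such as Disallow: /product.php?mode=vote would cover the voting URL without touching the plain product URL. That rule, and the wildcard variant Disallow: /*mode=vote, are suggestions rather than something from the thread. The sketch below tests both against the two URLs from the question; the helper names are made up, and the matcher only approximates how a real crawler interprets the file.

```python
import re
from urllib.parse import urlsplit


def robots_target(url):
    """Return the part of a URL that robots.txt rules are compared with."""
    parts = urlsplit(url)
    return parts.path + ("?" + parts.query if parts.query else "")


def rule_matches(pattern, url):
    """Prefix-match a robots.txt pattern ('*' = any characters) on a URL."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.match(regex, robots_target(url)) is not None


KEEP = "http://www.mysite.com/product.php?productid=1234"
BLOCK = "http://www.mysite.com/product.php?mode=vote&productid=1234&vote=2"

if __name__ == "__main__":
    for pattern in ("/product.php?mode=vote", "/*mode=vote"):
        print(pattern, rule_matches(pattern, KEEP), rule_matches(pattern, BLOCK))
```

The plain prefix rule only works while mode=vote is the first parameter; the wildcard form also catches URLs where the parameters appear in a different order.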