Prevent Rogerbot from crawling pagination
-
Hello,
I have a site with around 300k static pages, but each one of these has pagination on it.
I would like to stop Rogerbot from crawling the paginated pages, and maybe even Google.
The paginated pages are results that change daily so there is no need to index them.
What's the best way to prevent them from being crawled?
The pages are dynamic so I don't know the URLs.
I have seen people mention adding nofollow to the pagination links. Would this do it, or is there a better way?
Many thanks
Steve
-
Robots.txt Rules
If you have an architecture like:
example.com/page/2/
Then use:
User-agent: rogerbot
Disallow: /page/
If you have an architecture like:
example.com/results?p=2
Then use:
User-agent: rogerbot
Disallow: /*?p=
If you have an architecture like:
example.com/results?page=2
Then use:
User-agent: rogerbot
Disallow: /*?page=
That should pretty much stop Rogerbot from crawling paginated content. It would certainly stop Googlebot, but I don't know for sure whether Rogerbot respects the "*" wildcard the way Googlebot does. Give it a try and see what happens.
Don't worry: in robots.txt only "*" is treated as a wildcard, so the "?" won't cause any problems and there's no need for an escape character.
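Just to put it together, here's a minimal sketch assuming the ?page= parameter structure (swap in whichever pattern above matches your actual URLs); it also covers Googlebot, since you mentioned possibly blocking Google too:
User-agent: rogerbot
Disallow: /*?page=

User-agent: Googlebot
Disallow: /*?page=
It's worth testing the pattern against a few real URLs (Search Console's robots.txt Tester covers the Googlebot side) before rolling it out across 300k pages.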
-
Hi,
Let's separate the topics here:
- Preventing crawling is done via robots.txt, but it won't de-index pages that are already indexed.
- Preventing indexing, and de-indexing pages that are already indexed, is done with a robots meta tag with a noindex value.
Here's an article from Google about that: Block search indexing with 'noindex' - Google Search Console Help
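For reference, that can be applied either as a meta tag in the page's <head> or as an HTTP response header; a minimal sketch:
<meta name="robots" content="noindex">
or
X-Robots-Tag: noindex
Keep in mind that Google has to be able to crawl the page to see the noindex, so don't block those same URLs in robots.txt if the goal is to get them de-indexed.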
That said, another action you could take is adding nofollow to the pagination links. Nofollow only tells Google: "I don't want that page to be considered important." It will probably reduce those pages' chances of ranking high, but it won't prevent crawling or indexing.
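For example, a nofollowed pagination link looks something like this (the /results?page=2 URL is just a placeholder):
<a href="/results?page=2" rel="nofollow">Next page</a>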
Another way, though a little more expensive in development, is adding a specific parameter to the URL whenever it's a pagination page. Then you can block that parameter in robots.txt. Again, this won't remove what's already been indexed. Hope it helps.
Best of luck.
Gaston