Prevent Rogerbot from crawling pagination
-
Hello,
I have a site with around 300k static pages, but each one of these has pagination on it.
I would like to stop Rogerbot from crawling the paginated pages, and maybe even Google.
The paginated pages are results that change daily so there is no need to index them.
What's the best way to prevent them from being crawled?
The pages are dynamic so I don't know the URLs.
I have seen people mention adding nofollow to the pagination links. Would this do it, or is there a better way?
Many thanks
Steve
-
Robots.txt Rules
If you have an architecture like:
example.com/page/2/
Then use:
User-agent: rogerbot
Disallow: /page/
If you have an architecture like:
example.com/results?p=2
Then use:
User-agent: rogerbot
Disallow: /*?p=
If you have an architecture like:
example.com/results?page=2
Then use:
User-agent: rogerbot
Disallow: /*?page=
That should pretty much stop Rogerbot from crawling paginated content. It would certainly stop Googlebot, but I don't know for sure whether Rogerbot respects the "*" wildcard the way Googlebot does. Give it a try and see what happens.
Don't worry: in robots.txt only "*" is treated as a wildcard, so the "?" won't cause any problems and there's no need for an escape character.
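Just to put it together, here's a minimal sketch assuming the ?page= parameter structure (swap in whichever pattern above matches your actual URLs); it also covers Googlebot, since you mentioned possibly blocking Google too:
User-agent: rogerbot
Disallow: /*?page=

User-agent: Googlebot
Disallow: /*?page=
It's worth testing the pattern against a few real URLs (Search Console's robots.txt Tester covers the Googlebot side) before rolling it out across 300k pages.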
-
Hi,
Let's separate the topics here:
- Preventing crawling is done via robots.txt, but it won't de-index pages that are already indexed.
- Preventing indexing, and de-indexing pages that are already indexed, is done with a robots meta tag with a noindex value.
Here's an article from Google about that: Block search indexing with 'noindex' - Google Search Console Help
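For reference, that can be applied either as a meta tag in the page's <head> or as an HTTP response header; a minimal sketch:
<meta name="robots" content="noindex">
or
X-Robots-Tag: noindex
Keep in mind that Google has to be able to crawl the page to see the noindex, so don't block those same URLs in robots.txt if the goal is to get them de-indexed.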
That said, another action you could take is adding nofollow to the pagination links. Nofollow only tells Google: "I don't want that page to be considered important." It will probably reduce those pages' chances of ranking high, but it won't prevent crawling or indexing.
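For example, a nofollowed pagination link looks something like this (the /results?page=2 URL is just a placeholder):
<a href="/results?page=2" rel="nofollow">Next page</a>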
Another way, though a little more expensive in development, is adding a specific parameter to the URL whenever it's a pagination page. Then you can block that parameter in robots.txt. Again, this won't remove what's already been indexed. Hope it helps.
Best of luck.
Gaston