Block Moz (or any other robot) from crawling pages with specific URLs
-
Hello!
Moz reports that my site has around 380 pages with duplicate content. Most of them come from dynamically generated URLs that have some specific parameters. I have sorted this out for Google in Webmaster Tools (the new Google Search Console) by blocking the pages with these parameters. However, Moz is still reporting the same number of duplicate content pages and, to stop it, I know I must use robots.txt. The trick is that I don't want to block every page, just the pages with specific parameters. I want to do this because among these 380 pages there are some other pages with no parameters (or different parameters) that I need to take care of. Basically, I need to clean up this list to be able to use the feature properly in the future.
I have read through Moz forums and found a few topics related to this, but there is no clear answer on how to block only pages with specific URLs. Therefore, I have done my research and come up with these lines for robots.txt:
User-agent: dotbot
Disallow: /*numberOfStars=0

User-agent: rogerbot
Disallow: /*numberOfStars=0

My questions:
1. Are the above lines correct, and would they block Moz (dotbot and rogerbot) from crawling only pages that have the numberOfStars=0 parameter in their URLs, leaving other pages intact?
2. Do I need an empty line between the two groups, that is, between "Disallow: /*numberOfStars=0" and "User-agent: rogerbot"? Or does it not matter?
I think this would help many people, as there is no clear answer on how to block crawling of only the pages with specific URLs. Moreover, this should be valid for any robot out there.
Thank you for your help!
-
Hello!
Thanks a lot for your feedback and for clearing this up! It worked well.
The robots.txt tester is a good tip!
Thanks!
-
Hi,
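What you have there will work absolutely fine, and there is no need to leave empty lines between the groups.
Disallow: /*numberOfStars=0
Keep the leading wildcard: robots.txt rules match from the start of the URL path, so without it the rule would only match paths that begin with /numberOfStars=0. However, there is no need to add a wildcard at the end if there is nothing more after that.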
The best way to find out what works is to test before you add the rules to the live file: open the robots.txt test tool in Search Console (Webmaster Tools), add the lines above, and then check that none of your other pages are blocked. They won't be, but it's a great way to test before going live.
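For a quick offline check alongside the Search Console tester, the matching behaviour can be approximated in a few lines of Python. This is only a minimal sketch, assuming the common wildcard conventions ('*' matches any run of characters, a trailing '$' anchors the end of the URL) and ignoring Allow rules and rule precedence. The helper names rule_to_regex and is_blocked and the sample /hotels paths are hypothetical, chosen for illustration; real crawlers may differ in detail.

```python
import re


def rule_to_regex(rule_path: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the URL; everything else is matched literally.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape each literal piece, then rejoin the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in rule_path.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(url_path: str, disallow_rules: list[str]) -> bool:
    """Return True if any Disallow pattern matches from the start of the path."""
    return any(rule_to_regex(rule).match(url_path) for rule in disallow_rules)


# The rule from the question, checked against hypothetical example paths.
rules = ["/*numberOfStars=0"]
samples = [
    "/hotels?numberOfStars=0",
    "/hotels?city=rome&numberOfStars=0&page=2",
    "/hotels?numberOfStars=3",
    "/hotels",
]
for path in samples:
    print(f"{path} -> {'blocked' if is_blocked(path, rules) else 'allowed'}")
```

Under these assumptions the first two sample paths are reported as blocked and the last two as allowed. Because rules are prefix matches, the same rule would also match a value that merely starts with 0 (for example numberOfStars=05), which is worth keeping in mind if such values can occur.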
I hope this helps
-Andy