Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Block Moz (or any other robot) from crawling pages with specific URLs
-
Hello!
Moz reports that my site has around 380 pages with duplicate content. Most of them come from dynamically generated URLs that have some specific parameters. I have sorted this out for Google in Webmaster Tools (the new Google Search Console) by blocking the pages with these parameters. However, Moz is still reporting the same number of duplicate content pages and, to stop it, I know I must use robots.txt. The catch is that I don't want to block every page, just the pages with specific parameters. I want to do this because among these 380 pages there are some other pages with no parameters (or different parameters) that I need to take care of. Basically, I need to clean up this list so I can use the feature properly in the future.
I have read through the Moz forums and found a few related topics, but there is no clear answer on how to block only pages with specific URL parameters. Therefore, I have done my research and come up with these lines for robots.txt:
User-agent: dotbot
Disallow: /*numberOfStars=0
User-agent: rogerbot
Disallow: /*numberOfStars=0
My questions:
1. Are the above lines correct, and would they block Moz (dotbot and rogerbot) from crawling only the pages that have the numberOfStars=0 parameter in their URLs, leaving other pages intact?
2. Do I need an empty line between the two groups (i.e. between "Disallow: /*numberOfStars=0" and "User-agent: rogerbot"), or does it not matter?
I think this would help many people as there is no clear answer on how to block crawling only pages with specific URLs. Moreover, this should be valid for any robot out there.
Thank you for your help!
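Question 2 comes down to how a crawler splits the file into groups. Under RFC 9309 (the robots.txt standard Google follows), a group starts with one or more User-agent lines and runs until the next User-agent line that follows a rule; blank lines are allowed anywhere and do not end a group. The sketch below is a simplified parser written for illustration only (parse_groups is a hypothetical helper, not Moz's or Google's code), and it assumes Moz's crawlers follow the same grammar, which the thread doesn't confirm. Older parsers based on the 1994 convention treated blank lines as record separators, so leaving one in is harmless either way.

```python
def parse_groups(robots_txt: str) -> list:
    """Split robots.txt text into (user_agents, rules) groups, RFC 9309 style.

    Simplified sketch: only User-agent, Allow and Disallow lines are kept;
    comments, other directives and malformed lines are skipped.
    """
    groups = []
    seen_rule = False  # True once the current group has at least one rule
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if ":" not in line:
            continue  # blank lines (and junk) are skipped; they never end a group
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if seen_rule or not groups:
                groups.append(([], []))  # a User-agent after a rule opens a new group
                seen_rule = False
            groups[-1][0].append(value.lower())
        elif key in ("allow", "disallow") and groups:
            groups[-1][1].append((key, value))
            seen_rule = True
    return groups


without_blank = (
    "User-agent: dotbot\n"
    "Disallow: /*numberOfStars=0\n"
    "User-agent: rogerbot\n"
    "Disallow: /*numberOfStars=0\n"
)
with_blank = without_blank.replace("=0\nUser", "=0\n\nUser")
# Both user agents can also share a single group
combined = (
    "User-agent: dotbot\n"
    "User-agent: rogerbot\n"
    "Disallow: /*numberOfStars=0\n"
)

print(parse_groups(without_blank) == parse_groups(with_blank))  # True
print(parse_groups(combined))  # [(['dotbot', 'rogerbot'], [('disallow', '/*numberOfStars=0')])]
```

Under this grammar the file parses to the same two groups with or without the blank line, and listing both User-agent lines above a single Disallow is an equivalent, shorter form.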
-
Hello!
Thanks a lot for your feedback and for clearing this up! It worked well.
The robots.txt tester is a good tip!
Thanks!
-
Hi,
What you have there will work fine with a little tweak, and there's no need to leave blank lines between the groups.
Disallow: /*numberOfStars=0
However, there's no need to add a wildcard at the end if nothing comes after it: robots.txt rules already match any URL that begins with the pattern, so a trailing * is redundant.
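To see why the leading wildcard matters and a trailing one doesn't, here is a minimal sketch of Google-style rule matching: * matches any run of characters, a final $ anchors the end of the URL, and otherwise a rule only has to match the beginning of the path plus query string. The rule_matches function and the sample URLs are hypothetical, for illustration only; real crawlers also normalise percent-encoding, which this sketch skips, and it assumes Moz's crawlers interpret wildcards the same way Google does.

```python
import re


def rule_matches(pattern: str, path_and_query: str) -> bool:
    """Sketch of Google-style robots.txt matching for one Allow/Disallow value."""
    anchored = pattern.endswith("$")  # a trailing '$' pins the match to the URL's end
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then turn each '*' into '.*'
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        return re.fullmatch(regex, path_and_query) is not None
    # Unanchored rules are prefix matches: re.match only anchors at the start
    return re.match(regex, path_and_query) is not None


# Hypothetical URLs for illustration only
urls = [
    "/hotels?numberOfStars=0",
    "/hotels?city=rome&numberOfStars=0",
    "/hotels?numberOfStars=3",
    "/hotels",
]
for url in urls:
    # prints True for the first two URLs and False for the last two
    print(url, rule_matches("/*numberOfStars=0", url))
```

Without the leading *, a rule of /numberOfStars=0 only matches URLs whose path starts with /numberOfStars=0, so rule_matches("/numberOfStars=0", "/hotels?numberOfStars=0") is False. Adding a trailing * gives the same result as leaving it off, because matching is already by prefix.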
The best way to test this is to use the robots.txt Tester in Search Console (Webmaster Tools) before you add the rules to your live site: paste in the lines above, then check that none of your other pages are blocked. They won't be, but it's a great way to test before going live.
I hope this helps

-Andy
Related Questions
-
Moz-Specific 404 Errors Jumped with URLs that don't exist
Hello, I'm going to try to be as specific as possible about this weird issue, but I'd rather not share specific information about the site unless you think it's pertinent. To summarize, we have a website owned by a company that is a division of another company. For reference, we'll say that OURSITE.com is owned by COMPANY1, which is owned by AGENCY1.
This morning, we got about 7,000 new errors in Moz only (these errors are not in Search Console) for URLs with the company name or the agency name at the end. So, let's say one post is:
OURSITE.com/the-article/
This morning we have errors in Moz for these URLs:
OURSITE.com/the-article/COMPANY1
OURSITE.com/the-article/AGENCY1
and the same again for all 7,000+ articles we have created. Every single post ever created now shows as an error in Moz because of these two URL additions that seem to come out of nowhere. These URLs are not in our sitemaps and they are not in Google. They simply don't exist, and yet Moz reported errors for them, unless they exist and I don't see them. Obviously there's a link to each company and agency site in the About Us section of the site, but that's it.
Moz Pro | | CJolicoeur0 -
Source page shows I have 2 h1 tags on my page. I can only find one.
When I grade my page, it says I have more than one h1 tag. When I view the page source, it shows two h1 headings with the same wording. If I delete the one h1 heading I can find, the page source shows that I have deleted both of them. I don't know how to get to the other heading to delete it. And I'm off page one of Google! Can anybody help? Clay Stephens
Moz Pro | | Coot0 -
Robots.txt blocking Moz
Moz are reporting that the robots.txt file is blocking them from crawling one of our websites. But as far as we can see, this file is exactly the same as the robots.txt files on other websites that Moz crawls without problems. We have never come up against this before, even with this site. Our stats show Rogerbot attempting to crawl our site, but it receives a 404 error. Can anyone enlighten us as to the problem, please? http://www.wychwoodflooring.com -Christina
Moz Pro | | ChristinaRadisic0 -
Pages with URL Too Long
Hello Mozzers! Moz keeps kindly telling me that our URLs are too long. However, this is largely due to the structure of our e-commerce site, which has to include the 'brand', 'range' and 'products' keywords. For example:
https://www.choicefurnituresuperstore.co.uk/Devonshire-Rustic-Oak-Bedside-Cabinet-1-Drawer-p40668.html
Moz recommends no more than 75 characters. This means we have 25-30 characters for both the brand name and the product name. Questions:
If it is an issue, how do I fix it on my site?
If it's not an issue, how can we turn off this alert in Moz?
Does anyone know how big a ranking factor URL length is? I thought it was pretty low.
Moz Pro | | tigersohelll
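The character budget in this question can be checked with a few lines of Python. This is a sketch that assumes Moz's 75-character limit is measured against the full URL string, scheme and domain included; the post doesn't say exactly how Moz counts, and URL_LIMIT is a name introduced here for illustration.

```python
# Sketch: check the 75-character arithmetic for the example URL.
# ASSUMPTION: the limit applies to the full URL string, scheme and domain
# included; the post doesn't say exactly how Moz measures length.
URL_LIMIT = 75

prefix = "https://www.choicefurnituresuperstore.co.uk/"
url = prefix + "Devonshire-Rustic-Oak-Bedside-Cabinet-1-Drawer-p40668.html"

print(len(url))                 # 102: total length of the example URL
print(len(prefix))              # 44: spent before the path even starts
print(URL_LIMIT - len(prefix))  # 31: left for the entire path
```

On that assumption, the scheme and domain alone use 44 of the 75 characters, leaving 31 for the whole path, including the -p40668.html product ID and extension.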
404 Crawl Diagnostics with void(0) appended to URL
Hello, I am getting loads of 404 errors reported in my crawl report, all with void(0) appended to the URL. For example: http://lfs.org.uk/films-and-filmmakers/watch-our-films/1289/void(0)
The site is running on Drupal 7. Has anyone come across this before? Kind regards, Moshe
Moz Pro | | moshen
Special Characters in URL & Google Search Engine (Index & Crawl)
G'day everyone, I need help understanding how special characters (e.g. é, ë, ô in words) impact SEO. Does anyone have good insights or reference material on the following?
How Google Search treats special characters
How page titles and meta descriptions with special characters are indexed and crawled
Best practices for URLs: when and where to use Unicode or HTML entity references
Any disadvantages of using special characters
Whether special characters in URLs have any impact on SEO performance and on users' search experience
Thanks heaps, Amy
Moz Pro | | LabeliumUSA0 -
How you can manipulate your MOZ DA
I have become frustrated with Moz over the last few months: none of my backlinks, including old ones, have made it into the index. Long story short, I figured out the issue, and I figured out how anyone can manipulate their DA. I wrote a blog post about it here: http://blog.dh42.com/manipulate-moz/
Moz Pro | | LesleyPaone1 -
How long does a crawl take?
A crawl of my site started on 8th July and is still going on. Is there something wrong?
Moz Pro | | Brian_Worger1