Block Moz (or any other robot) from crawling pages with specific URLs
-
Hello!
Moz reports that my site has around 380 pages with duplicate content. Most of them come from dynamically generated URLs that carry specific parameters. I have sorted this out for Google in Webmaster Tools (now Google Search Console) by blocking the pages with these parameters. However, Moz is still reporting the same number of duplicate content pages and, to stop it, I know I must use robots.txt. The tricky part is that I don't want to block every page, just the pages with specific parameters. I want to do this because among these 380 pages there are some other pages with no parameters (or different parameters) that I need to take care of. Basically, I need to clean up this list to be able to use the feature properly in the future.
I have read through Moz forums and found a few topics related to this, but there is no clear answer on how to block only pages with specific URLs. Therefore, I have done my research and come up with these lines for robots.txt:
User-agent: dotbot
Disallow: /*numberOfStars=0
User-agent: rogerbot
Disallow: /*numberOfStars=0
My questions:
1. Are the above lines correct, and would they block Moz (dotbot and rogerbot) from crawling only pages that have the numberOfStars=0 parameter in their URLs, leaving other pages intact?
2. Do I need an empty line between the two groups (that is, between "Disallow: /*numberOfStars=0" and "User-agent: rogerbot"), or does it not matter?
I think this would help many people as there is no clear answer on how to block crawling only pages with specific URLs. Moreover, this should be valid for any robot out there.
Thank you for your help!
-
Hello!
Thanks a lot for your feedback and for clearing this up! It worked well.
The robots.txt tester is a good tip!
Thanks!
-
Hi,
What you have there will work absolutely fine with a little tweak, and there is no need to leave blank lines between the groups.
Disallow: /*numberOfStars=0
The leading wildcard is what lets the rule match the parameter anywhere in the URL. There is no need to add a wildcard at the end if nothing more comes after the parameter.
The best way to test what works, before you add it to the live site, is to use the robots.txt test tool in Search Console (Webmaster Tools): add the lines above and then check that none of your other pages are blocked. They won't be, but it's a great way to test before going live.
I hope this helps.
-Andy
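For a quick local check alongside the Search Console tester, here is a minimal Python sketch of how a wildcard rule like the one above is matched. It assumes the crawler follows the common wildcard conventions ('*' matches any run of characters, a trailing '$' anchors the end, and everything else is a prefix match from the start of the path); whether a given bot honours them should be confirmed in that bot's own documentation. The function name rule_matches and the /hotels sample URLs are made up for illustration.

```python
import re


def rule_matches(pattern: str, url_path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path + query string.

    Assumption: '*' matches any run of characters, a trailing '$' anchors the
    end of the URL, and matching otherwise starts at the beginning of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only matches at the start of the string, which gives prefix matching.
    return re.match(regex, url_path) is not None


RULE = "/*numberOfStars=0"

# Hypothetical URLs, for illustration only.
samples = [
    "/hotels?numberOfStars=0",
    "/hotels?city=rome&numberOfStars=0&page=2",
    "/hotels?numberOfStars=3",
    "/hotels",
    "/hotels?numberOfStars=05",
]

for path in samples:
    verdict = "blocked" if rule_matches(RULE, path) else "allowed"
    print(f"{path}: {verdict}")
```

Two things stand out under these assumptions. Without the leading wildcard, "/numberOfStars=0" would only match paths that literally begin with that text, so the parameter pages would stay crawlable. And because the match is a prefix match, the rule also catches values such as numberOfStars=05, which is worth keeping in mind if the parameter can take multi-digit values.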