Block Moz (or any other robot) from crawling pages with specific URLs
-
Hello!
Moz reports that my site has around 380 pages with duplicate content. Most of them come from dynamically generated URLs that carry some specific parameters. I have sorted this out for Google in Webmaster Tools (the new Google Search Console) by blocking the pages with these parameters. However, Moz is still reporting the same number of duplicate content pages and, to stop it, I know I must use robots.txt. The catch is that I don't want to block every page, just the pages with specific parameters. I want to do this because among these 380 pages there are some other pages with no parameters (or different parameters) that I need to take care of. Basically, I need to clean up this list to be able to use the feature properly in the future.
I have read through Moz forums and found a few topics related to this, but there is no clear answer on how to block only pages with specific URLs. Therefore, I have done my research and come up with these lines for robots.txt:
User-agent: dotbot
Disallow: /*numberOfStars=0
User-agent: rogerbot
Disallow: /*numberOfStars=0
My questions:
1. Are the above lines correct, and would they block Moz (dotbot and rogerbot) from crawling only pages that have the numberOfStars=0 parameter in their URLs, leaving other pages intact?
2. Do I need an empty line between the two groups, that is, between "Disallow: /*numberOfStars=0" and "User-agent: rogerbot"? Or does it not matter?
I think this would help many people as there is no clear answer on how to block crawling only pages with specific URLs. Moreover, this should be valid for any robot out there.
Thank you for your help!
-
Hello!
Thanks a lot for your feedback and for clearing this up! It worked well.
The robots.txt tester is a good tip!
Thanks!
-
Hi,
What you have there will work absolutely fine with a little tweak, and there is no need to leave blank lines between the groups.
Disallow: /*numberOfStars=0
However, no need to add the wildcard at the end if there is nothing more after that.
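To see why the leading wildcard matters and a trailing one does not, here is a minimal Python sketch of how such a pattern is commonly matched. It assumes the widely used wildcard convention ("*" matches any run of characters, a trailing "$" anchors the end, and a rule otherwise matches as a prefix of the path plus query string); whether dotbot and rogerbot follow exactly this convention is an assumption here, and the sample URLs are made up.

```python
import re


def robots_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any run of characters, a trailing '$'
    anchors the end of the URL, everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(pattern: str, path_and_query: str) -> bool:
    """True if the rule matches from the start of the path (prefix match)."""
    return robots_pattern_to_regex(pattern).match(path_and_query) is not None


# Hypothetical URLs (path plus query string), for illustration only.
samples = [
    "/hotels?city=rome&numberOfStars=0",
    "/hotels?city=rome&numberOfStars=3",
    "/hotels?city=rome",
]

for rule in ("/*numberOfStars=0", "/numberOfStars=0"):
    for url in samples:
        print(rule, url, is_blocked(rule, url))
```

Because rules match as prefixes, "/*numberOfStars=0" behaves the same as "/*numberOfStars=0*". The same prefix behavior means it would also match a value such as numberOfStars=05, which is worth checking if your parameter can take multi-digit values.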
The best way to check what works is to test it before you add it to the live site: open the robots.txt test tool in Search Console (Webmaster Tools), add in the lines above, and then check to make sure none of your other pages are blocked. They won't be, but it's a great way to test before going live.
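If you also want a quick offline check against a list of your own URLs, something along these lines could work. This is only a rough sketch, not how the Search Console tester or Moz's crawlers are implemented: it assumes a new group starts at a User-agent line that follows a rule (so blank lines are ignored), it handles Disallow rules only, and the example.com URLs and all function names are hypothetical.

```python
import re
from urllib.parse import urlsplit

ROBOTS_TXT = """\
User-agent: dotbot
Disallow: /*numberOfStars=0
User-agent: rogerbot
Disallow: /*numberOfStars=0
"""


def parse_groups(text):
    """Return {lowercased user agent: [Disallow patterns]}.

    Assumption: a User-agent line that follows a rule starts a new group,
    and blank lines or comments never affect grouping.
    """
    groups = {}
    current_agents = []
    seen_rule = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        field, value = (s.strip() for s in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # previous group is complete, start a new one
                current_agents = []
                seen_rule = False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field == "disallow":
            seen_rule = True
            if value:  # an empty Disallow blocks nothing
                for agent in current_agents:
                    groups[agent].append(value)
    return groups


def pattern_matches(pattern, target):
    """Prefix match with '*' as wildcard and optional trailing '$' anchor."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(p) for p in core.split("*"))
    return re.match(regex + ("$" if anchored else ""), target) is not None


def is_blocked(groups, agent, url):
    """Check a full URL against the Disallow rules for one user agent."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    patterns = groups.get(agent.lower(), groups.get("*", []))
    return any(pattern_matches(p, target) for p in patterns)


groups = parse_groups(ROBOTS_TXT)
# Hypothetical URLs; replace with pages from your own crawl report.
for url in (
    "https://www.example.com/search?numberOfStars=0",
    "https://www.example.com/search?numberOfStars=4",
    "https://www.example.com/",
):
    print(url, is_blocked(groups, "rogerbot", url))
```

Feeding it the same file with an empty line inserted before "User-agent: rogerbot" produces the same groups under these assumptions, which matches the advice that the blank line is optional.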
I hope this helps
-Andy