Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Block Moz (or any other robot) from crawling pages with specific URLs
-
Hello!
Moz reports that my site has around 380 pages with duplicate content. Most of them come from dynamically generated URLs that carry specific parameters. I have sorted this out for Google in Webmaster Tools (now Google Search Console) by blocking the pages with these parameters. However, Moz still reports the same number of duplicate content pages, and I know I must use robots.txt to stop it. The catch is that I don't want to block every page, only the pages with specific parameters. Among these 380 pages there are some with no parameters (or different parameters) that I still need to take care of. Basically, I need to clean up this list so I can use the feature properly in the future.
I have read through Moz forums and found a few topics related to this, but there is no clear answer on how to block only pages with specific URLs. Therefore, I have done my research and come up with these lines for robots.txt:
User-agent: dotbot
Disallow: /*numberOfStars=0

User-agent: rogerbot
Disallow: /*numberOfStars=0

My questions:
1. Are the above lines correct, and will they block Moz (dotbot and rogerbot) from crawling only pages that have the numberOfStars=0 parameter in their URLs, leaving other pages intact?
2. Do I need an empty line between the two groups (that is, between "Disallow: /*numberOfStars=0" and "User-agent: rogerbot"), or does it not matter?
I think this would help many people as there is no clear answer on how to block crawling only pages with specific URLs. Moreover, this should be valid for any robot out there.
Thank you for your help!
-
Hello!
Thanks a lot for your feedback and for clearing this up! It worked well.
The robots.txt tester is a good tip!
Thanks!
-
Hi,
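What you have there will work absolutely fine with a little tweak, and there is no need to leave blank lines between the groups.
Disallow: /*numberOfStars=0*
However, there is no need to add the wildcard at the end if nothing comes after that parameter in the URL. The wildcard at the start has to stay, otherwise the rule only matches URLs whose path begins with "numberOfStars=0".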
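To make the wildcard behaviour concrete, here is a minimal Python sketch of how a Disallow pattern is usually matched against a URL. It assumes the crawler follows the common convention where "*" matches any run of characters, a trailing "$" anchors the end, and everything else is matched literally from the start of the path. The function names and the /hotels URLs are made up for illustration; this is not Moz's actual matching code.

```python
import re


def rule_to_regex(rule: str) -> "re.Pattern[str]":
    """Translate a robots.txt path rule into a compiled regex.

    Assumed convention: '*' matches any run of characters and a
    trailing '$' anchors the end of the URL. All other characters
    are literal, and matching starts at the beginning of the path.
    """
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape each literal piece, then join the pieces with '.*'
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_blocked(rule: str, path_and_query: str) -> bool:
    """Return True if the rule matches the path (plus query string)."""
    return rule_to_regex(rule).match(path_and_query) is not None


# Hypothetical URLs, for illustration only
with_param = "/hotels?city=rome&numberOfStars=0"
without_param = "/hotels?city=rome"

print(is_blocked("/*numberOfStars=0", with_param))     # leading wildcard
print(is_blocked("/*numberOfStars=0", without_param))  # other pages untouched
print(is_blocked("/numberOfStars=0", with_param))      # no leading wildcard
print(is_blocked("/*numberOfStars=0*", with_param))    # trailing wildcard too
```

Under these assumptions the rule with the leading wildcard matches the parameter anywhere in the URL, while the same rule without it only matches paths that literally start with /numberOfStars=0. The trailing wildcard changes nothing, because a rule already matches as a prefix. One side effect worth knowing: the rule is a plain substring match, so it would also catch a URL containing numberOfStars=05.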
The best way to test what works before you go live is to use the robots.txt testing tool in Search Console (Webmaster Tools): add the lines above and check that none of your other pages are blocked. They won't be, but it's a great way to test before going live.
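If you would rather run a similar check locally against a list of your own URLs, a rough Python sketch could look like this. It is a simplified stand-in, not the Search Console tool: it only reads User-agent and Disallow lines, ignores Allow rules, the "*" fallback group and longest-match precedence, and assumes the usual "*" and "$" wildcard convention. The function names and example.com URLs are hypothetical.

```python
import re
from urllib.parse import urlsplit

ROBOTS_TXT = """\
User-agent: dotbot
Disallow: /*numberOfStars=0
User-agent: rogerbot
Disallow: /*numberOfStars=0
"""


def parse_groups(text):
    """Return {user_agent: [disallow rules]}; blank lines are skipped."""
    groups = {}
    current_agents = []
    last_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            # A User-agent line after a rule starts a new group
            if not last_was_agent:
                current_agents = []
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
            last_was_agent = True
        else:
            if field == "disallow" and value:
                for agent in current_agents:
                    groups[agent].append(value)
            last_was_agent = False
    return groups


def rule_matches(rule, target):
    """Assumed convention: '*' is a wildcard, trailing '$' anchors the end."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.match(pattern + ("$" if anchored else ""), target) is not None


def blocked_urls(robots_text, agent, urls):
    """Return the URLs from `urls` that the agent's Disallow rules match."""
    rules = parse_groups(robots_text).get(agent.lower(), [])
    blocked = []
    for url in urls:
        parts = urlsplit(url)
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        if any(rule_matches(rule, target) for rule in rules):
            blocked.append(url)
    return blocked


# Hypothetical URLs, for illustration only
urls = [
    "https://example.com/hotels?city=rome&numberOfStars=0",
    "https://example.com/hotels?city=rome",
    "https://example.com/",
]
print(blocked_urls(ROBOTS_TXT, "rogerbot", urls))
```

Because the parser starts a new group at each User-agent line that follows a rule, the result is the same whether or not a blank line separates the two groups, which matches the advice above.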
I hope this helps
-Andy