Crawl diagnostic errors due to query string
-
I'm seeing a large number of duplicate page titles, duplicate content, missing meta descriptions, and similar issues in my Crawl Diagnostics report, all caused by URLs' query strings. These pages already have canonical tags, but I know canonical tags aren't considered in Moz's crawl diagnostics and therefore won't reduce the number of reported errors. Is there any way to configure Moz not to treat query string variants as unique URLs? It's difficult to find a legitimate error among hundreds of these non-errors.
-
I'm glad to hear you got this figured out - thank you, Patrick, for your help!
Kevin
Help Team -
Hi Kevin,
I understand how Moz's duplicate content system works. It would just be nice if it could take canonical URLs into consideration for Crawl Diagnostics reports, or give you the option of not counting URLs appended with parameters as unique pages.
Patrick was able to help me figure out that I can do the latter via the robots.txt feature by using a wildcard: Disallow: *?.
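For anyone else hitting the same issue, a minimal sketch of that rule (assuming rogerbot honors Google-style wildcard matching, and that the goal is to keep only Moz's crawler away from parameterized URLs while leaving search engine bots untouched) might look like:
User-agent: rogerbot
Disallow: /*?
Because the rule is scoped to the rogerbot user-agent, Googlebot and other crawlers still see the canonical and parameterized URLs exactly as before.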
-
Hi there!
Our tool uses a 90% similarity threshold for duplicate content, which means it will flag pages that share 90% or more of the same code. This includes all of the source code on the page, not just the viewable text. You can run your own tests using this tool: http://www.webconfs.com/similar-page-checker.php. In the case of http://www.optp.com/SMARTROLLER?cat_id=205#.VZreQhNVhBc and http://www.optp.com/SMARTROLLER?cat_id=54#.VZrdJhNVhBc, these pages are 100% similar, which is why they're being flagged.
I hope this helps! If you need any more help with your crawl, feel free to contact our Help Team at help@moz.com.
Thanks!
Kevin
Help Team -
This is very interesting! Strange that Webmaster Tools wouldn't display duplicate content but Google would still penalize you. I'd like to try this on my site, but I'm a little wary because I think some pages rank with the query string version of the URL, despite a canonical being specified.
-
Is your traffic lower than expected?
I was having an issue like this where Moz was showing a lot more duplicate content than Webmaster Tools (in fact, Webmaster Tools showed none), yet I was being penalized. I only realized it after I added an exclusion to robots.txt to block any query strings on my site; once I did, my rankings shot through the roof.
I'm not saying this is what's happening to you, but I like to err on the side of caution.
-
Hi there
My bad! Yeah - you could just do this:
User-agent: Rogerbot
Disallow: (check out this resource on how to block specific query strings)
Hope this helps! Good luck!
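For illustration only (the linked resource is the authoritative reference), rules that keep rogerbot away from specific parameters, such as the item_id and cat_id parameters mentioned elsewhere in this thread, could look something like this, assuming Google-style wildcard support:
User-agent: rogerbot
Disallow: /*item_id=
Disallow: /*cat_id=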
-
Hi Patrick,
Thanks for the quick reply as always. As far as Google is concerned, these pages are set up correctly with canonical tags and URL strings - Moz actually reports far more duplicate content than Webmaster Tools does.
My issue is just with the number of errors reported in Moz. You mentioned that I can handle this via the robots.txt file - is there a way to disallow only Rogerbot from crawling URLs with query strings, or URLs that contain a certain phrase such as "item_id=" or "cat_id="?
-
Hi there
Check out Google's duplicate content resources - they provide help on how to categorize your parameters and URL strings.
You can also handle this via your robots.txt. Make sure that you have a canonical tag on that page as well.
Hope this helps! Good luck!
Related Questions
-
Server blocking crawl bot due to DOS protection and Moz Help Team not responding
First of all, has anyone else not received a response from the Help Team? I've sent four emails, the oldest of which is a month old, and one of our most-used features, the Moz On-Demand Crawl for finding broken links, doesn't work. It's really frustrating not to get a response when we're paying so much a month for a feature that doesn't work. OK, rant over; now onto the actual issue. On our crawls we're just getting 429 errors because our server has DOS protection and is blocking Moz's robot. I'm sure it will be as easy as whitelisting the robot's IP, but I can't get a response from Moz with the IP. Cheers, Fergus
Feature Requests | | JamesDavison0 -
Why is my site not being crawled?
The error in my dashboard: "Moz was unable to crawl your site on Jul 23, 2020. Our crawler was banned by a page on your site, either through your robots.txt, the X-Robots-Tag HTTP header, or the meta robots tag. Update these tags to allow your page and the rest of your site to be crawled. If this error is found on any page on your site, it prevents our crawler (and some search engines) from crawling the rest of your site. Typically errors like this should be investigated and fixed by the site webmaster." I think I need to edit robots.txt - how do I fix that? (See the sketch after this entry.)
Feature Requests | | alixxf0 -
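A minimal sketch of a robots.txt that does not ban Moz's crawler, assuming the block came from robots.txt rather than an X-Robots-Tag header or a meta robots tag, might be:
User-agent: rogerbot
Disallow:
An empty Disallow value permits everything for that user-agent; any X-Robots-Tag headers or meta robots tags that block the crawler would still need to be removed separately.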
Access all crawl tests
How can I see all the crawl tests run in the history of the account? Also, can I have them sent to an email address that isn't the primary one on the account? Please advise, as I need this historical data ASAP.
Feature Requests | | Brafton-Marketing0 -
Is there a way to take notes on a crawled URL?
I'm trying to figure out the best way to keep track of the different things I've done to work on a page (for example, adding a longer description, changing h2 wording, or adding a canonical URL). Is there a way to take notes for crawled URLs? If not, what do you use to accomplish this?
Feature Requests | | TouchdownTech0 -
Moz Site Crawl - Ignore functionality question
Quick question about the Ignore feature found in the Moz Site Crawl. We've made some changes to pages containing errors found by the Moz Site Crawl. These changes should have resolved the issues, but we're not sure about the Ignore feature and don't want to use it without first understanding what will happen when we use it. Will it clear the item from the current list until the next Site Crawl takes place, and relist the error if Roger finds the issue again? Or will it clear the item from the list permanently, regardless of whether it has been properly corrected?
Feature Requests | | StickyLife1 -
Crawl test limitation - ways to make it work for large sites?
Hello, I have a large site (120,000+ pages) and the crawl test is limited to 3,000 pages. I want to know if there is a way to make the most of the crawl test on a site of this size. Can I use a regular expression, for example? Thanks!
Feature Requests | | CamiRojasE0 -
High Priority - Error Code 804: HTTPS (SSL) Error Encountered
I know that Rogerbot has an issue with the SSL/SNI technology used by our Amazon CloudFront setup, even though most browsers support it. Is there a timeframe for this to be fixed? If not, is there a specific set of instructions or any way for us to keep this from showing up on our High Priority list? Thanks! Sean
Feature Requests | | zspace0