Crawl diagnostic errors due to query string
-
I'm seeing a large number of duplicate page titles, duplicate content, missing meta descriptions, etc. in my Crawl Diagnostics report, caused by URLs' query strings. These pages already have canonical tags, but I know canonical tags aren't considered in Moz's crawl diagnostics and therefore won't reduce the number of reported errors. Is there any way to configure Moz not to treat query-string variants as unique URLs? It's difficult to find a legitimate error among hundreds of these non-errors.
-
I'm glad to hear you got this figured out - thank you Patrick for your help!
Kevin
Help Team -
Hi Kevin,
I understand how Moz's duplicate content system works. It would just be nice if it could take canonical URLs into consideration for Crawl Diagnostics reports, or give you the option of not counting URLs appended with parameters as unique pages.
Patrick was able to help me figure out that I can do the latter via the robots.txt feature by using a wildcard rule: Disallow: /*?.
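For anyone landing on this thread later, that wildcard rule, scoped to Moz's crawler (whose user-agent is rogerbot) so that search engines are unaffected, would look something like this in robots.txt:

```
# Only applies to Moz's crawler; Googlebot etc. still crawl these URLs.
User-agent: rogerbot
# Block any URL containing a "?" (i.e. any query string).
Disallow: /*?
```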
-
Hi there!
Our tool has a 90% tolerance for duplicate content, which means it will flag any pair of pages whose source code is at least 90% identical. This includes all of the source code on the page, not just the visible text. You can run your own tests using this tool: http://www.webconfs.com/similar-page-checker.php. In the case of http://www.optp.com/SMARTROLLER?cat_id=205#.VZreQhNVhBc and http://www.optp.com/SMARTROLLER?cat_id=54#.VZrdJhNVhBc, these pages are 100% similar, which is why they're being flagged.
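As a rough illustration of that kind of check, here's a short Python sketch using the standard library's difflib. The helper names, the sample markup, and the use of a character-level diff are all made up for the example; Moz's actual similarity algorithm isn't published.

```python
from difflib import SequenceMatcher

def similarity_ratio(html_a: str, html_b: str) -> float:
    """Score 0.0-1.0 similarity over the full page source, markup included."""
    return SequenceMatcher(None, html_a, html_b).ratio()

def is_flagged_duplicate(html_a: str, html_b: str, threshold: float = 0.90) -> bool:
    """Flag two pages whose source code is at least `threshold` similar."""
    return similarity_ratio(html_a, html_b) >= threshold

# Two query-string variants that differ only in a tiny fragment of markup:
page_a = "<html><body><h1>SMARTROLLER</h1><p>cat 205</p></body></html>"
page_b = "<html><body><h1>SMARTROLLER</h1><p>cat 54</p></body></html>"

print(is_flagged_duplicate(page_a, page_b))  # → True
```

Because the comparison runs over the entire source, two pages with heavy shared templates (headers, navigation, scripts) can cross the 90% line even when their visible content differs.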
I hope this helps! If you need any more help with your crawl, feel free to contact our Help Team at help@moz.com.
Thanks!
Kevin
Help Team -
This is very interesting! Strange that Webmaster Tools wouldn't display duplicate content but Google would still penalize you. I'd like to try this on my site, but I'm a little wary because I think some pages rank with the query-string version of the URL, despite a canonical being specified.
-
Is your traffic lower than expected?
I was having a similar issue where Moz was showing a lot more duplicate content than Webmaster Tools (Webmaster Tools actually showed none), yet I was still being penalized. I realized this when I added an exclusion to robots.txt for any query strings on my site; after I did, my rankings shot through the roof.
I'm not saying this is happening to you, but I like to err on the side of caution.
-
Hi there
My bad! Yeah - you could just do this:
User-agent: Rogerbot
Disallow: (check out this resource on how to block specific query strings)
Hope this helps! Good luck!
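In case that linked resource disappears, a Rogerbot-only rule set along those lines, using the item_id= and cat_id= parameters from the question (adjust the patterns to your own URLs), might look like:

```
User-agent: rogerbot
# Block only URLs whose query string contains these parameters.
Disallow: /*item_id=
Disallow: /*cat_id=
```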
-
Hi Patrick,
Thanks for the quick reply as always. As far as Google is concerned, these pages are set up correctly with canonical tags and URL parameters; Moz actually reports far more duplicate content than Webmaster Tools does.
My issue is just with the number of errors reported in Moz. You mentioned that I can handle this via the robots.txt file. Is there a way to disallow only Rogerbot from crawling URLs with query strings, or URLs that contain a certain phrase such as "item_id=" or "cat_id="?
-
Hi there
Check out Google's duplicate content resources - they provide help on how to categorize your parameters and URL strings.
You can also handle this via your robots.txt. Make sure that you have a canonical tag on that page as well.
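For reference, a canonical tag for the query-string variants discussed above would point each variant at the clean URL; the exact target below is an assumption based on the product page mentioned earlier in the thread:

```
<!-- In the <head> of /SMARTROLLER?cat_id=205 and /SMARTROLLER?cat_id=54 -->
<link rel="canonical" href="http://www.optp.com/SMARTROLLER" />
```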
Hope this helps! Good luck!