Moz Crawler URL parameters & duplicate content
-
Hi all, this is my first post on Moz Q&A
Questions:
- Does the Moz Crawler take into account rel="canonical" for search results pages with sorting / filtering URL parameters?
- How much time does it take for an issue to disappear from the issues list after it's been corrected? Does it come up in the next weekly report?
I'm asking because the crawler is reporting 50k+ pages crawled, when in reality this number should be closer to 1000. All pages with query parameters have the correct canonical tag pointing to the root URL, so I'm wondering whether I need to noindex the other pages for the crawler to report correct data. For example:
Original (canonical URL): DOMAIN.COM/charters/search/mx/BS?search_location=cabo-san-lucas
Filter active URL: DOMAIN.COM/charters/search/mx/BS?search_location=cabo-san-lucas&booking_date=&booking_days=1&booking_persons=1&priceFilter%5B%5D=0%2C500&includedPriceFilter%5B%5D=drinks-soft
Also, if noindex is the only solution, will it impact the ranking of the pages involved?
Note: Google and Bing are semi-successful in reporting the indexed page count, each reporting around 2.5k result pages when using the site:DOMAIN.com query. The rel canonical tag was missing for a short period of time about 4 weeks ago, but since fixing the issue these pages still haven't been deindexed.
Appreciate any suggestions regarding Moz Crawler & Google / Bing index count!
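To make the two URLs in the question concrete, here is a minimal Python sketch of how the filtered URL maps back to the canonical one. It assumes that `search_location` is the only parameter that changes which listings are shown and that every other parameter is a sort or filter option; the `https://` scheme, the `KEEP_PARAMS` set, and the `canonical_url` helper are illustrative names, not part of the site or of any Moz tool.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: only this parameter identifies a distinct results page.
KEEP_PARAMS = {"search_location"}


def canonical_url(url: str) -> str:
    """Drop every query parameter that is not in KEEP_PARAMS."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in KEEP_PARAMS
    ]
    # Rebuild the URL without a fragment and with only the kept parameters.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


filtered = (
    "https://DOMAIN.COM/charters/search/mx/BS?search_location=cabo-san-lucas"
    "&booking_date=&booking_days=1&booking_persons=1"
    "&priceFilter%5B%5D=0%2C500&includedPriceFilter%5B%5D=drinks-soft"
)
print(canonical_url(filtered))
```

Running a crawl export through a helper like this is one way to check that each parameterised URL and its rel="canonical" target agree.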
-
Happy to help!
We crawled roughly 49k pages because there were that many links on the site that we could find. 50k is also the new standard crawl limit for campaigns in Standard and Medium subscriptions. Adding a rel=canonical to a page doesn't mean it won't get crawled by our campaign crawler, only that the crawler is to refer to the canonicalized link for reporting purposes.
Without going too far into specific URL details, these pages are considered duplicates because their canonical tags point to different URLs. For example,
is considered a duplicate of
DOMAIN.COM/charters/search/mx/QR?booking_date=&booking_days=&booking_persons=limit%252525253D20
because the canonical tag for the first page is
DOMAIN.COM/charters/search/mx/QR?offset=20
while the canonical for the second URL is
DOMAIN.COM/charters/search/mx/QR
Since the canonical tags point to different pages it is assumed that DOMAIN.COM/charters/search/mx/QR?offset=20 and DOMAIN.COM/charters/search/mx/QR are likely to be duplicates themselves.
Here is how our system interprets duplicate content vs. rel=canonical:
Assuming A, B, C, and D are all duplicates,
If A references B as the canonical, then they are not considered duplicates
If A and B both reference C as canonical, A and B are not considered duplicates of each other
If A references C as canonical, A and B are considered duplicates
If A references C as canonical and B references D, then A and B are considered duplicates
The example from your campaign falls into the fourth case listed above. Hope this helps clear things up.
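The four cases above can be sketched in a few lines of Python. This is only one reading of the rules as written, not Moz's actual implementation: it assumes the pages already have duplicate content, and that a page without a canonical tag is treated as its own canonical. The function names and the `canonicals` mapping are hypothetical.

```python
def effective_canonical(url: str, canonicals: dict[str, str]) -> str:
    """Return the canonical target of a page, or the page itself if it has none."""
    return canonicals.get(url, url)


def flagged_as_duplicates(a: str, b: str, canonicals: dict[str, str]) -> bool:
    """Two content-duplicate pages are flagged unless they resolve to the same canonical."""
    return effective_canonical(a, canonicals) != effective_canonical(b, canonicals)


# Case 1: A references B as canonical.
print(flagged_as_duplicates("A", "B", {"A": "B"}))
# Case 2: A and B both reference C.
print(flagged_as_duplicates("A", "B", {"A": "C", "B": "C"}))
# Case 3: only A references C.
print(flagged_as_duplicates("A", "B", {"A": "C"}))
# Case 4: A references C, B references D.
print(flagged_as_duplicates("A", "B", {"A": "C", "B": "D"}))
```

Under this reading, the fix for the fourth case is to make both pages resolve to the same canonical URL.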
-
Thanks Sam!
I've read the post and checked my canonical tags but still can't seem to find what's causing the canonicalized pages to be indexed by RogerBot. The same page shows up in Moz's crawl test 100 times with slightly different parameters.
I'll keep investigating, but some specific feedback from Moz staff would be appreciated.
-
Hi!
I'm going to leave the strategy discussion open to the community, but from a technical standpoint, we will count rel=canonical on dynamic URLs as long as they are implemented correctly. Dr. Pete has a great post where he talks about canonicals that might be helpful as well. Updates to campaigns happen on a weekly basis, on the same weekday the campaign was created. So if it was created on a Tuesday, you'll see updated campaign data every Tuesday after. You can run a crawl test (accessible from Research Tools) to crawl up to 3k pages in between your updates, though. Hope this helps!
-
Thanks for the info searchbuzz. So if I understand correctly, new pages are crawled and kept in the index (up to the campaign limit), but issues on indexed pages are reported separately.
My issue is that, due to the dynamic URLs used in search filters on my site, I actually have 49k issues detected (over 95% are duplicate content and long URL issues, because the crawler is indexing the same page many times, once for each URL parameter combination). The crawl test can't index the entire site because the site generates a huge number of pages.
It's a travel-related website with listings in 233 cities and multiple filter functionality, so each unique 'page' of results is indexed more than 100 times, even though there's a rel="canonical" tag pointing to the non-parametrized URL of that page.