Moz Crawler URL paramaters & duplicate content
-
Hi all, this is my first post on Moz Q&A
Questions:
- Does the Moz Crawler take into account rel="canonical" for search results pages with sorting / filtering URL parameters?
- How much time does it take for an issue to disappear from the issues list after it's been corrected? Does it come op in the next weekly report?
I'm asking because the crawler is reporting 50k+ pages crawled, when in reality, this number should be closer to 1000. All pages with query parameters have the correct canonical tag pointing to the root URL, so I'm wondering whether I need to noindex the other pages for the crawler to report correct data?:
Original (canonical URL): DOMAIN.COM/charters/search/mx/BS?search_location=cabo-san-lucas
Filter active URL: DOMAIN.COM/charters/search/mx/BS?search_location=cabo-san-lucas&booking_date=&booking_days=1&booking_persons=1&priceFilter%5B%5D=0%2C500&includedPriceFilter%5B%5D=drinks-soft
Also, if noindex is the only solution, will it impact the ranking of the pages involved?
Note: Google and Bing are semi-successful in reporting index page count, each reporting around 2.5k result pages when using the site:DOMAIN.com query. The rel canonical tag was missing for a short period of time about 4 weeks ago, but since fixing the issue these pages still haven't been deindexed.
Appreciate any suggestions regarding Moz Crawler & Google / Bing index count!
-
Happy to help!
We crawled roughly 49k pages because there were that many links on the site that we could find. 50k is also the new standard crawl limit for campaigns in Standard and Medium subscriptions. Adding a rel=canonical to a page doesn't mean it won't get crawled by our campaign crawler, only that the crawler is to refer to the canonicalized link for reporting purposes.
Without going into too specific of URL details, these pages are considered duplicates because their canonical tags point to different URLs. For example,
is considered a duplicate of
DOMAIN.COM/charters/search/mx/QR?booking_date=&booking_days=&booking_persons=limit%252525253D20
because the canonical tag for the first page is
DOMAIN.COM/charters/search/mx/QR?offset=20
while the canonical for the second URL is
DOMAIN.COM/charters/search/mx/QR
Since the canonical tags point to different pages it is assumed that DOMAIN.COM/charters/search/mx/QR?offset=20 and DOMAIN.COM/charters/search/mx/QR are likely to be duplicates themselves.
Here is how our system interprets duplicate content vs. rel=canonical:
Assuming A, B, C, and D are all duplicates,
If A references B as the canonical, then they are not considered duplicates
If A and B both reference C as canonical, A and B are not considered duplicates of each other
If A references C as a canonical, A and B are considered duplicated
If A references C as canonical, B references D, then A and B are considered duplicatesThe above example from your campaign actually falls into the fourth example I've listed above. Hope this helps clear things up
-
Thanks Sam!
I've read the post and checked my canonical tags but still can't seem to find what's causing the canonicalized pages to be indexed by RogerBot. The same page shows up in Moz's crawl test 100 times with slightly different parameters.
I'll keep investigating but some specific feedback from Moz staff would be appreciated
-
Hi!
I'm going to leave the strategy discussion open to the community but from a technical standpoint, we will count rel=canonical on dynamic urls as long as they are implemented correctly. Dr. Pete has a great post where he talks about canonicals that might be helpful as well. Updates to campaigns happen on a weekly basis depending on when the campaign was created. So if it was created on a Tuesday, you'll see updated campaign data every Tuesday after. You can run a crawl test (accessible from Research Tools) to get 3k page crawls in between your updates though. Hope this helps!
-
Thanks for the info searchbuzz. So if I understand correctly, new pages are crawled and kept in the index (up to the campaign limit), but issues on indexed pages are reported separately.
My issue is that due to the dynamic URLs used in search filters on my site I actually have 49k issues detected (over 95% are duplicate content and long URL issues because the crawler is indexing the same page many times for each URL parameter combination). The crawl test can't index the entire site because it generates a huge amount of pages.
It's a travel-related website with listings in 233 cities and multiple filter functionality, so each unique 'page' of results is indexed more than 100 times, even though there's a rel="canonical" tag pointing to the non-parametrized URL of that page.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Why page load time is different in google webmaster vs what is displayed in moz?
When I analyze the site through Moz tool and compare the results with google webmaster, I am not able to figure what why Moz does not report the slow pages. Fro example this page has an avrage LCP of 3.0 sec https://www.collegehippo.com/graduate-school/programs/gre-score-business-analytics-data-analytics When I see the report in moz, it does not point to any such issue. Should I be worried about what google reports and try to fix the page?
Moz Bar | | etattva0 -
Limit MOZ crawl rate on Shopify or when you don't have access to robots.txt
Hello. I'm wondering if there is a way to control the crawl rate of MOZ on our site. It is hosted on Shopify which does not allow any kind of control over the robots.txt file to add a rule like this: User-Agent: rogerbot Crawl-Delay: 5 Due to this, we get a lot of 430 error codes -mainly on our products- and this certainly would prevent MOZ from getting the full picture of our shop. Can we rely on MOZ's data when critical pages are not being crawled due to 430 errors? Is there any alternative to fix this? Thanks
Moz Bar | | AllAboutShapewear2 -
Moz is not showing traffic stats..;(
I have connected my analytics account in moz. But it is not showing my traffic stats.
Moz Bar | | daniyal_ahmed0 -
Page Title showing Twice when I check it through Moz Bar
Hello Experts, I have a brand new website. And I know I have to do lot of things and make changes for better SEO. With that in mind, I recently installed Mozbar. I was checking one of my posts the other day and noticed Page Title is appearing twice in Mozbar.Please see the screenshot below. Could you please guide me what could be causing this? FYI, I have installed Yoast SEO Plugin. Also, I can see "Alt Text" is picking lot many words which are from other images or posts...Is there any way to customize that as well? KhP9y
Moz Bar | | techierichard0 -
Moz Forum Responses Crashing Outlook 365
Don't know if anyone else has this issue every time I get a forum response email and I try to open it it seizes up Outlook 365. It totally crashes, all I can do is to kill the process and restart Outlook!!
Moz Bar | | seoman100 -
Crawl Diagnostics: How many pages (deep) will it crawl for dup content
Does anyone know how deep the crawl diagnostics will crawl when searching for dup content? Will it crawl the entire site, or will it only crawl "x" amount of pages? Thanks!
Moz Bar | | tdawson090 -
How to find all 301 redirect for URL xyz.com/products (internal and external)?
This is what we are thinking: Get all URL of the xyz.com/products using XENU software. Search those URL on google (site;xyz.com url ) to find out if they are crawled by google, do the same on bing (as currently google shows 4k URL and bing 11k ) Use opensiteexplorer (301 redirect ) and using (internal external) to get the desired result. Is this the right approach? If not, what is the best way to find the correct result? All suggestions are welcome.
Moz Bar | | tpt.com0 -
Beta Moz is losing my campaigns.
I've been trying to make the transition to Beta especially now that Pro has such serious problems with the style sheet. The thing is, all of the campaigns I have added to the Pro side don't show up in the Beta side. They'll show in Pro but that's all but unusable on my end and there's no Archive button visible on Beta. I tried adding a campaign again to Beta thinking it just isn't there, but it tells me it's already been added. Well then were is it? I just added them earlier today! I can see them on Pro but nowhere on Beta, nor the other 3 I have added on Pro in the last 2 weeks.
Moz Bar | | seowhat0