Moz Crawler URL parameters & duplicate content
-
Hi all, this is my first post on Moz Q&A
Questions:
- Does the Moz Crawler take into account rel="canonical" for search results pages with sorting / filtering URL parameters?
- How much time does it take for an issue to disappear from the issues list after it's been corrected? Does it come up in the next weekly report?
I'm asking because the crawler is reporting 50k+ pages crawled, when in reality this number should be closer to 1000. All pages with query parameters have the correct canonical tag pointing to the root URL, so I'm wondering whether I need to noindex the other pages for the crawler to report correct data:
Original (canonical URL): DOMAIN.COM/charters/search/mx/BS?search_location=cabo-san-lucas
Filter active URL: DOMAIN.COM/charters/search/mx/BS?search_location=cabo-san-lucas&booking_date=&booking_days=1&booking_persons=1&priceFilter%5B%5D=0%2C500&includedPriceFilter%5B%5D=drinks-soft
Also, if noindex is the only solution, will it impact the ranking of the pages involved?
Note: Google and Bing are semi-successful in reporting the indexed page count, each reporting around 2.5k result pages when using the site:DOMAIN.com query. The rel canonical tag was missing for a short period about 4 weeks ago, but since the issue was fixed these pages still haven't been deindexed.
Appreciate any suggestions regarding Moz Crawler & Google / Bing index count!
-
Happy to help!
We crawled roughly 49k pages because there were that many links on the site that we could find. 50k is also the new standard crawl limit for campaigns in Standard and Medium subscriptions. Adding a rel=canonical to a page doesn't mean it won't get crawled by our campaign crawler, only that the crawler is to refer to the canonicalized link for reporting purposes.
Without going into too much URL-specific detail, these pages are considered duplicates because their canonical tags point to different URLs. For example,
is considered a duplicate of
DOMAIN.COM/charters/search/mx/QR?booking_date=&booking_days=&booking_persons=limit%252525253D20
because the canonical tag for the first page is
DOMAIN.COM/charters/search/mx/QR?offset=20
while the canonical for the second URL is
DOMAIN.COM/charters/search/mx/QR
Since the canonical tags point to different pages, it is assumed that DOMAIN.COM/charters/search/mx/QR?offset=20 and DOMAIN.COM/charters/search/mx/QR are likely to be duplicates themselves.
Here is how our system interprets duplicate content vs. rel=canonical:
Assuming A, B, C, and D are all duplicates,
If A references B as the canonical, then they are not considered duplicates
If A and B both reference C as canonical, A and B are not considered duplicates of each other
If A references C as canonical, A and B are considered duplicates
If A references C as canonical and B references D, then A and B are considered duplicates
The above example from your campaign actually falls into the fourth case I've listed above. Hope this helps clear things up.
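For anyone who wants to reason about the four cases above, here is a minimal Python sketch of one way to read them. It is only an interpretation of the rules as listed, not Moz's actual implementation: the function names are hypothetical, and it assumes a page with no canonical tag is treated as pointing at itself.

```python
from typing import Optional


def canonical_target(url: str, canonical: Optional[str]) -> str:
    """Return the URL a page resolves to for reporting purposes.

    Assumption: a page without a canonical tag counts as its own canonical.
    """
    return canonical if canonical else url


def flagged_as_duplicates(
    a: str, a_canonical: Optional[str], b: str, b_canonical: Optional[str]
) -> bool:
    """Two pages with duplicate content are flagged unless both resolve
    to the same canonical target."""
    return canonical_target(a, a_canonical) != canonical_target(b, b_canonical)


# The four cases from the list, with A, B, C, D standing in for URLs.
case_1 = flagged_as_duplicates("A", "B", "B", None)   # A references B
case_2 = flagged_as_duplicates("A", "C", "B", "C")    # A and B both reference C
case_3 = flagged_as_duplicates("A", "C", "B", None)   # only A references C
case_4 = flagged_as_duplicates("A", "C", "B", "D")    # A references C, B references D
print(case_1, case_2, case_3, case_4)
```

Under this reading, only pages whose canonical tags resolve to the same URL escape the duplicate flag, which is why two canonicals pointing at different URLs still get reported.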
-
Thanks Sam!
I've read the post and checked my canonical tags but still can't seem to find what's causing the canonicalized pages to be indexed by RogerBot. The same page shows up in Moz's crawl test 100 times with slightly different parameters.
I'll keep investigating but some specific feedback from Moz staff would be appreciated
-
Hi!
I'm going to leave the strategy discussion open to the community, but from a technical standpoint, we will count rel=canonical on dynamic URLs as long as they are implemented correctly. Dr. Pete has a great post where he talks about canonicals that might be helpful as well. Updates to campaigns happen on a weekly basis, on the day of the week the campaign was created. So if it was created on a Tuesday, you'll see updated campaign data every Tuesday after. You can run a crawl test (accessible from Research Tools) to get 3k page crawls in between your updates though. Hope this helps!
-
Thanks for the info searchbuzz. So if I understand correctly, new pages are crawled and kept in the index (up to the campaign limit), but issues on indexed pages are reported separately.
My issue is that, due to the dynamic URLs used in search filters on my site, I actually have 49k issues detected (over 95% are duplicate content and long URL issues, because the crawler is indexing the same page many times, once for each URL parameter combination). The crawl test can't index the entire site because it generates a huge number of pages.
It's a travel-related website with listings in 233 cities and multiple filter functionality, so each unique 'page' of results is indexed more than 100 times, even though there's a rel="canonical" tag pointing to the non-parametrized URL of that page.
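To check how many distinct result pages sit behind a list of crawled URLs, something like the following rough Python sketch could be run against a crawl export. It assumes that `search_location` is the only parameter that belongs in the canonical URL (as in the example URLs earlier in the thread) and that everything else is a filter; the `https://` scheme, the `KEEP_PARAMS` set, and the function names are all hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: only these parameters are part of the canonical URL.
KEEP_PARAMS = {"search_location"}


def strip_filter_params(url: str) -> str:
    """Drop every query parameter that is not in KEEP_PARAMS."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in KEEP_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def count_by_canonical(urls):
    """Count how many crawled URLs collapse onto each canonical URL."""
    return Counter(strip_filter_params(url) for url in urls)


crawled = [
    "https://DOMAIN.COM/charters/search/mx/BS?search_location=cabo-san-lucas",
    "https://DOMAIN.COM/charters/search/mx/BS?search_location=cabo-san-lucas"
    "&booking_date=&booking_days=1&booking_persons=1"
    "&priceFilter%5B%5D=0%2C500&includedPriceFilter%5B%5D=drinks-soft",
]
counts = count_by_canonical(crawled)
print(len(counts), "distinct page(s) from", len(crawled), "crawled URLs")
```

Comparing the output of `strip_filter_params` with the rel="canonical" value actually served on each crawled URL is a quick way to spot pages whose canonical tag does not point where it is expected to.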