Moz Crawler URL paramaters & duplicate content
-
Hi all, this is my first post on Moz Q&A
Questions:
- Does the Moz Crawler take into account rel="canonical" for search results pages with sorting / filtering URL parameters?
- How much time does it take for an issue to disappear from the issues list after it's been corrected? Does it come op in the next weekly report?
I'm asking because the crawler is reporting 50k+ pages crawled, when in reality, this number should be closer to 1000. All pages with query parameters have the correct canonical tag pointing to the root URL, so I'm wondering whether I need to noindex the other pages for the crawler to report correct data?:
Original (canonical URL): DOMAIN.COM/charters/search/mx/BS?search_location=cabo-san-lucas
Filter active URL: DOMAIN.COM/charters/search/mx/BS?search_location=cabo-san-lucas&booking_date=&booking_days=1&booking_persons=1&priceFilter%5B%5D=0%2C500&includedPriceFilter%5B%5D=drinks-soft
Also, if noindex is the only solution, will it impact the ranking of the pages involved?
Note: Google and Bing are semi-successful in reporting index page count, each reporting around 2.5k result pages when using the site:DOMAIN.com query. The rel canonical tag was missing for a short period of time about 4 weeks ago, but since fixing the issue these pages still haven't been deindexed.
Appreciate any suggestions regarding Moz Crawler & Google / Bing index count!
-
Happy to help!
We crawled roughly 49k pages because there were that many links on the site that we could find. 50k is also the new standard crawl limit for campaigns in Standard and Medium subscriptions. Adding a rel=canonical to a page doesn't mean it won't get crawled by our campaign crawler, only that the crawler is to refer to the canonicalized link for reporting purposes.
Without going into too specific of URL details, these pages are considered duplicates because their canonical tags point to different URLs. For example,
is considered a duplicate of
DOMAIN.COM/charters/search/mx/QR?booking_date=&booking_days=&booking_persons=limit%252525253D20
because the canonical tag for the first page is
DOMAIN.COM/charters/search/mx/QR?offset=20
while the canonical for the second URL is
DOMAIN.COM/charters/search/mx/QR
Since the canonical tags point to different pages it is assumed that DOMAIN.COM/charters/search/mx/QR?offset=20 and DOMAIN.COM/charters/search/mx/QR are likely to be duplicates themselves.
Here is how our system interprets duplicate content vs. rel=canonical:
Assuming A, B, C, and D are all duplicates,
If A references B as the canonical, then they are not considered duplicates
If A and B both reference C as canonical, A and B are not considered duplicates of each other
If A references C as a canonical, A and B are considered duplicated
If A references C as canonical, B references D, then A and B are considered duplicatesThe above example from your campaign actually falls into the fourth example I've listed above. Hope this helps clear things up
-
Thanks Sam!
I've read the post and checked my canonical tags but still can't seem to find what's causing the canonicalized pages to be indexed by RogerBot. The same page shows up in Moz's crawl test 100 times with slightly different parameters.
I'll keep investigating but some specific feedback from Moz staff would be appreciated
-
Hi!
I'm going to leave the strategy discussion open to the community but from a technical standpoint, we will count rel=canonical on dynamic urls as long as they are implemented correctly. Dr. Pete has a great post where he talks about canonicals that might be helpful as well. Updates to campaigns happen on a weekly basis depending on when the campaign was created. So if it was created on a Tuesday, you'll see updated campaign data every Tuesday after. You can run a crawl test (accessible from Research Tools) to get 3k page crawls in between your updates though. Hope this helps!
-
Thanks for the info searchbuzz. So if I understand correctly, new pages are crawled and kept in the index (up to the campaign limit), but issues on indexed pages are reported separately.
My issue is that due to the dynamic URLs used in search filters on my site I actually have 49k issues detected (over 95% are duplicate content and long URL issues because the crawler is indexing the same page many times for each URL parameter combination). The crawl test can't index the entire site because it generates a huge amount of pages.
It's a travel-related website with listings in 233 cities and multiple filter functionality, so each unique 'page' of results is indexed more than 100 times, even though there's a rel="canonical" tag pointing to the non-parametrized URL of that page.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Page Title showing Twice when I check it through Moz Bar
Hello Experts, I have a brand new website. And I know I have to do lot of things and make changes for better SEO. With that in mind, I recently installed Mozbar. I was checking one of my posts the other day and noticed Page Title is appearing twice in Mozbar.Please see the screenshot below. Could you please guide me what could be causing this? FYI, I have installed Yoast SEO Plugin. Also, I can see "Alt Text" is picking lot many words which are from other images or posts...Is there any way to customize that as well? KhP9y
Moz Bar | | techierichard0 -
Current Title is short but Moz show error that Title is Too Long
Hello, The current title of an article is of 30 characters. But in Moz it say "Title Element Too Long". The URL of the article is long. Please let me know why Moz is showing it as "Title Element Too Long"
Moz Bar | | ProcessSEO0 -
Duplicate Page Content
The site crawl is registering duplicate page content for our storefront site, but the pages aren't the same. They're ascending pages within the same category (ex: Featured, Featured pg2, Featured pg3, and so on). What can be done to fix these errors or prevent them in the future?
Moz Bar | | MGuid550 -
Moz On-Page Grader doesn't pick up my Title, URL, Meta, H1, Body, IMG ALT's....does this mean Google won't?
Good morning, As my title says, 'Moz On-Page Grader doesn't pick up my Title, URL, Meta, H1, Body, IMG ALT's....does this mean Google won't?' My URL is www.refusedcarfinance.com and I'm currently targeting the Keyword 'bad credit car finance'. I am using Yoast SEO and have the keyword in my title, meta, content, h1's etc. Any advice would my much appreciated. Kindest regards, Joshua
Moz Bar | | RocketStats0 -
Moz Crawler not Identifying all Duplicate Pages
On two recent site crawls (9/27/14 and 11/4/14) for duplicate content the Moz tool did not ID the following 2 pages, which are 100% duplicate to each other: http://www.hooksandlattice.com/planter-hampton-241212.html ; Screenshot: http://screencast.com/t/DdwWroUU http://www.hooksandlattice.com/planter-hampton-721212.html ; Screenshot: http://screencast.com/t/8Lb1cJZmGrhX As I'm working feverishly to re-write and update the site (goal is ZERO duplicates) I'm finding it challenging to use the Moz tool to get the project done. Does anyone have any feedback or help they can provide for how I can identify all duplicate pages associated with my domain? Thank you! Lindsey Pfeiffer
Moz Bar | | CMC-SD0 -
All in favor of an "Announcements" category section say MOZ!
So this is just a thought. I am not Rand, therefor I am not the Moz man himself, and therefor I am not telling anybody how to run this website or forum. But, I've been thinking... and usually that a dangerous thing. So many times when I answer a question I end it with: "Keep me posted" "Keep us updated" "Good Luck" In the same light there have been plenty of times when I have had my questions answered by people like EGOL, RobertFIsher, KeriMorget and they have said that they have wanted me to keep them posted. Now sure, some of you are sitting there being Melancholy Mozzers thinking that all those wonderful people are just being polite by adding some nice parting words... but what if I they are being Sincere Seoers? What if I WANT to fill them in and don't want to ask a question? What if... THEY REALLY LIKE ME?!?! I could also send a DM to those 3 people, I mean let's get honest I'm not THAT popular.... but I can dream can't I? Again, this isn't really an SEO question, more of a thought on where I'm supposed to give updates on my project. In all honesty I do want feedback on what I'm doing, and sometimes I don't even know what questions I'm supposed to ask!
Moz Bar | | HashtagHustler1 -
Checking Do-follow using Moz Bar
Does anyone know how to check if a link is a do-follow or no-follow using the Moz bar? I believe there's a function that highlights and color codes the link so as to tell? Can't seem to turn this function on in my Moz Bar settings. Am I missing something completely obvious?
Moz Bar | | Gavo0 -
Ajax #! URL support?
Hi Moz, My site is currently following the convention outlined here: https://support.google.com/webmasters/answer/174992?hl=en Basically since pages are generated via Ajax we are setup to direct bots that replace the #! in a url with ?escaped_fragment to cached versions of the ajax generated content. For example, if the bot sees this url: http://www.discoverymap.com/#!/California/Map-of-Carmel/73 it will replace it will instead access the page: http://www.discoverymap.com/?escaped_fragment=/California/Map-of-Carmel/73 In which case my server serves the cached html instead of the live page. This is all per Googles direction and is indexing fine. However the MOZ bot does not do this. It seems like a fairly straight-forward feature to support. Rather than ignoring the hash, you look to see if it is a #! and then try to spider the url replaced with ?escaped_fragment. Our server does the rest. If this is something MOZ plans on supporting in the future I would love to know. If there is other information that would be great. Also, pushstate is not practical for everyone due to limited browser support, etc. Thanks, Dustin Updates: I am editing my question because it won't let me respond to my own question. It says I need to sign up for MOZ analytics. I was signed up for Moz Analytics?! Now I am not? I responded to my invitation weeks ago? Anyway, you are misunderstanding how this process works. There is no site-map involved. The bot reads this URL on the page: http://www.discoverymap.com/#!/California/Map-of-Carmel/73 And when it is ready to spider the page for content it, it spider's this URL instead: http://www.discoverymap.com/?escaped_fragment=/California/Map-of-Carmel/73 The server does the rest, it is simply telling Roger to recognize the #! format and replace it with ?escaped_fragment Though I obviously do not know how Roger is coded but it is a simple string replacement. Thanks.
Moz Bar | | oneactlife0