Moz Crawler URL parameters & duplicate content
-
Hi all, this is my first post on Moz Q&A
Questions:
- Does the Moz Crawler take into account rel="canonical" for search results pages with sorting / filtering URL parameters?
- How much time does it take for an issue to disappear from the issues list after it's been corrected? Does it come up in the next weekly report?
I'm asking because the crawler is reporting 50k+ pages crawled when, in reality, this number should be closer to 1000. All pages with query parameters have the correct canonical tag pointing to the root URL, so I'm wondering whether I need to noindex the other pages for the crawler to report correct data. For example:
Original (canonical URL): DOMAIN.COM/charters/search/mx/BS?search_location=cabo-san-lucas
Filter active URL: DOMAIN.COM/charters/search/mx/BS?search_location=cabo-san-lucas&booking_date=&booking_days=1&booking_persons=1&priceFilter%5B%5D=0%2C500&includedPriceFilter%5B%5D=drinks-soft
Also, if noindex is the only solution, will it impact the ranking of the pages involved?
Note: Google and Bing are semi-successful in reporting index page count, each reporting around 2.5k result pages when using the site:DOMAIN.com query. The rel canonical tag was missing for a short period of time about 4 weeks ago, but since fixing the issue these pages still haven't been deindexed.
Appreciate any suggestions regarding Moz Crawler & Google / Bing index count!
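For reference, the canonical mapping described above can be sketched in Python. This is only an illustration of the intended behaviour, not the site's actual code: the `CANONICAL_PARAMS` allowlist and the `canonical_url` helper are hypothetical names, and the `https://` scheme is assumed because the URLs above are shown without one.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: only these parameters change which listings are shown,
# so they are the only ones kept in the canonical URL.
CANONICAL_PARAMS = {"search_location"}


def canonical_url(url: str) -> str:
    """Drop every query parameter that is not in CANONICAL_PARAMS."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in CANONICAL_PARAMS
    ]
    # Rebuild the URL without the fragment and without filter/sort parameters.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


filtered = (
    "https://DOMAIN.COM/charters/search/mx/BS?search_location=cabo-san-lucas"
    "&booking_date=&booking_days=1&booking_persons=1"
    "&priceFilter%5B%5D=0%2C500&includedPriceFilter%5B%5D=drinks-soft"
)
print(canonical_url(filtered))
```

Every filter combination of the same results page should map to the same string, which is the value expected in the rel="canonical" tag.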
-
Happy to help!
We crawled roughly 49k pages because there were that many links on the site that we could find. 50k is also the new standard crawl limit for campaigns in Standard and Medium subscriptions. Adding a rel=canonical to a page doesn't mean it won't get crawled by our campaign crawler, only that the crawler is to refer to the canonicalized link for reporting purposes.
Without going into too much URL-specific detail, these pages are considered duplicates because their canonical tags point to different URLs. For example,
is considered a duplicate of
DOMAIN.COM/charters/search/mx/QR?booking_date=&booking_days=&booking_persons=limit%252525253D20
because the canonical tag for the first page is
DOMAIN.COM/charters/search/mx/QR?offset=20
while the canonical for the second URL is
DOMAIN.COM/charters/search/mx/QR
Since the canonical tags point to different pages it is assumed that DOMAIN.COM/charters/search/mx/QR?offset=20 and DOMAIN.COM/charters/search/mx/QR are likely to be duplicates themselves.
Here is how our system interprets duplicate content vs. rel=canonical:
Assuming A, B, C, and D are all duplicates,
If A references B as the canonical, then they are not considered duplicates
If A and B both reference C as canonical, A and B are not considered duplicates of each other
If A references C as a canonical, A and B are considered duplicates
If A references C as canonical, B references D, then A and B are considered duplicates
The above example from your campaign actually falls into the fourth case I've listed above. Hope this helps clear things up.
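The four rules above can be read as a single comparison: two pages with duplicate content are flagged unless they end up pointing at the same URL. The Python sketch below is one way to express that reading. It is not the crawler's actual implementation; the function names are hypothetical, and it assumes that in the third rule B carries no canonical tag of its own.

```python
from typing import Optional


def effective_target(url: str, canonical: Optional[str]) -> str:
    """Return the URL a page resolves to: its canonical if set, else itself."""
    return canonical if canonical else url


def flagged_as_duplicates(
    url_a: str,
    canonical_a: Optional[str],
    url_b: str,
    canonical_b: Optional[str],
) -> bool:
    """Assumes the two pages already have duplicate content.

    They are flagged only when their effective targets differ.
    """
    return effective_target(url_a, canonical_a) != effective_target(url_b, canonical_b)


# Rule 1: A references B as canonical.
print(flagged_as_duplicates("A", "B", "B", None))
# Rule 2: A and B both reference C.
print(flagged_as_duplicates("A", "C", "B", "C"))
# Rule 3: A references C, B has no canonical (assumption).
print(flagged_as_duplicates("A", "C", "B", None))
# Rule 4: A references C, B references D.
print(flagged_as_duplicates("A", "C", "B", "D"))
```

Under this reading, the two pages stop being flagged once their canonical tags resolve to the same URL.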
-
Thanks Sam!
I've read the post and checked my canonical tags but still can't seem to find what's causing the canonicalized pages to be indexed by RogerBot. The same page shows up in Moz's crawl test 100 times with slightly different parameters.
I'll keep investigating but some specific feedback from Moz staff would be appreciated
-
Hi!
I'm going to leave the strategy discussion open to the community, but from a technical standpoint, we will count rel=canonical on dynamic URLs as long as they are implemented correctly. Dr. Pete has a great post where he talks about canonicals that might be helpful as well. Updates to campaigns happen on a weekly basis, on the weekday the campaign was created. So if it was created on a Tuesday, you'll see updated campaign data every Tuesday after. You can run a crawl test (accessible from Research Tools) to get 3k page crawls in between your updates though. Hope this helps!
-
Thanks for the info searchbuzz. So if I understand correctly, new pages are crawled and kept in the index (up to the campaign limit), but issues on indexed pages are reported separately.
My issue is that, due to the dynamic URLs used in search filters on my site, I actually have 49k issues detected (over 95% are duplicate content and long URL issues, because the crawler is indexing the same page many times, once for each URL parameter combination). The crawl test can't index the entire site because it generates a huge number of pages.
It's a travel-related website with listings in 233 cities and multiple filter functionality, so each unique 'page' of results is indexed more than 100 times, even though there's a rel="canonical" tag pointing to the non-parametrized URL of that page.
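One way to check how many parameter variants sit behind each results page is to group a list of crawled URLs by their address without the query string. The sketch below is rough: it assumes the crawled URLs are already available as a plain Python list, `count_variants` is a hypothetical helper, and the sample URLs are made up for illustration.

```python
from collections import Counter
from typing import Iterable
from urllib.parse import urlsplit


def count_variants(urls: Iterable[str]) -> Counter:
    """Count how many crawled URLs share the same scheme, host and path."""
    counts: Counter = Counter()
    for url in urls:
        parts = urlsplit(url)
        # Ignore the query string and fragment when grouping.
        counts[f"{parts.scheme}://{parts.netloc}{parts.path}"] += 1
    return counts


# Illustrative sample only, not real crawl data.
crawled = [
    "https://DOMAIN.COM/charters/search/mx/QR",
    "https://DOMAIN.COM/charters/search/mx/QR?offset=20",
    "https://DOMAIN.COM/charters/search/mx/BS?search_location=cabo-san-lucas",
]
for page, variants in count_variants(crawled).most_common():
    print(variants, page)
```

Pages with the highest counts are the ones where filter and sort parameters multiply the crawl the most.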