Moz Crawl shows over 100 times more pages than my site has?
-
The latest crawl stats are attached. My site has just over 300 pages.
I'm wondering what I have done wrong.
-
The total page count is higher, you are right, Keri, but it's still only 581.
-
I believe this image shows what's indexed, which is a subset of the sitemap you submitted. You may want to look at Google Index -> Index Status in GWT to see what it shows there.
-
latest Moz crawl
-
latest webmaster tools crawl
-
I will definitely be paying attention to those numbers, Keri. Webmaster Tools is showing the right number of pages (something over 300, with 90% of those indexed).
-
It's not going to be a penalty, but it'll be good to have a bit less of a load on your server (bots no longer crawling thousands of pages) and just have your real pages in the index.
Places to look for interesting changes in site metrics would be your organic traffic in analytics and taking a look at your Google Webmaster Tools account to see your impressions, pages crawled, etc.
-
Thanks Keri, I will update ASAP.
Could you let me know how big an issue this would be? (When you have the time, of course ;))
-
You're welcome! I may have opened a can of worms, however. That sitemap is generated by an automated tool (based on the footer at the bottom), so somehow it's finding that page 28 as well.
You may also want to ask the developer if you should be indexing the categories in the blog archives. There are resources on Moz about the best way to set that up in WordPress, but I don't have them at my fingertips at the moment (I have a snuggly baby sleeping on my lap instead that's slowing me down a tad).
To answer your next question, after you figure out where the page 28 is being linked from and cure that, yes, you can do a one-time crawl from Research Tools. It won't overwrite your campaign info, but you can at least see if Moz is seeing thousands of pages or just a few hundred to see if stuff was fixed. Again, happy to provide more detail if/when you need it (and others will likely jump in with help on the thread, too).
I'd love to also see a little update a few weeks down the line of any changes you've noticed on your site metrics after getting this fixed.
-
You rock:)
-
And I found it. The sitemap at http://www.nineclouds.ca/sitemap includes a page /28, which is where the crawlers are finding the non-existent pages.
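For anyone who wants to check their own sitemap page for the same kind of stray link, here is a minimal sketch. It assumes the sitemap is an HTML page and that paginated URLs end in `/page/N`; the sample markup, the `PAGE_RE` pattern, and the helper names are made up for illustration and are not taken from the actual site. The last real page number (21) is the one mentioned elsewhere in this thread.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Hypothetical URL shape: paginated archive URLs ending in /page/<number>.
PAGE_RE = re.compile(r"/page/(\d+)/?$")


def pages_past_end(html, last_real_page):
    """Return links whose page number is higher than the last real page."""
    collector = LinkCollector()
    collector.feed(html)
    flagged = []
    for href in collector.links:
        match = PAGE_RE.search(href)
        if match and int(match.group(1)) > last_real_page:
            flagged.append(href)
    return flagged


# Made-up sitemap fragment, for illustration only.
sample_sitemap = """<ul>
<li><a href="/blog/page/2">Page 2</a></li>
<li><a href="/blog/page/28">Page 28</a></li>
<li><a href="/about">About</a></li>
</ul>"""

print(pages_past_end(sample_sitemap, last_real_page=21))
```

Any link this flags is a candidate entry point into the empty pages and worth tracing back to whatever generated it.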
-
If you look at http://www.nineclouds.ca/blog/page/23, you'll see that there's a double arrow in the pagination at the right that goes to page 24, even though the last page is page 21. Google has somehow found pages greater than 21 (I'm not sure how), and once it found one of those, it keeps seeing the double-arrow link to yet another page. The same happened with Rogerbot. I'm not sure where the bad originating link is (what legit page on your site is linking to something over page 21), but that's the loop that's happening and causing a ton of pages to be indexed. Get rid of those, and you'll also get rid of most of your errors.
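To make the loop concrete, here is a toy simulation, not a real crawler. It models the blog as numbered pages and compares two hypothetical behaviours: the current one, where every page (even an empty one past page 21) links to the next, and a fixed one, where pages past the end link nowhere. The function names and the 5,000-page cap are assumptions chosen for illustration.

```python
from collections import deque

LAST_REAL_PAGE = 21  # from the thread: the blog's last real page


def links_broken(page):
    """Current behaviour: every page, even an empty one, links to the next."""
    return [page + 1]


def links_fixed(page):
    """Fixed behaviour: only real pages before the last one link onward."""
    return [page + 1] if page < LAST_REAL_PAGE else []


def crawl(start, links, max_pages=5000):
    """Breadth-first crawl from `start`; returns how many pages were seen.

    `max_pages` stands in for a crawler's page budget (an assumed value).
    """
    seen = {start}
    queue = deque([start])
    while queue and len(seen) < max_pages:
        page = queue.popleft()
        for nxt in links(page):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)


print(crawl(28, links_broken))  # one stray link to page 28 eats the whole budget
print(crawl(1, links_fixed))    # with the fix, only the 21 real pages are seen
```

In the broken model a single bad entry link is enough to send a bot through thousands of empty pages, which is why finding and removing that link matters as much as fixing the pagination arrows.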
-
Not shy about that at all. Thanks, Keri.
Any help you can provide is greatly appreciated.
-
Hi Bill,
Using my admin powers, I took a peek at your account. I'm still trying to figure out where it's coming from, but you have thousands of empty pages of your blog indexed. I'll dig around a little more and see if I can figure out what's up.
If you're comfortable with sharing your URL here in a public forum, other people can come take a look too. Otherwise, I'm happy to send you a private message with part of what's up and give your developer a place to start looking.
-
Thanks Keri. I am the owner of the site, not the programmer, so I am looking up the terms you are using as I write this response. If I am using pagination, is there a way to stop Moz from doing this? If I understand your question about the calendar correctly, I do have one as part of my blog that dates each post. Can I get the bot to not recognize this calendar?
-
My first guess would be parameters or something are being crawled. Do you have pagination? Sorting ascending and descending? A calendar that's getting crawled through the year 2525?
Your next step would be to look into what those duplicate pages are and see if something is amiss that's generating a ton of URLs.
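One rough way to do that, sketched below, is to take the list of crawled URLs (for example from a crawl export) and group them by shape, so that whichever pattern is producing thousands of URLs floats to the top. The `example.com` URLs and the helper names are invented for illustration; this is a sketch under those assumptions rather than a Moz feature.

```python
import re
from collections import Counter
from urllib.parse import urlsplit


def url_pattern(url):
    """Collapse a URL to its shape: digits become {n}, query values are dropped."""
    parts = urlsplit(url)
    path = re.sub(r"\d+", "{n}", parts.path)
    # Keep only the query parameter names, sorted, so ?sort=asc and ?sort=desc match.
    keys = sorted(pair.split("=", 1)[0] for pair in parts.query.split("&") if pair)
    return path + ("?" + "&".join(keys) if keys else "")


def top_patterns(urls, limit=5):
    """Return the most common URL shapes with their counts."""
    return Counter(url_pattern(u) for u in urls).most_common(limit)


# Made-up crawl data: thousands of paginated URLs plus a few normal pages.
crawled = [f"http://www.example.com/blog/page/{n}" for n in range(1, 4001)]
crawled += [
    "http://www.example.com/about",
    "http://www.example.com/blog?sort=asc",
    "http://www.example.com/blog?sort=desc",
]

for pattern, count in top_patterns(crawled):
    print(count, pattern)
```

A pattern with a count far above the number of real pages on the site (pagination, sort parameters, calendar dates) is the first place to look.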