Moz Crawl shows over 100 times more pages than my site has?
-
The latest crawl stats are attached. My site has just over 300 pages.
Wondering what I have done wrong?
-
The total page count is higher, you are right Keri, but it is still only 581.
-
I believe this image shows what's indexed, which is a subset of the sitemap you submitted. You may want to look at Google Index -> Index Status in GWT to see what it shows there.
-
latest Moz crawl
-
latest webmaster tools crawl
-
I will definitely be paying attention to those numbers, Keri. Webmaster Tools is showing the right number of pages (something over 300, with 90% of those indexed).
-
It's not going to be a penalty, but it'll be good to have a bit less of a load on your server (bots no longer crawling thousands of pages) and just have your real pages in the index.
Places to look for interesting changes in site metrics would be your organic traffic in analytics and taking a look at your Google Webmaster Tools account to see your impressions, pages crawled, etc.
-
Thanks Keri, I will update asap.
Could you let me know how big an issue this would be? (When you have the time, of course ;))
-
You're welcome! I may have opened a can of worms, however. That sitemap is generated by an automated tool (based on the footer at the bottom), so somehow it's finding that page 28 as well.
You may also want to ask the developer if you should be indexing the categories in the blog archives. There are resources on Moz about the best way to set that up in WordPress, but I don't have them at my fingertips at the moment (I have a snuggly baby sleeping on my lap instead that's slowing me down a tad).
To answer your next question, after you figure out where the page 28 is being linked from and cure that, yes, you can do a one-time crawl from Research Tools. It won't overwrite your campaign info, but you can at least see if Moz is seeing thousands of pages or just a few hundred to see if stuff was fixed. Again, happy to provide more detail if/when you need it (and others will likely jump in with help on the thread, too).
I'd love to also see a little update a few weeks down the line of any changes you've noticed on your site metrics after getting this fixed.
-
You rock:)
-
And I found it. The sitemap at http://www.nineclouds.ca/sitemap includes a page /28, which is where the crawlers are finding the non-existent pages.
-
If you look at http://www.nineclouds.ca/blog/page/23, you'll see that there's a double arrow in the pagination at the right that goes to page 24, even though the last page is page 21. Google somehow has found the pages greater than 21 (which I'm not sure how they found), and once they found one of those, they keep seeing the link there with the double arrows to go to another page. Same happened with Rogerbot. I'm not sure where the bad originating link is (what legit page on your site is linking to something over page 21), but that's the loop that's happening and causing a ton of pages to be indexed. Get rid of those, and you'll also get rid of most of your errors.
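If you export the crawled URLs, a quick script can list which paginated blog URLs run past the real last page. This is only a rough sketch: the `/blog/page/N` pattern is taken from the URL above, while the function name, the sample list, and the `last_real_page` parameter are made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Assumed pattern for paginated blog archive paths such as /blog/page/23
PAGE_RE = re.compile(r"/blog/page/(\d+)/?$")


def find_overrun_pages(urls, last_real_page):
    """Return sorted (page_number, url) pairs whose page number exceeds last_real_page."""
    overruns = []
    for url in urls:
        # Match against the path only, so query strings do not interfere
        match = PAGE_RE.search(urlsplit(url).path)
        if match and int(match.group(1)) > last_real_page:
            overruns.append((int(match.group(1)), url))
    return sorted(overruns)


# Hypothetical sample of crawled URLs; replace with your own crawl export
crawled = [
    "http://www.nineclouds.ca/blog/page/20",
    "http://www.nineclouds.ca/blog/page/21",
    "http://www.nineclouds.ca/blog/page/23",
    "http://www.nineclouds.ca/blog/page/24",
]

for number, url in find_overrun_pages(crawled, last_real_page=21):
    print(number, url)
```

Anything this prints is a page the pagination template should not be linking to, so it gives the developer a concrete list to check.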
-
Not shy about that at all, thanks Keri.
Any help you can provide is greatly appreciated.
-
Hi Bill,
Using my admin powers, I took a peek at your account. I'm still trying to figure out where it's coming from, but you have thousands of empty pages of your blog indexed. I'll dig around a little more and see if I can figure out what's up.
If you're comfortable with sharing your URL here in a public forum, other people can come take a look too. Otherwise, I'm happy to send you a private message with part of what's up and give your developer a place to start looking.
-
Thanks Keri. I am the owner of the site, not the programmer, so I am looking up the terms you are using as I write this response. If I am using pagination, is there a way to stop Moz from crawling it? If I understand your question about the calendar correctly, I do have one as part of my blog that dates each post. Can I get the bot to not recognize this calendar?
-
My first guess would be that parameters or something similar are being crawled. Do you have pagination? Sorting ascending and descending? A calendar that's getting crawled through the year 2525?
Your next step would be to look into what those duplicate pages are and see if something is amiss that's generating a ton of URLs.
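One way to see what is generating the URLs is to collapse each crawled URL into a rough pattern and count how often each pattern shows up. This is just a sketch, assuming you have the crawled URLs as a plain list; the helper names and the example.com URLs are invented for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlsplit


def url_pattern(url):
    """Collapse a URL into a rough pattern: digits become {n}, query values are dropped."""
    parts = urlsplit(url)
    path = re.sub(r"\d+", "{n}", parts.path)
    # Keep only the parameter names, sorted, so ?sort=asc and ?sort=desc group together
    params = sorted({pair.split("=")[0] for pair in parts.query.split("&") if pair})
    return path + ("?" + "&".join(params) if params else "")


def top_patterns(urls, limit=5):
    """Return the most common URL patterns with their counts."""
    return Counter(url_pattern(u) for u in urls).most_common(limit)


# Hypothetical sample; replace with the URL column from your crawl export
sample = [
    "http://example.com/blog/page/22",
    "http://example.com/blog/page/23",
    "http://example.com/about",
    "http://example.com/list?sort=asc&page=2",
]

for pattern, count in top_patterns(sample):
    print(count, pattern)
```

If one pattern accounts for thousands of URLs on a site with a few hundred real pages, that pattern is the place to start looking.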