Crawl Diagnostics - Crawling way more pages than my site has?
-
Hello all,
I'm fairly new here, more of a paid search guy dabbling in SEO on the side. I have a client set up in SEOMoz, and the Crawl Diagnostics report shows 10,000+ pages crawled, but I think the site has at most 800 pages (it's an e-commerce site using freewebstore.org as the platform).
Any reasons this would be happening?
-
OK, here is an update. The crawl has a basketful of entries for each category, and I now have a pretty good list of the categories.
Attached is an image showing what is happening in one category. There is an entry for each sort option (Sort Name, Sort Price Ascending, Sort Price Descending), and I understand where those come from. What I don't understand are all the "rw=1" entries, and why they stack up like they do.
Is this an issue? I am assuming it is, because there seems to be no real reason for it.
-
Thanks to both of you. I will start to dig in to your suggested steps later today.
I just took this client on and they really don't have anything set up. I just got them set up on Webmaster Tools as well, so I'm not even sure whether their site was indexed before.
The Crawl Diagnostics report doesn't show much duplicate content (60 pages?), but the Too Many On Page Links, Overly Dynamic URL, Duplicate Title, and Long URL warnings are each showing 6,000-10,000 pages.
The site sells crystals and each item is unique. On my first review I found they don't really even have item descriptions written, let alone page titles and meta descriptions.
I am in analysis mode, working up my review comments and detailing an action plan to help them focus moving forward. I was just shocked by the 10,000 pages listed in one of the crawl warnings.
Anyway, I'll dig into this info and let you know what I find. It's an adventure!
-
I'm guessing that as an ecommerce site you've got multiple ways to browse your content: by category, brand, special offers, etc. The thing to watch out for is interesting URLs with categories or lots of parameters, because each combination can serve the same content at a different address. As a result, chances are you've got a duplicate content problem.
As Nakul mentioned, a good first step is to take a look at your crawl report, or use one of the tools he mentioned, to see if you've got the same content being indexed multiple times.
Once you've done that, check how many of the crawled pages are appearing in Google's index. Is Google doing a reasonable job of identifying the right version? How many pages are there in the index? Are recently added products being discovered quickly?
The site: operator will be your friend here, and Dr Pete wrote a great article on ways you can use it.
http://www.seomoz.org/blog/25-killer-combos-for-googles-site-operator
Once you understand what is being crawled and what's making it to the index, you need to decide which pages you really do want indexed, make sure that these become the canonical versions, and block the other parts of your site using robots.txt. (But understand the problem and what you want to achieve before you start doing this.)
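As an illustration of what "the canonical version" could mean for parameterised URLs, here is a minimal Python sketch that strips presentation-only parameters and builds the matching canonical link tag. It assumes that `sort` (a hypothetical name) and `rw` (seen in this thread's crawl) never change the page content. That is an assumption you would need to verify for the actual store before relying on it.

```python
import html
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters only change presentation, not content.
# "rw" appears in the crawl discussed here; "sort" is a hypothetical name.
IGNORED_PARAMS = {"sort", "rw"}


def canonical_url(url, ignored=IGNORED_PARAMS):
    """Return the URL without ignored query parameters or a fragment."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in ignored
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )


def canonical_link_tag(url):
    """Build the <link rel="canonical"> tag for a page's <head>."""
    href = html.escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


print(canonical_url("http://example.com/crystals?sort=price_asc&rw=1"))
print(canonical_link_tag("http://example.com/crystals?page=2&rw=1"))
```

Parameters that do change content, such as a page number, are kept, so paginated listings still get distinct canonical URLs.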
Hope this helps.
-
You can download the entire crawl and see if there are actually that many pages.
You can also test it with a crawling tool like Xenu or Screaming Frog.
Or post or private message the URL here and I can take a look.
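If you do download the crawl, a quick script can tell you whether the 10,000 figure reflects real pages. The sketch below is in Python and assumes the export is a CSV with a column named `URL`. That column name is a guess, so adjust it to match your file. It compares the number of rows, the number of distinct URLs, and the number of distinct pages once query strings are ignored.

```python
import csv
import io
from urllib.parse import urlsplit


def summarize_crawl(csv_file, url_column="URL"):
    """Return (rows, distinct URLs, distinct pages ignoring query strings).

    url_column is an assumed header name; change it to match your export.
    """
    reader = csv.DictReader(csv_file)
    urls = [row[url_column].strip() for row in reader if row.get(url_column)]
    distinct_urls = set(urls)
    distinct_pages = {
        urlsplit(url)._replace(query="", fragment="").geturl()
        for url in distinct_urls
    }
    return len(urls), len(distinct_urls), len(distinct_pages)


# Hypothetical in-memory sample standing in for a downloaded crawl export.
sample = io.StringIO(
    "URL,Title\n"
    "http://example.com/crystals,Crystals\n"
    "http://example.com/crystals?sort=name,Crystals\n"
    "http://example.com/crystals?sort=name&rw=1,Crystals\n"
    "http://example.com/geodes,Geodes\n"
    "http://example.com/geodes,Geodes\n"
)

rows, distinct_urls, distinct_pages = summarize_crawl(sample)
print(f"{rows} rows, {distinct_urls} distinct URLs, {distinct_pages} distinct pages")
```

For a real export you would pass an open file instead, for example `open("crawl.csv", newline="")`, where the filename is whatever you saved the download as. A large gap between distinct URLs and distinct pages points at parameters rather than genuinely separate pages.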