Crawl Diagnostics - Crawling way more pages than my site has?
-
Hello all,
I'm fairly new here, more of a paid search guy dabbling in SEO on the side. I have a client that I have in SEOMoz and the Crawl Diagnostics report is showing 10,000+ pages crawled and I think the site has at most 800 pages (e-commerce site using freewebstore.org as the platform).
Any reasons this would be happening?
-
Ok - Here is an update. I found that it has a basketful of entries for each Category and I have a pretty good list of categories.
Attached is an image showing what is happening in one category. There is an entry for each sort option which I understand where this is coming from (Sort Name, Sort Price Ascending, Sort Price Descending) what i don't understand are all the "rw=1" entries. And why they stack up like they do.
Is this an issue? I am assuming it is because there seems to be no real reason for it.
-
Thanks to both of you. I will start to dig in to your suggested steps later today.
I just took this one and they really don't have anything set-up. I just got them set-up on Webmaster tools as well so not even sure if they had their site indexed before.
The Crawl Diagnostics doesn't show much duplicate content (60 pages?) but the Too Many On Page Links, Overly Dynamic URL, Duplicate Title, Long URL warnings are all showing 6000-10000 pages.
The site sells crystals, each item is unique and as I did my first review they don't really even have item descriptions written let alone page titles and meta-descriptions.
I am in analysis mode working up my comments in review and detailing an action plane to help them focus moving forward. I was just shocked by the 10,000 pages listed in one of the crawl warnings.
anyway, I'll dig into this info and let you know what I find. It's an adventure!
-
I'm guessing that as an ecommerce site you've got multiple ways to browse your content, by category / brand / special offers etc. The thing to watch out for is interesting URLs with categories or lots of parameters.As a result, chances are you've got a duplicate content problem.
As Nakul mentioned a good first step is to take a look at your crawl report or use one of the tools he mentioned to see if you've got the same content being indexed multiple times.
Once you've done that, check is to see how many of these pages being crawled are appearing in Google's index. Is Google doing a reasonable job identifying the right version? How many pages are there in the index. Are recently added products being discovered quickly?
The Site: operators will be your friend here and Dr Pete did a great article on ways you can use it.
http://www.seomoz.org/blog/25-killer-combos-for-googles-site-operator
Once you understand what is being crawled and what's making it to the index you need to decide what pages you really do want to be indexed and make sure that these become the canonical versions and block parts of your site using robots.txt. (But understand the problem and what you want to achieve before you start doing this.)
Hope this helps.
<object id="plugin0" style="position: absolute; z-index: 1000;" width="0" height="0" type="application/x-dgnria"><param name="tabId" value="ff-tab-10"> <param name="counter" value="138"></object>
-
You can download the entire crawl and see if there's actually that many pages. Or post the URL here.
You can also test using a crawling software tool like Xenu or Screaming Frog to test it.
You can also post/private message the link here and I can take a look.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Canonical URLs all show trailing slash on main site pages - using Yoast SEO for Wordpress - how to correct
We are using Yoast for a number of our sites. We use naked domain as the canonical. I have noticed in the header tags that all our sites show the canonical URLs as having a trailing slash: Example: http;//foxspizzajc.com, when I look at the source code, it shows the canonical as http;//foxspizzajc.com/ Of course, it is much more likely that all sites that link to us will not use the trailing slash - so preferably we do not want that to be the canonical - among other reasons. Does this need to be fixed so the trailing slash is removed? I cannot see how to do this in Yoast SEO or in Permalinks structure for Wordpress. Sorry for my ignorance. Thanks for any help.
Moz Pro | | Adam_RushHour_Marketing1 -
Open site explorer
"Unable to retrieve linking pages on this anchor at this time." This is the notice I get when trying to see links for anchor text. Can someone help?
Moz Pro | | Joseph-Green-SEO0 -
Only few pages (308 pages of 1000 something pages) have been crawled and diagnosed in 4 days, how many days till the entire website is crawled complete?
Setup campaign about 4-5 days ago and yesterday rogerbot said 308 pages were crawled and the diagnostics were provided. This website has over 1000+ pages and would like to know how long it would take for roger to crawl the entire website and provide diagnostics. Thanks!
Moz Pro | | TejaswiNaidu0 -
Is there a way to perform a crawl diagnostics without creating a campaign?
If you wanted to perform a crawl diagnostics but your campaigns are at full capacity are you able to do this and how (or does this mean you will have to remove one campaign to make space for another)?
Moz Pro | | SarahAhmed3790 -
Too many on-page links
one of my SEOmoz pro campaigns has given me the warning: Too many on-page links and the page in question is my html sitemap. How do i resolve this because I obviously need my sitemap. How do i get around this?
Moz Pro | | CompleteOffice1 -
Can i force another crawl on my site to see if it recognizes my changes?
i had a problem w/dup content and titles on my site, i fixed them immediately and im wondering if i can run another crawl on my site to see if my changes were recognized thanks shaun
Moz Pro | | daugherty0 -
When to stop link building because page authority is low - open site explorer
Hi, I'm link building with Open Site Explorer. I'm really picky in get links from only high quality sites. When do you stop going down the list of possible backlink providers because the page authority is too low. I usually stop at 40, but what do you do, why, and what does it depend on?
Moz Pro | | BobGW0