Crawl Diagnostics - Crawling way more pages than my site has?
-
Hello all,
I'm fairly new here, more of a paid search guy dabbling in SEO on the side. I have a client set up in SEOMoz, and the Crawl Diagnostics report is showing 10,000+ pages crawled, yet I think the site has at most 800 pages (it's an e-commerce site using freewebstore.org as the platform).
Any reasons this would be happening?
-
Ok - Here is an update. I found that it has a basketful of entries for each Category and I have a pretty good list of categories.
Attached is an image showing what is happening in one category. There is an entry for each sort option (Sort Name, Sort Price Ascending, Sort Price Descending), and I understand where those come from. What I don't understand are all the "rw=1" entries, and why they stack up like they do.
Is this an issue? I am assuming it is, because there seems to be no real reason for it.
-
Thanks to both of you. I will start to dig in to your suggested steps later today.
I just took this client on, and they really don't have anything set up. I just got them set up on Webmaster Tools as well, so I'm not even sure their site was indexed before.
The Crawl Diagnostics doesn't show much duplicate content (60 pages?) but the Too Many On Page Links, Overly Dynamic URL, Duplicate Title, Long URL warnings are all showing 6000-10000 pages.
The site sells crystals and each item is unique. When I did my first review, I found they don't really even have item descriptions written, let alone page titles and meta descriptions.
I am in analysis mode, working up my review comments and detailing an action plan to help them focus moving forward. I was just shocked by the 10,000 pages listed in one of the crawl warnings.
Anyway, I'll dig into this info and let you know what I find. It's an adventure!
-
I'm guessing that, as an e-commerce site, you've got multiple ways to browse your content: by category, brand, special offers, etc. The thing to watch out for is URLs that vary by category or carry lots of parameters, because the same page can then be reached at many different addresses. As a result, chances are you've got a duplicate content problem.
As Nakul mentioned, a good first step is to take a look at your crawl report, or use one of the tools he mentioned, to see if you've got the same content being indexed multiple times.
Once you've done that, check how many of the crawled pages are appearing in Google's index. Is Google doing a reasonable job of identifying the right version? How many pages are there in the index? Are recently added products being discovered quickly?
The site: operator will be your friend here, and Dr. Pete wrote a great article on ways you can use it:
http://www.seomoz.org/blog/25-killer-combos-for-googles-site-operator
Once you understand what is being crawled and what's making it to the index, you need to decide which pages you really do want indexed, make sure these become the canonical versions, and block the parts of your site you don't want crawled using robots.txt. (But understand the problem and what you want to achieve before you start doing this.)
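To make the canonical step above a bit more concrete, here is a minimal Python sketch of how the canonical version of a parameterized URL could be derived. It is only an illustration under assumptions: the `rw` parameter comes from the original question, but the `sort` parameter name, the `page` parameter, and the example.com URLs are hypothetical, and the real platform may name or handle them differently.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters only change how a page is presented,
# not what it contains. "rw" is from the question; "sort" is a guess.
NON_CONTENT_PARAMS = {"sort", "rw"}


def canonical_url(url):
    """Return the URL with presentation-only parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NON_CONTENT_PARAMS
    ]
    # Drop the fragment too, since it never reaches the server.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


def canonical_link_tag(url):
    """Build the <link rel="canonical"> tag for a page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'


# Hypothetical category URL with a sort option and stacked rw=1 entries.
print(canonical_link_tag(
    "http://www.example.com/category/quartz?sort=price_asc&rw=1&rw=1"
))
```

Every sorted or "rw=1" variant of a category would then point at the same canonical address, while a parameter that really does change the content (paging, for instance) is left alone. Whether the hosted platform lets you set this tag is something to check before relying on it.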
Hope this helps.
-
You can download the entire crawl and see if there are actually that many pages. Or post the URL here.
You can also test it using a crawling tool like Xenu or Screaming Frog.
You can also post/private message the link here and I can take a look.
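If you do download the crawl, a quick way to see whether the 10,000 URLs are really a much smaller set of pages is to group them by path and ignore the query string. The Python sketch below is only an illustration: the URLs are made-up stand-ins for rows of an exported crawl report, and the `sort` parameter name is an assumption (only "rw=1" appears in the question).

```python
from collections import Counter
from urllib.parse import urlsplit


def group_by_path(urls):
    """Count crawled URLs per scheme + host + path, ignoring query strings."""
    counts = Counter()
    for url in urls:
        parts = urlsplit(url.strip())
        key = f"{parts.scheme}://{parts.netloc.lower()}{parts.path or '/'}"
        counts[key] += 1
    return counts


# Hypothetical sample standing in for the URL column of a crawl export.
crawled = [
    "http://www.example.com/category/quartz",
    "http://www.example.com/category/quartz?sort=name",
    "http://www.example.com/category/quartz?sort=price_asc&rw=1",
    "http://www.example.com/category/quartz?sort=price_asc&rw=1&rw=1",
    "http://www.example.com/product/amethyst-cluster",
]

counts = group_by_path(crawled)
print(f"{len(crawled)} crawled URLs, {len(counts)} distinct paths")
for path, n in counts.most_common():
    print(n, path)
```

Paths with a high count are the ones whose parameters are multiplying the crawl. If the number of distinct paths comes out close to the roughly 800 pages you expect, the extra URLs are most likely parameter variations rather than real pages.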
Related Questions
-
Page with "Missing Title Tag" isn't a page
Hello, I am going through the various errors that the Moz Pro Crawl report lists, and some non-existent pages keep coming up in the report. For example, one error category is "Missing Title Tag" with one page identified. But this page http://www.immigroup.com/news/“http%3A/crs.yorku.ca”?page=2 isn't real. It would have been a 404 were there not a redirect for everything that is /news/gobbledygook to /news. So my question is: when Moz (or GA for that matter) identifies these pages as "real" and having errors, do I need to take this seriously? And what do I do about it? Thanks! George
Moz Pro | | canadageorge0 -
Duplicate Page Content on pages that appear to be different?
Hi Everyone! My name's Ross, and I work at CHARGED.fm. I worked with Luke, who has asked quite a few questions here, but he has since moved on to a new adventure. So I am trying to step into his role. I am very much a beginner in SEO, so I'm trying to learn a lot of this on the fly, and bear with me if this is something simple. In our latest MOZ Crawl, over 28K high priority issues were detected, and they are all Duplicate Page Content issues. However, when looking at the issues laid out, the examples that it gives for "Duplicate URLs" under each individual issue appear to be completely different pages. They have different page titles, different descriptions, etc. Here's an example. For "LPGA Tickets", it is giving 19 Duplicate URLs. Here are a couple it lists when you expand those:
Moz Pro | | keL.A.xT.o
http://www.charged.fm/one-thousand-one-nights-tickets
http://www.charged.fm/trash-inferno-tickets
http://www.charged.fm/mylan-wtt-smash-hits-tickets
http://www.charged.fm/mickey-thomas-tickets Internally, one reason we thought this might be happening is that even though the pages themselves are different, the structure is completely similar, especially if there are no events listed or if there isn't any content in the News/About sections. We are going to try and noindex pages that don't have events/new content on them as a temporary fix, but is there possibly a different underlying issue somewhere that would cause all of these duplicate page content issues to begin appearing? Any help would be greatly appreciated!0 -
1 page crawled ... and other errors
1. Why is only one (1) page crawled every second time you crawl my site?
2. Why does your bot not obey the rules specified in robots.txt?
3. Why does your site constantly lose its connection to my Facebook account/page? This means that whenever I want to compare performance I need to re-authorize, and therefore cannot see any data until next time. Next time I also need to re-authorize ...
4. Why can't I add a competitor's Twitter account? Whatever I type, I get an "uh oh account cannot be tracked" message, and if I randomly succeed, the account added never shows up with any data. It has been like this for ages.
I have reported these issues over and over again. We are part of a large Scandinavian company represented in Denmark, Sweden, Norway and Finland. The companies are also part of a larger worldwide company spread across England, Ireland, Continental Europe and Northern Europe. I count at least 10 accounts on Seomoz.org. We, Northern Europe (4 accounts), are now reconsidering our membership at seomoz.org. We have recently expanded our efforts and established an SEO community in the larger-scale business spanning all our countries. In this community, too, we are now discussing the quality of your services. We'll be meeting next on 27-28 June in London. I hope I can bring some answers that clarify the problems we have seen here on seomoz.org. As I have written before: I love your setup and your tools, when they work. Regrettably, that is only occasionally the case!
Moz Pro | | alsvik1 -
Is SeoMOZ Crawl Diagnostics wrong here?
We've been getting a ton of critical errors (about 80,000) in SeoMoz' Crawl Diagnostics saying we have duplicate content in our client's E-commerce site. Some of the errors are correct, but a lot of the pages are variations like: www.example.com/productlist?page=1 www.example.com/productlist?page=2 However, in our source code we have used rel="prev" and rel="next" so in my opinion we should be alright. Would love to hear from you if we have made a mistake or if it is an error in SeoMoz. Here's a full paste of the script:
Moz Pro | | Webdannmark0 -
How do YOU use site explorer?
I normally use open site explorer to identify links that competitors of my clients have and sometimes this gives me what I call 'some low hanging fruit' to go after. (and of course links that are more challenging to get) I don't know why this didn't occur to me sooner. If my client is a chiropractor why not look at the links for 50 or 100 of the top rankings chiropractic sites all over the US? This would HAVE to uncover a wealth of blogs to comment on that have good authority, great industry associations, publications, forums - a whole wealth of items. It made me wonder how many people use site explorer like I have been (top 3-4 competitors that your client has) or identifying links pointing to LOTS of competitors? How do you use it? Couldn't you almost base an entire link building campaign using OSE? Why would this be a bad idea if not? Just some random thoughts. THE WEEKEND IS ALMOST HERE - Have a great day everybody! 🙂
Moz Pro | | Mrupp441 -
Crawl Report Warnings
How much notice should be paid to the warnings on the SEO Moz crawl reports? We manage a fairly large property site and a lot of the errors on the crawl reports relate to automated responses. As a matter of priority, which of the items in the list below will have negative effects with the search engines? Temporary Redirect, Too Many On-Page Links, Overly-Dynamic URL, Title Element Too Long (> 70 Characters), Title Missing or Empty, Duplicate Page Content, Duplicate Page Title, Missing Meta Description Tag
Moz Pro | | SoundinTheory0 -
Not all pages are being crawled
I am set up on the PRO plan, I was under the impression that it would crawl up to 10,000 pages. My site has just over 200 pages, but whenever I am crawled it only crawls 121 pages. Is this normal? It's hard to know how reliable my data is because a significant amount of pages are missing.
Moz Pro | | KristinHarding0 -
Duplicate content pages
Crawl Diagnostics Summary shows around 15,000 duplicate content errors for one of my projects. It shows the list of pages along with how many duplicate pages there are for each one. But I don't have a way of seeing the duplicate page URLs for a specific page without clicking on each page link and checking them manually, which is going to take forever to sort. When I export the list as CSV, the duplicate_page_content column doesn't show any data. Can anyone please advise on this? Thanks
Moz Pro | | nam2
| duplicate_page_content |1