Crawl Diagnostics - Crawling way more pages than my site has?
-
Hello all,
I'm fairly new here, more of a paid search guy dabbling in SEO on the side. I have a client set up in SEOMoz, and the Crawl Diagnostics report is showing 10,000+ pages crawled, but I think the site has at most 800 pages (it's an e-commerce site using freewebstore.org as the platform).
Any reasons this would be happening?
-
Ok - Here is an update. I found that it has a basketful of entries for each Category and I have a pretty good list of categories.
Attached is an image showing what is happening in one category. There is an entry for each sort option (Sort Name, Sort Price Ascending, Sort Price Descending), and I understand where those come from. What I don't understand are all the "rw=1" entries, and why they stack up like they do.
Is this an issue? I am assuming it is, because there seems to be no real reason for it.
-
Thanks to both of you. I will start to dig in to your suggested steps later today.
I just took this client on and they really don't have anything set up. I just got them set up on Webmaster Tools as well, so I'm not even sure whether their site was indexed before.
The Crawl Diagnostics doesn't show much duplicate content (60 pages?) but the Too Many On Page Links, Overly Dynamic URL, Duplicate Title, and Long URL warnings are all showing 6,000-10,000 pages.
The site sells crystals and each item is unique. In my first review I found they don't really even have item descriptions written, let alone page titles and meta descriptions.
I am in analysis mode, working up my comments in review and detailing an action plan to help them focus moving forward. I was just shocked by the 10,000 pages listed in one of the crawl warnings.
Anyway, I'll dig into this info and let you know what I find. It's an adventure!
-
I'm guessing that, as an e-commerce site, you've got multiple ways to browse your content: by category, brand, special offers, etc. The thing to watch out for is URLs that vary by category or carry lots of parameters. As a result, chances are you've got a duplicate content problem.
As Nakul mentioned, a good first step is to take a look at your crawl report, or use one of the tools he mentioned, to see if you've got the same content being indexed multiple times.
Once you've done that, check how many of the pages being crawled are appearing in Google's index. Is Google doing a reasonable job of identifying the right version? How many pages are there in the index? Are recently added products being discovered quickly?
The site: operator will be your friend here, and Dr Pete wrote a great article on ways you can use it.
http://www.seomoz.org/blog/25-killer-combos-for-googles-site-operator
Once you understand what is being crawled and what's making it to the index, you need to decide which pages you really do want indexed, make sure these become the canonical versions, and block the remaining parts of your site using robots.txt. (But understand the problem and what you want to achieve before you start doing this.)
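To get a feel for which crawled URLs would collapse into a single canonical version, something like the rough Python sketch below might help. It is only an illustration under assumptions: "rw" is the parameter seen in the crawl report, "sort" is a hypothetical name for the sort option, and the example.com URLs are made up. Check the real parameter names on the site before relying on it.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters only change presentation, not content.
# "rw" appears in the crawl report; "sort" is a hypothetical name.
IGNORED_PARAMS = {"rw", "sort"}


def canonical_url(url, ignored=IGNORED_PARAMS):
    """Return the URL with presentation-only parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in ignored
    ]
    kept.sort()  # stable parameter order so equivalent URLs compare equal
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


def group_variants(urls):
    """Map each canonical URL to the crawled variants that reduce to it."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_url(url)].append(url)
    return dict(groups)


# Made-up URLs standing in for rows of a crawl report.
crawled = [
    "http://www.example.com/category/quartz",
    "http://www.example.com/category/quartz?sort=name",
    "http://www.example.com/category/quartz?sort=price_asc&rw=1",
    "http://www.example.com/category/quartz?rw=1&rw=1",
    "http://www.example.com/category/amethyst?page=2&rw=1",
]

for canonical, variants in group_variants(crawled).items():
    print(len(variants), canonical)
```

A canonical URL with many variants behind it is a good candidate for a rel=canonical tag or a robots.txt rule, once you have decided which version you want indexed.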
Hope this helps.
-
You can download the entire crawl and see if there are actually that many pages. Or post the URL here.
You can also test it with a crawling tool like Xenu or Screaming Frog.
You can also post/private message the link here and I can take a look.
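If you do download the crawl, a short script can tell you quickly whether the count is real pages or the same pages with different parameters. This is a minimal sketch, assuming the export is a CSV with a column named "URL" (a guess, so adjust it to whatever the export actually uses). The sample rows are made up.

```python
import csv
import io
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def summarize_crawl(csv_file, url_column="URL"):
    """Count crawled URLs, distinct paths, and how often each parameter appears.

    url_column is an assumed header name, not a documented export field.
    """
    total = 0
    paths = set()
    param_counts = Counter()
    for row in csv.DictReader(csv_file):
        url = (row.get(url_column) or "").strip()
        if not url:
            continue
        total += 1
        parts = urlsplit(url)
        paths.add((parts.netloc.lower(), parts.path))
        # Count each parameter name once per URL, even if it repeats.
        names = {key for key, _ in parse_qsl(parts.query, keep_blank_values=True)}
        for name in names:
            param_counts[name] += 1
    return total, len(paths), param_counts


# Made-up rows standing in for a downloaded crawl export.
sample = io.StringIO(
    "URL,Title\n"
    "http://www.example.com/category/quartz,Quartz\n"
    "http://www.example.com/category/quartz?sort=name,Quartz\n"
    "http://www.example.com/category/quartz?sort=name&rw=1,Quartz\n"
    "http://www.example.com/item/101,Item 101\n"
)

total, unique_paths, params = summarize_crawl(sample)
print(f"{total} URLs crawled, {unique_paths} distinct paths")
for name, count in params.most_common():
    print(f"  {name}: appears in {count} URLs")
```

For a real export you would pass an open file instead of the sample, opened with newline="" as the csv module recommends. If the number of distinct paths is close to the 800 or so pages you expect, the extra URLs are parameter variants rather than real pages.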