Why is the number of crawled pages so low?
-
Hi, my website is www.theprinterdepo.com and I have been on SEOmoz Pro for 2 months.
When it started, it crawled 10,000 pages. Then I modified robots.txt to disallow some specific parameters in the pages to be crawled.
We have about 3,500 products, so the number of crawled pages should be close to that number.
The last crawl shows only 1,700. What should I do?
-
Hi levalencia1,
This could have been caused by many factors. Was the robots.txt change the only one you made? Other possible causes include meta "noindex" tags, nofollow links, or broken navigation structures.
In rare instances, rogerbot has a hiccup.
Let us know if things return to normal on your next crawl. If you have any difficulties feel free to contact the help team (help@seomoz.org) and they should be able to get things straightened out.
Best of luck with your SEO!
-
levalencia1
I still don't know what you wanted to accomplish with robots.txt when you wrote: "I modified robots.txt to disallow some specific parameters in the pages to be crawled."
Go to Google Webmaster Tools (GWMT): http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449&from=35237&rd=1
This will let you determine what your robots.txt did or did not accomplish:
The Test robots.txt tool will show you if your robots.txt file is accidentally blocking Googlebot from a file or directory on your site, or if it's permitting Googlebot to crawl files that should not appear on the web. When you enter the text of a proposed robots.txt file, the tool reads it in the same way Googlebot does, and lists the effects of the file and any problems found.
Hope it helps you out,
-
Sorry, this one got lost. I will look at it in the morning and give you feedback. Have you run anything like Xenu on the site? Do you know what is not showing up that would be outside of the robots.txt?
-
ANY IDEA?
-
This is my robots.txt:
User-agent: *
Disallow: */product_compare/*
Disallow: *dir=*
Disallow: *order=*
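For anyone who wants to check locally which URLs these three Disallow rules catch, here is a minimal Python sketch. It assumes the Google-style reading of robots.txt patterns (`*` matches any run of characters, a trailing `$` anchors the end); whether rogerbot interprets wildcards exactly this way is an assumption, not something confirmed in this thread. The example paths are hypothetical, not taken from the real site, and `rule_to_regex` and `is_blocked` are illustrative helper names.

```python
import re

def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    Assumes Google-style semantics: '*' matches any run of characters
    and a trailing '$' anchors the match at the end of the URL.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))

# The three Disallow rules from the robots.txt in this post.
DISALLOW = ["*/product_compare/*", "*dir=*", "*order=*"]
PATTERNS = [rule_to_regex(rule) for rule in DISALLOW]

def is_blocked(path: str) -> bool:
    """Return True if any Disallow rule matches the path (with query string).

    Rules are matched from the start of the path. Allow rules and
    per-crawler User-agent groups are not handled in this sketch.
    """
    return any(pattern.match(path) for pattern in PATTERNS)

if __name__ == "__main__":
    # Hypothetical paths for illustration only.
    for path in [
        "/printers/hp-laserjet.html",
        "/printers.html?dir=asc&order=price",
        "/catalog/product_compare/index/",
    ]:
        print(path, "->", "blocked" if is_blocked(path) else "allowed")
```

Because `*dir=*` and `*order=*` match those strings anywhere in the URL, they would also block any address that merely contains `dir=` or `order=` (for example a parameter named `subdir=`), which is worth ruling out when the crawl count drops further than expected.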
-
levalencia1
What did you disallow?
Are there specific categories or products you know are missing?
Are there specific subdirectories that are missing?
What is it you wanted to block with robots?