Google only crawling a small percentage of the sitemap
-
Hi,
The company I work for has developed a new website for a customer; their URL is https://www.wideformatsolutions.co.uk. I've created a sitemap containing 25,555 URLs. I submitted it to Google around 4 weeks ago, and the highest crawl count reported so far is 2,379.
I've checked everything I can think of, including:
- Speed of website
- Canonical Links
- 404 errors
- Setting a preferred domain
- Duplicate content
- robots.txt
- .htaccess
- Meta Tags
I did read that Matt Cutts revealed in an interview with Eric Enge that the number of pages Google crawls is roughly proportional to your PageRank. Even so, I'm sure it should crawl more than 2,000 pages.
The website is based on OpenCart. If anyone has experienced anything like this, I would love to hear from you.
-
No problem! I meant to mention this in my first comment, but I also noticed that there's no robots.txt file in place. That's obviously not going to help your indexation problem too much, but it's nonetheless something you should know about.
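For anyone who wants to confirm what a robots.txt file (or the lack of one) allows, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper names and the example rules are made up for illustration. Note that this parser does plain prefix matching and does not implement Google's wildcard extensions, so treat the result as a rough check rather than Googlebot's exact behaviour.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, url, agent="Googlebot"):
    """Check a URL against robots.txt rules supplied as a list of lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # an empty list means "no rules", so everything is allowed
    return parser.can_fetch(agent, url)


def is_allowed_live(robots_url, url, agent="Googlebot"):
    """Same check against a live robots.txt (needs network access)."""
    parser = RobotFileParser()
    parser.set_url(robots_url)
    parser.read()  # a 404 response is treated as "allow everything"
    return parser.can_fetch(agent, url)


# Hypothetical rules, not taken from the site in question.
example_rules = ["User-agent: *", "Disallow: /admin/"]
print(is_allowed(example_rules, "https://example.com/admin/login"))
print(is_allowed(example_rules, "https://example.com/some-product"))
```

A missing robots.txt is normally read as "crawl everything", which fits the point above that it is unlikely to be the cause of the low crawl count.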
-
I did have some issues with this when we first launched the site, so I will look into it further now. The HTTPS certificate is fairly new.
Thanks for commenting
-
Looks to me like Google can't properly access your XML sitemap. I tried to put it into two different validator tools and URI Valet, and none of them were able to access it. It could be something to do with HTTPS. Did you recently switch the site over to secure?
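If you want to reproduce the access check outside of an online validator, a rough sketch in Python (standard library only) is below. It assumes the sitemap is served as plain, uncompressed XML; the function names and the User-Agent string are invented for this example. A certificate problem would show up as an exception from `urlopen`, and malformed XML as a parse error.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_bytes):
    """Return (kind, locs) for a sitemap or sitemap index document."""
    root = ET.fromstring(xml_bytes)  # raises ParseError if the XML is malformed
    kind = "index" if root.tag == SITEMAP_NS + "sitemapindex" else "urlset"
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc")
            if el.text and el.text.strip()]
    return kind, locs


def check_live_sitemap(url, timeout=30):
    """Fetch the sitemap over HTTPS and report status, type and URL count."""
    # The User-Agent string is made up for this sketch.
    req = urllib.request.Request(url, headers={"User-Agent": "sitemap-check/0.1"})
    # urlopen raises URLError on TLS/certificate problems and HTTPError on 4xx/5xx.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        status, body = resp.status, resp.read()
    kind, locs = parse_sitemap(body)
    print(f"HTTP {status}: {kind} with {len(locs)} <loc> entries")
    return locs
```

You would call it as `check_live_sitemap("https://www.wideformatsolutions.co.uk/sitemap.xml")`, where the `/sitemap.xml` path is only a guess at where the file lives. If the count comes back well short of 25,555, or the request fails outright, that points at the sitemap or the HTTPS setup rather than at Google's crawl rate.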