Indexing/Sitemap - I must be wrong
-
Hi All,
I would guess that a great number of us, new to SEO or not, share some simple beliefs about Google indexing and sitemaps, and as such get confused by what Webmaster Tools shows us.
It would be great if someone with experience/knowledge could clear this up once and for all.
Common beliefs:
- Google will crawl your site from the top down, following each link and recursively repeating the process until it bottoms out/becomes cyclic.
- A Sitemap can be provided that outlines the definitive structure of the site, and is especially useful for links that may not be easily discovered via crawling.
- In Google’s Webmaster Tools, in the sitemap section, the number of pages indexed shows the number of pages in your sitemap that Google considers to be worthwhile indexing.
- If you place a rel="canonical" tag on every page pointing to the definitive version you will avoid duplicate content and aid Google in its indexing endeavour.
These preconceptions seem fair, but must be flawed.
Our site has 1,417 pages as listed in our Sitemap. Google’s tools tell us there are no issues with this sitemap but a mere 44 are indexed! We submit 2,716 images (because we create all our own images for products) and a disappointing zero are indexed.
Under Health->Index status in WM tools, we apparently have 4,169 pages indexed. I tend to assume these are old pages that now yield a 404 if they are visited.
It could be that Google’s indexed figure of 44 means “Pages indexed by virtue of your sitemap, i.e. we didn’t find them by crawling, so thanks for that”, but despite trawling through Google’s help, I don’t really get that feeling.
This is basic stuff, but I suspect a great number of us struggle to understand the disparity between our expectations and what WM Tools yields, and we go on to either ignore an important problem, or waste time on non-issues.
Can anyone shine a light on this once and for all?
If you are interested, our map looks like this :
http://www.1010direct.com/Sitemap.xml
Many thanks
Paul
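For anyone who wants to check the numbers themselves, here is a rough Python sketch (not anything Google or Webmaster Tools provides) that counts the page URLs listed in a sitemap and works out what share of them a reported indexed figure represents. The sample sitemap and the `example.com` URLs are made up for illustration; only the 44 and 1,417 figures come from the post above.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset>, <url>, <loc>.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Hypothetical two-entry sitemap standing in for the real file.
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/widgets/blue-widget</loc></url>
</urlset>"""


def sitemap_urls(xml_text):
    """Return every page <loc> value listed in a sitemap document."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    # Only <loc> elements in the core sitemap namespace are counted, so
    # image entries (which use a separate namespace) are not included.
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def indexed_share(indexed, submitted):
    """Percentage of submitted URLs that are reported as indexed."""
    return 100.0 * indexed / submitted if submitted else 0.0


if __name__ == "__main__":
    print(len(sitemap_urls(SAMPLE_SITEMAP)), "URLs in the sample sitemap")
    # Figures quoted in the post: 44 indexed out of 1,417 submitted.
    print(f"{indexed_share(44, 1417):.1f}% of submitted pages reported as indexed")
```

With the figures from the post, that works out to roughly 3% of the submitted pages.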
-
-
The 44 relates to the number of indexed pages whose URLs match those in your sitemap; it is not everything that is indexed. Your old site is still indexed and being found. As Google visits those old pages and gets redirected to the new ones, that number will likely increase (from 44) and the number of old pages indexed will decrease.
Google doesn't index a site in a single one-off pass. If it did, it might take, say, 4 months to come back and index again, and if you had an important new page that got lots of links but wasn't indexed and ranked because you hadn't been visited, you wouldn't be happy. Also, doing a full pass on every site would take forever and need more resources than even Google has. It is annoying, but you've just got to grin and bear it; at least your old site is still ranking and being found.
-
Thanks Andy,
What I don't get is why Google would index in this way. I can understand why they would weight the importance of a page based on the number/strength of incoming links, but not the decision whether to index it at all when led in by a sitemap.
I just get a little frustrated when Google offers you seemingly definitive stats only to find they are so vague and mysterious they have little to no value. We should have 1400+ pages indexed, we clearly have more than 44 indexed ... what on earth does the number 44 relate to?
-
I think that, as your sitemap reflects your new URLs and this is what the figure is based on, you are likely to have more pages indexed than it shows, from what you say. I would suggest going to "Index status" under Health in GWT and clicking "Total indexed" and "Ever crawled"; this may help clear this up.
-
I experienced this issue with sandboxed websites.
Market your products and in a few months every page should be in Google's index.
Cheers.
-
Thanks for the quick responses.
We had a bit of a URL reshuffle recently to make them a little more informative and to prevent each page URL terminating with "product.aspx". But that was around a month ago. Prior to that, we were around 40% indexed for pages (from the sitemap section of WM tools), and always zero for images.
So given that we clearly have more than 44 pages indexed by Google, what do you think that figure actually means?
-
Dealing with your indexing issue first: how soon those pages may be indexed depends on when you submitted the sitemap. I say "may" because a sitemap (yes, answering another question) is just an indicator of "I have these pages"; it does not mean they will be indexed. Indeed, unless you've a small website, you will never have 100% indexation, in my experience.
Spiders (search robots) reach a website or page via a link. They follow links to a page from around the web or from within the site itself. The more links from around the web, the quicker you will get indexed. (This explains why, if you've 10,000 pages, they won't all get indexed: you won't ever get links from other websites to every one of them.) This means a web page that gets a ton of links will be indexed sooner than one with just 1 link, assuming all links are equal (which they aren't).
Spiders are not cyclic in their searching; it's very ad hoc, based on links in your site and other sites linking to you. A spider won't be sent to spider every page on your site in one go; it will do a small amount at a time, which is likely why 44 pages are indexed and not more at this point.
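To make the "small amount at a time" idea concrete, here is a toy Python sketch of a link-following crawl with a per-visit page budget. It is only an illustration under simple assumptions, not how Googlebot actually schedules its crawling: the link graph, the `budget` parameter and the function names are all invented for the example.

```python
from collections import deque

# Hypothetical internal link graph: page -> pages it links to.
LINKS = {
    "/": ["/category-a", "/category-b"],
    "/category-a": ["/product-1", "/product-2", "/"],
    "/category-b": ["/product-3"],
    "/product-1": [],
    "/product-2": ["/product-1"],
    "/product-3": [],
    "/orphan": [],  # nothing links here, so link-following never reaches it
}


def crawl_visit(links, frontier, seen, budget):
    """Fetch at most `budget` pages from the frontier, queueing new links."""
    fetched = []
    while frontier and len(fetched) < budget:
        page = frontier.popleft()
        fetched.append(page)
        for target in links.get(page, []):
            if target not in seen:  # skip links we have already queued
                seen.add(target)
                frontier.append(target)
    return fetched


def crawl_in_visits(links, start, budget):
    """Repeat budgeted visits until no discovered page is left unfetched."""
    if budget < 1:
        raise ValueError("budget must be at least 1")
    frontier, seen = deque([start]), {start}
    visits = []
    while frontier:
        visits.append(crawl_visit(links, frontier, seen, budget))
    return visits


if __name__ == "__main__":
    for number, pages in enumerate(crawl_in_visits(LINKS, "/", budget=2), start=1):
        print(f"visit {number}: {pages}")
```

With a budget of two pages per visit, this small site takes three visits to cover, and `/orphan` is never fetched because no link leads to it. That is the kind of page a sitemap entry is meant to point out.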
A sitemap is (as I say) an indicator of the pages in your site, their importance, and when they were updated/created. It's not really a definitive structure; it's more of a reference guide. Think of it as you being the guide on a bus tour of a city, with the search engine as your passenger. You are pointing out places of interest, and every so often it will see something it wants to see and get off to look, but it may take many trips to get off at every stop.
Finally, canonicals are a great way to clear up duplicate content issues. They aren't 100% successful but they do help, especially if you are using dynamic URLs (such as paginated category pages).
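If you want to check which canonical a page actually declares, something like the following Python sketch would do it using only the standard library. The sample HTML and the `example.com` URL are hypothetical, and this only reads the tag; it says nothing about whether a search engine will honour it.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> found in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated values, in any letter case.
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonicals.append(attr["href"])


def canonical_url(html):
    """Return the first declared canonical URL, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals[0] if finder.canonicals else None


# Hypothetical markup for a sorted category page served from a dynamic URL.
SAMPLE_PAGE = """<html><head>
<title>Widgets, sorted by price</title>
<link rel="canonical" href="http://www.example.com/widgets" />
</head><body></body></html>"""

if __name__ == "__main__":
    print(canonical_url(SAMPLE_PAGE))
```

Running it over each dynamic variant of a page and confirming they all report the same canonical is a quick sanity check before waiting on the index to catch up.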
hope that helps
-
I see your frustration. How long ago did you submit these sitemaps? Are we talking a couple of weeks, or a couple of days/a day? As I've seen myself, Google is not that fast at calculating the number of pages indexed (definitely not within GWT). In most cases, the number of pages indexed increased substantially within a couple of days to a week.