Indexing/Sitemap - I must be wrong
-
Hi All,
I would guess that a great number of us new to SEO (or not) share some simple beliefs in relation to Google indexing and Sitemaps, and as such get confused by what Web master tools shows us.
It would be great if somone with experience/knowledge could clear this up for once and all
Common beliefs:
-
Google will crawl your site from the top down, following each link and recursively repeating the process until it bottoms out/becomes cyclic.
-
A Sitemap can be provided that outlines the definitive structure of the site, and is especially useful for links that may not be easily discovered via crawling.
-
In Google’s webmaster tools in the sitemap section the number of pages indexed shows the number of pages in your sitemap that Google considers to be worthwhile indexing.
-
If you place a rel="canonical" tag on every page pointing to the definitive version you will avoid duplicate content and aid Google in its indexing endeavour.
These preconceptions seem fair, but must be flawed.
Our site has 1,417 pages as listed in our Sitemap. Google’s tools tell us there are no issues with this sitemap but a mere 44 are indexed! We submit 2,716 images (because we create all our own images for products) and a disappointing zero are indexed.
Under Health->Index status in WM tools, we apparently have 4,169 pages indexed. I tend to assume these are old pages that now yield a 404 if they are visited.
It could be that Google’s Indexed quotient of 44 could mean “Pages indexed by virtue of your sitemap, i.e. we didn’t find them by crawling – so thanks for that”, but despite trawling through Google’s help, I don’t really get that feeling.
This is basic stuff, but I suspect a great number of us struggle to understand the disparity between our expectations and what WM Tools yields, and we go on to either ignore an important problem, or waste time on non-issues.
Can anyone shine a light on this for once and all?
If you are interested, our map looks like this :
http://www.1010direct.com/Sitemap.xml
Many thanks
Paul
-
-
44 relates to the number of pages with the same urls as in your sitemap - it is not everything that is index. Your old site is still indexed and being found, as Google visits those pages and gets redirected to a new page it is likely that number will increase (from 44) and the number of old indexed will decrease.
Google doesn't index sites on a one-off go around because then if may take say 4 months to come back and index again and if you've a new important page that gets lots of links and you don't get indexed and ranked for it because you've not been visited you wouldn't be happy. Also if this was done on every site it would take forever and take much more resources than even google has. it is annoying but you've just got to grin and bear it - at least you old site is still ranking and being found.
-
Thanks Andy,
What I dont get, is why Google would index in this way. I can understand why they would weight the importance of a page based on the number/strength of incoming links but not the decision to index it at all when lead in by a sitemap.
I just get a little frustrated when Google offers you seemingly definitive stats only to find they are so vague and mysterious they have little to no value. We should have 1400+ pages indexed, we clearly have more than 44 indexed ... what on earth does the number 44 relate to?
-
I think that as your sitemap reflect your new urls and this is what the index is based on you are likely to have more indexed from what you say. I Â would suggest going to "indexed status" under health of GWT and click total index and ever crawled, this may help clear this up.
-
I experienced this issue with sandboxed websites.
Market your products and in a few months every page should be in Google's index.
Cheers.
-
Thanks for the quick responses.
We had a bit of a URL reshuffle recently to make them a little more informative and to prevent each page URL terminating with "product.aspx". But that was around a month ago. Prior to that, we were around 40% indexed for pages (from the sitemap section of WM tools), and always zero for images.
So given that we clearly have more than 44 pages indexed by Google, what do you think that figure actually means?
-
dealing with your indexing issue first - depending on when you submitted depends how soon those pages may be indexed. I say "may" because a sitemap (yes answering another question) is just an indicator of "i have these pages" it does not mean they will be indexed - indeed unless you've a small website you will never have 100% indexation in my experience.
Spiders (search robots) index / visit a website / page via another link. They follow links to a page from around the web, or the site itself. The more links from around the web the quicker you will get indexed. (this explains why if you've 10,000 pages you won't ever get a link from other websites to them all and so they won't all get indexed). This means if you've a web page that gets a ton of links it will be indexed sooner than those with just 1 link - assuming all links are equal (which they aren't).
Spiders are not cyclic in their searching, it's very ad-hoc based on links in your site and other sites linking to you. A spider won't be sent to spider every page on your site - it will do a small amount at a time, this is likely why 44 pages are indexed and not more at this point.
A sitemap is (as i say) an indicator of pages in your site, the importance of them and when they were updated / created. it's not really a definitive structure - it's more of a reference guide. Think of it as you being the guide on a bus tour of a city, the search engine is your passenger you are pointing out places of interest and every so often it will see something it wan't to see and get off to look, but it may take many trips to get off at every stop.
Finally, Canonicals are a great way to clear up duplicate content issues. They aren't 100% successful but they do help - especially if you are using dynamic urls (such as paginating category pages).
hope that helps
-
I see your frustration, how long ago did you submit these site maps? Are we talking a couple of weeks or a couple of days/ a day? As I've seen myself, Google is not that fast at calculating the nr of pages indexed (definitely not within GWT). Mostly within a couple of days/ within a week Google largely increased the nr of pages indexed.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Indexing
Hi We have roughly 8500 pages in our website. Google had indexed almost 6000 of them, but now suddenly I see that the pages indexed has gone to 45. Any possible explanations why this might be happening and what can be done for it. Thanks, Priyam
Intermediate & Advanced SEO | | kh-priyam0 -
How to Get Permalinks Indexed?
Hey Everyone, I'm so happy to be apart of this community and assert knowledge where and when I can. I joined the community for one specific reason and I hope to employ the help of everyone here in conjunction with solving my SEO problem. I have a few years experience in SEO/SEM and have been continuously learning, while learning to adapt to continuous changes (I think we can all relate lol). At any rate, here is what I am experiencing frustration with. I'm the SEO Analyst for a company that is trying to compete for the keyword phrase "Lyft Promo Code". We have been trying to place page one on google for over a year now to no avail. I have gotten my direct domain url to appear on pages 1 & 2, but can't seem to get permalinks or "Sub-URL's" indexed. If you google this phrase you will see what I mean. The top result is:http://rideshareapps.com/lyft-promo-code-credit/
Intermediate & Advanced SEO | | Number_One_Deisgns
This url has an aggregated rating and appears page one for the phrase aforementioned above. What we have managed to do, as I mentioned is get www.couponcodeshero.com on page two. However, we have noticed that the page one trend is all permalinks. However when we have tried to emulate the pages structure and index priority, we are unable too. Our page:
http://couponcodeshero.com/lyft-promo-code-rideshare-guide/ I have ran multiple on-page graders from many resources and have not been able to get this page indexed as a permalink on any page that directly correlates with the Keyword Phrase. In essence, I'm looking for some direction from individuals who may have experienced this before. I have spent a good amount of time Googling and searching forum databases but can not find any direct content that explains how to index a permalink. I hope to get some great ideas from the individuals here! If you do know of any articles or even previously answered questions here please direct me there. it is only my intention to add value to the community! Schieler Mew
Number One Designs0 -
Domain.com/old-url to domain.com/new-url
HI, I have to change old url`s to new one, for the same domain and all landing pages will be the same: domain.com/old-url I have to change to: domain.com/new-url All together more than 70.000 url. What is best way to do that? should I use 301st redirect? is it possible to do in code or how? what could you please suggest? Thank you, Edgars
Intermediate & Advanced SEO | | Edzjus3330 -
Google Is Indexing The Wrong Page For My Keyword
For a long time (almost 3 mounth) google indexing the wrong page for my main keyword.
Intermediate & Advanced SEO | | Tiedemann_Anselm
The problem is that each time google indexed another page each time for a period of 4-7 days, Sometimes i see the home page, sometimes a category page and sometimes a product page.
It seems though Google has not yet decided what his favorite / better page for this keyword. This is the pages google index: (In most cases you can find the site on the second or third page) Main Page: http://bit.ly/19fOqDh Category Page: http://bit.ly/1ebpiRn Another Category: http://bit.ly/K3MZl4 Product Page: http://bit.ly/1c73B1s All links I get to the website are natural links, therefore in most cases the anchor we got is the website name. In addition I have many links I get from bloggers that asked to do a review on one of my products, I'm very careful about that and so I'm always checking the blogger and their website only if it is something good, I allowed it. also i never ask for a link back (must of the time i receive without asking), and as I said, most of their links are anchor with my website name. Here some example of links that i received from bloggers: http://bit.ly/1hF0pQb http://bit.ly/1a8ogT1 http://bit.ly/1bqqRr8 http://bit.ly/1c5QeC7 http://bit.ly/1gXgzXJ Please Can I get a recommendation what should you do?
Should I try to change the anchor of the link?
Do I need to not allow bloggers to make a review on my products? I'd love to hear what you recommend,
Thanks for the help0 -
Sitemap Issue - vol 2
Hello everyone! I validated the sitemap with different tools (w3Schools, and so on..) and no errors were found. So I uploaded into my site, tested it through GWT and BANG! all of a sudden there is a parsing error, which correspond to the last, and I mean last piece of code of thousand of lines, . I don't know why it isn't reading the code and it's giving me this as there are no other errors and I haven't got a clue about what to do in order to fix it! Thanks
Intermediate & Advanced SEO | | PremioOscar0 -
Drop in indexed pages!
Hi everybody! I've been working on http://thewilddeckcompany.co.uk/ for a little while now. Until recently, everything was great - good rankings for the key terms of 'bird hides' and 'pond dipping platforms'. However, rankings have tanked over the past few days. I can't point my finger at it yet, but a site:thewilddeckcompany.co.uk search shows only three pages have been indexed. There's only 10 on the site, and it was fine beforehand. Any advice would be much appreciated,
Intermediate & Advanced SEO | | Blink-SEO0 -
How do you de-index and prevent indexation of a whole domain?
I have parts of an online portal displaying in SERPs which it definitely shouldn't be. It's due to thoughtless developers but I need to have the whole portal's domain de-indexed and prevented from future indexing. I'm not too tech savvy but how is this achieved? No index? Robots? thanks
Intermediate & Advanced SEO | | Martin_S0 -
Increasing index
Hi! I'm having some trouble getting Google to index pages which once had a querystring in them but now are being redirected with a 301. The pages have a lot of unique content but this doesn't seem to matter. I feels as if there stuck in limbo (or a sandbox 🙂 Any clues on how to fix this? Thanks / Niklas
Intermediate & Advanced SEO | | KAN-Malmo0