Total Indexed 1.5M vs 83k submitted by sitemap. What?
-
We recently took a good look at one of our content site's sitemap and tried to cut out a lot of crap that had gotten in there such as .php, .xml, .htm versions of each page. We also cut out images to put in a separate image sitemap.
The sitemap generated 83,000+ URLs for google to crawl (this partially used the Yoast Wordpress plugin to generate)
In webmaster tools in the index status section is showing that this site has a total index of 1.5 million.
With our sitemap coming back with 83k and google indexing 1.5 million pages, is this a sign of a CMS gone rogue? Is it an indication that we could be pumping out error pages or empty templates, or junk pages that we're cramming into Google's bot?
I would love to hear what you guys think. Is this normal? Is this something to be concerned about? Should our total index more closely match our sitemap page count?
-
As well as parameters mentioned you may possibly have heaps of duplicating categories, tags etc. What I would also do is start searching Google with something like site:www.example.com/directory/ or possibly site:www.example.com/category/directory/directory/ so you are tightly narrowing down the results, switch to 100 results per page and manually look for clues.
-
If you have 1.5 million pages and you think your sitemap is comprehensive at 83,000 then yes, your CMS is needlessly generating pages. It's usually not a big deal from a ranking standpoint, but it can make other important issues hard to detect. I would clean it up, but that's a business call you'll have to make.
The first step is diagnosing where are the URLs are coming from. What you do next will depend, but I will give you the best advice I can without knowing what types of extraneous URLs you have and how Google is treating them:
First, I'd start with WMT > Crawl > URL Parameters. Quite often your CMS will generate URLs, and Google usually knows how to handle them. If there are a lot of URL parameters, Google them and see if they're exactly the same as other pages. If they are, make sure you have canonical tags in place to point them to the main version. There's more you can do with parameters, but it'll depend on what you find so I won't go into more detail. As a general rule, though, a CMS should not generate a page unless it is uniquely useful as differentiated landing page or a page for people to link to.
Also check for parameters in your analytics program. They could actually be messing up your pageview data depending on how you report.There's a post on fixing that in GA here:
http://blog.crazyegg.com/2013/03/29/remove-url-parameters-from-google-analytics-reports/
Next I'd look at the "Advanced" tab in WMT > Google Index > Index Status . Are there a lot of URLs removed? If so, check on these pages and see why they're removed and why they exist.
I would also run a crawl with Xenu and Screaming Frog to make sure crawlers are finding a reasonable number of pages and that they're not getting stuck in crawl loops. (crawling variations of a page endlessly). These kinds of issues can prevent new pages from being indexed on time because Google is wasting time (your crawl budget) running in circles.
-
Rob,
Your sitemap is but an indication to Google about urls on your domain. The sitemap does not limit google to crawling or indexing only the urls listed on it, nor is it a directive that tells google to remove urls from the index that it has already crawled. As stated in GWT, use **robots.txt **to specify how search engines should crawl your site, or request **removal **of URLs from Google's search results with the URL removal tool Google webmaster tools under the "google index" link.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Sitemaps: Best Practice
What should and what shouldn't go in the sitemap? In particular, pages like subscribe to our newsletter/ unsubscribe to our newsletter? Is there really any benefit in highlighting those pages to the SEs? Thanks for any advice/ anecdotes 🙂
Intermediate & Advanced SEO | | Fubra0 -
2015/2016 Sitemaps Exclusions
Hello fellow mozrs!
Intermediate & Advanced SEO | | artdivision
Been working on a few Property (Real Estate for our American friends) websites recently and and two questions that constantly come up as we spec the site are: 1. What schema (schema.org) should the website use (throughout all pages as well as individual pages). Did anyone found that schema actually helped with their ranking/CTR?
2. Whilst setting up the sitemaps (usually Yaost is our preferred plugin for the job), what page would you EXCLUDE from the site map? Looking forward to some interesting comments.
Dan.0 -
2 eCommerce stores that are identical 1 for US 1 for CA, what's the best way to SEO?
Hello everyone! I have an SEO question that I cannot solve given the parameters of the project, and I was wondering if someone could provide me with the next best alternative to my situation. Thank you in advance. The problem: Two eCommerce stores are completely identical (structure, products, descriptions, content) but they are on separate domains for currency and targeting purposes. www.website-can.com is for Canada and www.website-usa.com is for US. Due to exchange rate issues, we are unable to combine the 2 domains into 1 store and optimize. What's been done? I have optimized the Canadian store with unique meta titles and descriptions for every page and every product. However I have left the US store untouched. I would like to gain more visibility for the US Store but it is very difficult to create unique content considering the products are identical. I have evaluated using canonicals but that would ask Google to only look at either the Canadian or US store, , correct me if i'm wrong. I am looking for the next best solution given the challenges and I was wondering if someone could provide me with some ideas.
Intermediate & Advanced SEO | | Snaptech_Marketing0 -
Sitemap for SmartPhone site
Hello I have a smartphone site (e.g.m.abc.com). To my understanding we do not need a mobile sitemap as its not a traditional mobile site. Shall I add those mobile site links in my regular www XML sitemap or not bother to add the links as we already have rel = canonical (on m.abc.com ) and rel= alternate in place (on www site) to respective pages. Please suggests a solution. I really look forward to an answer as I haven't found the "official" answer to this question anywhere.
Intermediate & Advanced SEO | | AdobeVAS0 -
Drop in indexed pages!
Hi everybody! I've been working on http://thewilddeckcompany.co.uk/ for a little while now. Until recently, everything was great - good rankings for the key terms of 'bird hides' and 'pond dipping platforms'. However, rankings have tanked over the past few days. I can't point my finger at it yet, but a site:thewilddeckcompany.co.uk search shows only three pages have been indexed. There's only 10 on the site, and it was fine beforehand. Any advice would be much appreciated,
Intermediate & Advanced SEO | | Blink-SEO0 -
Why are so many pages indexed?
We recently launched a new website and it doesn't consist of that many pages. When you do a "site:" search on Google, it shows 1,950 results. Obviously we don't want this to be happening. I have a feeling it's effecting our rankings. Is this just a straight up robots.txt problem? We addressed that a while ago and the number of results aren't going down. It's very possible that we still have it implemented incorrectly. What are we doing wrong and how do we start getting pages "un-indexed"?
Intermediate & Advanced SEO | | MichaelWeisbaum0 -
Static index page or not?
Are there any advantages of dis-advantages to running a static homepage as opposed to a blog style homepage. I have be running a static page on my site with the latest posts displayed as links after the homepage content. I would like to remove the static page and move to a more visually appealing homepage that includes graphics for each post and the posts droppping down the page like normal blogs do. How will this effect my site if I move from a static page to a more dynamic blog style page layout? Could I still hold the spot I currently rank for with the optimized index content if I turn to a more traditional blog format? cheers,
Intermediate & Advanced SEO | | NoCoGuru0 -
Online Sitemap Generator
I have a site that has around 5,000 pages now. Are there any recommened online free/paid tools to generate a sitemap for me?
Intermediate & Advanced SEO | | rhysmaster0