Indexed pages and current pages - Big difference?
-
Our website shows ~22k pages in the sitemap but ~56k are showing indexed on Google through the "site:" command. Firstly, how much attention should we paying to the discrepancy? If we should be worried what's the best way to find the cause of the difference?
The domain canonical is set so can't really figure out if we've got a problem or not?
-
Hi Nathan,
The delta between the number of pages returned by the site: operator and the number of pages in your sitemap could be down to a number of issues:
- Your XML sitemap may represent only a percentage of the total number of valid content URLs that your site is capable of generating.
a) Often sites will only generate XML sitemaps for URLs that someone has decided are "important", when the total number of URLs is much larger.
- Your XML sitemap contains ALL the valid content URLs that your site is capable of generating, but search engines are somehow finding more URLs.
a) Look in Google Webmaster Tools under Optimization >> HTML improvements >> Duplicate title tags
i) Do the pages with duplicate titles have duplicate page content? If so, your publishing platform is allowing multiple URLs to render the same content, which is a bug that needs to be fixed
b) Run a crawler like Xenu Link Sleuth or Screaming Frog against your site, and see how many URLs they discover. Export the results to Excel and look for weird URLs
i) Usually culprits for duplicate content include incorrect canonicalization (www vs non-www, URLs ending in /index.html vs just /, etc)
ii) Look for URLs ending with strange query strings (affiliate tracking, session IDs, etc)
c) Use the site: operator in other engines (Bing, blekko, etc) and compare the numbers they return. Especially if this number is larger than the number Google is returning, starting looking for weird URL patterns
Also, I'm not sure what you mean by "the domain canonical has been set correctly". If you're referring to use of the canonical link element for every URL, there are plenty of ways that can go wrong. E.g., if your CMS requires that each published URL have rel="canonical", but allows URLs to be published with and without the trailing /index.html, you can end up with a canonical link element on the non-canonical version of the URL, further confusing engines. Something to look into.
-
You might have a duplicate content issue. You will want to check if you have the proper 301 redirect and a canonical command in the head of your code. If you don't have this set properly then the search engines will see the www and non-www versions of your site as duplicate. Also remember that the search engines also by default place this at the end of the url /
Here are two links that can help if this is the issue.
http://www.webconfs.com/how-to-redirect-a-webpage.php/
http://www.mattcutts.com/blog/rel-canonical-html-head/
Hope this helps. Good Luck
-
Yes this is a potentially significant problem. The easiest way to troubleshoot is to do the 'site:' command again, and go to the last page of results. You should be seeing pages that aren't in your sitemap. Very likely duplicated content.
If you are having a rough time troubleshooting, post a link and I'll be glad to take a peek.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Getting 'Indexed, not submitted in sitemap' for around a third of my site. But these pages ARE in the sitemap we submitted.
As in the title, we have a site with around 40k pages, but around a third of them are showing as "Indexed, not submitted in sitemap" in Google Search Console. We've double-checked the sitemaps we have submitted and the URLs are definitely in the sitemap. Any idea why this might be happening? Example URL with the error: https://www.teacherstoyourhome.co.uk/german-tutor/Egham Sitemap it is located on: https://www.teacherstoyourhome.co.uk/sitemap-subject-locations-surrey.xml
Technical SEO | | TTYH0 -
Pages not indexable?
Hello, I've been trying to find out why Google Search Console finds these pages non-indexable: https://www.visitflorida.com/en-us/eat-drink.html https://www.visitflorida.com/en-us/florida-beaches/beach-finder.html Moz and SEMrush both crawl the pages and show no errors but GSC comes back with, "blocked by robots.txt" but I've confirmed it is not. Anyone have any thoughts? 6AYn1TL
Technical SEO | | KenSchaefer0 -
Very wierd pages. 2900 403 errors in page crawl for a site that only has 140 pages.
Hi there, I just made a crawl of the website of one of my clients with the crawl tool from moz. I have 2900 403 errors and there is only 140 pages on the website. I will give an exemple of what the crawl error gives me. | http://www.mysite.com/en/www.mysite.com/en/en/index.html#?lang=en | http://www.mysite.com/en/www.mysite.com/en/en/en/index.html#?lang=en | http://www.mysite.com/en/www.mysite.com/en/en/en/en/index.html#?lang=en | http://www.mysite.com/en/www.mysite.com/en/en/en/en/en/index.html#?lang=en | http://www.mysite.com/en/www.mysite.com/en/en/en/en/en/en/index.html#?lang=en | http://www.mysite.com/en/www.mysite.com/en/en/en/en/en/en/index.html#?lang=en | http://www.mysite.com/en/www.mysite.com/en/en/en/en/en/en/en/en/en/en/en/en/index.html#?lang=en | http://www.mysite.com/en/www.mysite.com/en/en/en/en/en/en/en/en/en/en/en/en/en/index.html#?lang=en | http://www.mysite.com/en/www.mysite.com/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/index.html#?lang=en | http://www.mysite.com/en/www.mysite.com/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/index.html#?lang=en | | | | | | | | | | There are 2900 pages like this. I have tried visiting the pages and they work, but they are only html pages without CSS. Can you guys help me to see what the problems is. We have experienced huge drops in traffic since Septembre.
Technical SEO | | H.M.N.0 -
Drop in Indexed Page + Organic Traffic
Hey Moz Community, I've been seeing a steady decrease in search console of pages being indexed by Google for our eCommerce site. This is corresponding to lower impressions and traffic in general this year. We started with around a million pages being indexed in Nov of 2015 down to 18,000 pages this Nov. I realized that since we don't have around 3,000 or so products year round this is mostly likely a good thing. I've checked to make sure our main landing pages are being indexed which they are and our sitemap was updated several times this year, although we're in the process of updating it again to resubmit. I also checked our robots.txt and there's nothing out of the ordinary. In the last month we've recently gotten rid of some duplicate content issues caused by pagination by using canonical tags but that's all we've done to reduce the number of pages crawled. We have seen some soft 404's and some server errors coming up in our crawl error report that we've either fixed or are trying to fix. Not really sure where to start looking to find a solution to the problem or if it's even a huge issue, but the drop in traffic is also not great. The drop in traffic corresponded to lose in rankings as well so there could be correlation or none. Any ideas here?
Technical SEO | | znotes0 -
How to know how much pages are indexed on Google?
I have a big site, there are a way to know what page are not indexed? I know that you can use site: but with a big site is a mess to check page by page. This is a tool or a system to check a entire site and automatically find non-indexed pages?
Technical SEO | | markovald0 -
I know I'm missing pages with my page level 301 re-directs. What can I do?
I am implementing page level re-directs for a large site but I know that I will inevitably miss some pages. Is there an additional safety net root level re-direct that I can use to catch these pages and send them to the homepage?
Technical SEO | | VMLYRDiscoverability0 -
The number of pages indexed on Bing DROPPED significantly.
I haven't signed in to bing webmaster tool for a while. and I found that Bing is not indexing my site properly all of a sudden. IT DROPPED SIGNIFICANTLY Any idea why it is behaving this way? (please check the attachment) INg1o.png
Technical SEO | | joony20080 -
Getting More Pages Indexed
We have a large E-commerce site (magento based) and have submitted sitemap files for several million pages within Webmaster tools. The number of indexed pages seems to fluctuate, but currently there is less than 300,000 pages indexed out of 4 million submitted. How can we get the number of indexed pages to be higher? Changing the settings on the crawl rate and resubmitting site maps doesn't seem to have an effect on the number of pages indexed. Am I correct in assuming that most individual product pages just don't carry enough link juice to be considered important enough yet by Google to be indexed? Let me know if there are any suggestions or tips for getting more pages indexed. syGtx.png
Technical SEO | | Mattchstick0