Indexed pages and current pages - Big difference?
-
Our website shows ~22k pages in the sitemap but ~56k are showing indexed on Google through the "site:" command. Firstly, how much attention should we paying to the discrepancy? If we should be worried what's the best way to find the cause of the difference?
The domain canonical is set so can't really figure out if we've got a problem or not?
-
Hi Nathan,
The delta between the number of pages returned by the site: operator and the number of pages in your sitemap could be down to a number of issues:
- Your XML sitemap may represent only a percentage of the total number of valid content URLs that your site is capable of generating.
a) Often sites will only generate XML sitemaps for URLs that someone has decided are "important", when the total number of URLs is much larger.
- Your XML sitemap contains ALL the valid content URLs that your site is capable of generating, but search engines are somehow finding more URLs.
a) Look in Google Webmaster Tools under Optimization >> HTML improvements >> Duplicate title tags
i) Do the pages with duplicate titles have duplicate page content? If so, your publishing platform is allowing multiple URLs to render the same content, which is a bug that needs to be fixed
b) Run a crawler like Xenu Link Sleuth or Screaming Frog against your site, and see how many URLs they discover. Export the results to Excel and look for weird URLs
i) Usually culprits for duplicate content include incorrect canonicalization (www vs non-www, URLs ending in /index.html vs just /, etc)
ii) Look for URLs ending with strange query strings (affiliate tracking, session IDs, etc)
c) Use the site: operator in other engines (Bing, blekko, etc) and compare the numbers they return. Especially if this number is larger than the number Google is returning, starting looking for weird URL patterns
Also, I'm not sure what you mean by "the domain canonical has been set correctly". If you're referring to use of the canonical link element for every URL, there are plenty of ways that can go wrong. E.g., if your CMS requires that each published URL have rel="canonical", but allows URLs to be published with and without the trailing /index.html, you can end up with a canonical link element on the non-canonical version of the URL, further confusing engines. Something to look into.
-
You might have a duplicate content issue. You will want to check if you have the proper 301 redirect and a canonical command in the head of your code. If you don't have this set properly then the search engines will see the www and non-www versions of your site as duplicate. Also remember that the search engines also by default place this at the end of the url /
Here are two links that can help if this is the issue.
http://www.webconfs.com/how-to-redirect-a-webpage.php/
http://www.mattcutts.com/blog/rel-canonical-html-head/
Hope this helps. Good Luck
-
Yes this is a potentially significant problem. The easiest way to troubleshoot is to do the 'site:' command again, and go to the last page of results. You should be seeing pages that aren't in your sitemap. Very likely duplicated content.
If you are having a rough time troubleshooting, post a link and I'll be glad to take a peek.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
I want to move some pages of my website to a folder and nav menu in those pages should only show inner page links, will it hurt SEO?
Hi, My website has a few SaaS products, to make my website simple i want to move my website some pages to its specific folder structure , so eg website.com/product1/features
Technical SEO | | webbeemoz
website.com/product1/pricing
website.com/product1/information and same for product2 and so on, the website.com/product1/.. menu will only show the links of product1 and only one link to homepage (possibly in footer). Please share your opinion will it be a good idea, from UI perspective it will be simple , but i am not sure about SEO perspective, please help thanks0 -
Getting 'Indexed, not submitted in sitemap' for around a third of my site. But these pages ARE in the sitemap we submitted.
As in the title, we have a site with around 40k pages, but around a third of them are showing as "Indexed, not submitted in sitemap" in Google Search Console. We've double-checked the sitemaps we have submitted and the URLs are definitely in the sitemap. Any idea why this might be happening? Example URL with the error: https://www.teacherstoyourhome.co.uk/german-tutor/Egham Sitemap it is located on: https://www.teacherstoyourhome.co.uk/sitemap-subject-locations-surrey.xml
Technical SEO | | TTYH0 -
How to de-index a page with a search string with the structure domain.com/?"spam"
The site in question was hacked years ago. All the security scans come up clean but the seo crawlers like semrush and ahrefs still show it as an indexed page. I can even click through on it and it takes me to the homepage with no 301. Where is the page and how to deindex it? domain/com/?spam There are multiple instances of this. http://www.clipular.com/c/5579083284217856.png?k=Q173VG9pkRrxBl0b5prNqIozPZI
Technical SEO | | Miamirealestatetrendsguy1 -
Page that appears on SERPs is not the page that has been optimized for users
This may seem like a pretty newbie question, but I haven't been able to find any answers to it (I may not be looking correctly). My site used to rank decently for the KW "Gold name necklace" with this page in the search results:http://www.mynamenecklace.co.uk/Products.aspx?p=302This was the page that I was working on optimizing for user experience (load time, image quality, ease of use, etc.) since this page was were users were getting to via search. A couple months ago the Google SERP's started showing this page for the same query (also ranked a little lower, but not important for this specific question):http://www.mynamenecklace.co.uk/Products.aspx?p=314Which is a white gold version of the necklaces. This is not what most users have in mind (when searching for gold name necklace) so it's much less effective and engaging.How do I tell Google to go back to old page/ give preference to older page / tell them that we have a better version of the page / etc. without having to noindex any of the content? Both of these pages have value and are for different queries, so I can't canonical them to a single page. As far as external links go, more links are pointing to the Yellow gold version and not the white gold one.Any ideas on how to remedy this?Thanks.
Technical SEO | | Don340 -
SEOMOZ and Webmaster Tools showing Different Page Index Results
I am promoting a jewelry e-commerce website. The website has about 600 pages and the SEOMOZ page index report shows this number. However, webmaster tools shows about 100,000 indexed pages. I have no idea why this is happening and I am sure this is hurting the page rankings in Google. Any ideas? Thanks, Guy
Technical SEO | | ciznerguy1 -
Google Indexing
Hi Everybody, I am having kind of an issue when it comes to the results Google is showing on my site. I have a multilingual site, which is main language is Catalan. But of course if I am looking results in Spanish (google.es) or in English (google.com) I want Google to show the results with the proper URL, title and descriptions. My brand is "Vallnord" so if you type this in Google you will be displayed the result in Catalan (Which is not optimized at all yet) but if you search "vallnord.com/es" only then you will be displayed the result in Spanish What do I have to do in order for Google to read this the way I want? Regards, Guido.
Technical SEO | | SilbertAd0 -
Yahoo and Bing do not index all pages
Only 20% of our pages are indexed by Bing and Yahoo although we have correctly submitted the sitemap to bing webmaster tools and other search engines index all our content. Do you have any suggestions?
Technical SEO | | AEM130 -
Properly Moving Blog from Index to its Own Page
Right now I have a website that is exclusively a blog. I want to create pages outside of the blog and move the blog to a page other than the index file e.g.) from domain.com to domain.com/blog I will have the blog post pages stay in the root directory. e.g.) domain.com/blog-post Any suggestions how to properly tell SE's and other websites that the blog has moved?
Technical SEO | | Bartell0