Differing numbers of pages indexed with and without the trailing slash
-
I noticed today that a site: query in Google (UK) for a certain domain I'm looking at returns different numbers depending on whether or not the trailing slash is added at the end. With the trailing slash the numbers are significantly different. This is a domain with a few duplicate content issues.
It seems very rare but I've managed to replicate it for a couple of other well known domains, so this is the phenomenon I'm referring to:
site:travelsupermarket.com - 16'300 results
site:travelsupermarket.com/ - 45'500 resultssite:guardian.co.uk - 120'000'000 results
site:guardian.co.uk/ - 121'000'000 resultsFor the particular domain I'm looking at the numbers are 19'000 without the trailing slash and 800'000 with it! As mentioned, there are a few duplicate content issues at the moment that I'm trying to tidy up, but how should I interpret this? Has anyone seen this before and can advise what it could indicate?
Thanks in advance for any answers.
-
"There is an XML sitemap submitted and GWMT shows a total number of indexed pages in the 800'000 region."
Brilliant. That's the number I would trust.
Incidentally, I see different numbers than what you see for all 4 site: queries you mentioned. Variances are pretty normal in my experience.
I've never noticed it, I would be intrigued to hear if someone else has correlated such variances to a technical issue or penalty.
-
Hi Adam, thanks for your response.
There is an XML sitemap submitted and GWMT shows a total number of indexed pages in the 800'000 region.
While I appreciate site: is not a precise tool, the fact that the site: numbers between trailing slash and no trailing slash match for virtually every other domain I try this with, and the numbers are so different in this example, suggests to me that this could be an indication of something amiss.
-
The site: query on Google isn't a precise tool. It's not uncommon to see strange variances like that.
For a more accurate count, submit an XML Sitemap via Google Webmaster Tools and Google will give you a more precise count of which pages it has indexed.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Getting 'Indexed, not submitted in sitemap' for around a third of my site. But these pages ARE in the sitemap we submitted.
As in the title, we have a site with around 40k pages, but around a third of them are showing as "Indexed, not submitted in sitemap" in Google Search Console. We've double-checked the sitemaps we have submitted and the URLs are definitely in the sitemap. Any idea why this might be happening? Example URL with the error: https://www.teacherstoyourhome.co.uk/german-tutor/Egham Sitemap it is located on: https://www.teacherstoyourhome.co.uk/sitemap-subject-locations-surrey.xml
Technical SEO | | TTYH0 -
Sitemap.gz is being indexed and is showing up in SERP instead of actual pages.
Sitemap.gz is being indexed and is showing up in SERP instead of actual pages. I recently uploaded my sitemap file - https://psglearning.com/sitemapcustom/sitemap-index.xml - via Search Console. The only record within the XML file is sitemaps.gz. When I searched for some content on my site - here is the search https://goo.gl/mqxBeq - I was shown the following search result, indicating that our GZ file is getting indexed instead of our pages. http://www.psglearning.com/catalog 1 http://www.psglearning.com ...www.psglearning.com/sitemapcustom/sitemap.gz... 1 https://www.psglearning.com/catalog/productdetails/9781284059656/ 1 https://www.psglearning.com/catalog/productdetails/9781284060454/ 1 ... My sitemap is listed at https://psglearning.com/sitemapcustom/sitemap-index.xml inside the sitemap the only reference is to sitemap.gz. Should we remove the link the the sitemap.gz within the xml file and just serve the actual page paths? <sitemapindex< span=""> xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></sitemapindex<><sitemap></sitemap>https://www.psglearning.com/sitemapcustom/sitemap.gz<lastmod></lastmod>2017-06-12T09:41-04:00
Technical SEO | | pdowling0 -
404 Errors for Form Generated Pages - No index, no follow or 301 redirect
Hi there I wonder if someone can help me out and provide the best solution for a problem with form generated pages. I have blocked the search results pages from being indexed by using the 'no index' tag, and I wondered if I should take this approach for the following pages. I have seen a huge increase in 404 errors since the new site structure and forms being filled in. This is because every time a form is filled in, this generates a new page, which only Google Search Console is reporting as a 404. Whilst some 404's can be explained and resolved, I wondered what is best to prevent Google from crawling these pages, like this: mydomain.com/webapp/wcs/stores/servlet/TopCategoriesDisplay?langId=-1&storeId=90&catalogId=1008&homePage=Y Implement 301 redirect using rules, which will mean that all these pages will redirect to the homepage. Whilst in theory this will protect any linked to pages, it does not resolve this issue of why GSC is recording as 404's in the first place. Also could come across to Google as 100,000+ redirected links, which might look spammy. Place No index tag on these pages too, so they will not get picked up, in the same way the search result pages are not being indexed. Block in robots - this will prevent any 'result' pages being crawled, which will improve the crawl time currently being taken up. However, I'm not entirely sure if the block will be possible? I would need to block anything after the domain/webapp/wcs/stores/servlet/TopCategoriesDisplay?. Hopefully this is possible? The no index tag will take time to set up, as needs to be scheduled in with development team, but the robots.txt will be an quicker fix as this can be done in GSC. I really appreciate any feedback on this one. Many thanks
Technical SEO | | Ric_McHale0 -
Blog Page Titles - Page 1, Page 2 etc.
Hi All, I have a couple of crawl errors coming up in MOZ that I am trying to fix. They are duplicate page title issues with my blog area. For example we have a URL of www.ourwebsite.com/blog/page/1 and as we have quite a few blog posts they get put onto another page, example www.ourwebsite.com/blog/page/2 both of these urls have the same heading, title, meta description etc. I was just wondering if this was an actual SEO problem or not and if there is a way to fix it. I am using Wordpress for reference but I can't see anywhere to access the settings of these pages. Thanks
Technical SEO | | O2C0 -
How to know how much pages are indexed on Google?
I have a big site, there are a way to know what page are not indexed? I know that you can use site: but with a big site is a mess to check page by page. This is a tool or a system to check a entire site and automatically find non-indexed pages?
Technical SEO | | markovald0 -
Does Google Still Pass Anchor Text for Multiple Links to the Same Page When Using a Hashtag? What About Indexation?
Both of these seem a little counter-intuitive to me so I want to make sure I'm on the same page. I'm wondering if I need to add "#s to my internal links when the page I'm linking to is already: a.) in the site's navigation b.) in the sidebar More specifically, in your experience...do the search engines only give credit to (or mostly give credit to) the anchor text used in the navigation and ignore the anchor text used in the body of the article? I've found (in here) a couple of folks mentioning that content after a hashtagged link isn't indexed. Just so I understand this... a.) if I were use a hashtag at the end of a link as the first link in the body of a page, this means that the rest of the article won't be indexed? b.) if I use a table of contents at the top of a page and link to places within the document, then only the areas of the page up to the table of contents will be indexed/crawled? Thanks ahead of time! I really appreciate the help.
Technical SEO | | Spencer_LuminInteractive0 -
Does google like Category pages or pages with lots of Products on them?
We are having an issue with getting Google to rank the page we want. To have this page http://www.jakewilson.com/c/52/-/346/Cruiser-Motorcycle-Tires rank for the key word Cruiser Motorcycle Tires; however, this page http://www.jakewilson.com/t/52/-/343/752/Cruiser-Motorcycle-Tires is ranking instead and it has less links and page authority according to site explorer and it is farther down in the hierarchy. I am wondering if google just likes pages that have actual products on them instead of a page leading to the page with all the products. Thoughts?
Technical SEO | | DoRM0 -
Properly Moving Blog from Index to its Own Page
Right now I have a website that is exclusively a blog. I want to create pages outside of the blog and move the blog to a page other than the index file e.g.) from domain.com to domain.com/blog I will have the blog post pages stay in the root directory. e.g.) domain.com/blog-post Any suggestions how to properly tell SE's and other websites that the blog has moved?
Technical SEO | | Bartell0