How to find all indexed pages in Google?
-
Hi,
We have an ecommerce site with around 4000 real pages. But our index count is at 47,000 pages in Google Webmaster Tools.
How can I get a list of all pages indexed of our domain? trying to locate the duplicate content.
Doing a "site:www.mydomain.com" only returns up to 676 results...
Any ideas?
Thanks,
Ben
-
You are absolutely right. But if you think that you have duplicate content issues, then Screaming Frog can help you tease that out.
That is also why I suggested the SEOmoz tool, since it is supposed to mimick a SE spider, it can give you a pretty good idea of any issues that you might have.
Using the advanced operator of site:domain makes sense, but if there are issues there like eyepaq said, it is going to be tough sledding.
My suggestion would be to download take a closer look at what GWT is telling you. Are there duplicates there? Is your CMS auto-generating URL's? That is probably going to be your best bet IMO.
Best of luck!
-
@BJS, I would export a file from GWT and filter the results. If your URLs are in GWT, then most likely it's indexed in Google.
-
Thank you to everyone that contributed.
@Zeph and @Francisco - I do use Screaming Frog, but actually, correct me if I am wrong, but it does not show a list of pages indexed, but rather pages that exist in the site - not what Google has already indexed. Thanks anyway
What I wanted was a way of creating a list of all indexed pages in Google - not a count.
But thank you all the same!
-
Hey Zeph! Hope your company is doing great.
@Ben, screaming frog is good for this. You will need to get the paid version of it. There is a video on the site http://www.screamingfrog.co.uk/seo-spider/. Use filters to get to your real URLs.
-
Hi,
There are tools that you can use - though for close 50k pages is harder to crawl. Best bet is the Web master tools count - although is not 100% exact either.
The site:domain is a good indicator but it's generated "on the fly" but it will show you a better result if you go "deeper" and click on page 10-20 and so on.
However right now it looks like there is an issue with site:domain. for more info see: http://www.seroundtable.com/google-site-command-cluster-16829.html
Cheers.
-
Use the tool Screaming Frog to see all your pages, that should help. Also, the SEOmoz toolset has a function that will show you all duplicate content (if you are a pro subscriber).
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Is Google able to see child pages in our AJAX pagination?
We upgraded our site to a new platform the first week of August. The product listing pages have a canonical issue. Page 2 of the paginated series has a canonical pointing to page 1 of the series. Google lists this as a "mistake" and we're planning on implementing best practice (https://webmasters.googleblog.com/2013/04/5-common-mistakes-with-relcanonical.html) We want to implement rel=next,prev. The URLs are constructed using a hashtag and a string of query parameters. You'll notice that these parameters are ¶meter:value vs ¶meter=value. /products#facet:&productBeginIndex:0&orderBy:&pageView:grid&minPrice:&maxPrice:&pageSize:& None of the URLs are included in any indexed URLs because the canonical is the page URL without the AJAX parameters. So these results are expected. Screamingfrog only finds the product links on page 1 and doesn't move to page 2. The link to page 2 is AJAX. ScreamingFrog only crawls AJAX if its in Google's deprecated recommendations as far as I know. The "facet" parameter is noted in search console, but the example URLs are for an unrelated URL that uses the "?facet=" format. None of the other parameters have been added by Google to the console. Other unrelated parameters from the new site are in the console. When using the fetch as Google tool, Google ignores everything after the "#" and shows only the main URL. I tested to see if it was just pulling the canonical of the page for the test, but that was not the case. None of the "#facet" strings appear in the Moz crawl I don't think Google is reading the "productBeginIndex" to specify the start of a page 2 and so on. One thought is to add the parameter in search console, remove the canonical, and test one category to see how Google treats the pages. Making the URLs SEO friendly (/page2.../page3) is a heavy lift. Any ideas how to diagnose/solve this issue?
Intermediate & Advanced SEO | | Jason.Capshaw0 -
How long to re-index a page after being blocked
Morning all! I am doing some research at the moment and am trying to find out, just roughly, how long you have ever had to wait to have a page re-indexed by Google. For this purpose, say you had blocked a page via meta noindex or disallowed access by robots.txt, and then opened it back up. No right or wrong answers, just after a few numbers 🙂 Cheers, -Andy
Intermediate & Advanced SEO | | Andy.Drinkwater0 -
Substantial difference between Number of Indexed Pages and Sitemap Pages
Hey there, I am doing a website audit at the moment. I've notices substantial differences in the number of pages indexed (search console), the number of pages in the sitemap and the number I am getting when I crawl the page with screamingfrog (see below). Would those discrepancies concern you? The website and its rankings seems fine otherwise. Total indexed: 2,360 (Search Consule)
Intermediate & Advanced SEO | | Online-Marketing-Guy
About 2,920 results (Google search "site:example.com")
Sitemap: 1,229 URLs
Screemingfrog Spider: 1,352 URLs Cheers,
Jochen0 -
Can you no index a page in Wordpress from just Google news?
I'm trying to find a plugin for Wordpress that enables you to no-index an individual page from Google news but not from Google search results. We want to remove some of our pages from Google news without hurting others.
Intermediate & Advanced SEO | | uSw0 -
Google local pointing to Google plus page not homepage
Today my clients homepage dropped off the search results page (was #1 for months, in the top for years). I noticed in the places account everything is suddenly pointing at the Google plus page? The interior pages are still ranking. Any insight would be very helpful! Thanks.
Intermediate & Advanced SEO | | stevenob0 -
Language Subdirectory homepage not indexed by Google
Hi mozzers, Our Spanish homepage doesn't seem to be indexed or cached in Google, despite being online for over a month or two. All Spanish subpages are indexed and have started to rank but not the homepage. I have submitted sitemap xml to GWTools and have checked there's no noindex on the page - it seems to be in order. And when I run site: command in Google it shows all pages except homepage. What could be the problem? Here's the page: http://www.bosphorusyacht.com/es/
Intermediate & Advanced SEO | | emerald0 -
Why the archive sub pages are still indexed by Google?
Why the archive sub pages are still indexed by Google? I am using the WordPress SEO by Yoast, and selected the needed option to get these pages no-index in order to avoid the duplicate content.
Intermediate & Advanced SEO | | MichaelNewman1 -
Same content pages in different versions of Google - is it duplicate>
Here's my issue I have the same page twice for content but on different url for the country, for example: www.example.com/gb/page/ and www.example.com/us/page So one for USA and one for Great Britain. Or it could be a subdomain gb. or us. etc. Now is it duplicate content is US version indexes the page and UK indexes other page (same content different url), the UK search engine will only see the UK page and the US the us page, different urls but same content. Is this bad for the panda update? or does this get away with it? People suggest it is ok and good for localised search for an international website - im not so sure. Really appreciate advice.
Intermediate & Advanced SEO | | pauledwards0