Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to find all indexed pages in Google?
-
Hi,
We have an ecommerce site with around 4000 real pages. But our index count is at 47,000 pages in Google Webmaster Tools.
How can I get a list of all pages indexed of our domain? trying to locate the duplicate content.
Doing a "site:www.mydomain.com" only returns up to 676 results...
Any ideas?
Thanks,
Ben
-
You are absolutely right. But if you think that you have duplicate content issues, then Screaming Frog can help you tease that out.
That is also why I suggested the SEOmoz tool, since it is supposed to mimick a SE spider, it can give you a pretty good idea of any issues that you might have.
Using the advanced operator of site:domain makes sense, but if there are issues there like eyepaq said, it is going to be tough sledding.
My suggestion would be to download take a closer look at what GWT is telling you. Are there duplicates there? Is your CMS auto-generating URL's? That is probably going to be your best bet IMO.
Best of luck!
-
@BJS, I would export a file from GWT and filter the results. If your URLs are in GWT, then most likely it's indexed in Google.
-
Thank you to everyone that contributed.
@Zeph and @Francisco - I do use Screaming Frog, but actually, correct me if I am wrong, but it does not show a list of pages indexed, but rather pages that exist in the site - not what Google has already indexed. Thanks anyway

What I wanted was a way of creating a list of all indexed pages in Google - not a count.
But thank you all the same!
-
Hey Zeph! Hope your company is doing great.
@Ben, screaming frog is good for this. You will need to get the paid version of it. There is a video on the site http://www.screamingfrog.co.uk/seo-spider/. Use filters to get to your real URLs.
-
Hi,
There are tools that you can use - though for close 50k pages is harder to crawl. Best bet is the Web master tools count - although is not 100% exact either.
The site:domain is a good indicator but it's generated "on the fly" but it will show you a better result if you go "deeper" and click on page 10-20 and so on.
However right now it looks like there is an issue with site:domain. for more info see: http://www.seroundtable.com/google-site-command-cluster-16829.html
Cheers.
-
Use the tool Screaming Frog to see all your pages, that should help. Also, the SEOmoz toolset has a function that will show you all duplicate content (if you are a pro subscriber).
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How to stop URLs that include query strings from being indexed by Google
Hello Mozzers Would you use rel=canonical, robots.txt, or Google Webmaster Tools to stop the search engines indexing URLs that include query strings/parameters. Or perhaps a combination? I guess it would be a good idea to stop the search engines crawling these URLs because the content they display will tend to be duplicate content and of low value to users. I would be tempted to use a combination of canonicalization and robots.txt for every page I do not want crawled or indexed, yet perhaps Google Webmaster Tools is the best way to go / just as effective??? And I suppose some use meta robots tags too. Does Google take a position on being blocked from web pages. Thanks in advance, Luke
Intermediate & Advanced SEO | | McTaggart0 -
How can I make a list of all URLs indexed by Google?
I started working for this eCommerce site 2 months ago, and my SEO site audit revealed a massive spider trap. The site should have been 3500-ish pages, but Google has over 30K pages in its index. I'm trying to find a effective way of making a list of all URLs indexed by Google. Anyone? (I basically want to build a sitemap with all the indexed spider trap URLs, then set up 301 on those, then ping Google with the "defective" sitemap so they can see what the site really looks like and remove those URLs, shrinking the site back to around 3500 pages)
Intermediate & Advanced SEO | | Bryggselv.no0 -
Newly designed page ranks in Google but then disappears - at a loss as to why.
Hi all, I wondered if you could help me at all please? We run a site called getinspired365.com (which is not optimised) and in the last 2 weeks have tried to optimise some new pages that we have added. For example, we have optimised this page - http://getinspired365.com/lifes-a-bit-like-mountaineering-never-look-down This page was added to Google's index via webmaster tools. When I then did a search for the full quote it came back 2nd in Google's search. If I did a search for half the quote (Life is a bit like mountaineering) it also ranked 2nd. We had another quote page that we'd optimised that displayed similar behaviour (it ranked 4th). But then for some reason when I now do the search it doesn't rank in the top 100 results. This, despite, an unoptimised "normal" page ranking 4th for a search such as: Thousands of geniuses live and die undiscovered. So our domain doesn't seem to be penalised as our "normal" pages are ranking. These pages aren't particularly well designed from an SEO standpoint. But our new pages - which are optimised - keep disappearing from Google, despite the fact they still show as indexed. I've rendered the pages and everything appears fine within Google Webmaster Tools. At a bit of a loss as to why they'd drop so significantly? A few pages I could understand but they've all but been removed. Any one seen this before, and any ideas what could be causing the issue? We have a different URL structure for our new pages in that we have the quote appear in the URL. All the content (bar the quote) that you see in the new pages are unique content that we've written ourselves. Could it be that we've over optimised and Google view these pages as spam? Many thanks in advance for all your help.
Intermediate & Advanced SEO | | MichaelWhyley0 -
Should I set up no index no follow on low quality pages?
I know it is a good idea for duplicate pages, blog tags, etc. but I remember somewhere that you can help the overall link juice of a website by adding no index no follow or no index follow low quality content pages of your website. Is it still a good idea to do this or was it never a good idea to begin with? Michael
Intermediate & Advanced SEO | | Michael_Rock0 -
Links from non-indexed pages
Whilst looking for link opportunities, I have noticed that the website has a few profiles from suppliers or accredited organisations. However, a search form is required to access these pages and when I type cache:"webpage.com" the page is showing up as non-indexed. These are good websites, not spammy directory sites, but is it worth trying to get Google to index the pages? If so, what is the best method to use?
Intermediate & Advanced SEO | | maxweb0 -
How is Google crawling and indexing this directory listing?
We have three Directory Listing pages that are being indexed by Google: http://www.ccisolutions.com/StoreFront/jsp/ http://www.ccisolutions.com/StoreFront/jsp/html/ http://www.ccisolutions.com/StoreFront/jsp/pdf/ How and why is Googlebot crawling and indexing these pages? Nothing else links to them (although the /jsp.html/ and /jsp/pdf/ both link back to /jsp/). They aren't disallowed in our robots.txt file and I understand that this could be why. If we add them to our robots.txt file and disallow, will this prevent Googlebot from crawling and indexing those Directory Listing pages without prohibiting them from crawling and indexing the content that resides there which is used to populate pages on our site? Having these pages indexed in Google is causing a myriad of issues, not the least of which is duplicate content. For example, this file <tt>CCI-SALES-STAFF.HTML</tt> (which appears on this Directory Listing referenced above - http://www.ccisolutions.com/StoreFront/jsp/html/) clicks through to this Web page: http://www.ccisolutions.com/StoreFront/jsp/html/CCI-SALES-STAFF.HTML This page is indexed in Google and we don't want it to be. But so is the actual page where we intended the content contained in that file to display: http://www.ccisolutions.com/StoreFront/category/meet-our-sales-staff As you can see, this results in duplicate content problems. Is there a way to disallow Googlebot from crawling that Directory Listing page, and, provided that we have this URL in our sitemap: http://www.ccisolutions.com/StoreFront/category/meet-our-sales-staff, solve the duplicate content issue as a result? For example: Disallow: /StoreFront/jsp/ Disallow: /StoreFront/jsp/html/ Disallow: /StoreFront/jsp/pdf/ Can we do this without risking blocking Googlebot from content we do want crawled and indexed? Many thanks in advance for any and all help on this one!
Intermediate & Advanced SEO | | danatanseo0 -
Number of Indexed Pages are Continuously Going Down
I am working on online retail stores. Initially, Google have indexed 10K+ pages of my website. I have checked number of indexed page before one week and pages were 8K+. Today, number of indexed pages are 7680. I can't understand why should it happen and How can fix it? I want to index maximum pages of my website.
Intermediate & Advanced SEO | | CommercePundit0 -
Should I prevent Google from indexing blog tag and category pages?
I am working on a website that has a regularly updated Wordpress blog and am unsure whether or not the category and tag pages should be indexable. The blog posts are often outranked by the tag and category pages and they are ultimately leaving me with a duplicate content issue. With this in mind, I assumed that the best thing to do would be to remove the tag and category pages from the index, but after speaking to someone else about the issue, I am no longer sure. I have tried researching online, but there isn't anything that provided any further information. Please can anyone with any experience of dealing with issues like this or with any knowledge of the topic help me to resolve this annoying issue. Any input will be greatly appreciated. Thanks Paul
Intermediate & Advanced SEO | | PaulRogers0