How to find all indexed pages in Google?
-
Hi,
We have an ecommerce site with around 4000 real pages. But our index count is at 47,000 pages in Google Webmaster Tools.
How can I get a list of all pages indexed of our domain? trying to locate the duplicate content.
Doing a "site:www.mydomain.com" only returns up to 676 results...
Any ideas?
Thanks,
Ben
-
You are absolutely right. But if you think that you have duplicate content issues, then Screaming Frog can help you tease that out.
That is also why I suggested the SEOmoz tool, since it is supposed to mimick a SE spider, it can give you a pretty good idea of any issues that you might have.
Using the advanced operator of site:domain makes sense, but if there are issues there like eyepaq said, it is going to be tough sledding.
My suggestion would be to download take a closer look at what GWT is telling you. Are there duplicates there? Is your CMS auto-generating URL's? That is probably going to be your best bet IMO.
Best of luck!
-
@BJS, I would export a file from GWT and filter the results. If your URLs are in GWT, then most likely it's indexed in Google.
-
Thank you to everyone that contributed.
@Zeph and @Francisco - I do use Screaming Frog, but actually, correct me if I am wrong, but it does not show a list of pages indexed, but rather pages that exist in the site - not what Google has already indexed. Thanks anyway
What I wanted was a way of creating a list of all indexed pages in Google - not a count.
But thank you all the same!
-
Hey Zeph! Hope your company is doing great.
@Ben, screaming frog is good for this. You will need to get the paid version of it. There is a video on the site http://www.screamingfrog.co.uk/seo-spider/. Use filters to get to your real URLs.
-
Hi,
There are tools that you can use - though for close 50k pages is harder to crawl. Best bet is the Web master tools count - although is not 100% exact either.
The site:domain is a good indicator but it's generated "on the fly" but it will show you a better result if you go "deeper" and click on page 10-20 and so on.
However right now it looks like there is an issue with site:domain. for more info see: http://www.seroundtable.com/google-site-command-cluster-16829.html
Cheers.
-
Use the tool Screaming Frog to see all your pages, that should help. Also, the SEOmoz toolset has a function that will show you all duplicate content (if you are a pro subscriber).
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
What to do if lots of backend pages have been indexed by Google erroneously?
Hi Guys Our developer forgot to add a no index no follow tag on the pages he created in the back-end. So we have now ended up with lots of back end pages being indexed in google. So my question is, since many of those are now indexed in Google, so is it enough to just place a no index no follow on those or should we do a 301 redirect on all those to the most appropriate page? If a no index no follow is enough, that would create lots of 404 errors so could those affect the site negatively? Cheers Martin
Intermediate & Advanced SEO | | martin19700 -
Google doesn't index image slideshow
Hi, My articles are indexed and images (full size) via a meta in the body also. But, the images in the slideshow are not indexed, have you any idea? A problem with the JS Example : http://www.parismatch.com/People/Television/Sport-a-la-tele-les-femmes-a-l-abordage-962989 Thank you in advance Julien
Intermediate & Advanced SEO | | Julien.Ferras0 -
Cache and index page of Mobile site
Hi, I want to check cache and index page of mobile site. I am checking it on mobile phone but it is showing the cache version of desktop. So anybody can tell me the way(tool, online tool etc.) to check mobile site index and cache page.
Intermediate & Advanced SEO | | vivekrathore0 -
Google Is Indexing My Internal Search Results - What should i do?
Hello, We are using a CMS/E-Commerce platform which isn't really built with SEO in mind, this has led us to the following problem.... a large number of internal (product search) search result pages, which aren't "search engine friendly" or "user friendly", are being indexed by google and are driving traffic to the site, generating our client revenue. We want to remove these pages and stop them from being indexed, replacing them with static category pages - essentially moving the traffic from the search results to static pages. We feel this is necessary as our current situation is a short-term (accidental) win and later down the line as more pages become indexed we don't want to incur a penalty . We're hesitant to do a blanket de-indexation of all ?search results pages because we would lose revenue and traffic in the short term, while trying to improve the rankings of our optimised static pages. The idea is to really move up our static pages in Google's index, and when their performance is strong enough, to de-index all of the internal search results pages. Our main focus is to improve user experience and not have customers enter the site through unexpected pages. All thoughts or recommendations are welcome. Thanks
Intermediate & Advanced SEO | | iThinkMedia0 -
Will I lose traffic from Google for re-directing a page?
I’m currently planning to a retire a discontinued product and put a 301 redirect to a related product (although not identical). The thing is, I’m still getting significant traffic from people searching for the old product by name. Would Google send this traffic to the new pages via the re-direct? Is Google likely to display the new page in place of the old page for similar queries or will it serve other content? I’d like to answer this question so that I can decide between the two following approaches: 1) Retiring the old page immediately and putting a 301 redirect to the new related pages. This will have the advantage of transferring the value of any link signals / referring traffic. Traffic will also land on the new pages directly without having to click through from another page. We would have a dynamic message telling users that the old product had been retired depending on whether they had visited out site before. 2) Keep the old product pages temporarily so that we don’t lose the traffic from the search engines. We would then change the old pages to advise users that the old product was now retired, but that we have other products that might solve their problems. When this organic traffic decreases over time, then we will proceed with the re-direct as above. I am worried though that the old product pages might outrank the new product pages. I’d really appreciate some advice with this. I’ve been reading lots of articles, but it seems like there are different opinions on this. I understand that I will lose between 10% - 15% of page rank as per the Matt Cutts video.
Intermediate & Advanced SEO | | RG_SEO0 -
Huge Google index on E-commerce site
Hi Guys, I got a question which i can't understand. I'm working on a e-commerce site which recently got a CMS update including URL updates.
Intermediate & Advanced SEO | | ssiebn7
We did a lot of 301's on the old url's (around 3000 /4000 i guess) and submitted a new sitemap (around 12.000 urls, of which 10.500 are indexed). The strange thing is.. When i check the indexing status in webmaster tools Google tells me there are over 98.000 url's indexed.
Doing the site:domainx.com Google tells me there are 111.000 url's indexed. Another strange thing which another forum member describes here : Cache date has been reverted And next to that old url's (which have a 301 for about a month now) keep showing up in the index. Does anyone know what i could do to solve the problem?0 -
Getting 260,000 pages re-indexed?
Hey there guys, I was recently hired to do SEO for a big forum to move the site to a new domain and to get them back up to their ranks after this move. This all went quite well, except for the fact that we lost about 1/3rd of our traffic. Although I expected some traffic to drop, this is quite a lot and I'm wondering what it is. The big keywords are still pulling the same traffic but I feel that a lot of the small threads on the forums have been de-indexed. Now, with a site with 260,000 threads, do I just take my loss and focus on new keywords? Or is there something I can do to get all these threads re-indexed? Thanks!
Intermediate & Advanced SEO | | StefanJDorresteijn0 -
Best way to stop pages being indexed and keeping PageRank
If for example on a discussion forum, what would be the best way to stop pages such as the posting page (where a user posts a topic or message) from being indexed AND not diluting PageRank too? If we added them to the Disallow on robots.txt, would pagerank still flow through the links to those blocked pages or would it stay concentrated on the linking page? Your ideas and suggestions will be greatly appreciated.
Intermediate & Advanced SEO | | Peter2640