How to find all indexed pages in Google?
-
Hi,
We have an ecommerce site with around 4000 real pages. But our index count is at 47,000 pages in Google Webmaster Tools.
How can I get a list of all pages indexed of our domain? trying to locate the duplicate content.
Doing a "site:www.mydomain.com" only returns up to 676 results...
Any ideas?
Thanks,
Ben
-
You are absolutely right. But if you think that you have duplicate content issues, then Screaming Frog can help you tease that out.
That is also why I suggested the SEOmoz tool, since it is supposed to mimick a SE spider, it can give you a pretty good idea of any issues that you might have.
Using the advanced operator of site:domain makes sense, but if there are issues there like eyepaq said, it is going to be tough sledding.
My suggestion would be to download take a closer look at what GWT is telling you. Are there duplicates there? Is your CMS auto-generating URL's? That is probably going to be your best bet IMO.
Best of luck!
-
@BJS, I would export a file from GWT and filter the results. If your URLs are in GWT, then most likely it's indexed in Google.
-
Thank you to everyone that contributed.
@Zeph and @Francisco - I do use Screaming Frog, but actually, correct me if I am wrong, but it does not show a list of pages indexed, but rather pages that exist in the site - not what Google has already indexed. Thanks anyway
What I wanted was a way of creating a list of all indexed pages in Google - not a count.
But thank you all the same!
-
Hey Zeph! Hope your company is doing great.
@Ben, screaming frog is good for this. You will need to get the paid version of it. There is a video on the site http://www.screamingfrog.co.uk/seo-spider/. Use filters to get to your real URLs.
-
Hi,
There are tools that you can use - though for close 50k pages is harder to crawl. Best bet is the Web master tools count - although is not 100% exact either.
The site:domain is a good indicator but it's generated "on the fly" but it will show you a better result if you go "deeper" and click on page 10-20 and so on.
However right now it looks like there is an issue with site:domain. for more info see: http://www.seroundtable.com/google-site-command-cluster-16829.html
Cheers.
-
Use the tool Screaming Frog to see all your pages, that should help. Also, the SEOmoz toolset has a function that will show you all duplicate content (if you are a pro subscriber).
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robots.txt Disallowed Pages and Still Indexed
Alright, I am pretty sure I know the answer is "Nothing more I can do here." but I just wanted to double check. It relates to the robots.txt file and that pesky "A description for this result is not available because of this site's robots.txt". Typically people want the URL indexed and the normal Meta Description to be displayed but I don't want the link there at all. I purposefully am trying to robots that stuff outta there.
Intermediate & Advanced SEO | | DRSearchEngOpt
My question is, has anybody tried to get a page taken out of the Index and had this happen; URL still there but pesky robots.txt message for meta description? Were you able to get the URL to no longer show up or did you just live with this? Thanks folks, you are always great!0 -
Why some pages show schema and some don't in Google?
I notice Google displays the schema(reviews, price, availability etc.) in results only for some of our item pages in same category using same template. Any ideas why this is happening. They are created around same time - more than a year ago. Schema was also added a year ago.
Intermediate & Advanced SEO | | rbai0 -
Delete page and tell it to Google
Hello everybody, i have a problem with some pages of my website. I have had to removed 5-10 pages because these pages linked to 404 pages and i removed it. Need i to tell to Google or Only removed? Thanks so much
Intermediate & Advanced SEO | | pompero990 -
Better for SEO to No-Index Pages with High Bounce Rates
Greeting MOZ Community: I operate www.nyc-officespace-leader.com, a New York City commercial real estate web site established in 2006. An SEO effort has been ongoing since September 2013 and traffic has dropped about 30% in the last month. The site has about 650 pages. 350 are listing pages, 150 are building pages. The listing and building pages have an average bounce rate of about 75%. The other 150 pages have a bounce rate of about 35%. The building and listing pages are dragging down click through rates for the entire site. My SEO firm believe there might be a benefit to "no-index, follow" these high bounce rate URLs. From an SEO perspective, would it be worthwhile to "no-index-follow" most of the building and listing pages in order to reduce the bounce rate? Would Google view the site as a higher quality site if I had these pages de-indexed and the average bounce rate for the site dropped significantly. If I no-indexed these pages would Google provide bette ranking to the pages that already perform well? As a real estate broker, I will constantly be adding many property listings that do not have much content so it seems that a "no-index, follow" would be good for the listings unless Google penalizes sites that have too many "no-index, follow" pages. Any thoughts??? Thanks,
Intermediate & Advanced SEO | | Kingalan1
Alan0 -
Google+ Page Question
Just started some work for a new client, I created a Google+ page and a connected YouTube page, then proceeded to claim a listing for them on google places for business which automatically created another Google+ page for the business listing. What do I do in this situation? Do I delete the YouTube page and Google+ page that I originally made and then recreate them using the Google+ page that was automatically created or do I just keep both pages going? If the latter is the case, do I use the same information to populate both pages and post the same content to both pages? That doesn't seem like it would be efficient or the right way to go about handling this but I could be wrong.
Intermediate & Advanced SEO | | goldbergweismancairo0 -
Are pages with a canonical tag indexed?
Hello here, here are my questions for you related to the canonical tag: 1. If I put online a new webpage with a canonical tag pointing to a different page, will this new page be indexed by Google and will I be able to find it in the index? 2. If instead I apply the canonical tag to a page already in the index, will this page be removed from the index? Thank you in advance for any insights! Fabrizio
Intermediate & Advanced SEO | | fablau0 -
Duplicate Content From Indexing of non- File Extension Page
Google somehow has indexed a page of mine without the .html extension. so they indexed www.samplepage.com/page, so I am showing duplicate content because Google also see's www.samplepage.com/page.html How can I force google or bing or whoever to only index and see the page including the .html extension? I know people are saying not to use the file extension on pages, but I want to, so please anybody...HELP!!!
Intermediate & Advanced SEO | | WebbyNabler0 -
How Google Carwler Cached Orphan pages and directory?
I have website www.test.com I have made some changes in live website and upload it to "demo" directory (which is recently created) for client approval. Now, my demo link will be www.test.com/demo/ I am not doing any type of link building or any activity which pass referral link to www.test.com/demo/ Then how Google crawler find it and cached some pages or entire directory? Thanks
Intermediate & Advanced SEO | | darshit210