Listing of all Google Indexed Pages
-
I started managing a site that has about 391,000 indexed pages. I want to get to the bottom of why there are so many, in preparation for an ecommerce migration and improving SEO. Does anyone know of a tool? Many tools I have come across can only take 100 at a time. I would love to get them into Excel or a database. I look forward to the suggestions.
-
Searching site:yourdomain.com in Google, then going to the end of the results and telling it to show you all of the results, is a good start. It should give you enough to get an idea of why there are duplicated pages.
The Moz crawl can also help you figure it out, as ecommerce sites often have separate URLs for sorting products by price or name, pagination parameters, etc. We'll throw up a flag when we see a bunch of duplicate content or duplicate titles.
Also look for the easy stuff, such as non-www not redirecting to www. If both versions are indexed, fixing that cuts your page count in half.
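If you already have a list of URLs (from a crawl or anywhere else), a rough way to see how much of it is www/non-www and sort/pagination duplication is to collapse each URL to a normalized key and group on it. This is only a sketch: the parameter names in `NOISE_PARAMS` are hypothetical placeholders, so swap in whatever your store actually uses, and `canonical_key` and `group_duplicates` are names made up for this example.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical sort/pagination parameter names; replace with your site's own.
NOISE_PARAMS = {"sort", "order", "dir", "page", "limit"}


def canonical_key(url: str) -> str:
    """Collapse a URL to a key that ignores scheme, www, and noise parameters."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Treat /shoes and /shoes/ as the same page.
    path = parts.path.rstrip("/") or "/"
    kept = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in NOISE_PARAMS
    ]
    query = urlencode(sorted(kept))
    return host + path + ("?" + query if query else "")


def group_duplicates(urls):
    """Return {key: [urls]} for every key shared by more than one URL."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_key(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    sample = [
        "http://www.example.com/shoes?sort=price",
        "http://example.com/shoes",
        "http://example.com/shoes/?page=2",
        "http://example.com/hats",
    ]
    for key, members in group_duplicates(sample).items():
        print(key, len(members))
```

The biggest groups are usually the ones worth looking at first, since they point to the parameter or host variant that is multiplying pages.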
-
I may not be answering this correctly...
Are you looking for a list of URLs? If so, it's easy peasy with Screaming Frog.
If it's all the pages Google has indexed, I don't really know and I'm sorry! However, I will come back to this thread to see if someone else has the answer for you, because I'm quite interested in it myself!!!
Best of luck,
Amelia
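On getting the list into a database: once you have the URLs as plain text from whatever tool you end up using, something like the sketch below loads them into SQLite and counts pages per top-level section, which can help show where the 391,000 are coming from. The table layout and the function names `load_urls` and `section_counts` are assumptions for illustration, not the output format of any particular tool.

```python
import sqlite3
from urllib.parse import urlsplit


def load_urls(conn, urls):
    """Insert URLs into a 'pages' table, skipping exact duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT PRIMARY KEY, host TEXT, section TEXT, has_query INTEGER)"
    )
    rows = []
    for url in urls:
        url = url.strip()
        parts = urlsplit(url)
        segments = [s for s in parts.path.split("/") if s]
        # First path segment, e.g. 'catalog' for /catalog/shoes; '' for the home page.
        section = segments[0] if segments else ""
        rows.append((url, parts.netloc.lower(), section, int(bool(parts.query))))
    conn.executemany("INSERT OR IGNORE INTO pages VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def section_counts(conn):
    """Return (section, page count, pages with a query string), largest first."""
    return conn.execute(
        "SELECT section, COUNT(*) AS n, SUM(has_query) "
        "FROM pages GROUP BY section ORDER BY n DESC, section"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # use a file path to keep the data
    load_urls(conn, [
        "http://example.com/catalog/a?sort=price",
        "http://example.com/catalog/b",
        "http://example.com/blog/x",
    ])
    for row in section_counts(conn):
        print(row)
```

A section with a high count and a high share of query-string URLs is a likely candidate for parameter-driven duplication.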