Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Tool to calculate the number of pages in Google's index?
-
When working with a very large site, are there any tools that will help you calculate the number of links in the Google index? I know you can use site:www.domain.com to see all the links indexed for a particular url. But what if you want to see the number of pages indexed for 100 different subdirectories (i.e. www.domain.com/a, www.domain.com/b)? is there a tool to help automate the process of finding the number of pages from each subdirectory in Google's index?
-
A good way to know how many pages are indexed in Google is to check the Top Landing Pages in Google Analytics,
It gives you a more interesting information than, IMHO, Google WBT itself, because those pages are the pages that actually are indexed and because of that receive traffic.
-
Google Webmaster Tools allows you to see your sitemap, along with the count of URLs in the sitemap and Google's index.
You can also download your sitemap from Google WMT. I am not aware of any data Google offers to help identify which URLs are not indexed. It sounds like this is the information Michelle is seeking.
If you just needed to know the # of URLs for each subdirectory, you could submit a sitemap for each one and then Google will show the URLs in Index for each given sitemap.
-
Is this your site? If so you should be able to find this information in Google Webmaster tools. Upload your sitemap and Google will tell you how many of the pages in your sitemap are in there index.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Does Google ignore content styled with 'display:none'?
Do you know if an H1 within a div that has a 'display: none' style applied will still be crawled and evaluated by Google? We have that situation on this page on line 136: view-source:https://www.junk-king.com/services/items-we-take/foreclosure-cleanouts Of course we also have an H1 up at the top of the page and are concerned that the second one will cause interference with our SEO efforts. I've seen conflicting and inconclusive information on line - not sure. Thanks for any help.
Intermediate & Advanced SEO | | rastellop0 -
Google doesn't index image slideshow
Hi, My articles are indexed and images (full size) via a meta in the body also. But, the images in the slideshow are not indexed, have you any idea? A problem with the JS Example : http://www.parismatch.com/People/Television/Sport-a-la-tele-les-femmes-a-l-abordage-962989 Thank you in advance Julien
Intermediate & Advanced SEO | | Julien.Ferras0 -
Why does Google rank a product page rather than a category page?
Hi, everybody In the Moz ranking tool for one of our client's (the client sells sport equipment) account, there is a trend where more and more of their landing pages are product pages instead of category pages. The optimal landing page for the term "sleeping bag" is of course the sleeping bag category page, but Google is sending them to a product page for a specific sleeping bag.. What could be the critical factors that makes the product page more relevant than the category page as the landing page?
Intermediate & Advanced SEO | | Inevo0 -
Magento: Should we disable old URL's or delete the page altogether
Our developer tells us that we have a lot of 404 pages that are being included in our sitemap and the reason for this is because we have put 301 redirects on the old pages to new pages. We're using Magento and our current process is to simply disable, which then makes it a a 404. We then redirect this page using a 301 redirect to a new relevant page. The reason for redirecting these pages is because the old pages are still being indexed in Google. I understand 404 pages will eventually drop out of Google's index, but was wondering if we were somehow preventing them dropping out of the index by redirecting the URL's, causing the 404 pages to be added to the sitemap. My questions are: 1. Could we simply delete the entire unwanted page, so that it returns a 404 and drops out of Google's index altogether? 2. Because the 404 pages are in the sitemap, does this mean they will continue to be indexed by Google?
Intermediate & Advanced SEO | | andyheath0 -
How do I get rel='canonical' to eliminate the trailing slash on my home page??
I have been searching high and low. Please help if you can, and thank you if you spend the time reading this. I think this issue may be affecting most pages. SUMMARY: I want to eliminate the trailing slash that is appended to my website. SPECIFIC ISSUE: I want www.threewaystoharems.com to showing up to users and search engines without the trailing slash but try as I might it shows up like www.threewaystoharems.com/ which is the canonical link. WHY? and I'm concerned my back-links to the link without the trailing slash will not be recognized but most people are going to backlink me without a trailing slash. I don't want to loose linkjuice from the people and the search engines not being in consensus about what my page address is. THINGS I"VE TRIED: (1) I've gone in my wordpress settings under permalinks and tried to specify no trailing slash. I can do this here but not for the home page. (2) I've tried using the SEO by yoast to set the canonical page. This would work if I had a static front page, but my front page is of blog posts and so there is no advanced page settings to set the canonical tag. (3) I'd like to just find the source code of the home page, but because it is CSS, I don't know where to find the reference. I have gone into the css files of my wordpress theme looking in header and index and everywhere else looking for a specification of what the canonical page is. I am not able to find it. I'm thinking it is actually specified in the .htaccess file. (4) Went into cpanel file manager looking for files that contain Canonical. I only found a file called canonical.php . the only thing that seemed like it was worth changing was changing line 139 from $redirect_url = home_url('/'); to $redirect_url = home_url(''); nothing happened. I'm thinking it is actually specified in the .htaccess file. (5) I have gone through the .htaccess file and put thes 4 lines at the top (didn't redirect or create the proper canonical link) and then at the bottom of the file (also didn't redirect or create the proper canonical link) : RewriteEngine on
Intermediate & Advanced SEO | | Dillman
RewriteCond %{HTTP_HOST} ^([a-z.]+)?threewaystoharems.com$ [NC]
RewriteCond %{HTTP_HOST} !^www. [NC]
RewriteRule .? http://www.%1threewaystoharems.com%{REQUEST_URI} [R=301,L] Please help friends.0 -
Remove URLs that 301 Redirect from Google's Index
I'm working with a client who has 301 redirected thousands of URLs from their primary subdomain to a new subdomain (these are unimportant pages with regards to link equity). These URLs are still appearing in Google's results under the primary domain, rather than the new subdomain. This is problematic because it's creating an artificial index bloat issue. These URLs make up over 90% of the URLs indexed. My experience has been that URLs that have been 301 redirected are removed from the index over time and replaced by the new destination URL. But it has been several months, close to a year even, and they're still in the index. Any recommendations on how to speed up the process of removing the 301 redirected URLs from Google's index? Will Google, or any search engine for that matter, process a noindex meta tag if the URL's been redirected?
Intermediate & Advanced SEO | | trung.ngo0 -
How to find all indexed pages in Google?
Hi, We have an ecommerce site with around 4000 real pages. But our index count is at 47,000 pages in Google Webmaster Tools. How can I get a list of all pages indexed of our domain? trying to locate the duplicate content. Doing a "site:www.mydomain.com" only returns up to 676 results... Any ideas? Thanks, Ben
Intermediate & Advanced SEO | | bjs20100 -
Should I prevent Google from indexing blog tag and category pages?
I am working on a website that has a regularly updated Wordpress blog and am unsure whether or not the category and tag pages should be indexable. The blog posts are often outranked by the tag and category pages and they are ultimately leaving me with a duplicate content issue. With this in mind, I assumed that the best thing to do would be to remove the tag and category pages from the index, but after speaking to someone else about the issue, I am no longer sure. I have tried researching online, but there isn't anything that provided any further information. Please can anyone with any experience of dealing with issues like this or with any knowledge of the topic help me to resolve this annoying issue. Any input will be greatly appreciated. Thanks Paul
Intermediate & Advanced SEO | | PaulRogers0