Is there a way to get a list of Total Indexed pages from Google Webmaster Tools?
-
I'm doing a detailed analysis of how Google sees and indexes our website, and we have found that there are 240,256 pages in the index, which is way too many. It's an e-commerce site that needs some tidying up.
I'm working with an SEO specialist to set up URL parameters and put information into the robots.txt file so the excess pages aren't indexed (we shouldn't have more than around 3,000 to 4,000 pages), but we're struggling to find a way to get a list of these 240,256 pages. That list would help us decide what to put in the robots.txt file and which URLs we should ask Google to remove.
Is there a way to get a list of the indexed URLs? We can't find it in Google Webmaster Tools.
-
Looks like I can only do the first thousand. It's a start though. Thank you for the information.
Many of the URLs on my list, when put into a Google search, turn up 80-100 other variants that I can remove by hand.
http://www.mathewporter.co.uk/list-a-domains-indexed-pages-in-google-docs/ for anyone else following.
-
Finally getting around to doing this, and I noticed that when I change the start number to anything above 900, it doesn't work, i.e., it's only letting me look at the first 1,000 results for some reason.
The list of 1,000 has given me some good URLs to search on for the filtering thingy that was generating all the garbage URLs, but I'd love to get past 1,000 if I can.
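To narrow down which parameters are generating the garbage URLs, an exported list like those first 1,000 results can be tallied by query-parameter name. A minimal sketch using only Python's standard library; the sample URLs and parameter names (`colour`, `sort`, `page`) are hypothetical stand-ins, not taken from the site in question.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def count_query_params(urls):
    """Count how many URLs carry each query-parameter name."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        # Use a set so a parameter repeated within one URL counts once for that URL.
        names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
        counts.update(names)
    return counts


if __name__ == "__main__":
    # Hypothetical URLs standing in for an exported list of indexed pages.
    sample = [
        "http://www.example.com/shoes?colour=red&sort=price",
        "http://www.example.com/shoes?colour=blue",
        "http://www.example.com/shoes?sort=name&page=2",
        "http://www.example.com/about",
    ]
    for name, n in count_query_params(sample).most_common():
        print(f"{name}: {n}")
```

The parameters with the highest counts are the first candidates to check against the site's filters and sort options.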
Does anyone know how?
-
Correct. I have already gone into URL Parameters and set the ones we don't want crawled to Crawl 'No URLs'.
We haven't added the parameters listed there to the robots.txt file yet, but I will do that now. I had an initial consult today and we ran way over time when we discovered all this stuff, so I have another appointment in a couple of weeks.
We have a sitemap of all the category pages and relevant static pages on the site already and Google has those indexed nicely. We just need to get rid of the 240,000 pages it has indexed that we don't want in there (frightening I know - it's a really high number).
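For reference, a sitemap of just the wanted category and static pages can be generated with Python's standard library. This is a sketch assuming a plain list of canonical URLs (the `example.com` addresses are hypothetical); it emits only the required `<loc>` element per URL under the sitemaps.org namespace and leaves out optional fields such as `<lastmod>`.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for address in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as '&' in text automatically.
        ET.SubElement(entry, "loc").text = address
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical category and static pages.
    pages = [
        "http://www.example.com/",
        "http://www.example.com/category/shoes",
        "http://www.example.com/delivery-information",
    ]
    print(build_sitemap(pages))
```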
I greatly appreciate you taking the time to respond. Thank you.
-
Thanks. There's a lot of auto-generated content and duplicate pages, and we've set the robots.txt file up to exclude a large number of them. Now we wait.
Very helpful and greatly appreciated. Thank you.
-
Hi,
Since you said it's an e-commerce site, I'm going to assume the URL parameters are created by product variations, filters, sorts, etc. If so, you must already be seeing those parameters in your site's URLs as you navigate, and in your analytics or search results.
Your SEO specialist should easily be able to add those parameters to the robots.txt file. Then, personally, I would resubmit a sitemap for completeness and wait for the changes to take effect.
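As a sketch of what adding those parameters to the robots.txt file might look like, the Python below builds `Disallow` lines for a list of parameter names and checks sample URLs against them. It assumes Google-style pattern matching (`*` matches any run of characters, a trailing `$` anchors the end, and matching starts at the beginning of the path) and deliberately ignores `Allow` lines, user-agent groups and rule precedence, so treat it as a rough pre-check, not a full robots.txt parser. The parameter names are hypothetical.

```python
import re
from urllib.parse import urlsplit


def disallow_rules(params):
    """Build Disallow lines matching each parameter as the first (?) or a later (&) argument."""
    rules = []
    for name in params:
        rules.append(f"Disallow: /*?{name}=")
        rules.append(f"Disallow: /*&{name}=")
    return rules


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern to a regex: '*' -> any characters, trailing '$' -> end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(url, rules):
    """Return True if any Disallow pattern matches the URL's path and query (Allow is ignored)."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    for rule in rules:
        field, _, value = rule.partition(":")
        pattern = value.strip()
        # An empty Disallow value blocks nothing, so skip it.
        if field.strip().lower() == "disallow" and pattern:
            # re.match anchors at the start of the string, like robots.txt path matching.
            if pattern_to_regex(pattern).match(target):
                return True
    return False


if __name__ == "__main__":
    rules = disallow_rules(["sort", "colour"])  # hypothetical parameter names
    print("\n".join(["User-agent: *"] + rules))
    for url in ["http://www.example.com/shoes?colour=red",
                "http://www.example.com/shoes"]:
        print(url, "blocked" if is_blocked(url, rules) else "allowed")
```

Emitting both a `?name=` and an `&name=` rule covers the parameter whether it appears first in the query string or later.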
-
Joanne,
I'm afraid there's no way to know which pages are actually indexed from your Webmaster Tools. You can use a simple Google search, site:domain.com, and it will list "all" your indexed pages; however, there's no way to export that as a report.
You can create a report using a bit of a "hack". Log in to Google Drive, create a new spreadsheet and use the following formula to populate rows:
=importXml("https://www.google.com/search?q=site:www.yourdomainnamehere.com&num=100&start=0"; "//cite")
This will load the first 100 results. You will need to repeat the process for every 100 results you have, changing the last variable from "start=0" to "start=100", then "start=200", etc. (you see where I'm going). Depending on your spreadsheet's locale, the argument separator may need to be a comma instead of a semicolon. This could really be a pain in the butt for a site your size.
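Rather than editing the `start` value by hand for each row, the formulas can be generated and pasted in one per row. A minimal Python sketch, assuming `start` is a zero-based result offset and that Google stops returning results after roughly the first 1,000 (as reported earlier in the thread); `www.yourdomainnamehere.com` is the same placeholder used above, and `separator` allows for locales that expect a comma between function arguments.

```python
def import_xml_formulas(domain, total=1000, per_page=100, separator=";"):
    """Return one IMPORTXML formula per page of site: results, for pasting one per row."""
    formulas = []
    # start is treated as a zero-based offset: 0, 100, 200, ... up to total - per_page.
    for start in range(0, total, per_page):
        url = (f"https://www.google.com/search?q=site:{domain}"
               f"&num={per_page}&start={start}")
        formulas.append(f'=importXml("{url}"{separator} "//cite")')
    return formulas


if __name__ == "__main__":
    for formula in import_xml_formulas("www.yourdomainnamehere.com"):
        print(formula)
```

With the defaults this produces ten formulas, for start=0 through start=900.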
My recommendation is that you navigate your own site, decide which pages should be removed, and then create the robots.txt regardless of what Google has indexed. Once you complete your robots.txt, it will take a few weeks (or even a month) for the blocked pages to be removed.
Hope that helps!
Related Questions
-
Prioritise a page in Google / why is a well-optimised page not ranking?
Hello, I'm new to the Moz forums and was wondering if anyone out there could help with a query. My client has an e-commerce site selling a range of pet products, most of which have multiple items in the range for different-size animals, i.e.:
[Product name] for small dog
[Product name] for medium dog
[Product name] for large dog
[Product name] for extra large dog
I've got some really great rankings (top 3) for many keyword searches, such as:
'[product name] for dogs'
'[product name]'
But these rankings are for individual product pages, meaning the user is taken to a small-dog product page when they might have a large dog, or vice versa. I felt it would be better for users (and for conversions and bounce rates) if there was a group page showing all products in the range, which I could target with the keywords '[product name]' and '[product name] for dogs'. The page would link through to the individual product pages. I created some group pages in autumn last year to trial this and, although they are well-optimised (a score of 98 on Moz's optimisation tool), they are not ranking well. They are indexed, but way down the SERPs. The same group page format has been used for the PPC campaign, and the difference in visitor retention/conversion is significant. Why are my group pages not ranking? Is it because my client's site already ranks well for the target term and Google does not want to show another page from the site and muddy the results?
Is there a way to prioritise the group page in Google's eyes? Or bring it to Google's attention? Any suggestions/advice welcome. Thanks in advance, Laura
Intermediate & Advanced SEO | LauraSorrelle
-
Drop in Indexed pages
Hope everyone is having an awesome December! I first noticed a drop in my index at the beginning of November. My site's indexed pages dropped from 1,400 to 600 in the past 3-4 weeks. I don't know the cause, and would like the community to help me figure out why my indexing has dropped. Thank you for taking time out of your schedule to read this.
Intermediate & Advanced SEO | BSC
-
404 Pages. Can I change it to do this without getting penalized? I want to lower our bounce rate from these pages to encourage the user to continue on the site
Hi all, we have been streamlining our site and got rid of thousands of pages for redundant locations (basically, these used to be virtual locations where we didn't have a depot, although we did deliver there, and most of them were duplicate/thin content etc.). Most of them have little if any link value, and I didn't want to 301 all of them as we already have quite a few 301s. We currently display a 404 page, but I want to improve on this. The current 404 page is http://goo.gl/rFRNMt. Can I get my developer to change it so it will still be a 404 page, but the user will see the relevant category page instead? It would look like this: http://goo.gl/Rc8YP8. We could also use JavaScript to show the location name etc. Would this be okay, or would Google see it as cheating? Basically, I want to lower our bounce rates from these pages while still being attractive enough for the user to continue on the site and not go away. If this is not a good idea, then any recommendations on improving our current 404 would be greatly appreciated. Thanks, Pete
Intermediate & Advanced SEO | PeteC12
-
Multiple Google Webmaster Tools Configurations
Hello everyone, I just inherited a website and 2 different users created GWT accounts on the same site and have configured different settings. Do you know how Google behaves when this happens? Thanks
Intermediate & Advanced SEO | Carla_Dawson
-
Links from non-indexed pages
Whilst looking for link opportunities, I have noticed that the website has a few profiles from suppliers or accredited organisations. However, a search form is required to access these pages and when I type cache:"webpage.com" the page is showing up as non-indexed. These are good websites, not spammy directory sites, but is it worth trying to get Google to index the pages? If so, what is the best method to use?
Intermediate & Advanced SEO | maxweb
-
Google Webmaster Tools (GWT) owner removal issue
Hi! I have a new client. The former agency added the client property under the agency's account, so we had to create a new GA account (as you can't transfer ownership at the account level), but we also kept access to the former account to keep historical data. We were granted owner access to GWT (which is more flexible; you can remove owners and creators), and we now want to remove the former agency's users. We have 3 addresses. One was verified with the delegation method (no problem removing it), one with a meta tag (no problem), and one with Google Analytics. Here it becomes tricky, as Google says regarding the GA verification method: “If this account was verified using a Google Analytics tracking code, you should make sure that the user you want to unverify is no longer an administrator on the Analytics account. Otherwise, removal may not be permanent”. The thing is that this user has the same email address as the one used to create the agency GA account (no ownership transfer), so I basically can't remove its admin rights. The other possibility, as Google mentions when I try to unlink this user, is to “remove the administrator status in Google Analytics or delete the Google Analytics tracking code on the website”. But we don't want to remove the code, as we still want to track data with the former account for historical analysis purposes. Has anyone ever faced this situation? Do you know how to handle this? Do you think that unlinking the GWT and GA accounts will unverify the GA method? Many thanks in advance! Ennick
Intermediate & Advanced SEO | ennick
-
Is it dangerous to use "Fetch as Google" too much in Webmaster Tools?
I saw some people freaking out about this on some forums and thought I would ask. Are you aware of any downside to using "Fetch as Google" often? Is it a bad thing to do when you create a new page or blog post, for example?
Intermediate & Advanced SEO | BlueLinkERP
-
Google Listings
How can I make my pages (such as menu, diner, hours, contact us, etc.) appear in Google results when someone searches for my keyword or domain? Take a look at this screenshot. Thanks
Intermediate & Advanced SEO | vlad_mezoz