How can I get a listing of just the URLs that are indexed in Google?
-
I know I can use the site: query to see all the pages I have indexed in Google, but I need a listing of just the URLs. We are doing a site re-platform and I want to make sure every URL in Google has a 301. Is there an easy way to just see the URLs that Google has indexed for a domain?
-
This question is still unanswered. Why was it marked as answered?
-
OK, you haven't said how big the site is. As I already stated, Google will not show you everything it has in its index. Yahoo will give you 1,000 URLs, SEOMoz might have additional ones, and you should also check Google Webmaster Tools (if you have that set up).
The second thing to keep in mind is incoming links from other places. It sounds like there was no housekeeping before the restructure, so I would keep an eye on the web server logs, analytics, etc. and add 301s for any other requested URLs that no longer exist.
It's not just about Google, it's also about the user experience. Landing on a non-existent page can give visitors the impression that whatever they are looking for is no longer on your website, which potentially loses customers.
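As a rough illustration of the log-watching step, here is a minimal Python sketch that counts 404 responses by requested path. It assumes the server writes the common or combined access log format used by Apache and Nginx by default; the regular expression, the function name `missing_paths`, and the sample lines are all made up for illustration and will need adjusting for your own logs.

```python
import re
from collections import Counter

# Assumption: common/combined log format, e.g.
# 1.2.3.4 - - [date] "GET /path HTTP/1.1" 404 512 "referer" "agent"
LOG_LINE = re.compile(r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')

def missing_paths(log_lines):
    """Count requests that returned 404, keyed by the requested path."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            counts[match.group("path")] += 1
    return counts

# Hypothetical sample lines; in practice, iterate over the real log file.
sample = [
    '1.2.3.4 - - [10/Oct/2011:13:55:36 +0000] "GET /old-page.html HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
    '1.2.3.4 - - [10/Oct/2011:13:55:40 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '5.6.7.8 - - [10/Oct/2011:13:56:01 +0000] "GET /old-page.html HTTP/1.1" 404 512 "-" "Googlebot"',
]

# Most frequently requested missing pages first.
for path, hits in missing_paths(sample).most_common():
    print(hits, path)
```

The paths with the most hits are the ones most worth a 301.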
-
If you import the TSV into Excel, you will get a column of just the URLs.
-
It didn't fully answer the question because I was only able to get the first 1,000 URLs. I need to get the entire list.
-
Well here it is for those paying attention to this thread:
http://www.stevenferrino.com/scripts/redirect-parser.php
Not sure if posting a link will work, they tend not to for me, you can always copy and paste.
I'm considering the YOUMoz addition and have already sent you an email, Jennifer.
-
A bit of a teaser... our new Firefox toolbar that's coming out soon will have the ability in the SERP overlay to download the page of SERPs
-
Ooh that would be great to let others use, maybe even a YOUmoz post?
-
You're welcome. If that fully answered your question, please mark it as answered.
-
Thanks, that let me grab the first 1000.
-
As Google will not show you everything, even using the site command, I use Yahoo SiteExplorer:
http://siteexplorer.search.yahoo.com/search?p=seomoz.org&bwm=i&bwmo=d&bwmf=s
and wrote a PHP script to take the TSV it exports and create a line for each page. I could probably make that available for use on one of my sites.
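The PHP script itself is not shown in this thread. As a rough Python equivalent of the idea, the sketch below reads a tab-separated export and returns one line per distinct page. It assumes the export has a header row with a column named `URL`; that column name, the function name `unique_pages`, and the sample data are assumptions, so check them against the actual file.

```python
import csv
import io
from urllib.parse import urlsplit

def unique_pages(tsv_text, url_column="URL"):
    """Return each distinct path (plus query string) in first-seen order."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    pages = []
    for row in reader:
        parts = urlsplit(row[url_column].strip())
        page = parts.path or "/"
        if parts.query:
            page += "?" + parts.query
        if page not in pages:
            pages.append(page)
    return pages

# Hypothetical export contents; the real file's columns may differ.
sample = (
    "TITLE\tURL\n"
    "Home\thttp://example.com/\n"
    "About\thttp://example.com/about.html\n"
    "Home again\thttp://example.com/\n"
)

for page in unique_pages(sample):
    print(page)
```

Each printed line is an old URL that needs a 301 target on the new platform.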
-
You may want to do that from an XML sitemap. There are free sitemap generators that will build one for you; open it in Excel and you should have all of your URLs in a list. Now, that doesn't answer your question of getting just the URLs in Google, but you will get all of the ones in Google and then some if you do it this way. Better overkill than underkill. Hope that helps!
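If Excel is not handy, the same list can be pulled out of the sitemap with a few lines of Python. This is only a sketch: it assumes a standard sitemaps.org `urlset` file (not a sitemap index), and the function name `sitemap_urls` and the sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text):
    """Return the text of every <loc> inside a <url> entry."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]

# Hypothetical sitemap contents; in practice, read your sitemap.xml file.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/about.html</loc></url>
</urlset>"""

for url in sitemap_urls(sample):
    print(url)
```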