How can I get a listing of just the URLs that are indexed in Google
-
I know I can use the site: query to see all the pages I have indexed in Google, but I need a listing of just the URLs. We are doing a site re-platform and I want to make sure every URL in Google has a 301. Is there an easy way to just see the URLs that Google has indexed for a domain?
-
This question still remains unanswered. Why did it get marked as answered?
-
OK, you haven't stated how big the site is. As I already stated, Google will not show you everything it has in its index. Yahoo will give you 1000 URLs, SEOMoz might have additional ones, and you should also check your Google Webmaster Tools (if you have that set up).
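For anyone combining several of those exports into one list, here is a minimal Python sketch, assuming each source has already been saved as a plain list of URLs. The function names are hypothetical, and the normalization rules (lower-casing the scheme and host, treating a bare host as "/", dropping #fragments) are assumptions; adjust them to how your site actually treats URLs.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Normalize a URL so trivially different spellings compare equal.

    Assumptions: scheme and host are case-insensitive, a bare host means "/",
    and #fragments never reach the server. Path and query keep their case.
    """
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def merge_url_lists(*sources):
    """Merge URL lists from several exports, dropping blanks and duplicates.

    URLs are returned in normalized form, in order of first appearance.
    """
    merged = {}  # dict keys keep insertion order and ignore repeats
    for source in sources:
        for url in source:
            if url.strip():
                merged.setdefault(normalize_url(url), None)
    return list(merged)
```

Passing the Yahoo export, the Webmaster Tools list and any other source as separate lists gives one deduplicated list to build the 301 map from.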
The second thing to keep in mind is incoming links from other places. It sounds like there was no housekeeping before the restructure, so I would keep an eye on the web server logs, analytics, etc., and add 301s for anything else that comes in for a page that doesn't exist.
It's not just about Google; it's also about the user experience. Landing on a non-existent page can give visitors the impression that whatever they are looking for is no longer covered on your website, which can lose you customers.
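Here is a rough Python sketch of the log-watching step, assuming the Apache/nginx "combined" log format; other formats will need a different pattern. `missing_pages` is a hypothetical helper, not part of any tool mentioned in this thread. It tallies the paths that returned 404 so the most-requested missing pages can get their 301s first.

```python
import re
from collections import Counter

# Assumes the Apache/nginx "combined" log format, e.g.
# 203.0.113.5 - - [10/Oct/2010:13:55:36 -0700] "GET /old.html HTTP/1.1" 404 512 "-" "Mozilla/5.0"
# Group 1 is the requested path, group 2 is the HTTP status code.
LOG_LINE = re.compile(r'^\S+ \S+ \S+ \[[^\]]*\] "[A-Z]+ (\S+)[^"]*" (\d{3}) ')


def missing_pages(log_lines):
    """Count requested paths that returned 404, most frequent first."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        # Lines in other formats simply don't match and are skipped.
        if match and match.group(2) == "404":
            hits[match.group(1)] += 1
    return hits.most_common()
```

An open log file can be passed straight in, for example `missing_pages(open("access.log"))` (a hypothetical path), since iterating a file yields one line at a time.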
-
If you import the TSV into Excel, you will get a column of just the URLs.
-
It didn't fully answer the question, because I was only able to get the first 1000 URLs. I need the entire list.
-
Well, here it is for those paying attention to this thread:
http://www.stevenferrino.com/scripts/redirect-parser.php
I'm not sure whether posting a link will work (links tend not to for me), but you can always copy and paste it.
I'm considering the YOUmoz addition and have already sent you an email, Jennifer.
-
A bit of a teaser: our new Firefox toolbar, coming out soon, will be able to download the page of SERPs from the SERP overlay.
-
Ooh, that would be great to let others use. Maybe even a YOUmoz post?
-
You're welcome. If that fully answered your question, please mark it as answered.
-
Thanks, that let me grab the first 1000.
-
As Google will not show you everything, even when using the site: command, I use Yahoo Site Explorer:
http://siteexplorer.search.yahoo.com/search?p=seomoz.org&bwm=i&bwmo=d&bwmf=s
I then wrote a PHP script that takes the TSV it exports and creates a line for each page. I could probably make that available for use on one of my sites.
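That PHP script isn't posted in this thread, but the same idea can be sketched in a few lines of Python. This assumes the export is tab-separated with a header row containing a column named `URL`; the actual column names in the Site Explorer export may differ, so treat `urls_from_tsv` and its `column` parameter as hypothetical.

```python
import csv
import io


def urls_from_tsv(tsv_text, column="URL"):
    """Return the non-empty values of one column from tab-separated text.

    Assumes the first row is a header containing `column` (matched
    case-insensitively); raises ValueError if no such header exists.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = [name.strip().lower() for name in next(reader)]
    index = header.index(column.lower())
    urls = []
    for row in reader:
        # Skip short or blank rows instead of failing on them.
        if len(row) > index and row[index].strip():
            urls.append(row[index].strip())
    return urls
```

Writing `"\n".join(urls_from_tsv(text))` to a file then gives one URL per line.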
-
You may want to do that from an XML sitemap. There are sites out there that will build a sitemap for you for free; then just open it in Excel and you should have all of your URLs in a list. Now, that doesn't answer your question of just the URLs in Google, but if you do it this way you will get all of the ones in Google and then some. Better overkill than underkill. Hope that helps!
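If you'd rather skip Excel, the `<loc>` entries can be pulled out of a standard sitemap with Python's built-in `xml.etree.ElementTree`. A minimal sketch, assuming a single sitemap file that uses the sitemaps.org namespace; sitemap index files (`<sitemapindex>`) aren't handled, and `urls_from_sitemap` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemaps that follow the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return the <loc> value of every <url> entry in a sitemap."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text and loc.text.strip()
    ]
```

Printing the result one URL per line gives the same list you would get from opening the sitemap in Excel.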
Related Questions
-
Recovering from a Google indexing issue after fixing a malware attack
My niche site was attacked by malware on 1 March 2018. The hacker injected a PHP file into my blog page. The injected links look like: mydomain.com/blog/dmy4xa.php? I then scanned my site with Wordfence and identified all of the malware code. Then I manually cleaned the whole site, including the database. My site is now completely free of malware, and I removed all of the malware links via Webmaster Tools. I even blocked my blog page with robots.txt. But new malware links get indexed every week, so I have to remove those links every week. Because of this, I decided to rebuild my site. I rebuilt it on another server, then wiped my current server and migrated my site back from that server on 10 January 2019. I waited 1 month for the malware links to be deindexed, but new links are still being indexed every week. I set the site to discourage search engines for over 1 week, and even deleted the site from Google Webmaster Tools along with all properties, as well as the verification file on the server. After more than 1 week, the links are still showing. I'm tired of deleting malware links every week. I need a permanent solution. Please give me a proper solution for this malware link indexing. Google has indexed about 100 URLs. After that I cleaned my site with some tools. My site was free from malware. But Ne
Technical SEO | | Gfound1230 -
Not all images indexed in Google
Hi all, we recently ran into an unusual issue with images in Google's index. We have more than 1,500 images in our sitemap, but according to Search Console only 273 of those are indexed. If I check Google Image Search directly, I find more images in the index, but still not all of them. For example, this post has 28 images and only 17 are indexed in Google Images. This is happening with other posts as well. We checked all the possible reasons (missing alt text, image as background, file size, Fetch and Render in Search Console), but none of these apply in our case. So everything looks fine, but not all images are in the index. Any ideas on this issue? Your feedback is much appreciated, thanks.
Technical SEO | | flo_seo1 -
Do URLs with canonical tags get indexed by Google?
Hi, we re-branded and launched a new website in February 2016. In June we saw a steep drop in the number of URLs indexed, and there have continued to be smaller dips since. We started an account with Moz and found several thousand high priority crawl errors for duplicate pages and have since fixed those with canonical tags. However, we are still seeing the number of URLs indexed drop. Do URLs with canonical tags get indexed by Google? I can't seem to find a definitive answer on this. A good portion of our URLs have canonical tags because they are just events with different dates, but otherwise the content of the page is the same.
Technical SEO | | zasite0 -
Google Cache showing a different URL
Hi all, very weird things are happening to us. For the 3 URLs below, Google's cache is rendering content from a different URL (a sister site), even though there are no redirects between the 2 and the live page shows the 'right content'. See: http://webcache.googleusercontent.com/search?q=cache:http://giltedgeafrica.com/tours/ http://webcache.googleusercontent.com/search?q=cache:http://giltedgeafrica.com/about/ http://webcache.googleusercontent.com/search?q=cache:http://giltedgeafrica.com/about/team/ We also have the exact same issue with another domain we owned (but no longer own); the only difference is that we 301-redirected those URLs before it changed ownership: http://webcache.googleusercontent.com/search?q=cache:http://www.preferredsafaris.com/Kenya/2 http://webcache.googleusercontent.com/search?q=cache:http://www.preferredsafaris.com/accommodation/Namibia/5 I went into the URL Removal Tool and was denied for the first case above ("") and it is still pending for the second list. We are worried that this might be a sign of duplicate content and could be penalising us. Thanks! PS: I went through most questions, and the closest one I found was this one (http://moz.com/community/q/page-disappeared-from-google-index-google-cache-shows-page-is-being-redirected), but it didn't provide a clear answer to my question above.
Technical SEO | | SouthernAfricaTravel0 -
Can Google Read schema.org markup within Ajax?
Hi all, as a local business directory, we also display opening hours on a business listing page, e.g. http://www.goudengids.be/napoli-kontich-2550/ At the same time, I also have schema.org markup for opening hours implemented. But, for technical reasons (performance), the opening hours (and the markup alongside them) are displayed using AJAX. I'm wondering whether Google is able to read the markup. The rich snippet tool and markup plugins like Semantic Inspector can't "see" the markup for opening hours. Any advice here?
Technical SEO | | TruvoDirectories0 -
No existing pages in Google index
I have a real estate portal with a few categories, for example: flats, houses, etc. A category URL looks like this: mydomain.com/flats/?page=1 Each category has about 30-40 pages, BUT in Google's index I found URLs like: mydomain.com/flats/?page=1350 Can you explain this? This URL contains just a headline etc., but no content! (It's just a page generated by PHP.) How is it possible that Google can find and index these pages? (On the web, there are no backlinks to these pages.) Thanks
Technical SEO | | visibilitysk0 -
Google Places listings and search results: quick question.
Has anybody else noticed that they are ranking better in 'Places' yet have dropped off in the actual search results? We've had no message through Webmaster Tools. The same seems to have happened to our competitors.
Technical SEO | | onlinechester0 -
Getting images indexed in the SERPs
Good afternoon from 13 degrees C, totally sunny Wetherby, UK 🙂 Am I right in thinking that the only way to get images appearing like this in your SERPs: http://i216.photobucket.com/albums/cc53/zymurgy_bucket/innovia-merchant-immages-serpscopy.jpg is to be hooked up to Google Merchant? Which kind of means that if the site you're working on has no images, this type of enhancement is out of bounds? Thanks in advance, David
Technical SEO | | Nightwing0