How can I get a listing of just the URLs that are indexed in Google?
-
I know I can use the site: query to see all the pages I have indexed in Google, but I need a listing of just the URLs. We are doing a site re-platform and I want to make sure every URL in Google has a 301. Is there an easy way to just see the URLs that Google has indexed for a domain?
-
This question still remains unanswered. Why was it marked as answered?
-
OK, you haven't stated how big the site is. As I already stated, Google will not show you everything it has in its index. Yahoo will give you 1,000 URLs, SEOmoz might have additional ones, and you should also check your Google Webmaster Tools (if you have that set up).
The second thing to keep in mind is incoming links from other places. It sounds like there was no housekeeping before the restructure, so I would keep an eye on the web server logs, analytics, etc. and add 301s for any other requested URL that no longer exists.
It's not just about Google, it's also about the user experience. Landing on a non-existent page can give visitors the impression that whatever they are looking for is no longer on your website, which potentially loses customers.
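As a rough illustration of the log check described above, here is a minimal Python sketch that counts requests which returned a 404, so the most-requested missing paths can be given 301s first. It assumes the server writes Apache/Nginx-style "common" or "combined" log lines; the `missing_paths` name and the sample lines are made up for the example, and the pattern would need adjusting for other log formats.

```python
import re
from collections import Counter

# Assumption: Apache/Nginx "common" or "combined" log format.
# Captures the requested path and the three-digit status code.
LOG_LINE = re.compile(r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')

def missing_paths(log_lines):
    """Count requests that returned 404, keyed by requested path."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            counts[match.group("path")] += 1
    return counts

# Hypothetical sample lines, for illustration only.
sample = [
    '203.0.113.5 - - [01/Dec/2011:12:00:01 +0000] "GET /old-page.html HTTP/1.1" 404 512',
    '203.0.113.5 - - [01/Dec/2011:12:00:02 +0000] "GET /index.html HTTP/1.1" 200 1024',
    '203.0.113.9 - - [01/Dec/2011:12:05:00 +0000] "GET /old-page.html HTTP/1.1" 404 512',
]

# Most frequently requested missing paths first.
for path, hits in missing_paths(sample).most_common():
    print(hits, path)
```

In practice you would pass an open log file instead of `sample`, since the function only needs an iterable of lines.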
-
If you import the TSV into Excel, you will get a column of just the URLs.
-
It didn't fully answer the question because I was only able to get the first 1,000 URLs. I need to get the entire list.
-
Well here it is for those paying attention to this thread:
http://www.stevenferrino.com/scripts/redirect-parser.php
I'm not sure whether posting a link will work (they tend not to for me), but you can always copy and paste it.
I'm considering the YOUmoz addition and have already sent you an email, Jennifer.
-
A bit of a teaser... our new Firefox toolbar, which is coming out soon, will let you download the page of SERPs from the SERP overlay.
-
Ooh that would be great to let others use, maybe even a YOUmoz post?
-
You're welcome. If that fully answered your question, please mark it as answered.
-
Thanks, that let me grab the first 1000.
-
As Google will not show you everything, even with the site: command, I use Yahoo Site Explorer:
http://siteexplorer.search.yahoo.com/search?p=seomoz.org&bwm=i&bwmo=d&bwmf=s
and wrote a PHP script to take the TSV it exports and create a line for each page. I could probably make that available for use on one of my sites.
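The PHP script itself isn't shown in this thread, so the following is only a sketch of the same idea in Python: read tab-separated text and print one URL per line. The column name `URL` and the sample rows are assumptions for illustration; check the header row of the actual export and pass the right column name.

```python
import csv
import io

def urls_from_tsv(tsv_text, column="URL"):
    """Return the values of one column from tab-separated text, one per row."""
    # Assumption: the export has a header row containing the given column name.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row[column].strip() for row in reader if row.get(column)]

# Hypothetical export contents, for illustration only.
sample = (
    "TITLE\tURL\tSIZE\n"
    "Home\thttp://www.example.com/\t1024\n"
    "About\thttp://www.example.com/about\t2048\n"
)

for url in urls_from_tsv(sample):
    print(url)
```

For a real file, read it with `open(path, newline="", encoding="utf-8").read()` and pass the text in.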
-
You may want to do that from an XML sitemap. There are free tools out there that will build a sitemap for you; open it in Excel and you should have all of your URLs in a list. That doesn't answer your question about just the URLs in Google, but if the sitemap covers the whole site you will get all of the ones in Google and then some. Better overkill than underkill. Hope that helps!
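If you'd rather skip Excel, a sitemap can also be turned into a plain URL list with a few lines of Python. This is a minimal sketch that assumes a standard sitemaps.org `urlset` file (not a sitemap index); the `urls_from_sitemap` name and the sample XML are made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def urls_from_sitemap(xml_text):
    """Return every <loc> value found under <url> entries of a sitemap."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]

# Hypothetical sitemap contents, for illustration only.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/about</loc></url>
</urlset>"""

for url in urls_from_sitemap(sample):
    print(url)
```

The resulting list can then be compared against your redirect rules to spot any old URL that has no 301 yet.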