How can I get a listing of just the URLs that are indexed in Google?
-
I know I can use the site: query to see all the pages I have indexed in Google, but I need a listing of just the URLs. We are doing a site re-platform and I want to make sure every URL in Google has a 301. Is there an easy way to just see the URLs that Google has indexed for a domain?
-
This question is still unanswered. Why was it marked as answered?
-
OK, you haven't stated how big the site is. As I already stated, Google will not show you everything it has in its index. Yahoo will give you 1,000 URLs, SEOmoz might have additional ones, and you should also check your Google Webmaster Tools (if you have that set up).
The second thing to keep in mind is incoming links from other places. It sounds like there was no housekeeping before the restructure, so I would keep an eye on the web server logs, analytics, etc. and add 301s for anything else that comes in for a URL that no longer exists.
It's not just about Google; it's also about the user experience. Landing on a non-existent page can give visitors the impression that whatever they are looking for is no longer on your website, which potentially loses customers.
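
The log check suggested above could be sketched roughly as follows. This is only an illustration, assuming the web server writes Common or Combined Log Format lines; the function name `missing_paths` and the regular expression are made up for this example and would need adjusting for other log formats.

```python
import re
from collections import Counter

# Assumption: Common/Combined Log Format, e.g.
# 1.2.3.4 - - [10/Oct/2010:13:55:36 -0700] "GET /old-page HTTP/1.1" 404 512
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3})(?:\s|$)'
)


def missing_paths(lines):
    """Count requests that ended in a 404, keyed by the requested path."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            counts[match.group("path")] += 1
    return counts
```

Calling `missing_paths(open("access.log"))` (the file name is a placeholder) and then `.most_common()` on the result would list the most frequently requested missing paths first, which are the likeliest candidates for a 301.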
-
If you import the TSV into Excel, you will get a column of just the URLs.
-
It didn't fully answer the question because I was only able to get the first 1,000 URLs. I need to get the entire list.
-
Well here it is for those paying attention to this thread:
http://www.stevenferrino.com/scripts/redirect-parser.php
I'm not sure if posting a link will work (they tend not to for me), but you can always copy and paste it.
I'm considering the YOUmoz addition and have already sent you an email, Jennifer.
-
A bit of a teaser... our new Firefox toolbar that's coming out soon will be able to download the current page of SERPs from the SERP overlay.
-
Ooh that would be great to let others use, maybe even a YOUmoz post?
-
You're welcome. If that fully answered your question, please mark it as answered.
-
Thanks, that let me grab the first 1000.
-
As Google will not show you everything, even using the site command, I use Yahoo SiteExplorer:
http://siteexplorer.search.yahoo.com/search?p=seomoz.org&bwm=i&bwmo=d&bwmf=s
and wrote a PHP script to take the TSV it exports and create a line for each page. I could probably make that available for use on one of my sites.
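
For anyone who wants to try the same idea, here is a minimal Python sketch. It is not the PHP script mentioned above. It assumes the export has a header row with a column named `URL`; that column name and the function name `urls_from_tsv` are assumptions for illustration, so check the header of your own file.

```python
import csv
import io


def urls_from_tsv(tsv_text, column="URL"):
    """Return the non-empty values of one column from tab-separated text."""
    # Assumption: the first row of the export is a header naming each column.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    urls = []
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:
            urls.append(value)
    return urls
```

Joining the result with `"\n".join(...)` gives one URL per line, which is the same shape you would get from the URL column in Excel.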
-
You may want to do that from an XML sitemap. You can find sites out there that will build a sitemap for you for free; then just open it in Excel and you should have all of your URLs in a list. Now, that doesn't answer your question of getting just the URLs in Google, but done this way you will get all of the ones in Google and then some. Better overkill than underkill. Hope that helps!
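
If Excel is not handy, the URLs can also be pulled out of the sitemap with a few lines of Python. This is a rough sketch that assumes a standard sitemap file (a `urlset` of `url`/`loc` entries in the sitemaps.org 0.9 namespace) rather than a sitemap index; the function name `urls_from_sitemap` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return the text of every <loc> inside a <url> entry of a sitemap."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
```

The resulting list can then be compared against the redirect map to spot URLs that still lack a 301.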