How can I get a listing of just the URLs that are indexed in Google
-
I know I can use the site: query to see all the pages I have indexed in Google, but I need a listing of just the URLs. We are doing a site re-platform and I want to make sure every URL in Google has a 301. Is there an easy way to just see the URLs that Google has indexed for a domain?
-
This question still remains unanswered, why did it get marked answered?
-
Ok, you haven't stated how big the site is. As I already stated, Google will not show you everything it has in it's index, Yahoo will give 1000, SEOMoz might have additional, also check your Google Webmaster Tools (if you have that setup).
The second thing to keep in mind is incoming links from other places. It sounds like there was no housekeeping before the restructure, so I would keep an eye on the web server logs, analytics, etc. and add 301's for anything else that comes in that doesn't exist.
It's not just about Google, it's also about the user experience. Going to a non-existent page can give the impression that whatever they are looking for is no longer mentioned on your website, which potentially looses customers.
-
If you import the TSV into Excel you will get a column of just the URLS
-
It didnt fully answer the question cause I was only able to get the first 1000 URLS. I need to get the entire list.
-
Well here it is for those paying attention to this thread:
http://www.stevenferrino.com/scripts/redirect-parser.php
Not sure if posting a link will work, they tend not to for me, you can always copy and paste.
I'm considering the YOUMoz addition and already sent you an email Jennifer
-
A bit of a teaser... our new Firefox toolbar that's coming out soon will have the ability in the SERP overlay to download the page of SERPs
-
Ooh that would be great to let others use, maybe even a YOUmoz post?
-
Your welcome. If that fully answered your question please mark it as answered.
-
Thanks, that let me grab the first 1000.
-
As Google will not show you everything, even using the site command, I use Yahoo SiteExplorer:
http://siteexplorer.search.yahoo.com/search?p=seomoz.org&bwm=i&bwmo=d&bwmf=s
and wrote a PHP script to take the TSV it exports and create a line for each page. I could probably make that available for use one one of my sites.
-
You may want to do that from an xml sitemap. You can find sites out there that will build a sitemap for you for free and then just open it in excel and you should have all of your urls in a list. NOw that doesn't answer your question of just the urls in google, but you will get all of the ones in google and then some if you do it the way suggested. Better overkill than underkill. Hope that helps!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Can adding thousands of new indexable URLs to my site at once be a problem?
Hi everyone, I am currently working on a project that will quickly add thousands of new indexable URLs to my site. For context, the site currently has over a million indexable pages. Is there any danger of adding a few thousand URLs at once to the site? Could it potentially affect crawlability/SEO/other pages? Thank you!
Technical SEO | | StevenLevine0 -
Old url is still indexed
A couple of months ago we requested a change of address in Search console. The new, correct url is already indexed. Yet when we search the old url (with site:www.) we find that the old url is still indexed. in Google Webmaster Tools the amount of indexed pages is reduced to 1. Is there another way to remove old urls?
Technical SEO | | conversal0 -
Removing Personal content from Google Index
Hi everyone, A user is complaining that her name is appearing in google search through our job ads site, so I removed such ads through Search Console, but the problem is not the ads anymore but our internal search results. The ads are no longer live but our searches has been indexed by google back then, We have been manually taking over 500 pages that included such name but more and more keep coming through pagination, we haven't found a pattern yet so pretty much any search result might have contained such name. We might get some legal issues here, did you guys got into anything similar before? We have just set some rules so that this doesn't happen again, but still can't find a way to deal with this one. Thanks in advance. PD: Not sure if this is the right category to fit it.
Technical SEO | | JoaoCJ0 -
How can we get google to know we are a lifestyle magazine
hi, i have a website which is a lifestyle magazine www.in2town.co.uk before we had to build out site from scratch due to a major error from our hosting company we ranked number one for the word lifestyle magazine but since then we have always ranked number five. This week that has dropped to number seven and i would like to know, what do we need to do to let google know we are a lifestyle magazine. I am not sure what we need to do. i know we should have important keywords on the home page but i am not sure where to put these without making the site look silly. can anyone let me know if we should have an introduction and if so where shall we put this. we would really like to increase our page rank for lifestyle magazine but are struggling to know where to put text that will not make the site look silly. Also, can anyone let me know what other keywords we should be aiming for please; We are currently making major changes to our site to make it better.
Technical SEO | | ClaireH-1848860 -
Google News URL Format
Hi, We are currently redesigning our gaming website (www.totallygn.com) and one of our main goals is to get listed by Google News in future. Looking at the Google News URL requirements "The URL for each article must contain a unique number consisting of at least three digits." How does the above affect SEO structure? I was planning on using a format such as www.totallygn.com/xbox-360/360-reviews/fifa-12-review how would this compare to something like? www.totallygn.com/xbox-360/360-reviews/fifa-12-review234 Thanks in advance for your help
Technical SEO | | WalesDragon0 -
What tools produce a complete list of all URLs for 301 redirects?
I am project managing the rebuild of a major corporate website and need to set up 301 redirects from the old pages to the new ones. The problem is that the old site sits on multiple CMS platforms so there is no way I can get a list of pages from the old CMS. Is there a good tool out there that will crawl through all the sites and produce a nice spreadsheet with all the URLs on it? Somebody mentioned Xenu but I have never used it. Any recommendations? Thanks -Adrian
Technical SEO | | Adrian_Kingwell0 -
Can JavaScrip affect Google's index/ranking?
We have changed our website template about a month ago and since then we experienced a huge drop in rankings, especially with our home page. We kept the same url structure on entire website, pretty much the same content and the same on-page seo. We kind of knew we will have a rank drop but not that huge. We used to rank with the homepage on the top of the second page, and now we lost about 20-25 positions. What we changed is that we made a new homepage structure, more user-friendly and with much more organized information, we also have a slider presenting our main services. 80% of our content on the homepage is included inside the slideshow and 3 tabs, but all these elements are JavaScript. The content is unique and is seo optimized but when I am disabling the JavaScript, it becomes completely unavailable. Could this be the reason for the huge rank drop? I used the Webmaster Tolls' Fetch as Googlebot tool and it looks like Google reads perfectly what's inside the JavaScrip slideshow so I did not worried until now when I found this on SEOMoz: "Try to avoid ... using javascript ... since the search engines will ... not indexed them ... " One more weird thing is that although we have no duplicate content and the entire website has been cached, for a few pages (including the homepage), the picture snipet is from the old website. All main urls are the same, we removed some old ones that we don't need anymore, so we kept all the inbound links. The 301 redirects are properly set. But still, we have a huge rank drop. Also, (not sure if this important or not), the robots.txt file is disallowing some folders like: images, modules, templates... (Joomla components). We still have some html errors and warnings but way less than we had with the old website. Any advice would be much appreciated, thank you!
Technical SEO | | echo10 -
Does Google index XML files?
Does Google or other search engines include XML files in their index? More specifically, I am wondering how Google knows the difference between an xml filetype and an RSS feed.
Technical SEO | | nicole.healthline0