What tool do you use to check for URLs not indexed?
-
What is your favorite tool for getting a report of URLs that are not cached/indexed in Google & Bing for an entire site? Basically I want a list of URLs not cached in Google and a seperate list for Bing.
Thanks,
Mark
-
I've had good results using Google Search Console for checking which URLs are indexed. It's pretty straightforward and gives a clear overview of any indexing issues halloweensquishmallows.
-
-
I can work on building this tool if there's enough interest.
-
I generally just use Xenu's hyperlink sleuth (if you thousands of pages) to listing out all the URLs you have got and I might then manually take a look at them, however, see the guitar in demand I have not come upon an automatic device yet. If all people are aware of any, I'd like to recognize as properly.
-
This post from Distilled mentions that SEO for Excel plugin has a "Indexation Checker":
https://www.distilled.net/blog/seo/awesome-examples-of-how-to-use-seotools-for-excel/Alas, after downloading and installing, it appears this feature was removed...
-
Unless I'm missing something, there doesn't seem to be a way to get Google to show more than 100 results on a page. Our site has about 8,000 pages, and I don't relish the idea of manually exporting 80 SERPs.
-
Annie Cushing from Seer Interactive made an awesome list of all the must have tools for SEO.
You can get it from her link which is http://bit.ly/tools-galore
In the list there is a tool called scrapebox which is great for this. In fact there are many uses for the software, it is also useful for sourcing potential link partners.
-
I would suggest using the Website Auditor from Advanced Web Ranking. It can parse 10.000 pages and it will tell you a lot more info than just if it's indexed by Google or not.
-
hmm...I thought there was a way to pull those SERPs urls into Google docs using a function of some sort?
-
I think you need not any tool for this, you can directly go to google.com and search: Site:www.YourWebsiteNem.com Site:www.YourWebsiteName.com/directory I think this will be the best option to check if your website is crwled by google or not.
-
I do something similar but use Advanced Web Ranking, use site:www.domain.com as your phrase, run it to retrieve 1000 results and generate a Top Site Report in Excel to get the indexed list.
Also remember that you can do it on sub-directories (or partial URL paths) as a way to get more than 1000 pages from the site. In general I run it once with site:www.domain.com, then identify the most frequent sub-directories, and add those as additional phrases to the project and run a second time, i.e.: site:www.domain.com site:www.domain.com/dir1 site:www.domain.com/dir2 etc.
Still not definitive, but think it does give indication of where value is.
-
David Kauzlaric has in my opinion the best answer. If google hasn't indexed it and you've investigated your Google webmaster account, then there isn't anything better out there as far as I'm concerned. It's by far the simplest, quickest and easiest way to identify a serp result.
re: David Kauzlaric
We built an internal tool to do it for us, but basically you can do this manually.
Go to google, type in "site:YOURURLHERE" without the quotes. You can check a certain page, a site, a subdomain, etc... of course if you have thousands of URLs this method is not ideal, but it can be done.
Cheers!
-
I concur, Xenu is an extremely valuable tool for me that I use daily. Also, once you get a list of all the URLs on your site, you can compare the two lists in excel (two lists being the Xenu page list for your site and the list of pages that have been indexed by Google).
-
Nice solution Kieran!
I use the same method, to compare URL list from Screaming Frog output with URL Found column from my Keyword Ranking tool - of course it doesn't catch all pages that might be indexed.
The intention is not really to get a complete list, more to "draught" out pages that need work.
-
I agree, this is not automated but so far, from what we know, looks like a nice and clean option. Thanks.
-
Saw this and tried the following which isn't automated but is one way of doing it.
- First install SEO Quake plugin
- Go to Google
- Turn off Google Instant (http://www.google.com/preferences)
- Go to Advanced search set the number of results you want displayed (estimate the number of pages on your site)
- Then run your site:www.example.com search query
- Export this to CSV
- Import to Excel
- Once then do a Data to columns conversion using ; as a delimiter (this is the CSV delimiter)
- This gives you a formatted list.
- Then import your sitemap.xml into another TAB in Excel
- Run a vlookup between the URL tabs to flag which are on sitemap or vice versa.
Not exactly automated but does the job.
-
Curious about this question also, it would be very useful to see a master list of all URLs on our site that are not indexed by Google so that we can take action to see what aspects of the page are lacking and what we need for it to get indexed.
-
I usually just use Xenu's link sleuth (if you thousands of pages) to list out all the URLs you have and I would then manually check them, but I haven't come across an automated tool yet. If anyone knows any, I'd love to know as well.
-
Manual is a no go for large sites. If someone knows a tool like this, it woul be cool to know which/ where to find. Or..... This would make a cool SEOmoz pro tool
-
My bad - you are right that it doesn't display the actual URLs. So I guess the best thing you can do is site:examplesite.com and see what comes up.
-
That will tell you the number indexed, but it still doesn't tell you which of those URLs are or are not indexed. I think we all wish it would!
-
I would use Google Webmaster Tools as you can see how many URLs are indexed based on your sitemap. Once you have that, you can compare it to your total list. The same can be done with Bing.
-
Yeah I do it manually now so was looking for something more efficient.
-
We built an internal tool to do it for us, but basically you can do this manually.
Go to google, type in "site:YOURURLHERE" without the quotes. You can check a certain page, a site, a subdomain, etc... of course if you have thousands of URLs this method is not ideal, but it can be done.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Using # in parameters?
I am trying to understand why a website would use # instead of a ? for its parameters? I have put an example of the URL below: http://www.warehousestationery.co.nz/office-supplies/adhesives-tapes-and-fastenings#prefn1=brand&prefn2=colour&prefv1=Command&prefv2=Clear Any help would be much appreciated.
Technical SEO | | CaitlinDW1 -
Some of my website urls are not getting indexed while checking (site: domain) in google
Some of my website urls are not getting indexed while checking (site: domain) in google
Technical SEO | | nlogix0 -
How do I deindex url parameters
Google indexed a bunch of our URL parameters. I'm worried about duplicate content. I used the URL parameter tool in webmaster to set it so future parameters don't get indexed. What can I do to remove the ones that have already been indexed? For example, Site.com/products and site.com/products?campaign=email have both been indexed as separate pages even though they are the same page. If I use a no index I'm worried about de indexing the product page. What can I do to just deindexed the URL parameter version? Thank you!
Technical SEO | | BT20090 -
Should I use canonical?
I'm working on a site that sells audio tracks, the site is a Wordpress build. I've got Yoast and XML Sitemaps running for SEO. The site has been developed (not by myself) to use a flash based audio player. Now this player offers the ability to share, sell products etc... The player has been placed on the homepage and the main music catalog page. The main catalog page has had a custom page type created for itself. This page has been created in such a way that if you visit the actual page from dashboard > Pages and add content then no content will appear on the page. Even the page header is pulled from the PHP. So really as far as I am aware no real content is being seen on the page by a search engine. Except the content on the side bars (it has 2 sidebars on either side of the page.) The homepage has an introductory paragraph and header which are editable via the normal method in Wordpress. A custom post type has been created specifically for music items. When a music item is uploaded it is added to the music item feed on the homepage and music catalog pages. It also creates a separate post for the item itself. These items at the moment also have 'no content' as they are only sidebars with a flash music player. I've started to add short paragraphs and headers to them so there is content on the music item posts. I cannot however, in the time frame/budget start entering deeply descriptive content about each item. (I considered adding the intro paragraph from the homepage and using a canonical tag to the homepage on every music item). So here is my question. What do I do with these music items? Do I use canonical and point them toward the music catalog or the homepage? If so which one? I want the homepage or music catalog page to rank well and I am concerned that search engines aren't going to see these most vital parts of the site. I don't think individual items ranking is helpful, so what do i do?!?! The home and catalog pages are the two main pages of the site. I am going to advise a new player, page type etc... be utilised but at the moment I need a solution quickly. Any help will be much appreciated.
Technical SEO | | benyamin0 -
Canonical URL
I previously set the canonical Url in google web masters to the non www version, when I check my on page opt, it tells me that I have a critical issue with this. Should I change it in google web masters back to the www version? if so is there the possibility of negative results? Or is there a better way to deal with this? Note, I have inbound links pointing to both types.
Technical SEO | | bronxpad0 -
How to keep a URL social equity during a URL structure/name change?
We are in the process of making significant URL name/structure change to one of our property and we want to keep the social equity (likes, share, +1, tweets) from the old to the new URL. We have been trying many different option without success. We are running our social "button" in an iframe. Thanks
Technical SEO | | OlivierChateau0 -
301 an old URL with a ? in the URL?
I am redoing a site and the URL's are changing structure. The client's site was in magento and in the store they would get two URLs, for example: /store/categoryname/productname and /store/categoryname/productname?SID=dslkajsfdoiu947598whouieht983hg98 Do I have to 301 redirect both of these URL's to their new counterpart? Both go to the same content but magento seemed to add these SIDs into the navigation and Google has both versions in the index.
Technical SEO | | DanDeceuster0 -
Rel canonical with index follow on query string URLs
Hi guys, Quick question regarding the rel canonical tag. I have lots of links pointing at me with query strings and previously used some code to determine if query strings were in the URL and if they were then not to index that page. If there weren't query strings then the page would be indexed and followed. I assume I can now use the rel canonical tag on each of these pages so the value goes to the proper URL minus any query string. However do I need to have the rel canonical tag above the index, follow tag on the page? So URL is site.com/page.html?ref=ABC meta robots is "index, follow" Rel canonical is "site.com/page.html" Does the order of the meta robots and canonical tag matter? Thanks in advance!
Technical SEO | | panini0