How can I best find out which URLs from large sitemaps aren't indexed?
-
I have about a dozen sitemaps with a total of just over 300,000 urls in them. These have been carefully created to only select the content that I feel is above a certain threshold.
However, Google says they have only indexed 230,000 of these urls. Now I'm wondering, how can I best go about working out which URLs they haven't indexed? No errors are showing in WMT related to these pages.
I can obviously manually start hitting it, but surely there's a better way?
-
There's no obvious function in WM tools, but having a look round there's this option:
http://www.aspfree.com/c/a/BrainDump/Extracting-Google-Indexed-Web-Site-Pages-Using-MS-Excel/
But Google will only display the first 1000 URLs on a site query so you would need to adapt it lots of times. From the looks of it there's not an easy way.
There's maybe a tool out there that is similar to Xenu, but checks the index status in Google also. I haven't ever had the need for this so I'm not aware of one, but the chances are there is something out there.
Good luck!
-
Any ideas on how to go about exporting indexed urls?
-
Hi Peter,
I'd attempt some sort of export of both indexed URLs and actual URLs into an Excel file and try and remove duplicates.
You would need to look into it but I'm sure there's a way of matching and removing duplicates.
Other than that I wouldn't know.
Ben
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Do URLs with canonical tags get indexed by Google?
Hi, we re-branded and launched a new website in February 2016. In June we saw a steep drop in the number of URLs indexed, and there have continued to be smaller dips since. We started an account with Moz and found several thousand high priority crawl errors for duplicate pages and have since fixed those with canonical tags. However, we are still seeing the number of URLs indexed drop. Do URLs with canonical tags get indexed by Google? I can't seem to find a definitive answer on this. A good portion of our URLs have canonical tags because they are just events with different dates, but otherwise the content of the page is the same.
Technical SEO | | zasite0 -
Some of my website urls are not getting indexed while checking (site: domain) in google
Some of my website urls are not getting indexed while checking (site: domain) in google
Technical SEO | | nlogix0 -
Site went down and traffic hasn't recovered
Very curious situation. We have a network of sites. Sunday night one (only one) of our sites goes down, and since then we've seen a loss in traffic across all our sites!! Not only have we seen a loss of traffic, we also saw a loss of indexed pages. A complete drop off from 1.8 million to 1.3 million pages indexed. Does anyone know why one site outtage would affect the rest of them? And the indexed pages? Very confused. Thanks,
Technical SEO | | TMI.com0 -
Can't get Google to Index .pdf in wp-content folder
We created an indepth case study/survey for a legal client and can't get Google to crawl the PDF which is hosted on Wordpress in the wp-content folder. It is linked to heavily from nearly all pages of the site by a global sidebar. Am I missing something obvious as to why Google won't crawl this PDF? We can't get much value from it unless it gets indexed. Any help is greatly appreciated. Thanks! Here is the PDF itself:
Technical SEO | | inboundauthority
http://www.billbonebikelaw.com/wp-content/uploads/2013/11/Whitepaper-Drivers-vs-cyclists-Floridas-Struggle-to-share-the-road.pdf Here is the page it is linked from:
http://www.billbonebikelaw.com/resources/drivers-vs-cyclists-study/0 -
Duplicate Content issue in Magento: The product pages are available true 3 URL's! How can we solve this?
Right now the product page "gedroogde goji bessen" (Dutch for: dried goji berries) is available true 3 URL's! **http://www.sportvoeding.net/gedroogde-goji-bessen ** =>
Technical SEO | | Zanox
By clicking on the product slider on the homepage
http://www.sportvoeding.net/superfood/gedroogde-goji-bessen =>
First go to sportvoeding.net/superfood (main categorie) and than clicking on "gedroogde Goji bessen"
http://www.sportvoeding.net/superfood/goji-bessen/gedroogde-goji-bessen =>
When directly go to the subcategorie "Goji Bessen" true the menu and there clicking on "gedroogde Goji Bessen" We want to have the following product URL:
http://www.sportvoeding.net/superfood/goji-bessen/gedroogde-goji-bessen Does someone know´s a good Exetension for this issue?0 -
My webmaster doesn't shows backlinks data why???
Hi every one I have 2 websites with same domain with targeting different country name for example www.domain.com.au and the other is wwww.domain.co.nz . So my question is that in webmaster tool my one domain doesn't shows back links data or search query data. why this is happen?
Technical SEO | | SanketPatel0 -
Best way to setup large site for multi language
Hello, I am setting up a new site which is going to be very large, over 250,000 products. Most of our customers are in the UK (45%), the rest are from various European countries and the USA. Unfortunately we only have a team of two people writing content for these pages in English. I would value some input on the best way to setup my website structure for ranking. Obviously the best would be individual country oriented domains I.e. domain.fr domain.de domain.co.uk . However we wouldnt have the time to create content for every page and most pages would contain the same content as the English domain. Would I get a penalty for this from google? The second choice is to follow the example of overstock.com and pull in information relating to each country I.e. currency and delivery time. this would be a lot easier but I am concerned that the lack of geo focus would effect my rankings. Does any one have any ideas?
Technical SEO | | DavidLenehan0 -
How to handle URL's from removed products?
Hi All, I have a question about a fashion related webshop. Every month about 100 articles are removed and about the some amouth is added to the site. Most of the products are indexed on brandname and type (e.g. MyBrand t-shirt blue) My question is what to do with the URL / page after the product is removed. I'm thinking about a couple of solutions: 301 the page to the brand categorie page build a script which shows related articles on the old URL (and try to keep it indexed) 404 page optimized for search term with links to brand category any other suggestons? Thanks in advance, Sam
Technical SEO | | U-Digital0