How can I best find out which URLs from large sitemaps aren't indexed?
-
I have about a dozen sitemaps containing just over 300,000 URLs in total. They were carefully built to include only content that I feel is above a certain threshold.
However, Google says it has indexed only 230,000 of these URLs. How can I best work out which URLs haven't been indexed? Webmaster Tools (WMT) shows no errors related to these pages.
I could obviously start checking them by hand, but surely there's a better way?
-
There's no obvious function for this in Webmaster Tools, but having looked around, there's this option:
http://www.aspfree.com/c/a/BrainDump/Extracting-Google-Indexed-Web-Site-Pages-Using-MS-Excel/
But Google will only display the first 1,000 URLs for a site: query, so you would need to adapt it many times over. From the looks of it, there isn't an easy way.
There may be a tool out there that is similar to Xenu but also checks index status in Google. I've never needed one, so I'm not aware of any, but chances are something exists.
Good luck!
-
Any ideas on how to go about exporting the indexed URLs?
-
Hi Peter,
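I'd attempt some sort of export of both the indexed URLs and the actual sitemap URLs into an Excel file, then match the two lists and remove the duplicates. Whatever is left over from the sitemap list should be the URLs that aren't indexed.
You would need to look into it, but I'm sure there's a way of matching and removing duplicates.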
Other than that I wouldn't know.
Ben
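For anyone who would rather script the comparison than do it in Excel, here is a minimal Python sketch of the same idea. It assumes you have already obtained a list of indexed URLs by some means (the thread does not identify an export for this, so that input is hypothetical) and that the sitemaps follow the standard sitemaps.org schema. The function names `sitemap_urls`, `normalize`, and `not_indexed` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text):
    """Return the set of <loc> values found in one sitemap document."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text}


def normalize(url):
    """Trim whitespace and a trailing slash so trivially different forms match.

    This is an assumption about what counts as "the same URL"; adjust it
    if your site treats /page and /page/ as different pages.
    """
    return url.strip().rstrip("/")


def not_indexed(sitemap_docs, indexed_urls):
    """Return sitemap URLs that do not appear in the indexed list, sorted."""
    submitted = set()
    for doc in sitemap_docs:
        submitted |= {normalize(u) for u in sitemap_urls(doc)}
    indexed = {normalize(u) for u in indexed_urls if u.strip()}
    return sorted(submitted - indexed)


if __name__ == "__main__":
    # Tiny made-up sitemap and indexed list, just to show the shape of the data.
    sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>http://example.com/a</loc></url>
      <url><loc>http://example.com/b/</loc></url>
      <url><loc>http://example.com/c</loc></url>
    </urlset>"""
    indexed = ["http://example.com/a", "http://example.com/b"]
    print(not_indexed([sample], indexed))  # ['http://example.com/c']
```

With a dozen real sitemaps you would read each file's contents into the `sitemap_docs` list. Using sets keeps the comparison fast even at 300,000 URLs, which is where a spreadsheet tends to struggle.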