Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to identify orphan pages?
-
I've read that you can use Screaming Frog to identify orphan pages on your site, but I can't figure out how to do it. Can anyone help?
I know that Xenu Link Sleuth works but I'm on a Mac so that's not an option for me.
Or are there other ways to identify orphan pages?
-
DeepCrawl.co.uk is another great resource here. This tool gives a full list of URLs, including number of internal links to each page. Filter this list by "No. links in" = 0, and this will give you a good list of orphaned pages.
Cheers,
Mike | Fresh Egg Australia -
Hi Marie!
Sadly, I don't use Xenu anymore either. Most of the solutions to find orphaned pages are either hit-and-miss manual methods (search OSE, search your server files). Or you could use a method like Agents of Value describes here.
Couple of posts that may help:
1. Find Orphaned Pages From Your Sitemap.xml File with Excel and IIS Toolkit
Requires IIS toolkit, which unless your installing on an external machine, isn't mac friendly
Ian has some great tips here, including:
- Search the server log files for every unique URL loaded over a 6-month period. Compare that to all unique URLs found in a site crawl. People have a funny way of stumbling into pages you’ve accidentally blocked or orphaned. Chances are, blocked pages will show up in your log file, even if they’re blocked.
- Do a database export. If you’re using WordPress or another content management system, you can export a full list of every page/post on the site, as well as the URL generated. Then compare that to a site crawl.
- Run two crawls of your site using your favorite crawler. Do the first one with the default settings. Then do a second with the crawler set to ignore robots.txt and nofollow. If the second crawl has more URLs than the first, and you want 100% of your site indexed, then check your robots.txt and look for meta ROBOTS issues.
3. Supposedly, Webseo has an automated option to find orphaned files, but I haven't used it nor can I vouch for it:http://www.webseo.com/
Hope this helps! Let us know what works.
-
Well, because they are 'orphans', you probably can't find them using a spider tool! I'd recommend the following process to find your orphan pages:
1. get a list of all the pages created by your CMS
2. get the list of all the pages found by Screaming Frog
3. add the two url lists into Excel and find the URLs in your CMS that are not in the Screaming Frog list.
You could probably use an Excel trick like this one:
http://superuser.com/questions/289650/how-to-compare-two-columns-and-find-differences-in-excel
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Not Indexing Pages (Wordpress)
Hello, recently I started noticing that google is not indexing our new pages or our new blog posts. We are simply getting a "Discovered - Currently Not Indexed" message on all new pages. When I click "Request Indexing" is takes a few days, but eventually it does get indexed and is on Google. This is very strange, as our website has been around since the late 90's and the quality of the new content is neither duplicate nor "low quality". We started noticing this happening around February. We also do not have many pages - maybe 500 maximum? I have looked at all the obvious answers (allowing for indexing, etc.), but just can't seem to pinpoint a reason why. Has anyone had this happen recently? It is getting very annoying having to manually go in and request indexing for every page and makes me think there may be some underlying issues with the website that should be fixed.
Technical SEO | | Hasanovic1 -
Should search pages be indexed?
Hey guys, I've always believed that search pages should be no-indexed but now I'm wondering if there is an argument to index them? Appreciate any thoughts!
Technical SEO | | RebekahVP0 -
How to find orphan pages
Hi all, I've been checking these forums for an answer on how to find orphaned pages on my site and I can see a lot of people are saying that I should cross check the my XML sitemap against a Screaming Frog crawl of my site. However, the sitemap is created using Screaming Frog in the first place... (I'm sure this is the case for a lot of people too). Are there any other ways to get a full list of orphaned pages? I assume it would be a developer request but where can I ask them to look / extract? Thanks!
Technical SEO | | KJH-HAC1 -
Keyword variations on a single page
I have done the research and have compiled a list of a little over 100 keywords that are highly connected to our industry. I have used the metrics to rank those keywords and have given the top 50 of them a ranking. My intention is to use them on my site and make sure that all of my pages have a keyword focus. In doing this, I am running into some challenges. Any insight would be helpful. 1. There are numerous keywords that have simple variations in them. I am trying to figure out if each variation needs it's own page. I have read articles (here on moz) that say that one page can rank for several keywords, and other articles that say that a simple variation can need it's own page. Not sure what to do here. Below is an example of what I mean. (examples: "my long tail keyword" , "my long tail" , "my long" , "long tail" , "long tail keyword" , "keyword long tail") 2. Will it help to create a page for each one of the 50 or even the full 100? I have the opportunity to use blogs and FAQ's to assist with content creation. 3. Since my brand ranks well and is obviously tied highly into my site, do I worry about including brand terms in my keyword focus or should I just focus on those search terms?
Technical SEO | | Smart_Start0 -
Blog Page Titles - Page 1, Page 2 etc.
Hi All, I have a couple of crawl errors coming up in MOZ that I am trying to fix. They are duplicate page title issues with my blog area. For example we have a URL of www.ourwebsite.com/blog/page/1 and as we have quite a few blog posts they get put onto another page, example www.ourwebsite.com/blog/page/2 both of these urls have the same heading, title, meta description etc. I was just wondering if this was an actual SEO problem or not and if there is a way to fix it. I am using Wordpress for reference but I can't see anywhere to access the settings of these pages. Thanks
Technical SEO | | O2C0 -
SEO value of InDesign pages?
Hi there, my company is exploring creating an online magazine built with Adobe's InDesign toolset. If we proceeded with this, could we make these pages "as spiderable" as normal html/css webpages? Or are we limited to them being less spiderable, or not at all spiderable?
Technical SEO | | TheaterMania1 -
Can you 301 redirect a page to an already existing/old page ?
If you delete a page (say a sub department/category page on an ecommerce store) should you 301 redirect its url to the nearest equivalent page still on the site or just delete and forget about it ? Generally should you try and 301 redirect any old pages your deleting if you can find suitable page with similar content to redirect to. Wont G consider it weird if you say a page has moved permenantly to such and such an address if that page/address existed before ? I presume its fine since say in the scenario of consolidating departments on your store you want to redirect the department page your going to delete to the existing pages/department you are consolidating old departments products into ?
Technical SEO | | Dan-Lawrence0 -
Home Page .index.htm and .com Duplicate Page Content/Title
I have been whittling away at the duplicate content on my clients' sites, thanks to SEOmoz's pro report, and have been getting push back from the account manager at register.com (the site was built here and the owner doesn't want to move it). He says these are the exact same page and he can't access one to redirect to the other. Any suggestions? The SEOmoz report says there is duplicate content on both these urls: Durango Mountain Biking | Durango Mountain Resort - Cascade Village http://www.cascadevillagehotel.com/index.htm Durango Mountain Biking | Durango Mountain Resort - Cascade Village http://www.cascadevillagehotel.com/ Your help is greatly appreciated! Sheryl
Technical SEO | | TOMMarketingLtd.0