How to identify orphan pages?
-
I've read that you can use Screaming Frog to identify orphan pages on your site, but I can't figure out how to do it. Can anyone help?
I know that Xenu Link Sleuth works but I'm on a Mac so that's not an option for me.
Or are there other ways to identify orphan pages?
-
DeepCrawl.co.uk is another great resource here. This tool gives a full list of URLs, including the number of internal links pointing to each page. Filter this list by "No. links in" = 0, and you will get a good list of orphaned pages.
Cheers,
Mike | Fresh Egg Australia
-
Hi Marie!
Sadly, I don't use Xenu anymore either. Most of the solutions for finding orphaned pages are hit-and-miss manual methods (search OSE, search your server files), or you could use a method like the one Agents of Value describes here.
Couple of posts that may help:
1. Find Orphaned Pages From Your Sitemap.xml File with Excel and IIS Toolkit
Requires the IIS toolkit, which, unless you're installing it on an external machine, isn't Mac-friendly.
2. Ian has some great tips here, including:
- Search the server log files for every unique URL loaded over a 6-month period. Compare that to all unique URLs found in a site crawl. People have a funny way of stumbling into pages you’ve accidentally blocked or orphaned. Chances are, blocked pages will show up in your log file, even if they’re blocked.
- Do a database export. If you’re using WordPress or another content management system, you can export a full list of every page/post on the site, as well as the URL generated. Then compare that to a site crawl.
- Run two crawls of your site using your favorite crawler. Do the first one with the default settings. Then do a second with the crawler set to ignore robots.txt and nofollow. If the second crawl has more URLs than the first, and you want 100% of your site indexed, then check your robots.txt and look for meta ROBOTS issues.
3. Supposedly, Webseo has an automated option to find orphaned files, but I haven't used it and can't vouch for it: http://www.webseo.com/
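The log-file comparison from Ian's first tip could be sketched roughly like this in Python, which runs fine on a Mac. This is only an illustration: it assumes your server writes Common Log Format style lines and that you have the crawl results as a list of paths. The names `paths_from_log`, `orphan_candidates`, and the sample data are made up for the example.

```python
import re

# Matches the request part of a Common Log Format line, e.g. "GET /about HTTP/1.1".
# Assumption: your access log uses this layout; adjust the pattern if it does not.
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+"')


def paths_from_log(lines):
    """Return the set of unique request paths found in access-log lines."""
    paths = set()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            # Drop query strings so /page?utm=x and /page count as one URL.
            paths.add(match.group(1).split("?", 1)[0])
    return paths


def orphan_candidates(log_paths, crawl_paths):
    """Paths that visitors requested but the crawler never reached."""
    return sorted(set(log_paths) - set(crawl_paths))


# Hypothetical sample data standing in for a real log file and crawl export.
sample_log = [
    '203.0.113.5 - - [10/Oct/2013:13:55:36 +0000] "GET /about HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Oct/2013:13:56:01 +0000] "GET /old-promo?src=mail HTTP/1.1" 200 512',
]
crawled = {"/", "/about"}

print(orphan_candidates(paths_from_log(sample_log), crawled))
```

The result is only a list of candidates. Requests that returned errors, and assets such as images or scripts, will show up too, so you would still want to check each URL by hand.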
Hope this helps! Let us know what works.
-
Well, because they are 'orphans', you probably can't find them using a spider tool! I'd recommend the following process to find your orphan pages:
1. Get a list of all the pages created by your CMS.
2. Get the list of all the pages found by Screaming Frog.
3. Add the two URL lists to Excel and find the URLs in your CMS list that are not in the Screaming Frog list.
You could probably use an Excel trick like this one:
http://superuser.com/questions/289650/how-to-compare-two-columns-and-find-differences-in-excel
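If you'd rather not do the comparison in Excel, the same three steps can be sketched in a few lines of Python. This is a rough example, not a finished tool: it assumes you already have both lists as plain URLs (one from the CMS export, one from the Screaming Frog crawl), and the function names and example.com URLs are made up for illustration.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to host + path so trivial differences don't cause false orphans."""
    parts = urlsplit(url.strip())
    # Treat /services and /services/ as the same page; an empty path becomes "/".
    path = parts.path.rstrip("/") or "/"
    return f"{parts.netloc.lower()}{path}"


def find_orphans(cms_urls, crawl_urls):
    """Return CMS URLs that never appear in the crawl, keeping the CMS order."""
    crawled = {normalize(u) for u in crawl_urls}
    return [u for u in cms_urls if normalize(u) not in crawled]


# Hypothetical lists standing in for the CMS export and the crawl export.
cms = [
    "http://www.example.com/",
    "http://www.example.com/services/",
    "http://www.example.com/old-landing-page",
]
crawl = [
    "http://www.example.com",
    "http://WWW.example.com/services",
]

print(find_orphans(cms, crawl))
```

The normalization step matters because the CMS and the crawler often disagree on trailing slashes or letter case in the hostname, and a straight text comparison would flag those pages as orphans by mistake.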