How to identify orphan pages?
-
I've read that you can use Screaming Frog to identify orphan pages on your site, but I can't figure out how to do it. Can anyone help?
I know that Xenu Link Sleuth works but I'm on a Mac so that's not an option for me.
Or are there other ways to identify orphan pages?
-
DeepCrawl.co.uk is another great resource here. This tool gives a full list of URLs, including number of internal links to each page. Filter this list by "No. links in" = 0, and this will give you a good list of orphaned pages.
Cheers,
Mike | Fresh Egg Australia -
Hi Marie!
Sadly, I don't use Xenu anymore either. Most of the solutions to find orphaned pages are either hit-and-miss manual methods (search OSE, search your server files). Or you could use a method like Agents of Value describes here.
Couple of posts that may help:
1. Find Orphaned Pages From Your Sitemap.xml File with Excel and IIS Toolkit
Requires IIS toolkit, which unless your installing on an external machine, isn't mac friendly
Ian has some great tips here, including:
- Search the server log files for every unique URL loaded over a 6-month period. Compare that to all unique URLs found in a site crawl. People have a funny way of stumbling into pages you’ve accidentally blocked or orphaned. Chances are, blocked pages will show up in your log file, even if they’re blocked.
- Do a database export. If you’re using WordPress or another content management system, you can export a full list of every page/post on the site, as well as the URL generated. Then compare that to a site crawl.
- Run two crawls of your site using your favorite crawler. Do the first one with the default settings. Then do a second with the crawler set to ignore robots.txt and nofollow. If the second crawl has more URLs than the first, and you want 100% of your site indexed, then check your robots.txt and look for meta ROBOTS issues.
3. Supposedly, Webseo has an automated option to find orphaned files, but I haven't used it nor can I vouch for it:http://www.webseo.com/
Hope this helps! Let us know what works.
-
Well, because they are 'orphans', you probably can't find them using a spider tool! I'd recommend the following process to find your orphan pages:
1. get a list of all the pages created by your CMS
2. get the list of all the pages found by Screaming Frog
3. add the two url lists into Excel and find the URLs in your CMS that are not in the Screaming Frog list.
You could probably use an Excel trick like this one:
http://superuser.com/questions/289650/how-to-compare-two-columns-and-find-differences-in-excel
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Pages Fighting Over Keywords
Hi Guys, Just after some general advice. Since manipulation of keywords through links is no longer a feasible way of ranking these days, I was wondering how people got round the issue of pages bouncing for the same keyword or Google deciding that a blog post is a better signal rather than your service page. For instance if you are doing local and national search, how do you stop the local keywords ranking for national pages, without diluting the local signals. I have some ideas:- stronger internal linking to the page review content But obviously redirects or canonical won't be a good solution as I still want these pages to exist in their own right. Regards Neil
Technical SEO | | nezona0 -
Will redirecting a logged in user from a public page to an equivalent private page (not visible to google) impact SEO?
Hi, We have public pages that can obviously be visited by our registered members. When they visit these public pages + they are logged in to our site, we want to redirect them to the equivalent (richer) page on the private site e.g. a logged in user visiting /public/contentA will be redirected to /private/contentA Note: Our /public pages are indexed by Google whereas /private pages are excluded. a) will this affect our SEO? b) if not, is 302 the best http status code to use? Cheers
Technical SEO | | bernienabo0 -
Old Content Pages
Hello we run a large sports website. Since 2009 we have been doing game previews for most games every day for all the major sports..IE NFL, CFB, NBA, MLB etc.. Most of these previews generate traffic for 1-2 days leading up to or day of the event. After that there is minimal if any traffic and over the years almost nothing to the old previews. If you do a search for any of these each time the same matchup happens Google will update its rankings and filter out any old matchups/previews with new ones. So our question is what would you do with all this old content? Is it worth just keeping? Google Indexes a majority of it? Should we prune some of the old articles? The other option we thought of and its not really practical is to create event pages where we reuse a post each time the teams meet but if there was some sort of benefit we could do it.
Technical SEO | | dueces0 -
Rel Canonical for the Same Page
Hi, I was looking in my one of my moz accounts and under analyz page under notices is a message that says: Rel Canonical Using rel=canonical suggests to search engines which URL should be seen as canonical. I checked an notice that I do have a rel='canonical' href='http://www.example.com' /> from the home page of http://www.example.com. I guess my question is. Does having a Rel Canonical going to the same page hurt my SEO? I'm not sure why it is there but wanted to make sure I address this correctly. I was under the impression you use Rel Canonical for duplicate or similar pages and you want to let Google know what page to show. But since I've made this mistake to where I am saying to show the home page if you find a similar home page, should I just delete the Rel Canonical. Thanks,
Technical SEO | | ErrickG
Errick0 -
Off-page SEO and on-page SEO improvements
I would like to know what off-page SEO and on-page SEO improvements can be made to one of our client websites http://www.nd-center.com Best regards,
Technical SEO | | fkdpl2420 -
Unreachable Pages
Hi All Is there a tool to check a website if it has stand alone unreachable pages? Thanks for helping
Technical SEO | | Joseph-Green-SEO0 -
Can you 301 redirect a page to an already existing/old page ?
If you delete a page (say a sub department/category page on an ecommerce store) should you 301 redirect its url to the nearest equivalent page still on the site or just delete and forget about it ? Generally should you try and 301 redirect any old pages your deleting if you can find suitable page with similar content to redirect to. Wont G consider it weird if you say a page has moved permenantly to such and such an address if that page/address existed before ? I presume its fine since say in the scenario of consolidating departments on your store you want to redirect the department page your going to delete to the existing pages/department you are consolidating old departments products into ?
Technical SEO | | Dan-Lawrence0 -
SEOMoz Crawl Diagnostic indicates duplicate page content for home page?
My first SEOMoz Crawl Diagnostic report for my website indicates duplicate page content for my home page. It lists the home page URL Page Title and URL twice. How do I go about diagnosing this? Is the problem related to the following code that is in my .htaccess file? (The purpose of the code was to redirect any non "www" backlink referrals to the "www" version of the domain.) RewriteCond %{HTTP_HOST} ^whatever.com [NC]
Technical SEO | | Linesides
RewriteRule ^(.*)$ http://www.whatever.com/$1 [L,R=301] Should I get rid of the "http" reference in the second line? Related to this is a notice in the "Crawl Notices Found" -- "301 Permanent redirect" which shows my home page title as "http://whatever.com" and shows the redirect address as http://http://www.whatever.com/ I'm guessing this problem is again related to the redirect code I'm using. Also... The report indicates duplicate content for those links that have different parameters added to the URL i.e. http://www.whatever.com?marker=Blah Blah&markerzoom=13 If I set up a canonical reference for the page, will this fix this? Thank you.0