How to identify orphan pages?
-
I've read that you can use Screaming Frog to identify orphan pages on your site, but I can't figure out how to do it. Can anyone help?
I know that Xenu Link Sleuth works but I'm on a Mac so that's not an option for me.
Or are there other ways to identify orphan pages?
-
DeepCrawl.co.uk is another great resource here. This tool gives a full list of URLs, including number of internal links to each page. Filter this list by "No. links in" = 0, and this will give you a good list of orphaned pages.
Cheers,
Mike | Fresh Egg Australia
-
Hi Marie!
Sadly, I don't use Xenu anymore either. Most of the solutions for finding orphaned pages are hit-and-miss manual methods (search OSE, search your server files), or you could use a method like the one Agents of Value describes here.
Couple of posts that may help:
1. Find Orphaned Pages From Your Sitemap.xml File with Excel and IIS Toolkit
Requires the IIS toolkit, which, unless you're installing it on an external machine, isn't Mac-friendly.
2. Ian has some great tips here, including:
- Search the server log files for every unique URL loaded over a 6-month period. Compare that to all unique URLs found in a site crawl. People have a funny way of stumbling into pages you’ve accidentally blocked or orphaned. Chances are, those pages will show up in your log file even though a crawler can't reach them.
- Do a database export. If you’re using WordPress or another content management system, you can export a full list of every page/post on the site, as well as the URL generated. Then compare that to a site crawl.
- Run two crawls of your site using your favorite crawler. Do the first one with the default settings. Then do a second with the crawler set to ignore robots.txt and nofollow. If the second crawl has more URLs than the first, and you want 100% of your site indexed, then check your robots.txt and look for meta ROBOTS issues.
3. Supposedly, Webseo has an automated option to find orphaned files, but I haven't used it and can't vouch for it: http://www.webseo.com/
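For anyone who wants to try the log-file comparison from Ian's first tip without Excel, here is a minimal Python sketch. It assumes the server writes Common or Combined Log Format lines and that the crawl export is a plain list of absolute URLs. The function names, the sample log lines, and the example.com URLs are all made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Matches the request and status fields of a Common/Combined Log Format line.
# Assumption: your server logs in this format; adjust the pattern if not.
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')

def paths_from_log(lines):
    """Return the set of URL paths that were served with a 200 status."""
    paths = set()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group(2) == "200":
            # Drop the query string so /page?x=1 and /page count as one URL.
            paths.add(urlsplit(match.group(1)).path)
    return paths

def paths_from_crawl(urls):
    """Reduce crawled absolute URLs to their paths for comparison."""
    return {urlsplit(url).path or "/" for url in urls}

def orphan_candidates(log_lines, crawled_urls):
    """Paths visitors requested that the crawler never reached."""
    return sorted(paths_from_log(log_lines) - paths_from_crawl(crawled_urls))

# Hypothetical sample data standing in for a real log file and crawl export.
sample_log = [
    '203.0.113.5 - - [10/Oct/2013:13:55:36 +0000] "GET /old-landing-page/ HTTP/1.1" 200 5120',
    '203.0.113.9 - - [10/Oct/2013:13:56:01 +0000] "GET /about/ HTTP/1.1" 200 2048',
    '203.0.113.9 - - [10/Oct/2013:13:56:09 +0000] "GET /missing/ HTTP/1.1" 404 312',
]
sample_crawl = ["http://www.example.com/", "http://www.example.com/about/"]

print(orphan_candidates(sample_log, sample_crawl))
```

On a real log the result will also include images, CSS, and other static files, so expect to filter by file extension before treating the list as orphaned pages.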
Hope this helps! Let us know what works.
-
Well, because they are 'orphans', you probably can't find them using a spider tool! I'd recommend the following process to find your orphan pages:
1. get a list of all the pages created by your CMS
2. get the list of all the pages found by Screaming Frog
3. add the two URL lists to Excel and find the URLs from your CMS that are not in the Screaming Frog list.
You could probably use an Excel trick like this one:
http://superuser.com/questions/289650/how-to-compare-two-columns-and-find-differences-in-excel
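If Excel isn't handy (the original poster is on a Mac), the same three-step comparison can be sketched in a few lines of Python. This is only an illustration: the `normalize` and `find_orphans` helpers and the example.com URLs are hypothetical, and the normalization rules (treating www and non-www as the same host, ignoring trailing slashes) are assumptions you may not want on your own site.

```python
from urllib.parse import urlsplit

def normalize(url):
    """Normalize a URL so trivial differences do not cause false mismatches.

    Assumptions: www and non-www are the same site, the scheme is ignored,
    and a trailing slash does not change which page is meant.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    return host + path

def find_orphans(cms_urls, crawled_urls):
    """Return CMS URLs that do not appear anywhere in the crawl list."""
    crawled = {normalize(u) for u in crawled_urls}
    return [u for u in cms_urls if normalize(u) not in crawled]

# Step 1: pages exported from the CMS (hypothetical sample).
cms_export = [
    "http://www.example.com/",
    "http://www.example.com/services/",
    "http://www.example.com/old-promo/",
]
# Step 2: pages found by the crawler (hypothetical sample).
crawl_export = [
    "http://example.com/",
    "http://example.com/services",
]
# Step 3: anything in the CMS list but not in the crawl is an orphan candidate.
print(find_orphans(cms_export, crawl_export))
```

In practice both lists would be read from the CMS export and the crawler's exported URL list, one URL per line.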