How to identify orphan pages?
-
I've read that you can use Screaming Frog to identify orphan pages on your site, but I can't figure out how to do it. Can anyone help?
I know that Xenu's Link Sleuth works, but I'm on a Mac, so that's not an option for me.
Or are there other ways to identify orphan pages?
-
DeepCrawl.co.uk is another great resource here. The tool gives you a full list of URLs, including the number of internal links pointing to each page. Filter that list for "No. links in" = 0 and you'll have a good list of orphaned pages.
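If you export that URL list as a CSV, the same filter can be scripted rather than done by hand. This is a minimal Python sketch, assuming a CSV export with a header row; the `URL` and `No. links in` column names are assumptions and may not match the real export, and `orphan_candidates` is a hypothetical helper name.

```python
import csv
import io


def orphan_candidates(csv_text, url_col="URL", inlinks_col="No. links in"):
    """Return URLs whose internal inbound-link count is exactly zero.

    Column names are assumptions; check your export's actual headers.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    orphans = []
    for row in reader:
        count = (row.get(inlinks_col) or "").strip()
        # Skip blank or non-numeric counts rather than guessing.
        if count.isdigit() and int(count) == 0:
            orphans.append(row[url_col])
    return orphans


sample = """URL,No. links in
https://example.com/,42
https://example.com/about/,7
https://example.com/old-landing-page/,0
"""
print(orphan_candidates(sample))  # ['https://example.com/old-landing-page/']
```

For a real export you would read the file first, e.g. `open("export.csv", newline="").read()` (hypothetical filename), and pass the text in.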
Cheers,
Mike | Fresh Egg Australia
-
Hi Marie!
Sadly, I don't use Xenu anymore either. Most of the ways to find orphaned pages are hit-and-miss manual methods (searching OSE, searching your server files), or you could use a method like the one Agents of Value describes here.
A couple of posts that may help:
1. Find Orphaned Pages From Your Sitemap.xml File with Excel and IIS Toolkit
This requires the IIS Toolkit, which isn't Mac-friendly unless you're installing it on an external machine.
2. Ian has some great tips here, including:
- Search the server log files for every unique URL loaded over a 6-month period. Compare that to all unique URLs found in a site crawl. People have a funny way of stumbling into pages you’ve accidentally blocked or orphaned. Chances are, blocked pages will show up in your log file, even if they’re blocked.
- Do a database export. If you’re using WordPress or another content management system, you can export a full list of every page/post on the site, as well as the URL generated. Then compare that to a site crawl.
- Run two crawls of your site using your favorite crawler. Do the first one with the default settings. Then do a second with the crawler set to ignore robots.txt and nofollow. If the second crawl has more URLs than the first, and you want 100% of your site indexed, then check your robots.txt and look for meta ROBOTS issues.
3. Supposedly, Webseo has an automated option for finding orphaned files, but I haven't used it and can't vouch for it: http://www.webseo.com/
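Ian's first tip (server logs versus a site crawl) can be approximated with a short script. This is a sketch, assuming your server writes Apache/Nginx-style Common or Combined Log Format lines and that you have a plain list of crawled URLs; the helper names `urls_from_log` and `log_only_paths` are hypothetical. Keeping only successful (200) GET/HEAD requests and dropping query strings are judgment calls you may want to change.

```python
import re
from urllib.parse import urlsplit

# Request line plus status code in Common/Combined Log Format, e.g.
# 1.2.3.4 - - [01/Jan/2024:10:00:00 +0000] "GET /about/ HTTP/1.1" 200 1024
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')


def urls_from_log(lines):
    """Collect unique request paths that returned HTTP 200."""
    paths = set()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group(2) == "200":
            # Drop the query string so /page?utm_source=x and /page count once.
            paths.add(match.group(1).split("?", 1)[0])
    return paths


def log_only_paths(log_paths, crawl_urls):
    """Paths seen in the logs that the crawler never reached (orphan candidates)."""
    # Crawlers usually report absolute URLs; compare on the path only.
    crawled = {urlsplit(url).path or "/" for url in crawl_urls}
    return sorted(log_paths - crawled)


log_lines = [
    '1.2.3.4 - - [01/Jan/2024:10:00:00 +0000] "GET / HTTP/1.1" 200 512',
    '1.2.3.4 - - [01/Jan/2024:10:00:05 +0000] "GET /about/ HTTP/1.1" 200 1024',
    '5.6.7.8 - - [02/Jan/2024:11:00:00 +0000] "GET /spring-sale/?utm_source=mail HTTP/1.1" 200 2048',
    '5.6.7.8 - - [02/Jan/2024:11:00:01 +0000] "GET /missing/ HTTP/1.1" 404 128',
]
crawl = ["https://example.com/", "https://example.com/about/"]
print(log_only_paths(urls_from_log(log_lines), crawl))  # ['/spring-sale/']
```

For a 6-month window you would feed in every rotated log file for that period. Search engine bots show up in access logs too, so expect some noise in the results.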
Hope this helps! Let us know what works.
-
Well, because they are 'orphans' (no internal links point to them), you probably can't find them using a spider tool, which discovers pages by following links! I'd recommend the following process to find your orphan pages:
1. Get a list of all the pages created by your CMS.
2. Get the list of all the pages found by Screaming Frog.
3. Add the two URL lists to Excel and find the URLs in your CMS list that are not in the Screaming Frog list.
You could probably use an Excel trick like this one:
http://superuser.com/questions/289650/how-to-compare-two-columns-and-find-differences-in-excel
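Without Excel, the CMS-versus-crawl comparison amounts to a set difference. This is a Python sketch, assuming each list is a sequence of absolute URLs (for example, read one per line from a text file). The `normalize` rules (ignore scheme, query string, fragment and trailing slash; lowercase only the host) are assumptions, since sites differ on whether those variants are the same page, and both function names are hypothetical.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Comparison key: lowercase host plus path, ignoring scheme, query,
    fragment and any trailing slash. Adjust to match how your site serves URLs."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return parts.netloc.lower() + path


def missing_from_crawl(cms_urls, crawl_urls):
    """CMS URLs that the crawler never found, i.e. orphan candidates."""
    crawled = {normalize(url) for url in crawl_urls}
    return [url for url in cms_urls if normalize(url) not in crawled]


cms_pages = [
    "https://example.com/",
    "https://example.com/services/",
    "https://example.com/2012/old-press-release/",
]
crawled_pages = [
    "https://example.com/",
    "https://Example.com/services",
]
print(missing_from_crawl(cms_pages, crawled_pages))
# ['https://example.com/2012/old-press-release/']
```

Normalizing before comparing matters because a crawler and a CMS often write the same page slightly differently (trailing slash, host case), and an exact-match comparison would report those as false orphans.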
Related Questions
-
Removing indexed pages
Hi all, this is my first post, so be kind 🙂 I have a one-page WordPress site that has the Yoast plugin installed. Unfortunately, when I first submitted the site's XML sitemap to Google Search Console, I didn't check the Yoast settings, and it submitted some example files from a theme demo I was using. These got indexed, which is a pain, so now I am trying to remove them. Originally I set up a bunch of 301s, but that didn't remove them from the index (at least not after about a month), so now I have set up 410s. These also don't seem to be working, and I am wondering whether it is because I re-submitted the sitemap with only the index page on it (as it is just a single-page site). Could that have stopped Google from revisiting the original pages to actually see the 410s?
Thanks in advance for any suggestions.
Technical SEO | Jettynz
-
Duplicate Page Title
Our pages have so many duplicate page titles. I want to change all of them. Is that the right way to go?
Technical SEO | iskq
-
Why is my office page not being indexed?
Good morning from 24 degrees C, partly cloudy Wetherby, UK 🙂 This page is not being indexed by Google: http://www.sandersonweatherall.co.uk/office-to-let-leeds/
1st question: I've checked the robots.txt file and there are no problems, and I'm in the midst of updating the XML sitemap (it had the old one in place). The page only has one link, from this page: http://www.sandersonweatherall.co.uk/Site-Map/ So is the reason it's not being indexed simply a lack of SEO juice from inbound links, meaning the remedy lies in routing more inbound links to the offending page?
2nd question: Is the quickest way to diagnose whether a web address is being indexed to paste the URL into the Google search box, on the basis that if it doesn't return the page, there's a problem?
Thanks in advance, David
Technical SEO | Nightwing
-
Homepage dropping back to page 30 and being replaced by a random page?
Hi all,
Please accept my apologies if I have posted this in the wrong place; I am new to this. I have asked for help over and over again on the Google Webmaster Forum, but every time I am faced with sarcastic, unhelpful answers and then get moaned at for asking the same question again when I get no answers.
My website is http://www.hillfieldscampingandleisure.co.uk. The site is nearly 2 years old and is an online camping equipment store. It is hosted on the EKMPOWERSHOP platform. After about a year of adding products and designing my site, I decided to hire a UK-based SEO company; they were a good company with some big clients. To cut a really long story short, they completely ripped me off: £700 a month for 7 months while my site kept going backwards. They wouldn't target the keywords I wanted, and all they did was provide really spammy, non-relevant, no-PageRank links. My site ended up at number 31 on Google.
I managed to drop the company and tried to do things myself:
- I optimized my site's content so it wasn't keyword-stuffed.
- I rewrote all my alt tags to look more natural.
- I optimized my meta and H1 tags.
- I carried on trying to build relevant, high-PageRank links.
I managed to get my homepage to page 3/4 of Google. It stayed there for a few weeks, but over the past few weeks my homepage keeps dropping back to page 28-30 and being replaced by a random page of my site on page 4-6. It corrects itself after a while and my homepage returns, but then it happens all over again. Today I have a random page on page 4 and my homepage is on page 29. Any ideas on what is causing this and how I can get my site up there?
Some people have suggested that it is the EKM platform I am using, but since the SEO company took the p out of me, it's the only one I can afford at the moment until I start selling. I am a small business with stock waiting to be sold, but no matter how much I read and how many rules I follow, my site just doesn't seem to move.
Any help would be really, really appreciated, and please be nice!
Technical SEO | hillfields
-
Should I Change On-Page Optimization?
Hi, PC monitoring software and computer monitoring software are our target keywords. Around 5 weeks ago, we created a page for PC monitoring software (home/pc-monitoring-software) and did some bookmarking and guest posts targeting the "PC monitoring software" keyword. Now we are in the top 15 on Google for that keyword. Initially, we were planning to change the content of our roughly 2-year-old home page to target the "computer monitoring software" keyword and do SEO for it. But a few days ago, we noticed that our pc-monitoring-software page is already ranking in the early forties for "computer monitoring software" as well. Maybe Google is giving it an advantage because "computer" is a synonym of "PC". Now we are thinking that we should optimize the PC monitoring software page for both the computer and PC keywords, for example by adding "computer monitoring software" alongside the existing "pc monitoring software" in the title, and similarly doing other on-page work for "computer monitoring software". We are also thinking of 301-redirecting the existing pc-monitoring-software page to a new computer-monitoring-software page, which would be optimized for both PC and computer. Please tell me whether making the changes above will help us rank well for both PC and computer monitoring software, or whether we should leave the existing pc-monitoring-software page as it is and stick to our earlier plan of changing the home page to target computer monitoring software. I'm new to SEO, so I want to make a wise decision with your help instead of learning through failures. Thanks, shahzad
Technical SEO | shaz_lhr
-
New Domain Page 7 Google but Page 1 Bing & Yahoo
Hi, I just wondered what other people's experience is with a new domain. Basically, I have a client with a domain registered at the end of May this year, so it is less than 3 months old! The site ranks for his chosen keyword (not very competitive), which is in the domain name. I'm not at all surprised by Google's low ranking after such a short period, but I'm quite surprised to see it ranking on page 1 on Bing and Yahoo. No SEO work has been done yet and there are no inbound links. Does anyone else have experience of this? Should I be surprised, or is that normal for the other two search engines? Thanks in advance, Trevor
Technical SEO | TrevorJones
-
No. of links on a page
Is it true that if a source page has a huge number of links, each link will provide very little value in terms of passing link juice?
Technical SEO | seoug_2005
-
Cache my page
So I need to get this page cached: http://www.flowerpetal.com/index.jsp?info=13 It's been 4-5 months since it was uploaded. It's now linked from the homepage of a PR5 site. I've tweeted the link 10 times, shared it on Facebook and StumbleUpon, and linked to it from other articles, and still nothing. I also submitted the URL to Google twice. Any thoughts? Thanks, Tyler
Technical SEO | tylerfraser