How to identify orphan pages?
-
I've read that you can use Screaming Frog to identify orphan pages on your site, but I can't figure out how to do it. Can anyone help?
I know that Xenu Link Sleuth works but I'm on a Mac so that's not an option for me.
Or are there other ways to identify orphan pages?
-
DeepCrawl.co.uk is another great resource here. This tool gives a full list of URLs, including number of internal links to each page. Filter this list by "No. links in" = 0, and this will give you a good list of orphaned pages.
Cheers,
Mike | Fresh Egg Australia -
Hi Marie!
Sadly, I don't use Xenu anymore either. Most of the solutions to find orphaned pages are either hit-and-miss manual methods (search OSE, search your server files). Or you could use a method like Agents of Value describes here.
Couple of posts that may help:
1. Find Orphaned Pages From Your Sitemap.xml File with Excel and IIS Toolkit
Requires IIS toolkit, which unless your installing on an external machine, isn't mac friendly
Ian has some great tips here, including:
- Search the server log files for every unique URL loaded over a 6-month period. Compare that to all unique URLs found in a site crawl. People have a funny way of stumbling into pages you’ve accidentally blocked or orphaned. Chances are, blocked pages will show up in your log file, even if they’re blocked.
- Do a database export. If you’re using WordPress or another content management system, you can export a full list of every page/post on the site, as well as the URL generated. Then compare that to a site crawl.
- Run two crawls of your site using your favorite crawler. Do the first one with the default settings. Then do a second with the crawler set to ignore robots.txt and nofollow. If the second crawl has more URLs than the first, and you want 100% of your site indexed, then check your robots.txt and look for meta ROBOTS issues.
3. Supposedly, Webseo has an automated option to find orphaned files, but I haven't used it nor can I vouch for it:http://www.webseo.com/
Hope this helps! Let us know what works.
-
Well, because they are 'orphans', you probably can't find them using a spider tool! I'd recommend the following process to find your orphan pages:
1. get a list of all the pages created by your CMS
2. get the list of all the pages found by Screaming Frog
3. add the two url lists into Excel and find the URLs in your CMS that are not in the Screaming Frog list.
You could probably use an Excel trick like this one:
http://superuser.com/questions/289650/how-to-compare-two-columns-and-find-differences-in-excel
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Page for page 301 redirects from old server to new server
Hi guys:
Technical SEO | | cindyt-17038
I have a client who is moving their entire ecommerce site from one hosting platform (Yahoo Store) to another (BigCommerce) and from one domain to another. The old domain is registered with the Yahoo as of yesterday and we have redirected the old domain (at the domain level) to the new domain. However, we are having trouble getting the pages to redirect page for page. Currently they are all redirecting to the new domain home page. We did just move the old domain from GoDaddy to Yahoo yesterday thinking this would solve it however as of this morning the old pages are still redirecting to the home page of the new domain. To complete the 301 redirect picture, we uploaded the redirects (all relative links for both from and to) to BigCommerce. And while the domain was hosted at GoDaddy with a redirect to the new domain, they were working. We moved the domain to Yahoo because of email issues thinking it should still work. Is it possibly just a waiting game now as the change populates across the DNS? old url to test:
rock-n-roll-action-figures.com/fender-jazz-bass-miniature-guitar-replica-classic-red-finish.html0 -
Pages not being cached have a negative effect?
Hi all! I look after a website where it's been discovered a section of the website has the noarchive robots meta tag active on it causing it to not get cached but has been indexed. Out of curiosity has anyone seen any negative effects from Google for having pages that aren't cached? It's not the strongest section on the website so makes it tricky to judge myself but interested if anyone had any thoughts on the matter. Cheers,
Technical SEO | | thisisOllie0 -
I need help compiling solid documentation and data (if possible) that having tons of orphaned pages is bad for SEO - Can you help?
I spent an hour this afternoon trying to convince my CEO that having thousands of orphaned pages is bad for SEO. His argument was "If they aren't indexed, then I don't see how it can be a problem." Despite my best efforts to convince him that thousands of them ARE indexed, he simply said "Unless you can prove it's bad and prove what benefit the site would get out of cleaning them up, I don't see it as a priority." So, I am turning to all you brilliant folks here in Q & A and asking for help...and some words of encouragement would be nice today too 🙂 Dana
Technical SEO | | danatanseo0 -
How to remove the duplicate page title
Hi everyone, I saw many posts related to this query.But i couldnt find a solution for my error.. Here is my question I got 575 Duplicate page title & 600 duplicate page content errors. My site is related to realestate. I created a page title like same sentence differs with locality name Eg: Land for sale - kandy property Land for sale - Galle property Likewise Locality name only differs..I have created meta title & Content like this. Can anyone let me know how to solve this error ASAP ?
Technical SEO | | Rajesh.Chandran0 -
Issue Duplicate Page Title
I'm having some really strange issues with duplicate page titles and I can't seem to figure out what's going on. I just got a new crawl from SEOMOZ and it's showing some duplicate page titles. http://www.example.com/blog/ http://www.example.com/blog/page/2/ http://www.example.com/blog/page/3/ Repeat .............. I have no idea what's going on, how these were duplicated, or how to correct it. Does anyone have a chance to take a look and see if you can figure out what's happening and what I need to do to correct the errors? I'm using Wordpress and all in one SEO plugin. Thanks so much!
Technical SEO | | KLLC0 -
Duplicate Content on SEO Pages
I'm trying to create a bunch of content pages, and I want to know if the shortcut I took is going to penalize me for duplicate content. Some background: we are an airport ground transportation search engine(www.mozio.com), and we constructed several airport transportation pages with the providers in a particular area listed. However, the problem is, sometimes in a certain region multiple of the same providers serve the same places. For instance, NYAS serves both JFK and LGA, and obviously SuperShuttle serves ~200 airports. So this means for every airport's page, they have the super shuttle box. All the provider info is stored in a database with tags for the airports they serve, and then we dynamically create the page. A good example follows: http://www.mozio.com/lga_airport_transportation/ http://www.mozio.com/jfk_airport_transportation/ http://www.mozio.com/ewr_airport_transportation/ All 3 of those pages have a lot in common. Now, I'm not sure, but they started out working decently, but as I added more and more pages the efficacy of them went down on the whole. Is what I've done qualify as "duplicate content", and would I be better off getting rid of some of the pages or somehow consolidating the info into a master page? Thanks!
Technical SEO | | moziodavid0 -
301 by category rather than page?
Hi Guys It is possible (and best practice) to 301 a category as a whole rather than every page within the category. I was asked this by a developer the other day, and it was something I'd not considered before as I've always done it by page. It seems there's some wordpress plugins out there that do it. Thanks in advance.
Technical SEO | | PerchDigital0 -
Specific Link Page in Domain
Hi everyone: I have seen that many SEO Agencies have contacted my business (Also SEO but In- House) in order to interchange links. They have created a specific page on their site with the Label "Links" or similar, and on that page they add multiple links of the competence. I have heard that you can only do that if you make sure you add two things: No follow in links. Not inserting links of websites that have nothing to do with our sector. Either way, I have never found this amusing. I always recommend people not to do this but I have my doubts after all. ¿Could some one give me their opinion? Cheers !
Technical SEO | | Tintanus0