Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to identify orphan pages?
-
I've read that you can use Screaming Frog to identify orphan pages on your site, but I can't figure out how to do it. Can anyone help?
I know that Xenu Link Sleuth works but I'm on a Mac so that's not an option for me.
Or are there other ways to identify orphan pages?
-
DeepCrawl.co.uk is another great resource here. This tool gives a full list of URLs, including number of internal links to each page. Filter this list by "No. links in" = 0, and this will give you a good list of orphaned pages.
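If you export that URL list to CSV, the same filter can be scripted. This is only a rough sketch: the column names `url` and `links_in` are assumptions for illustration, so check the header row of your actual export and adjust them.

```python
import csv
import io

def zero_inlink_urls(csv_text, url_col="url", links_col="links_in"):
    """Return URLs whose internal inbound link count is 0.

    url_col and links_col are hypothetical column names; rename them
    to match whatever your crawler's export actually uses.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # An empty cell is treated as 0 inbound links.
    return [row[url_col] for row in reader if int(row[links_col] or 0) == 0]

# Made-up sample export, for illustration only.
sample = """url,links_in
https://example.com/,12
https://example.com/old-landing-page,0
https://example.com/about,3
"""

print(zero_inlink_urls(sample))
```

Keep in mind this only surfaces pages the tool already knows about (for example from a sitemap), not pages it never discovered.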
Cheers,
Mike | Fresh Egg Australia
-
Hi Marie!
Sadly, I don't use Xenu anymore either. Most of the solutions for finding orphaned pages are hit-and-miss manual methods (search OSE, search your server files), or you could use a method like the one Agents of Value describes here.
A couple of posts that may help:
1. Find Orphaned Pages From Your Sitemap.xml File with Excel and IIS Toolkit
Requires the IIS toolkit, which isn't Mac-friendly unless you're installing it on an external machine.
2. Ian has some great tips here, including:
- Search the server log files for every unique URL loaded over a 6-month period. Compare that to all unique URLs found in a site crawl. People have a funny way of stumbling into pages you’ve accidentally blocked or orphaned. Chances are, blocked pages will show up in your log file, even if they’re blocked.
- Do a database export. If you’re using WordPress or another content management system, you can export a full list of every page/post on the site, as well as the URL generated. Then compare that to a site crawl.
- Run two crawls of your site using your favorite crawler. Do the first one with the default settings. Then do a second with the crawler set to ignore robots.txt and nofollow. If the second crawl has more URLs than the first, and you want 100% of your site indexed, then check your robots.txt and look for meta ROBOTS issues.
3. Supposedly, Webseo has an automated option to find orphaned files, but I haven't used it and can't vouch for it: http://www.webseo.com/
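The log-file tip above can be sketched in a few lines of Python. This is a minimal example, not a definitive tool: it assumes your server writes Common or Combined Log Format lines, and the sample log lines and crawled paths are made up for illustration.

```python
import re

# Matches the request and status in a Common/Combined Log Format line,
# e.g. "GET /some/path?x=1 HTTP/1.1" 200
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')

def paths_from_log(lines):
    """Collect unique paths that returned 200, with query strings dropped."""
    paths = set()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group(2) == "200":
            paths.add(match.group(1).split("?", 1)[0])
    return paths

def orphan_candidates(log_lines, crawled_paths):
    """Paths that visitors loaded but the site crawl never found."""
    return sorted(paths_from_log(log_lines) - set(crawled_paths))

# Made-up sample data, for illustration only.
log_lines = [
    '203.0.113.5 - - [10/Oct/2013:13:55:36 +0000] "GET / HTTP/1.1" 200 2326',
    '203.0.113.5 - - [10/Oct/2013:13:56:01 +0000] "GET /old-promo?ref=mail HTTP/1.1" 200 512',
    '203.0.113.9 - - [10/Oct/2013:13:57:12 +0000] "GET /missing HTTP/1.1" 404 0',
]
crawled = ["/", "/about"]

print(orphan_candidates(log_lines, crawled))
```

In practice you would read the lines from your real access logs and take the crawled paths from your crawler's export. Expect to filter out images, scripts, and other assets before reviewing the result.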
Hope this helps! Let us know what works.
-
Well, because they are 'orphans', you probably can't find them using a spider tool! I'd recommend the following process to find your orphan pages:
1. Get a list of all the pages created by your CMS.
2. Get the list of all the pages found by Screaming Frog.
3. Add the two URL lists to Excel and find the URLs in your CMS that are not in the Screaming Frog list.
You could probably use an Excel trick like this one:
http://superuser.com/questions/289650/how-to-compare-two-columns-and-find-differences-in-excel
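If you'd rather not do the comparison in Excel, the same three steps can be sketched in Python. This is only an illustration: the URL lists below are invented, and the normalization rules (ignore host case, trailing slashes, and fragments) are an assumption about what should count as "the same page" on your site.

```python
from urllib.parse import urlsplit

def normalize(url):
    """Lowercase scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"

def not_crawled(cms_urls, crawled_urls):
    """Return CMS URLs (in their original form) that the crawler never reached."""
    seen = {normalize(u) for u in crawled_urls}
    return [u for u in cms_urls if normalize(u) not in seen]

# Step 1: URLs exported from the CMS (made-up examples).
cms_urls = [
    "https://example.com/",
    "https://example.com/services/",
    "https://example.com/landing-2012",
]
# Step 2: URLs found by the crawler (made-up examples).
crawled_urls = [
    "https://Example.com",
    "https://example.com/services",
]

# Step 3: anything in the CMS list but not in the crawl is an orphan candidate.
print(not_crawled(cms_urls, crawled_urls))
```

Normalizing first avoids false positives where the two lists differ only by a trailing slash or capitalization in the hostname.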