How to find orphan pages
-
Hi all,
I've been checking these forums for an answer on how to find orphaned pages on my site, and I can see a lot of people saying that I should cross-check my XML sitemap against a Screaming Frog crawl of my site.
However, the sitemap is created using Screaming Frog in the first place... (I'm sure this is the case for a lot of people too).
Are there any other ways to get a full list of orphaned pages? I assume it would be a developer request but where can I ask them to look / extract?
Thanks!
-
Yes. As I mentioned, in my case I use Semrush, which has a dedicated section for that specific issue. The easiest way to get your log files is to log in to your cPanel and look for an option called Raw Log Files. If you still cannot find it, you may need to contact your hosting provider and ask them to provide the log files for your site.
Raw Access Logs allow you to see what the visits to your website were without displaying graphs, charts, or other graphics. You can use the Raw Access Logs menu to download a zipped version of the server’s access log for your site. This can be very useful when you want to quickly see who has visited your site.
Raw logs may only contain a few hours' worth of data because they are discarded after the system processes them. However, if archiving is enabled, the system archives the raw log data before discarding it. So go ahead and make sure that archiving is turned on!
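If it helps, here is a rough sketch of how the URLs could be pulled out of a raw access log once you have downloaded it. It assumes the log is in the common/combined Apache format, which is typical for cPanel hosts but not guaranteed. The function name `paths_from_log` and the sample lines are made up for illustration. Also keep in mind that a user agent string can be faked, so "Googlebot" in the log is only a hint.

```python
import re

# Matches the request and status part of a common/combined log line, e.g.
# "GET /page.html HTTP/1.1" 200
LOG_LINE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')


def paths_from_log(lines, user_agent_hint="Googlebot"):
    """Return the set of paths that were requested with a 2xx response.

    If user_agent_hint is set, only lines containing that text are used.
    Pass None to keep requests from every visitor.
    """
    paths = set()
    for line in lines:
        if user_agent_hint and user_agent_hint not in line:
            continue
        match = LOG_LINE.search(line)
        if match and match.group("status").startswith("2"):
            # Drop the query string so /page?x=1 and /page count as one URL.
            paths.add(match.group("path").split("?", 1)[0])
    return paths


# Hypothetical sample lines, only to show the expected input shape.
sample = [
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /hidden-page.html?ref=1 HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Oct/2023:13:56:01 +0000] "GET /gone.html HTTP/1.1" 404 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Oct/2023:13:57:12 +0000] "GET /about.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

googlebot_paths = paths_from_log(sample)
all_paths = paths_from_log(sample, user_agent_hint=None)
print(sorted(googlebot_paths))
print(sorted(all_paths))
```

For a real file you would pass an open file handle instead of `sample`, since iterating over a file yields its lines.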
Once you have your log file ready to go, you now need to gather the other data set of pages that can be crawled by Google, using Screaming Frog.
Crawl Your Pages with Screaming Frog SEO Spider
Using the Screaming Frog SEO Spider, you can crawl your website as Googlebot would, and export a list of all the URLs that were found.
Once you have Screaming Frog ready, first ensure that your crawl Mode is set to the default ‘Spider’.
Then make sure that under Configuration > Spider, ‘Check External Links’ is unchecked, to avoid unnecessary external site crawling.
Now you can type in your website URL, and click Start.
Once the crawl is complete, simply:
a. Navigate to the Internal tab.
b. Filter by HTML.
c. Click Export.
d. Save in .csv format.
Now you should have two sets of URL data, both in .csv format.
All you need to do now is compare the URL data from the two .csv files and find the URLs that were not crawlable.
If you decided to analyze a log file instead, you can use the Screaming Frog SEO Log File Analyser to uncover your orphan pages. (Keep in mind that the Log File Analyser is not the same tool as the SEO Spider.)
The tool is very easy to use: from the dashboard you can import the two data sets that you need to analyze.
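If you would rather do the comparison of the two URL sets by hand, it can be sketched in a few lines of Python. This is only a sketch under some assumptions: that the crawl export has a URL column named `Address` (check the header row of your own export, and remove any title row above it), and that comparing paths without the trailing slash or query string is good enough for your site. The helper names are made up for illustration.

```python
import csv
import io
from urllib.parse import urlsplit


def normalise(url):
    """Reduce a full URL or bare path to a comparable path.

    Scheme, host, and query string are dropped, and so is a trailing slash.
    """
    path = urlsplit(url.strip()).path or "/"
    return path.rstrip("/") or "/"


def urls_from_csv(handle, column="Address"):
    """Read one column of URLs from an open CSV file into a set of paths."""
    return {normalise(row[column]) for row in csv.DictReader(handle) if row.get(column)}


def orphan_candidates(known_urls, crawled_urls):
    """URLs in the known set (logs, sitemap, dev list) that the crawl never reached."""
    return sorted(known_urls - crawled_urls)


# Hypothetical data standing in for the crawl export and the log file URLs.
crawl_csv = io.StringIO(
    "Address,Status Code\n"
    "https://example.com/,200\n"
    "https://example.com/about/,200\n"
)
log_paths = {normalise(p) for p in ["/", "/about", "/hidden-page.html"]}

crawled = urls_from_csv(crawl_csv)
print(orphan_candidates(log_paths, crawled))
```

With real data you would open the exported file with `open("internal_html.csv", newline="", encoding="utf-8")` (the file name is just an example) and pass that handle to `urls_from_csv`. Anything the script prints is only a candidate: check each URL by hand before treating it as an orphan.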
If the answer was useful, do not forget to mark it as a good answer. Good luck!
-
Hi Roman,
Out of interest, is there an option to export an orphan page report like there is in Screaming Frog? (Reports / Orphan Pages).
I guess the most realistic option is to get the list from the dev team, as using the sitemap isn't plausible because these pages should still get indexed. The new Google Search Console also lets you test individual pages, and as long as they're in the sitemap, they should (hopefully) be indexed.
Still, getting a list of ALL pages on a site without dev support seems to be the challenge I'm trying to solve.
-
Even Screaming Frog has trouble finding all the orphan pages. I use Screaming Frog, Moz, Semrush, Ahrefs, and Raven Tools day to day, and honestly, Semrush is the one that gives me the best results for that specific task. From experience: a few months ago I took on a website that was a complete disaster, with no sitemap, no canonical tags, no meta tags, and so on.
I ran Screaming Frog and it showed me just 200 pages, but I knew there were many more. In the end I found 5k pages with Semrush. Probably even Screaming Frog's crawler had problems with that website, so I'm sharing that as my experience.