How to find orphan pages
-
Hi all,
I've been checking these forums for an answer on how to find orphaned pages on my site, and I can see a lot of people saying that I should cross-check my XML sitemap against a Screaming Frog crawl of my site.
However, the sitemap is created using Screaming Frog in the first place... (I'm sure this is the case for a lot of people too).
Are there any other ways to get a full list of orphaned pages? I assume it would be a developer request but where can I ask them to look / extract?
Thanks!
-
Yes, as I mentioned, in my case I use Semrush, and there is a dedicated section for that specific parameter. The easiest way to get your log files is to log in to your cPanel and find an option called Raw Log Files. If you are still not able to find it, you may need to contact your hosting provider and ask them to provide the log files for your site.
Raw Access Logs allow you to see what the visits to your website were without displaying graphs, charts, or other graphics. You can use the Raw Access Logs menu to download a zipped version of the server’s access log for your site. This can be very useful when you want to quickly see who has visited your site.
Raw logs may only contain a few hours’ worth of data because they are discarded after the system processes them. However, if archiving is enabled, the system archives the raw log data before discarding it. So go ahead and make sure archiving is enabled!
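For anyone who wants to pull the crawled URLs out of a raw log by hand, here is a minimal Python sketch. It assumes the log uses the Apache "combined" format (check a line of your own log first), and the sample lines, IP addresses, and the function name `googlebot_paths` are made up for illustration. Note that a user agent string can be spoofed, so this is only a rough filter.

```python
import re

# Apache/NCSA "combined" log format (an assumption; compare against a real
# line from your own log before relying on this pattern).
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_paths(lines):
    """Return the set of paths requested by a Googlebot user agent with status 200."""
    paths = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        if match["status"] == "200" and "Googlebot" in match["agent"]:
            paths.add(match["path"])
    return paths

# Hypothetical sample lines, only here to show the expected input shape.
sample = [
    '66.249.66.1 - - [10/Oct/2018:13:55:36 +0000] "GET /old-landing-page HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Oct/2018:13:56:01 +0000] "GET /about HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    '66.249.66.1 - - [10/Oct/2018:13:57:12 +0000] "GET /missing HTTP/1.1" 404 312 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]
print(sorted(googlebot_paths(sample)))  # ['/old-landing-page']
```

With a real log you would pass an open file object instead of the `sample` list, since the function only needs something it can iterate over line by line.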
Once you have your log file ready to go, you need to gather the other data set, the pages that Google can crawl, using Screaming Frog.
Crawl Your Pages with Screaming Frog SEO Spider
Using the Screaming Frog SEO Spider, you can crawl your website as Googlebot would, and export a list of all the URLs that were found.
Once you have Screaming Frog ready, first ensure that your crawl Mode is set to the default ‘Spider’.
Then make sure that under Configuration > Spider, ‘Check External Links’ is unchecked, to avoid unnecessary external site crawling.
Now you can type in your website URL, and click Start.
Once the crawl is complete, simply:
a. Navigate to the Internal tab.
b. Filter by HTML.
c. Click Export.
d. Save in .csv format.
Now you should have two sets of URL data, both in .csv format: the URLs from your log files and the URLs from the Screaming Frog crawl.
All you need to do now is compare the URL data from the two .csv files and find the URLs that appear in the log data but were not found by the crawl. Those are your orphan page candidates.
If you decided to analyze a log file instead, you can use the Screaming Frog SEO Log File Analyser to uncover your orphan pages. (Keep in mind that the Log File Analyser is not the same tool as the SEO Spider.)
The tool is very easy to use: from the dashboard you can import the two data sets that you need to analyze.
If the answer was useful, do not forget to mark it as a good answer. Good luck!
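If you would rather do the comparison yourself than in a tool, the idea can be sketched in a few lines of Python. This is only a sketch under assumptions: the crawl export is assumed to have its URLs in a column named `Address` with the header row on the first line, the log data is assumed to sit in a hypothetical column named `url`, and the function names and sample data are invented for the example.

```python
import csv
import io
from urllib.parse import urlsplit

def normalise(url):
    """Reduce a URL or path to a comparable path: drop scheme, host, query and trailing slash."""
    path = urlsplit(url.strip()).path
    return path.rstrip("/") or "/"

def read_urls(handle, column):
    """Read one column of a CSV file object into a set of normalised paths."""
    return {normalise(row[column]) for row in csv.DictReader(handle) if row.get(column)}

def orphan_candidates(log_csv, crawl_csv, log_column="url", crawl_column="Address"):
    """Paths that appear in the log data but were never reached by the crawl."""
    return read_urls(log_csv, log_column) - read_urls(crawl_csv, crawl_column)

# Hypothetical in-memory data standing in for the two .csv files.
log_csv = io.StringIO("url\n/old-landing-page\n/about/\n/contact\n")
crawl_csv = io.StringIO(
    "Address,Status Code\n"
    "https://www.example.com/about,200\n"
    "https://www.example.com/contact/,200\n"
)
print(sorted(orphan_candidates(log_csv, crawl_csv)))  # ['/old-landing-page']
```

For real files you would pass `open("your_file.csv", newline="")` handles instead of the `io.StringIO` objects. The normalisation drops query strings, which is a simplification: if your site serves different pages on the same path with different parameters, you would want to keep them.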
-
Hi Roman,
Out of interest, is there an option to export an orphan page report like there is in Screaming Frog (Reports / Orphan Pages)?
I guess the most realistic option is to get the list from the dev team, since using the sitemap isn't plausible when these pages should still get indexed. The new Google Search Console also lets you test individual pages and, as long as they're in the sitemap, they should (hopefully) be indexed.
Still, trying to get a list of ALL pages on a site, without dev support, seems to be a challenge I'm trying to solve.
-
Even Screaming Frog has problems finding all the orphan pages. I use Screaming Frog, Moz, Semrush, Ahrefs, and Raven Tools in my day-to-day work and, honestly, Semrush is the one that gives me the best results for that specific task. To share one experience: a few months ago I took on a website that was a complete disaster, with no sitemap, no canonical tags, no meta tags, and so on.
I ran Screaming Frog and it showed me just 200 pages, but I knew there were many more. In the end I found 5k pages with Semrush. Probably the Screaming Frog crawler had problems with that website, so I'm just sharing that as an experience.