How to find orphan pages
-
Hi all,
I've been checking these forums for an answer on how to find orphaned pages on my site, and I can see a lot of people are saying that I should cross-check my XML sitemap against a Screaming Frog crawl of my site.
However, the sitemap is created using Screaming Frog in the first place... (I'm sure this is the case for a lot of people too).
Are there any other ways to get a full list of orphaned pages? I assume it would be a developer request but where can I ask them to look / extract?
Thanks!
-
Yes, as I mentioned, in my case I use Semrush, and there is a dedicated section for that specific parameter. The easiest way to get your log files is to log in to your cPanel and find the option called Raw Log Files. If you are still not able to find it, you may need to contact your hosting provider and ask them to provide the log files for your site.
Raw Access Logs allow you to see what the visits to your website were without displaying graphs, charts, or other graphics. You can use the Raw Access Logs menu to download a zipped version of the server’s access log for your site. This can be very useful when you want to quickly see who has visited your site.
Raw logs may only contain a few hours' worth of data because they are discarded after the system processes them. However, if archiving is enabled, the system archives the raw log data before discarding it. So go ahead and make sure that you are archiving!
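Once the log is downloaded, the requested URLs can be pulled out with a short script. The following is only a minimal sketch: it assumes the host writes the Apache/NCSA "combined" log format, and the function name `paths_from_log` and the sample lines are made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Assumption: Apache/NCSA "combined" log format. Adjust the pattern
# if your host logs in a different layout.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>[A-Z]+) (?P<target>\S+) [^"]*" (?P<status>\d{3}) '
)

def paths_from_log(lines):
    """Return the set of URL paths requested with GET and answered with 200."""
    paths = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not look like a request
        if match.group("method") != "GET" or match.group("status") != "200":
            continue
        # Drop the query string so /page?utm=x and /page count as one URL.
        paths.add(urlsplit(match.group("target")).path)
    return paths

# Made-up sample lines, for illustration only.
sample = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /old-landing-page HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [10/Oct/2023:13:55:40 +0000] "GET /about?ref=mail HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '203.0.113.9 - - [10/Oct/2023:13:56:01 +0000] "GET /missing HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
]
print(sorted(paths_from_log(sample)))  # ['/about', '/old-landing-page']
```

For a real file you would pass an open file object instead of `sample`; if the download is gzipped, Python's `gzip.open(path, "rt")` can read it as text.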
Once you have your log file ready to go, you need to gather the other data set: the pages that can be crawled by Google, using Screaming Frog.
Crawl Your Pages with Screaming Frog SEO Spider
Using the Screaming Frog SEO Spider, you can crawl your website as Googlebot would, and export a list of all the URLs that were found.
Once you have Screaming Frog ready, first ensure that your crawl Mode is set to the default ‘Spider’.
Then make sure that under Configuration > Spider, ‘Check External Links’ is unchecked, to avoid unnecessary external site crawling.
Now you can type in your website URL, and click Start.
Once the crawl is complete, simply:
a. Navigate to the Internal tab.
b. Filter by HTML.
c. Click Export.
d. Save in .csv format.
Now you should have two sets of URL data, both in .csv format.
All you need to do now is compare the URL data from the two .csv files, and find the URLs that were not crawlable.
If you decided to analyze a log file instead, you can use the Screaming Frog SEO Log File Analyser to uncover your orphan pages. (Keep in mind that the Log File Analyser is not the same tool as the SEO Spider.)
The tool is very easy to use: from the dashboard you can import the two data sets that you need to analyze.
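If you would rather compare the two data sets by hand, the comparison is just a set difference. Here is a rough sketch, not a definitive method: it assumes the crawl export has a URL column named "Address" (check the header row of your own export), and the function names and sample data are hypothetical.

```python
import csv
import io
from urllib.parse import urlsplit

def crawled_paths(csv_file, column="Address"):
    """Read URL paths from a crawl export.

    Assumption: the URL column is called 'Address'; pass another
    name if your export uses a different header.
    """
    reader = csv.DictReader(csv_file)
    return {urlsplit(row[column]).path for row in reader if row.get(column)}

def orphan_candidates(log_paths, crawl_paths):
    """Paths seen in the logs that the crawler never reached."""
    return sorted(set(log_paths) - set(crawl_paths))

# Made-up data standing in for the two real exports.
crawl_export = io.StringIO(
    "Address,Status Code\n"
    "https://www.example.com/,200\n"
    "https://www.example.com/about,200\n"
)
log_paths = {"/", "/about", "/old-landing-page"}

print(orphan_candidates(log_paths, crawled_paths(crawl_export)))
# ['/old-landing-page']
```

Anything the script prints is only a candidate: it was requested at some point but is not reachable through internal links, so it is worth checking by hand before treating it as an orphan.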
If the answer was useful, do not forget to mark it as a good answer. Good luck!
-
Hi Roman,
Out of interest, is there an option to export an orphan page report like there is in Screaming Frog? (Reports / Orphan Pages).
I guess the most realistic option is to get the list from the dev team, since using the sitemap isn't plausible: these pages should still get indexed. The new Google Search Console also lets you test individual pages and, as long as they're in the sitemap, they should (hopefully) be indexed.
Still, trying to get a list of ALL pages on a site, without dev support, seems to be a challenge I'm trying to solve.
-
Even Screaming Frog has problems finding all the orphan pages. I use Screaming Frog, Moz, Semrush, Ahrefs, and Raven Tools day to day, and honestly, Semrush is the one that gives me better results for that specific task. From experience, I can say that a few months ago I took on a website that was a complete disaster: no sitemap, no canonical tags, no meta tags, etc.
I ran Screaming Frog and it showed me just 200 pages, but I knew there were many more. In the end I found 5k pages with Semrush. Probably even the Screaming Frog crawler has problems with that website, so I'm sharing this as an experience.