Unreachable Pages
-
Hi All
Is there a tool to check a website if it has stand alone unreachable pages?
Thanks for helping
-
The only possible way I can think of is if the other person's site has an xml sitemap that is accurate, complete, and was generated by the website's system itself. (As is often created by plugins on WordPress sites, for example)
You could then pull the URLs from the xml into the spreadsheet as indicated above, add the URLs from the "follow link" crawl and continue from there. If a site has an xml sitemap it's usually located at www.website.com/sitemap.xml. Alternately, it's location may be specified in the site's robots.txt file.
The only way this can be done accurately is if you can get a list of all URLs natively created by the website itself. Any third-party tool/search engine is only going to be able to find pages by following links. And the very definition of the pages you're looking for is that they've never been linked. Hence the challenge.
Paul
-
Thanks Paul! Is there any way to do that for another persons site, any tool?
-
The only way I can see accomplishing this is if you have a fully complete sitemap generated by your own website's system (ie not created by a third-party tool which simply follow links to map your site)
Once you have the full sitemap, you'll also need to do a crawl using something like Screaming Frog to capture all the pages it can find using the "follow link" method.
Now you should have a list of ALL the pages on the site (the first sitemap) and a second list of all the pages that can be found through internal linking. Load both into a spreadsheet and eliminate all the duplicate URLs. What you'll be left with "should" be the pages that aren't connected by any links - ie the orphaned pages.
You'll definitely have to do some manual cleanup in this process to deal with things like page URLs that include dynamic variables etc, but it should give a strong starting point. I'm not aware of any tool capable of doing this for you automatically.
Does this approach make sense?
Paul
-
pages without any internal links to them
-
Do you mean orphaned pages without any internal links to them? Or pages that are giving a bad server header code?
-
But I want to find the stand alone pages only. I don't want to see the reachable pages. Can any one help?
-
If the page is indexed you can just place the site url in quotes "www.site.com" in google and it will give you all the pages that has this url on it.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
What should I do with all these 404 pages?
I have a website that Im currently working on that has been fairly dormant for a while and has just been given a face lift and brought back to life. I have some questions below about dealing with 404 pages. In Google WMT/search console there are reports of thousands of 404 pages going back some years. It says there are over 5k in total but I am only able to download 1k or so from WMT it seems. I ran a crawl test with Moz and the report it sent back only had a few hundred 404s in, why is that? Im not sure what to do with all the 404 pages also, I know that both Google and Moz recommend a mixture of leaving some as 404s and redirect others and Id like to know what the community here suggests. The 404s are a mix of the following: Blog posts and articles that have disappeared (some of these have good back-links too) Urls that look like they used to belong to users (the site used to have a forum) which where deleted when the forum was removed, some of them look like they were removed for spam reasons too eg /user/buy-cheap-meds-online and others like that Other urls like this /node/4455 (or some other random number) Im thinking I should permanently redirect the blog posts to the homepage or the blog but Im not sure what to do about all the others? Surely having so many 404s like this is hurting my crawl rate?
Technical SEO | | linklander0 -
Why are only a few of our pages being indexed
Recently rebuilt a site for an auctioneers, however it has a problem in that none of the lots and auctions are being indexed by Google on the new site, only the pages like About, FAQ, home, contact. Checking WMT shows that Google has crawled all the pages, and I've done a "Fetch as Google" on them and it loads up fine, so there's no crawling issues that is standing out. I've set the "URL Parameters" to no effect too. Also built a sitemap with all the lots in, pushed to Google which then crawled them all (massive spike in Crawl rate for a couple days), and still just indexing a handful of pages. Any clues to look into would be greatly appreciated. https://www.wilkinsons-auctioneers.co.uk/auctions/
Technical SEO | | Blue-shark0 -
Why are some pages now duplicate content?
It is probably a silly question, but all of a sudden, the following pages of one of my clients are reported as Duplicate content. I cannot understand why. They weren't before... http://www.ciaoitalia.nl/product/pizza-originale/mediterranea-halal
Technical SEO | | MarketingEnergy
http://www.ciaoitalia.nl/product/pizza-originale/gyros-halal
http://www.ciaoitalia.nl/product/pizza-originale/döner-halal
http://www.ciaoitalia.nl/product/pizza-originale/vegetariana
http://www.ciaoitalia.nl/product/pizza-originale/seizoen-pizza-estate
http://www.ciaoitalia.nl/product/pizza-originale/contadina
http://www.ciaoitalia.nl/product/pizza-originale/4-stagioni
http://www.ciaoitalia.nl/product/pizza-originale/shoarma Thanks for any help in the right direction 🙂 | |
| |
| |
| |
| |
| |
| |
| | <colgroup><col style="mso-width-source: userset; mso-width-alt: 17225; width: 353pt;" width="471"></colgroup>
| http://www.ciaoitalia.nl/product/pizza-originale/mediterranea-halal |
| http://www.ciaoitalia.nl/product/pizza-originale/gyros-halal |
| http://www.ciaoitalia.nl/product/pizza-originale/döner-halal |
| http://www.ciaoitalia.nl/product/pizza-originale/vegetariana |
| http://www.ciaoitalia.nl/product/pizza-originale/seizoen-pizza-estate |
| http://www.ciaoitalia.nl/product/pizza-originale/contadina |
| http://www.ciaoitalia.nl/product/pizza-originale/4-stagioni |
| http://www.ciaoitalia.nl/product/pizza-originale/shoarma |0 -
Too Many On-Page Links?
How much would this affect my page ranks performance? There are many Too Many On-Page Links? warning on my campaign. should I address this issue right away to fix it or leave it as it would not matter seriously ? I've looked at some of the pages and think all of them are necessary. Could someone help me? Thanks!
Technical SEO | | LauraHT0 -
Getting on to page 1 again
I have recently set up a website, and as soon as it got indexed by google, I ranked on the first page at number 10... However, Ever since I have started trying to backlink.. the more I seem to do, the more I seem to drop down the rankings! The same for both Bing and Yahoo... I am really not sure what the problem is. My site is www.arilinegamesatc.com, and the ranking words are airline games and airline game. Any help woudl be greatly appreciated!
Technical SEO | | rolls1230 -
Linking from and to pages
My website, www.kamperen-bij-de-boer.com, tells people what campingssites can be found in The Netherlands for recreational purposes. In order for a campingsite to be mentioned on our website we ask them to place a link to our website (either using a text link or image link) and then we make a page for that campsite on our website with in the end a link to ther website, e.g. http://www.kamperen-bij-de-boer.com/Minicamping-In-t-Oldambt.html -> they in return link back to us. Since this comes natural will this or won't this be penalized by Google and so on for linkfarming. At this moment we have about 600 camping sites on our website alone linking to us (not all of them) and we are linking to them. Since this can be explained as link trading which is not as good for your ranking as one-way-linking what should be wise? Should i include a nofollow? I already have many links from other sites linking to mine without having to link back, is there anything else i can do with linking to ensure better ranking?
Technical SEO | | JarnoNijzing0 -
New Domain Page 7 Google but Page 1 Bing & Yahoo
Hi just wondered what other people's experience is with a new domain. Basically have a client with a domain registered end of May this year, so less than 3 months old! The site ranks for his keyword choice (not very competitive), which is in the domain name. For me I'm not at all surprised with Google's low ranking after such a short period but quite surprsied to see it ranking page 1 on Bing and Yahoo. No seo work has been done yet and there are no inbound links. Anyone else have experience of this? Should I be surprised or is that normal in the other two search engines? Thanks in advance Trevor
Technical SEO | | TrevorJones0 -
GWT indexing wrong pages
Hi SEOMoz I have a listings site. In a part of the page, I have 3 comboboxes, for state, county and city. On the change event, the javascript redirects the user to the page of the selected location. Parameters are passed via GET, and my URL is rewrited via htaccess. Example: http:///www.site.com/state/county/city.html The problem is, there is A LOT(more than 10k) of 404 errors. It is happenning because the crawler is trying to index the pages, sometimes WITHOUT a parameter, like http:///www.site.com/state//city.html I don't know how to stop it, and I don't wanna remove it, once it's very clicked by the users. What should I do?
Technical SEO | | elias990