Unreachable Pages
-
Hi All
Is there a tool to check a website if it has stand alone unreachable pages?
Thanks for helping
-
The only possible way I can think of is if the other person's site has an xml sitemap that is accurate, complete, and was generated by the website's system itself. (As is often created by plugins on WordPress sites, for example)
You could then pull the URLs from the xml into the spreadsheet as indicated above, add the URLs from the "follow link" crawl and continue from there. If a site has an xml sitemap it's usually located at www.website.com/sitemap.xml. Alternately, it's location may be specified in the site's robots.txt file.
The only way this can be done accurately is if you can get a list of all URLs natively created by the website itself. Any third-party tool/search engine is only going to be able to find pages by following links. And the very definition of the pages you're looking for is that they've never been linked. Hence the challenge.
Paul
-
Thanks Paul! Is there any way to do that for another persons site, any tool?
-
The only way I can see accomplishing this is if you have a fully complete sitemap generated by your own website's system (ie not created by a third-party tool which simply follow links to map your site)
Once you have the full sitemap, you'll also need to do a crawl using something like Screaming Frog to capture all the pages it can find using the "follow link" method.
Now you should have a list of ALL the pages on the site (the first sitemap) and a second list of all the pages that can be found through internal linking. Load both into a spreadsheet and eliminate all the duplicate URLs. What you'll be left with "should" be the pages that aren't connected by any links - ie the orphaned pages.
You'll definitely have to do some manual cleanup in this process to deal with things like page URLs that include dynamic variables etc, but it should give a strong starting point. I'm not aware of any tool capable of doing this for you automatically.
Does this approach make sense?
Paul
-
pages without any internal links to them
-
Do you mean orphaned pages without any internal links to them? Or pages that are giving a bad server header code?
-
But I want to find the stand alone pages only. I don't want to see the reachable pages. Can any one help?
-
If the page is indexed you can just place the site url in quotes "www.site.com" in google and it will give you all the pages that has this url on it.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Overdynamic Pages - How to Solve it?
Hi everyone, I'm running a classified real estate ads site, where people can publish their apartment or house they want to sell, so we use multiple filters to help people find what they want. Lately we added multiple filters to the URL to make the search more precise, things like: Prices (priceAmount=###) Bedrooms (BedroomsNumber=2) Bathrooms (BathroomsNumber=3) TotalArea (totalArea=1_50) Services (Elevator, CommonAreas, security) Among other Filters so you see the picture, all this filters are on the URL so that people can share their search on multiple social media, that makes two problems for moz crawl: Overdynamic URLs Too long URLs Now what would be a good solution for this 2 problems, would a canonical to the original page before the "?" would be ok? Example:
Technical SEO | | JoaoCJ
http://urbania.pe/buscar/venta-de-propiedades?bathroomsNumber=2&services=gas&commonAreas=solarium The problem I have with this solution is that I also have a pagination parameter (page=2), and I'm using prev and next tags, if I use a such canonical will break the prev and next tag? http://urbania.pe/buscar/venta-de-propiedades?bathroomsNumber=2&services=gas&commonAreas=solarium&page=2 Also thinking if adding a noindex on pages with paramters could also be an option. Thanks a lot, I'm trying to address this issues.0 -
Issues Indexing Translated Pages
I'm having trouble getting http://www.procloud.ch/ to index for their german pages. The english pages are being indexed but not the german. Any ideas? Chris
Technical SEO | | ninel_P0 -
Pages Not Getting Indexed
Hey there I have a website with pretty much 3-4 pages. All of them had a canonical pointing to one page and the same content ( which happened by mistake ) I removed the canonical URL and added one pointing to its page. Also, I added the original content that was supposed to be there to begin with. It's been weeks but those pages are not getting indexed on the SERPS while the one that they use to point with the canonical does.
Technical SEO | | AngelosS0 -
Blog Page Titles - Page 1, Page 2 etc.
Hi All, I have a couple of crawl errors coming up in MOZ that I am trying to fix. They are duplicate page title issues with my blog area. For example we have a URL of www.ourwebsite.com/blog/page/1 and as we have quite a few blog posts they get put onto another page, example www.ourwebsite.com/blog/page/2 both of these urls have the same heading, title, meta description etc. I was just wondering if this was an actual SEO problem or not and if there is a way to fix it. I am using Wordpress for reference but I can't see anywhere to access the settings of these pages. Thanks
Technical SEO | | O2C0 -
Pages to be indexed in Google
Hi, We have 70K posts in our site but Google has scanned 500K pages and these extra pages are category pages or User profile pages. Each category has a page and each user has a page. When we have 90K users so Google has indexed 90K pages of users alone. My question is. Should we leave it as they are or should we block them from being indexed? As we get unwanted landings to the pages and huge bounce rate. If we need to remove what needs to be done? Robots block or Noindex/Nofollow Regards
Technical SEO | | mtthompsons0 -
Has Google stopped rendering author snippets on SERP pages if the author's G+ page is not actively updated?
Working with a site that has multiple authors and author microformat enabled. The image is rendering for some authors on SERP page and not for others. Difference seems to be having an updated G+ page and not having a constantly updating G+ page. any thoughts?
Technical SEO | | irvingw0 -
How To SEO Mobile Pages?
hello, I have finally put my first foot on the path of trying to learn and understand mobile SEO. I have a few questions regarding mobile SEO and how it works, so please help me out. I use wordpress for my site, and there is a nifty plugin called WP touch http://wordpress.org/extend/plugins/wptouch/ What it basically does is, it converts your desktop version into a mobile friendly version. I wanted to know that if it does that, does this mean whatever SEO i do for my regular web site gets accomplished for my moible version as well? Another simple question is, if i search for the same term on my mobile phone then on my desktop how different will the SERs be? thanks moz peeps
Technical SEO | | david3050 -
Dealing with 404 pages
I built a blog on my root domain while I worked on another part of the site at .....co.uk/alpha I was really careful not to have any links go to alpha - but it seems google found and indexed it. The problem is that part of alpha was a copy of the blog - so now soon we have a lot of duplicate content. The /alpha part is now ready to be taken over to the root domain, the initial plan was to then delete /alpha. But now that its indexed I'm worried that Ill have all these 404 pages. I'm not sure what to do.. I know I can just do a 301 redirect for all those pages to go to the other ones in case a link comes on but I need to delete those pages as the server is already very slow. Or does a 301 redirect mean that I don't need those pages anymore? Will those pages still get indexed by google as separate pages? Please assist.
Technical SEO | | borderbound0