I need help compiling solid documentation and data (if possible) that having tons of orphaned pages is bad for SEO - Can you help?
-
I spent an hour this afternoon trying to convince my CEO that having thousands of orphaned pages is bad for SEO. His argument was "If they aren't indexed, then I don't see how it can be a problem."
Despite my best efforts to convince him that thousands of them ARE indexed, he simply said "Unless you can prove it's bad and prove what benefit the site would get out of cleaning them up, I don't see it as a priority."
So, I am turning to all you brilliant folks here in Q & A and asking for help...and some words of encouragement would be nice today too.
Dana
-
Agreed on all counts Jason, not to mention the improved customer experience because we won't have people landing on those God-awful ugly and useless pages!
From a server perspective, could deleting 8,000 files (pages, images, PDFs) result in our site speed improving too? Or would it likely have no impact?
-
So, according to Screaming Frog, you have roughly 8,500 pages that are part of your customer experience, that you want customers to be able to navigate to from your site, and that you presumably would like customers to find on Google.
But only 7,500 pages are in Google's index. So best case, roughly 1,000 of your good pages (almost 12% of all the pages on your site) don't exist in organic search. Worst case, some of those 7,500 pages in Google are deprecated pages that aren't part of your active site, making the percentage of live pages in Google even worse.
It's very possible that a portion of your Google crawl budget is being consumed by pages that don't help you. If you get those pages out of the index, you stand a better chance of getting your 1,000 good pages into the index.
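The arithmetic above can be checked with a quick sketch. The function name `index_gap` is made up for illustration, and the inputs are the rounded figures from this thread (8,500 crawlable pages, 7,500 indexed URLs). It assumes the best case, in which every indexed URL is one of the good pages.

```python
def index_gap(site_pages: int, indexed_pages: int) -> tuple:
    """Return (pages missing from the index, their share of the site in percent).

    Best-case assumption: every indexed URL is one of the site's live pages.
    """
    missing = max(site_pages - indexed_pages, 0)
    share = 100.0 * missing / site_pages
    return missing, share


# Rounded figures quoted in the thread.
missing, share = index_gap(site_pages=8_500, indexed_pages=7_500)
print(f"{missing} pages ({share:.1f}% of the site) are not in the index")
```

If some of the indexed URLs turn out to be orphans, lower `indexed_pages` by that amount and the gap grows accordingly.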
-
Hi Jason,
Ok, here is what I saw in Screaming Frog:
27,616 total spidered URLs, of which:
- 8,494 are HTML pages
- 45 are CSS files
- 14,687 are images
- 4,287 are PDFs
Google says we have only 7,540 URLs indexed (of all types) - I know for a fact that at least 500 orphaned pages are indexed in Google. It seems to me, then, that Google is indexing content that isn't important to us, and perhaps not indexing other content that is important to us because it's having trouble telling what's important and what's not.
Any insights on that Jason? What do you make of it?
-
Hi Jason,
I'm just following up as I get my ducks in a row on this one. Above in your comment you said "Google Count of Pages - Screaming Frog count of Pages = # of Orphaned Pages" - to be perfectly accurate, this would only give me the number of orphaned pages that are indexed. There could be many additional orphaned pages that are not in Google's index.
My follow up question is, should I be concerned about those too? Or are orphaned pages that aren't indexed not worth cleaning up? I think I already know the answer (Yes! Clean those up too because they can interfere with crawl rate and site speed...)....but I want to know your take on it please. Thanks so much!
Dana
-
Tempting! Very tempting. :-)
-
I would not do this if I was an employee... but.... I would ask him to bet me an amount that would be equivalent to about "one month's pay" on the results.
He is a chicken so he wouldn't accept that bet. And if he did accept I would want it in writing.
-
Thanks EGOL. You made me chuckle, because all of these things crossed my mind. I did go home mad yesterday, and I don't get mad very easily or very often. I usually welcome the idea of explaining SEO strategies and tactics to newbies and laypeople (as is evidenced by my many posts here in Q & A).
Let's just say - my feelers are out looking at other possibilities.
-
In my opinion, the links are still evaporating PageRank.
If some of these pages are still in the index they could be counting as thin/duplicate content.
-
What would your response be to that?
* thinks for a while *
I would be mad about this. This is why I prefer to be self-employed.
I don't know the temperament or personality of this person.
I might not be working there much longer.
It seems to me that the effort required to cut links into these pages is tiny and the potential for gain is pretty high.
Downside risk is zero. Upside opportunity is good. He is a chicken and a fool.
-
EGOL, I thought I would just follow up on these thin content "Reviews/Ratings" pages. They are blocked from Google crawling them via the robots.txt file. Is this enough? Or are they still diluting the product page's authority just by being there?
Thanks!
Dana
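For anyone who wants to confirm that the block itself is in place, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules and URLs are hypothetical, since the thread does not show the site's real robots.txt. It only confirms whether a crawler that obeys robots.txt may fetch a URL; it does not answer whether links pointing at blocked pages still dilute authority.

```python
from urllib import robotparser

# Hypothetical rules; the thread does not show the site's real robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /reviews/
Disallow: /email-a-friend/
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs: one thin review page and one product page.
for url in (
    "https://www.example.com/reviews/widget-123",
    "https://www.example.com/products/widget-123",
):
    allowed = parser.can_fetch("Googlebot", url)
    print(url, "->", "crawlable" if allowed else "blocked")
```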
-
Thanks EGOL,
And yes, they are.
The comment I received when trying to explain that those links were draining authority off the product pages was "No they aren't. Whatever PageRank the product page has, it has, regardless of whether the links are there or not."
What would your response be to that? I tried to explain it several different ways, but he just looked at me like I was full of malarkey...He is a visual person. Perhaps I should try a diagram?
It's difficult going into a situation like this when the opening premise in the other person's mind is that he knows more about SEO than I do, because all SEO is in his mind is a bunch of guesswork.
Sorry, morale's a bit low in my heart at the moment. I work too hard and study too hard at what I do to have someone who maybe reads a blog about SEO occasionally come in and treat me like I have no idea what I'm talking about.
Thanks very much for responding. I appreciate it mucho!
Dana
-
Thanks Jason,
These are great suggestions and are exactly the kinds of things that will give me the proof I need to convince him that removing these is a worthwhile endeavor. I'm off to do them now and will come back here and post my discoveries.
Dana
-
Are these those thin content, duplicate content, review and email pages?
There are links into those pages that are evaporating PageRank.
Two links on each of your product pages are being wasted.
If they are getting indexed then they are dead weight on your site and make your site look like a skimpy spammy publisher.
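Since a diagram was suggested elsewhere in this thread, here is a toy version of the argument in code. It is a minimal sketch of the originally published PageRank formula on a made-up five-page site; the page names, the link structure, and the 0.85 damping factor are assumptions for illustration. It is not a model of how Google scores pages today.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simple power-iteration PageRank; `links` maps each page to its outlinks."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links[p])
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            for target in links[p]:
                new_rank[target] += damping * rank[p] / len(links[p])
        rank = new_rank
    return rank


# Hypothetical site: three useful pages plus two thin pages (review, email).
without_thin_links = {
    "home": ["product", "related"],
    "product": ["home", "related"],
    "related": ["home", "product"],
    "review": [],
    "email": [],
}
# Same site, but the product page also links to the two thin pages.
with_thin_links = dict(without_thin_links)
with_thin_links["product"] = ["home", "related", "review", "email"]

before = pagerank(without_thin_links)
after = pagerank(with_thin_links)
for page in ("home", "product", "related"):
    print(f"{page}: {before[page]:.3f} -> {after[page]:.3f}")
```

In this toy model, the two extra links on the product page move rank away from the home and related pages and into the thin pages, which is the "evaporation" described above.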
-
By "orphaned" do you mean pages that are no longer linked to your site navigation taxonomy?
If you know the subject matter and/or URLs, you can easily show your boss that they are indexed: Google "site:oursite.com orphaned topic" and show him all the pages in the Google index.
If you can't find the pages, then do a complete crawl of your site with Screaming Frog and see how many pages it finds. Now compare that number with how many pages Google has in your index in Google Webmaster Tools (under Health -> Index Status). Google Count of Pages - Screaming Frog count of Pages = # of Orphaned Pages.
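Because the subtraction above works on totals only, comparing the actual URL lists may be more precise. This is a minimal sketch, assuming you can export one list of URLs from the crawler and one list of URLs known to be indexed. The function name, the normalization rule, and the example.com URLs are all hypothetical.

```python
def find_indexed_orphans(crawled_urls, indexed_urls):
    """Return indexed URLs that the crawl never reached by following links."""
    def normalize(url):
        # Assumption: trailing slashes and letter case do not matter on this site.
        return url.strip().rstrip("/").lower()

    crawled = {normalize(u) for u in crawled_urls}
    indexed = {normalize(u) for u in indexed_urls}
    return sorted(indexed - crawled)


# Hypothetical exports: one from the crawler, one of indexed URLs.
crawled = ["https://example.com/", "https://example.com/products/widget"]
indexed = [
    "https://example.com/products/widget/",
    "https://example.com/old-promo-2009",
]
orphans = find_indexed_orphans(crawled, indexed)
print(orphans)
```

This only surfaces orphans that are indexed. Orphans that are not indexed would have to come from another source, such as a file listing from the server.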
Now, to see if those pages are hurting you, run them through Open Site Explorer to see if any of them have backlinks. If so, they are diluting your SEO efforts. Even if not, look at your crawl stats in Google Webmaster Tools under Health and see how many pages are getting crawled per day. If it's a fraction of your total pages, then getting rid of the orphaned pages could get your important pages crawled more regularly.
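As a rough illustration of the crawl-rate check, the sketch below estimates how long one full pass over the site would take at a steady daily crawl rate. The 27,616 URL total and the roughly 8,000 unneeded files are figures mentioned in this thread. The 500 URLs per day is an invented placeholder, not a number from the site's crawl stats. It also assumes the orphaned files are still being requested and that crawling is evenly spread, which real crawling is not.

```python
import math


def days_for_full_crawl(total_urls: int, crawled_per_day: int) -> int:
    """Days needed to fetch every URL once at a steady daily crawl rate."""
    if crawled_per_day <= 0:
        raise ValueError("crawled_per_day must be positive")
    return math.ceil(total_urls / crawled_per_day)


LINKED_URLS = 27_616     # URLs found by the site crawl (from the thread)
ORPHANED_FILES = 8_000   # approximate clean-up candidates (from the thread)
CRAWLED_PER_DAY = 500    # hypothetical rate; read the real one from crawl stats

before_cleanup = days_for_full_crawl(LINKED_URLS + ORPHANED_FILES, CRAWLED_PER_DAY)
after_cleanup = days_for_full_crawl(LINKED_URLS, CRAWLED_PER_DAY)
print(before_cleanup, after_cleanup)
```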
I hope that helps.
Jason "Retailgeek" Goldberg