Why is my Crawl Report Showing Thousands of Pages that Do Not Exist?
-
Hi,
I just downloaded a Crawl Summary Report for a client's website. I am seeing THOUSANDS of duplicate page content errors. The overwhelming majority of them look something like this:
This page doesn't exist and results in a 404 page. Why are these pages showing up? How do I get rid of them? Are they endangering the health of my site as a whole?
Thank you,
Jenna
<colgroup><col width="1051"></colgroup>
| | -
Hi Jenna,
It's not so much the fact you have 404 pages that is the problem for SEO, but rather the fact your site is creating a problem for the search engines to crawl the site correctly and efficiently since they are getting caught in an endless loop. This can be a problem because the crawlers may get caught in the endless loop and just give up on your site and leave, which means the search engines may not be able to access the rest of the pages on your site and may have a negative impact on your rankings as a whole. One of the most important parts of SEO is to make your website as "friendly" to the search engines as possible so if they caught in endless loops then that is definitely not ideal. Hope that helps!
Patrick
-
Hi Streamline -
Thanks for your help thus far. Could you elaborate on some of the SEO challenges this presents? After a bit of research, I'm seeing people say that having hundreds or thousands of 404s are okay, if they are in fact non-existant pages. I'm not that well educated on this, so just looking for a bit of clarification.
I will look into the relative URL issue. I just recently took over the work on this site, and I'm still digging in to what the original web developer created.
Jenna
-
It looks like the crawler is being caught in an endless loop, most likely a result of using relative URLs somewhere on your site. Yes, this is a problem for the site as a whole so I highly recommend implementing absolute URLs throughout the entire site.
Edit - I just looked at your site and this is exactly what it is. The links in your navigation are relative, such as "<a <="" span="">href="</a>../development/default.aspx"" so just change it to absolute URLs such as http://www.yoursite.com/development/default.aspx and it should fix the problem.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Why Would My Page Have a Higher PA and DA, Links & On-Page Grade & Still Not Rank?
The Search Term is "Alcohol Ink" and our client has a better page authority, domain authority, links to the page, and on-page grade than those in the SERP for spaces 5-10 and we're not even ranked in the top 51+ according to Moz's tracker. The only difference I can see is that our URL doesn't use the exact text like some of the 5-10 do. However, regardless of this, our on-page grade is significantly higher than the rest of them. The one thing I found was that there were two links to the page (that we never asked for) that had a spam score in the low 20's and another in the low 30's. Does anyone have any recommendations on how to maybe get around this? Certainly, a content campaign and linking campaign around this could also help but I'm kind of scratching my head. The client is reputable, with a solid domain age and well recognized in the space so it's not like it's a noob trying to get in out of nowhere.
Intermediate & Advanced SEO | | Omnisye0 -
When "pruning" old content, is it normal to see an drop in Domain Authority on Moz crawl report?
After reading several posts about the benefits of pruning old, irrelevant content, I went through a content audit exercise to kick off the year. The biggest category of changes so far has been to noindex + remove from sitemap a number of blog posts from 2015/2016 (which were very time-specific, i.e. software release details). I assigned many of the old posts a new canonical URL pointing to the parent category. I realize it'd be ideal to point to a more relevant/current blog post, but could this be where I've gone wrong? Another big change was to hide the old posts from the archive pages on the blog. Any advice/experience from anyone doing something similar much appreciated! Would be good to be reassured I'm on the right track and a slight drop is nothing to worry about. 🙂 If anyone is interested in having a look: https://vivaldi.com https://vivaldi.com/blog/snapshots [this is the category where changes have been made, primarily] https://vivaldi.com/blog/snapshots/keyboard-shortcut-editing/ [example of a pruned post]
Intermediate & Advanced SEO | | jonmc1 -
Home page showing some other website in cache
My website (www.kamagrauk.com) is showing www.likeyoursaytoday.com in google cache website domain further redirect to http://kamagrauknow.com/ problem :1) info:kamagrauk.com shows www.likeyoursaytoday.com2) cache:kamagrauk.com shows www.likeyoursaytoday.comwww.likeyoursaytoday.com copied content from kamagraoraljelly.mei already checked done1) changed website hosting (New Virtual private server)2) Uploaded Fresh backup of website 3) Checked header response ( DNS perfect)4) Checked language meta tag (no error)5) fetch function worked fine 6) try to remove url and readded 7) no error in sitemap8) SSL all Ok9) no crawl errorsnothing worked ....... trying to contact www.likeyoursaytoday.com but not responding backToday (23rd feb) www.likeyoursaytoday.com gone down but our cache been replaced http://www.bagnak.com/so it seems google not able to read our page but here i am attaching screen shoot which google sees everything okblocked%20resources.png cache.png crawlerror.png robots%20test.png
Intermediate & Advanced SEO | | Gauravbb1 -
Can noindexed pages accrue page authority?
My company's site has a large set of pages (tens of thousands) that have very thin or no content. They typically target a single low-competition keyword (and typically rank very well), but the pages have a very high bounce rate and are definitely hurting our domain's overall rankings via Panda (quality ranking). I'm planning on recommending we noindexed these pages temporarily, and reindex each page as resources are able to fill in content. My question is whether an individual page will be able to accrue any page authority for that target term while noindexed. We DO want to rank for all those terms, just not until we have the content to back it up. However, we're in a pretty competitive space up against domains that have been around a lot longer and have higher domain authorities. Like I said, these pages rank well right now, even with thin content. The worry is if we noindex them while we slowly build out content, will our competitors get the edge on those terms (with their subpar but continually available content)? Do you think Google will give us any credit for having had the page all along, just not always indexed?
Intermediate & Advanced SEO | | THandorf0 -
New pages not ranking
I published some new landing pages about a month a go which are much better quality than previous pages and on an optimised URL. The old pages never ranked and the new pages aren't ranking either although they are much better. The old pages 301 redirect to the new pages. Any quick ways I can at least get them ranking? Not expecting Page 1 overnight but to at least see the new pages on Page 5 would be great!
Intermediate & Advanced SEO | | Marketing_Today0 -
After Server Migration - Crawling Gets slow and Dynamic Pages wherein Content changes are not getting Updated
Hello, I have just performed doing server migration 2 days back All's well with traffic moved to new servers But somehow - it seems that w.r.t previous host that on submitting a new article - it was getting indexed in minutes. Now even after submitting page for indexing - its taking bit of time in coming to Search Engines and some pages wherein content is daily updated - despite submitting for indexing - changes are not getting reflected Site name is - http://www.mycarhelpline.com Have checked in robots, meta tags, url structure - all remains well intact. No unknown errors reports through Google webmaster Could someone advise - is it normal - due to name server and ip address change and expect to correct it automatically or am i missing something Kindly advise in . Thanks
Intermediate & Advanced SEO | | Modi0 -
Incorrect cached page indexing in Google while correct page indexes intermittently
Hi, we are a South African insurance company. We have a page http://www.miway.co.za/midrivestyle which has a 301 redirect to http://www.miway.co.za/car-insurance. Problem is that the former page is ranking in the index rather than the latter. The latter page does index occasionally in the same position, but rarely. This is primarily for search phrases like "car insurance" and "car insurance quotes". The ranking was knocked down the index with Penquin 2.0. It was not ranking at all but we have managed to recover to 12/13. This abnormally has only been occurring since the recovery. The correct page does index for other search terms like "insurance for car". Your help would be appreciated, thanks!
Intermediate & Advanced SEO | | miway0