Generating 404 Errors but the Pages Exist
-
Hey
I have recently come across an issue with several of a sites urls being seen as a 404 by bots such as Xenu, SEOMoz, Google Web Tools etc. The funny thing is, the pages exist and display fine.
This happens on many of the pages which use the Modx CMS, but the index is fine. The wordpress blog in /blog/ all works fine.
The only thing I can think of is that I have a conflict in the htaccess, but troubleshooting this is difficult, any tool I have found online seem useless.
Have tried to rollback to previous versions but still does not work.
Anyone had any experience of similar issues?
Many thanks
K.
-
FYI, we finally found our error. The short URL turned out to be the same name as the folder (photo-gallery) so once this was changed, wordpress was able to access the correct path. A bit of custom javascript had to be amended as well, but that was limited to our custom code. Using your web-sniffer.net link we were able to test immediately and fix it fairly quickly. Thank you for your help!
-
That's true Ryan I guess it is coding related really.
Issues like this are a real pain in the ass. And most people don't even check WMT to realise the issues exist. TBH, I don't check as often as I should.
-
I agree with you Paul.
As you pointed out one possible cause is a CMS-related issue which I would refer to as "coding" meaning something in the code which was used to present the website. Perhaps there is a better way to phrase it but nothing comes to mind at the moment.
Another possibility you mentioned is Litespeed which would be a server-side issue directly. Either way, it is a legitimate issue which should be addressed.
-
FWIW, I don't think it's a coding issue. If it were coding, it would either show a 200OK or it would show a 404. It wouldn't sometimes serve a 404.
If you're using Litespeed, I'd guarantee that is the issue and if you're using Joomla, it's another prime culprit.
-
Please keep in mind, that 404 error does not mean the page doesn't exist. It means your server, is sending a response code to indicate that it doesn't exist.
When I installed Litespeed on my server, this issue happened over and over again.
I believe Joomla for example, has some kind of security module that serves a 404 if a single IP requests a page too many times. I remember running SEOFrog on a friends Joomla site and tons of 404's were showing up.
-
Dev team are looking into it, must be quite a complex htaccess issue. Will get to the bottom of it this week and post any findings.
-
Thanks Ryan! I will get it looked at...Sue
-
@DentalID, the same reply I offered to Guy applies for you as well. This is an SEO issue which does need to be fixed. Something on your end is causing the page to show with a 403 response code. You really need a programmer to get in there and determine the root cause of the issue. You could try asking your web host if you have managed hosting, but this level of assistance would normally be outside the support of managed hosting.
-
Guy,
In looking at the page this appears to be a legitimate problem. Your server settings allow you to present a page with any header code you wish. You can 301 a page but still present the page with a 200 code if you want. Presently it appears the page is being presented fine but your server is offering a 404 header code.
I can't tell the actual source of the problem other then to say it appears to be on your end and should be fixed. I originally looked at the code with the MOZbar but then checked independently with another tool as well. http://web-sniffer.net/
All tools show a 404 header code for the page. This response code is generated by your web server.
-
We are having a similar problem with this URL: http://dentalimplantsportland.com/photo-gallery/ and also the following locations:
http://cosmeticdentistportland.net/photo-gallery/
http://dentalveneersportland.com/photo-gallery/
SEO Moz and Google webmaster tools show it as a 403 error but the pages display fine. I am not able to tell if this is really a problem for SEO or if we should reconstruct this gallery system and would really love your input.
This is Wordpress with a Spry gallery...
Thanks so much!
-
It is just a small affiliate site I am looking at - this page creates a 404.
http://www.insure-uk.com/post-office-car-insurance.html
Currently testing on some beta servers. Hopefully should fix soon as otherwise it will lose indexation.
-
I also see this now and again, but next crawl they fix themselfs. i assume robots can not always reach page for a number of reasons
-
Can you offer an example of a URL which is causing this problem?
-
I have had the same issues, I think it is often the bot's problem
Just to be certain check your links are correct and manually test them. Also ensure your sitemap is up to date and that you are not blocking the crawlers with metarobots, robots.txt, or some weird stuff in htaccess.
I have found that renaming pages or moving them will often cause 404 issues with crawlers
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Will google be able to crawl all of the pages given that the pages displayed or the info on a page varies according to the city of a user?
So the website I am working for asks for a location before displaying the product pages. There are two cities with multiple warehouses. Based on the users' location, the product pages available in the warehouse serving only in that area are shown. If the user skips location, default warehouse-related product pages are shown. The APIs are all location-based.
Intermediate & Advanced SEO | | Airlift0 -
Paginated Pages Page Depth
Hi Everyone, I was wondering how Google counts the page depth on paginated pages. DeepCrawl is showing our primary pages as being 6+ levels deep, but without the blog or with an infinite scroll on the /blog/ page, I believe it would be only 2 or 3 levels deep. Using Moz's blog as an example, is https://moz.com/blog?page=2 treated to be on the same level in terms of page depth as https://moz.com/blog? If so is it the https://site.comcom/blog" /> and https://site.com/blog?page=3" /> code that helps Google recognize this? Or does Google treat the page depth the same way that DeepCrawl is showing it with the blog posts on page 2 being +1 in page depth compared to the ones on page 1, for example? Thanks, Andy
Intermediate & Advanced SEO | | AndyRSB0 -
VisitSweden indexing error
Hi all Just got a new site up about weekend travel for VisitSweden, the official tourism office of Sweden. Everything went just fine except som issues with indexing. The site can be found here at weekend.visitsweden.com/no/ For some weird reason the "frontpage" of the site does not get indexed. What I have done myself to find the issue: Added sitemaps.xml Configured and added site to webmaster tools Checked 301s so they are not faulty By doing a simple site:weekend.visitsweden.com/no/ you can see that the frontpage is simple not in the index. Also by doing a cache:weekend.visitsweden.com/no/ I see that Google tries to index the page without the trailing /no/ for some reason. http://webcache.googleusercontent.com/search?q=cache:http://weekend.visitsweden.com/no/ Any smart ideas to get this fixed or where to start looking? All help greatly appreciated Kind regards Fredrik
Intermediate & Advanced SEO | | Resultify0 -
I have removed over 2000+ pages but Google still says i have 3000+ pages indexed
Good Afternoon, I run a office equipment website called top4office.co.uk. My predecessor decided that he would make an exact copy of the content on our existing site top4office.com and place it on the top4office.co.uk domain which included over 2k of thin pages. Since coming in i have hired a copywriter who has rewritten all the important content and I have removed over 2k pages of thin pages. I have set up 301's and blocked the thin pages using robots.txt and then used Google's removal tool to remove the pages from the index which was successfully done. But, although they were removed and can now longer be found in Google, when i use site:top4office.co.uk i still have over 3k of indexed pages (Originally i had 3700). Does anyone have any ideas why this is happening and more importantly how i can fix it? Our ranking on this site is woeful in comparison to what it was in 2011. I have a deadline and was wondering how quickly, in your opinion, do you think all these changes will impact my SERPs rankings? Look forward to your responses!
Intermediate & Advanced SEO | | apogeecorp0 -
A Landing Page Goldmine?
If anyone can take a minute to help me out with this, I'd really love to get some expert opinions. I can produce really strong content like a machine and, over the years, I've had tons of pages on my website that had links pointing to them (didn't know about SEO then) deleted and now I'm starting to dig them up. I have dozens with a moz rank higher than 25. My question is what do I do with these urls, should I rewrite them and get the innerlinking strength or should I do a 301 redirect to a similar page? Considering the incoming links and individual seomoz pr rank of these pages , am I sitting on something valuable?
Intermediate & Advanced SEO | | ksundheim10 -
Will Google Visit Non-Canonicalized Page Again and Return Its Page's Original Ranking?
I have 2 questions about canonicalization. 1. Will Google ever visit Page A again if after it has been canonicalized to Page B? 2. If Google will still visit Page A and found that it is not canonicalizing to Page B already, will the original rankings and traffic of Page A returned to the way before it's canonicalized? Thanks.
Intermediate & Advanced SEO | | globalsources.com0 -
Do in page links pointing to the parent page make the page more relevant for that term?
Here's a technical question. Suppose I have a page relevant to the term "Mobile Phones". I have a piece of text, on that page talking about "mobile phones", and within that text is the term "cell phones". Now if I link the text "cell phones", to the page it is already placed on (ie the parent page) - will the page gain more relevancy for the term "cell phones"?? Thanks
Intermediate & Advanced SEO | | James770