Page loads fine for users but returns a 404 for Google & Moz
-
I have an e-commerce website built with WordPress and the WP E-commerce plug-in. The products have always worked fine: the pages display correctly in a browser and people can purchase the products with no problems.
However, in the Google merchant feed and in the Moz crawl diagnostics, certain product pages are returning a 404 error, and I can't work out why, especially as the pages load fine in the browser.
I looked at the page headers and can see that, even when the page loads, the initial request returns a 404 status; every other request then goes through and loads fine. Can anyone help me work out why this is happening?
A link to the product I have been using to test is: http://earthkindoriginals.co.uk/organic-clothing/lounge-wear/organic-tunic-top/
Here is a part of the header dump that I did:
http://earthkindoriginals.co.uk/organic-clothing/lounge-wear/organic-tunic-top/
GET /organic-clothing/lounge-wear/organic-tunic-top/ HTTP/1.1
Host: earthkindoriginals.co.uk
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:21.0) Gecko/20100101 Firefox/21.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip, deflate
Cookie: __utma=159840937.1804930013.1369831087.1373619597.1373622660.4; __utmz=159840937.1369831087.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); wp-settings-1=imgsize%3Dmedium%26hidetb%3D1%26editor%3Dhtml%26urlbutton%3Dnone%26mfold%3Do%26align%3Dcenter%26ed_size%3D160%26libraryContent%3Dbrowse; wp-settings-time-1=1370438004; __utmb=159840937.3.10.1373622660; PHPSESSID=e6f3b379d54c1471a8c662bf52c24543; __utmc=159840937
Connection: keep-alive
HTTP/1.1 404 Not Found
Date: Fri, 12 Jul 2013 09:58:33 GMT
Server: Apache
X-Powered-By: PHP/5.2.17
X-Pingback: http://earthkindoriginals.co.uk/xmlrpc.php
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, max-age=0
Pragma: no-cache
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 6653
Connection: close
Content-Type: text/html; charset=UTF-8
Thanks for the help, guys. It is good to actually have a direction to look in now; I was completely stuck before. I will post any updates I have.
-
Hello,
The status returned is 404 Not Found, and this is independent of whether the page content is displayed or not.
Something is generating that status code: it could be .htaccess, some PHP code, a redirection, or a misconfigured rewrite rule. Look for what it could be, because with that status code the pages will not be indexed.
Sorry for my English.
Best regards,
Carlos
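To confirm what Carlos describes, it can help to look at the status code on its own, separately from what the browser renders. Below is a minimal Python sketch, assuming only the standard library. The function names (fetch_status, check_url) and the USER_AGENTS table are made up for illustration, and the crawler string is just an example of what a bot might send, not necessarily what Google or Moz used against this site.

```python
import urllib.error
import urllib.request

# Illustrative user-agent strings (an assumption for this sketch);
# real crawlers may send something different.
USER_AGENTS = {
    "browser": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:21.0) Gecko/20100101 Firefox/21.0",
    "crawler": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}


def fetch_status(url, user_agent, timeout=10):
    """Return (status_code, body_length) for a GET request to url."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, len(response.read())
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses, but the error object still
        # carries the status code and whatever body the server sent.
        return error.code, len(error.read())


def check_url(url):
    """Fetch url once per user agent and map each label to its result."""
    return {label: fetch_status(url, agent) for label, agent in USER_AGENTS.items()}
```

Calling check_url() with the product URL from the question returns one (status, body length) pair per user agent. A 404 status together with a body of several thousand bytes would match the header dump above: the server sends the full page but labels it Not Found, which is the only part a crawler acts on.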