Google crawl index issue with our website...
-
Hey there. We've run into a mystifying issue with Google's crawl index of one of our sites. When we do a "site:www.burlingtonmortgage.biz" search in Google, we're seeing lots of 404 Errors on pages that don't exist on our site or seemingly on the remote server.
In the search results, Google is showing nonsensical folders off the root domain and then the actual page is within that non-existent folder.
An example:
Google shows this in its index of the site (as a 404 Error page): www.burlingtonmortgage.biz/MQnjO/idaho-mortgage-rates.asp
The actual page on the site is: www.burlingtonmortgage.biz/idaho-mortgage-rates.asp
Google is showing the folder MQnjO that doesn't exist anywhere on the remote. Other pages they are showing have different folder names that are just as wacky.
We called our hosting company who said the problem isn't coming from them...
Has anyone had something like this happen to them?
Thanks so much for your insight!
Megan -
Hi Keri. Thanks for following up. This turned out to be an issue with an auto-generated breadcrumbs script. I don't know what the intricacies of that were but we were able to remove it and get this issue straightened out.
Thanks again!
Megan
-
Hi Megan,
I'm following up on older questions that are marked unanswered. Did you ever get this figured out?
-
Megan ,
Please check with your hosting company,
about this code to be included in htaccess
ErrorDocument 404 /404.shtml
/404.shtml its your 404 page
-
Thanks for your help on this Wissam. Is this something that we need to have the hosting company set-up on the server to ensure that these pages get returned as 404s?
-
Megan,
See here
http://markup.io/v/fyd9w4w9wmjr
Googlebot when It crawls this page, you remote server is telling Google Bot that its a Live page and this page Exists
The solution to the upper problem, might help you in fixing the actual problem.
If the Pages with the mystery folder Does not Exist .. your remote server should show google bot a 404 not found (http header).
-
Are we talking about one problem or two?
http://www.burlingtonmortgage.biz/contact.htm does not exist on the remote server (as it was removed over a year ago). I see that there are similar errors for other old pages which were also previously removed. Should we have redirected those to the 404 page since there are not related pages on the existing site?
I am not sure if the two problems have anything to do with one another. The pages with the "mystery folders" are existing pages. They just exist in the root. Why would google be looking at them as if they are inside sub folder?
-
Megan,
noticed something also for example this page http://www.burlingtonmortgage.biz/contact.htm . its showing a 404 error from title and content ... but the HTTP header is showing 200 ok. u need to fix that.
and would assume maybe thats why google started indexing weird URLs generating from your site... and if its true is a 404 page ..google is not picking it up because its showing its a Live page (200ok)
-
We use Dreamweaver.
-
Which CMS are you using?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Can we re rank our Penalyzed website in Google?
Hello This is Maqbul, from India. I have a jobs portal blog [ bharatrecruit.com]. It was getting around 50K to 100K Views a Day and made me $100 a day. But after a few months, my competitor made negative SEO with 12,000 Spammy backlinks. Suddenly my site was hit by Google and now it is getting 200 to 300 Pageviews a day. So the question is I did not disavow bad links for a long time like 3 to 4 months. Now I disavow all the bad links but the website is not ranking. Can we re-rank this site or create another website. Please reply must. None of the bloggers can answer this. Thanks, Regards Maqbul
Technical SEO | | vinaso960 -
Pages are Indexed but not Cached by Google. Why?
Hello, We have magento 2 extensions website mageants.com since 1 years google every 15 days cached my all pages but suddenly last 15 days my websites pages not cached by google showing me 404 error so go search console check error but din't find any error so I have cached manually fetch and render but still most of pages have same 404 error example page : - https://www.mageants.com/free-gift-for-magento-2.html error :- http://webcache.googleusercontent.com/search?q=cache%3Ahttps%3A%2F%2Fwww.mageants.com%2Ffree-gift-for-magento-2.html&rlz=1C1CHBD_enIN803IN804&oq=cache%3Ahttps%3A%2F%2Fwww.mageants.com%2Ffree-gift-for-magento-2.html&aqs=chrome..69i57j69i58.1569j0j4&sourceid=chrome&ie=UTF-8 so have any one solutions for this issues
Technical SEO | | vikrantrathore0 -
Google crawling but not indexing for no apparent reason
Client's site went secure about two months ago and chose root domain as rel canonical (so site redirects to https://rootdomain.com (no "www"). Client is seeing the site recognized and indexed by Google about every 3-5 days and then not indexed until they request a "Fetch". They've been going through this annoying process for about 3 weeks now. Not sure if it's a server issue or a domain issue. They've done work to enhance .htaccess (i.e., the redirects) and robots.txt. If you've encountered this issue and have a recommendation or have a tech site or person resource to recommend, please let me know. Google search engine results are respectable. One option would be to do nothing but then would SERPs start to fall without requesting a new Fetch? Thanks in advance, Alan
Technical SEO | | alankoen1230 -
Google stopped crawling my site. Everybody is stumped.
This has stumped the Wordpress staff and people in the Google Webmasters forum. We are in Google News (have been for years), and so new posts are crawled immediately. On Feb 17-18 Crawl Stats dropped 85%, and new posts were no longer indexed (not appearing on News or search). Data highlighter attempts return "This URL could not be found in Google's index." No manual actions by Google. No changes to the website; no custom CSS. No Site Errors or new URL errors. No sitemap problems (resubmitting didn't help). We're on wordpress.com, so no odd code. We can see the robot.txt file. Other search engines can see us, as can social media websites. Older posts still index, but loss of News is a big hit. Also, I think overall Google referrals are dropping. We can Fetch the URL for a new post, and many hours later it appears on Google and News, and we can then use Data Highlighter. It's now 6 days and no recovery. Everybody is stumped. Any ideas? I just joined, so this might be the wrong venue. If so, apologies.
Technical SEO | | Editor-FabiusMaximus_Website0 -
Why is my blog disappearing from Google index?
My Google blogger blog is about 10 months old. In that time i have worked really hard with adding unique content, building relationships with other bloggers in the same niche, and done some inbound marketing. 2 weeks ago I updated the template to something cleaner, with a little more "wordpress" feel to it. This means i've messed about with the code a lot in these weeks, adding social buttons etc. The problem is that from some point late last week thurs/fri my pages started disappearing from Googles index. I have checked webmaster tools and have no manual actions. My link profile is pretty clean as its a new site, and i have manually checked every piece of content published for plagiarism etc. So what is going on? Did i break my blog? Or is something else amiss? Impressions are down 96% comparing Nov 1-5th to previous 5 days. site is here: http://bit.ly/174beVm Thanks for any help in advance.
Technical SEO | | Silkstream0 -
Moz Crawl Diagnostic shows lots of duplicate content issues
Hi my client's website uses URL with www and without www. In page/title both website shows up. The one with www has page authority of 51 and the one without 45. In Moz diagnostic I can see that the website shows over 200 duplicate content which are not found in , e.g. Webmaster. When I check each page and add/remove www then the website shows the same content for both www and no www. It is not redirect - in search tab it actually shows www and then if you use no www it doesn't show www. Is the www issue to blame? or could it be something else? and what do I do since both www URL and no-www URL have high authority, just set up redirect from lower authority URL to higher authority URL?
Technical SEO | | GardenPet0 -
Website not crawled
i added website www.nsale.in in add campaign, it shows only 1 page crawled. but its working fine for other sites, any idea why it failed ?
Technical SEO | | Dhinesh0 -
We changed the URL structure 10 weeks ago and Google hasn't indexed it yet...
We recently modified the whole URL structure on our website, which resulted in huge amount of 404 pages changing them to nice human readable urls. We did this in the middle of March - about 10 weeks ago... We used to have around 5000 404 pages in the beginning, but this number is decreasing slowly. (We have around 3000 now). On some parts of the website we have also set up a 301 redirect from the old URLs to the new ones, to avoid showing a 404 page thus making the “indexing transmission”, but it doesn’t seem to have made any difference. We've lost a significant amount of traffic, because of the URL changes, as Google removed the old URLs, but hasn’t indexed our new URLs yet. Is there anything else we can do to get our website indexed with the new URL structure quicker? It might also be useful to know that we are a page rank 4 and have over 30,000 unique users a month so I am sure Google often comes to the site quite often and pages we have made since then that only have the new url structure are indexed within hours sometimes they appear in search the next day!
Technical SEO | | jack860