Why is site not being indexed by Google, and not showing on a crawl test??
-
On a site we developed of which .com is forwarded to .net domain, we quit getting crawled by google on about the 20th of Feb. Now when we try to run a crawl test on either url, we get There was an error fetching this page. Error description For some reason the page returned did not describe itself as an html page. It could be possible that the url is serving an image, rss feed, pdf, or xml file of some sort. The crawl tool does not currently report metrics on this type of data. Our other sites are fine and this was up to this date. We took out noodp, noydir today as the only thing we could think of. Site is on WP cms.
-
Site last cached 2nd March
Your site is indexed.
Header's returning 200 codes.
Site can be crawled fine, Xenu finds about 27 pages.
Lynxviewer gets through the page alright.
Only thing I can think of is that robots.txt looks needlessly complicated but should be alright, I would consider stripping it all out and re-running the test, if you get the same error then it's not that, if it is then narrow down what it could be.
If no joy, let me know and I'll have another look.
-
The site is www.innerloophomesreport.net, .com. Thanks.
-
Probably going to need the URL on this one.
I presume you can access the site as a user? What's in your robots.txt file? You using the SEOmoz tools?
-
Hi Robert Fisher,
This problem probably come from the headers of the file and not from the content itself. You might want to look at the headers returned by your URL using one of the following tools :
http://www.seoconsultants.com/tools/headers
http://www.rexswain.com/httpview.html
http://web-sniffer.net/
http://www.g-force.ca/referencement/entetesWhen you got the headers, I suggest you post it here so we can look into it.
Best regards,
Guillaume Voyer.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
I'm doing a crawl analysis for a website and finding all these duplicate URLs with "null" being added to them and have no clue what could be causing this.
Does anyone know what could be causing this? Our dev team thinks it's caused by mobile pages they created a while ago but it is adding 1000's of additional URLs to the crawl report and being indexed by Google. They don't see it as a priority but I believe these could be very harmful to our site. examples from URL string:
Web Design | | julianne.amann
uruguay-argentina-chilenullnull/days
rainforests-volcanoes-wildlifenullnull/reviews
of-eastern-europenullnullnullnull/hotels0 -
How to find out that none of the images on my site violates copyrights? Is there any tool that can do this without having to check manually image by image?
We plan to add several thousand images to our site and we outsourced the image search to some freelancers who had instructions to just use royalty free pictures. Is there any easy and quick way to check that in fact none of these images violates copyrights without having to check image by image? In case there are violations we are unaware of, do you think we need to be concerned about a risk of receiving Takedown Notices (DMCA) before owner giving us notification for giving us opportunity to remove the photo?
Web Design | | lcourse1 -
My news site not showing in "In the news" list on Google Web Search
I got a news website (www.tapscape.com) which is 6 years old and has been on Google News since 2012. However, whenever I publish a news article, it never shows up "In the news" list on Google Web Search. I have already added the schema.org/NewsArticle on the website and have checked it if it's working or not on Google structured data testing tool. I see everything shows on on the structured data testing tool. The site already has a news sitemap (http://www.tapscape.com/news-sitemap.xml) and has been added to Google webmaster tools. News articles show perfectly fine in the News tab, but why isn't the articles being shown on "In the news" list on the Google web search? My site has a strong backlink background already, so I don't think I need to work on the backlinks. Please let me know what I'm doing wrong, and how can I get it to the news articles on "In the news" list. Below is a screenshot that I have attached to this question to help you understand what I mean to say. 1qoArRs
Web Design | | hakhan2010 -
Should i not use hyphens in web page titles? Google Penalty for hyphens?
all the page titles in my site have hyphens between the words like this: http://texas.com/texas-plumbers.html I have seen tests where hyphenated domain names ranked lower than non hyphenated domain names. Does this mean my pages are being penalized for hyphens or is this only in the domain that it is penalized? If I create new pages should I not use hyphens in the page titles when there are two or more words in the title? If I changed all my page titles to eliminate the hyphens, I would lose all my rankings correct? My site is 12 years old and if I changed all these titles I'm guessing that each page would be thrown in the google sandbox for several months, is this true? Thanks mozzers!
Web Design | | Ron100 -
Are URL suffixes ignored by Google? Or is this duplicate content?
Example URLs: www.example.com/great-article-on-dog-hygiene.html www.example.com/great-article-on-dog-hygiene.rt-article.html My IT dept. tells me the second instance of this article would be ignored by Google, but I've found a couple of instances in which Google did index the 'rt-article.html' version of the page. To be fair, I've only found a couple out of MANY. Is it an issue? Thanks, Trisha
Web Design | | lzhao0 -
404 page not found after site migration
Hi, A question from our developer. We have an issue in Google Webmaster Tools. A few months ago we killed off one of our e-commerce sites and set up another to replace it. The new site uses different software on a different domain. I set up a mass 301 redirect that would redirect any URLs to the new domain, so domain-one.com/product would redirect to domain-two.com/product. As it turns out, the new site doesn’t use the same URLs for products as the old one did, so I deleted the mass 301 redirect. We’re getting a lot of URLs showing up as 404 not found in Webmaster tools. These URLs used to exist on the old site and be linked to from the old sitemap. Even URLs that are showing up as 404 recently say that they are linked to in the old sitemap. The old sitemap no longer exists and has been returning a 404 error for some time now. Normally I would set up 301 redirects for each one and mark them as fixed, but there are almost quarter of a million URLs that are returning 404 errors, and rising. I’m sure there are some genuine problems that need sorting out in that list, but I just can’t see them under the mass of errors for pages that have been redirected from the old site. Because of this, I’m reluctant to set up a robots file that disallows all of the 404 URLs. The old site is no longer in the index. Searching google for site:domain-one.com returns no results. Ideally, I’d like anything that was linked from the old sitemap to be removed from webmaster tools and for Google to stop attempting to crawl those pages. Thanks in advance.
Web Design | | PASSLtd0 -
Duplicate Content Problem on Our Site?
Hi, Having read the SEOMOZ guide and already worried about this previously, I have decided to look further into this. Our site is 4-5 years old, poorly built by a rouge firm so we have to stick with what we have for now. Were I think we might be getting punished is duplicate content across various pages. We have a Brands page, link at top of page. Here we are meant to enter each brand we stock and a little write up on that brands. What we then put in these write ups is used on each brands item page when we click a brand name on the left nav bar. Or when we click a Product Type (eg. Footwear) then click on a brand filter on the left. So this in theory is duplicate content. The SEO title and Meta Description for each brand is then used on the Brands Page and also on each page with the Brands Product on. As we have entered this brand info, you will notice that the page www.designerboutique-online.com/all-clothing/armani-jeans/ has the same brand description in the scroll box at the top as the page www.designerboutique-online.com/shirts/armani-jeans/ and all the other product type pages. The same SEO title and same Meta descriptions. Only the products change from each one. This then applies to each brand we have (at least 15) across about 8 pages. All with different URLs but the same text. Not sure how a 301 or rel: canonical would work for this, as each URL needs to point at specific pages (eg. shirts, shorts etc...). Some brands such as Creative Recreation and Cruyff only sell footwear, so technically I think??? We could 301 to the Footwear/ URL rather than having both all-clothing and footwear file paths? This surely must be down to the bad design? Could we be losing valulable rank and juice because of this issue? And how would I go about fixing it? I want a new site, but funds are tight. But if this issue is so big that only a new site would fix it, then maybe the money would need to come forward. What do people make of this? Cheers Will
Web Design | | YNWA0 -
Google search issue with exact domain
We had a site from Feb-2011 to Nov-2011 at the domain amcoexterminating.com. The site was pure HTML/CSS and the daily unique visitors steadily increased over that time. So all was fine. We then moved the site to a CMS (Joomla) on Dec. 6th. From that day forward, the daily visitors went into the tank. Before the move, if you typed "amcoexterminating.com" or "amco exterminating" into Google search, the site would be the first result (as you'd expect since those are the words that make up the actua domain). But we tried this yesterday and the site did not come up at all. NOT GOOD. It would work in Yahoo or Bing, but not in Google. So obviously, the problem with Google search directly affected the daily visitors. We just checked Webmaster tools yesterday (yes, this should have been done sooner, lesson learned) and it said "Site has severe health issues - Important page blocked by robots.txt". It listed the "important" page URL and it was just a link to an image. Regardless, I wiped out the Joomla created robots.txt file and added a new one and made it just say... User-agent: *Allow: / About 14 hours later, after the new robots.txt file was recognized by Google, the "severe health" message went away. However if I search in Google for "amcoexterminating.com", it still doesn't show up and the client is concerned (as they should be). Do you think the search engines just need more time to refresh? If so, once it refreshes, should the site show up first again right away? Or is it possible the robots.txt file had nothing to do with the issue? If so, what other things could I check into that might cause Google search to not find a site even if you search for exact domain name? Please share any and all things I should look into as I need to get this site showing in Google search again (as it was before moving to the CMS). Thanks!
Web Design | | MarathonMS0