Why isn't our new site being indexed?
-
We built a new website for a client recently.
Site: https://www.woofadvisor.com/
It's been live for three weeks. Robots.txt isn't blocking Googlebot or anything.
Submitted a sitemap.xml through Webmasters but we still aren't being indexed.
Anyone have any ideas?
-
Hey Dirk,
No worries - I visited the question for the first time today and considered it unanswered, as the site is perfectly accessible in California. I like to confirm what Search Console says, as that is 'straight from the horse's mouth'.
Thanks for confirming that the IP redirect has changed, that is interesting. It is impossible for us to know when that happened - I would have expected things to get indexed quite fast once it changed.
With the extra info I'm happy to mark this as answered, but would be good to hear from the OP.
Best,
-Tom
-
Hi Tom,
I am not questioning your knowledge - I re-ran the test on webpagetest.org and I see that the site is now accessible from a Californian IP (http://www.webpagetest.org/result/150911_6V_14J6/), which wasn't the case a few days ago (check the result on http://www.webpagetest.org/result/150907_G1_TE9/) - so there has been a change to the IP redirection. I also checked from Belgium - the site is now accessible from here as well.
I also notice that if I now do a site:woofadvisor.com search in Google I get 19 pages indexed rather than the 2 I got a few days ago.
Apparently removing the IP redirection solved (or is solving) the indexation issue - but this question still remains marked as "unanswered".
rgds,
Dirk
-
I am in California right now, and can access the website just fine, which is why I didn't mark the question as answered - I don't think we have enough info yet. I think the 'fetch as googlebot' will help us resolve that.
You are correct that if there is no robots.txt then Google assumes the site is open, but my concern is that the developers on the team say that there IS a robots.txt file there and it has some contents. I have, on at least two occasions, come across a team that was serving a robots.txt that was only accessible to search bots (once they were doing that 'for security', another time because they misunderstood how it worked). That is why I suggested checking Search Console to see what shows up for robots.txt.
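A rough first check for the "only visible to bots" situation is to request robots.txt with different User-Agent headers and compare the results. The sketch below is Python with hypothetical helper names (`fetch`, `compare_by_user_agent`); the Googlebot string is the user agent Google publishes for its desktop crawler. A server that filters on IP address rather than User-Agent will not be caught by this, which is why the Search Console check is still the one to trust.

```python
import urllib.error
import urllib.request

# User-Agent strings to compare; the "browser" value is an arbitrary example.
USER_AGENTS = {
    "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}


def fetch(url, user_agent):
    """Return (status, body) for url as seen with the given User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are raised as exceptions; keep their code and body.
        return error.code, error.read()


def compare_by_user_agent(url, fetcher=fetch, agents=USER_AGENTS):
    """Fetch url once per user agent and report whether the responses differ."""
    results = {name: fetcher(url, ua) for name, ua in agents.items()}
    differs = len(set(results.values())) > 1
    return results, differs


# Example call (makes real network requests, so it is left commented out):
# print(compare_by_user_agent("https://www.woofadvisor.com/robots.txt"))
```

If `differs` comes back true, the file is being served selectively and that is worth raising with the developers.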
-
To be very honest - I am quite surprised that this question is still marked as "Unanswered".
The owners of the site decided to block access for all non-UK/Ireland addresses. The main Googlebot uses a Californian IP address to visit the site. Hence the only page Googlebot can see is https://www.woofadvisor.com/holding-page.php, which has no links to the other parts of the site (this is confirmed by the webpagetest.org test with a Californian IP address).
As Google indicates, Googlebot can also use other IP addresses to crawl the site ("With geo-distributed crawling, Googlebot can now use IP addresses that appear to come from other countries, such as Australia."). However, it is very likely that these bots do not crawl with the same frequency or depth as the main bot (the article clearly states: "Google might not crawl, index, or rank all of your locale-adaptive content. This is because the default IP addresses of the Googlebot crawler appear to be based in the USA.").
This can easily be solved by adding a link on /holding-page.php to the Irish/UK version, which contains the full content and is accessible to all IP addresses. Googlebot can follow that link to index the full site (so only put the IP detection on the homepage, not on the other pages).
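The rule suggested here (geo-detect on the homepage only, leave every other URL reachable) could look roughly like this on the server side. This is a sketch in Python, not the site's actual code: the function name, the country codes, the list of homepage paths and the idea that a two-letter country code has already been looked up from the visitor's IP are all assumptions.

```python
ALLOWED_COUNTRIES = {"GB", "IE"}  # assumed ISO codes for the UK and Ireland
HOMEPAGE_PATHS = {"", "/", "/index.php"}  # assumed URLs that count as the homepage
HOLDING_PAGE = "/holding-page.php"


def redirect_target(path, country_code):
    """Return the path to redirect to, or None to serve the page as requested."""
    if path not in HOMEPAGE_PATHS:
        # Inner pages are never geo-redirected, so a crawler from any
        # country can reach them by following links.
        return None
    if country_code in ALLOWED_COUNTRIES:
        return None
    return HOLDING_PAGE
```

With a rule like this, `redirect_target("/", "US")` sends a US visitor to the holding page, while `redirect_target("/travel-tips.php", "US")` serves the page normally. The holding page still needs a plain link to the full site for the crawler to follow.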
The fact that the robots.txt gives a 404 is not relevant: if no robots.txt is found, Google assumes that the site can be indexed (check this link) - quote: "You only need a robots.txt file if your site includes content that you don't want Google or other search engines to index."
-
I'd be concerned about the 404ing robots.txt file.
You should check in Search Console:
- What does Search Console show in the robots.txt section?
- What happens if you fetch a page that is not indexed (e.g. https://www.woofadvisor.com/travel-tips.php) with the 'Fetch as Googlebot' tool?
I checked and do not see any obvious indicators of why the pages are not being indexed - we need more info.
-
I just did a quick check on your site with Webpagetest.org using a California IP address (http://www.webpagetest.org/result/150907_G1_TE9/). As you can see there, these IPs also go to the holding page, which is logically the only page that can be indexed, as it's the only one Googlebot can access.
rgds,
Dirk
-
Hi,
I can't access your site from Belgium - I guess you are redirecting your users based on IP address. If, like me, they are not located in your target country, they are 302 redirected to https://www.woofadvisor.com/holding-page.php, and there is only 1 page that is indexed.
Not sure which country you are actually targeting - but could it be that you're accidentally redirecting Googlebot as well?
Also check this article from Google on IP-based targeting.
rgds
Dirk
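
To see the redirect itself rather than the page it lands on, the request has to be made without following redirects. Here is a minimal sketch in Python using `http.client`, which does not follow redirects on its own. `first_response` and `is_holding_redirect` are illustrative names, and the result depends on where the script runs, since the redirect is based on the visitor's IP address.

```python
import http.client
from urllib.parse import urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def first_response(url, user_agent="Mozilla/5.0"):
    """Return (status, Location header) for url without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        connection = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        connection = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        connection.request("GET", parts.path or "/", headers={"User-Agent": user_agent})
        response = connection.getresponse()
        return response.status, response.getheader("Location")
    finally:
        connection.close()


def is_holding_redirect(status, location, holding_path="/holding-page.php"):
    """True if the response is a redirect whose target is the holding page."""
    if status not in REDIRECT_CODES or not location:
        return False
    # Normalise relative targets such as "holding-page.php" before comparing.
    return "/" + urlsplit(location).path.lstrip("/") == holding_path


# Example call (makes a real network request, so it is left commented out):
# print(is_holding_redirect(*first_response("https://www.woofadvisor.com/")))
```

Running it from a machine inside and outside the target countries should show whether the 302 is tied to location.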
-
Strangely, there are two pages indexed on Google Search: the homepage and one other.
-
"I noticed the robots.txt file returned a 404 and asked the developers to take a look and they said the content of it is fine."
Sometimes developers say this stuff. If you are getting a 404, demonstrate it to them.
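One way to demonstrate it is a short script that requests the file and prints the raw status code. This is a minimal sketch in Python using only the standard library; the function names `fetch_status` and `describe_robots_status` are made up for illustration, and the wording of the notes is just a summary, not anything Google reports.

```python
import urllib.error
import urllib.request


def fetch_status(url, user_agent="Mozilla/5.0"):
    """Return the HTTP status code for url (error codes such as 404 included)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        # urlopen follows redirects, so this is the status of the final URL.
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx/5xx responses; the code is what we want to show.
        return error.code


def describe_robots_status(status):
    """Turn a status code into a one-line note (hypothetical helper)."""
    if status == 200:
        return "200: robots.txt is being served, so read its contents next"
    if status == 404:
        return "404: no robots.txt is visible from this machine"
    return f"{status}: unexpected response, worth investigating"


# Example call (makes a real network request, so it is left commented out):
# print(describe_robots_status(fetch_status("https://www.woofadvisor.com/robots.txt")))
```

Sending the developers the printed line along with the time and the location the script ran from gives them something concrete to reproduce.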
-
I noticed the robots.txt file returned a 404 and asked the developers to take a look and they said the content of it is fine.
But yes, I'll double-check the WordPress settings now.
-
Your sitemap all looked good, but when I tried to view the robots.txt file in your root, it returned a 404, so I was unable to determine if there was an issue. Could any of the settings in your WordPress installation be causing it to trip over?