Why isn't Google indexing our site?

Studio33

Hi,

We have substantially redesigned our site. It is not a big site; it is a SaaS site, so it has the typical structure: Landing, Features, Pricing, Sign Up, Contact Us, etc.

The main part of the site is behind the login, so it is out of Google's reach.

Since the new release a month ago, Google has indexed some pages, mainly the blog, which is brand new. It has also reindexed a few of the original pages; I am guessing this because clicking "Cached" in a site: search shows the new site.

All new pages (of which there are 2) are totally missed. One is HTTP and one is HTTPS. Does HTTPS make a difference?

I have submitted the site via Webmaster Tools and it says "URL and linked pages submitted to index", but a site: search doesn't bring up all the pages.

What is going on here please? What are we missing? We just want Google to recognise the old site has gone and ALL the new site is here ready and waiting for it.

Thanks

Andrew

MikeTek

Well, links/shares are good. But of course I'm just begging the question of how you can get those.

Rand gave a great talk called "Inbound Marketing for Startups" at a Hackers & Founders meetup that was focused more on Inbound as a whole than SEO in particular, but it's full of valuable insights: http://vimeo.com/39473593 [video]

Ultimately it'll come down to some kind of a publishing/promotional strategy for your startup. Sometimes your startup is so unique/interesting that it has its own marketing baked right in - in which case you can get a lot of traction by simply doing old-school PR to get your startup in front of the right people.

Other times, you've got to build up links/authority on the back of remarkable marketing.

BufferApp is a great example of a startup that built traction off their blog. Of course, they weren't necessarily blogging as an SEO play - it was more in the aim of getting directly in front of the right audience for direct signups for their product. But they definitely built up some domain authority as a result.

I'd also take a look at the guides Mailchimp has created - they created the dual benefit of getting in front of the right audience in a positive/helpful way (which benefits the brand and drives sign-ups directly) as well as building a considerable number of inbound links, boosting their domain authority overall.

Unfortunately there are no quick/easy ways to build your domain authority, but things you do to build your authority can also get you immediately in front of the audience you're looking for - and SEO just becomes a lateral benefit to that.

Studio33

Thank you all for your responses. It is strange. We are going to add a link to our G+ page and then add a post.

As a new site, what is the best way to get our domain authority up so we get crawled quicker?

Thanks again

Andrew

MikeTek

I disagree. Unless the old pages have inbound links from external sites, there's not much reason to 301 them (and not much benefit). If they're serving up 404 errors, they will fall out of the index.

Google absolutely does have a way to know these new pages exist - by crawling the home page and following the links discovered there. Both of the pages in question are linked to prominently, particularly the Features page which is part of the main navigation. A sitemap is just an aid for this process - it can help move things along and help Google find otherwise obscure/deep pages, but it by no means is a necessity for getting prominent pages indexed, particularly pages that are 1-2 levels down from the home page.

ThompsonPaul

If you didn't redirect the old URLs to the new ones when the new site went live, this will absolutely be the cause of your problem, Studio33. That, combined with having no (or a misdirected) sitemap, means there was essentially no way for Google to even know your site's pages existed.

Good catch Billy.

MikeTek

Hi Andrew,

Google has been indexing HTTPS URLs for years now without a problem, so that is unlikely to be part of the issue.
Your domain authority on the whole may be slowing Google down in indexing new pages. Bottom line is crawl rate and depth are both functions of how authoritative/important you appear based on links/shares/etc.
That said, I don't see any indication as to why these two particular pages are not being indexed by Google. I'm a bit stumped here.

I see some duplication between your Features page and your Facebook timeline, but not with the invoice page.

As above, your domain authority (17) is a bit on the low side. So this could simply be a matter of Google not dedicating enough resources to crawl/index all of your pages yet. But why these two pages would be the only ones is perplexing, particularly after a full month. There are no problems with your Robots.txt, no canonical tag issues, the pages are linked to properly.

Wish I had an easy answer here. One idea, a bit of a long shot: we've seen Google index pages faster when they're linked to from Google+ posts. I see you have a Google+ business page for this website - you might try simply writing a (public) post there that includes a link over to the Features page.

As weak as that is, that's all I've got.

Best of Luck,
Mike

Vizergy

OK - I would get a list of all of your old pages and start 301 redirecting them to your new pages asap. This could be part of your issue.
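A minimal sketch of what such a redirect list could look like, assuming the old pages map one-to-one onto new ones. The old paths below are hypothetical (the thread never lists them); only /Features and /Blog are taken from this thread, and `resolve` is an illustrative helper, not part of any framework.

```python
# Map of old paths to new paths. The old paths are hypothetical placeholders;
# replace them with the real list of retired URLs.
REDIRECTS = {
    "/features.aspx": "/Features",
    "/news.aspx": "/Blog",
}

BASE_URL = "http://www.invoicestudio.com"


def resolve(path: str):
    """Return (status, location) for a requested old path."""
    new_path = REDIRECTS.get(path)
    if new_path is None:
        # No mapping: the old page simply returns 404.
        return 404, None
    # Permanent redirect tells crawlers the page has moved for good.
    return 301, BASE_URL + new_path


print(resolve("/features.aspx"))
print(resolve("/some-retired-page"))
```

The same table can then be translated into whatever redirect rules your web server or framework uses.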

Studio33

Hi, I checked the XML. It's there if you view source; it just doesn't have a stylesheet.

Studio33

Hi, thanks. About 1 month. The blog pages you are getting may be the old ones, as they are working at this end: http://www.invoicestudio.com/Blog . What you have mentioned about the blog is part of the problem: Google has the old site and not the new.

Vizergy

Getting this on your Blog pages:

The page cannot be displayed because an internal server error has occurred.

Were you aware?

Anyway - may I ask how old these pages are?

Studio33

Thanks. I will look into the sitemap. That only went live about an hour ago whilst this thread has been going on.

Vizergy

Yeah - with no path specified the directive is ignored. (You don't have a '/', so the Disallow directive is ignored.)

However, you do point to your XML sitemap, which appears to be empty. You might want to fix that.
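For what it's worth, here is a rough sketch of how one might build a minimal sitemap and confirm it actually contains entries. It assumes the standard sitemaps.org namespace; the three URLs are the ones mentioned in this thread, and `build_sitemap` and `count_urls` are illustrative names, not from any particular tool.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap XML string with one <url><loc> per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def count_urls(sitemap_xml):
    """Count the <loc> entries in a sitemap; 0 means it is effectively empty."""
    root = ET.fromstring(sitemap_xml)
    return len(root.findall(f"{{{SITEMAP_NS}}}url/{{{SITEMAP_NS}}}loc"))


sitemap_xml = build_sitemap([
    "http://www.invoicestudio.com/Features",
    "https://www.invoicestudio.com/Secure/InvoiceTemplate",
    "http://www.invoicestudio.com/Blog",
])
print(sitemap_xml)
print(count_urls(sitemap_xml))
```

Running `count_urls` on the XML your server actually returns is a quick way to tell an empty sitemap from one that only looks empty in the browser because it has no stylesheet.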

Studio33

Hi, no, I think it's fine, as we do not have the forward slash after the Disallow. See

http://www.robotstxt.org/robotstxt.html

I wish it was as simple as that. Thanks for your help though, it's appreciated.

Tunji

Hmmm. That link shows that the way you have it will block all robots.

Studio33

Thanks, but I think robots.txt is correct. Excerpt from http://www.robotstxt.org/robotstxt.html

To exclude all robots from the entire server

User-agent: *
Disallow: /

To allow all robots complete access

User-agent: *
Disallow:

(or just create an empty "/robots.txt" file, or don't use one at all)
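One way to sanity-check the two forms quoted above is to feed them to a robots.txt parser. This is only a sketch using Python's standard-library `urllib.robotparser`; it shows how that parser reads the rules, which may differ in edge cases from Googlebot's own parser. The `can_crawl` helper and the "Googlebot" agent string are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if this robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


BLOCK_ALL = "User-agent: *\nDisallow: /\n"   # slash present: everything blocked
ALLOW_ALL = "User-agent: *\nDisallow:\n"     # empty value: nothing blocked
URL = "http://www.invoicestudio.com/Features"

blocked = can_crawl(BLOCK_ALL, URL)
open_access = can_crawl(ALLOW_ALL, URL)
print("Disallow: /  ->", blocked)
print("Disallow:    ->", open_access)
```

With this parser, the empty `Disallow:` leaves the Features page crawlable, while `Disallow: /` blocks it.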

Tunji

It looks like your robots.txt file is the problem. http://www.invoicestudio.com/robots.txt has:

User-agent: * 
Disallow:

When it should be:

User-agent: *
Allow: /

Studio33

Hi,

The specific pages are

https://www.invoicestudio.com/Secure/InvoiceTemplate

http://www.invoicestudio.com/Features

I'm not sure what other pages are not indexed.

New site has been live 1 month.

Thanks for your help

Andrew

Sarbs

Without seeing the specific pages I can't check for things such as noindex tags or robots.txt blocking access, so I would suggest you double-check these aspects. The pages need to be accessible to search engines when they crawl your site, so if there are no links to those pages Google will be unable to reach them.

How long have they been live since the site relaunch? It may just be that they have not been crawled yet, particularly if they are deeper pages within your site hierarchy.

Here's a link to Google's resources on crawling and indexing sites in case you have not been able to check through them yet.
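As a rough illustration of the noindex check, the sketch below scans a page's HTML for a robots meta tag using Python's standard-library `html.parser`. The sample HTML strings are made up for the example, and `has_noindex` is a hypothetical helper; it also does not cover a noindex sent via the X-Robots-Tag HTTP header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> / <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page carries a noindex robots meta directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Made-up sample pages for illustration only.
blocked_page = '<html><head><meta name="robots" content="NOINDEX, follow"></head></html>'
normal_page = '<html><head><meta name="description" content="Features"></head></html>'

print(has_noindex(blocked_page))
print(has_noindex(normal_page))
```

To use it on a live page, pass in the HTML source saved from the browser's "view source".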
