Why isn't Google indexing our site?
-
Hi,
We have substantially redesigned our site. It is not a big site; it is a SaaS site, so it has the typical structure: Landing, Features, Pricing, Sign Up, Contact Us, etc.
The main part of the site is behind the login, so it is out of Google's reach.
Since the new release a month ago, Google has indexed some pages, mainly the blog, which is brand new. It has also reindexed a few of the original pages; I am guessing this because when I click "Cached" on a site: search result, it shows the new site.
All new pages (of which there are 2) are totally missed. One is HTTP and one is HTTPS; does HTTPS make a difference?
I have submitted the site via Webmaster Tools and it says "URL and linked pages submitted to index", but a site: search doesn't bring up all the pages.
What is going on here please? What are we missing? We just want Google to recognise that the old site has gone and ALL of the new site is here, ready and waiting for it.
Thanks
Andrew
-
Well, links/shares are good. But of course I'm just begging the question of how you can get those.
Rand gave a great talk called "Inbound Marketing for Startups" at a Hackers & Founders meetup that was focused more on Inbound as a whole than SEO in particular, but it's full of valuable insights: http://vimeo.com/39473593 [video]
Ultimately it'll come down to some kind of a publishing/promotional strategy for your startup. Sometimes your startup is so unique/interesting that it has its own marketing baked right in - in which case you can get a lot of traction by simply doing old-school PR to get your startup in front of the right people.
Other times, you've got to build up links/authority on the back of remarkable marketing.
BufferApp is a great example of a startup that built traction off their blog. Of course, they weren't necessarily blogging as an SEO play - it was more in the aim of getting directly in front of the right audience for direct signups for their product. But they definitely built up some domain authority as a result.
I'd also take a look at the guides Mailchimp has created - they created the dual benefit of getting in front of the right audience in a positive/helpful way (which benefits the brand and drives sign-ups directly) as well as building a considerable number of inbound links, boosting their domain authority overall.
Unfortunately there are no quick/easy ways to build your domain authority, but the things you do to build your authority can also get you immediately in front of the audience you're looking for - and SEO just becomes a lateral benefit to that.
-
Thank you all for your responses. It is strange. We are going to add a link to our Google+ page and then add a post.
As a new site, what is the best way to get our domain authority up so we get crawled quicker?
Thanks again
Andrew
-
I disagree. Unless the old pages have inbound links from external sites, there's not much reason to 301 them (and not much benefit). If they're serving up 404 errors, they will fall out of the index.
Google absolutely does have a way to know these new pages exist - by crawling the home page and following the links discovered there. Both of the pages in question are linked to prominently, particularly the Features page which is part of the main navigation. A sitemap is just an aid for this process - it can help move things along and help Google find otherwise obscure/deep pages, but it by no means is a necessity for getting prominent pages indexed, particularly pages that are 1-2 levels down from the home page.
-
If you didn't redirect the old URLs to the new ones when the new site went live, this will absolutely be the cause of your problem, Studio33. That, combined with having no (or misdirected) sitemap means there was essentially no way for Google to even know your site's pages existed.
Good catch Billy.
-
Hi Andrew,
-
Google has been indexing HTTPS URLs for years now without a problem, so it is unlikely to be part of the issue.
-
Your domain authority on the whole may be slowing Google down in indexing new pages. Bottom line is crawl rate and depth are both functions of how authoritative/important you appear based on links/shares/etc.
-
That said, I don't see any indication as to why these two particular pages are not being indexed by Google. I'm a bit stumped here.
I see some duplication between your Features page and your Facebook timeline, but not with the invoice page.
As above, your domain authority (17) is a bit on the low side. So this could simply be a matter of Google not dedicating enough resources to crawl/index all of your pages yet. But why these two pages would be the only ones is perplexing, particularly after a full month. There are no problems with your robots.txt, no canonical tag issues, and the pages are linked to properly.
Wish I had an easy answer here. One idea, a bit of a long shot: we've seen Google index pages faster when they're linked to from Google+ posts. I see you have a Google+ business page for this website - you might try simply writing a (public) post there that includes a link over to the Features page.
As weak as that is, that's all I've got.
Best of Luck,
Mike
-
OK - I would get a list of all of your old pages and start 301 redirecting them to your new pages asap. This could be part of your issue.
-
Hi, I checked the XML. It's there if you view source; it just doesn't have a stylesheet.
-
Hi, thanks. About 1 month. The blog pages you are getting may be the old ones, as they are working at this end: http://www.invoicestudio.com/Blog . What you have mentioned regarding the blog is part of the problem: Google has the old site and not the new.
-
Getting this on your Blog pages:
The page cannot be displayed because an internal server error has occurred.
Were you aware?
Anyway - may I ask how old these pages are?
-
Thanks. I will look into the sitemap. That only went live about an hour ago whilst this thread has been going on.
-
Yeah - with no path specified, the directive is ignored (you don't have a '/' after Disallow, so nothing is blocked).
However, your robots.txt does point to your XML sitemap, which appears to be empty. You might want to fix that.
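One quick way to confirm whether a sitemap is really empty (rather than just looking blank in the browser because it has no stylesheet) is to parse it and count the URL entries. The following is only a rough sketch: the function name and the sample documents are made up for illustration, and it assumes the file follows the standard sitemaps.org urlset format.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org <urlset> documents.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text: str) -> list[str]:
    """Return the <loc> value of every entry found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.iter(f"{SITEMAP_NS}loc")
        if loc.text and loc.text.strip()
    ]


# Hypothetical sample documents, not the real sitemap from this thread.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/Features</loc></url>
</urlset>"""

EMPTY = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></urlset>"""

if __name__ == "__main__":
    print(len(sitemap_urls(SAMPLE)), "URLs in the sample sitemap")
    print(len(sitemap_urls(EMPTY)), "URLs in the empty sitemap")
```

If the count is zero for the live file, the sitemap is valid XML but lists nothing for a crawler to pick up.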
-
Hi, no, I think it's fine, as we do not have the forward slash after the Disallow. See
http://www.robotstxt.org/robotstxt.html
I wish it was as simple as that. Thanks for your help though, it's appreciated.
-
Hmmm. That link shows that the way you have it will block all robots.
-
Thanks, but I think robots.txt is correct. Excerpt from http://www.robotstxt.org/robotstxt.html
To exclude all robots from the entire server
User-agent: *
Disallow: /
To allow all robots complete access
User-agent: *
Disallow:
(or just create an empty "/robots.txt" file, or don't use one at all)
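For anyone who wants to check this rather than argue about it, the difference between an empty Disallow and "Disallow: /" can be tested locally. This is a minimal sketch using Python's standard-library robots.txt parser; the helper name is made up, and the result reflects how that parser reads the rules, which is not guaranteed to match every crawler.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Report whether `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The three variants discussed in this thread.
EMPTY_DISALLOW = "User-agent: *\nDisallow:\n"   # no path: nothing is blocked
DISALLOW_ALL = "User-agent: *\nDisallow: /\n"   # blocks the whole site
ALLOW_ALL = "User-agent: *\nAllow: /\n"         # explicit allow

URL = "http://www.invoicestudio.com/Features"

if __name__ == "__main__":
    for label, rules in [
        ("empty Disallow", EMPTY_DISALLOW),
        ("Disallow: /", DISALLOW_ALL),
        ("Allow: /", ALLOW_ALL),
    ]:
        print(label, "->", "allowed" if allowed(rules, URL) else "blocked")
```

Under this parser the empty Disallow and "Allow: /" both permit crawling, and only "Disallow: /" blocks it.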
-
It looks like your robots.txt file is the problem. http://www.invoicestudio.com/robots.txt has:
User-agent: *
Disallow:
When it should be:
User-agent: *
Allow: /
-
Hi,
The specific pages are
https://www.invoicestudio.com/Secure/InvoiceTemplate
http://www.invoicestudio.com/Features
I'm not sure what other pages are not indexed.
New site has been live 1 month.
Thanks for your help
Andrew
-
Without seeing the specific pages I can't check for things such as noindex tags or robots.txt blocking access, so I would suggest you double check these aspects. The pages will need to be accessible to search engines when they crawl your site, so if there are no links to those pages Google will be unable to access them.
How long have they been live since the site re-launch? It may just be that they have not been crawled yet, particularly if they are deeper pages within your site hierarchy.
Here's a link to Google's resources on crawling and indexing sites in case you have not been able to check through them yet.
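The noindex check mentioned above can be done by hand with view source, but a small script helps if there are several pages to go through. Below is a rough sketch only: the class and function names are made up, it inspects just the robots/googlebot meta tags in HTML you have already downloaded, and it does not look at an X-Robots-Tag HTTP header, which can also carry noindex.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in {"robots", "googlebot"}:
            for part in attr.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page's robots meta tags ask not to be indexed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return bool(parser.directives & {"noindex", "none"})


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print("noindex present:", has_noindex(page))
```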