How do I know which pages of my site are not indexed by Google?
-
Hi
In my Google Webmaster Tools, under Crawl -> Sitemaps, it shows 1117 pages submitted but only 619 have been indexed.
Is there any way I can find which pages are not indexed, and why?
It has been like this for a while.
I also have a manual action (partial) message: "Unnatural links to your site--impacts links", and under "Affects" it says "Some incoming links".
Is that the reason Google does not index some of my pages?
Thank you
Sina
-
Thank you very much for the detailed answer.
Is there any way I can find out when I got the Manual Action (Partial)?
There is no date.
-
Hi Sina,
For your first question, make sure you have Google Webmaster Tools set up (which I gather you do, as you have received a 'low quality/spam links' message from them). I should add that dealing with an 'unnatural link profile' warning from Google is a whole other project, and a super important one to boot, so get on top of that also! Open Site Explorer is a perfect place to start: use it to crawl the links and profile your entire linking domain profile. From there you can filter through the options to identify the linking domains which may be causing that warning from Google. You will need to clean these up in order for the rest to work and be effective, and to ensure solid indexing of your site pages.
Now, to look at the indexing issue you asked about. Once you log in to Webmaster Tools and click into the domain, look to the right of the dashboard and you will see a section called SITEMAPS (3rd on the right). Click on the TITLE of this section and you will land on the SITEMAPS report. There is a wealth of information here from Google about the indexing health of your site.
There are 3 steps Google needs to have completed for a page, and knowing which one a page is stuck at will help you figure out the information you are looking for:
- Crawling
- Indexing
- Ranking (what you see in the search results pages when using search terms or Google operators for site review)
In order to see any results at all, you need to ensure you have a sitemap XML file built, uploaded and submitted to Google. It also needs to be configured properly and free of errors so it can be processed. This is the only way you will get a clear snapshot from Google of what has been indexed based on your XML file. This will tell you how many pages you have in their index, but it will not identify which ones. If you don't have any pages indexed at all, it will say so.
It's also time to look at your robots.txt and .htaccess files to ensure those are configured and installed properly. This would be another troubleshooting step, but seeing as you have an unnatural link profile, you may want to take these steps first. Ensure you don't have any noindex meta tags set site-wide here as well.
So, from here, once you log in to Webmaster Tools (the dashboard for the site you are referring to), under SITEMAPS you will see a section showing XXX pages submitted and XXX pages indexed, along with any errors and warnings you are currently getting from them in that box (link warnings will be here too!). This will give you some important information which you can log in an Excel file later. Here is where you will most likely see that unnatural link alert from Google as well.
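Since the sitemap report only gives you counts, it can help to pull the submitted URLs out of the XML file yourself so you have a list to check one by one. Here is a minimal sketch in Python, assuming your sitemap follows the standard sitemaps.org format; the sample XML and the example.com URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sample; in practice, read the contents of your own sitemap file.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/about</loc></url>
</urlset>"""

urls = sitemap_urls(SAMPLE)
print(len(urls), "URLs submitted")
```

The resulting list is the "submitted" side of the comparison; each URL in it can then be checked by hand.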
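To rule out robots.txt as the cause, you can test your sitemap URLs against the rules in the file. The sketch below uses Python's standard urllib.robotparser module; the robots.txt content and URLs are hypothetical samples, so swap in your own.

```python
from urllib.robotparser import RobotFileParser


def blocked_for_googlebot(robots_txt, urls):
    """Return the URLs that the given robots.txt rules disallow for Googlebot."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch("Googlebot", u)]


# Hypothetical robots.txt content and URLs, for illustration only.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /private/
"""

SAMPLE_URLS = [
    "http://www.example.com/",
    "http://www.example.com/private/page.html",
]

print(blocked_for_googlebot(SAMPLE_ROBOTS, SAMPLE_URLS))
```

Any URL this returns is one Google has been told not to crawl, which is worth fixing before digging further.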
Now you have Google's 'indexed pages' view. Now you have to dig a little.
----- GOOGLE OPERATORS ---- Once you have some data from Google Webmaster Tools as mentioned above, you can go to Google.com (or the Google index you want to check, like .ca or others) and use Google search operators to see specifically which URLs and pages have been indexed by the engine. There are a few different ones you can use, listed below. I found a great resource and copied it in below.
Domain search with - site: Operator
(site:google.com)
This should return results only from the specified Domain.
So you will need to be careful if your site is on a SubDomain (or multiple SubDomains) ("www" is a SubDomain).
Domain search with - inurl: Operator
(inurl:google.com)
This should return results that contain the specified Domain.
This may not be only from the site in question though! It is possible for other sites to contain your domain name in their URLs (whois.domaintools.com may have such URLs etc.)
Domain search with - site: and inurl: Operators
(site:google.com inurl:google.com)
This way you limit the results to your Domain Only ... and it seems to generate more "reliable" results than the site: operator alone.
Domain and Path/Query search with - site: and inurl: Operators
(site:google.com inurl:/somepath/somedirectory/)
(site:google.com inurl:?this=that&rabbits=lunch)
This way you limit the results to your Domain Only ... and focus on a specific directory/folder or set of parameters etc.
Domain and FileType search with - site: and filetype: Operators
(site:google.com filetype:html)
This limits the results to those from your Domain, and to a specific type of file.
(Please note - the filetype: operator may not show All of that type - it may only work for URLs that end in that type. Thus if you serve content as html, but without the .html in the filename - they will not show in the results!)
Domain and Path/Query search with - site:, inurl: and inurl: Operators
(site:google.com inurl:google.com inurl:/somepath/somedirectory/)
(site:google.com inurl:google.com inurl:?this=that&rabbits=lunch)
This permits you to start limiting the results to specific parts of your site if you need to.
Make sure that your site pages also don't include the meta noindex or meta nofollow tags in the <head> section. These would tell Google not to index the pages or follow the links on them.
Ensure that you have the proper redirects for the site in your .htaccess file if you find you have duplicate content. Ensure you are 301 redirecting the non-www to the www version of your site and pages (or vice versa), whichever you prefer to have indexed by Google, to ensure clean indexing of the site. This will help you avoid indexing problems site-wide.
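Checking every page by hand for a robots meta tag gets tedious, so here is a rough Python sketch that pulls the directives out of a page's HTML using the standard html.parser module. It assumes the usual form of the tag (a meta element named "robots" or "googlebot" with a comma-separated content attribute); the sample HTML is made up.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> / <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() in ("robots", "googlebot"):
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page source, for illustration only.
SAMPLE_PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'

print("noindex" in robots_directives(SAMPLE_PAGE))
```

If "noindex" shows up for a page you want in Google, that tag is the first thing to remove.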
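To confirm the redirect really is a 301 to your preferred host, you can request the other version of the URL without following redirects and look at the status code and Location header. This is only a sketch: fetch_redirect and redirects_to_preferred are hypothetical helper names, and example.com stands in for your own domain.

```python
import http.client
from urllib.parse import urlsplit


def fetch_redirect(url):
    """Request url without following redirects; return (status, Location header)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        conn.request("HEAD", parts.path or "/")
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def redirects_to_preferred(status, location, preferred_host):
    """True only for a permanent (301) redirect whose target is the preferred host."""
    return (
        status == 301
        and location is not None
        and urlsplit(location).hostname == preferred_host
    )


# Offline illustration with made-up responses; no network request is made here.
print(redirects_to_preferred(301, "http://www.example.com/", "www.example.com"))
print(redirects_to_preferred(302, "http://www.example.com/", "www.example.com"))
```

Against a live site you would call something like redirects_to_preferred(*fetch_redirect("http://example.com/"), "www.example.com"). A 302 or a 200 on the non-preferred host suggests the redirect rule needs another look.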
TO NOTE
---- SERVER LOG FILES ---- (Note: please make sure that you request log files from your hosting company too.) If you don't have access to server log files for your hosting traffic, switch hosts! Log these and keep an eye on them as well for the information you need. This process is not a fast or easy one and does require some work. Don't get lazy. This is a crucial step to keep an eye on.
What I recommend next is starting to keep log files, if you aren't already, and reviewing them on a weekly or monthly basis (whichever is easier). The reason is that once you get indexed by Google, you always want to keep track of what is indexed and what isn't (dropped or de-indexed pages). This can also help identify problems (or penalties) from Google early if you see trends developing day over day or week over week.
Hope this helps point you in the right direction. Remember, don't be lazy here. Exhaust all options to identify your problems! Cheers,
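As a starting point for that tracking, here is a small Python sketch that counts Googlebot requests per URL and status code. It assumes your host writes logs in the common Apache/Nginx "combined" format; the sample lines are invented, and bear in mind that a user agent string can be faked, so treat the counts as a rough guide.

```python
import re
from collections import Counter

# Matches the request, status and final quoted field (user agent) of a combined-format line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) .*"(?P<agent>[^"]*)"$'
)


def googlebot_hits(lines):
    """Count requests whose user agent mentions Googlebot, keyed by (path, status)."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line.strip())
        if match and "Googlebot" in match.group("agent"):
            hits[(match.group("path"), match.group("status"))] += 1
    return hits


# Hypothetical log lines, for illustration only.
SAMPLE_LOG = '''66.249.66.1 - - [22/Aug/2014:10:00:00 +0000] "GET /about HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.1 - - [22/Aug/2014:10:00:05 +0000] "GET /old-page HTTP/1.1" 404 312 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.0.113.9 - - [22/Aug/2014:10:01:00 +0000] "GET /about HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 6.1)"
'''

for (path, status), count in sorted(googlebot_hits(SAMPLE_LOG.splitlines()).items()):
    print(path, status, count)
```

Pages from your sitemap that never appear in this output have probably not been crawled, and pages returning 404 or 5xx to Googlebot are candidates for being dropped.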
Rob
-
Based on the manual action message from Google, I would guess that one of the possible reasons is that the unindexed pages have bad links pointing towards them. So Google is thinking that those pages are not "quality."
I would also check that all pages are included in your XML sitemap at a minimum, and in your HTML sitemap (if you have one). I'd also check the <head> section of all pages to make sure that no pages are set to "noindex." Lastly, you may have duplicate content. If two pages have the exact same text with only minor keyword-based variations, for example, then Google will often index only one of the two pages.
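For the duplicate content check, a quick way to flag suspicious pairs is to compare the page text word by word. The Python sketch below uses difflib from the standard library; the 0.8 threshold and the sample texts are my own assumptions for illustration, and this is not how Google itself measures duplication.

```python
from difflib import SequenceMatcher


def similarity(text_a, text_b):
    """Word-level similarity between two texts, from 0.0 (nothing shared) to 1.0 (identical)."""
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    return SequenceMatcher(None, words_a, words_b).ratio()


def near_duplicate(text_a, text_b, threshold=0.8):
    """Flag a pair of texts as near duplicates; the threshold is an arbitrary assumption."""
    return similarity(text_a, text_b) >= threshold


# Hypothetical page texts that differ only by a keyword swap.
PAGE_A = "Affordable blue widgets delivered to London within two working days"
PAGE_B = "Affordable blue widgets delivered to Leeds within two working days"
PAGE_C = "Contact our support team by phone or email"

print(near_duplicate(PAGE_A, PAGE_B))
print(near_duplicate(PAGE_A, PAGE_C))
```

Pairs that get flagged are worth rewriting, consolidating, or pointing at a single preferred version.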