How do I know what pages of my site is not inedexed by google ?
-
Hi
I my Google webmaster tools under Crawl->sitemaps it shows 1117 pages submitted but 619 has been indexed.
Is there any way I can fined which pages are not indexed and why?
it has been like this for a while.
I also have a manual action (partial) message. "Unnatural links to your site--impacts links" and under affects says "Some incoming links"
is that the reason Google does not index some of my pages?
Thank you
Sina
-
Thank you very much for the detail answer,
Is there any way I can find when I got the Manual Actions (Partial)
there is no date
-
Hi Sina,
For your first question, make sure you have Google Webmaster Tools setup (which I gather you do) as you have received a 'low quality/spam links' message by them. I should add that dealing with an 'unnatural link profile by Google is a whole other project!) and super important to boot so get on top of that also! Open Site Explorer is a perfect place to start, to crawl the links and to profile your entire linking domain profile. From here you can begin to examine domain link profile by filtering through options to identify ones which may be causing you that warning from Google. This will need to be rectified in order to ensure solid indexing of your site pages. You will need to clean these up in order for the rest to work and be effective
Now, to look at the indexing issue you asked on. If you look to the right in Webmaster Tools once you login, on the dashboard, you will see a section called SITEMAPS (3rd on the right once you click into the domain) from the main panel. Click on the TITLE of this section from the dashboard, and you will land on the SITEMAPS report file. There is a wealth of information here from Google about the indexing health of your site.
There are 3 steps here, Google needs to have done in order to identify which to help you figure out the information you are looking for:
- Crawling
- Indexing
- Ranking (what you see in the SERP results pages
using search terms or Google Operators for site review.
In order to see any results at all, you need to ensure you have a SITEMAPS.XML file built, loaded and submitted to Google. It also needs to be configured properly and have no errors for proper processing. This is the only way you will get clear snapshot of what has been indexed based on your XML file by Google. This will tell you have many pages you have indexed in their index, but not identify. If you don't have any at all, it will state it.
it's also time to look at your robots.txt and .htaccess file to ensure those are configured and installed properly. This would be another troubleshooting step, but seeing as you have a unnatural link profile, you may want to take these steps first. Ensure you don't have any of the <noindex>meta fields listed here as well site-wide.</noindex>
So, from here, once you login to Webmaster Tools (dashboard for the site you are referring to you) under SITEMAPS, you will see a section saying XXX number of pages submitted and XXX # of pages indexed along with any errors and warnings you are getting from them now in that box (link warnings will be here too!). This will give you some important informtion which you can log in an Excel file later
Here is where you will most likely see that linking domain link alert from Google as well.
Now you have Google's 'indexed pages' view. Now you have to dig a little.
----- GOOGLE OPERATORS ---- Now, once you have some data from Google WebMaster Tools as mentioned above, You can now go to Google.com (or the Google index you want to see like .ca. or others) and use Google search operators to speficially see which URL's and pages have been indexed by the engine. There are a few different ones you can use below. I found a great resource below and copied in the link.
Domain search with - site: Operator
(site:google.com)
This should returns results only from the specified Domain.
So you will need to be careful if your site is with a SubDomain (or multiple SubDomains) ("www" is a SubDomain).Domain search with - inurl: Operator
(inurl:google.com)
This should return results that contain the specified Domain.
This may not be only from the site in question though! It is possible for other sites to contain your domainname in their URLs (whois.domaintools.com may have such URLs etc.)Domain search with - site: and inurl: Operators
(site:google.com inurl:google.com)
This way you limit the results to your Domain Only ... and it seems to generate more "reliable" results than the site: operator alone.Domain and Path/Query search with - site: and inurl: Operators
(site:google.com inurl:/somepath/somedirectory/)
(site:google.com inurl:?this=that&rabbits=lunch)
This way you limit the results to your Domain Only ... and focus on a specific directory/folder or set of paramters etc.Domain and FileType search with - site: and filetype: Operators
(site:google.com filetype:html)
This limits the results to those from your Domain, and to a specific type of file.
Please note - the filetype: operator may not show All of that type - it may only work for URLs that end in that type. thus if you serve content as html, but without the .html in the filename - they will not show in the results!)Domain and Path/Query search with - site:, inurl: and inurl: Operators
(site:google.com inurl:google.com inurl:/somepath/somedirectory/)
(site:google.com inurl:google.com inurl:?this=that&rabbits=lunch)
This permits you to start limiting the results to specific parts of your site if you need too.Make sure that your site pages also don't include in the section the <meta-noindex>or <meta-nofollow>tags. This would tell Google not to index or follow the pages from your site
</meta-nofollow></meta-noindex>
Ensure that you have, in your .htaccess file the proper redirects for the site if you find you have duplicate content. Ensure you are 301 redirecting the non-www to www versions of your site and pages (or vice-versa), whichever you prefer to have indexed by Google to ensure clean indexing of the site. This will make sure you don't have problems indexing wide for search.
TO NOTE
---- SERVER LOG FILES ---- (Note: please make sure that you request log files) from your hosting company too. If you don't have access to server log files for hosting traffic, switch! Log and keep an eye on these as well for information for your needs. This process is not a fast or easy one and does require some work to detect. Don't get lazy. This is a crucial step to keep an eye on.
What I recommend next is starting to keep log files if you aren't already and tracking those on a weekkly pr monthly basis (which ever is easier). The reason being is once you get indexed to Google, you always want to keep an idea of what is indexed and what isn't (dropped) or de-indexed pages. This can also help identify early problems (or penalties) from Google if you see trending things happening day over day or week over week.
Hope this helps point you in the right direct. Remember don't be lazy here
Exhaust all options to indentify your problems! Cheers,
Rob
-
Based on the manual action message from Google, I would guess that one of the possible reasons is that the unindexed pages have bad links pointing towards them. So Google is thinking that those pages are not "quality."
I would also check that all pages are included in your XML sitemap at a minimum and HTML sitemap (if you have the latter one). I'd also check the section of all pages to make sure that no pages are set to "noindex." Lastly, you may have duplicate content. If two pages have the exact-same text with only minor keyword-based variations, for example, then Google will often index only one of the two pages.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google has penalized me for a keyword,and removed from google some one know for how long time is the penalty
i have by some links from fiverr i was ranking 9 for this keyword with 1200 of searches after fiverr it has disappeared from google more then 10 days i guess this is a penalty someone know how long a penalty like this is how many days to months ? i don't get any messages in webmaster tools this is the gig https://www.fiverr.com/carissa30/do-20-unique-domains-high-tf-and-cf-flow-backlinks-high-da?source=Order+page+gig+link&funnel=a7b5fa4f-8c0a-4c3e-98a3-74112b658c7f
Intermediate & Advanced SEO | | alexmuller870 -
OSE link report showing links to 404 pages on my site
I did a link analysis on this site mormonwiki.com. And many of the pages shown to be linked to were pages like these http://www.mormonwiki.com/wiki/index.php?title=Planning_a_trip_to_Rome_By_using_Movie_theatre_-_Your_five_Fun_Shows2052752 There happens to be thousands of them and these pages actually no longer exist but the links to them obviously still do. I am planning to proceed by disavowing these links to the pages that don't exist. Does anyone see any reason to not do this, or that doing this would be unnecessary? Another issue is that Google is not really crawling this site, in WMT they are reporting to have not crawled a single URL on the site. Does anyone think the above issue would have something to do with this? And/or would you have any insight on how to remedy it?
Intermediate & Advanced SEO | | ThridHour0 -
Moving career site to new URL from main site. Will it hurt SEO for main page?
For one of our clients we are building a career site and putting it under a different URL and hosting service (mainly due to security concerns of hosting it under the same host and domain). almost 100% of the incoming traffic to their current career section (which it is in a sub-folder) receives traffic for branded keywords (brand + job/career/employment), that is, there are no job position specific keywords. The client is now worried that after moving the site, the inbound traffic to the main site will be severely affected as well as the SERP results. My questions are, will the non-career related SERPs be affected? I don't see how will they be but I could be wrong If no, how could we reassure her that the SEO to the main site wont be affected? are there any case studies of a similar case (splitting part of the website under a new URL and hosting service?) Thank you for your help. PS: this is my first post so please forgive me if this has been asked before. I could not find a good response.
Intermediate & Advanced SEO | | rflores0 -
Would be the network site map page considered link spam
In the course of the last 18 months my sites have lost from 50 to 70 percent of traffic. Never have used any tricks, just simple white-hat SEO. Anyway, I am now trying to fix things that hadn't been a problem before all those Google updates, but apparently now are. Would appreciate any help.. I used to have a network site map page on everyone of my sites (about 30 sites). It basically would be a page called 'our network' and it'll show a list of links to all of my other sites. These pages were indexed, had decent PR and didn't seem to cause any problem. Here's an example of one of them:
Intermediate & Advanced SEO | | romanbond
http://www.psoriasisguide.ca/psoriasis_scg.html In the light of Panda and Penguin and all these 'bad links' I decided to get rid of most of them. My traffic didn't recover at all, it actually went further down. Not sure if there is any connection to what I'd done. So, the question is: In your opinion/experience, do you think such network sitemap pages could be causing penalties for link spam?0 -
What Sources to use to compile an as comprehensive list of pages indexed in Google?
As part of a Panda recovery initiative we are trying to get an as comprehensive list of currently URLs indexed by Google as possible. Using the site:domain.com operator Google displays that approximately 21k pages are indexed. Scraping the results however ends after the listing of 240 links. Are there any other sources we could be using to make the list more comprehensive? To be clear, we are not looking for external crawlers like the SEOmoz crawl tool but sources that would be confidently allow us to determine a list of URLs currently hold in the Google index. Thank you /Thomas
Intermediate & Advanced SEO | | sp800 -
Is My site seeing a Google Dance ? - Rankings all over the place
My eCommerce website is seeing some rankings flucuate daily from say rank 20 to rank 130 whilst some other keywords ago up and down by as much as 40 places. I have been putting up alot of new unique content and can see in GWT that google has been crawling my site more but given that I was affected by the google panda updates which saw a 40% drop in traffic I'm only just to recover some of it., i have also been trying to get rid of any poor links and our linking building is only concentrating on high quality posts and links. I am wondering if this is the post panda update - "google dance" or is google having issues trying to work where to rank my site and possibly punish me? thanks Sarah.
Intermediate & Advanced SEO | | SarahCollins0 -
Dynamically creating unique page titles on enterprise site
Hi, I want to dynamically create unique page titles (possible meta descriptions too) on a 10k page site. Many of the page titles are either duplicates or are missing. I heard about the option of grabbing the page titles from a database or possibly using the h1 as the page title. solmelia.com (the website consist of mostly static pages) Any suggestions would be much appreciated. Best Regards,
Intermediate & Advanced SEO | | Melia0 -
"site" operator and pages
Hi folks, We are having trouble in indexing, We have certain pages which are not coming in results when I am using the site operator in Google. for e.g. : sitename.com/widgets/red They are not showing any link results in Google webmaster tools too. But the pages which only linked through them are displaying in results when I am using site operator. for e.g: sitename.com/widgets/red/large We are redirecting some of the search which are close or exact match to the respective pages for e.g: sitename.com/search/red --> sitename.com/widgets/red We are fluctuating on rankings too in google serps form top ppositions to no where, for sitename.com/widgets/red and most of the times when google shows sitename.com/search/red instead of itename.com/widgets/red. Can you please put a light on this issues.
Intermediate & Advanced SEO | | semshah1430