How do I know what pages of my site is not inedexed by google ?
-
Hi
I my Google webmaster tools under Crawl->sitemaps it shows 1117 pages submitted but 619 has been indexed.
Is there any way I can fined which pages are not indexed and why?
it has been like this for a while.
I also have a manual action (partial) message. "Unnatural links to your site--impacts links" and under affects says "Some incoming links"
is that the reason Google does not index some of my pages?
Thank you
Sina
-
Thank you very much for the detail answer,
Is there any way I can find when I got the Manual Actions (Partial)
there is no date
-
Hi Sina,
For your first question, make sure you have Google Webmaster Tools setup (which I gather you do) as you have received a 'low quality/spam links' message by them. I should add that dealing with an 'unnatural link profile by Google is a whole other project!) and super important to boot so get on top of that also! Open Site Explorer is a perfect place to start, to crawl the links and to profile your entire linking domain profile. From here you can begin to examine domain link profile by filtering through options to identify ones which may be causing you that warning from Google. This will need to be rectified in order to ensure solid indexing of your site pages. You will need to clean these up in order for the rest to work and be effective
Now, to look at the indexing issue you asked on. If you look to the right in Webmaster Tools once you login, on the dashboard, you will see a section called SITEMAPS (3rd on the right once you click into the domain) from the main panel. Click on the TITLE of this section from the dashboard, and you will land on the SITEMAPS report file. There is a wealth of information here from Google about the indexing health of your site.
There are 3 steps here, Google needs to have done in order to identify which to help you figure out the information you are looking for:
- Crawling
- Indexing
- Ranking (what you see in the SERP results pages
using search terms or Google Operators for site review.
In order to see any results at all, you need to ensure you have a SITEMAPS.XML file built, loaded and submitted to Google. It also needs to be configured properly and have no errors for proper processing. This is the only way you will get clear snapshot of what has been indexed based on your XML file by Google. This will tell you have many pages you have indexed in their index, but not identify. If you don't have any at all, it will state it.
it's also time to look at your robots.txt and .htaccess file to ensure those are configured and installed properly. This would be another troubleshooting step, but seeing as you have a unnatural link profile, you may want to take these steps first. Ensure you don't have any of the <noindex>meta fields listed here as well site-wide.</noindex>
So, from here, once you login to Webmaster Tools (dashboard for the site you are referring to you) under SITEMAPS, you will see a section saying XXX number of pages submitted and XXX # of pages indexed along with any errors and warnings you are getting from them now in that box (link warnings will be here too!). This will give you some important informtion which you can log in an Excel file later
Here is where you will most likely see that linking domain link alert from Google as well.
Now you have Google's 'indexed pages' view. Now you have to dig a little.
----- GOOGLE OPERATORS ---- Now, once you have some data from Google WebMaster Tools as mentioned above, You can now go to Google.com (or the Google index you want to see like .ca. or others) and use Google search operators to speficially see which URL's and pages have been indexed by the engine. There are a few different ones you can use below. I found a great resource below and copied in the link.
Domain search with - site: Operator
(site:google.com)
This should returns results only from the specified Domain.
So you will need to be careful if your site is with a SubDomain (or multiple SubDomains) ("www" is a SubDomain).Domain search with - inurl: Operator
(inurl:google.com)
This should return results that contain the specified Domain.
This may not be only from the site in question though! It is possible for other sites to contain your domainname in their URLs (whois.domaintools.com may have such URLs etc.)Domain search with - site: and inurl: Operators
(site:google.com inurl:google.com)
This way you limit the results to your Domain Only ... and it seems to generate more "reliable" results than the site: operator alone.Domain and Path/Query search with - site: and inurl: Operators
(site:google.com inurl:/somepath/somedirectory/)
(site:google.com inurl:?this=that&rabbits=lunch)
This way you limit the results to your Domain Only ... and focus on a specific directory/folder or set of paramters etc.Domain and FileType search with - site: and filetype: Operators
(site:google.com filetype:html)
This limits the results to those from your Domain, and to a specific type of file.
Please note - the filetype: operator may not show All of that type - it may only work for URLs that end in that type. thus if you serve content as html, but without the .html in the filename - they will not show in the results!)Domain and Path/Query search with - site:, inurl: and inurl: Operators
(site:google.com inurl:google.com inurl:/somepath/somedirectory/)
(site:google.com inurl:google.com inurl:?this=that&rabbits=lunch)
This permits you to start limiting the results to specific parts of your site if you need too.Make sure that your site pages also don't include in the section the <meta-noindex>or <meta-nofollow>tags. This would tell Google not to index or follow the pages from your site
</meta-nofollow></meta-noindex>
Ensure that you have, in your .htaccess file the proper redirects for the site if you find you have duplicate content. Ensure you are 301 redirecting the non-www to www versions of your site and pages (or vice-versa), whichever you prefer to have indexed by Google to ensure clean indexing of the site. This will make sure you don't have problems indexing wide for search.
TO NOTE
---- SERVER LOG FILES ---- (Note: please make sure that you request log files) from your hosting company too. If you don't have access to server log files for hosting traffic, switch! Log and keep an eye on these as well for information for your needs. This process is not a fast or easy one and does require some work to detect. Don't get lazy. This is a crucial step to keep an eye on.
What I recommend next is starting to keep log files if you aren't already and tracking those on a weekkly pr monthly basis (which ever is easier). The reason being is once you get indexed to Google, you always want to keep an idea of what is indexed and what isn't (dropped) or de-indexed pages. This can also help identify early problems (or penalties) from Google if you see trending things happening day over day or week over week.
Hope this helps point you in the right direct. Remember don't be lazy here
Exhaust all options to indentify your problems! Cheers,
Rob
-
Based on the manual action message from Google, I would guess that one of the possible reasons is that the unindexed pages have bad links pointing towards them. So Google is thinking that those pages are not "quality."
I would also check that all pages are included in your XML sitemap at a minimum and HTML sitemap (if you have the latter one). I'd also check the section of all pages to make sure that no pages are set to "noindex." Lastly, you may have duplicate content. If two pages have the exact-same text with only minor keyword-based variations, for example, then Google will often index only one of the two pages.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Title page google serp
Why does Google change the titles automatically? I have <title>Canyoning Açores - São Jorge | Discover Experience Açores</title> but google show Discover Experience Açores: Canyoning Açores - São Jorge
Intermediate & Advanced SEO | | tiagoarruda0 -
A single page from site not ranking
Hello, We have a new site launched in March, that is ranking well in search for all of the pages, except one and we don't know why. This page it is optimised exactly the same way like the others, but still doesn't rank in Google. We have verified robots.txt for noffollow, noindex tags, we have verified if it was penalized by Google, but still didn't find nothing. Initially we had another site and was on the topic of this page, but we have redirected it to the new one. In case this old site was anytime in the past penalized by Google, could it be possible that the new page be influenced by this? Also, we have another site that ranks on the first position, that targets the same keywords like the page that does not rank. It was the first site we launched, so it is pretty much old, but we do not have duplicate content on them. Maybe Google doesn't like the fact that both target the same keywords and chooses to display only the old site? Please help us if you have any ideas or have been through such thing. Thank you!
Intermediate & Advanced SEO | | daniela.pirlogea0 -
Google does not want to index my page
I have a site that is hundreds of page indexed on Google. But there is a page that I put in the footer section that Google seems does not like and are not indexing that page. I've tried submitting it to their index through google webmaster and it will appear on Google index but then after a few days it's gone again. Before that page had canonical meta to another page, but it is removed now.
Intermediate & Advanced SEO | | odihost0 -
Why is my site not getting crawled by google?
Hi Moz Community, I have an escort directory website that is built out of ajax. We basically followed all the recommendations like implementing the escaped fragment code so Google would be able to see the content. Problem is whenever I submit my sitemap on Google webmastertool it always 700 had been submitted and only 12 static pages had been indexed. I did the site query and only a number of pages where indexed. Does it have anything to do with my site being on HTTPS and not on HTTP? My site is under HTTPS and all my content is ajax based. Thanks
Intermediate & Advanced SEO | | en-gageinc0 -
When removing a product page from an ecommerce site?
What is the best practice for removing a product page from an Ecommerce site? If a 301 is not available and the page is already crawled by the search engine A. block it out in the robot.txt B. let it 404
Intermediate & Advanced SEO | | Bryan_Loconto0 -
Should I let Google crawl my production server if the site is still under development?
I am building out a brand new site. It's built on Wordpress so I've been tinkering with the themes and plug-ins on the production server. To my surprise, less than a week after installing Wordpress, I have pages in the index. I've seen advice in this forum about blocking search bots from dev servers to prevent duplicate content, but this is my production server so it seems like a bad idea. Any advice on the best way to proceed? Block or no block? Or something else? (I know how to block, so I'm not looking for instructions). We're around 3 months from officially launching (possibly less). We'll start to have real content on the site some time in June, even though we aren't planning to launch. We should have a development environment ready in the next couple of weeks. Thanks!
Intermediate & Advanced SEO | | DoItHappy0 -
Google Ranking Wrong Page
The company I work for started with a website targeting one city. Soon after I started SEO for them, they expanded to two cities. Optimization was challenging, but we managed to rank highly in both cities for our keywords. A year or so later, the company expanded to two new locations, so now 4 total. At the time, we realized it was going to be tough to rank any one page for four different cities, so our new SEO strategy was to break the website into 5 sections or minisites consisting of 4 city-targeted sites, and our original site which will now be branded as more of a national website. Our URL structures now look something like this:
Intermediate & Advanced SEO | | cpapciak
www.company.com
www.company.com/city-1
www.company.com/city-2
www.company.com/city-3
www.company.com.city-4 Now, in the present time, all is going well except for our original targeted city. The problem is that Google keeps ranking our original site (which is now national) instead of the new city-specific site we created. I realize that this is probably due to all of the past SEO we did optimizing for that city. My thoughts are that Google is confused as to which page to actually rank for this city's keyword terms and I was wondering if canonical tags would be a possible solution here, since the pages are about 95% identical. Anyone have any insight? I'd really appreciate it!0 -
Got a site in Google news, now what do I do?
I have been working on a site for 7 months, publishing articles each day. The site is dedicated to the niche that I am in. It was accepted in to Google news 1 week ago and I am now getting 3 times the amount of traffic. Its great news but I am now wondering how beneficial it is for me. I sell business management advice and really need to generate leads. Thats how I increase my business Any ideas how I could use this Google news site to do this?
Intermediate & Advanced SEO | | JohnPeters0