Page Count in Webmaster Tools Index Status Versus Page Count in Webmaster Tools Sitemap
-
Greetings MOZ Community:
I run www.nyc-officespace-leader.com, a real estate website in New York City.
The page count in Google Webmaster Tools Index status for our site is 850. The page count in our Webmaster Tools Sitemap is 637. Why is there a discrepancy between the two?
What does the Google Webmaster Tools Index represent? If we filed a removal request for pages we did not want indexed, will these pages still show in the Google Webmaster Tools page count despite the fact that they no longer display in search results? The number of pages displayed in our Google Webmaster Tools Index remains at about 850 despite the removal request. Before a site upgrade in June the number of URLs in the Google Webmaster Tools Index and Google Webmaster Site Map were almost the same.
I am concerned that page bloat has something to do with a recent drop in ranking.
Thanks everyone!!
Alan
-
Using the noindex,follow combination is a form of advanced page sculpting, which is not truly an SEO best practice.
Here's why:
If you deem a page not worthy of being in the Google index, attempting to say "it's not worthy of indexing, but the links on it are worthy" is a mixed message.
Links to those other pages should already exist from pages you do want indexed.
By doing noindex,follow, you increase the internal link counts in artificial ways.
-
Hi Alan:
That is very clear, thanks!!
For pages with thin content, why the "no-follow" in addition to the "no-index"? My SEO firm was also of the opinion that the thin content pages should be "no-indexed"; however, they did not suggest a "no-follow" as well.
So I think I will work on improving site speed, enhancing content and no-indexing (and no following?) thin pages. If that does not induce an improvement I guess I will have to consider alternatives.
Thanks,
Alan
-
As I already communicated, these are issues that MAY be causing your problems. Without direct access to Google's algorithms, there is zero guarantee that anyone could absolutely say with 100% certainty exactly what impact they are having. And without a full audit, there is no way to know what other problems you have.
Having said that, proper SEO best practices always dictate that any major SEO flaws that you know exist should be cleaned up / fixed. So - if two thirds of your listings have thin content, the best suggestion would be to work to add much more content to each of those (unique, highly relevant, trustworthy and helpful), or to consider a "noindex,nofollow" on those specific pages.
The problem then being if you noindex,nofollow that many pages, what do you have left in terms of overall site scale that Google would find worthy of high rankings? How big are your competitors? Taking away very thin pages helps reduce "low quality" signals, yet if there isn't other "high quality" volume of content you still don't solve all your problems most of the time.
200-250 words is NOT considered a strong volume of content in most cases. Typically these days it's in the 600+ word range. However that also depends on what the majority of the competition offers for that unique type of content in that specific market.
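To make the word-count figures in this thread concrete, here is a rough Python sketch that buckets page text by length. The 100-word and 600-word cut-offs are simply the numbers mentioned in this discussion, not thresholds published by Google, and the names `word_count`, `classify_content`, `THIN_THRESHOLD` and `STRONG_THRESHOLD` are made up for illustration.

```python
import re

# Assumed cut-offs, taken from the figures quoted in this thread.
THIN_THRESHOLD = 100    # listings under this were described as "thin"
STRONG_THRESHOLD = 600  # the "600+ word" range mentioned above


def word_count(text):
    """Count word-like tokens (letters, digits, apostrophes) in plain text."""
    return len(re.findall(r"[A-Za-z0-9']+", text))


def classify_content(text):
    """Return a rough label for a page's visible text based on its length."""
    count = word_count(text)
    if count < THIN_THRESHOLD:
        return "thin"
    if count < STRONG_THRESHOLD:
        return "moderate"
    return "strong"
```

Run over the extracted body text of each listing, something like this would give a quick inventory of which pages to expand or noindex. It measures length only and says nothing about whether the content is unique or useful.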
And site speed is also something that best practices dictate needs to be as efficient as possible, so if it's slow even intermittently, that would be another thing to definitely work on.
-
Hi Alan:
About 220 of the 305 listings have thin content, meaning fewer than 100 words.
Is that likely to have triggered a Panda 4.0 penalty in late May? If I add content to those pages or no-index them, could that reverse the penalty if it exists? Also, my building pages contain 200-250 words. Is that considered "thin"? They are less geared towards the needs of tenants leasing space and contain historical information. I intend to enhance them and display listings on them. Do you think that could help?
Do you think the site speed could be a major factor impacting performance on my site? If so, I can invest in improving speed.
Thanks, Alan
-
Thanks for the GA data - so - there's very little traffic to the site so Google isn't able to get accurate page speed data consistently every day.
Note however, that back around July 6th, the site-wide average was almost 40 seconds a page. That's extremely slow. Then on the 17th, it was up around 16 seconds site-wide. So even though the little bit of data the rest of the month shows much faster speeds, those are definitely not good.
I honestly don't know however, given the very small data set, what impact site speed is having on the site. And there's just no way to know how it's impacting the site compared to other problems.
Next - thin content pages - what percentage of the listings has this problem? When I go to a sample listing such as this one I see almost no content. If a significant number of listings you have are this severely thin, that could well be a major problem.
Again though, I don't believe in randomly looking at one, two or even a few individual things as a valid basis for making a wild guess as to exact causes. SEO is not rocket science, however it is computer science. It's complex and hundreds of main factors are involved.
-
Hi Alan:
Interesting tool, URIValet.com; I had never heard of it before.
I reviewed site speed in Google Analytics and it seems that download speeds are intermittently very slow. According to "Site Speed Timings" (see attached) there has been a drop in download speed.
Is download speed a potentially more significant problem than the unknown 175 URLs?
Also, the listings do not appear elsewhere on the web, but many of them have light content. The call to action at the end of each listing is somewhat repetitive. I plan on either no-indexing listings with fewer than 100 words or adding to the content. The total number of listing URLs is 310. There are also 150 URLs for short building write-ups (like: http://www.nyc-officespace-leader.com/metropolitan-life-tower). These don't have more than 150 words of content. Could they be contributing to the issue?
Is the load time for the URLs on this site so slow that it could be affecting ranking?
Thanks,
Alan
-
It would require a developer to examine the structure of the site and how pages are generated - to do an inventory audit of the pages generated, then to match that to the sitemap file. If there are a large number of pages that are duplicate content, or very thin on content, that could be a contributing factor. Since there are fewer than 1,000 pages indexed in Google, I don't think 175 would be enough by itself as a single factor.
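As a rough illustration of matching a page inventory to the sitemap file, the Python sketch below parses a sitemap and diffs it against a list of crawled URLs. It assumes the sitemap follows the standard sitemaps.org schema and that the crawled URL list comes from whatever crawler export is available. The function names and the example.com URLs are placeholders, not data from the site discussed here.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the set of <loc> URLs listed in a sitemap XML document."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    }


def compare_inventory(sitemap_set, crawled_set):
    """Report URLs that appear on only one side of the comparison."""
    return {
        "crawled_not_in_sitemap": sorted(crawled_set - sitemap_set),
        "in_sitemap_not_crawled": sorted(sitemap_set - crawled_set),
    }


# Placeholder data for illustration only.
sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/listings/a</loc></url>
</urlset>"""

crawled = {
    "http://www.example.com/",
    "http://www.example.com/listings/a",
    "http://www.example.com/blog/tag/x",
}

result = compare_inventory(sitemap_urls(sample_sitemap), crawled)
print(result["crawled_not_in_sitemap"])
```

The URLs that show up as crawled but missing from the sitemap are the ones worth reviewing by hand, since they may be duplicates, thin pages, or pages that were simply left out of the sitemap.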
There are many reasons that could be causing your problem. Overall quality is another possible factor. In a test I ran just now at URIValet.com, the page processing speed for the home page in the 1.5 Mbps emulator was 13 seconds. Since Google has an ideal of under 3 seconds, if you have serious site-wide processing issues, that could also be a contributing factor. A test of a different page came back at 6 seconds, so this isn't necessarily a site-wide problem, and it may even be intermittent.
Yet if there are intermittent times when speeds are even slower, then yes, that could well be a problem that needs fixing.
So many other possible issues exist. Are the property listings anywhere else on the web, or is the content you have on them exclusive to your site?
What about your link profile? Is it questionable?
Without a full-blown audit it's a guess as to what the causes of your visibility drop are.
-
Hi Alan:
Your hypothesis regarding the URL structure is interesting. But in this case two of the URLs represent buildings and the one with "/listings/" represents a listing. So that seems OK.
Now you mention the possibility that there may be URLs that do not appear in the site map and are getting indexed by Google. That there is a site map issue with the site. How could I determine this?
Could the additional 175 URLs that have appeared in the last two months contribute to a drop in ranking?
I am completely stumped on this issue and have been harassing the MOZ community for two months. If you could help get to the bottom of this I would be most grateful.
Thanks, Alan
-
Hi Keri:
OK. I will keep that in mind moving forward. I did not realize the duplication.
If a question does not get answered are users allowed to repost?
Thanks,
Alan
-
Hi Alan:
Thanks for your response. Actually the 1st and 3rd URL are for buildings rather than listings, so they are actually formatted correctly. All listings contain "/listings/". So I think, but I am not an expert, that the URL structure is OK.
Thanks,
Alan
-
There are many reasons this can be happening.
One cause is where more URLs exist than your sitemap might even include. So the question then is whether the sitemap file is accurate and includes all the pages you want indexed.
Sometimes it's a coding or Information Architecture flaw, where content is found multiple ways.
Doing a random check, I found you have listings showing up in three different ways:
- http://www.nyc-officespace-leader.com/listings/38-broad-street-between-beaver--manhattan-new-york
- http://www.nyc-officespace-leader.com/113-133-west-18th-street
- http://www.nyc-officespace-leader.com/inquire-about-the-ladders-137-varick-street-to-rent-office-space-in-ny
See those? One has the address as a sub-page beneath "/listings/", the 2nd version does not, and the 3rd URL is different altogether. There should only be one URL structure for all property listings, so this would cause me to wonder whether you have properties showing up with two different URLs.
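One quick way to see how many URL patterns a site uses is to group a crawl export by the first path segment. The Python sketch below does that for the three URLs listed above. The function name `group_by_first_segment` and the "(top level)" label are made up for illustration, and the grouping only reveals the URL pattern. It cannot tell whether a page is a listing, a building page, or something else.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_first_segment(urls):
    """Group URLs by their leading directory, e.g. '/listings/'.

    URLs with a single path segment are grouped under '(top level)'.
    """
    groups = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        if len(segments) > 1:
            key = "/" + segments[0] + "/"
        else:
            key = "(top level)"
        groups[key].append(url)
    return dict(groups)


urls = [
    "http://www.nyc-officespace-leader.com/listings/38-broad-street-between-beaver--manhattan-new-york",
    "http://www.nyc-officespace-leader.com/113-133-west-18th-street",
    "http://www.nyc-officespace-leader.com/inquire-about-the-ladders-137-varick-street-to-rent-office-space-in-ny",
]

groups = group_by_first_segment(urls)
for pattern, members in sorted(groups.items()):
    print(pattern, len(members))
```

On a full crawl, an unexpected group, or a page type split across two groups, would be the thing to investigate further.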
I didn't find duplication, yet it's a flawed URL issue that leaves me wondering if it's a contributing factor.
This just scratches the surface of possibilities. I did check blog tags and blog date archives; however, none of those are indexed, so they're not a cause based on my preliminary evaluation.
-
Noindexed pages should not appear in your "Index Status". I could be wrong but it doesn't make sense to appear there if the page is noindexed.
Doing a site:www.nyc-officespace-leader.com search, I get 849 results. Seems normal to me. Again, you would probably have to scrutinize your sitemap instead; sitemaps don't always pull in all the URLs, depending on how you generate them.
Based on Screaming Frog, you have about 860 pages and ~200 noindexed pages. Your index status may update eventually.
It's working as is anyway: http://www.nyc-officespace-leader.com/blog/tag/manhattan-office-space does not show up in SERPs. I wouldn't use Index Status as definitive but more as directional.
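For spot-checking which pages carry a noindex directive without a dedicated crawler, a small Python sketch along these lines could be used. It only reads the robots meta tag from HTML that has already been downloaded, so it assumes the directive is set in the page markup rather than in an X-Robots-Tag HTTP header, and the names `RobotsMetaParser` and `robots_directives` are made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for part in attr.get("content", "").split(","):
            directive = part.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print("noindex" in robots_directives(page))
```

Counting the pages where `"noindex"` appears gives a figure to set against the Index Status number, bearing in mind that Google's count can lag behind changes on the site.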
-
Thanks for your response.
I am very suspicious that something is amiss. The number of URLs in MOZ's crawl of our site is about 850, almost exactly the same as the number in the Google Webmaster Tools Index Status. This 850 includes no-index pages.
Is it normal for Google to show the total number of pages in the Webmaster Tools Index Status, even if they are no-indexed?
I would upload the Excel file of the MOZ crawl but I don't know how to do so.
Thanks,
Alan
-
It's best to just ask the same question once, and clarify if needed in the question itself. This seems real similar to the question you asked at http://moz.com/community/q/difference-in-number-of-urls-in-crawl-sitemaps-index-status-in-webmaster-tools-normal, unless I'm missing something.
-
Index status is how many pages Google has indexed of your site.
The sitemap is different: in case your site has pages that are too deep for Google to find, sitemaps are a way to direct Googlebot to crawl pages that it won't necessarily find on its own.
In your case Google indexed more pages than the number of pages in your sitemap, which is absolutely normal.
Related Questions
-
No Index thousands of thin content pages?
Hello all! I'm working on a site that features a service marketed to community leaders that allows the citizens of that community to log 311-type issues such as potholes, broken streetlights, etc. The "marketing" front of the site is 10-12 pages of content to be optimized for community leader searchers; however, as you can imagine, there are thousands and thousands of pages of one- or two-line complaints such as, "There is a pothole on Main St. and 3rd." These complaint pages are not about the service, and I'm thinking they are not helpful to my end goal of gaining awareness of the service through search among community leaders. Community leaders are searching for "311 request service", not "potholes on main street". Should all of these "complaint" pages be NOINDEX'd? What if there are a number of quality links pointing to the complaint pages? Do I have to worry about losing Domain Authority if I do NOINDEX them? Thanks for any input. Ken
Intermediate & Advanced SEO | | KenSchaefer0 -
Google indexing only 1 page out of 2 similar pages made for different cities
We have created two category pages, which show products that can be delivered in separate cities. Both pages are related to cake delivery in that city. But out of these two category pages only 1 got indexed in Google and the other has not. It's been around 1 month but still only the Bangalore category page is indexed. We have submitted a sitemap and Google is not giving any crawl error. We have also submitted for indexing from the "Fetch as Google" option in Webmaster Tools. 1. www.winni.in/c/4/cakes (Indexed - Bangalore page - http://www.winni.in/sitemap/sitemap_blr_cakes.xml) 2. http://www.winni.in/hyderabad/cakes/c/4 (Not indexed - Hyderabad page - http://www.winni.in/sitemap/sitemap_hyd_cakes.xml) I tried searching for "hyderabad site:www.winni.in" in Google, but there too the link http://www.winni.in/hyderabad/cakes/c/4 does not come up; only www.winni.in/c/4/cakes does. Can anyone please let me know what the possible issue could be?
Intermediate & Advanced SEO | | abhihan0 -
Google is indexing the wrong page
Hello, I have a site I am optimizing and I can't seem to get a particular listing onto the first page because Google is indexing the wrong page. I have the following scenario. I have a client with multiple locations. To target the locations I set them up with URLs like this: /<cityname>-wedding-planner. The home page / is optimized for their Port Saint Lucie location. The page /palm-city-wedding-planner is optimized for the Palm City location. The page /stuart-wedding-planner is optimized for the Stuart location. Google picks up the first two and indexes them properly, BUT the Stuart location page doesn't get picked up at all; instead Google lists /, which is not optimized at all for Stuart. How do I "let Google know" to index the Stuart landing page for the "stuart wedding planner" term? MOZ also shows the / page as being indexed for the stuart wedding planner term, but I assume this is just a result of what it's finding when it performs its searches.
Intermediate & Advanced SEO | | mediagiant0 -
Can too many "noindex" pages compared to "index" pages be a problem?
Hello, I have a question for you: our website virtualsheetmusic.com includes thousands of product pages, and due to Panda penalties in the past, we have no-indexed most of the product pages hoping for some sort of recovery (not yet seen though!). So, currently we have about 4,000 "index" pages compared to about 80,000 "noindex" pages. Now, we plan to add an additional 100,000 new product pages from a new publisher to offer our customers more music choice, and these new pages will still be marked as "noindex, follow". At the end of the integration process, we will end up having something like 180,000 "noindex, follow" pages compared to about 4,000 "index, follow" pages. Here is my question: can this huge discrepancy between 180,000 "noindex" pages and 4,000 "index" pages be a problem? Can this kind of scenario have or cause any negative effect on our current natural search engine profile? Or is this something that doesn't actually matter? Any thoughts on this issue are very welcome. Thank you! Fabrizio
Intermediate & Advanced SEO | | fablau0 -
Indexing/Sitemap - I must be wrong
Hi All, I would guess that a great number of us new to SEO (or not) share some simple beliefs in relation to Google indexing and Sitemaps, and as such get confused by what Webmaster Tools shows us. It would be great if someone with experience/knowledge could clear this up once and for all 🙂
Common beliefs:
- Google will crawl your site from the top down, following each link and recursively repeating the process until it bottoms out/becomes cyclic.
- A Sitemap can be provided that outlines the definitive structure of the site, and is especially useful for links that may not be easily discovered via crawling.
- In Google’s Webmaster Tools, in the sitemap section, the number of pages indexed shows the number of pages in your sitemap that Google considers to be worthwhile indexing.
- If you place a rel="canonical" tag on every page pointing to the definitive version, you will avoid duplicate content and aid Google in its indexing endeavour.
These preconceptions seem fair, but must be flawed. Our site has 1,417 pages as listed in our Sitemap. Google’s tools tell us there are no issues with this sitemap but a mere 44 are indexed! We submit 2,716 images (because we create all our own images for products) and a disappointing zero are indexed. Under Health->Index status in WM Tools, we apparently have 4,169 pages indexed. I tend to assume these are old pages that now yield a 404 if they are visited. It could be that Google’s Indexed quotient of 44 means “Pages indexed by virtue of your sitemap, i.e. we didn’t find them by crawling – so thanks for that”, but despite trawling through Google’s help, I don’t really get that feeling. This is basic stuff, but I suspect a great number of us struggle to understand the disparity between our expectations and what WM Tools yields, and we go on to either ignore an important problem, or waste time on non-issues. Can anyone shine a light on this once and for all?
If you are interested, our map looks like this : http://www.1010direct.com/Sitemap.xml Many thanks Paul
Intermediate & Advanced SEO | | fretts0 -
How to remove an entire subdomain from the Google index with URL removal tool?
Does anyone have clear instructions for how to do this? Do we need to set up a separate GWT account for each subdomain? I've tried using the URL removal tool, but it will only allow me to remove URLs indexed under my domain (i.e. domain.com not subdomain.domain.com) Any help would be much appreciated!!!
Intermediate & Advanced SEO | | nicole.healthline0 -
HELP - got the following message - Google Webmaster Tools notice of detected unnatural links
Hi All, While trying to grow we used several freelancers and small companies for guest blogging, article submissions etc. We lost about 90% of traffic from our peak in December. We don't know if it is related but we got the following message last week: "Google Webmaster Tools notice of detected unnatural links to www.domain.com" Is it related (getting this message after two months of losing traffic)? What to do? (P.S. We fired most of the companies we used months ago since we noticed they used bad methods. We didn't believe it could hurt us - just thought it would be useless...) Please help...
Intermediate & Advanced SEO | | BeytzNet0 -
NOINDEX listing pages: Page 2, Page 3... etc?
Would it be beneficial to NOINDEX category listing pages except for the first page? For example, this site: http://flyawaysimulation.com/downloads/101/fsx-missions/ has lots of pages such as Page 2, Page 3, Page 4... etc: http://www.google.com/search?q=site%3Aflyawaysimulation.com+fsx+missions Would there be any SEO benefit to NOINDEX on these pages? Of course, FOLLOW is the default, so links would still be followed and juice applied. Your thoughts and suggestions are much appreciated.
Intermediate & Advanced SEO | | Peter2640