Having Problems to Index all URLs on Sitemap
-
Hi all again ! Thanks in advance ! My client's site is having problems to index all its pages. I even bought the full extension of XML Sitemaps and the number of urls increased, but we still have problems to index all of them.
What are the reasons? The robots.txt is open for all robots, we only prohibit users and spiders to enter our Intranet. I've read that duplicate content and 404's can be the reason. Anything else?
-
Like already answered above it's quite hard to get to the 100% indexation rate for your webpages. What's your current indexation rate? If it's below 90% you might still have some issues somewhere.
-
Hi there
According to Google:
"...we don't guarantee that we'll crawl all of the pages of a particular site. Google doesn't crawl all the pages on the web, and we don't index all the pages we crawl. It's perfectly normal for not all the pages on a site to be indexed."
Google also provides tips and resources to help your site being indexed properly and, possibly, more fully. You can check that resource out here. Kissmetrics has a few other tips.
To Andy's point, Google indexes what it wants - don't be discouraged if your entire site isn't indexed in WMT. If you know how many pages are on your site (which you definitely should), I would try the "site:" function and get a better idea of what actually is indexed in Google.
Hope this helps! Good luck!
-
I have yet to see Google index every page of a site. They tend not to index pages that they don't think meet the criteria, so unless it was something like 90% weren't being indexed, I wouldn't worry about it. There are so many reasons that Google won't index a page.
You will also find that over time, if Google attributes more trust to the site, that more pages will be indexed.
Of course, you can do things to improve your changes, such as making sure Google can crawl all pages, check to see there are no bottlenecks anywhere and the big one - make sure your content is amazing. As long as the site is the best it can be, over time the number of indexed pages will increase.
Remember - the sitemap is not a guarantee that pages will be indexed. It just helps Google crawl your site.
-Andy
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Url folder structure
I work for a travel site and we have pages for properties in destinations and am trying to decide how best to organize the URLs basically we have our main domain, resort pages and we'll also have articles about each resort so the URL structure will actually get longer:
Technical SEO | | Vacatia_SEO
A. domain.com/main-keyword/state/city-region/resort-name
_ domain.com/family-condo-for-rent/orlando-florida/liki-tiki-village_ _ domain.com/main-keyword-in-state-city/resort-name-feature _
_ domain.com/family-condo-for-rent/orlando-florida/liki-tiki-village/kid-friend-pool_ B. Another way to structure would be to remove the location and keyword folders and combine. Note that some of the resort names are long and spaces are being replaced dynamically with dashes.
ex. domain.com/main-keyword-in-state-city/resort-name
_ domain.com/family-condo-for-rent-in-orlando-florida/liki-tiki-village_ _ domain.com/main-keyword-in-state-city/resort-name-feature_
_ domain.com/family-condo-for-rent-in-orlando-florida/liki-tiki-village-kid-friend-pool_ Question: is that too many folders or should i combine or break up? What would you do with this? Trying to avoid too many dashes.0 -
Should We Index These Category Pages?
Currently we have marked category pages like http://www.yournextshoes.com/celebrities/kim-kardashian/ as follow/noindex as they essentially do not include any original content. On the other hand, for someone searching for Kim Kardashian shoes, it's a highly relevant page as we provide links to all the Kim Kardashian shoe sightings that we have covered. Should we index the category pages or leave them unindexed?
Technical SEO | | Jantaro0 -
Walking into a site I didn't build, easy way to fix this # indexing problem?
I recently joined a team with a site without a) Great content b) Not much of any search traffic I looked and all their url's are built in this way: Normal looking link -> not actually a new page but # like: /#content-title And it has no h1 tag. Page doesn't refresh. My initial thought is to gut the site and build it in wordpress, but first have to ask, is there a way to make a site with /#/ content loading friendly to search engines?
Technical SEO | | andrewhyde0 -
Will rel canonical tags remove previously indexed URLs?
Hello, 7 days ago, we implemented canonical tags to resolve duplicate content issues that had been caused by URL parameters. These "duplicate content" had already been indexed. Now that the URLs have rel canonical tags in place, will Google automatically remove from its index the other URLs with the URL parameters? I ask because we have been tracking the approximate number of URLs indexed by doing a site: search in Google, and we have barely noticed a decrease in URLs indexed. Thanks.
Technical SEO | | yacpro130 -
Sitemaps for Google
In Google Webmaster Central, if a URL is reported in your site map as 404 (Not found), I'm assuming Google will automatically clean it up and that the next time we generate a sitemap, it won't include the 404 URL. Is this true? Do we need to comb through our sitemap files and remove the 404 pages Google finds, our will it "automagically" be cleaned up by Google's next crawl of our site?
Technical SEO | | Prospector-Plastics0 -
Blog URLs
I read somewhere - pretty sure is was in Art of SEO - that having dates in the blog permalink URLs was a bad idea. e.g. /blog/2011/3/my-blog-post/ However, looking at Wordpress best practice, it's also not a good idea to have a URL without a number - it's more resource hungry if you don't , apparently. e.g. /blog/my-blog-post/ Does anyone have any views on this? Thanks Ben
Technical SEO | | atticus70 -
What tool do you use to check for URLs not indexed?
What is your favorite tool for getting a report of URLs that are not cached/indexed in Google & Bing for an entire site? Basically I want a list of URLs not cached in Google and a seperate list for Bing. Thanks, Mark
Technical SEO | | elephantseo3