Sitemap URLs not being indexed

GreenStone

We have an issue on one of our sites: many of the sitemap URLs are not being indexed (at least 70% of them).

The URLs in the sitemap are normal URLs without any strange characters attached, but after looking into it, it seems a lot of them get a "#." plus a character sequence appended once you actually visit the URL. We are not sure whether the "AddThis" bookmark widget causes this or whether another script is doing it.

For example:

URL in the sitemap: http://example.com/example-category/0246

URL once you actually go to that link: http://example.com/example-category/0246#.VR5a
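To make the difference between the two URLs above concrete, here is a minimal Python sketch that strips the fragment (everything from the "#" onward) with the standard library. The function name strip_fragment is just illustrative. It only shows that both forms point at the same resource once the fragment is removed; it says nothing about how any particular crawler behaves.

```python
from urllib.parse import urldefrag


def strip_fragment(url: str) -> str:
    """Return the URL without its '#...' fragment part."""
    return urldefrag(url).url


# The two example URLs from this post.
sitemap_url = "http://example.com/example-category/0246"
browser_url = "http://example.com/example-category/0246#.VR5a"

# After removing the fragment, the address-bar URL matches the sitemap URL.
print(strip_fragment(browser_url) == sitemap_url)
```

The fragment is handled by the browser and is not part of the request sent to the server, so the server sees the same path for both URLs.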

For further information, the XML file does not have any style information associated with it and is in its most basic form.

Has anyone had similar issues with their sitemap not being indexed properly? Could this be the reason many of these URLs are not being indexed?

Thanks all for your help.

GreenStone

Anders,

Thanks for the reply. I agree that a self-referencing canonical would be a good addition to these product pages, so I'm adding that to our to-do list in case indexing does not improve.

In terms of indexing pages: we have not restricted crawl frequency; it is set to "allow Google to determine the optimal crawl rate". We found no other warnings in Search Console either.

Thanks for your help.

AndersS

I agree - I would probably ignore everything after the "#".

But have you tried adding a <link rel="canonical" href="http://example.com/page-url" /> to your pages to see if that updates it? Also, add the sitemap to your robots.txt if you have not already done so.

Regarding indexing pages: have you restricted crawl frequency in Google Search Console, or is it set to be determined by Googlebot? Any other warnings or messages in Search Console?
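As a rough way to verify both suggestions, the sketch below checks an HTML document for a canonical link and a robots.txt body for Sitemap lines. It is only an illustration using the Python standard library: the sample HTML, the sample robots.txt, the sitemap location http://example.com/sitemap.xml, and the helper names are all assumptions, and in practice you would feed in the real page source and robots.txt.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remembers the href of the last <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "link" and (attr_map.get("rel") or "").lower() == "canonical":
            self.canonical = attr_map.get("href")


def find_canonical(html: str):
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def sitemap_directives(robots_txt: str):
    """Return the URLs listed on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.strip()
        if line.lower().startswith("sitemap:"):
            urls.append(line.split(":", 1)[1].strip())
    return urls


# Hypothetical inputs for illustration only.
sample_html = (
    '<html><head><link rel="canonical" '
    'href="http://example.com/page-url" /></head><body></body></html>'
)
sample_robots = "User-agent: *\nDisallow:\nSitemap: http://example.com/sitemap.xml\n"

print(find_canonical(sample_html))
print(sitemap_directives(sample_robots))
```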

Best regards,
Anders

GreenStone

Lesley,

Thanks for the confirmation and the article. Since it doesn't seem like many people on the site use that address share function, I do not think it would do any harm to remove it.

At least we know the root cause of what is happening to the URLs. Now the real question is: could it be getting in the way of indexing those URLs? One would think not, since from what I've read, Google simply ignores whatever comes after the "#".

Thoughts?

Appreciate the help.

GreenStone

Patrick,

We'd prefer to keep the actual URLs private, but I can provide further information to help the community dissect this:

It's an e-commerce website, meaning many facets, filters, and possible duplicate content angles.
Many of the static pages (/products main page, /contact, etc.) are indexed, but the individual product pages are mostly not being indexed through the sitemap.
The number of URLs reported under "index" in Webmaster Tools has also been going down steadily, but that drop does not correspond to the gap between pages submitted and pages indexed in the sitemap.
We have checked robots.txt, and it is not blocking any important pages. (I also had them allow robots to crawl CSS and JS so Google has full access.)
The individual product pages all have the "AddThis" feature, meaning they all get a "#." plus a character sequence added to their URLs. However, one would think this wouldn't be the cause of the lack of indexation?
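One way to narrow down which kinds of pages are affected is to compare the URLs submitted in the sitemap against a list of URLs known to be indexed. The sketch below is only an outline of that comparison: the sample sitemap, the indexed list, and the helper names are made up for illustration, and how you obtain the real indexed list (for example, an export from Search Console) is left open.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str):
    """Return every <loc> value from a sitemap <urlset> document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]


def not_indexed(submitted, indexed):
    """Return submitted URLs that do not appear in the indexed list."""
    indexed_set = set(indexed)
    return [url for url in submitted if url not in indexed_set]


# Hypothetical data for illustration only.
sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/contact</loc></url>
  <url><loc>http://example.com/example-category/0246</loc></url>
</urlset>"""
indexed = ["http://example.com/contact"]

missing = not_indexed(sitemap_urls(sample_sitemap), indexed)
print(missing)
```

Grouping the missing URLs by path (for example, all product pages under one category) can then show whether the problem is limited to a particular template.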

Thanks for your help.

LesleyPaone

Yes, AddThis is doing this to your URLs. I hate it; that is one reason why I do not use it.

Here is an article on how to remove them: http://support.addthis.com/customer/portal/articles/1013558-removing-all-hashtags-anchors-weird-codes-from-your-urls

PatrickDelehanty

Hi there

Could you provide your website's URL? It would help the community take a deeper look. Thanks!

Good luck!
