Sitemap url's not being indexed
-
There is an issue on one of our sites regarding many of the sitemap url's not being indexed. (at least 70% is not being indexed)
The url's in the sitemap are normal url's without any strange characters attached to them, but after looking into it, it seems a lot of the url's get a #. + a number sequence attached to them once you actually go to that url. We are not sure if the "addthis" bookmark could cause this, or if it's another script doing it.
For example
Url in the sitemap: http://example.com/example-category/0246
Url once you actually go to that link: http://example.com/example-category/0246#.VR5a
Just for further information, the XML file does not have any style information associated with it and is in it's most basic form.
Has anyone had similar issues with their sitemap not being indexed properly ?...Could this be the cause of many of these url's not being indexed ?
Thanks all for your help.
-
Anders,
Thanks for the reply. I definitely agree a self referring canonical might just be a good extra addition on these product pages, so I'm definitely adding that to our list of to do's if it does not improve.
In terms of indexing pages - We have not restricted crawl frequency, we have it set to "allow google to determine the optimal crawl rate". No other warnings found within the search console either.
Thanks for your help.
-
I agree - i probably would ignore everything after the "#".
But have you tried added a <link rel="canonical" href="http://example.com/page-url" /> to your pages and see if this will update it? Also: Add the sitemap to your robots.txt if not allready done.
Regarding indexing pages - have you restricted crawl frequency in Google Search Console, or is it set to be determined by GoogleBot? Any other warnings or messages in Search Console?
Best regards,
Anders -
Lesley,
Thanks for the confirmation on that one and the article. Since it doesn't seem like a lot of people on the site are using that address share function, I do not think it would do any harm to remove it.
At least we know the root cause of why it's doing it to the url's. Now the real question is...could it be getting in the way of indexing those url's ?...one would think not, as from what I've read, google would simply ignore what comes after the #.
Thoughts ?
Appreciate the help.
-
Patrick,
We'd prefer to keep the actual url's private, however I can provide further information to help hopefully allow the community to dissect this further:
- It's an E-commerce website, meaning many facets, filters, and possible duplicate content angles
- It seems many of the static pages (/products main page, /contact,etc) are indexed, however it seems the individual products are mostly not being indexed through the sitemap
- While the url's found in webmaster tools under "index" has also steadily been going down, it definitely doesn't correspond with the lack of pages indexed vs submitted within the sitemap
- We have checked robots.txt, and it is not blocking any important pages. (I also had them allow robots to crawl css and js so google could have full access)
- The individual product pages all have the "addthis" feature, meaning they all have a #. + number sequence added to the url's. However one would think this wouldn't be the cause of this lack of indexation ?
Thanks for your help.
-
Yes, add this is doing this to your url. I hate it, that is one reason why I do not use them.
Here is an article on how to remove them, http://support.addthis.com/customer/portal/articles/1013558-removing-all-hashtags-anchors-weird-codes-from-your-urls
-
Hi there
Could you provide you website's URL? It would help the community take a deeper look - thanks!
Good luck!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
404's being re-indexed
Hi All, We are experiencing issues with pages that have been 404'd being indexed. Originally, these were /wp-content/ index pages, that were included in Google's index. Once I realized this, I added in a directive into our htaccess to 404 all of these pages - as there were hundreds. I tried to let Google crawl and remove these pages naturally but after a few months I used the URL removal tool to remove them manually. However, Google seems to be continually re/indexing these pages, even after they have been manually requested for removal in search console. Do you have suggestions? They all respond to 404's. Thanks
Technical SEO | | Tom3_151 -
Vanity URLs are being indexed in Google
We are currently using vanity URLs to track offline marketing, the vanity URL is structured as www.clientdomain.com/publication, this URL then is 302 redirected to the actual URL on the website not a custom landing page. The resulting redirected URL looks like: www.clientdomain.com/xyzpage?utm_source=print&utm_medium=print&utm_campaign=printcampaign. We have started to notice that some of the vanity URLs are being indexed in Google search. To prevent this from happening should we be using a 301 redirect instead of a 302 and will the Google index ignore the utm parameters in the URL that is being 301 redirect to? If not, any suggestions on how to handle? Thanks,
Technical SEO | | seogirl221 -
Site Migration between CMS's
Hi There, I have a technical question about migrating CMS's but not servers. My client has site A on Joomla install, He want's ot migrate to Wordpress and we will call this site B. As he has a lot of old content on site A he doesn't want to lose, he has put site B (wordpress install) on a subdirectory site.com/siteb (for example). and will use a htaccess to forward the root domain to this wordpress site. Therefore anyone going to www.site.com will see the new wordpress site and the old content and joomla install will sit on the root of the server. Will Google have an issue with this? Will it even find the old content? what are the issues for the new site and new content? Look forward getting your guys input
Technical SEO | | nezona1 -
Pro's & contra's: http vs https
Hi there, We are planning to take the step and go from http to https. The main reason to do this, is to mean trustfull to our clients. And of course the rumours that it would be better for ranking (in the future). We have a large e-commerce site. A part of this site ia already HTTPS. I've read a lot of info about pro's and contra's, also this MOZ article: http://moz.com/blog/seo-tips-https-ssl
Technical SEO | | Leonie-Kramer
But i want to know some experience from others who already done this. What did you encountered when changing to HTTPS, did you had ranking drops, or loss of links etc? I want to make a list form pro's and contra's and things we have to do in advance. Thanx, Leonie0 -
How to create site map for large site (ecommerce type) that has 1000's if not 100,000 of pages.
I know this is kind of a newbie question but I am having an amazing amount of trouble creating a sitemap for our site Bestride.com. We just did a complete redesign (look and feel, functionality, the works) and now I am trying to create a site map. Most of the generators I have used "break" after reaching some number of pages. I am at a loss as to how to create the sitemap. Any help would be greatly appreciated! Thanks
Technical SEO | | BestRide0 -
Blocked URL's by robots.txt
In Google Webmaster Tools shows me 10,936 Blocked URL's by robots.txt and it is very strange when you go to the "Index Status" section where shows that since April 2012 robots.txt blocked many URL's. You can see more precise on the image attached (chart WMT) I can not explain why I have blocked URL's ? because I have nothing in robots.txt.
Technical SEO | | meralucian37
My robots.txt is like this: User-agent: * I thought I was penalized by Penguin in April 2012 because constantly i'am losing visitors now reaching over 40%. It may be a different penalty? Any help is welcome because i'm already so saturated. Mera robotstxt.jpg0 -
If multiple links on a page point to the same URL, and one of them is no-followed, does that impact the one that isn't?
Page A has two links on it that both point to Page B. Link 1 isn't no-follow, but Link 2 is. Will Page A pass any juice to Page B?
Technical SEO | | Jay.Neely0 -
What's the best way to deal with an entire existing site moving from http to https?
I have a client that just switched their entire site from the standard unsecure (http) to secure (https) because of over-zealous compliance issues for protecting personal information in the health care realm. They currently have the server setup to 302 redirect from the http version of a URL to the https version. My first inclination was to have them simply update that to a 301 and be done with it, but I'd prefer not to have to 301 every URL on the site. I know that putting a rel="canonical" tag on every page that refers to the http version of the URL is a best practice (http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=139394), but should I leave the 302 redirects or update them to 301's. Something seems off to me about the search engines visiting an http page, getting 301 redirected to an https page and then being told by the canonical tag that it's actually the URL they were just 301 redirected from.
Technical SEO | | JasonCooper0