Sitemap url's not being indexed
-
There is an issue on one of our sites regarding many of the sitemap url's not being indexed. (at least 70% is not being indexed)
The url's in the sitemap are normal url's without any strange characters attached to them, but after looking into it, it seems a lot of the url's get a #. + a number sequence attached to them once you actually go to that url. We are not sure if the "addthis" bookmark could cause this, or if it's another script doing it.
For example
Url in the sitemap: http://example.com/example-category/0246
Url once you actually go to that link: http://example.com/example-category/0246#.VR5a
Just for further information, the XML file does not have any style information associated with it and is in it's most basic form.
Has anyone had similar issues with their sitemap not being indexed properly ?...Could this be the cause of many of these url's not being indexed ?
Thanks all for your help.
-
Anders,
Thanks for the reply. I definitely agree a self referring canonical might just be a good extra addition on these product pages, so I'm definitely adding that to our list of to do's if it does not improve.
In terms of indexing pages - We have not restricted crawl frequency, we have it set to "allow google to determine the optimal crawl rate". No other warnings found within the search console either.
Thanks for your help.
-
I agree - i probably would ignore everything after the "#".
But have you tried added a <link rel="canonical" href="http://example.com/page-url" /> to your pages and see if this will update it? Also: Add the sitemap to your robots.txt if not allready done.
Regarding indexing pages - have you restricted crawl frequency in Google Search Console, or is it set to be determined by GoogleBot? Any other warnings or messages in Search Console?
Best regards,
Anders -
Lesley,
Thanks for the confirmation on that one and the article. Since it doesn't seem like a lot of people on the site are using that address share function, I do not think it would do any harm to remove it.
At least we know the root cause of why it's doing it to the url's. Now the real question is...could it be getting in the way of indexing those url's ?...one would think not, as from what I've read, google would simply ignore what comes after the #.
Thoughts ?
Appreciate the help.
-
Patrick,
We'd prefer to keep the actual url's private, however I can provide further information to help hopefully allow the community to dissect this further:
- It's an E-commerce website, meaning many facets, filters, and possible duplicate content angles
- It seems many of the static pages (/products main page, /contact,etc) are indexed, however it seems the individual products are mostly not being indexed through the sitemap
- While the url's found in webmaster tools under "index" has also steadily been going down, it definitely doesn't correspond with the lack of pages indexed vs submitted within the sitemap
- We have checked robots.txt, and it is not blocking any important pages. (I also had them allow robots to crawl css and js so google could have full access)
- The individual product pages all have the "addthis" feature, meaning they all have a #. + number sequence added to the url's. However one would think this wouldn't be the cause of this lack of indexation ?
Thanks for your help.
-
Yes, add this is doing this to your url. I hate it, that is one reason why I do not use them.
Here is an article on how to remove them, http://support.addthis.com/customer/portal/articles/1013558-removing-all-hashtags-anchors-weird-codes-from-your-urls
-
Hi there
Could you provide you website's URL? It would help the community take a deeper look - thanks!
Good luck!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
What's best practice for cart pages?
i don't mean e-commerce site in general, but the actual cart page itself. What's best practice for the links that customers click to add products to the cart, and the cart page itself? Also, I use vanity URLs for my cart links which redirect to the actual cart page with the parameters applied. Should I use use 301 or 302 redirects for the links? Do I make the cart page's canonical tag point back to the store home page so that I'm not accruing link juice to a page that customers don't actually want to land on from search? I'm kinda surprised at the dearth of information out there on this, or maybe I'm not looking in the right places?
Technical SEO | | VM-Oz0 -
Moving wordpress to it's own server
Our company wants to remove wordpress from our current windows OS server at provider 1 and move it to a new server at provider 2. Godaddy handles our DNS. I would like to have it on the same domain without masking. I would like to make a DNS entry on godaddy so that our current server and our new server can use the same URL (ie sellstuff.com). But I only want the DNS to direct traffic to our current server. The goal here is to have the new server using the same URL as the old server so nothing needs to be masked once traffic is redirected with a 301 rule in the htaccess file. But no traffic outside of the 301 rule will end up going to the new server. I would then like to edit the htaccess file on our current server to redirect to the new servers IP address when someone goes to sellstuff.com/blog. Does this make since and is it possible?
Technical SEO | | larsonElectronics0 -
Soft 404's on a 301 Redirect...Why?
So we launched a site about a month ago. Our old site had an extensive library of health content that went away with the relaunch. We redirected this entire section of the site to the new education materials, but we've yet to see this reflected in the index or in GWT. In fact, we're getting close to 500 soft 404's in GWT. Our development team confirmed for me that the 301 redirect is configured correctly. Is it just a waiting game at this point or is there something I might be missing? Any help is appreciated. Thanks!
Technical SEO | | MJTrevens0 -
Why do some URLs for a specific client have "/index.shtml"?
Reviewing our client's URLs for a 301 redirect strategy, we have noticed that many URLs have "/index.shtml." The part we don'd understand is these URLs aren't the homepage and they have multiple folders followed by "/index.shtml" Does anyone happen to know why this may be occurring? Is there any SEO value in keeping the "/index.shtml" in the URL?
Technical SEO | | FranFerrara0 -
Google indexing less url's then containded in my sitemap.xml
My sitemap.xml contains 3821 urls but Google (webmaster tools) indexes only 1544 urls. What may be the cause? There is no technical problem. Why does Google index less URLs then contained in my sitemap.xml?
Technical SEO | | Juist0 -
Replacing H1's with images
We host a few Japanese sites and Japanese fonts tend to look a bit scruffy the larger they are. I was wondering if image replacement for H1 is risky or not? eg in short... spiders see: Some header text optimized for seo then in the css h1 {
Technical SEO | | -Al-
text-indent: -9999px;
} h1.header_1{ background:url(/images/bg_h1.jpg) no-repeat 0 0; } We are considering this technique, I thought I should get some advise before potentially jeopardising anything, especially as we are dealing with one of the most important on page elements. In my opinion any attempt to hide text could be seen as keyword stuffing, is it a case that in moderation it is acceptable? Cheers0 -
How does a sitemap affect the definition of canonical URLs?
We are having some difficulty generating a sitemap that includes our SEO-friendly URLs (the ones we want to set as canonical), and I was wondering if we might be able to simply use the non-SEO-friendly, non-canonical URLs that the sitemap generator has been producing and then use 301 redirects to send them to the canonical. Is there a reason why we should not be doing this? We don't want search engines to think that the sitemap URLs are more important than the pages to which they redirect. How important is it that the sitemap URLs match the canonical URLs? We would like to find a solution outside of the generation of the sitemap itself as we are locked into using a vendor’s product in order to generate the sitemap. Thanks!
Technical SEO | | emilyburns0 -
Google shows the wrong domain for client's homepage
Whenever the homepage of my client's homepage appears in Google results, the search engine is not showing our URL as our domain, but instead a partner domain that is linking to us. (The correct title and meta description of our homepage is showing.) I believe this is caused by the partner website (with a much higher pank rank) linking to our homepage from their footer to a URL with it's own domain that 302 redirects to our homepage. Example: Link: http://www.partnerwebsite.com/?ad2203 302 redirects to: http://www.clientwebsite.com/?moreadtracking The simple fix would be for the client to ask for removal of the 302 hijacking link - but they are uncomfortable with this request since they had requested it prior, and their relationship is not the best. Is there any other way to fix this?
Technical SEO | | Conor_OShea_ETUS0