Sitemap url's not being indexed
-
There is an issue on one of our sites regarding many of the sitemap url's not being indexed. (at least 70% is not being indexed)
The url's in the sitemap are normal url's without any strange characters attached to them, but after looking into it, it seems a lot of the url's get a #. + a number sequence attached to them once you actually go to that url. We are not sure if the "addthis" bookmark could cause this, or if it's another script doing it.
For example
Url in the sitemap: http://example.com/example-category/0246
Url once you actually go to that link: http://example.com/example-category/0246#.VR5a
Just for further information, the XML file does not have any style information associated with it and is in it's most basic form.
Has anyone had similar issues with their sitemap not being indexed properly ?...Could this be the cause of many of these url's not being indexed ?
Thanks all for your help.
-
Anders,
Thanks for the reply. I definitely agree a self referring canonical might just be a good extra addition on these product pages, so I'm definitely adding that to our list of to do's if it does not improve.
In terms of indexing pages - We have not restricted crawl frequency, we have it set to "allow google to determine the optimal crawl rate". No other warnings found within the search console either.
Thanks for your help.
-
I agree - i probably would ignore everything after the "#".
But have you tried added a <link rel="canonical" href="http://example.com/page-url" /> to your pages and see if this will update it? Also: Add the sitemap to your robots.txt if not allready done.
Regarding indexing pages - have you restricted crawl frequency in Google Search Console, or is it set to be determined by GoogleBot? Any other warnings or messages in Search Console?
Best regards,
Anders -
Lesley,
Thanks for the confirmation on that one and the article. Since it doesn't seem like a lot of people on the site are using that address share function, I do not think it would do any harm to remove it.
At least we know the root cause of why it's doing it to the url's. Now the real question is...could it be getting in the way of indexing those url's ?...one would think not, as from what I've read, google would simply ignore what comes after the #.
Thoughts ?
Appreciate the help.
-
Patrick,
We'd prefer to keep the actual url's private, however I can provide further information to help hopefully allow the community to dissect this further:
- It's an E-commerce website, meaning many facets, filters, and possible duplicate content angles
- It seems many of the static pages (/products main page, /contact,etc) are indexed, however it seems the individual products are mostly not being indexed through the sitemap
- While the url's found in webmaster tools under "index" has also steadily been going down, it definitely doesn't correspond with the lack of pages indexed vs submitted within the sitemap
- We have checked robots.txt, and it is not blocking any important pages. (I also had them allow robots to crawl css and js so google could have full access)
- The individual product pages all have the "addthis" feature, meaning they all have a #. + number sequence added to the url's. However one would think this wouldn't be the cause of this lack of indexation ?
Thanks for your help.
-
Yes, add this is doing this to your url. I hate it, that is one reason why I do not use them.
Here is an article on how to remove them, http://support.addthis.com/customer/portal/articles/1013558-removing-all-hashtags-anchors-weird-codes-from-your-urls
-
Hi there
Could you provide you website's URL? It would help the community take a deeper look - thanks!
Good luck!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Community Discussion - What's been your experience with accessibility?
When Laura Lippay came to me with the idea to write a series of posts on the Moz blog about SEO and accessibility, it really got my gears turning. As the blog manager, I realized I'd been thinking about all sorts of ways to make the blog the best it can be, but accessibility was one place I had yet to explore in-depth. While I have my own goals and projects around this topic churning along in the background, I'd love to hear what the community's done to be inclusive to all users of the Internet. What've you struggled with in terms of making sites you've worked on accessible -- both technically and as an initiative in general? What's often missing that you've become passionate about including? Do you have any big wins you're especially proud of and want to share? Looking forward to reading your thoughts and stories, folks! 🙂
Technical SEO | | FeliciaCrawford1 -
What's the best way to integrate off site inventory?
I can't seem to make any progress with my car dealership client in rankings or traffic. I feel like I've narrowed out most of the common problems, the only other thing I can see is that all their inventory is on a subdomain using a dedicated auto dealership software. Any suggestion of a better way to handle this situation? Am I missing something obvious? The url is rcautomotive.com Thanks for your help!
Technical SEO | | GravitateOnline0 -
Vanity URLs are being indexed in Google
We are currently using vanity URLs to track offline marketing, the vanity URL is structured as www.clientdomain.com/publication, this URL then is 302 redirected to the actual URL on the website not a custom landing page. The resulting redirected URL looks like: www.clientdomain.com/xyzpage?utm_source=print&utm_medium=print&utm_campaign=printcampaign. We have started to notice that some of the vanity URLs are being indexed in Google search. To prevent this from happening should we be using a 301 redirect instead of a 302 and will the Google index ignore the utm parameters in the URL that is being 301 redirect to? If not, any suggestions on how to handle? Thanks,
Technical SEO | | seogirl221 -
Is there a way for me to automatically download a website's sitemap.xml every month?
From now on we want to store all our sitemap.xml over the next years. Its a nice archive to have that allows us to analyse how many pages we have on our website and which ones were removed/redirected. Any suggestions? Thanks
Technical SEO | | DeptAgency0 -
Why can't i get the page if i type/paste url directly?
Hello, just click the following link, http://www.tuscany-cooking-class.com/es/alojamiento/villa-pandolfini/ It might be show the 404 page, but follow this way, www.tuscany-cooking-class.com/es then select alojamiento link, then select first property name with villa-pandolfini, Now you can view the page content, why it behave like this, We are using joomla with customized. Anyone help me to fix this issue Thanks Advance Alex
Technical SEO | | massimobrogi0 -
Product landing page URL's for e-commerce sites - best practices?
Hi all I have built many e-commerce websites over the years and with each one, I learn something new and apply to the next site and so on. Lets call it continuous review and improvement! I have always structured my URL's to the product landing pages as such: mydomain.com/top-category => mydomain.com/top-category/sub-category => mydomain.com/top-category/sub-category/product-name Now this has always worked fine for me but I see more an more of the following happening: mydomain.com/top-category => mydomain.com/top-category/sub-category => mydomain.com/product-name Now I have read many believe that the longer the URL, the less SEO impact it may have and other comments saying it is better to have the just the product URL on the final page and leave out the categories for one reason or another. I could probably spend days looking around the internet for peoples opinions so I thought I would ask on SEOmoz and see what other people tend to use and maybe establish the reasons for your choices? One of the main reasons I include the categories within my final URL to the product is simply to detect if a product name exists in multiple categories on the site - I need to show the correct product to the user. I have built sites which actually have the same product name (created by the author) in multiple areas of the site but they are actually different products, not duplicate content. I therefore cannot see a way around not having the categories in the URL to help detect which product we want to show to the user. Any thoughts?
Technical SEO | | yousayjump0 -
What's the best way to switch over to a new site with a different CMS?
Is it better to 301 or to closely duplicate each page URL when switching over to a new website from an established site with good ranking and a different CMS ( Drupal switching to Wordpress)?
Technical SEO | | OhYeahSteve0 -
URL's for news content
We have made modifications to the URL structure for a particular client who publishes news articles in various niche industries. In line with SEO best practice we removed the article ID from the URL - an example is below: http://www.website.com/news/123/news-article-title
Technical SEO | | mccormackmorrison
http://www.website.com/news/read/news-article-title Since this has been done we have noticed a decline in traffic volumes (we have not as yet assessed the impact on number of pages indexed). Google have suggested that we need to include unique numerical IDs in the URL somewhere to aid spidering. Firstly, is this policy for news submissions? Secondly (if the previous answer is yes), is this to overcome the obvious issue with the velocity and trend based nature of news submissions resulting in false duplicate URL/ title tag violations? Thirdly, do you have any advice on the way to go? Thanks P.S. One final one (you can count this as two question credits if required), is it possible to check the volume of pages indexed at various points in the past i.e. if you think that the number of pages being indexed may have declined, is there any way of confirming this after the event? Thanks again! Neil0