Sitemap url's not being indexed
-
There is an issue on one of our sites regarding many of the sitemap url's not being indexed. (at least 70% is not being indexed)
The url's in the sitemap are normal url's without any strange characters attached to them, but after looking into it, it seems a lot of the url's get a #. + a number sequence attached to them once you actually go to that url. We are not sure if the "addthis" bookmark could cause this, or if it's another script doing it.
For example
Url in the sitemap: http://example.com/example-category/0246
Url once you actually go to that link: http://example.com/example-category/0246#.VR5a
Just for further information, the XML file does not have any style information associated with it and is in it's most basic form.
Has anyone had similar issues with their sitemap not being indexed properly ?...Could this be the cause of many of these url's not being indexed ?
Thanks all for your help.
-
Anders,
Thanks for the reply. I definitely agree a self referring canonical might just be a good extra addition on these product pages, so I'm definitely adding that to our list of to do's if it does not improve.
In terms of indexing pages - We have not restricted crawl frequency, we have it set to "allow google to determine the optimal crawl rate". No other warnings found within the search console either.
Thanks for your help.
-
I agree - i probably would ignore everything after the "#".
But have you tried added a <link rel="canonical" href="http://example.com/page-url" /> to your pages and see if this will update it? Also: Add the sitemap to your robots.txt if not allready done.
Regarding indexing pages - have you restricted crawl frequency in Google Search Console, or is it set to be determined by GoogleBot? Any other warnings or messages in Search Console?
Best regards,
Anders -
Lesley,
Thanks for the confirmation on that one and the article. Since it doesn't seem like a lot of people on the site are using that address share function, I do not think it would do any harm to remove it.
At least we know the root cause of why it's doing it to the url's. Now the real question is...could it be getting in the way of indexing those url's ?...one would think not, as from what I've read, google would simply ignore what comes after the #.
Thoughts ?
Appreciate the help.
-
Patrick,
We'd prefer to keep the actual url's private, however I can provide further information to help hopefully allow the community to dissect this further:
- It's an E-commerce website, meaning many facets, filters, and possible duplicate content angles
- It seems many of the static pages (/products main page, /contact,etc) are indexed, however it seems the individual products are mostly not being indexed through the sitemap
- While the url's found in webmaster tools under "index" has also steadily been going down, it definitely doesn't correspond with the lack of pages indexed vs submitted within the sitemap
- We have checked robots.txt, and it is not blocking any important pages. (I also had them allow robots to crawl css and js so google could have full access)
- The individual product pages all have the "addthis" feature, meaning they all have a #. + number sequence added to the url's. However one would think this wouldn't be the cause of this lack of indexation ?
Thanks for your help.
-
Yes, add this is doing this to your url. I hate it, that is one reason why I do not use them.
Here is an article on how to remove them, http://support.addthis.com/customer/portal/articles/1013558-removing-all-hashtags-anchors-weird-codes-from-your-urls
-
Hi there
Could you provide you website's URL? It would help the community take a deeper look - thanks!
Good luck!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Redirect indexed lightbox URLs?
Hello all, So I'm doing some technical SEO work on a client website and wanted to crowdsource some thoughts and suggestions. Without giving away the website name, here is the situation: The website has a dedicated /resources/ page. The bulk of the Resources are industry definitions, all encapsulated in colored boxes. When you click on the box, the definition opens in a lightbox with its own unique URL (Ex: /resources/?resource=augmented-reality). The information for these colored lightbox definitions is pulled from a normal resources page (Ex: /resources/augmented-reality/). Both of these URLs are indexed, leading to a lot of duplicate indexed content. How would you approach this? **Things to Consider: ** -Website is built on Wordpress with a custom theme.
Technical SEO | | Alces
-I have no idea how to even find settings for the lightbox (will be asking the client today).
-Right now my thought is to simply disallow the lightbox URL in robots.txt and hope Google will stop crawling and eventually drop from the index.
-I've considered adding the main resource page canonical to the lightbox URL, but it appears to be dynamically created and thus there is no place to access (outside of the FTP, I imagine?). I'm most rusty with stuff like this, so figured I'd appeal to the masses for some assistance. Thanks! -Brad0 -
Google tries to index non existing language URLs. Why?
Hi, I am working for a SAAS client. He uses two different language versions by using two different subdomains.
Technical SEO | | TheHecksler
de.domain.com/company for german and en.domain.com for english. Many thousands URLs has been indexed correctly. But Google Search Console tries to index URLs which were never existing before and are still not existing. de.domain.com**/en/company
en.domain.com/de/**company ... and an thousand more using the /en/ or /de/ in between. We never use this variant and calling these URLs will throw up a 404 Page correctly (but with wrong respond code - we`re fixing that 😉 ). But Google tries to index these kind of URLs again and again. And, I couldnt find any source of these URLs. No Website is using this as an out going link, etc.
We do see in our logfiles, that a Screaming Frog Installation and moz.com w opensiteexplorer were trying to access this earlier. My Question: How does Google comes up with that? From where did they get these URLs, that (to our knowledge) never existed? Any ideas? Thanks 🙂0 -
Sitemap.gz is being indexed and is showing up in SERP instead of actual pages.
Sitemap.gz is being indexed and is showing up in SERP instead of actual pages. I recently uploaded my sitemap file - https://psglearning.com/sitemapcustom/sitemap-index.xml - via Search Console. The only record within the XML file is sitemaps.gz. When I searched for some content on my site - here is the search https://goo.gl/mqxBeq - I was shown the following search result, indicating that our GZ file is getting indexed instead of our pages. http://www.psglearning.com/catalog 1 http://www.psglearning.com ...www.psglearning.com/sitemapcustom/sitemap.gz... 1 https://www.psglearning.com/catalog/productdetails/9781284059656/ 1 https://www.psglearning.com/catalog/productdetails/9781284060454/ 1 ... My sitemap is listed at https://psglearning.com/sitemapcustom/sitemap-index.xml inside the sitemap the only reference is to sitemap.gz. Should we remove the link the the sitemap.gz within the xml file and just serve the actual page paths? <sitemapindex< span=""> xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></sitemapindex<><sitemap></sitemap>https://www.psglearning.com/sitemapcustom/sitemap.gz<lastmod></lastmod>2017-06-12T09:41-04:00
Technical SEO | | pdowling0 -
To integrate a blog tool onto site - or build a blog solution - what's better for SEO?
Currently looking at adding a blog to our company site subdirectory and wanted to know if there was a SEO distinction between the following methods: Integrating a bolt-on blog tool with the site to create the blog VS. just using the current site infrastructure to build blog functionality. What's better for SEO? (and if tool integration is the overwhelming response - which tool?). Cheers.
Technical SEO | | Oxfordcomma0 -
Any idea why our sitemap images aren't indexed?
Here's our sitemap: http://www.driftworks.com/shop/sitemap/dw_sitemap.xml In google webmaster tools, I can see the sitemap report and it says: Items:Web Submitted:2,798 Indexed:2,910 Items:Images Submitted:3,178 Indexed:0 Do you have any idea why our images are not being indexed according to webmaster tools? I checked a few of the image URLs and they worked nicely. Thanks in advance, J
Technical SEO | | DWJames0 -
Rel cannonical on all my URL's
Hi, sorry if this question has already been asked, but I can't seem to find the correct answer. In my crawling report for the domain: http://www.wellbo.de I get rel cannonical notices. I have redirected all pages of http://wellbo.de to http://www.wellbo.de with a 301 redirect. Where is my error? Why do I get these notices? I hope the image helps. Ep7Rw.jpg
Technical SEO | | wellbo0 -
How do I 301 url's with numbers in them?
I have a number of 404 error pages showing in webmaster tools and some of the url's have numbers, % symbols, and some are pdf's. My usual 301 redirect in my htaccess file does NOT redirect these pages where the url's have special characters. What am I doing wrong?
Technical SEO | | BradBorst0 -
A sitemap... What's the purpose?
Hello everybody, my question is really simple: what's the purpose of a sitemap? It's to help the robots to crawl your website but if you're website has a good architecture, the robots will be able to crawl your site easily! Am I wrong? Thank you for yours answers, Jonathan
Technical SEO | | JonathanLeplang0