Sitemap URLs not being indexed
-
We have an issue on one of our sites: many of the sitemap URLs are not being indexed (at least 70% are not indexed).
The URLs in the sitemap are normal URLs without any strange characters attached to them, but after looking into it, it seems a lot of them get a "#." plus a character sequence appended once you actually visit the URL. We are not sure whether the "AddThis" bookmark widget causes this or whether another script is doing it.
For example:
URL in the sitemap: http://example.com/example-category/0246
URL once you actually go to that link: http://example.com/example-category/0246#.VR5a
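To make the difference between the two URLs concrete, here is a minimal Python sketch using only the standard library. It assumes the only difference is the part after the "#" (the fragment), which browsers keep on the client side and do not send to the server in the HTTP request. The helper name `strip_fragment` is made up for illustration.

```python
from urllib.parse import urldefrag

# The two URLs from the example above.
sitemap_url = "http://example.com/example-category/0246"
visited_url = "http://example.com/example-category/0246#.VR5a"


def strip_fragment(url: str) -> str:
    """Return the URL without its #fragment part (hypothetical helper)."""
    return urldefrag(url).url


# If the URLs match once the fragment is removed, both point at the same
# server-side resource and differ only in the client-side fragment.
same_resource = strip_fragment(visited_url) == strip_fragment(sitemap_url)
print(same_resource)
```

If this comparison holds for the affected pages, the appended sequence does not by itself create a second URL on the server.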
Just for further information, the XML file does not have any style information associated with it and is in its most basic form.
Has anyone had similar issues with their sitemap not being indexed properly? Could this be the reason many of these URLs are not being indexed?
Thanks all for your help.
-
Anders,
Thanks for the reply. I agree that a self-referencing canonical would be a good addition to these product pages, so I'm adding that to our to-do list in case indexing does not improve.
In terms of indexing pages: we have not restricted crawl frequency; it is set to "allow Google to determine the optimal crawl rate". No other warnings were found in Search Console either.
Thanks for your help.
-
I agree - I would probably ignore everything after the "#".
But have you tried adding a <link rel="canonical" href="http://example.com/page-url" /> to your pages to see if this updates it? Also, add the sitemap to your robots.txt if you have not already done so.
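As a rough way to check whether a page already declares a canonical, here is a small Python sketch using the standard library's `html.parser`. The class and function names (`CanonicalFinder`, `find_canonicals`) and the sample HTML are assumptions made for illustration; in practice you would feed it the HTML of one of your product pages.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag (illustrative)."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel can hold several space-separated tokens, so split before testing.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr.get("href"):
            self.canonicals.append(attr["href"])


def find_canonicals(html: str) -> list:
    """Return all canonical hrefs found in the given HTML string."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Sample markup using the placeholder URL from the reply above.
sample = (
    "<html><head>"
    '<link rel="canonical" href="http://example.com/page-url" />'
    "</head><body></body></html>"
)
print(find_canonicals(sample))
```

An empty list means no canonical is declared; more than one entry is worth fixing, since conflicting canonicals send mixed signals.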
Regarding indexing pages: have you restricted the crawl frequency in Google Search Console, or is it set to be determined by Googlebot? Are there any other warnings or messages in Search Console?
Best regards,
Anders
-
Lesley,
Thanks for the confirmation on that one and the article. Since it doesn't seem like a lot of people on the site are using that address share function, I do not think it would do any harm to remove it.
At least we now know the root cause of what is being appended to the URLs. The real question now is: could it be getting in the way of indexing those URLs? One would think not, since from what I've read, Google simply ignores what comes after the "#".
Thoughts ?
Appreciate the help.
-
Patrick,
We'd prefer to keep the actual URLs private, but I can provide further information to help the community dissect this:
- It's an e-commerce website, meaning many facets, filters, and possible duplicate content angles
- Many of the static pages (/products main page, /contact, etc.) are indexed, but the individual product pages submitted through the sitemap are mostly not being indexed
- The number of URLs reported under "Index" in Webmaster Tools has also been going down steadily, but that drop does not correspond with the gap between pages submitted and pages indexed in the sitemap report
- We have checked robots.txt, and it is not blocking any important pages. (I also had them allow robots to crawl CSS and JS so Google could have full access)
- The individual product pages all have the "AddThis" feature, meaning they all get a "#." plus a character sequence appended to their URLs. However, one would think this wouldn't be the cause of the lack of indexation?
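The robots.txt check mentioned above can be repeated in bulk against the sitemap URLs. Below is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt content, the /cart/ rule, the sitemap location, and the second test URL are all invented for illustration, and Python's parser does not reproduce every detail of Googlebot's matching (wildcards, for instance), so treat the result as a first pass only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; substitute the real file's text.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Sitemap: http://example.com/sitemap.xml
"""


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


# In practice this list would come from the sitemap's <loc> entries.
urls = [
    "http://example.com/example-category/0246",
    "http://example.com/cart/checkout",
]
print(blocked_urls(ROBOTS_TXT, urls))
```

Any product URL that shows up in the returned list would explain a "submitted but not indexed" gap on its own, independent of the "#" question.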
Thanks for your help.
-
Yes, AddThis is doing this to your URLs. I hate it; that is one reason why I do not use it.
Here is an article on how to remove them: http://support.addthis.com/customer/portal/articles/1013558-removing-all-hashtags-anchors-weird-codes-from-your-urls
-
Hi there
Could you provide your website's URL? It would help the community take a deeper look - thanks!
Good luck!