Sitemap url's not being indexed
-
There is an issue on one of our sites regarding many of the sitemap url's not being indexed. (at least 70% is not being indexed)
The url's in the sitemap are normal url's without any strange characters attached to them, but after looking into it, it seems a lot of the url's get a #. + a number sequence attached to them once you actually go to that url. We are not sure if the "addthis" bookmark could cause this, or if it's another script doing it.
For example
Url in the sitemap: http://example.com/example-category/0246
Url once you actually go to that link: http://example.com/example-category/0246#.VR5a
Just for further information, the XML file does not have any style information associated with it and is in it's most basic form.
Has anyone had similar issues with their sitemap not being indexed properly ?...Could this be the cause of many of these url's not being indexed ?
Thanks all for your help.
-
Anders,
Thanks for the reply. I definitely agree a self referring canonical might just be a good extra addition on these product pages, so I'm definitely adding that to our list of to do's if it does not improve.
In terms of indexing pages - We have not restricted crawl frequency, we have it set to "allow google to determine the optimal crawl rate". No other warnings found within the search console either.
Thanks for your help.
-
I agree - i probably would ignore everything after the "#".
But have you tried added a <link rel="canonical" href="http://example.com/page-url" /> to your pages and see if this will update it? Also: Add the sitemap to your robots.txt if not allready done.
Regarding indexing pages - have you restricted crawl frequency in Google Search Console, or is it set to be determined by GoogleBot? Any other warnings or messages in Search Console?
Best regards,
Anders -
Lesley,
Thanks for the confirmation on that one and the article. Since it doesn't seem like a lot of people on the site are using that address share function, I do not think it would do any harm to remove it.
At least we know the root cause of why it's doing it to the url's. Now the real question is...could it be getting in the way of indexing those url's ?...one would think not, as from what I've read, google would simply ignore what comes after the #.
Thoughts ?
Appreciate the help.
-
Patrick,
We'd prefer to keep the actual url's private, however I can provide further information to help hopefully allow the community to dissect this further:
- It's an E-commerce website, meaning many facets, filters, and possible duplicate content angles
- It seems many of the static pages (/products main page, /contact,etc) are indexed, however it seems the individual products are mostly not being indexed through the sitemap
- While the url's found in webmaster tools under "index" has also steadily been going down, it definitely doesn't correspond with the lack of pages indexed vs submitted within the sitemap
- We have checked robots.txt, and it is not blocking any important pages. (I also had them allow robots to crawl css and js so google could have full access)
- The individual product pages all have the "addthis" feature, meaning they all have a #. + number sequence added to the url's. However one would think this wouldn't be the cause of this lack of indexation ?
Thanks for your help.
-
Yes, add this is doing this to your url. I hate it, that is one reason why I do not use them.
Here is an article on how to remove them, http://support.addthis.com/customer/portal/articles/1013558-removing-all-hashtags-anchors-weird-codes-from-your-urls
-
Hi there
Could you provide you website's URL? It would help the community take a deeper look - thanks!
Good luck!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Clean URL vs. Parameter URL and Using Canonical URL...That's a Mouthfull!
Hi Everyone, I a currently migrating a Magento site over to Shopify Plus and have a question about best practices for using the canonical URL. There is a competitor that I believe is not doing it the correct way, so I want to make sure my way is the better choice. With 'Vendor Pages' in Shopify, they show up looking like: https://www.campusprotein.com/collections/vendors?q=Cellucor. Not as clean. Problem is that Shopify also creates https://www.campusprotein.com/collections/cellucor. Same products, same page, just a different more clean URL. I am seeing both indexed in Google. What I want to do is basically create a canonical URL from the URL with the parameter that points to the clean URL. The two pages are very similar. The only difference is that the clean URL page has some additional content at the top of the page. I would say the two pages are 90% the same. Do you see any issue with that?
Technical SEO | | vetofunk0 -
Why are only PDFs on my client's site being indexed, and not actual pages?
My client has recently built a new site (we did not build this), which is a subdomain of their main site. The new site is: https://addstore.itelligencegroup.com/uk/en/. (Their main domain is: http://itelligencegroup.com/uk/) This new Addstore site has recently gone live (in the past week or so) and so far, Google appears to have indexed 56 pdf files that are on the site, but it hasn't indexed any of the actual web pages yet. I can't figure out why though. I've checked the robots.txt file for the site which appears to be fine: https://addstore.itelligencegroup.com/robots.txt. Does anyone have any ideas about this?
Technical SEO | | mfrgolfgti0 -
Unused url 'A' contains frameset - can it damage the other site B?
Client has an old unused site 'A' which I've discovered during my backlink research. It contains this source code below which frames the client's 'proper' site B inside the old unused url A in the browser address. Quick question - will google penalise the website B which is the one I'm optimising? Should the client be using a redirect instead? <frameset <span class="webkit-html-attribute-name">border='0' frameborder='0' framespacing='0'></frameset <span> <frame src="http: www.clientwebsite.co.ukb" frameborder="0" noresize="noresize" scrolling="yes"></frame src="http:> Please go to http://www.clientwebsite.co.ukB <noframes></noframes> Thanks, Lu.
Technical SEO | | Webrevolve0 -
What's best practice for blog meta titles?
I have the option of placing meta titles on the actual blog, or on the blog category on my site. Should I have separate meta titles for each blog or bundle them under a category and try to drive traffic to the category? Can anyone help with best practice?
Technical SEO | | Lubeman0 -
SEOMoz is indicating I have 40 pages with duplicate content, yet it doesn't list the URL's of the pages???
When I look at the Errors and Warnings on my Campaign Overview, I have a lot of "duplicate content" errors. When I view the errors/warnings SEOMoz indicates the number of pages with duplicate content, yet when I go to view them the subsequent page says no pages were found... Any ideas are greatly welcomed! Thanks Marty K.
Technical SEO | | MartinKlausmeier0 -
What's the best way to eliminate duplicate page content caused by blog archives?
I (obviously) can't delete the archived pages regardless of how much traffic they do/don't receive. Would you recommend a meta robot or robot.txt file? I'm not sure I'll have access to the root directory so I could be stuck with utilizing a meta robot, correct? Any other suggestions to alleviate this pesky duplicate page content issue?
Technical SEO | | ICM0 -
If a page isn't linked to or directly sumitted to a search engine can it get indexed?
Hey Guys, I'm curious if there are ways a page can get indexed even if the page isn't linked to or hasn't been submitted to a search engine. To my knowledge the following page on our website is not linked to and we definitely didn't submit it to Google - but it's currently indexed: <cite>takelessons.com/admin.php/adminJobPosition/corp</cite> Anyone have any ideas as to why or how this could have happened? Hopefully I'm missing something obvious 🙂 Thanks, Jon
Technical SEO | | TakeLessons0 -
Google has not indexed my site in over 4 weeks, what's the problem?
We recently put in permanent redirects to our new url, but Google seems to not want to index the new url. There was no problems with the old url and the new url is brand new so should have no 'black marks' against it. We have done everything we can think off in terms of submitting site maps, telling google our url has changed in webmaster tools, mentioning the new url on social sites etc...but still nothing. It has been over 4 weeks now since we set up the redirects to the url, any ideas why Google seems to be choosing not to index it? Thanks
Technical SEO | | cewe0