Sitemap URLs not being indexed
-
One of our sites has an issue: many of the URLs in its sitemap are not being indexed (at least 70% are not).
The URLs in the sitemap are normal URLs without any strange characters attached, but on closer inspection many of them get a "#." plus a number sequence appended once you actually visit them. We are not sure whether the "AddThis" bookmark widget causes this or whether another script is doing it.
For example:
URL in the sitemap: http://example.com/example-category/0246
URL once you actually go to that link: http://example.com/example-category/0246#.VR5a
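To make the difference between the two URLs concrete, here is a minimal Python sketch using the standard library. The part after the "#" is a URL fragment, which the browser keeps to itself and does not send to the server, so stripping it should give back the sitemap URL. The helper name `strip_fragment` is just for illustration.

```python
from urllib.parse import urldefrag

# The two URLs from the example above.
sitemap_url = "http://example.com/example-category/0246"
browser_url = "http://example.com/example-category/0246#.VR5a"


def strip_fragment(url: str) -> str:
    """Return the URL without its '#...' fragment (illustrative helper)."""
    return urldefrag(url).url


# Compare the browser URL, minus its fragment, with the sitemap entry.
same_resource = strip_fragment(browser_url) == sitemap_url
print(same_resource)
```

If the comparison holds for your real URLs, the appended code is only a fragment and the page being requested is the same one listed in the sitemap.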
For further information, the XML file does not have any style information associated with it and is in its most basic form.
Has anyone had similar issues with their sitemap not being indexed properly? Could this be the cause of many of these URLs not being indexed?
Thanks all for your help.
-
Anders,
Thanks for the reply. I agree that a self-referencing canonical would be a good addition to these product pages, so I'm adding it to our to-do list in case indexing does not improve.
In terms of indexing pages: we have not restricted crawl frequency; it is set to "allow Google to determine the optimal crawl rate". No other warnings appear in Search Console either.
Thanks for your help.
-
I agree: everything after the "#" will probably be ignored.
But have you tried adding a <link rel="canonical" href="http://example.com/page-url" /> to your pages to see whether that updates it? Also, add the sitemap to your robots.txt if you haven't already.
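As a rough sketch of both suggestions in Python (standard library only): the first helper checks whether a robots.txt declares a sitemap, and the second builds a self-referencing canonical tag from a URL after dropping any "#..." fragment. The robots.txt content, the `/cart/` rule, the sitemap location, and both function names are assumptions made up for illustration, not taken from the site in question.

```python
from html import escape
from urllib.parse import urldefrag
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the Disallow rule and sitemap URL are assumed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Sitemap: http://example.com/sitemap.xml
"""


def sitemaps_declared(robots_txt: str) -> list[str]:
    """Return the Sitemap URLs listed in a robots.txt body (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present (Python 3.8+).
    return parser.site_maps() or []


def canonical_tag(url: str) -> str:
    """Build a self-referencing canonical tag, dropping any '#...' fragment."""
    clean_url = urldefrag(url).url
    return f'<link rel="canonical" href="{escape(clean_url, quote=True)}" />'


print(sitemaps_declared(ROBOTS_TXT))
print(canonical_tag("http://example.com/example-category/0246#.VR5a"))
```

An empty list from `sitemaps_declared` would mean the Sitemap line still needs to be added to robots.txt.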
Regarding indexing pages: have you restricted crawl frequency in Google Search Console, or is it set to be determined by Googlebot? Any other warnings or messages in Search Console?
Best regards,
Anders
-
Lesley,
Thanks for confirming that, and for the article. Since few people on the site seem to use the AddThis share function, I don't think removing it would do any harm.
At least we now know the root cause of what is happening to the URLs. The real question is whether it could be getting in the way of indexing them. One would think not, since from what I've read Google simply ignores whatever comes after the #.
Thoughts?
Appreciate the help.
-
Patrick,
We'd prefer to keep the actual URLs private, but I can provide further information to help the community dig into this:
- It's an e-commerce website, meaning many facets, filters, and possible duplicate-content angles
- Many of the static pages (/products main page, /contact, etc.) are indexed, but the individual product pages submitted through the sitemap are mostly not being indexed
- The number of URLs reported in Webmaster Tools under "Index" has also been going down steadily, but that decline does not account for the gap between pages submitted and pages indexed in the sitemap report
- We have checked robots.txt, and it is not blocking any important pages. (I also had them allow robots to crawl CSS and JS so Google has full access.)
- The individual product pages all have the "AddThis" feature, so they all get a "#." plus a number sequence appended to their URLs. However, one would think this wouldn't cause the lack of indexation?
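One way to check the "static pages are indexed, product pages are not" pattern is to count the sitemap's URLs by their first path segment and compare those counts with whatever list of indexed URLs you can export. Below is a minimal Python sketch of the counting half only. The sample sitemap is made up (the /0247 entry is hypothetical), and the function names are illustrative.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlparse

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Made-up sample; a real sitemap would be read from disk or fetched.
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/contact</loc></url>
  <url><loc>http://example.com/example-category/0246</loc></url>
  <url><loc>http://example.com/example-category/0247</loc></url>
</urlset>
"""


def sitemap_urls(xml_text: str) -> list[str]:
    """Extract every <loc> value from a sitemap document."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def count_by_section(urls: list[str]) -> Counter:
    """Count URLs by their first path segment ('/' for the home page)."""
    counts: Counter = Counter()
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        counts[segments[0] if segments else "/"] += 1
    return counts


print(count_by_section(sitemap_urls(SAMPLE_SITEMAP)))
```

Running the same count over an exported list of indexed URLs would show which sections account for the missing pages.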
Thanks for your help.
-
Yes, AddThis is doing this to your URLs. I hate it; that is one reason I do not use it.
Here is an article on how to remove them: http://support.addthis.com/customer/portal/articles/1013558-removing-all-hashtags-anchors-weird-codes-from-your-urls
-
Hi there
Could you provide your website's URL? It would help the community take a deeper look. Thanks!
Good luck!