Sitemap Query
-
I've decided to write my own sitemap because frankly, the automated ones pull all kinds of out of I don't know where. So to get around that, manual it is. But I have some products appear in various categories, should I still list every product in each category in the sitemap, regardless of some being duplicates, or should I choose the most relevant category and list them there?
I do have a canonical URL extension which should resolve any duplicate content I have.
-
Hi,
It's ideal if your XML sitemap is an accurate representation of URLs you want indexed, i.e. the canonical versions. If you're using Screaming Frog to manually build your sitemaps, you make sure the 'Include Canonicals' button is unchecked. Doing so will trigger Screaming Frog to automatically leave out any URLs that canonicalize towards another URL, thus solving your problem.
-
If you have canonical you can put all the links, because in this way Google-Bot will crawl all the pages and see easier which have canonical and which is unique.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Trouble Indexing one of our sitemaps
Hi everyone thanks for your help. Any feedback is appreciated. We have three separate sitemaps: blog/sitemap.xml events.xml sitemap.xml Unfortunately we keep trying to get our events sitemap to pickup and it just isn't happening for us. Any input on what could be going on?
Intermediate & Advanced SEO | | TicketCity0 -
Big discrepancies between pages in Google's index and pages in sitemap
Hi, I'm noticing a huge difference in the number of pages in Googles index (using 'site:' search) versus the number of pages indexed by Google in Webmaster tools. (ie 20,600 in 'site:' search vs 5,100 submitted via the dynamic sitemap.) Anyone know possible causes for this and how i can fix? It's an ecommerce site but i can't see any issues with duplicate content - they employ a very good canonical tag strategy. Could it be that Google has decided to ignore the canonical tag? Any help appreciated, Karen
Intermediate & Advanced SEO | | Digirank0 -
Best server-side sitemap generators
I've been looking into sitemap generators recently and have got a good knowledge of what creating a sitemap for a small website of below 500 URLs involves. I have successfully generated a sitemap for a very small site, but I’m trying to work out the best way of crawling a large site with millions of URLs. I’ve decided that the best way to crawl such a large number of URLs is to use a server side sitemap, but this is an area that doesn’t seem to be covered in detail on SEO blogs / forums. Could anyone recommend a good server side sitemap generator? What do you think of the automated offerings from Google and Bing? I’ve found a list of server side sitemap generators from Google, but I can’t see any way to choose between them. I realise that a lot will depend on the type of technologies we use server side, but I'm afraid that I don't know them at this time.
Intermediate & Advanced SEO | | RG_SEO0 -
Indexing falling/search queries the same - concerned
Hello, I posted abou this a few days ago but didn't really get anywhere and now have new information after looking into it more. This is my site - http://www.whosjack.org My page indexing has been falling steadily daily currently from thousands of pages indexed to just a couple of hundred. My search queries don't seem to be currently affected, I have done crawl tests to see if the site can be crawled and put the site:whosjack.org into Google and had 12,000 results come back when goole has said it has indexed 133 and falling. However all pages indexed on the site:whosjack.org search seem to be stories with just two words in the title? I am sure I am missing out on traffic here but can't work out what the issue is and how to fix it. I have no alerts on my dashboard and when I submit sitemaps to webmaster tools I get 15,115 URLs submitted 12,088 URLs indexedwhich cant be bad?Any help/suggestions really appreciated.
Intermediate & Advanced SEO | | luwhosjack0 -
302 redirects in the sitemap?
My website uses a prefix at the end to instruct the back-end about visitor details. The setup is similar to this site - http://sanfrancisco.giants.mlb.com/index.jsp?c_id=sf with a 302 redirect from the normal link to the one with additional info and a canonical tag on the actual URL without the extra info ((the normal one here being http://sanfrancisco.giants.mlb.com,) However, when I used www.xml-sitemaps.com to create a sitemap they did so using the URLs with the extra info on the links... what should I do to create a sitemap using the normal URLs (which are the ones I want to be promoting)
Intermediate & Advanced SEO | | theLotter0 -
Sitemap Folders on Search Results
Hello! We are managing SEO campaign of a video website. We have an issue about sitemap folders. I have sitemaps like ** /xml/sitemap-name.xml .** But Google is indexing my /xml/ folder and also sitemaps and they appear in search results. If i will add Disallow: /xml/ to my robots.txt and remove /xml/ folder from webmaster tools, Google could see my sitemaps? or it ignores them? Will my site effect negatively after remove /xml/ folder completely from search results? What should i do?
Intermediate & Advanced SEO | | roipublic0 -
Is there a way to keep sitemap.xml files from getting indexed?
Wow, I should know the answer to this question. Sitemap.xml files have to be accessible to the bots for indexing they can't be disallowed in robots.txt and can't block the folder at the server level. So how can you allow the bots to crawl these xml pages but have them not show up in google's index when doing a site: command search, or is that even possible? Hmmm
Intermediate & Advanced SEO | | irvingw0 -
XML Sitemap Index Percentage (Large Sites)
Hi all I'm wanting to find out from those who have experience dealing with large sites (10s/100s of millions of pages). What's a typical (or highest) percentage of indexed pages vs. submitted pages you've seen? This information can be found in webmaster tools where Google shows you the pages submitted & indexed for each of your sitemap. I'm trying to figure out whether, The average index % out there There is a ceiling (i.e. will never reach 100%) It's possible to improve the indexing percentage further Just to give you some background, sitemap index files (according to schema.org) have been implemented to improve crawl efficiency and I'm wanting to find out other ways to improve this further. I've been thinking about looking at the URL parameters to exclude as there are hundreds (e-commerce site) to help Google improve crawl efficiency and utilise the daily crawl quote more effectively to discover pages that have not been discovered yet. However, I'm not sure yet whether this is the best path to take or I'm just flogging a dead horse if there is such a ceiling or if I'm already at the average ballpark for large sites. Any suggestions/insights would be appreciated. Thanks.
Intermediate & Advanced SEO | | danng0