Sitemap.xml - autogenerated by CMS is full of crud
-
Hi all,
hope you can help.
the Magento ecommerce system I'm working with autogenerates sitemap.xml - it's well formed with priority and frequency parameters.
However, it has generated lots of URLs that are pointing to broken pages returning fatal erros, duplicate URLs (not canonicals), 404s etc
I'm thinking of hand creating sitemap.xml - the site has around 50 main pages including products and categories, and I can get the main page URLs listed by screaming frog or xenu.
Then I'll have to get into the hand editing the crud pages with noindex, and useful duplicates with canonicals.
Is this the way to go or is there another solution
thanks in advance for any advice
-
If the cron is working then I would personally turn to the other forum to see if anyone knows a way to rope those messy URLs in and get them under control. I try to avoid manually generating and updating sitemaps whenever I can, because it's a hassle on a small site, not to mention the trouble on an ecommerce site.
If your site is going to stay that small, then a manual sitemap might be less of a headache for you than customizing Magento.
I would worry about keeping a clean sitemap. If the search engines learn that you keep a messy sitemap, they will rely on it less and less. 404 & 500 codes especially, but also redirects and perhaps duplicate content.
For Further Reading:
Google Sitemaps Ask For Clean URLs - http://www.johnfdoherty.com/google-sitemaps-ask-for-clean-urls/
-
Hi Kane,
the sitemap is new - it's just that Magento create lots of duplicate files on the fly & it's not putting the canonical URLs in the sitemap etc.
I just wondered whether its worth hand creating a sitemap.xml containing the content pages (60 or 70 of them) for this relatively small site, or not worry too much about the sitemap, the site is pretty well indexed by google already
I'll head over to the Magento forums again to see if I can find more info
many thanks for you help
-
If it's returning 404 pages, that sounds like a dated sitemap. Have you activated the cron service?
See the "Refreshing Sitemaps at Regular Intervals" section of this page if not:
Magento can be set up to automatically refresh Google Sitemaps at regular intervals. This function is configured in Admin > System > Configuration > Google Sitemap.
To use Magento’s automatic generation of Google Sitemaps, you must activate the Magento Cron service.
If you do have that setup, and you're certain it's working correctly, then I would turn to the forums at MagentoCommerce.com - you're going to get a lot faster answer there since everyone is familiar with that exact platform.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Getting 'Indexed, not submitted in sitemap' for around a third of my site. But these pages ARE in the sitemap we submitted.
As in the title, we have a site with around 40k pages, but around a third of them are showing as "Indexed, not submitted in sitemap" in Google Search Console. We've double-checked the sitemaps we have submitted and the URLs are definitely in the sitemap. Any idea why this might be happening? Example URL with the error: https://www.teacherstoyourhome.co.uk/german-tutor/Egham Sitemap it is located on: https://www.teacherstoyourhome.co.uk/sitemap-subject-locations-surrey.xml
Technical SEO | | TTYH0 -
Should I add my html sitemap to Robots?
I have already added the .xml to Robots. But should I also add the html version?
Technical SEO | | Trazo0 -
Duplicate Content due to CMS
The biggest offender of our website's duplicate content is an event calendar generated by our CMS. It creates a page for every day of every year, up to the year 2100. I am considering some solutions: 1. Include code that stops search engines from indexing any of the calendar pages 2. Keep the calendar but re-route any search engines to a more popular workshops page that contains better info. (The workshop page isn't duplicate content with the calendar page). Are these solutions possible? If so, how do the above affect SEO? Are there other solutions I should consider?
Technical SEO | | ycheung0 -
Should 301-ed links be removed from sitemap?
In an effort to do some housekeeping on our site we are wanting to change the URL format for a couple thousand links on our site. Those links will all been 301 redirected to corresponding links in the new URL format. For example, old URL format: /tag/flowers as well as search/flowerswill be 301-ed to, new URL format: /content/flowers**Question:**Since the old links also exist in our sitemap, should we add the new links to our sitemap in addition to the old links, or replace the old links with new ones in our sitemap? Just want to make sure we don’t lose the ranking we currently have for the old links.Any help would be appreciated. Thanks!
Technical SEO | | shawn811 -
Google Webmaster tools: Sitemap.xml not processed everyday
Hi, We have multiple sites under our google webmaster tools account with each having a sitemap.xml submitted Each site's sitemap.xml status ( attached below ) shows it is processed everyday except for one _Sitemap: /sitemap.xml__This Sitemap was submitted Jan 10, 2012, and processed Oct 14, 2013._But except for one site ( coed.com ) for which the sitemap.xml was processed only on the day it is submitted and we have to manually resubmit every day to get it processed.Any idea on why it might?thank you
Technical SEO | | COEDMediaGroup0 -
Sitemap nos being indexed
Hi! How are you? I'm having a problem: for some reason I don't understand, Google Webmasters Tool isn't indexing the sitemaps I'm uploading. One of them is http://chelagarto.com/index.php?option=com_xmap&sitemap=1&view=xml&lang=en . Do you see what could be the problem? It says it only indexed 2 website. I've already sent this Sitemap several times and I'm always getting the same result. I'd really use some advice. Thanks!
Technical SEO | | arielbortz0 -
Siemap.xml appearing in SERP
My sitemap.xml was appearing in the google serp for certain keywords (& not my actual page onsite). Please see image. I recently blocked my sitemap.xml with a robots.txt exclusion but now the sitemap.xml is not getting crawled in google webmaster. Is this the correct method of excluding the sitemap.xml for the serp? User-agent: * Disallow: /assets/cache/ Disallow: /assets/docs/ Disallow: /assets/export/ Disallow: /assets/import/ Disallow: /assets/modules/ Disallow: /assets/plugins/ Disallow: /assets/snippets/ Disallow: /manager/ Disallow: /sitemap.xml Sitemap: http://bryansryan.ie/sitemap.xml Any suggestions what should be done here? thanks. nQo2g.png
Technical SEO | | Socialdude0 -
Does google recognize original content when affiliates use xml-feeds of this content
Hi, Concerning the upcoming (We're from the Netherlands) Panda release: -Could the fact that our affiliates use XML-feeds of our content effect our rankings in some way -Is it possible to indicate to google that content is yours? Kind regards, Dennis Overbeek dennis@acsi.eu | ACSI publishing | www.suncamp.nl | www.eurocampings.eu
Technical SEO | | SEO_ACSI0