Sitemap.xml - autogenerated by CMS is full of crud
-
Hi all,
hope you can help.
the Magento ecommerce system I'm working with autogenerates sitemap.xml - it's well formed with priority and frequency parameters.
However, it has generated lots of URLs that are pointing to broken pages returning fatal erros, duplicate URLs (not canonicals), 404s etc
I'm thinking of hand creating sitemap.xml - the site has around 50 main pages including products and categories, and I can get the main page URLs listed by screaming frog or xenu.
Then I'll have to get into the hand editing the crud pages with noindex, and useful duplicates with canonicals.
Is this the way to go or is there another solution
thanks in advance for any advice
-
If the cron is working then I would personally turn to the other forum to see if anyone knows a way to rope those messy URLs in and get them under control. I try to avoid manually generating and updating sitemaps whenever I can, because it's a hassle on a small site, not to mention the trouble on an ecommerce site.
If your site is going to stay that small, then a manual sitemap might be less of a headache for you than customizing Magento.
I would worry about keeping a clean sitemap. If the search engines learn that you keep a messy sitemap, they will rely on it less and less. 404 & 500 codes especially, but also redirects and perhaps duplicate content.
For Further Reading:
Google Sitemaps Ask For Clean URLs - http://www.johnfdoherty.com/google-sitemaps-ask-for-clean-urls/
-
Hi Kane,
the sitemap is new - it's just that Magento create lots of duplicate files on the fly & it's not putting the canonical URLs in the sitemap etc.
I just wondered whether its worth hand creating a sitemap.xml containing the content pages (60 or 70 of them) for this relatively small site, or not worry too much about the sitemap, the site is pretty well indexed by google already
I'll head over to the Magento forums again to see if I can find more info
many thanks for you help
-
If it's returning 404 pages, that sounds like a dated sitemap. Have you activated the cron service?
See the "Refreshing Sitemaps at Regular Intervals" section of this page if not:
Magento can be set up to automatically refresh Google Sitemaps at regular intervals. This function is configured in Admin > System > Configuration > Google Sitemap.
To use Magento’s automatic generation of Google Sitemaps, you must activate the Magento Cron service.
If you do have that setup, and you're certain it's working correctly, then I would turn to the forums at MagentoCommerce.com - you're going to get a lot faster answer there since everyone is familiar with that exact platform.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
XML sitemap and rel alternate hreflang requirements for Google Shopping
Our company implemented Google Shopping for our site for multiple countries, currencies and languages. Every combination of language and country is accessible via a url path and for all site pages, not just the pages with products for sale. I was not part of the project. We support 18 languages and 14 shop countries. When the project was finished we had a total of 240 language/country combinations listed in our rel alternate hreflang tags for every page and 240 language/country combinations in our XML sitemap for each page and canonicals are unique for every one of these page. My concern is with duplicate content. Also I can see odd language/country url combinations (like a country with a language spoken by a very low percentage of people in that country) are being crawled, indexed, and appearing in serps. This uses up my crawl budget for pages I don't care about. I don't this it is wise to disallow urls in robots.txt for that we are simultaneously listing in the XML sitemap. Is it true that these are requirements for Google Shopping to have XML sitemap and rel alternate hreflang for every language/country combination?
Technical SEO | | awilliams_kingston0 -
Sudden Drop in Indexed Pages and Images under Sitemap
Hello! Just a couple days back, realised that under the Google Webmaster Tool > Sitemap, my website www.bibliotek.co has a sudden drop in indexed pages and images. Previously, it was almost fully indexed. However, I checked and the Google Index > Index Status, it is still fully indexed Any reason why and how do I resolve? Any help is very much appreciated! Thanks in advance!
Technical SEO | | Bibliotek1230 -
301 Redirects Relating to Your XML Sitemap
Lets say you've got a website and it had quite a few pages that for lack of a better term were like an infomercial, 6-8 pages of slightly different topics all essentially saying the same thing. You could all but call it spam. www.site.com/page-1 www.site.com/page-2 www.site.com/page-3 www.site.com/page-4 www.site.com/page-5 www.site.com/page-6 Now you decided to consolidate all of that information into one well written page, and while the previous pages may have been a bit spammy they did indeed have SOME juice to pass through. Your new page is: www.site.com/not-spammy-page You then 301 redirect the previous 'spammy' pages to the new page. Now the question, do I immediately re-submit an updated xml sitemap to Google, which would NOT contain all of the old URL's, thus making me assume Google would miss the 301 redirect/seo juice. Or do I wait a week or two, allow Google to re-crawl the site and see the existing 301's and once they've taken notice of the changes submit an updated sitemap? Probably a stupid question I understand, but I want to ensure I'm following the best practices given the situation, thanks guys and girls!
Technical SEO | | Emory_Peterson0 -
Linking to CMS page ID
Hi all, Is it that detrimental to SEO if you link to the CMS page ID of a URL rather than the text URL of a page even if when you look at the source code Google sees it as a text URL? Thanks! 🙂
Technical SEO | | Diana.varbanescu0 -
Best XML Sitemap Generator for Mac?
Hi all, Recently moved from PC to Mac when starting a new job. One of the things I'm missing from my PC is G Site Crawler, and I haven't yet found a decent equivalent for the Mac. Can anybody recommend something as good as G Site Crawler for the Mac? I.e. I need the flexibility to exclude by URL parameter etc etc. Cheers everyone, Mark
Technical SEO | | markadoi840 -
301 technical issue / sitemap being found in search
Hi I recently did some 301 redirects. But now when I type into the search results 'Kyocera black and white printers' my site map is coming up in the listings - does any have any advise as to why this may be happening and how to fix it? Please see image attached, thanks. 1XqHW.jpg
Technical SEO | | Socialdude0 -
Sitemap.xml problem in Google webmaster
Hi, My sitemap.xml is not submitting correctly in Google Webmaster. There is 697 url submitted but only 56 are in Google index. At the top of webmaster this is what it says ->>> http://www.example.com/sitemap.xml has been resubmitted. But when when I clicked status button RED X occurs. Any suggestions about this, thanks...
Technical SEO | | Socialdude0 -
Sitemap with links and images together
Hi there, my e-commerce platform (Magento with "Mageworx SEO Enterprise" plugin) is generating an sitemap.xml that mix text (links) and images, and the result is something like that: <url><loc>http://www.e-lustre.com.br/abajur/abajur-keops</loc>
Technical SEO | | e-Lustre
<lastmod>2011-04-10</lastmod>
<changefreq>daily</changefreq>
<priority>1.0</priority>
image:imageimage:lochttp://www.e-lustre.com.br/catalog/product/image/size/250x250/e/-/e- lustre_mantra_0030_grande.jpg</image:loc></image:image></url> WebmasterTools accepts it, but recognize it as an image sitemap. Have you seen that kind of sitemap, and most important, do you think that's a problem ? Full file: http://www.e-lustre.com.br/sitemap.xml0