Sitemap.xml - autogenerated by CMS is full of crud
-
Hi all,
hope you can help.
the Magento ecommerce system I'm working with autogenerates sitemap.xml - it's well formed with priority and frequency parameters.
However, it has generated lots of URLs that are pointing to broken pages returning fatal erros, duplicate URLs (not canonicals), 404s etc
I'm thinking of hand creating sitemap.xml - the site has around 50 main pages including products and categories, and I can get the main page URLs listed by screaming frog or xenu.
Then I'll have to get into the hand editing the crud pages with noindex, and useful duplicates with canonicals.
Is this the way to go or is there another solution
thanks in advance for any advice
-
If the cron is working then I would personally turn to the other forum to see if anyone knows a way to rope those messy URLs in and get them under control. I try to avoid manually generating and updating sitemaps whenever I can, because it's a hassle on a small site, not to mention the trouble on an ecommerce site.
If your site is going to stay that small, then a manual sitemap might be less of a headache for you than customizing Magento.
I would worry about keeping a clean sitemap. If the search engines learn that you keep a messy sitemap, they will rely on it less and less. 404 & 500 codes especially, but also redirects and perhaps duplicate content.
For Further Reading:
Google Sitemaps Ask For Clean URLs - http://www.johnfdoherty.com/google-sitemaps-ask-for-clean-urls/
-
Hi Kane,
the sitemap is new - it's just that Magento create lots of duplicate files on the fly & it's not putting the canonical URLs in the sitemap etc.
I just wondered whether its worth hand creating a sitemap.xml containing the content pages (60 or 70 of them) for this relatively small site, or not worry too much about the sitemap, the site is pretty well indexed by google already
I'll head over to the Magento forums again to see if I can find more info
many thanks for you help
-
If it's returning 404 pages, that sounds like a dated sitemap. Have you activated the cron service?
See the "Refreshing Sitemaps at Regular Intervals" section of this page if not:
Magento can be set up to automatically refresh Google Sitemaps at regular intervals. This function is configured in Admin > System > Configuration > Google Sitemap.
To use Magento’s automatic generation of Google Sitemaps, you must activate the Magento Cron service.
If you do have that setup, and you're certain it's working correctly, then I would turn to the forums at MagentoCommerce.com - you're going to get a lot faster answer there since everyone is familiar with that exact platform.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Sitemaps:
Hello, doing an audit found in our sitemaps the tag which at the time was to say that the url was mobile. In our case the URL is the same for desktop and mobile.
Technical SEO | | romaro
Do you recommend leaving or removing it?
Thank you!0 -
Sitemap
Hi, I am setting up a new sitemap for our website. the website contains about 8000 - 10.000 pages. Of wich are 6000 productpages. I have 10 categories, about 80 sub-catagories and about 400 sub-sub categories ( these ar my most important landingpages) At this moment our sitemap is only 1 MB. From that point of view 1 sitemap will be enough. But can i take SEO advantage by splitting this sitemap in 10 categories? Or are there other ways to set it up for a better SEO? Thanks!
Technical SEO | | Leonie-Kramer0 -
Exclude Noindex, Followed pages from sitemap?
Hello Everyone! This is a question about my site, which is running on WordPress. Currently, I have category page to have the noindex, follow attributes, as they have little unique content. I do have them currently in my sitemap.xml file, however. Should I remove them from the sitemap since Google technically shouldn't index them? Thanks for your help!
Technical SEO | | Zachary_Russell0 -
Removing Redirected URLs from XML Sitemap
If I'm updating a URL and 301 redirecting the old URL to the new URL, Google recommends I remove the old URL from our XML sitemap and add the new URL. That makes sense. However, can anyone speak to how Google transfers the ranking value (link value) from the old URL to the new URL? My suspicion is this happens outside the sitemap. If Google already has the old URL indexed, the next time it crawls that URL, Googlebot discovers the 301 redirect and that starts the process of URL value transfer. I guess my question revolves around whether removing the old URL (or the timing of the removal) from the sitemap can impact Googlebot's transfer of the old URL value to the new URL.
Technical SEO | | RyanOD0 -
Best practice for XML sitemap depth
We run an eCommerce for education products with 20 or so subject based catalogues (Maths, Literacy etc) and each catalogue having numerous ranges (Counting, Maths Games etc) then products within those. We carry approximately 15,000 products. My question is around the sitemap we submit - nightly - and it's depth. It is currently set to cover off home, catalogues and ranges plus all static content (about us etc). Should we be submitting sitemaps to include product pages as well? Does it matter or would it not make much difference in terms of search. Thanks in advance.
Technical SEO | | TTS_Group0 -
Is it OK for a sitemap to appear as a "Top URL" in Google Webmaster?
I'm using Google Webmaster (alongside other tools) to understand how Google is indexing my site. One of the tools is "Content Keywords", where it lists keywords that Google sees as significant for your site. The keywords shown are generally fine, but when I click on an individual word, I am often seeing our sitemap as one of the "Top URLs" that the keyword is found on (our sitemap is at system/sitemap1.xml.gz) - is this OK? Obviously I don't want to add the sitemap URL to robots.txt, but I also want to ensure that 'real' user-focused pages (e.g. our homepage) appear higher in the "Top URLs" list for the keywords, as I'm assuming this is an indicator of how the site is performing in search. Any help appreciated!
Technical SEO | | anilababla0 -
Changed cms - google indexes old and new pages
Hello again, after posting below problem I have received this answer and changed sitemap name Still I receive many duplicate titles and metas as google still compares old urls to new ones and sees duplicate title and description.... we have redirectged all pages properly we have change sitemap name and new sitemap is listed in webmastertools - old sitemap includes ONLY new sitemap files.... When you deleted the old sitemap and created a new one, did you use the same sitemap xml filename? They will still try to crawl old URLs that were in your previous sitemap (even if they aren't listed in the new one) until they receive a 404 response from the original sitemap. If anone can give me an idea why after 3 month google still lists the old urls I'd be more than happy thanks a lot Hello, We have changed cms for our multiple language website and redirected all odl URl's properly to new cms which is working just fine.
Technical SEO | | Tit
Right after the first crawl almost 4 weeks ago we saw in google webmaster tool and SEO MOZ that google indexes for almost every singlepage the old URL as well and the new one and sends us for this duplicate metatags.
We deleted the old sitemap and uploaded the new and thought that google then will not index the old URL's anymore. But we still see a huge amount of duplicate metatags. Does anyone know what else we can do, so google doe snot index the old url's anymore but only the new ones? Thanks so much Michelle0 -
Duplicate content issues caused by our CMS
Hello fellow mozzers, Our in-house CMS - which is usually good for SEO purposes as it allows all the control over directories, filenames, browser titles etc that prevent unwieldy / meaningless URLs and generic title tags - seems to have got itself into a bit of a tiz when it comes to one of our clients. We have tried solving the problem to no avail, so I thought I'd throw it open and see if anyone has a soultion, or whether it's just a fault in our CMS. Basically, the SEs are indexing two identical pages, one ending with a / and the other ending /index.php, for one of our sites (www.signature-care-homes.co.uk). We have gone through the site and made sure the links all point to just one of these, and have done the same for off-site links, but there is still the duplicate content issue of both versions getting indexed. We also set up an htaccess file to redirect to the chosen version, but to no avail, and we're not sure canonical will work for this issue as / pages should redirect to /index.php anyway - and that's we can't work out. We have set the access file to point to index.php, and that should be what should be happening anyway, but it isn't. Is there an alternative way of telling the SE's to only look at one of these two versions? Also, we are currently rewriting the content and changing the structure - will this change the situation we find ourselves in?
Technical SEO | | themegroup0