Sitemap.xml - autogenerated by CMS is full of crud
-
Hi all,
hope you can help.
the Magento ecommerce system I'm working with autogenerates sitemap.xml - it's well formed with priority and frequency parameters.
However, it has generated lots of URLs that are pointing to broken pages returning fatal erros, duplicate URLs (not canonicals), 404s etc
I'm thinking of hand creating sitemap.xml - the site has around 50 main pages including products and categories, and I can get the main page URLs listed by screaming frog or xenu.
Then I'll have to get into the hand editing the crud pages with noindex, and useful duplicates with canonicals.
Is this the way to go or is there another solution
thanks in advance for any advice
-
If the cron is working then I would personally turn to the other forum to see if anyone knows a way to rope those messy URLs in and get them under control. I try to avoid manually generating and updating sitemaps whenever I can, because it's a hassle on a small site, not to mention the trouble on an ecommerce site.
If your site is going to stay that small, then a manual sitemap might be less of a headache for you than customizing Magento.
I would worry about keeping a clean sitemap. If the search engines learn that you keep a messy sitemap, they will rely on it less and less. 404 & 500 codes especially, but also redirects and perhaps duplicate content.
For Further Reading:
Google Sitemaps Ask For Clean URLs - http://www.johnfdoherty.com/google-sitemaps-ask-for-clean-urls/
-
Hi Kane,
the sitemap is new - it's just that Magento create lots of duplicate files on the fly & it's not putting the canonical URLs in the sitemap etc.
I just wondered whether its worth hand creating a sitemap.xml containing the content pages (60 or 70 of them) for this relatively small site, or not worry too much about the sitemap, the site is pretty well indexed by google already
I'll head over to the Magento forums again to see if I can find more info
many thanks for you help
-
If it's returning 404 pages, that sounds like a dated sitemap. Have you activated the cron service?
See the "Refreshing Sitemaps at Regular Intervals" section of this page if not:
Magento can be set up to automatically refresh Google Sitemaps at regular intervals. This function is configured in Admin > System > Configuration > Google Sitemap.
To use Magento’s automatic generation of Google Sitemaps, you must activate the Magento Cron service.
If you do have that setup, and you're certain it's working correctly, then I would turn to the forums at MagentoCommerce.com - you're going to get a lot faster answer there since everyone is familiar with that exact platform.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Japanese URL-structured sitemap (pages) not being indexed by Bing Webmaster Tools
Hello everyone, I am facing an issue with the sitemap submission feature in Bing Webmaster Tools for a Japanese language subdirectory domain project. Just to outline the key points: The website is based on a subdirectory URL ( example.com/ja/ ) The Japanese URLs (when pages are published in WordPress) are not being encoded. They are entered in pure Kanji. Google Webmaster Tools, for instance, has no issues reading and indexing the page's URLs in its sitemap submission area (all pages are being indexed). When it comes to Bing Webmaster Tools it's a different story, though. Basically, after the sitemap has been submitted ( example.com/ja/sitemap.xml ), it does report an error that it failed to download this part of the sitemap: "page-sitemap.xml" (basically the sitemap featuring all the sites pages). That means that no URLs have been submitted to Bing either. My apprehension is that Bing Webmaster Tools does not understand the Japanese URLs (or the Kanji for that matter). Therefore, I generally wonder what the correct way is to go on about this. When viewing the sitemap ( example.com/ja/page-sitemap.xml ) in a web browser, though, the Japanese URL's characters are already displayed as encoded. I am not sure if submitting the Kanji style URLs separately is a solution. In Bing Webmaster Tools this can only be done on the root domain level ( example.com ). However, surely there must be a way to make Bing's sitemap submission understand Japanese style sitemaps? Many thanks everyone for any advice!
Technical SEO | | Hermski0 -
What would cause a sudden drop in indexed sitemap pages?
I have made no changes to my site for awhile and on 7/14 I had a 20% drop in indexed pages from the sitemap. However my total indexed pages has stayed the same. What would cause that?
Technical SEO | | EcommerceSite0 -
Sitemaps: Good Image And Video Sitemap Generators
Hello, We are trying to update our sitemap. We have currently updated our XML and HTML sitemaps but would like to have an image and video sitemap also. Can anyone recommend a good image and video sitemap generator? Kind regards, | Deeana Radley Web Designer & SEO Assistant Phone: 01702 460047 Email: dee@solopress.com Google+: +DeeanaRadley Twitter: @DeeanaRadley |
Technical SEO | | SolopressPrint0 -
Home link tags slash or full domain name
Recently I was asked the question by a client who has some SEO knowledge..... We built their site and during testing we simply set the home page links to be: home rather than: home This was mainly to help while in test and dev as they were on different servers to the live site and if we had put the full domain in it kept taking us from dev/test to live. Is it bad to have just a slash for SEO purposes or is it REALLY that important on home links to have the full domain as a slash takes you back to the full domain? Thanks
Technical SEO | | spiralsites0 -
How to display the full structure of website on Google serps
I have been searching around but unable to gather information as to how we can control or list top pages of a website on Google's first page , i.e. if we type seomoz in google , we can see the main listing with 6 subdomain listings , which link to Blog , Seo tool , Beginner Seo guide , Learn Seo , Pricing & Plans and login My question is can we control these listings i.e. what to display and what not , and if yes how can we make this type of visibility on first page , by using html or xml sitemaps or theirs something mostly websites are missing. Cause this type of data is coming up for very less websites and mostly websites are with single urls. c43Ki.jpg
Technical SEO | | ngupta10 -
Best practice for XML sitemap depth
We run an eCommerce for education products with 20 or so subject based catalogues (Maths, Literacy etc) and each catalogue having numerous ranges (Counting, Maths Games etc) then products within those. We carry approximately 15,000 products. My question is around the sitemap we submit - nightly - and it's depth. It is currently set to cover off home, catalogues and ranges plus all static content (about us etc). Should we be submitting sitemaps to include product pages as well? Does it matter or would it not make much difference in terms of search. Thanks in advance.
Technical SEO | | TTS_Group0 -
Google sitemap just for a part of site?
Hi, I am about reorganize (content and seo-wise) a part of a larger site and I wondered if it is possible to use a Google sitemap just for some but not all pages of a site? Does anyone know if this has any impact on pages that are not included in the sitemap? Thanks
Technical SEO | | haest0