Sitemap Help!
-
Hi Guys,
Quick question regarding sitemaps. I am currently working on a huge site with a massive number of pages.
I am looking to create a sitemap. How would you guys do this? I have looked at some tools, but they say they will only handle up to roughly 30,000 pages. The site is so large that it would be impossible to do this by hand. Any suggestions?
Also, how do I find out how many of my site's pages are actually indexed and how many are not?
Thank You all
Wayne
-
The problem that I have with CMS-side sitemap generators is that they often pull content from existing pages and add entries based on that information. If those pages link to pages that are no longer there, as often happens with dynamic content, then you'll be imposing 404s on yourself like crazy.
Just something to watch out for, but a CMS-side generator is probably still your best solution.
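One way to guard against the 404 problem described above is to check each URL before it goes into the sitemap. Here is a minimal Python sketch of that idea, not a finished tool. The names `http_status` and `live_urls` are made up for this example, and it assumes your server answers HEAD requests sensibly; on a very large site you would want to throttle or parallelise the checks.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for a HEAD request (0 on network failure)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses are raised as HTTPError; the code is what we want.
        return error.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout and so on.
        return 0


def live_urls(urls, status_of=http_status):
    """Keep only the URLs that answer with status 200, preserving order."""
    return [url for url in urls if status_of(url) == 200]
```

The `status_of` parameter is only there so the filter can be tried out with a fake lookup (for example a dict's `get` method) without hitting the network.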
-
Hi! With this file, you can create a Google-friendly sitemap for any given folder almost automatically. No limits on the number of files. Please note that the code is courtesy of @frkandris, who generously helped me out when I had a similar problem. I hope it will be as helpful to you as it was to me.
- Copy / paste the code below into a text editor.
- Edit the beginning of the file: where you see seomoz.com, put your own domain name there
- Save the file as getsitemap.php and ftp it to the appropriate folder.
- Open the full URL in your browser: http://www.yourdomain.com/getsitemap.php
- As soon as the page loads, a sitemap.xml will be generated in that folder
- Refresh your ftp client and download the sitemap. Make further changes to it if you wish.
=== CODE STARTS HERE ===
<?php
// Folder to scan and the public URL it maps to. Put your own domain here.
define('DIRBASE', './');
define('URLBASE', 'http://www.seomoz.com/');

$isoLastModifiedSite = "";
$newLine = "\n";
$indent = " ";

$xmlHeader = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>$newLine";

// Namespace as in the original script; the current sitemaps.org protocol
// uses http://www.sitemaps.org/schemas/sitemap/0.9 instead.
$urlsetOpen = "<urlset xmlns=\"http://www.google.com/schemas/sitemap/0.84\" "
    . "xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" "
    . "xsi:schemaLocation=\"http://www.google.com/schemas/sitemap/0.84 "
    . "http://www.google.com/schemas/sitemap/0.84/sitemap.xsd\">$newLine";
$urlsetValue = "";
$urlsetClose = "</urlset>$newLine";

// Escape a URL for use inside XML (htmlentities could emit entities XML does not define).
function makeUrlString($urlString) {
    return htmlspecialchars($urlString, ENT_QUOTES, 'UTF-8');
}

// Turn 'Y-m-d' or 'Y-m-d H:i:s' into an ISO 8601 date or timestamp.
function makeIso8601TimeStamp($dateTime) {
    if (!$dateTime) {
        $dateTime = date('Y-m-d H:i:s');
    }
    if (is_numeric(substr($dateTime, 11, 1))) {
        $isoTS = substr($dateTime, 0, 10) . "T" . substr($dateTime, 11, 8) . "+00:00";
    } else {
        $isoTS = substr($dateTime, 0, 10);
    }
    return $isoTS;
}

// Build one <url> entry.
function makeUrlTag($url, $modifiedDateTime, $changeFrequency, $priority) {
    global $newLine, $indent, $isoLastModifiedSite;

    $urlOpen = "$indent<url>$newLine";
    $urlClose = "$indent</url>$newLine";
    $locOpen = "$indent$indent<loc>";
    $locClose = "</loc>$newLine";
    $lastmodOpen = "$indent$indent<lastmod>";
    $lastmodClose = "</lastmod>$newLine";
    $changefreqOpen = "$indent$indent<changefreq>";
    $changefreqClose = "</changefreq>$newLine";
    $priorityOpen = "$indent$indent<priority>";
    $priorityClose = "</priority>$newLine";

    $urlTag = $urlOpen;
    $urlValue = $locOpen . makeUrlString("$url") . $locClose;
    if ($modifiedDateTime) {
        $urlValue .= $lastmodOpen . makeIso8601TimeStamp($modifiedDateTime) . $lastmodClose;
        if (!$isoLastModifiedSite) { // last modification of web site
            $isoLastModifiedSite = makeIso8601TimeStamp($modifiedDateTime);
        }
    }
    if ($changeFrequency) {
        $urlValue .= $changefreqOpen . $changeFrequency . $changefreqClose;
    }
    if ($priority) {
        $urlValue .= $priorityOpen . $priority . $priorityClose;
    }
    $urlTag .= $urlValue;
    $urlTag .= $urlClose;
    return $urlTag;
}

// Recursively collect every file and folder below $base.
function rscandir($base = '', &$data = array()) {
    $array = array_diff(scandir($base), array('.', '..')); // remove '.' and '..' from the array
    foreach ($array as $value) { // loop through the array at the level of the supplied $base
        if (is_dir($base . $value)) { // if this is a directory
            $data[] = $base . $value . '/'; // add it to the $data array
            // then make a recursive call with the current $value as the $base,
            // supplying the $data array to carry into the recursion
            $data = rscandir($base . $value . '/', $data);
        } elseif (is_file($base . $value)) { // else if the current $value is a file
            $data[] = $base . $value; // just add the current $value to the $data array
        }
    }
    return $data; // return the $data array
}

// Swap the local folder prefix for the public URL prefix.
function kill_base($t) {
    return URLBASE . substr($t, strlen(DIRBASE));
}

$dir = rscandir(DIRBASE);
$a = array_map("kill_base", $dir);

foreach ($a as $key => $pageUrl) {
    $pageLastModified = date("Y-m-d", filemtime($dir[$key]));
    $pageChangeFrequency = "monthly";
    $pagePriority = 0.8;
    $urlsetValue .= makeUrlTag($pageUrl, $pageLastModified, $pageChangeFrequency, $pagePriority);
}

$current = "$xmlHeader$urlsetOpen$urlsetValue$urlsetClose";
file_put_contents('sitemap.xml', $current);
?>
=== CODE ENDS HERE ===
-
HTML sitemaps are good for users; having 100,000 links on a single page, though, not so much.
If you can (and certainly with a site this large), add video and image sitemaps as well; they'll help Google get around your site.
-
Is there any way I can see pages that have not been indexed?
Not that I can tell, and using site: isn't going to be feasible on a large site, I guess.
Is it more beneficial to include several sitemaps or just the one?
Well, the maximum per sitemap file is 50,000 URLs or 10MB uncompressed (you can gzip them), so if you've got more than 50,000 URLs you'll have to use several.
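To make the 50,000-URL cap concrete, here is a rough Python sketch that splits a long URL list into several sitemap files. Treat it as an illustration under assumptions: the function names and the `sitemap1.xml`, `sitemap2.xml` naming are invented for the example, it uses the sitemaps.org 0.9 namespace, and it only checks the URL count, not the file size.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50000  # per-file URL cap mentioned above


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive lists of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap(urls):
    """Return the XML text of one sitemap listing the given URLs."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in urls:
        # escape() turns &, < and > into XML entities.
        lines.append("  <url><loc>%s</loc></url>" % escape(url))
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


def write_sitemaps(urls, prefix="sitemap"):
    """Write sitemap1.xml, sitemap2.xml, ... and return the file names."""
    names = []
    for number, chunk in enumerate(chunk_urls(urls), start=1):
        name = "%s%d.xml" % (prefix, number)
        with open(name, "w", encoding="utf-8") as handle:
            handle.write(build_sitemap(chunk))
        names.append(name)
    return names
```

Calling `write_sitemaps(all_urls)` on a list of 120,001 URLs would produce three files, the last one holding the remaining 20,001 entries.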
-
Is there any way I can see pages that have not been indexed?
Is it more beneficial to include several sitemaps or just the one?
Thanks for your help!!
-
Thanks for your help
Do you feel it is important to have HTML and video sitemaps as well? What difference does this make?
-
How big are we talking?
Probably best grabbing something server side if your CMS can't do it. Check out - http://code.google.com/p/sitemap-generators/wiki/SitemapGenerators - I know Google says they've not tested any (and neither have I) but they must have looked at them at some point.
Secondly, you'll need to know how to break the sitemap up into parts and how to submit those parts.
Looking at it, Amazon seems to cap theirs at 50,000 URLs and eBay at 40,000, so I think you should be fine with numbers around there.
Here's how to set up multiple sitemaps in the same directory - http://googlewebmastercentral.blogspot.com/2006/10/multiple-sitemaps-in-same-directory.html
Once you've submitted your sitemaps Webmaster Tools will tell you how many URLs you've submitted vs. how many they've indexed.
-
Hey,
I'm assuming you mean XML sitemaps here. You can create a sitemap index file, which essentially lists a number of sitemaps in one file (a sitemap of sitemap files, if that makes sense). See http://www.google.com/support/webmasters/bin/answer.py?answer=71453
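For anyone who wants to see what such an index looks like, here is a small Python sketch that builds one from a list of sitemap URLs. The function name `build_sitemap_index` and the example.com URLs are placeholders for illustration; the element names and namespace follow the sitemaps.org protocol.

```python
from xml.sax.saxutils import escape


def build_sitemap_index(sitemap_urls):
    """Return the XML text of a sitemap index pointing at the given sitemaps."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in sitemap_urls:
        # Each child sitemap gets its own <sitemap><loc> entry.
        lines.append("  <sitemap><loc>%s</loc></sitemap>" % escape(url))
    lines.append("</sitemapindex>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_sitemap_index([
        "http://www.example.com/sitemap1.xml",
        "http://www.example.com/sitemap2.xml",
    ]))
```

You would then submit only the index file, and the individual sitemaps it lists get picked up from there.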
There are automatic sitemap generators out there. If your site has categories with thousands of pages, I'd split them up and have a sitemap per category.
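A sitemap per category could be approached roughly like this in Python. This is only a sketch, and it assumes the category is the first folder in the URL path (so /shoes/red-boots.html belongs to "shoes"), which may not match how your site is organised; `category_of` and `group_by_category` are names invented for the example.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def category_of(url):
    """Return the first folder in the URL path, or 'root' for top-level pages."""
    parts = urlsplit(url).path.split("/")
    # "/shoes/red.html" -> ["", "shoes", "red.html"]; "/about.html" -> ["", "about.html"]
    if len(parts) > 2 and parts[1]:
        return parts[1]
    return "root"


def group_by_category(urls):
    """Map each category name to the list of URLs that fall under it."""
    groups = defaultdict(list)
    for url in urls:
        groups[category_of(url)].append(url)
    return dict(groups)
```

Each list in the result can then be written out as its own sitemap file, named after the category.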
DD
-
To extract URLs, you can use Xenu Link Sleuth. Then you must arrange the sitemaps into a hierarchy so that Google can crawl all of them efficiently.
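Once a crawler has exported its list, the URLs still need cleaning before they go into sitemaps. The Python sketch below assumes a hypothetical export layout, a text file with the URL as the first tab-separated field on each line, so check what your export actually looks like before relying on it. `urls_from_export` is a name made up for this example.

```python
from urllib.parse import urlsplit


def urls_from_export(lines, host):
    """Return unique http(s) URLs on `host`, in first-seen order."""
    seen = set()
    result = []
    for line in lines:
        # Assumed layout: the URL is the first tab-separated field.
        candidate = line.strip().split("\t")[0].strip()
        parts = urlsplit(candidate)
        if parts.scheme not in ("http", "https"):
            continue  # skips header rows, blank lines, mailto: links and so on
        if parts.netloc.lower() != host.lower():
            continue  # skips links that point at other sites
        if candidate not in seen:
            seen.add(candidate)
            result.append(candidate)
    return result
```

Because an open file is iterable line by line, you can pass it straight in, for example `urls_from_export(open("export.txt", encoding="utf-8"), "www.example.com")`, where the file name is again just a placeholder.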