Sitemap Help!
-
Hi Guys,
Quick question regarding sitemaps. I am currently working on a huge site that has masses of pages.
I am looking to create a site map. How would you guys do this? i have looked at some tools but it say it will only do up to 30,000 pages roughly. It is so large it would be impossible to do this myself....any suggestions?
Also, how do i find out how many pages my site actually has indexed and not indexed??
Thank You all
Wayne
-
The problem that I have with CMS side sitemap generators is that it often pulls content from pages that are existing and adds entries based off that information. If you have pages linked to that are no longer there, as is the case with dynamic content, then you'll be imposing 404's on yourself like crazy.
Just something to watch out for but it's probably your best solution.
-
Hi! With this file, you can create a Google-friendly sitemap for any given folder almost automatically. No limits on the number of files. Please note that the code is the courtesy of @frkandris who generously helped me out when I had a similair problem. I hope it will be as helpful to you as it was to me
- Copy / paste the code below into a text editor.
- Edit the beginning of the file: where you see seomoz.com, put your own domain name there
- Save the file as getsitemap.php and ftp it to the appropriate folder.
- Write the full URL in your browser: http://www.yourdomain.com/getsitemap.php
- The moment you do it, a sitemap.xml will be generated in your folder
- Refresh your ftp client and download the sitemap. Make further changes to it if you wish.
=== CODE STARTS HERE ===
define(DIRBASE, './');define(URLBASE, 'http://www.seomoz.com/'); $isoLastModifiedSite = "";$newLine = "\n";$indent = " ";if (!$rootUrl) $rootUrl = "http://www.seomoz.com"; $xmlHeader = "$newLine"; $urlsetOpen = "<urlset xmlns=""http://www.google.com/schemas/sitemap/0.84"" ="" <="" span="">xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.google.com/schemas/sitemap/0.84 http://www.google.com/schemas/sitemap/0.84/sitemap.xsd">$newLine";$urlsetValue = "";$urlsetClose = "</urlset>$newLine"; function makeUrlString ($urlString) { return htmlentities($urlString, ENT_QUOTES, 'UTF-8');} function makeIso8601TimeStamp ($dateTime) { if (!$dateTime) { $dateTime = date('Y-m-d H:i:s'); } if (is_numeric(substr($dateTime, 11, 1))) { $isoTS = substr($dateTime, 0, 10) ."T" .substr($dateTime, 11,
."+00:00"; } else { $isoTS = substr($dateTime, 0, 10); } return $isoTS;} function makeUrlTag ($url, $modifiedDateTime, $changeFrequency, $priority) { GLOBAL $newLine; GLOBAL $indent; GLOBAL $isoLastModifiedSite; $urlOpen = "$indent<url>$newLine";</url> $urlValue = ""; $urlClose = "$indent$newLine"; $locOpen = "$indent$indent<loc>";</loc> $locValue = ""; $locClose = "$newLine"; $lastmodOpen = "$indent$indent<lastmod>";</lastmod> $lastmodValue = ""; $lastmodClose = "$newLine"; $changefreqOpen = "$indent$indent<changefreq>";</changefreq> $changefreqValue = ""; $changefreqClose = "$newLine"; $priorityOpen = "$indent$indent<priority>";</priority> $priorityValue = ""; $priorityClose = "$newLine"; $urlTag = $urlOpen; $urlValue = $locOpen .makeUrlString("$url") .$locClose; if ($modifiedDateTime) { $urlValue .= $lastmodOpen .makeIso8601TimeStamp($modifiedDateTime) .$lastmodClose; if (!$isoLastModifiedSite) { // last modification of web site $isoLastModifiedSite = makeIso8601TimeStamp($modifiedDateTime); } } if ($changeFrequency) { $urlValue .= $changefreqOpen .$changeFrequency .$changefreqClose; } if ($priority) { $urlValue .= $priorityOpen .$priority .$priorityClose; } $urlTag .= $urlValue; $urlTag .= $urlClose; return $urlTag;} function rscandir($base='', &$data=array()) { $array = array_diff(scandir($base), array('.', '..')); # remove ' and .. from the array / foreach($array as $value) : / loop through the array at the level of the supplied $base / if (is_dir($base.$value)) : / if this is a directory / $data[] = $base.$value.'/'; / add it to the $data array / $data = rscandir($base.$value.'/', $data); / then make a recursive call with the current $value as the $base supplying the $data array to carry into the recursion / elseif (is_file($base.$value)) : / else if the current $value is a file / $data[] = $base.$value; / just add the current $value to the $data array */ endif; endforeach; return $data; // return the $data array } function kill_base($t) { return(URLBASE.substr($t, strlen(DIRBASE)));} $dir = rscandir(DIRBASE);$a = array_map("kill_base", $dir); foreach ($a as $key => $pageUrl) { $pageLastModified = date ("Y-m-d", filemtime($dir[$key])); $pageChangeFrequency = "monthly"; $pagePriority = 0.8; $urlsetValue .= makeUrlTag ($pageUrl, $pageLastModified, $pageChangeFrequency, $pagePriority); } $current = "$xmlHeader$urlsetOpen$urlsetValue$urlsetClose"; file_put_contents('sitemap.xml', $current); ?>
=== CODE ENDS HERE ===
-
HTML sitemaps are good for users; having 100,000 links on a page though, not so much.
If you can (and certainly with a site this large) if you can do video and image sitemaps you'll help Google get around your site.
-
Is there any way i can see pages that have not been indexed?
Not that I can tell and using site: isn't going to be feasible on a large site I guess.
Is it more beneficial to include various site maps or just the one?
Well, the max files size is 50,000 or 10MB uncompressed (you can gzip them), so if you've more than 50,000 URLs you'll have to.
-
Is there any way i can see pages that have not been indexed?
Is it more beneficial to include various site maps or just the one?
Thanks for your help!!
-
Thanks for your help
do you ffel it is important to have HTML + Video site maps as well? How does this make a differance?
-
How big we talking?
Probably best grabbing something server side if your CMS can't do it. Check out - http://code.google.com/p/sitemap-generators/wiki/SitemapGenerators - I know Google says they've not tested any (and neither have I) but they must have looked at them at some point.
Secondly you'll need to know how to submit multiple sitemap parts and how to break them up.
Looking at it Amazon seem to cap theirs at 50,000 and Ebay at 40,000, so I think you should be fine with numbers around there.
Here's how to set up multiple sitemaps in the same directory - http://googlewebmastercentral.blogspot.com/2006/10/multiple-sitemaps-in-same-directory.html
Once you've submitted your sitemaps Webmaster Tools will tell you how many URLs you've submitted vs. how many they've indexed.
-
Hey,
I'm assuming you mean XML sitemaps here: You can create a sitemap index file which essentially lists a number of sitemaps in one file (A sitemap of sitemap files if that makes sense). See http://www.google.com/support/webmasters/bin/answer.py?answer=71453
There are automatic sitemap generators out there - if you're site has categories with thousands of pages I'd split up them up and have a sitemap per category.
DD
-
To extract URLs, you can use Xenu Link Sleuth. Then you msut make a hiearchy of sitemaps so that all sitemaps are efficiently crawled by Google.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Hello, I need some help: Organic Traffic going down
Hello, I need some help, i am trying to figure out why my Organic traffic has gone down so much in the last six months and trying to find a solution, but I am at a lost as to what i should be looking to see how to fix it. http://goo.gl/arwlON I hope you can help or give me some ideas as to why. Thank you in advance.
On-Page Optimization | | blinky510 -
Google Treating these URL's as diff, but they are same. please help
Google is treating, below URL's as two different URL's when they are same. How to solve this. Please help. Case 1:/2570/Venture-Capital-and-Capital-Markets/2570/venture-capital-and-capital-marketsCase 2: /xxx/Java-Programming//xxx/Java-ProgrammingPlease help, how to solve this. Thanks in advance
On-Page Optimization | | AnkammaRao0 -
Yoast SEO sitemap link 404 problem
I have recently moved my wordpress blog from a subdomain into a directory e.g. www.mysite.com/blog/ and installed yeast SEO however when I go to the site map as directed in the pluign panel www.mysite.com/blog/sitemap_index.xml its not there are I get a 404 error? Any help much appreciated.
On-Page Optimization | | SamCUK0 -
Need professional SEO help for onpage analysis and advice
Hi guys, I am not sure if here is the right place to ask for this, but we really need help with one of our projects, especially for the site analysis. We dont need link building tips etc, we just need help regarding our site structure, code etc etc. Do you have any ideas where I can find some real professionals on this topic? Thank you a lot !
On-Page Optimization | | commissionshare0 -
Sitemap.xml parameters
Hello, I confused about the parameters when create sitemap.xml. I don'n know how to use these parameters efficiently. Some parameters are:
On-Page Optimization | | JohnHuynh
- **Page changing frequency: **what is a good value? Can I choose "None"?
- **Last modified date: **what is a good value? Can I choose "Don’t specify"?
- **Page priority: **what is a good value? Can I choose "Don’t specify"? Thanks,0 -
Optimization help
Looking for suggestions - one of my targeted keywords is "IT Support NY". I can't for the life of me figure out a way to use it in a sentence. Any ideas?
On-Page Optimization | | CsmBill0 -
Page Titles For Local - Help on URL Structure
Trying to figure out the best way to construct localized urls for the dental website. For example, If I have the URL:
On-Page Optimization | | Czubmeister
http://www.kooskidental.com/services/cosmetic-dentistry/
and If I want to make it local to the city I would use: http://www.kooskidental.com/services/richardson-tx-cosmetic-dentistry/ But what happens is that I have other options off the menu like: http://www.koooskidental.com/services/richardson-tx-cosmetic-dentistry/teeth-whitening/ But if I am trying to rank for richardson tx teeth whitening, I would have to do http://www.koooskidental.com/services/richardson-tx-cosmetic-dentistry/richardson-tx-teeth-whitening/ But that's pretty long and ugly and I don't think I need richardson-tx in their twice. If I am trying to rank for richardson tx cosmetic dentistry and richardson tx teeth whitening, what would be the best structure for the url's?0 -
Please Help ! Strange description
The site http://goo.gl/bQkCF is displaying a strange description for keyword "taxi software" in google.com (second position ). Instead of description, text "logo" is being displayed. I understand that there is no description on the site, but still why this text. I would appreciate if you would let me know if anything is wrong with the site.
On-Page Optimization | | seoug_20050