Sitemap Help!
-
Hi Guys,
Quick question regarding sitemaps. I am currently working on a huge site that has masses of pages.
I am looking to create a sitemap. How would you guys do this? I have looked at some tools, but they say they will only handle up to roughly 30,000 pages. The site is so large that it would be impossible to do this by hand. Any suggestions?
Also, how do I find out how many of my site's pages are actually indexed and how many are not?
Thank You all
Wayne
-
The problem I have with CMS-side sitemap generators is that they often pull links from existing pages and add sitemap entries based on that information. If those pages link to pages that are no longer there, as often happens with dynamic content, then you'll be imposing 404s on yourself like crazy.
Just something to watch out for but it's probably your best solution.
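One way to guard against that is to check each URL before it goes into the sitemap. Here is a rough Python sketch of the idea, not something from any particular CMS: `http_status` and `live_urls` are names I made up, and it assumes a plain HEAD request is enough to tell whether a page still exists on your server.

```python
from typing import Callable, Iterable, List
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a HEAD request, or 0 if the request fails outright."""
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # 4xx/5xx responses arrive as exceptions that carry the status code.
        return error.code
    except OSError:
        # DNS failures, refused connections, and timeouts all end up here.
        return 0


def live_urls(urls: Iterable[str],
              status_of: Callable[[str], int] = http_status) -> List[str]:
    """Keep only the URLs that answer 200, so dead pages never reach the sitemap."""
    return [url for url in urls if status_of(url) == 200]
```

The `status_of` parameter is there so you can swap in a cached or throttled checker on a very large site instead of hitting every URL live.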
-
Hi! With this file, you can create a Google-friendly sitemap for any given folder almost automatically. There is no limit on the number of files. Please note that the code is courtesy of @frkandris, who generously helped me out when I had a similar problem. I hope it will be as helpful to you as it was to me.
- Copy / paste the code below into a text editor.
- Edit the beginning of the file: where you see seomoz.com, put your own domain name there
- Save the file as getsitemap.php and ftp it to the appropriate folder.
- Write the full URL in your browser: http://www.yourdomain.com/getsitemap.php
- The moment you do it, a sitemap.xml will be generated in your folder
- Refresh your ftp client and download the sitemap. Make further changes to it if you wish.
=== CODE STARTS HERE ===
<?php
// Local folder to scan, and the public URL that folder maps to.
define('DIRBASE', './');
define('URLBASE', 'http://www.seomoz.com/');

$isoLastModifiedSite = "";
$newLine = "\n";
$indent = " ";
if (empty($rootUrl)) $rootUrl = "http://www.seomoz.com";

$xmlHeader = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>$newLine";
$urlsetOpen = "<urlset xmlns=\"http://www.google.com/schemas/sitemap/0.84\" "
    . "xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" "
    . "xsi:schemaLocation=\"http://www.google.com/schemas/sitemap/0.84 "
    . "http://www.google.com/schemas/sitemap/0.84/sitemap.xsd\">$newLine";
$urlsetValue = "";
$urlsetClose = "</urlset>$newLine";

// Escape characters that are not allowed raw inside XML.
function makeUrlString($urlString) {
    return htmlentities($urlString, ENT_QUOTES, 'UTF-8');
}

// Turn "Y-m-d" or "Y-m-d H:i:s" into an ISO 8601 timestamp.
function makeIso8601TimeStamp($dateTime) {
    if (!$dateTime) {
        $dateTime = date('Y-m-d H:i:s');
    }
    if (is_numeric(substr($dateTime, 11, 1))) {
        $isoTS = substr($dateTime, 0, 10) . "T" . substr($dateTime, 11, 8) . "+00:00";
    } else {
        $isoTS = substr($dateTime, 0, 10);
    }
    return $isoTS;
}

// Build one <url> entry.
function makeUrlTag($url, $modifiedDateTime, $changeFrequency, $priority) {
    global $newLine, $indent, $isoLastModifiedSite;

    $urlValue = "$indent$indent<loc>" . makeUrlString("$url") . "</loc>$newLine";
    if ($modifiedDateTime) {
        $urlValue .= "$indent$indent<lastmod>" . makeIso8601TimeStamp($modifiedDateTime) . "</lastmod>$newLine";
        if (!$isoLastModifiedSite) { // last modification of web site
            $isoLastModifiedSite = makeIso8601TimeStamp($modifiedDateTime);
        }
    }
    if ($changeFrequency) {
        $urlValue .= "$indent$indent<changefreq>" . $changeFrequency . "</changefreq>$newLine";
    }
    if ($priority) {
        $urlValue .= "$indent$indent<priority>" . $priority . "</priority>$newLine";
    }
    return "$indent<url>$newLine" . $urlValue . "$indent</url>$newLine";
}

// Recursively list every directory and file below $base.
function rscandir($base = '', &$data = array()) {
    $array = array_diff(scandir($base), array('.', '..')); // remove '.' and '..' from the array
    foreach ($array as $value) { // loop through the array at the level of the supplied $base
        if (is_dir($base . $value)) { // if this is a directory
            $data[] = $base . $value . '/'; // add it to the $data array
            $data = rscandir($base . $value . '/', $data); // then recurse, carrying $data along
        } elseif (is_file($base . $value)) { // else if the current $value is a file
            $data[] = $base . $value; // just add the current $value to the $data array
        }
    }
    return $data; // return the $data array
}

// Swap the local directory prefix for the public URL prefix.
function kill_base($t) {
    return URLBASE . substr($t, strlen(DIRBASE));
}

$dir = rscandir(DIRBASE);
$a = array_map("kill_base", $dir);

foreach ($a as $key => $pageUrl) {
    $pageLastModified = date("Y-m-d", filemtime($dir[$key]));
    $pageChangeFrequency = "monthly";
    $pagePriority = 0.8;
    $urlsetValue .= makeUrlTag($pageUrl, $pageLastModified, $pageChangeFrequency, $pagePriority);
}

$current = "$xmlHeader$urlsetOpen$urlsetValue$urlsetClose";
file_put_contents('sitemap.xml', $current);
?>
=== CODE ENDS HERE ===
-
HTML sitemaps are good for users; a page with 100,000 links on it, though, not so much.
If you can, and certainly with a site this large, add video and image sitemaps as well; they will help Google get around your site.
-
Is there any way I can see pages that have not been indexed?
Not that I can tell, and I guess using site: isn't going to be feasible on a large site.
Is it more beneficial to include various sitemaps or just the one?
Well, the maximum for a single sitemap file is 50,000 URLs or 10MB uncompressed (you can gzip them), so if you have more than 50,000 URLs you'll have to use several.
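To make the splitting concrete, here is a minimal Python sketch that cuts a URL list into chunks respecting both limits and writes each chunk as a gzipped file. The function names, the `sitemap-N.xml.gz` file naming, and the bare `<loc>`-only entries are my own assumptions for illustration; the namespace is the standard sitemaps.org 0.9 one rather than the older Google 0.84 schema.

```python
import gzip
from typing import Iterable, Iterator, List
from xml.sax.saxutils import escape

MAX_URLS = 50_000              # URL limit per sitemap file
MAX_BYTES = 10 * 1024 * 1024   # uncompressed size limit quoted above

HEADER = ('<?xml version="1.0" encoding="UTF-8"?>\n'
          '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
FOOTER = "</urlset>\n"


def url_entry(url: str) -> str:
    """Render one minimal <url> entry with an XML-escaped location."""
    return f"  <url><loc>{escape(url)}</loc></url>\n"


def split_urls(urls: Iterable[str], max_urls: int = MAX_URLS,
               max_bytes: int = MAX_BYTES) -> Iterator[List[str]]:
    """Yield lists of URLs that each fit within both limits."""
    overhead = len((HEADER + FOOTER).encode("utf-8"))
    chunk: List[str] = []
    size = overhead
    for url in urls:
        entry_size = len(url_entry(url).encode("utf-8"))
        # Start a new file when the next entry would break either limit.
        if chunk and (len(chunk) >= max_urls or size + entry_size > max_bytes):
            yield chunk
            chunk, size = [], overhead
        chunk.append(url)
        size += entry_size
    if chunk:
        yield chunk


def write_sitemaps(urls: Iterable[str], prefix: str = "sitemap") -> List[str]:
    """Write each chunk to '<prefix>-N.xml.gz' and return the file names."""
    names = []
    for number, chunk in enumerate(split_urls(urls), start=1):
        name = f"{prefix}-{number}.xml.gz"
        with gzip.open(name, "wt", encoding="utf-8") as handle:
            handle.write(HEADER + "".join(url_entry(u) for u in chunk) + FOOTER)
        names.append(name)
    return names
```

Calling `write_sitemaps(all_urls)` on a list of 120,001 URLs would produce three files; the size check only kicks in for unusually long URLs.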
-
Is there any way I can see pages that have not been indexed?
Is it more beneficial to include various sitemaps or just the one?
Thanks for your help!!
-
Thanks for your help.
Do you feel it is important to have HTML and video sitemaps as well? How does this make a difference?
-
How big are we talking?
Probably best grabbing something server side if your CMS can't do it. Check out - http://code.google.com/p/sitemap-generators/wiki/SitemapGenerators - I know Google says they've not tested any (and neither have I) but they must have looked at them at some point.
Secondly you'll need to know how to submit multiple sitemap parts and how to break them up.
Looking at it, Amazon seems to cap theirs at 50,000 URLs and eBay at 40,000, so I think you should be fine with numbers around there.
Here's how to set up multiple sitemaps in the same directory - http://googlewebmastercentral.blogspot.com/2006/10/multiple-sitemaps-in-same-directory.html
Once you've submitted your sitemaps Webmaster Tools will tell you how many URLs you've submitted vs. how many they've indexed.
-
Hey,
I'm assuming you mean XML sitemaps here: You can create a sitemap index file which essentially lists a number of sitemaps in one file (A sitemap of sitemap files if that makes sense). See http://www.google.com/support/webmasters/bin/answer.py?answer=71453
There are automatic sitemap generators out there. If your site has categories with thousands of pages, I'd split them up and have one sitemap per category.
DD
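A sitemap index plus one sitemap per category could be sketched roughly like this in Python. Treat it as an illustration only: `group_by_category` and `sitemap_index` are hypothetical helpers, and the code assumes that the first path segment of a URL is its category, which may not match how your site is organised.

```python
from collections import defaultdict
from typing import Dict, Iterable, List
from urllib.parse import urlparse
from xml.sax.saxutils import escape


def group_by_category(urls: Iterable[str]) -> Dict[str, List[str]]:
    """Group URLs by first path segment; top-level pages go under 'root'."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        segments = [part for part in urlparse(url).path.split("/") if part]
        # Assumption: /shoes/red belongs to category "shoes"; /about has none.
        category = segments[0] if len(segments) > 1 else "root"
        groups[category].append(url)
    return dict(groups)


def sitemap_index(sitemap_urls: Iterable[str]) -> str:
    """Build the XML of a sitemap index file listing the given sitemap URLs."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for url in sitemap_urls:
        lines.append(f"  <sitemap><loc>{escape(url)}</loc></sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines) + "\n"
```

You would write one sitemap file per key returned by `group_by_category`, then pass the public URLs of those files to `sitemap_index` and submit only the index.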
-
To extract URLs, you can use Xenu Link Sleuth. Then you must build a hierarchy of sitemaps so that all of them are crawled efficiently by Google.