Sitemap Help!
-
Hi Guys,
Quick question regarding sitemaps. I am currently working on a huge site with a massive number of pages.
I am looking to create a sitemap. How would you guys do this? I have looked at some tools, but they say they will only handle up to roughly 30,000 pages. The site is so large it would be impossible to do this by hand... any suggestions?
Also, how do I find out how many pages of my site are actually indexed and how many are not?
Thank You all
Wayne
-
The problem I have with CMS-side sitemap generators is that they often pull content from existing pages and add entries based on that information. If those pages link to pages that are no longer there, as can happen with dynamic content, then you'll be imposing 404s on yourself like crazy.
Just something to watch out for but it's probably your best solution.
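One way to guard against the 404 problem described above is to check each candidate URL before it goes into the sitemap. The following is only a rough Python sketch, not something from this thread: the function names `http_status` and `live_urls` are made up for illustration, and it assumes your server answers HEAD requests sensibly. On a very large site you would want to throttle or parallelise the requests.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status for a HEAD request (0 on network failure).

    urlopen follows redirects, so a redirect that ends on a live page
    reports the status of that final page.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the code is what we want.
        return err.code
    except (urllib.error.URLError, OSError):
        return 0


def live_urls(urls: Iterable[str],
              status_of: Callable[[str], int] = http_status) -> List[str]:
    """Keep only URLs whose status is 200, so dead links never reach the sitemap."""
    return [url for url in urls if status_of(url) == 200]
```

Passing the status checker in as a parameter (`status_of`) is a design choice that lets you try the filter against a fixed table of URLs and status codes before pointing it at the live site.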
-
Hi! With this file, you can create a Google-friendly sitemap for any given folder almost automatically. No limits on the number of files. Please note that the code is courtesy of @frkandris, who generously helped me out when I had a similar problem. I hope it will be as helpful to you as it was to me.
- Copy / paste the code below into a text editor.
- Edit the beginning of the file: where you see seomoz.com, put your own domain name there
- Save the file as getsitemap.php and ftp it to the appropriate folder.
- Write the full URL in your browser: http://www.yourdomain.com/getsitemap.php
- The moment you do it, a sitemap.xml will be generated in your folder
- Refresh your ftp client and download the sitemap. Make further changes to it if you wish.
=== CODE STARTS HERE ===
<?php
// Folder to scan, and the public URL that folder maps to.
define('DIRBASE', './');
define('URLBASE', 'http://www.seomoz.com/');

$isoLastModifiedSite = "";
$newLine = "\n";
$indent = " ";
if (empty($rootUrl)) $rootUrl = "http://www.seomoz.com";

$xmlHeader = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>$newLine";

// Namespace kept as in the original post (Google's 0.84 sitemap schema).
$urlsetOpen = "<urlset xmlns=\"http://www.google.com/schemas/sitemap/0.84\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:schemaLocation=\"http://www.google.com/schemas/sitemap/0.84 http://www.google.com/schemas/sitemap/0.84/sitemap.xsd\">$newLine";
$urlsetValue = "";
$urlsetClose = "</urlset>$newLine";

// Escape characters that are not allowed raw inside XML.
function makeUrlString ($urlString) {
    return htmlentities($urlString, ENT_QUOTES, 'UTF-8');
}

// Turn "Y-m-d H:i:s" into an ISO 8601 timestamp; a bare "Y-m-d" stays a date.
function makeIso8601TimeStamp ($dateTime) {
    if (!$dateTime) {
        $dateTime = date('Y-m-d H:i:s');
    }
    if (is_numeric(substr($dateTime, 11, 1))) {
        $isoTS = substr($dateTime, 0, 10) ."T" .substr($dateTime, 11, 8) ."+00:00";
    } else {
        $isoTS = substr($dateTime, 0, 10);
    }
    return $isoTS;
}

// Build one <url> entry.
function makeUrlTag ($url, $modifiedDateTime, $changeFrequency, $priority) {
    global $newLine;
    global $indent;
    global $isoLastModifiedSite;

    $urlOpen = "$indent<url>$newLine";
    $urlValue = "";
    $urlClose = "$indent</url>$newLine";
    $locOpen = "$indent$indent<loc>";
    $locValue = "";
    $locClose = "</loc>$newLine";
    $lastmodOpen = "$indent$indent<lastmod>";
    $lastmodValue = "";
    $lastmodClose = "</lastmod>$newLine";
    $changefreqOpen = "$indent$indent<changefreq>";
    $changefreqValue = "";
    $changefreqClose = "</changefreq>$newLine";
    $priorityOpen = "$indent$indent<priority>";
    $priorityValue = "";
    $priorityClose = "</priority>$newLine";

    $urlTag = $urlOpen;
    $urlValue = $locOpen .makeUrlString("$url") .$locClose;
    if ($modifiedDateTime) {
        $urlValue .= $lastmodOpen .makeIso8601TimeStamp($modifiedDateTime) .$lastmodClose;
        if (!$isoLastModifiedSite) { // last modification of web site
            $isoLastModifiedSite = makeIso8601TimeStamp($modifiedDateTime);
        }
    }
    if ($changeFrequency) {
        $urlValue .= $changefreqOpen .$changeFrequency .$changefreqClose;
    }
    if ($priority) {
        $urlValue .= $priorityOpen .$priority .$priorityClose;
    }
    $urlTag .= $urlValue;
    $urlTag .= $urlClose;
    return $urlTag;
}

// Recursively list every file and folder below $base.
function rscandir($base='', &$data=array()) {
    $array = array_diff(scandir($base), array('.', '..')); /* remove '.' and '..' from the array */
    foreach($array as $value) : /* loop through the array at the level of the supplied $base */
        if (is_dir($base.$value)) : /* if this is a directory */
            $data[] = $base.$value.'/'; /* add it to the $data array */
            $data = rscandir($base.$value.'/', $data); /* then make a recursive call with the current $value as the $base supplying the $data array to carry into the recursion */
        elseif (is_file($base.$value)) : /* else if the current $value is a file */
            $data[] = $base.$value; /* just add the current $value to the $data array */
        endif;
    endforeach;
    return $data; // return the $data array
}

// Swap the local folder prefix for the public URL prefix.
function kill_base($t) {
    return(URLBASE.substr($t, strlen(DIRBASE)));
}

$dir = rscandir(DIRBASE);
$a = array_map("kill_base", $dir);

foreach ($a as $key => $pageUrl) {
    $pageLastModified = date ("Y-m-d", filemtime($dir[$key]));
    $pageChangeFrequency = "monthly";
    $pagePriority = 0.8;
    $urlsetValue .= makeUrlTag ($pageUrl, $pageLastModified, $pageChangeFrequency, $pagePriority);
}

$current = "$xmlHeader$urlsetOpen$urlsetValue$urlsetClose";
file_put_contents('sitemap.xml', $current);
?>
=== CODE ENDS HERE ===
-
HTML sitemaps are good for users; having 100,000 links on a page though, not so much.
If you can (and certainly with a site this large), do video and image sitemaps as well; they'll help Google get around your site.
-
Is there any way I can see pages that have not been indexed?
Not that I can tell, and using site: isn't going to be feasible on a large site, I guess.
Is it more beneficial to include various site maps or just the one?
Well, the maximum per sitemap file is 50,000 URLs or 10MB uncompressed (you can gzip them), so if you've got more than 50,000 URLs you'll have to use several.
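To make the 50,000-URL limit concrete, here is a minimal Python sketch of splitting a long URL list into several gzipped sitemap files. It is an illustration rather than a tested tool: the names `chunk_urls`, `render_sitemap` and `write_sitemaps`, and the `sitemap-1.xml.gz` naming pattern, are assumptions of mine. It also assumes the sitemaps.org 0.9 namespace rather than the older Google 0.84 one used in the PHP script in this thread, and it does not check the uncompressed size limit.

```python
import gzip
import os
from typing import Iterator, List
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50000  # per-file URL limit quoted in this thread


def chunk_urls(urls: List[str], size: int = MAX_URLS_PER_SITEMAP) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def render_sitemap(urls: List[str]) -> str:
    """Render one <urlset> document containing only <loc> entries."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for url in urls:
        # escape() turns &, < and > into XML entities.
        lines.append("  <url><loc>%s</loc></url>" % escape(url))
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


def write_sitemaps(urls: List[str], directory: str, prefix: str = "sitemap") -> List[str]:
    """Write gzipped sitemap files into `directory` and return their file names."""
    names = []
    for number, chunk in enumerate(chunk_urls(urls), start=1):
        name = "%s-%d.xml.gz" % (prefix, number)
        with gzip.open(os.path.join(directory, name), "wt", encoding="utf-8") as handle:
            handle.write(render_sitemap(chunk))
        names.append(name)
    return names
```

With 120,001 URLs, for example, `chunk_urls` produces three slices of 50,000, 50,000 and 20,001 URLs, so `write_sitemaps` would write three files.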
-
Is there any way I can see pages that have not been indexed?
Is it more beneficial to include various site maps or just the one?
Thanks for your help!!
-
Thanks for your help.
Do you feel it is important to have HTML and video sitemaps as well? What difference does this make?
-
How big we talking?
Probably best grabbing something server side if your CMS can't do it. Check out - http://code.google.com/p/sitemap-generators/wiki/SitemapGenerators - I know Google says they've not tested any (and neither have I) but they must have looked at them at some point.
Secondly you'll need to know how to submit multiple sitemap parts and how to break them up.
Looking at it, Amazon seems to cap theirs at 50,000 and eBay at 40,000, so I think you should be fine with numbers around there.
Here's how to set up multiple sitemaps in the same directory - http://googlewebmastercentral.blogspot.com/2006/10/multiple-sitemaps-in-same-directory.html
Once you've submitted your sitemaps Webmaster Tools will tell you how many URLs you've submitted vs. how many they've indexed.
-
Hey,
I'm assuming you mean XML sitemaps here: You can create a sitemap index file which essentially lists a number of sitemaps in one file (A sitemap of sitemap files if that makes sense). See http://www.google.com/support/webmasters/bin/answer.py?answer=71453
There are automatic sitemap generators out there. If your site has categories with thousands of pages, I'd split them up and have a sitemap per category.
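Here is a rough Python sketch of the per-category idea together with the sitemap index file. It is only an illustration under a few assumptions: "category" is taken to mean the first folder in the URL path, the function names `group_by_category` and `render_sitemap_index` are made up, and the `sitemap-<category>.xml` naming in the usage note is hypothetical. The sitemaps.org 0.9 namespace is assumed.

```python
from collections import defaultdict
from typing import Dict, List
from urllib.parse import urlparse
from xml.sax.saxutils import escape


def group_by_category(urls: List[str]) -> Dict[str, List[str]]:
    """Group URLs by their first path folder.

    A URL with only one path segment (or none) is filed under "root".
    """
    groups = defaultdict(list)
    for url in urls:
        segments = [part for part in urlparse(url).path.split("/") if part]
        category = segments[0] if len(segments) > 1 else "root"
        groups[category].append(url)
    return dict(groups)


def render_sitemap_index(sitemap_urls: List[str]) -> str:
    """Render a <sitemapindex> document listing one <sitemap> per child file."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for sitemap_url in sitemap_urls:
        lines.append("  <sitemap><loc>%s</loc></sitemap>" % escape(sitemap_url))
    lines.append("</sitemapindex>")
    return "\n".join(lines) + "\n"
```

In use, you would write one sitemap file per key returned by `group_by_category`, then pass the public URLs of those files (for instance `http://www.yourdomain.com/sitemap-shoes.xml`) to `render_sitemap_index` and submit only the resulting index file.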
DD
-
To extract URLs, you can use Xenu Link Sleuth. Then you must make a hierarchy of sitemaps so that all sitemaps are efficiently crawled by Google.