Sitemap Help!
-
Hi Guys,
Quick question regarding sitemaps. I am currently working on a huge site that has masses of pages.
I am looking to create a site map. How would you guys do this? i have looked at some tools but it say it will only do up to 30,000 pages roughly. It is so large it would be impossible to do this myself....any suggestions?
Also, how do i find out how many pages my site actually has indexed and not indexed??
Thank You all
Wayne
-
The problem that I have with CMS side sitemap generators is that it often pulls content from pages that are existing and adds entries based off that information. If you have pages linked to that are no longer there, as is the case with dynamic content, then you'll be imposing 404's on yourself like crazy.
Just something to watch out for but it's probably your best solution.
-
Hi! With this file, you can create a Google-friendly sitemap for any given folder almost automatically. No limits on the number of files. Please note that the code is the courtesy of @frkandris who generously helped me out when I had a similair problem. I hope it will be as helpful to you as it was to me
- Copy / paste the code below into a text editor.
- Edit the beginning of the file: where you see seomoz.com, put your own domain name there
- Save the file as getsitemap.php and ftp it to the appropriate folder.
- Write the full URL in your browser: http://www.yourdomain.com/getsitemap.php
- The moment you do it, a sitemap.xml will be generated in your folder
- Refresh your ftp client and download the sitemap. Make further changes to it if you wish.
=== CODE STARTS HERE ===
define(DIRBASE, './');define(URLBASE, 'http://www.seomoz.com/'); $isoLastModifiedSite = "";$newLine = "\n";$indent = " ";if (!$rootUrl) $rootUrl = "http://www.seomoz.com"; $xmlHeader = "$newLine"; $urlsetOpen = "<urlset xmlns=""http://www.google.com/schemas/sitemap/0.84"" ="" <="" span="">xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.google.com/schemas/sitemap/0.84 http://www.google.com/schemas/sitemap/0.84/sitemap.xsd">$newLine";$urlsetValue = "";$urlsetClose = "</urlset>$newLine"; function makeUrlString ($urlString) { return htmlentities($urlString, ENT_QUOTES, 'UTF-8');} function makeIso8601TimeStamp ($dateTime) { if (!$dateTime) { $dateTime = date('Y-m-d H:i:s'); } if (is_numeric(substr($dateTime, 11, 1))) { $isoTS = substr($dateTime, 0, 10) ."T" .substr($dateTime, 11, ."+00:00"; } else { $isoTS = substr($dateTime, 0, 10); } return $isoTS;} function makeUrlTag ($url, $modifiedDateTime, $changeFrequency, $priority) { GLOBAL $newLine; GLOBAL $indent; GLOBAL $isoLastModifiedSite; $urlOpen = "$indent<url>$newLine";</url> $urlValue = ""; $urlClose = "$indent$newLine"; $locOpen = "$indent$indent<loc>";</loc> $locValue = ""; $locClose = "$newLine"; $lastmodOpen = "$indent$indent<lastmod>";</lastmod> $lastmodValue = ""; $lastmodClose = "$newLine"; $changefreqOpen = "$indent$indent<changefreq>";</changefreq> $changefreqValue = ""; $changefreqClose = "$newLine"; $priorityOpen = "$indent$indent<priority>";</priority> $priorityValue = ""; $priorityClose = "$newLine"; $urlTag = $urlOpen; $urlValue = $locOpen .makeUrlString("$url") .$locClose; if ($modifiedDateTime) { $urlValue .= $lastmodOpen .makeIso8601TimeStamp($modifiedDateTime) .$lastmodClose; if (!$isoLastModifiedSite) { // last modification of web site $isoLastModifiedSite = makeIso8601TimeStamp($modifiedDateTime); } } if ($changeFrequency) { $urlValue .= $changefreqOpen .$changeFrequency .$changefreqClose; } if ($priority) { $urlValue .= $priorityOpen .$priority .$priorityClose; } $urlTag .= $urlValue; $urlTag .= $urlClose; return $urlTag;} function rscandir($base='', &$data=array()) { $array = array_diff(scandir($base), array('.', '..')); # remove ' and .. from the array / foreach($array as $value) : / loop through the array at the level of the supplied $base / if (is_dir($base.$value)) : / if this is a directory / $data[] = $base.$value.'/'; / add it to the $data array / $data = rscandir($base.$value.'/', $data); / then make a recursive call with the current $value as the $base supplying the $data array to carry into the recursion / elseif (is_file($base.$value)) : / else if the current $value is a file / $data[] = $base.$value; / just add the current $value to the $data array */ endif; endforeach; return $data; // return the $data array } function kill_base($t) { return(URLBASE.substr($t, strlen(DIRBASE)));} $dir = rscandir(DIRBASE);$a = array_map("kill_base", $dir); foreach ($a as $key => $pageUrl) { $pageLastModified = date ("Y-m-d", filemtime($dir[$key])); $pageChangeFrequency = "monthly"; $pagePriority = 0.8; $urlsetValue .= makeUrlTag ($pageUrl, $pageLastModified, $pageChangeFrequency, $pagePriority); } $current = "$xmlHeader$urlsetOpen$urlsetValue$urlsetClose"; file_put_contents('sitemap.xml', $current); ?>
=== CODE ENDS HERE ===
-
HTML sitemaps are good for users; having 100,000 links on a page though, not so much.
If you can (and certainly with a site this large) if you can do video and image sitemaps you'll help Google get around your site.
-
Is there any way i can see pages that have not been indexed?
Not that I can tell and using site: isn't going to be feasible on a large site I guess.
Is it more beneficial to include various site maps or just the one?
Well, the max files size is 50,000 or 10MB uncompressed (you can gzip them), so if you've more than 50,000 URLs you'll have to.
-
Is there any way i can see pages that have not been indexed?
Is it more beneficial to include various site maps or just the one?
Thanks for your help!!
-
Thanks for your help
do you ffel it is important to have HTML + Video site maps as well? How does this make a differance?
-
How big we talking?
Probably best grabbing something server side if your CMS can't do it. Check out - http://code.google.com/p/sitemap-generators/wiki/SitemapGenerators - I know Google says they've not tested any (and neither have I) but they must have looked at them at some point.
Secondly you'll need to know how to submit multiple sitemap parts and how to break them up.
Looking at it Amazon seem to cap theirs at 50,000 and Ebay at 40,000, so I think you should be fine with numbers around there.
Here's how to set up multiple sitemaps in the same directory - http://googlewebmastercentral.blogspot.com/2006/10/multiple-sitemaps-in-same-directory.html
Once you've submitted your sitemaps Webmaster Tools will tell you how many URLs you've submitted vs. how many they've indexed.
-
Hey,
I'm assuming you mean XML sitemaps here: You can create a sitemap index file which essentially lists a number of sitemaps in one file (A sitemap of sitemap files if that makes sense). See http://www.google.com/support/webmasters/bin/answer.py?answer=71453
There are automatic sitemap generators out there - if you're site has categories with thousands of pages I'd split up them up and have a sitemap per category.
DD
-
To extract URLs, you can use Xenu Link Sleuth. Then you msut make a hiearchy of sitemaps so that all sitemaps are efficiently crawled by Google.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Help an SEO-DUMMY : ) Established hyphenated domain...redirect?!...new domain?!
Hello, everybody. I am definitely not an SEO specialist. My family owns a transportation business (since 2010) and i am the one responsible for the website (until we find a good SEO company). My question: Several years ago i did not know much about SEO and have chosen a domain name www.airporttransportation-limo.com (it is not the actual domain...just an example...i'm not sure if i can post the real website here) and another domain that is just the name of our company (it also has hyphen in it). Both websites are still doing good and we receive quite a bit of traffic, but i read more an more about how hyphenated domains and domains with more then two worlds can be bad for your SEO/business/traffic. I feel like the websites are stuck and not moving up any more..could that be because of the hyphens? I registered another domain that is the name of our company (which is well known by now) without any hyphens. Now i have no idea what to do. Should i redirect both old domains (old websites are different and do not have duplicate content) to the new one, or should i just redirect the old domain (just the name of our company with hyphen) to a new one (without hyphen) and leave the www.airportransportation-limo.com as is... Or maybe i should register another domain without any hyphens (two words only) and redirect the www.airporttransportation-limo.com to it... I am very nervous to make any changes and loose all the traffic. My family will kill me. Please help! I'm lost!
On-Page Optimization | | KL20140 -
Page Analysis - Helping Product Pages Outrank Search Results Pages
Hi! We have a lot of our search results pages that have been indexed and outrank our product pages and in some instance the actual product pages barely show up at all. Here is an example query that includes our brand name: http://goo.gl/cgB6W So, we have loads of actual product pages, video pages, etc that should be showing up here, but are not and this is just one example. Unfortunately, there are a LOT of these Search Results pages out there and utlimately we would love to de-index them altogether, but it's going to have to be carefully done. So, was wondering if anyone would want to check out one of our product pages and give any feedback as to what we could change to possibly improve rank or to make them more search friendly or hopefully to help them rise above these indexed search results pages? Here is an example product page: http://goo.gl/2R4IT Thanks!! Craig
On-Page Optimization | | TheCraig0 -
Need help regarding On page issue for dating site?
I'm having problem with on page of dating website datetolove.com. If you go to the location link in the website you will see the profile of same persons are showcasing in Country,state & city which will lead to duplication.if i use canonical tag in the country page then I think google will crawl only country page & will leave state & city pages.But I need that google should crawl all the 3 pages without any duplication's.Please help me out with this problem please check the link below.There are many other problems also in this website. http://www.datetolove.com/en/locations
On-Page Optimization | | varun18000 -
Need help, i am lost
Hello all, I am new in this community. I have been for a while on Page 1 on Google with the keyword "Wooden Signs" with my website CreateYourWoodSign.com Since the Google update (April 24th i think)I completely disappeared from Google. I have not been able to come back since. Any help would you highly appreciated. Thank you!
On-Page Optimization | | manu45
Emmanuel0 -
Does having more products in a category than others help with ranking that page with a particular keyword like iphone cases. We have thousands of products and wondered if it would hellp these all being on the website.
does having more products in a category than other websites help with ranking that page with a particular keyword like iphone cases. We have thousands of products and wondered if it would help these all being on the website.
On-Page Optimization | | Casefun0 -
Our sitemap is not indexed well
Hey there, Hope you guys can help. We get the following error: Nested indexing. Another Sitemap index refers to the index of sitemaps. The thing is that we cant find the error they are talking about. Thanks!!!!
On-Page Optimization | | Comunicare0 -
Complex navigation structure leaving me puzzled with Meta keywords! Would love some help...
Hi there So I have a main navigation, It includes 5 categories Each category contains 4-6 sub categories Within these sub categories, there are 6 - 10 sub sub categories Its a rather complex navigation but allows the end user to land exactly where they want without much mooching. Now my issue is the use of keywords. Should I be feeding the keywords used in the main category through to the sub category and the sub sub category as they are all linked or should I use unique keywords for each sub/sub/sub category? I have added an image of the nav layout so you can see how it works. I hope that makes sense? Could love some help! dHve8.jpg
On-Page Optimization | | onlineforequine0 -
Google Sitemap
Does adding a Google Sitemap to webmaster tools REALLY help SEO? If so, are there any resources for help creating one? Here is my site: http://www.petmedicalcenter.com Thanks,
On-Page Optimization | | PMC-312087
Brant0