Sitemap Help!
-
Hi Guys,
Quick question regarding sitemaps. I am currently working on a huge site that has masses of pages.
I am looking to create a sitemap. How would you guys do this? I have looked at some tools, but they say they will only handle up to roughly 30,000 pages. The site is so large it would be impossible to do this myself... any suggestions?
Also, how do I find out how many pages of my site are actually indexed and not indexed?
Thank You all
Wayne
-
The problem that I have with CMS-side sitemap generators is that they often pull content from existing pages and add entries based on that information. If you have links to pages that no longer exist, as is often the case with dynamic content, then you'll be imposing 404s on yourself like crazy.
Just something to watch out for, but it's probably your best solution.
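If you do go the generator route, one way to guard against those self-inflicted 404s is to check each URL's HTTP status before it goes into the sitemap. Here is a rough PHP sketch; the function names and example URLs are just illustrative, not from any particular tool:

```php
<?php
// Keep only the URLs whose status check reports HTTP 200.
// $check is any callable mapping a URL to a status code, so the
// HTTP layer can be swapped out (or faked in a test) easily.
function filterLiveUrls(array $urls, callable $check) {
    return array_values(array_filter($urls, function ($u) use ($check) {
        return $check($u) == 200;
    }));
}

// A curl-based checker for real use: issues a HEAD request and
// returns the response code (add redirect/timeout handling as needed).
function headStatus($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD request, no body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $status;
}
```

On a really big site you'd want to throttle these checks, but even a slow pass beats feeding Google thousands of dead URLs.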
-
Hi! With this file, you can create a Google-friendly sitemap for any given folder almost automatically. No limits on the number of files. Please note that the code is courtesy of @frkandris, who generously helped me out when I had a similar problem. I hope it will be as helpful to you as it was to me.
- Copy / paste the code below into a text editor.
- Edit the beginning of the file: where you see seomoz.com, put your own domain name there
- Save the file as getsitemap.php and ftp it to the appropriate folder.
- Write the full URL in your browser: http://www.yourdomain.com/getsitemap.php
- The moment you do it, a sitemap.xml will be generated in your folder
- Refresh your ftp client and download the sitemap. Make further changes to it if you wish.
=== CODE STARTS HERE ===
<?php
define('DIRBASE', './');
define('URLBASE', 'http://www.seomoz.com/');

$isoLastModifiedSite = "";
$newLine = "\n";
$indent  = "  ";

$xmlHeader   = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>$newLine";
$urlsetOpen  = "<urlset xmlns=\"http://www.google.com/schemas/sitemap/0.84\""
             . " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
             . " xsi:schemaLocation=\"http://www.google.com/schemas/sitemap/0.84"
             . " http://www.google.com/schemas/sitemap/0.84/sitemap.xsd\">$newLine";
$urlsetValue = "";
$urlsetClose = "</urlset>$newLine";

function makeUrlString($urlString) {
    return htmlentities($urlString, ENT_QUOTES, 'UTF-8');
}

function makeIso8601TimeStamp($dateTime) {
    if (!$dateTime) {
        $dateTime = date('Y-m-d H:i:s');
    }
    if (is_numeric(substr($dateTime, 11, 1))) {
        // "YYYY-MM-DD HH:MM:SS" -> "YYYY-MM-DDTHH:MM:SS+00:00"
        $isoTS = substr($dateTime, 0, 10) . "T" . substr($dateTime, 11, 8) . "+00:00";
    } else {
        $isoTS = substr($dateTime, 0, 10);
    }
    return $isoTS;
}

function makeUrlTag($url, $modifiedDateTime, $changeFrequency, $priority) {
    global $newLine, $indent, $isoLastModifiedSite;
    $urlTag  = "$indent<url>$newLine";
    $urlTag .= "$indent$indent<loc>" . makeUrlString($url) . "</loc>$newLine";
    if ($modifiedDateTime) {
        $urlTag .= "$indent$indent<lastmod>" . makeIso8601TimeStamp($modifiedDateTime) . "</lastmod>$newLine";
        if (!$isoLastModifiedSite) {
            // remember the last modification date of the whole site
            $isoLastModifiedSite = makeIso8601TimeStamp($modifiedDateTime);
        }
    }
    if ($changeFrequency) {
        $urlTag .= "$indent$indent<changefreq>$changeFrequency</changefreq>$newLine";
    }
    if ($priority) {
        $urlTag .= "$indent$indent<priority>$priority</priority>$newLine";
    }
    $urlTag .= "$indent</url>$newLine";
    return $urlTag;
}

// Recursively scan $base, collecting every directory and file into $data.
function rscandir($base = '', &$data = array()) {
    $array = array_diff(scandir($base), array('.', '..')); // skip . and ..
    foreach ($array as $value) {
        if (is_dir($base . $value)) {
            $data[] = $base . $value . '/';
            $data = rscandir($base . $value . '/', $data); // recurse into subfolder
        } elseif (is_file($base . $value)) {
            $data[] = $base . $value;
        }
    }
    return $data;
}

// Turn a local path into a full URL.
function kill_base($t) {
    return URLBASE . substr($t, strlen(DIRBASE));
}

$dir = rscandir(DIRBASE);
$a = array_map("kill_base", $dir);

foreach ($a as $key => $pageUrl) {
    $pageLastModified    = date("Y-m-d", filemtime($dir[$key]));
    $pageChangeFrequency = "monthly";
    $pagePriority        = 0.8;
    $urlsetValue .= makeUrlTag($pageUrl, $pageLastModified, $pageChangeFrequency, $pagePriority);
}

file_put_contents('sitemap.xml', $xmlHeader . $urlsetOpen . $urlsetValue . $urlsetClose);
?>
=== CODE ENDS HERE ===
-
HTML sitemaps are good for users; having 100,000 links on a page though, not so much.
If you can do video and image sitemaps (and with a site this large you certainly should), you'll help Google get around your site.
-
Is there any way i can see pages that have not been indexed?
Not that I can tell, and using the site: operator isn't going to be feasible on a site this large, I guess.
Is it more beneficial to include various site maps or just the one?
Well, the max file size is 50,000 URLs or 10MB uncompressed (you can gzip them), so if you have more than 50,000 URLs you'll have to split them up.
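To make that split concrete, here's a rough PHP sketch (the file names and domain are placeholders): it chunks a URL list into sitemap files of at most 50,000 URLs each, then writes an index file pointing at each part:

```php
<?php
// Split a (possibly huge) URL list into sitemap files of at most
// $maxPerFile URLs each; returns the names of the files written.
function writeSitemapChunks(array $urls, $maxPerFile = 50000) {
    $files = array();
    foreach (array_chunk($urls, $maxPerFile) as $i => $chunk) {
        $name = sprintf('sitemap-%d.xml', $i + 1);
        $xml  = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
        $xml .= "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n";
        foreach ($chunk as $u) {
            $xml .= "  <url><loc>" . htmlspecialchars($u, ENT_QUOTES) . "</loc></url>\n";
        }
        $xml .= "</urlset>\n";
        file_put_contents($name, $xml);
        $files[] = $name;
    }
    return $files;
}

// The index file simply lists each sitemap chunk by full URL.
function writeSitemapIndex(array $files, $baseUrl) {
    $xml  = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
    $xml .= "<sitemapindex xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n";
    foreach ($files as $f) {
        $xml .= "  <sitemap><loc>" . $baseUrl . $f . "</loc></sitemap>\n";
    }
    $xml .= "</sitemapindex>\n";
    file_put_contents('sitemap-index.xml', $xml);
    return $xml;
}
```

You then submit just the index file, and the engines fetch each part from it.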
-
Is there any way i can see pages that have not been indexed?
Is it more beneficial to include various site maps or just the one?
Thanks for your help!!
-
Thanks for your help
Do you feel it is important to have HTML and video sitemaps as well? How does this make a difference?
-
How big we talking?
Probably best grabbing something server-side if your CMS can't do it. Check out http://code.google.com/p/sitemap-generators/wiki/SitemapGenerators - I know Google says they haven't tested any (and neither have I), but they must have looked at them at some point.
Secondly you'll need to know how to submit multiple sitemap parts and how to break them up.
Looking at it, Amazon seem to cap theirs at 50,000 and eBay at 40,000, so I think you'll be fine with numbers around there.
Here's how to set up multiple sitemaps in the same directory - http://googlewebmastercentral.blogspot.com/2006/10/multiple-sitemaps-in-same-directory.html
Once you've submitted your sitemaps Webmaster Tools will tell you how many URLs you've submitted vs. how many they've indexed.
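Besides submitting in Webmaster Tools, you can also advertise your sitemaps (or the index file) straight from robots.txt; each Sitemap: line is picked up independently. The domain and file names below are placeholders:

```
User-agent: *
Sitemap: http://www.example.com/sitemap-index.xml
Sitemap: http://www.example.com/sitemap-images.xml
```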
-
Hey,
I'm assuming you mean XML sitemaps here: You can create a sitemap index file which essentially lists a number of sitemaps in one file (A sitemap of sitemap files if that makes sense). See http://www.google.com/support/webmasters/bin/answer.py?answer=71453
There are automatic sitemap generators out there - if your site has categories with thousands of pages, I'd split them up and have a sitemap per category.
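A category-split setup like that just needs an index file tying the parts together. For reference, a sitemap index is a small XML document listing the child sitemaps (the file names here are made up):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.example.com/sitemap-category-1.xml</loc>
    <lastmod>2011-06-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap-category-2.xml</loc>
  </sitemap>
</sitemapindex>
```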
DD
-
To extract URLs, you can use Xenu Link Sleuth. Then you must make a hierarchy of sitemaps so that all sitemaps are efficiently crawled by Google.