Sitemap Help!
-
Hi Guys,
Quick question regarding sitemaps. I am currently working on a huge site that has masses of pages.
I am looking to create a site map. How would you guys do this? i have looked at some tools but it say it will only do up to 30,000 pages roughly. It is so large it would be impossible to do this myself....any suggestions?
Also, how do i find out how many pages my site actually has indexed and not indexed??
Thank You all
Wayne
-
The problem that I have with CMS side sitemap generators is that it often pulls content from pages that are existing and adds entries based off that information. If you have pages linked to that are no longer there, as is the case with dynamic content, then you'll be imposing 404's on yourself like crazy.
Just something to watch out for but it's probably your best solution.
-
Hi! With this file, you can create a Google-friendly sitemap for any given folder almost automatically. No limits on the number of files. Please note that the code is the courtesy of @frkandris who generously helped me out when I had a similair problem. I hope it will be as helpful to you as it was to me
- Copy / paste the code below into a text editor.
- Edit the beginning of the file: where you see seomoz.com, put your own domain name there
- Save the file as getsitemap.php and ftp it to the appropriate folder.
- Write the full URL in your browser: http://www.yourdomain.com/getsitemap.php
- The moment you do it, a sitemap.xml will be generated in your folder
- Refresh your ftp client and download the sitemap. Make further changes to it if you wish.
=== CODE STARTS HERE ===
define(DIRBASE, './');define(URLBASE, 'http://www.seomoz.com/'); $isoLastModifiedSite = "";$newLine = "\n";$indent = " ";if (!$rootUrl) $rootUrl = "http://www.seomoz.com"; $xmlHeader = "$newLine"; $urlsetOpen = "<urlset xmlns=""http://www.google.com/schemas/sitemap/0.84"" ="" <="" span="">xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.google.com/schemas/sitemap/0.84 http://www.google.com/schemas/sitemap/0.84/sitemap.xsd">$newLine";$urlsetValue = "";$urlsetClose = "</urlset>$newLine"; function makeUrlString ($urlString) { return htmlentities($urlString, ENT_QUOTES, 'UTF-8');} function makeIso8601TimeStamp ($dateTime) { if (!$dateTime) { $dateTime = date('Y-m-d H:i:s'); } if (is_numeric(substr($dateTime, 11, 1))) { $isoTS = substr($dateTime, 0, 10) ."T" .substr($dateTime, 11, ."+00:00"; } else { $isoTS = substr($dateTime, 0, 10); } return $isoTS;} function makeUrlTag ($url, $modifiedDateTime, $changeFrequency, $priority) { GLOBAL $newLine; GLOBAL $indent; GLOBAL $isoLastModifiedSite; $urlOpen = "$indent<url>$newLine";</url> $urlValue = ""; $urlClose = "$indent$newLine"; $locOpen = "$indent$indent<loc>";</loc> $locValue = ""; $locClose = "$newLine"; $lastmodOpen = "$indent$indent<lastmod>";</lastmod> $lastmodValue = ""; $lastmodClose = "$newLine"; $changefreqOpen = "$indent$indent<changefreq>";</changefreq> $changefreqValue = ""; $changefreqClose = "$newLine"; $priorityOpen = "$indent$indent<priority>";</priority> $priorityValue = ""; $priorityClose = "$newLine"; $urlTag = $urlOpen; $urlValue = $locOpen .makeUrlString("$url") .$locClose; if ($modifiedDateTime) { $urlValue .= $lastmodOpen .makeIso8601TimeStamp($modifiedDateTime) .$lastmodClose; if (!$isoLastModifiedSite) { // last modification of web site $isoLastModifiedSite = makeIso8601TimeStamp($modifiedDateTime); } } if ($changeFrequency) { $urlValue .= $changefreqOpen .$changeFrequency .$changefreqClose; } if ($priority) { $urlValue .= $priorityOpen .$priority .$priorityClose; } $urlTag .= $urlValue; $urlTag .= $urlClose; return $urlTag;} function rscandir($base='', &$data=array()) { $array = array_diff(scandir($base), array('.', '..')); # remove ' and .. from the array / foreach($array as $value) : / loop through the array at the level of the supplied $base / if (is_dir($base.$value)) : / if this is a directory / $data[] = $base.$value.'/'; / add it to the $data array / $data = rscandir($base.$value.'/', $data); / then make a recursive call with the current $value as the $base supplying the $data array to carry into the recursion / elseif (is_file($base.$value)) : / else if the current $value is a file / $data[] = $base.$value; / just add the current $value to the $data array */ endif; endforeach; return $data; // return the $data array } function kill_base($t) { return(URLBASE.substr($t, strlen(DIRBASE)));} $dir = rscandir(DIRBASE);$a = array_map("kill_base", $dir); foreach ($a as $key => $pageUrl) { $pageLastModified = date ("Y-m-d", filemtime($dir[$key])); $pageChangeFrequency = "monthly"; $pagePriority = 0.8; $urlsetValue .= makeUrlTag ($pageUrl, $pageLastModified, $pageChangeFrequency, $pagePriority); } $current = "$xmlHeader$urlsetOpen$urlsetValue$urlsetClose"; file_put_contents('sitemap.xml', $current); ?>
=== CODE ENDS HERE ===
-
HTML sitemaps are good for users; having 100,000 links on a page though, not so much.
If you can (and certainly with a site this large) if you can do video and image sitemaps you'll help Google get around your site.
-
Is there any way i can see pages that have not been indexed?
Not that I can tell and using site: isn't going to be feasible on a large site I guess.
Is it more beneficial to include various site maps or just the one?
Well, the max files size is 50,000 or 10MB uncompressed (you can gzip them), so if you've more than 50,000 URLs you'll have to.
-
Is there any way i can see pages that have not been indexed?
Is it more beneficial to include various site maps or just the one?
Thanks for your help!!
-
Thanks for your help
do you ffel it is important to have HTML + Video site maps as well? How does this make a differance?
-
How big we talking?
Probably best grabbing something server side if your CMS can't do it. Check out - http://code.google.com/p/sitemap-generators/wiki/SitemapGenerators - I know Google says they've not tested any (and neither have I) but they must have looked at them at some point.
Secondly you'll need to know how to submit multiple sitemap parts and how to break them up.
Looking at it Amazon seem to cap theirs at 50,000 and Ebay at 40,000, so I think you should be fine with numbers around there.
Here's how to set up multiple sitemaps in the same directory - http://googlewebmastercentral.blogspot.com/2006/10/multiple-sitemaps-in-same-directory.html
Once you've submitted your sitemaps Webmaster Tools will tell you how many URLs you've submitted vs. how many they've indexed.
-
Hey,
I'm assuming you mean XML sitemaps here: You can create a sitemap index file which essentially lists a number of sitemaps in one file (A sitemap of sitemap files if that makes sense). See http://www.google.com/support/webmasters/bin/answer.py?answer=71453
There are automatic sitemap generators out there - if you're site has categories with thousands of pages I'd split up them up and have a sitemap per category.
DD
-
To extract URLs, you can use Xenu Link Sleuth. Then you msut make a hiearchy of sitemaps so that all sitemaps are efficiently crawled by Google.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Help optimizing website for speed
Hello, My website is www.likechimp.com and is a University project. I need to optimise the website for speed as the bounce rate is fairly quick - I feel this could be due to how long it takes web site to load? Any tips in increasing internet speed. I am willing to higher someone if they feel they can help! Thanks, L
On-Page Optimization | | xlucax0 -
Which version of the homepage on the sitemap?
We have been wondering this for a while now. When we build our sitemaps, or when the yoast plugin does in WP we are often left with www.yourdomain.co.uk/ and www.yourdomain.co.uk/index.html in our sitemaps. Surely it isn't healthy to have both in the sitemap. Which one should we take out? Thanks
On-Page Optimization | | EveronSEO0 -
Homepage SEO: Does Text Content Help Traffic?
Hi Mozzers! My employers homepage (www.swarovski.com) is - amongst other problems we're about to fix - very thin (not to say empty!) in text content. If we were to put relevant text on the next version of the page, would that be beneficial in terms of traffic to that page? Thanks and cheers, Chris
On-Page Optimization | | Diderino0 -
Can someone help with Canonical?
I have a wordpress site that On-Page Grader is saying I don't have Canonical done correctly. Here is the comment. Appropriate Use of Rel Canonical If the canonical tag is pointing to a different URL, engines will not count this page as the reference resource and thus, it won't have an opportunity to rank. Make sure you're targeting the right page (if this isn't it, you can reset the target above) and then change the canonical tag to reference that URL. Recommendation: We check to make sure that IF you use canonical URL tags, it points to the right page. If the canonical tag points to a different URL, engines will not count this page as the reference resource and thus, it won't have an opportunity to rank. If you've not made this page the rel=canonical target, change the reference to this URL. NOTE: For pages not employing canonical URL tags, this factor does not apply. I have quite a few sites and have never had an issue with this. Can anyone help? I tried installing a plugin but that seems to have made it worse. This is the front page of the site btw.
On-Page Optimization | | jonnyholt1 -
Why Aren’t All My XML Sitemap Images Indexed in Webmaster Tools?
Hi, Here is our main sitemap http://www.vistastores.com/newsitemap/main_sitemap.xml We have submitted all category wise sitemap having Image Tags : For eg - Ac Category http://www.vistastores.com/newsitemap/window_ac_sitemap.xml contains iamge tag - image:imageimage:locimage:captionimage:title</image:title></image:caption></image:loc></image:image> All our 142 category pages includes these format. Still the sitemap report on 4-Apr-2013 says: Sitemaps content Web pages:
On-Page Optimization | | CommercePundit
Submitted 14,569
Indexed 11,219 Images:
Submitted 21,442
Indexed 11,762 You can see major difference in submitted v/s indexed. I have looked into Jay Simpson question - http://www.seomoz.org/q/any-idea-why-our-sitemap-images-aren-t-indexed to find this answer but didn't get Perfect & clear answer. I need urgent answer to fix this issue..... K0NDuw5s.jpg0 -
Can Cascading DropDown HELP improve site architucture?
Our online stores is selling furnace filters. http://www.furnacefilterscanada.com We have 4 different qualities furnace filters and each of them are avaible in almost 50 different sizes. My store is setup with almost 200 different products. ( 4 filtres X 50 sizes) None of them have variable. I'm looking for a solution to help customers to find there furnace filter size fast and easy on our HOME page. I find this cascading dropdown option http://www.asp.net/ajaxLibrary/AjaxControlToolkitSampleSite/CascadingDropDown/CascadingDropDown.aspx where I wiil setup 3 options to select: filter width filter lenght filter depth Will it be the best options? Now my store is setup with categories and it is not the best solution. http://www.furnacefilterscanada.com This was a lots of work to create 200 different products. The reason I did this is because when searching in search engine, vistors will use such keywords: 16x20x4 filter 20x25x2 furnace filter a/c filter 10x20x1 replacement filter online 20x20x1 16x25x5 filter toronto and so on... Must keywords request in search engine will included the furnace filter size. Thank you for your help, BigBlaze
On-Page Optimization | | BigBlaze2050 -
Creating a sitemap
what is the best place to go to create a xml sitemap? I have used xml-sitemap.com but it only does 500 pages. Any suggestions? Does a sitemap have value on the SEO front?
On-Page Optimization | | jgmayes0 -
Are xml sitemaps still in use today?
Hi, Are you still using XML sitemaps today? If yes, does it bring any benefit to your website like faster indexing of webpages or better rankings? Are you using special features like video sitemaps or sitemap index files? Best regards, Tobias
On-Page Optimization | | Tobiask-1215731