Sitemap Help!
-
Hi Guys,
Quick question regarding sitemaps. I am currently working on a huge site that has masses of pages.
I am looking to create a sitemap. How would you guys do this? I have looked at some tools, but they say they will only handle up to roughly 30,000 pages. The site is so large that it would be impossible to do this by hand... any suggestions?
Also, how do I find out how many pages of my site are actually indexed and how many are not?
Thank You all
Wayne
-
The problem that I have with CMS-side sitemap generators is that they often pull links from pages that currently exist and add sitemap entries based on that information. If those pages link to pages that are no longer there, as is often the case with dynamic content, then you'll be imposing 404s on yourself like crazy.
Just something to watch out for but it's probably your best solution.
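One way to guard against this is to check each candidate URL before it goes into the sitemap. Below is a minimal sketch in Python (not the PHP used elsewhere in this thread). The function names `http_status` and `live_urls` are made up for illustration, and it assumes your server answers HEAD requests sensibly. On a very large site you would want to throttle or parallelise this.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for a HEAD request, or None if the request fails."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        # urlopen follows redirects, so this is the status of the final response.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as HTTPError; the code is still useful.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failures, refused connections, timeouts and similar.
        return None


def live_urls(urls, status_of=http_status):
    """Keep only the URLs that answer with status 200, preserving their order."""
    return [url for url in urls if status_of(url) == 200]
```

The `status_of` parameter is only there so the filter can be tried out with a stand-in lookup instead of real network calls.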
-
Hi! With this file, you can create a Google-friendly sitemap for any given folder almost automatically. There is no limit on the number of files. Please note that the code is courtesy of @frkandris, who generously helped me out when I had a similar problem. I hope it will be as helpful to you as it was to me.
- Copy / paste the code below into a text editor.
- Edit the beginning of the file: where you see seomoz.com, put your own domain name there
- Save the file as getsitemap.php and ftp it to the appropriate folder.
- Write the full URL in your browser: http://www.yourdomain.com/getsitemap.php
- The moment you do it, a sitemap.xml will be generated in your folder
- Refresh your ftp client and download the sitemap. Make further changes to it if you wish.
=== CODE STARTS HERE ===
<?php
// Folder to scan, and the public URL that folder maps to.
// Replace seomoz.com with your own domain name in both places below.
define('DIRBASE', './');
define('URLBASE', 'http://www.seomoz.com/');

$isoLastModifiedSite = "";
$newLine = "\n";
$indent = " ";
if (empty($rootUrl)) $rootUrl = "http://www.seomoz.com";

$xmlHeader = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>$newLine";

$urlsetOpen = "<urlset xmlns=\"http://www.google.com/schemas/sitemap/0.84\" "
    . "xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" "
    . "xsi:schemaLocation=\"http://www.google.com/schemas/sitemap/0.84 "
    . "http://www.google.com/schemas/sitemap/0.84/sitemap.xsd\">$newLine";
$urlsetValue = "";
$urlsetClose = "</urlset>$newLine";

// Escape characters that are not allowed inside XML text (&, <, >, quotes).
// htmlspecialchars is used instead of htmlentities, which emits HTML-only entities.
function makeUrlString ($urlString) {
    return htmlspecialchars($urlString, ENT_QUOTES, 'UTF-8');
}

// Turn 'Y-m-d' or 'Y-m-d H:i:s' into the date format sitemaps expect.
function makeIso8601TimeStamp ($dateTime) {
    if (!$dateTime) {
        $dateTime = date('Y-m-d H:i:s');
    }
    if (is_numeric(substr($dateTime, 11, 1))) {
        $isoTS = substr($dateTime, 0, 10) ."T" .substr($dateTime, 11, 8) ."+00:00";
    } else {
        $isoTS = substr($dateTime, 0, 10);
    }
    return $isoTS;
}

// Build one <url> entry.
function makeUrlTag ($url, $modifiedDateTime, $changeFrequency, $priority) {
    GLOBAL $newLine;
    GLOBAL $indent;
    GLOBAL $isoLastModifiedSite;

    $urlOpen = "$indent<url>$newLine";
    $urlValue = "";
    $urlClose = "$indent</url>$newLine";
    $locOpen = "$indent$indent<loc>";
    $locClose = "</loc>$newLine";
    $lastmodOpen = "$indent$indent<lastmod>";
    $lastmodClose = "</lastmod>$newLine";
    $changefreqOpen = "$indent$indent<changefreq>";
    $changefreqClose = "</changefreq>$newLine";
    $priorityOpen = "$indent$indent<priority>";
    $priorityClose = "</priority>$newLine";

    $urlTag = $urlOpen;
    $urlValue = $locOpen .makeUrlString("$url") .$locClose;
    if ($modifiedDateTime) {
        $urlValue .= $lastmodOpen .makeIso8601TimeStamp($modifiedDateTime) .$lastmodClose;
        if (!$isoLastModifiedSite) { // last modification of web site
            $isoLastModifiedSite = makeIso8601TimeStamp($modifiedDateTime);
        }
    }
    if ($changeFrequency) {
        $urlValue .= $changefreqOpen .$changeFrequency .$changefreqClose;
    }
    if ($priority) {
        $urlValue .= $priorityOpen .$priority .$priorityClose;
    }
    $urlTag .= $urlValue;
    $urlTag .= $urlClose;
    return $urlTag;
}

// Recursively list every directory and file below $base.
function rscandir($base='', &$data=array()) {
    $array = array_diff(scandir($base), array('.', '..')); /* remove '.' and '..' from the array */
    foreach($array as $value) : /* loop through the array at the level of the supplied $base */
        if (is_dir($base.$value)) : /* if this is a directory */
            $data[] = $base.$value.'/'; /* add it to the $data array */
            $data = rscandir($base.$value.'/', $data); /* then recurse, using the current $value as the $base and carrying $data along */
        elseif (is_file($base.$value)) : /* else if the current $value is a file */
            $data[] = $base.$value; /* just add the current $value to the $data array */
        endif;
    endforeach;
    return $data; // return the $data array
}

// Swap the local directory prefix for the public URL prefix.
function kill_base($t) {
    return(URLBASE.substr($t, strlen(DIRBASE)));
}

$dir = rscandir(DIRBASE);
$a = array_map("kill_base", $dir);

foreach ($a as $key => $pageUrl) {
    $pageLastModified = date("Y-m-d", filemtime($dir[$key]));
    $pageChangeFrequency = "monthly";
    $pagePriority = 0.8;
    $urlsetValue .= makeUrlTag ($pageUrl, $pageLastModified, $pageChangeFrequency, $pagePriority);
}

$current = "$xmlHeader$urlsetOpen$urlsetValue$urlsetClose";
file_put_contents('sitemap.xml', $current);
?>
=== CODE ENDS HERE ===
-
HTML sitemaps are good for users; having 100,000 links on a page though, not so much.
If you can (and certainly with a site this large), do video and image sitemaps as well; you'll help Google get around your site.
-
Is there any way I can see pages that have not been indexed?
Not that I can tell, and using site: isn't going to be feasible on a large site, I guess.
Is it more beneficial to include various sitemaps or just the one?
Well, the maximum size for a single sitemap file is 50,000 URLs or 10MB uncompressed (you can gzip them), so if you've got more than 50,000 URLs you'll have to use several.
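To make the splitting concrete, here is a rough sketch in Python (not the PHP posted earlier in the thread). The function names are made up for illustration, and it only enforces the 50,000-URL cap. It does not check the uncompressed size of each file, so files with very long URLs may need a smaller chunk size.

```python
import gzip
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50000  # per-file URL cap mentioned above


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split the URL list into consecutive chunks of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap(urls):
    """Return the XML text of one sitemap file listing the given URLs."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in urls:
        # escape() handles &, < and >, which must not appear raw in XML text.
        lines.append("  <url><loc>%s</loc></url>" % escape(url))
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


def gzipped_sitemaps(urls, size=MAX_URLS_PER_SITEMAP):
    """Return one gzip-compressed sitemap (as bytes) per chunk of URLs."""
    return [
        gzip.compress(build_sitemap(chunk).encode("utf-8"))
        for chunk in chunk_urls(urls, size)
    ]
```

Each item returned by `gzipped_sitemaps` can be written to its own file in binary mode, for example sitemap-1.xml.gz, sitemap-2.xml.gz and so on (those file names are only a suggestion).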
-
Is there any way I can see pages that have not been indexed?
Is it more beneficial to include various sitemaps or just the one?
Thanks for your help!!
-
Thanks for your help.
Do you feel it is important to have HTML and video sitemaps as well? What difference does this make?
-
How big we talking?
Probably best grabbing something server side if your CMS can't do it. Check out - http://code.google.com/p/sitemap-generators/wiki/SitemapGenerators - I know Google says they've not tested any (and neither have I) but they must have looked at them at some point.
Secondly you'll need to know how to submit multiple sitemap parts and how to break them up.
Looking at it, Amazon seem to cap theirs at 50,000 URLs per file and eBay at 40,000, so I think you should be fine with numbers around there.
Here's how to set up multiple sitemaps in the same directory - http://googlewebmastercentral.blogspot.com/2006/10/multiple-sitemaps-in-same-directory.html
Once you've submitted your sitemaps Webmaster Tools will tell you how many URLs you've submitted vs. how many they've indexed.
-
Hey,
I'm assuming you mean XML sitemaps here: You can create a sitemap index file which essentially lists a number of sitemaps in one file (A sitemap of sitemap files if that makes sense). See http://www.google.com/support/webmasters/bin/answer.py?answer=71453
There are automatic sitemap generators out there. If your site has categories with thousands of pages, I'd split them up and have a sitemap per category.
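Roughly what I mean, sketched in Python with made-up function names: group the URLs by their first path segment, write one sitemap per group, then list those sitemaps in an index file. It assumes the category is the first folder in the URL path, which may not match how your site is structured.

```python
from collections import defaultdict
from urllib.parse import urlparse
from xml.sax.saxutils import escape


def group_by_category(urls):
    """Group URLs by first path segment; single-segment URLs go under 'root'."""
    groups = defaultdict(list)
    for url in urls:
        segments = [part for part in urlparse(url).path.split("/") if part]
        # Assumption: /shoes/red-boot belongs to category "shoes";
        # /about (only one segment) is treated as a top-level page.
        category = segments[0] if len(segments) > 1 else "root"
        groups[category].append(url)
    return dict(groups)


def build_sitemap_index(sitemap_urls):
    """Return the XML text of a sitemap index listing the given sitemap files."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in sitemap_urls:
        lines.append("  <sitemap><loc>%s</loc></sitemap>" % escape(url))
    lines.append("</sitemapindex>")
    return "\n".join(lines) + "\n"
```

You would then submit only the index file, and any category with more URLs than a single file allows would still need splitting into several files.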
DD
-
To extract URLs, you can use Xenu Link Sleuth. Then you must make a hierarchy of sitemaps so that all sitemaps are efficiently crawled by Google.