Sitemap Help!
-
Hi Guys,
Quick question regarding sitemaps. I am currently working on a huge site with a massive number of pages.
I am looking to create a sitemap. How would you guys do this? I have looked at some tools, but they say they will only handle up to roughly 30,000 pages. The site is so large that it would be impossible to do this by hand... any suggestions?
Also, how do I find out how many pages of my site are actually indexed and how many are not?
Thank You all
Wayne
-
The problem I have with CMS-side sitemap generators is that they often pull content from existing pages and add entries based on that information. If those pages link to pages that are no longer there, as often happens with dynamic content, you'll be imposing 404s on yourself like crazy.
Just something to watch out for, but it's probably your best solution.
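One way to guard against that is to check each candidate URL before it goes into the sitemap. Here is a minimal Python sketch, not tied to any particular CMS: `http_status` and `live_urls` are hypothetical helper names, and it assumes a plain HEAD request is enough to tell a live page from a dead one on your server.

```python
from typing import Callable, Iterable, List
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a HEAD request (0 if unreachable)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # e.g. 404 for a page that no longer exists
    except urllib.error.URLError:
        return 0  # DNS failure, refused connection, etc.


def live_urls(urls: Iterable[str],
              get_status: Callable[[str], int] = http_status) -> List[str]:
    """Keep only URLs that answer 200, so dead links never reach the sitemap."""
    return [url for url in urls if get_status(url) == 200]
```

The status check is passed in as a parameter so you can swap in something faster (a database lookup, say) on a site too big to request page by page.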
-
Hi! With this file, you can create a Google-friendly sitemap for any given folder almost automatically. There is no limit on the number of files. Please note that the code is courtesy of @frkandris, who generously helped me out when I had a similar problem. I hope it will be as helpful to you as it was to me.
- Copy / paste the code below into a text editor.
- Edit the beginning of the file: where you see seomoz.com, put your own domain name there
- Save the file as getsitemap.php and ftp it to the appropriate folder.
- Write the full URL in your browser: http://www.yourdomain.com/getsitemap.php
- The moment you do it, a sitemap.xml will be generated in your folder
- Refresh your ftp client and download the sitemap. Make further changes to it if you wish.
=== CODE STARTS HERE ===
<?php
// Folder to scan and the public URL it maps to; replace seomoz.com with your own domain.
define('DIRBASE', './');
define('URLBASE', 'http://www.seomoz.com/');

$isoLastModifiedSite = "";
$newLine = "\n";
$indent = " ";
if (empty($rootUrl)) $rootUrl = "http://www.seomoz.com";

$xmlHeader = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>$newLine";
// 0.84 is the older Google sitemap namespace that the original script uses.
$urlsetOpen = "<urlset xmlns=\"http://www.google.com/schemas/sitemap/0.84\" "
    . "xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" "
    . "xsi:schemaLocation=\"http://www.google.com/schemas/sitemap/0.84 "
    . "http://www.google.com/schemas/sitemap/0.84/sitemap.xsd\">$newLine";
$urlsetValue = "";
$urlsetClose = "</urlset>$newLine";

// Escape characters such as & and quotes so the URL is valid inside XML.
function makeUrlString($urlString) {
    return htmlentities($urlString, ENT_QUOTES, 'UTF-8');
}

// Turn 'Y-m-d' or 'Y-m-d H:i:s' into an ISO 8601 date or timestamp.
function makeIso8601TimeStamp($dateTime) {
    if (!$dateTime) {
        $dateTime = date('Y-m-d H:i:s');
    }
    if (is_numeric(substr($dateTime, 11, 1))) {
        $isoTS = substr($dateTime, 0, 10) . "T" . substr($dateTime, 11, 8) . "+00:00";
    } else {
        $isoTS = substr($dateTime, 0, 10);
    }
    return $isoTS;
}

// Build one <url> entry with its optional <lastmod>, <changefreq> and <priority>.
function makeUrlTag($url, $modifiedDateTime, $changeFrequency, $priority) {
    global $newLine, $indent, $isoLastModifiedSite;

    $urlOpen = "$indent<url>$newLine";
    $urlClose = "$indent</url>$newLine";
    $locOpen = "$indent$indent<loc>";
    $locClose = "</loc>$newLine";
    $lastmodOpen = "$indent$indent<lastmod>";
    $lastmodClose = "</lastmod>$newLine";
    $changefreqOpen = "$indent$indent<changefreq>";
    $changefreqClose = "</changefreq>$newLine";
    $priorityOpen = "$indent$indent<priority>";
    $priorityClose = "</priority>$newLine";

    $urlTag = $urlOpen;
    $urlValue = $locOpen . makeUrlString("$url") . $locClose;
    if ($modifiedDateTime) {
        $urlValue .= $lastmodOpen . makeIso8601TimeStamp($modifiedDateTime) . $lastmodClose;
        if (!$isoLastModifiedSite) { // last modification of web site
            $isoLastModifiedSite = makeIso8601TimeStamp($modifiedDateTime);
        }
    }
    if ($changeFrequency) {
        $urlValue .= $changefreqOpen . $changeFrequency . $changefreqClose;
    }
    if ($priority) {
        $urlValue .= $priorityOpen . $priority . $priorityClose;
    }
    $urlTag .= $urlValue;
    $urlTag .= $urlClose;
    return $urlTag;
}

// Recursively collect every file and directory below $base.
function rscandir($base = '', &$data = array()) {
    $array = array_diff(scandir($base), array('.', '..')); // remove '.' and '..' from the array
    foreach ($array as $value) : // loop through the array at the level of the supplied $base
        if (is_dir($base.$value)) : // if this is a directory
            $data[] = $base.$value.'/'; // add it to the $data array
            // then recurse with the current $value as the $base, carrying $data along
            $data = rscandir($base.$value.'/', $data);
        elseif (is_file($base.$value)) : // else if the current $value is a file
            $data[] = $base.$value; // just add the current $value to the $data array
        endif;
    endforeach;
    return $data; // return the $data array
}

// Swap the local DIRBASE prefix for the public URLBASE.
function kill_base($t) {
    return URLBASE . substr($t, strlen(DIRBASE));
}

$dir = rscandir(DIRBASE);
$a = array_map("kill_base", $dir);

foreach ($a as $key => $pageUrl) {
    $pageLastModified = date("Y-m-d", filemtime($dir[$key]));
    $pageChangeFrequency = "monthly";
    $pagePriority = 0.8;
    $urlsetValue .= makeUrlTag($pageUrl, $pageLastModified, $pageChangeFrequency, $pagePriority);
}

$current = "$xmlHeader$urlsetOpen$urlsetValue$urlsetClose";
file_put_contents('sitemap.xml', $current);
?>
=== CODE ENDS HERE ===
-
HTML sitemaps are good for users; having 100,000 links on a page, though, not so much.
With a site this large, if you can also provide video and image sitemaps, you'll help Google get around your site.
-
Is there any way I can see pages that have not been indexed?
Not that I can tell, and using site: isn't going to be feasible on a large site, I guess.
Is it more beneficial to include various sitemaps or just the one?
Well, the maximum per sitemap file is 50,000 URLs or 10MB uncompressed (you can gzip them), so if you have more than 50,000 URLs you'll have to split them across several sitemaps.
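To make the splitting concrete, here is a rough Python sketch that cuts a URL list into files of at most 50,000 entries each. The names `chunk_urls` and `render_urlset` are made up for illustration, the sitemaps.org 0.9 namespace is my assumption about what you'd want to emit, and the sketch does not check the file-size limit.

```python
from typing import Iterator, List, Sequence
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit mentioned above


def chunk_urls(urls: Sequence[str],
               size: int = MAX_URLS_PER_SITEMAP) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls`, each with at most `size` entries."""
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])


def render_urlset(urls: Sequence[str]) -> str:
    """Render one sitemap file containing only <loc> entries."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    # escape() turns &, < and > into XML entities so query strings stay valid.
    lines += [f"  <url><loc>{escape(url)}</loc></url>" for url in urls]
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"
```

You would then write each chunk to its own file (sitemap-1.xml, sitemap-2.xml and so on, names being up to you) and gzip them if you like.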
-
Is there any way I can see pages that have not been indexed?
Is it more beneficial to include various sitemaps or just the one?
Thanks for your help!!
-
Thanks for your help.
Do you feel it is important to have HTML and video sitemaps as well? How does this make a difference?
-
How big are we talking?
It's probably best to grab something server-side if your CMS can't do it. Check out http://code.google.com/p/sitemap-generators/wiki/SitemapGenerators - I know Google says they've not tested any (and neither have I), but they must have looked at them at some point.
Secondly, you'll need to know how to break the sitemap up into parts and how to submit multiple sitemaps.
Looking at it, Amazon seems to cap theirs at 50,000 URLs and eBay at 40,000, so you should be fine with numbers around there.
Here's how to set up multiple sitemaps in the same directory - http://googlewebmastercentral.blogspot.com/2006/10/multiple-sitemaps-in-same-directory.html
Once you've submitted your sitemaps, Webmaster Tools will tell you how many URLs you've submitted vs. how many have been indexed.
-
Hey,
I'm assuming you mean XML sitemaps here. You can create a sitemap index file, which essentially lists a number of sitemaps in one file (a sitemap of sitemap files, if that makes sense). See http://www.google.com/support/webmasters/bin/answer.py?answer=71453
There are automatic sitemap generators out there. If your site has categories with thousands of pages, I'd split them up and have a sitemap per category.
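For illustration, a sitemap index for per-category sitemaps could be produced with something like the Python sketch below. `render_sitemap_index` is a hypothetical helper, the example.com file names are placeholders, and the sitemaps.org 0.9 namespace is assumed.

```python
from typing import Sequence
from xml.sax.saxutils import escape


def render_sitemap_index(sitemap_urls: Sequence[str]) -> str:
    """Render a sitemap index that points at each individual sitemap file."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in sitemap_urls:
        lines.append(f"  <sitemap><loc>{escape(url)}</loc></sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder per-category sitemap locations.
    print(render_sitemap_index([
        "http://www.example.com/sitemap-products.xml",
        "http://www.example.com/sitemap-articles.xml",
    ]))
```

You submit just the index file, and each category sitemap it lists gets picked up from there.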
DD
-
To extract URLs, you can use Xenu's Link Sleuth. Then you must build a hierarchy of sitemaps so that all of them are crawled efficiently by Google.