XML Sitemap Questions For Big Site
-
Hey Guys,
I have a few question about XML Sitemaps.
-
For a social site that is going to have presonal accounts created, what is the best way to get them indexed? When it comes to profiles I found out that twitter (https://twitter.com/i/directory/profiles) and facebook (https://www.facebook.com/find-friends?ref=pf) have directory pages, but Google plus has xml index pages (http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml).
-
If we go the XML route, how would we automatically add new profiles to the sitemap? Or is the only option to keep updating your xml profiles using a third party software (sitemapwriter)?
-
If a user chooses to not have their profile indexed (by default it will be index-able), how do we go about deindexing that profile? Is their an automatic way of doing this?
-
Lastly, has anyone dappled with google sitemap generator (https://code.google.com/p/googlesitemapgenerator/) if so do you recommend it?
Thank you!
-
-
Thanks for the input guys!
I believe Twitter and Facebook don't run sitemaps for their profiles, what they have is a directory for all their profiles (twitter: https://twitter.com/i/directory/profiles Facebook: https://www.facebook.com/find-friends?ref=pf) and use that to get their profiles crawled, however I feel the best approach is through xml sitemaps and Google plus actually does this with their profiles (http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml) and quite frankly I would rather follow Google then FB or Twitter... I'm just now wondering how the hell they upkeep that monster! Does it create a new sitemap everything one hits 50k? When do they update their sitemap? daily, weekly, or monthly and how?
One other question I have is if their is any penalties to getting a lot of pages crawled at once? Meaning one day we have 10 pages and the next we have 10,000 pages or 50,000 pages...
Thanks again guys!
-
I guess the way I was explaining it was for scalabilty on a large site. You have to think a site like fb or twitter with hundreds of millions of users still has the limitation of only having 50k records in a site map. So if they are running site maps, they have hundreds.
-
I'm not a web developer, so this might may be wrong, but I feel like it might be easier to just add every user to the xml sitemap and then add a noindex robots meta tag ons users pages that don't want to their profiles to be indexed.
-
If it were me and someone were asking me to design a system like that, I would design it in a few parts.
First I would create an application that handled the sitemap minus profiles, just for your tos, sign up pages, terms, and what ever pages like that.
Then I would design a system that handled the actual profiles. It would be pretty complex and resource intensive as the site grew. But the main idea flows like this
Start generation, grab the user record with id 1 in the database, check to see if indexable (move to next if not), see what pages are connected, write to xml file, loop back and start with record #2.
There are a few concessions you have to make, you need to keep up with the number of records in a file before you start another file. You can only have 50k records in one file.
The way I would handle the process in total for a large site would be this, sync the required tables via a weekly or daily cron to another instance (server). Call the php script (because that is what I use) that creates the first sitemap for the normal site wide pages. At the end of that site map, put a location for the user profile sitemap, then at the end of the scrip, execute the user profile site map generating script. At the end of each site map, put the location of the next site map file, because as you grow it might take 2-10000 site map files.
One thing that I would ensure to do is get a list of crawler ip addresses and in your .htaccess have an allow / deny rule. That way you can make the site maps only visible to the search engines.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
No-Indexing on Ecommerce site
Hi Our site has a lot of similar/lower quality product pages which aren't a high priority - so these probably won't get looked at in detail to improve performance as we have over 200,000 products . Some of them do generate a small amount of revenue, but an article I read suggested no-indexing pages which are of little value to improve site performance & overall structure. I wanted to find out if anyone had done this and what results they saw? Will this actually improve rankings of our focus areas? It makes me a bit nervous to just block pages so any advice is appreciated 🙂
Intermediate & Advanced SEO | | BeckyKey0 -
Sitemap Query
I've decided to write my own sitemap because frankly, the automated ones pull all kinds of out of I don't know where. So to get around that, manual it is. But I have some products appear in various categories, should I still list every product in each category in the sitemap, regardless of some being duplicates, or should I choose the most relevant category and list them there? I do have a canonical URL extension which should resolve any duplicate content I have.
Intermediate & Advanced SEO | | moon-boots0 -
Sitemap: unique sitemap or different sitemaps by Country
Hi guys, i have a question about sitemaps. We are doing an international site, e.x. www.offers.com for landing page and www.offers.com/br for brazil, www.offers.com/it for italy, etc... i don't if we should do an unique sitemap for all countries or separate sitemaps by country, e.x.: unique sitemap: www.offers.com/sitemap.xml - including all sitemaps www.offers.com/br/sitemap.xml - sitemap for brazil market only. Thank you
Intermediate & Advanced SEO | | thekiller990 -
How much SEO damage would it do having a subdomain site rather directory site?
Hi all! With a coleague we were arguing about what is better: Having a subdomain or a directory.
Intermediate & Advanced SEO | | Gaston Riera
Let me explain some more, this is about the cases: Having a multi-language site: Where en.domain.com or es.domain.com rather than domain.com/en/ or domain.com/es/ Having a Mobile and desktop version: m.domain.com or domain.com rather than domain.com/m or just domain.com. Having multiple location websites, you might figure. The dicussion started with me saying: Its better to have a directory site.
And my coleague said: Its better to have a subdomain site. Some of the reasons that he said is that big companies (such as wordpress) are doing that. And that's better for the business.
My reasons are fully based on this post from Rand Fishkin: Subdomains vs. Subfolders, Rel Canonical vs. 301, and How to Structure Links for SEO - Whiteboard Friday So, what does the community have to say about this?
Who should win this argue? GR.0 -
301 or Canonical - Ecommerce Site Question
We are making a change to our Navigation and this includes having to change the URL structure of a few pages of our site. Due to issues with the CMS (that are out of my control) we are unable to keep the current URL structure of two of our highest ranking pages. Our site is an E-commerce Site The Structure is changing from..... www.domain.com/page/highrankingpage <----OLD PAGE RANKED WELL to www.domain.com/category/highrankingpage <----NEW PAGE Generally I would have 301 'd this page but I found out that our Tech team added a Canonical to this page instead....(showing the high ranking page to the Search Engines) and on our site the visitors are able to browse the website getting the new page. BOTH PAGES ARE BASICALLY IDENTICAL (Same Content) http://searchenginewatch.com/sew/how-to/2288690/how-and-when-to-use-301-redirects-vs-canonical# Thoughts?
Intermediate & Advanced SEO | | CMcMullen0 -
Xml sitemap only shows up sometimes (magento)
Hi Moz community, I'm using Magento platform. I can generate a sitemap using their xml generator, but it will only pull up sometimes in web explorers, the rest of the time it will show a 404 page. GWT also tells me that I get a 404 error when testing the sitemap, but sometimes it will acknowledge that it's there. Anyone had this problem before or know how to help. sitemap= www.ice.com/sitemap.xml Let me know what other information I can provide to help. Thanks!
Intermediate & Advanced SEO | | IceIcebaby0 -
Bad site migration - what to do!
Hi Mozzers - I'm just looking at a site which has been damaged by a very poor site migration. Basically, the old URLs were 301'd to a page on the new website (not a 404) telling everyone the page no longer existed. They did not 301 old pages to equivalent new pages. So I just checked Google WMT and saw 1,000 crawl errors - basically the old URLs. This migration was done back in February, since when traffic to the website has never recovered. Should I fix this now? Is it worth implementing the correct 301s now, after such a timelapse?
Intermediate & Advanced SEO | | McTaggart0 -
How To Internationalize - Big Question
Hi all, Here is a big question. We have a long-established good content website with a .co.uk domain. The site is UK focussed. However, we are planning a new feature which will be UK and worldwide. So do we: 1. Keep it all on our .co.uk ? 2. Put the non-UK parts on a .com domain ? We don't have any content as such for a separate domain, and are not planning any. But, we are not sure if for example US users would be unimpressed with a UK domain. We could fudge it with "co.uk/us" etc. (Notice how we have not mentioned Google. Fed-up chasing big G the whole time. We just want to concentrate on our users and the service we provide to them. But G remains the elephant crapping in the corner of the room.) Also, we are asking this question before we let our developers and designers get to work. Basically we value Moz community opinions over and above theirs. Realise this is a big question, but you have big brains. Please chip in.
Intermediate & Advanced SEO | | dexm100