XML Sitemap Questions For Big Site
-
Hey Guys,
I have a few question about XML Sitemaps.
-
For a social site that is going to have presonal accounts created, what is the best way to get them indexed? When it comes to profiles I found out that twitter (https://twitter.com/i/directory/profiles) and facebook (https://www.facebook.com/find-friends?ref=pf) have directory pages, but Google plus has xml index pages (http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml).
-
If we go the XML route, how would we automatically add new profiles to the sitemap? Or is the only option to keep updating your xml profiles using a third party software (sitemapwriter)?
-
If a user chooses to not have their profile indexed (by default it will be index-able), how do we go about deindexing that profile? Is their an automatic way of doing this?
-
Lastly, has anyone dappled with google sitemap generator (https://code.google.com/p/googlesitemapgenerator/) if so do you recommend it?
Thank you!
-
-
Thanks for the input guys!
I believe Twitter and Facebook don't run sitemaps for their profiles, what they have is a directory for all their profiles (twitter: https://twitter.com/i/directory/profiles Facebook: https://www.facebook.com/find-friends?ref=pf) and use that to get their profiles crawled, however I feel the best approach is through xml sitemaps and Google plus actually does this with their profiles (http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml) and quite frankly I would rather follow Google then FB or Twitter... I'm just now wondering how the hell they upkeep that monster! Does it create a new sitemap everything one hits 50k? When do they update their sitemap? daily, weekly, or monthly and how?
One other question I have is if their is any penalties to getting a lot of pages crawled at once? Meaning one day we have 10 pages and the next we have 10,000 pages or 50,000 pages...
Thanks again guys!
-
I guess the way I was explaining it was for scalabilty on a large site. You have to think a site like fb or twitter with hundreds of millions of users still has the limitation of only having 50k records in a site map. So if they are running site maps, they have hundreds.
-
I'm not a web developer, so this might may be wrong, but I feel like it might be easier to just add every user to the xml sitemap and then add a noindex robots meta tag ons users pages that don't want to their profiles to be indexed.
-
If it were me and someone were asking me to design a system like that, I would design it in a few parts.
First I would create an application that handled the sitemap minus profiles, just for your tos, sign up pages, terms, and what ever pages like that.
Then I would design a system that handled the actual profiles. It would be pretty complex and resource intensive as the site grew. But the main idea flows like this
Start generation, grab the user record with id 1 in the database, check to see if indexable (move to next if not), see what pages are connected, write to xml file, loop back and start with record #2.
There are a few concessions you have to make, you need to keep up with the number of records in a file before you start another file. You can only have 50k records in one file.
The way I would handle the process in total for a large site would be this, sync the required tables via a weekly or daily cron to another instance (server). Call the php script (because that is what I use) that creates the first sitemap for the normal site wide pages. At the end of that site map, put a location for the user profile sitemap, then at the end of the scrip, execute the user profile site map generating script. At the end of each site map, put the location of the next site map file, because as you grow it might take 2-10000 site map files.
One thing that I would ensure to do is get a list of crawler ip addresses and in your .htaccess have an allow / deny rule. That way you can make the site maps only visible to the search engines.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Is it posible to improve site rankings working only with an other site?
Hi everyone, i´ll try to explain a situation is happening to me, i´m goint to try to explain the case (im writing the sites without links for explication purposes. Site 1: Adventurerooms Site 2: Adventureroomsmallorca Site 3: Adventureroomsmadrid (the new site) What happen is that at first there was only Adventurerooms and Adventureroomsmallorca, Adventurerooms was for Madrid and linked to the one in Mallorca too, was kind of giving the information for Madrid but in first page split with a link to Mallorca. In a new strategy we create Adventureroomsmadrid for Madrid, and leave Adventurerooms for Spain (with links to Adventureroomsmadrid and Adventureroomsmallorca. We redirect the info for Madrid in Adventurerooms to Adventureroomsmadrid with 301 redirections. We work during this 3 months in Adventureroomsmadrid making content in the blog, and improving (now Adventureroomsmadrid is Moz 15 (perhaps even more), and Adventurerooms is Moz 10. Surprising Adventurerooms is getting better in its search rankings, even when we took away content from it and even without working well. Adventureroomsmadrid is also improving but not as much as Adventurerooms (i know that is a new site, only 3 months), but Adventurerooms gets better results with no content and only DA of 10. I hope i´ve explain the case with my english so the question is: "Is it posible to improve site rankings working only with an other site?" Thanks in advance
Intermediate & Advanced SEO | | webtematica0 -
Why is my client's site not ranking anymore? Like big time!
Ok, I'm reaching out to all of you Moz'rs for some help with this one. My client's site has dropped off the face of google in a real short period of time. It went from page 1 (avg rank 3 to page 6 (avg rank 50) and below in the matter of 2 weeks. Here's some facts: 1. DA is a 22 and homepage PA is a 31. It outranks all other sites in its competitive set. 2. The homepage used to be the page that displays for keyword searches, now its the FAQ page, which has a lower PA of 23. Why has the home page seemingly vaporized? And, why is the FAQ showing as the first result? What should I start checking. I feel paralyzed, not sure where to start. More info: a. There are no alerts present in Webmaster Tools. b. For some reason the homepage (domain.com) was 301'd to domain.com/home.html. Domain.com is indexed by Google, however, domain.com/home.html is not. If this is the issue, what is the best way to handle it? Thanks in advance for your help!
Intermediate & Advanced SEO | | rhoadesjohn1 -
Better SEO Option, 1 Site 3 Subdomains or 4 Separate Sites?
Hey Mozzers, I'm working with a client who wants to redo their web presence. They have a a main website for the umbrella and then 3 divisions which have their own website as well. My question is: Is it better to have the main site on the main domain and then have the 3 separate sites be subdomains? Or 4 different domains with a linking structure to tie them all together? To my understanding option 1 would include high traffic for 1 domain and option 2 would be building Page Authority by having 4 different sites linking to each other? My guess would be option 2, only if all 4 sites start getting relevant authority to make the links of value. But right out of the gates option 1 might be more beneficial. A little advice/clarification would be great!
Intermediate & Advanced SEO | | MonsterWeb280 -
XML Sitemap index within a XML sitemaps index
We have a similar problem to http://www.seomoz.org/q/can-a-xml-sitemap-index-point-to-other-sitemaps-indexes Can a XML sitemap index point to other sitemaps indexes? According to the "Unique Doll Clothing" example on this link, it seems possible http://www.seomoz.org/blog/multiple-xml-sitemaps-increased-indexation-and-traffic Can someone share an XML Sitemap index within a XML sitemaps index example? We are looking for the format to implement the same on our website.
Intermediate & Advanced SEO | | Lakshdeep0 -
Site Structure Question
Hi All, Got a question about site structure, I currently have a website where everything is hosted on the root of the domain. See example below: site.com/men site.com/men-shorts site.com/men-shorts-[product name] I want to change the structure to site.com/men/shorts/[product-name] I have asked a couple of SEOs and some agree with me that the structure needs to be changed and some say that as long as I dictate the structure with internal links and breadcrumbs the URL structure doesn't matter... What do you guys think? Many thanks, Carlos
Intermediate & Advanced SEO | | Carlos-R0 -
Tool to check XML sitemap
Hello, Can anyone help me finding a tool to have closer look of the XML sitemap? Tks in advance! PP
Intermediate & Advanced SEO | | PedroM0 -
Separate Site or should we incorporate it into our main site
Hello, We have a website to sell personal development trainings. The owners want to start 2 blogs - one for each owner - that promotes their personal coaching practices. What's the SEO advantages of embedding both blogs in the current site vs starting 2 brand new blogs with their names as the domain names?
Intermediate & Advanced SEO | | BobGW0 -
2 sites or one sites: 2 locations
Hello, I have a dog training client who is offering services in 2 separate locations. We're looking to be first in the non-local search results and also rank well in google places. Would it be better to go for 2 separate sites or one site and try to rank for 2 different locations with one site? There's both local and standard search results when we type in our keywords. Thanks!
Intermediate & Advanced SEO | | BobGW0