XML Sitemap Questions For Big Site
-
Hey Guys,
I have a few question about XML Sitemaps.
-
For a social site that is going to have presonal accounts created, what is the best way to get them indexed? When it comes to profiles I found out that twitter (https://twitter.com/i/directory/profiles) and facebook (https://www.facebook.com/find-friends?ref=pf) have directory pages, but Google plus has xml index pages (http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml).
-
If we go the XML route, how would we automatically add new profiles to the sitemap? Or is the only option to keep updating your xml profiles using a third party software (sitemapwriter)?
-
If a user chooses to not have their profile indexed (by default it will be index-able), how do we go about deindexing that profile? Is their an automatic way of doing this?
-
Lastly, has anyone dappled with google sitemap generator (https://code.google.com/p/googlesitemapgenerator/) if so do you recommend it?
Thank you!
-
-
Thanks for the input guys!
I believe Twitter and Facebook don't run sitemaps for their profiles, what they have is a directory for all their profiles (twitter: https://twitter.com/i/directory/profiles Facebook: https://www.facebook.com/find-friends?ref=pf) and use that to get their profiles crawled, however I feel the best approach is through xml sitemaps and Google plus actually does this with their profiles (http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml) and quite frankly I would rather follow Google then FB or Twitter... I'm just now wondering how the hell they upkeep that monster! Does it create a new sitemap everything one hits 50k? When do they update their sitemap? daily, weekly, or monthly and how?
One other question I have is if their is any penalties to getting a lot of pages crawled at once? Meaning one day we have 10 pages and the next we have 10,000 pages or 50,000 pages...
Thanks again guys!
-
I guess the way I was explaining it was for scalabilty on a large site. You have to think a site like fb or twitter with hundreds of millions of users still has the limitation of only having 50k records in a site map. So if they are running site maps, they have hundreds.
-
I'm not a web developer, so this might may be wrong, but I feel like it might be easier to just add every user to the xml sitemap and then add a noindex robots meta tag ons users pages that don't want to their profiles to be indexed.
-
If it were me and someone were asking me to design a system like that, I would design it in a few parts.
First I would create an application that handled the sitemap minus profiles, just for your tos, sign up pages, terms, and what ever pages like that.
Then I would design a system that handled the actual profiles. It would be pretty complex and resource intensive as the site grew. But the main idea flows like this
Start generation, grab the user record with id 1 in the database, check to see if indexable (move to next if not), see what pages are connected, write to xml file, loop back and start with record #2.
There are a few concessions you have to make, you need to keep up with the number of records in a file before you start another file. You can only have 50k records in one file.
The way I would handle the process in total for a large site would be this, sync the required tables via a weekly or daily cron to another instance (server). Call the php script (because that is what I use) that creates the first sitemap for the normal site wide pages. At the end of that site map, put a location for the user profile sitemap, then at the end of the scrip, execute the user profile site map generating script. At the end of each site map, put the location of the next site map file, because as you grow it might take 2-10000 site map files.
One thing that I would ensure to do is get a list of crawler ip addresses and in your .htaccess have an allow / deny rule. That way you can make the site maps only visible to the search engines.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
SEO - is it site or page
Hi When we're talking about SEO does the search engine only look at the whole site in general or do they look at the individual page when we're talking about SERP? So if you have a keyword "my search term" Does the search engine look at the site first or the page with the term on then rank you or is it the page then the site.
Intermediate & Advanced SEO | | Cocoonfxmedia0 -
New site. How important is traffic for a new site? And what about domain age?
Hi guys. I've been building a new site because i've seen a real SEO opportunity out there. I'm a mixing professional by trade and so I wanted to take advantage of SEO to help gain more work. Here's the site: www.signalchainstudios.co.uk I'm curious about domain age. This site fairly well optimised for my keywords, and my site got pretty good content on it (i think so anyway). But it's no where to be seen on the SERP's (link at all). Is this just a domain age issue? I'd have though it might be in the top 50 because my site's services are not hard to rank for at all! Also what about traffic? Does Google want to see an 'active' site before it considers 'promoting' it up the ranks? Or are back links and good content the main factor in the equation? Thanks in advance. I love this community to bits 🙂 Isaac.
Intermediate & Advanced SEO | | isaac6631 -
Penguin recovery, no manual action. Are our EMD sites killing our brand site?
Hi guys, Our brand site (http://urban3d.net) has been seeing steady decline due to algorithm updates for the past two years. Our previous SEO company engaged in some black-hat link building which has hurt us very badly. We have recently re-launched the site, with better design, better content, and completed a disavow of hundreds of bad links. The site is technically indexed, but is still nowhere in the SERPs after months of work to recover it by our internal marketing team. The last SEO company also told us to build EMD sites for our core services, which we did: http://3dvisualisation.co.uk/ http://propertybrochure.com/ http://kitchencgi.com/ My question is - could these EMD sites now hurting us even further and stopping our main brand site from ranking? Our plan is to rescue our brand site, with a view to retiring these outlier sites. However, with no progress on the brand site, we can't afford to remove these site (which are ranking). It seems a bit chicken and egg. Any advice would be very much appreciated. Aidan, Urban 3D
Intermediate & Advanced SEO | | aidancass0 -
Do XML sitemaps need to be manually resubmitted every time they are changed?
I have been noticing lately that quite a few of my client's sites are showing sitemap errors/warnings in Google webmaster tools, despite the fact that the issue with the the sitemap (e.g a URL that we have blocked in robots.txt) was fixed several months earlier. Google talks about resubmitting sitemaps here where it says you can resubmit your sitemap when you have made changes to it, I just find it somewhat strange that the sitemap is not automatically re-scanned when Google crawls a website. Does anyone know if the sitemap is automatically rescanned and only webmaster tools is not updated, or am I going to have to manually resubmit or ping Google with the sitemap each time a change is made? It would be interesting to know other people's experiences with this 🙂
Intermediate & Advanced SEO | | Jamie.Stevens0 -
Any Good XML Sitemaps Generator?
I was wondering if everyone could recommend what XML Sitemap generators they use. I've been using XML-Sitemap, and it's been a little hit and miss for me. Some sites it works great, other it really has serious problems indexing pages. I've also uses Google's, but unfortunately it's not very flexible to use. Any recommendation would be much appreciated.
Intermediate & Advanced SEO | | alrockn0 -
Scraped Content on Foreign Language Site. Big deal or not?
Hi All, I've been lurking and learning from this awesome Q&A forum, and I finally have a question. I am working on SEO for an entertainment site that tends to get scraped from time to time. Often, the scraped content is then translated into a foreign language, and posted along with whatever pictures were in the article. Sometimes a backlink to our site is given, sometimes not. Is scraped content that is translated to a foreign language still considered duplicate content? Should I just let it go, provided a backlink is given? Thanks!
Intermediate & Advanced SEO | | MKGraphiques
Jamie0 -
In mobile searches, does Google recognize HTML5 sites as mobile sites?
Does Google recognize HTML5 sites using responsive design as mobile sites? I know that for mobile searches, Google promotes results on mobile sites. I'm trying to determine if my site, created in HTML5 with responsive design falls into that category. Any insights on the topic would be very helpful.
Intermediate & Advanced SEO | | BostonWright0 -
How would you fix this site?
We're currently in the IA and design phase of rolling out a complete overhaul of our main site. In the meantime I've been doing some SEO triage, but I wanted to start making a longer term plan for SEO during and after the new site goes up. We have a pretty decent domain authority, and some quality backlinks, but we're just getting creamed in the SERPs. And so on to my question: How would you fix this site? What SEO strategy would you employ? http://www.adoptionhelp.org Thanks!
Intermediate & Advanced SEO | | AdoptionHelp0