XML Sitemap Questions for a Big Site
-
Hey guys,
I have a few questions about XML sitemaps.
-
For a social site that is going to have personal accounts created, what is the best way to get them indexed? When it comes to profiles, I found that Twitter (https://twitter.com/i/directory/profiles) and Facebook (https://www.facebook.com/find-friends?ref=pf) have directory pages, but Google+ has XML sitemap index pages (http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml).
-
If we go the XML route, how would we automatically add new profiles to the sitemap? Or is the only option to keep regenerating the profile sitemaps with third-party software (e.g., SitemapWriter)?
-
If a user chooses not to have their profile indexed (by default it will be indexable), how do we go about deindexing that profile? Is there an automatic way of doing this?
-
Lastly, has anyone dabbled with Google Sitemap Generator (https://code.google.com/p/googlesitemapgenerator/)? If so, do you recommend it?
Thank you!
-
Thanks for the input, guys!
I believe Twitter and Facebook don't run sitemaps for their profiles; what they have is a directory of all their profiles (Twitter: https://twitter.com/i/directory/profiles, Facebook: https://www.facebook.com/find-friends?ref=pf), and they use that to get their profiles crawled. However, I feel the best approach is through XML sitemaps, and Google+ actually does this with their profiles (http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml). Quite frankly, I would rather follow Google than FB or Twitter... I'm just now wondering how the hell they keep that monster up to date! Does it create a new sitemap every time one hits 50k URLs? How often do they update their sitemaps: daily, weekly, or monthly, and how?
One other question I have: are there any penalties for getting a lot of pages crawled at once? Meaning, one day we have 10 pages and the next we have 10,000 or 50,000 pages...
Thanks again, guys!
-
I guess the way I was explaining it was for scalability on a large site. You have to remember that a site like FB or Twitter, with hundreds of millions of users, still has the limitation of only 50k URLs per sitemap file. So if they are running sitemaps, they have hundreds of them.
-
I'm not a web developer, so I may be wrong, but I feel like it might be easier to just add every user to the XML sitemap and then add a noindex robots meta tag on the pages of users who don't want their profiles indexed.
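Something like this in the profile template would do it (a rough sketch; the $user object and its indexable flag are made-up names, so adapt them to whatever the codebase actually uses):

<?php if (!$user->indexable): ?>
<meta name="robots" content="noindex">
<?php endif; ?>

One caveat: search engines generally treat sitemap inclusion as a signal that you want a URL indexed, so it may be cleaner to also drop opted-out profiles from the sitemap once the noindex has been picked up.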
-
If someone asked me to design a system like that, I would build it in a few parts.
First, I would create an application that handles the sitemap minus the profiles: your sign-up page, terms of service, and other static pages like that.
Then I would design a system that handles the actual profiles. It would get pretty complex and resource-intensive as the site grew, but the main idea flows like this:
Start generation, grab the user record with ID 1 in the database, check whether it is indexable (move to the next record if not), see what pages are connected to it, write them to the XML file, then loop back and do record #2, and so on.
There are a few concessions you have to make: you need to keep track of how many records you have written to a file before you start another one, because you can only have 50k URLs in one sitemap file.
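Here's a rough sketch of that loop in PHP; the database credentials, the users table, its indexable flag, and the URL pattern are all made-up names, so treat it as an outline rather than drop-in code:

<?php
// Rough sketch: write profile URLs into sitemap files in 50k chunks.
// The `users` table and its `indexable` flag are placeholder names.
$pdo = new PDO('mysql:host=localhost;dbname=mysite', 'dbuser', 'dbpass');

$maxPerFile = 50000; // the sitemap protocol's per-file URL limit
$fileIndex  = 1;
$count      = 0;
$fh         = null;

$openFile = function () use (&$fh, &$fileIndex) {
    $fh = fopen("sitemap-profiles-{$fileIndex}.xml", 'w');
    fwrite($fh, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
    fwrite($fh, "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
};
$closeFile = function () use (&$fh) {
    fwrite($fh, "</urlset>\n");
    fclose($fh);
};

$openFile();
$stmt = $pdo->query('SELECT username FROM users WHERE indexable = 1 ORDER BY id');
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    if ($count === $maxPerFile) { // hit the 50k cap, roll over to a new file
        $closeFile();
        $fileIndex++;
        $count = 0;
        $openFile();
    }
    $loc = 'http://www.example.com/profile/' . rawurlencode($row['username']);
    fwrite($fh, "  <url><loc>" . htmlspecialchars($loc) . "</loc></url>\n");
    $count++;
}
$closeFile();

Each run regenerates the files from scratch, so new profiles get picked up automatically on the next run, which also answers the earlier question about how to add new profiles without third-party software.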
The way I would handle the process in total for a large site is this: sync the required tables via a weekly or daily cron to another instance (server). Call the PHP script (because that is what I use) that creates the first sitemap for the normal site-wide pages, and at the end of that script, kick off the profile sitemap generating script. Then, rather than trying to chain the files together (the sitemap protocol has no "next file" pointer), list every generated file in a sitemap index and submit just the index, because as you grow it might take anywhere from 2 to 10,000 sitemap files.
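For reference, a sitemap index file looks like this (the filenames and dates are placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.example.com/sitemap-static.xml</loc>
    <lastmod>2014-01-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap-profiles-1.xml</loc>
    <lastmod>2014-01-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap-profiles-2.xml</loc>
    <lastmod>2014-01-15</lastmod>
  </sitemap>
</sitemapindex>

An index can itself list up to 50,000 sitemap files, so a single index covers even the 10,000-file case above.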
One thing I would make sure to do is get a list of crawler IP addresses and set up an allow/deny rule in your .htaccess. That way you can make the sitemaps visible only to the search engines.
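Something along these lines would do it (Apache 2.2-style syntax; the IP range shown is just an example Googlebot block, so verify the current ranges with each engine before relying on it):

<Files "sitemap*.xml">
  Order Deny,Allow
  Deny from all
  Allow from 66.249.64.0/19
</Files>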