XML Sitemap Questions for a Big Site
-
Hey Guys,
I have a few questions about XML sitemaps.
-
For a social site that is going to have personal accounts created, what is the best way to get those profiles indexed? When it comes to profiles, I found that Twitter (https://twitter.com/i/directory/profiles) and Facebook (https://www.facebook.com/find-friends?ref=pf) have directory pages, but Google Plus has XML sitemap index pages (http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml).
-
If we go the XML route, how would we automatically add new profiles to the sitemap? Or is the only option to keep regenerating the sitemap with third-party software (sitemapwriter)?
-
If a user chooses not to have their profile indexed (by default it will be indexable), how do we go about deindexing that profile? Is there an automatic way of doing this?
-
Lastly, has anyone dabbled with Google Sitemap Generator (https://code.google.com/p/googlesitemapgenerator/)? If so, do you recommend it?
Thank you!
-
Thanks for the input, guys!
I believe Twitter and Facebook don't run sitemaps for their profiles; what they have is a directory of all their profiles (Twitter: https://twitter.com/i/directory/profiles, Facebook: https://www.facebook.com/find-friends?ref=pf), and they use that to get their profiles crawled. However, I feel the best approach is through XML sitemaps, and Google Plus actually does this with its profiles (http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml). Quite frankly, I would rather follow Google than FB or Twitter... I'm just wondering how the hell they keep that monster up to date! Does it create a new sitemap every time one hits 50k? How often do they update their sitemap: daily, weekly, or monthly, and how?
One other question I have is whether there are any penalties for getting a lot of pages crawled at once, meaning one day we have 10 pages and the next we have 10,000 or 50,000 pages...
Thanks again, guys!
-
I guess the way I was explaining it was for scalability on a large site. You have to remember that a site like Facebook or Twitter, with hundreds of millions of users, still has the limitation of only 50,000 records per sitemap file. So if they are running sitemaps, they have thousands of them: 100 million profiles at 50,000 URLs per file works out to 2,000 sitemap files, before you even get to Facebook's scale.
-
I'm not a web developer, so this may be wrong, but I feel like it might be easier to just add every user to the XML sitemap and then add a noindex robots meta tag on the pages of users who don't want their profiles indexed.
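Something along those lines could look like the sketch below; the $user array and its allow_indexing flag are made-up names for whatever the user model actually stores, not anything from this thread.

```php
<?php
// Minimal sketch: emit a noindex robots meta tag on profiles whose owners
// have opted out of indexing. The 'allow_indexing' flag is a hypothetical
// field name; adapt it to your own user model.
function profileRobotsMetaTag(array $user)
{
    if (empty($user['allow_indexing'])) {
        // Ask crawlers not to index this profile page.
        return '<meta name="robots" content="noindex">';
    }
    return ''; // Indexable by default: no tag needed.
}

// Usage inside the profile page <head>:
// echo profileRobotsMetaTag($user);
```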
-
If it were me and someone asked me to design a system like that, I would build it in a few parts.
First, I would create an application that handled the sitemap minus the profiles: just your TOS, sign-up pages, terms, and whatever static pages like that.
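A rough sketch of that first piece, assuming a hand-maintained list of static URLs (the example.com addresses are placeholders):

```php
<?php
// Minimal sketch of the site-wide (non-profile) sitemap: a hand-maintained
// list of static URLs written out with XMLWriter. The URLs are placeholders.
$staticUrls = array(
    'https://www.example.com/',
    'https://www.example.com/terms',
    'https://www.example.com/signup',
);

$xml = new XMLWriter();
$xml->openURI('sitemap-static.xml');
$xml->startDocument('1.0', 'UTF-8');
$xml->startElement('urlset');
$xml->writeAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');

foreach ($staticUrls as $url) {
    $xml->startElement('url');
    $xml->writeElement('loc', $url);
    $xml->writeElement('lastmod', date('Y-m-d'));
    $xml->endElement(); // </url>
}

$xml->endElement(); // </urlset>
$xml->endDocument();
$xml->flush();
```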
Then I would design a system that handled the actual profiles. It would be pretty complex and resource-intensive as the site grew, but the main idea flows like this:
Start the generation, grab the user record with ID 1 from the database, check whether it is indexable (skip to the next one if not), see what pages are connected to it, write them to the XML file, then loop back and do the same with record #2, and so on.
There are a few concessions you have to make: you need to keep track of how many records are already in a file before you start another one, because you can only have 50,000 records in one file.
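A rough sketch of that loop, assuming a hypothetical users table with id, username, and allow_indexing columns (adjust the names, connection details, and profile URL pattern to your own schema):

```php
<?php
// Rough sketch of the profile-sitemap loop described above. The `users`
// table and its columns are hypothetical; the 50,000-URL limit per file
// comes from the sitemap protocol.
const URLS_PER_FILE = 50000;

$pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$stmt = $pdo->query('SELECT id, username, allow_indexing FROM users ORDER BY id');

function openSitemap($index)
{
    $xml = new XMLWriter();
    $xml->openURI(sprintf('sitemap-profiles-%d.xml', $index));
    $xml->startDocument('1.0', 'UTF-8');
    $xml->startElement('urlset');
    $xml->writeAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');
    return $xml;
}

function closeSitemap(XMLWriter $xml)
{
    $xml->endElement(); // </urlset>
    $xml->endDocument();
    $xml->flush();
}

$fileIndex  = 1;
$urlsInFile = 0;
$xml        = null;

while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    if (!$row['allow_indexing']) {
        continue; // user opted out: leave them out of the sitemap entirely
    }
    if ($xml === null) {
        $xml = openSitemap($fileIndex);
    }

    $xml->startElement('url');
    $xml->writeElement('loc', 'https://www.example.com/profile/' . rawurlencode($row['username']));
    $xml->endElement(); // </url>

    if (++$urlsInFile >= URLS_PER_FILE) {
        closeSitemap($xml); // roll over to a new file at 50,000 URLs
        $xml        = null;
        $urlsInFile = 0;
        $fileIndex++;
    }
}

if ($xml !== null) {
    closeSitemap($xml); // close the final, partially filled file
}
```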
The way I would handle the whole process for a large site is this: sync the required tables via a weekly or daily cron to another instance (server). Call the PHP script (because that is what I use) that creates the first sitemap for the normal site-wide pages. At the end of that sitemap, put the location of the user-profile sitemap, and at the end of the script, execute the script that generates the user-profile sitemaps. At the end of each sitemap, put the location of the next sitemap file, because as you grow it might take anywhere from 2 to 10,000 sitemap files.
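Rather than chaining each file to the next, the sitemap protocol's standard way to tie many files together is a sitemap index file that lists every child sitemap. A minimal sketch of generating one (file names and the domain are placeholders, and $totalFiles would come from the loop above):

```php
<?php
// Minimal sketch of a sitemap index: one file that points crawlers at the
// site-wide sitemap plus every profile chunk. File names and the domain
// are placeholders; $totalFiles would come from the generation loop above.
$totalFiles = 120;

$children = array('sitemap-static.xml');
for ($i = 1; $i <= $totalFiles; $i++) {
    $children[] = sprintf('sitemap-profiles-%d.xml', $i);
}

$xml = new XMLWriter();
$xml->openURI('sitemap-index.xml');
$xml->startDocument('1.0', 'UTF-8');
$xml->startElement('sitemapindex');
$xml->writeAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');

foreach ($children as $file) {
    $xml->startElement('sitemap');
    $xml->writeElement('loc', 'https://www.example.com/' . $file);
    $xml->writeElement('lastmod', date('Y-m-d'));
    $xml->endElement(); // </sitemap>
}

$xml->endElement(); // </sitemapindex>
$xml->endDocument();
$xml->flush();
```

The index file is then the single URL you submit to the search engines (or list in robots.txt), and the same cron run regenerates it along with the child files.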
One thing I would make sure to do is get a list of crawler IP addresses and set up an allow/deny rule in your .htaccess. That way you can make the sitemaps visible only to the search engines.