Indexing a several millions pages new website
-
Hello everyone,
I am currently working for a huge classified website who will be released in France in September 2013.
The website will have up to 10 millions pages. I know the indexing of a website of such size should be done step by step and not in only one time to avoid a long sandbox risk and to have more control about it.
Do you guys have any recommandations or good practices for such a task ? Maybe some personal experience you might have had ?
The website will cover about 300 jobs :
- In all region (= 300 * 22 pages)
- In all departments (= 300 * 101 pages)
- In all cities (= 300 * 37 000 pages)
Do you think it would be wiser to index couple of jobs by couple of jobs (for instance 10 jobs every week) or to index with levels of pages (for exemple, 1st step with jobs in region, 2nd step with jobs in departements, etc.) ?
More generally speaking, how would you do in order to avoid penalties from Google and to index the whole site as fast as possible ?
One more specification : we'll rely on a (big ?) press followup and on a linking job that still has to be determined yet.
Thanks for your help !
Best Regards,
Raphael
-
Hello everyone,
Thanks for sharing your experience and your answers, it's greatly appreciated.
The website is build in order to avoid cookie cutter pages : each page will have unique content from classifieds (unique because classifieds won't be indexed in the first place, to avoid having too much pages).
The linking is as well though in order for each page to have permanents internal links in a logical way.
I understand from your answers that it is better to take time and to index the site step by step : mostly according to the number and the quality of classifieds (and thus the content) for each jobs/locality. It's not worth to index pages without any classifieds (and thus unique content) as they will be cut off by Google in a near future.
-
I really don't think Google likes it when you release a website that big. It would much rather you build it slowly. I would urge you to have main pages and noindex the sub categories.
-
We worked in partnership with a similar large scale site last year and found the exact same. Google simply cut off 60% of our pages out of the index as they were cookie cutter.
You have to ensure that pages have relevant, unique and worthy content. Otherwise if all your doing is replacing the odd word here and there for the locality and job name its not going to work.
Focus on having an on going SEO campaign for each target audience be that for e.g. by job type / locality / etc.
-
If you plan to get a website that big indexed you will need to have a few things in order...
First, you will need thousands of deep links that connect to hub pages deep within the site. These will force spiders down there and make them chew their way out through the unindexed pages. These must be permanent links. If you remove them then spiders will stop visiting and google will forget your pages. For a 10 million page site you will need thousands of links hitting thousands of hub pages.
Second, for a site this big.... are you going to have substantive amounts of unique content? If your pages are made from a cookie cutter and look like this....
"yada yada yada yada yada yada yada yada SEO job in Paris yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada send application to Joseph Blowe, 11 Anystreet, Paris, France yada yada yada yada yada yada yada yadayada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada"
.... then Google will index these pages, then a few weeks to a few months later your entire site might receive a Panda penalty and drop from google.
Finally... all of those links needed to get the site in the index... they need to be Penguin proof.
It is not easy to get a big site in the index. Google is tired of big cookie cutter sites with no information or yada yada content. They are quickly toasted these days.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Http and https protocols being indexed for e-commerce website
Hi team, Our new e-commerce website has launched and I've noticed both http and https protocols are being indexed. www.mountainjade.co.nz Our old website was http with only the necessary pages running https (cart, checkout etc). No https pages were indexed and you couldn't access a https page if you manually typed it into the browser. We outrank our competition by a mile, so I'm treading carefully here and don't want to undo the progress we made on the old site, so I have a few questions: 1. How exactly do we remove one protocol from the index? We are running on Drupal. We tried a hard redirect from https to http and excluded the relevant pages (cart, login etc from the redirect), but found that you could still access https pages if you we're in the cart (https) and then pressed back on the browser button for example. At that point you could browse the entire site on https. 2. Is the safer option to emulate what we had in place on the old website e.g http with only the necessary pages being https, rather than making the switch to sitewide https? I've been struggling with this one, so any help would be much appreciated. Jake S
Intermediate & Advanced SEO | | Jacobsheehan0 -
Should we 301 redirect old events pages on a website?
We have a client that has an events category section that is filled to the brim with past events webpages. Another issue is that these old events webpages all contain duplicate meta description tags, so we are concerned that Google might be penalizing our client's website for this issue. Our client does not want to create specialized meta description tags for these old events pages. Would it be a good idea to 301 redirect these old events landing pages to the main events category page to pass off link equity & remove the duplicate meta description tag issue? This seems drastic (we even noticed that searchmarketingexpo.com is keeping their old events pages). However it seems like these old events webpages offer little value to our website visitors. Any feedback would be much appreciated.
Intermediate & Advanced SEO | | RosemaryB0 -
Any idea why this page isn't indexing?
Hi Mozzers, Question for all of you. Any idea why this page isn't indexing in Google? It's indexing in Bing, but we don't see it in Google's results. It doesn't seem like we have any noindex tags or anyway issues with the robots files either. Any ideas? http://ohva.k12.com/
Intermediate & Advanced SEO | | petertong230 -
Amount of pages indexed for classified (number of pages for the same query)
I've notice that classified usually has a lots of pages indexed and that's because for each query/kw they index the first 100 results pages, normally they have 10 results per page. As an example imagine the site www.classified.com, for the query/kw "house for rent new york" there is the page www.classified.com/houses/house-for-rent-new-york and the "index" is set for the first 100 SERP pages, so www.classified.com/houses/house-for-rent-new-york www.classified.com/houses/house-for-rent-new-york-1 www.classified.com/houses/house-for-rent-new-york-2 ...and so on. Wouldn't it better to index only the 1st result page? I mean in the first 100 pages lots of ads are very similar so why should Google be happy by indexing lots of similar pages? Could Google penalyze this behaviour? What's your suggestions? Many tahnks in advance for your help.
Intermediate & Advanced SEO | | nuroa-2467120 -
Too early? Rebuilt website from .asp to .php... Internal pages not ranking at all!
Am I jumping the gun on expecting results? I recently switched our ecommerce website from .asp to .php (new site went up May 15th) but we did not switch domain names. we seems to be doing better until a few days ago when traffic took a steep drop... (not like we were getting that much in the first place either) I was wondering if I'm doing something completely wrong or do i need to wait longer? Are big swings normal when relaunching a website? am I just being too anxious? The internal pages are getting no juice and I dont know why... I know i need to build on the links to the site? but am I doing something else completely wrong to see the orgranic search results drop down to almost nothing? I'm really new to SEO and would love if i could get another opinion on http://www.moondoggieinc.com. Thanks! Kristy
Intermediate & Advanced SEO | | KristyO0 -
Does google detect all updated page with new links
as paid links? Example: A PR 4 page updates the page a year later with new links. Does Google discredit these links as being fishy?
Intermediate & Advanced SEO | | imageworks-2612900 -
How long until Sitemap pages index
I recently submitted an XML sitemap on Webmaster tools: http://www.uncommongoods.com/sitemap.xml Once Webmaster tools downloads it, how long do you typically have to wait until the pages index ?
Intermediate & Advanced SEO | | znotes0 -
New website : SEO approach and strategy
We are a small startup company looking at starting a complaints website in India having user generated content(complaints) . Would some one help me to draw overall strategies on how we can achieve good traffic over one year. We realise that there is no magic wand to improve positions in search ranking for a site which hosts user generated content esp. since we dont know what key words to target. In this context i was looking form some expert suggestion on how we can go ahead with the SEO for the next 1 year .. We are open to paying for the services if you prove that you have the required experience . Otherwise any suggestions from other who have experience in such situations are welcome ...
Intermediate & Advanced SEO | | ShoutOut0