Indexing a several millions pages new website
-
Hello everyone,
I am currently working for a huge classified website who will be released in France in September 2013.
The website will have up to 10 millions pages. I know the indexing of a website of such size should be done step by step and not in only one time to avoid a long sandbox risk and to have more control about it.
Do you guys have any recommandations or good practices for such a task ? Maybe some personal experience you might have had ?
The website will cover about 300 jobs :
- In all region (= 300 * 22 pages)
- In all departments (= 300 * 101 pages)
- In all cities (= 300 * 37 000 pages)
Do you think it would be wiser to index couple of jobs by couple of jobs (for instance 10 jobs every week) or to index with levels of pages (for exemple, 1st step with jobs in region, 2nd step with jobs in departements, etc.) ?
More generally speaking, how would you do in order to avoid penalties from Google and to index the whole site as fast as possible ?
One more specification : we'll rely on a (big ?) press followup and on a linking job that still has to be determined yet.
Thanks for your help !
Best Regards,
Raphael
-
Hello everyone,
Thanks for sharing your experience and your answers, it's greatly appreciated.
The website is build in order to avoid cookie cutter pages : each page will have unique content from classifieds (unique because classifieds won't be indexed in the first place, to avoid having too much pages).
The linking is as well though in order for each page to have permanents internal links in a logical way.
I understand from your answers that it is better to take time and to index the site step by step : mostly according to the number and the quality of classifieds (and thus the content) for each jobs/locality. It's not worth to index pages without any classifieds (and thus unique content) as they will be cut off by Google in a near future.
-
I really don't think Google likes it when you release a website that big. It would much rather you build it slowly. I would urge you to have main pages and noindex the sub categories.
-
We worked in partnership with a similar large scale site last year and found the exact same. Google simply cut off 60% of our pages out of the index as they were cookie cutter.
You have to ensure that pages have relevant, unique and worthy content. Otherwise if all your doing is replacing the odd word here and there for the locality and job name its not going to work.
Focus on having an on going SEO campaign for each target audience be that for e.g. by job type / locality / etc.
-
If you plan to get a website that big indexed you will need to have a few things in order...
First, you will need thousands of deep links that connect to hub pages deep within the site. These will force spiders down there and make them chew their way out through the unindexed pages. These must be permanent links. If you remove them then spiders will stop visiting and google will forget your pages. For a 10 million page site you will need thousands of links hitting thousands of hub pages.
Second, for a site this big.... are you going to have substantive amounts of unique content? If your pages are made from a cookie cutter and look like this....
"yada yada yada yada yada yada yada yada SEO job in Paris yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada send application to Joseph Blowe, 11 Anystreet, Paris, France yada yada yada yada yada yada yada yadayada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada"
.... then Google will index these pages, then a few weeks to a few months later your entire site might receive a Panda penalty and drop from google.
Finally... all of those links needed to get the site in the index... they need to be Penguin proof.
It is not easy to get a big site in the index. Google is tired of big cookie cutter sites with no information or yada yada content. They are quickly toasted these days.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
New page disappeared from the ranks, under dubious circumstances
I've had an odd situation happen today. Published a blog post and it ranked No 6 within 2 or 3 hours, just come back now (About 12 hours later) and it has completely vanished! I have checked to page 9, and used a couple of keyword tools and it appears nowhere! It didn't have any back links, but it was unique and high quality. I have checked on the page does still exist and it is still readable. Any thoughts would be gratefully received.
Intermediate & Advanced SEO | | seoman100 -
Redirected Old Pages Still Indexed
Hello, we migrated a domain onto a new Wordpress site over a year ago. We redirected (with plugin: simple 301 redirects) all the old urls (.asp) to the corresponding new wordpress urls (non-.asp). The old pages are still indexed by Google, even though when you click on them you are redirected to the new page. Can someone tell me reasons they would still be indexed? Do you think it is hurting my rankings?
Intermediate & Advanced SEO | | phogan0 -
Product Pages not indexed by Google
We built a website for a jewelry company some years ago, and they've recently asked for a meeting and one of the points on the agenda will be why their products pages have not been indexed. Example: http://rocks.ie/details/Infinity-Ring/7170/ I've taken a look but I can't see anything obvious that is stopping pages like the above from being indexed. It has a an 'index, follow all' tag along with a canonical tag. Am I missing something obvious here or is there any clear reason why product pages are not being indexed at all by Google? Any advice would be greatly appreciated. Update I was told 'that each of the product pages on the full site have corresponding page on mobile. They are referred to each other via cannonical / alternate tags...could be an angle as to why product pages are not being indexed.'
Intermediate & Advanced SEO | | RobbieD910 -
Review website - multiple pages of reviews of same item
Hi - I've been looking at review websites like tripadvisor - I looked at Tripadvisor's hotel reviews and the reviews may extend for several pages, yet they don't change title tags and so on, on each of the multiple pages? Surely it would be a good idea, from an SEO perspective, to change title tags (and perhaps other tags) for each of the multiple pages - even if the change was slight - e.g. "The White Swan Hotel - Reviews p 1" - "The White Swan Hotel - Reviews p 2" and so on? Am I missing something?
Intermediate & Advanced SEO | | McTaggart0 -
Want to merge high ranking niche websites into a new mega site, but don't want to lose authority from old top level pages
I have a few older websites that SERP well, and I am considering merging some or all of them into a new related website that I will be launching regardless. My old websites display real estate listings and not much else. Each website is devoted to showing homes for sale in a specific neighborhood. The domains are all in the form of Neighborhood1CityHomes.com, Neighborhood2CityHomes.com, etc. These sites SERP well for searches like "Neighborhood1 City homes for sale" and also "Neighborhood1 City real estate" where some or all of the query is in the domain name. Google simply points to the top of the domain although each site has a few interior pages that are rarely used. There is next to zero backlinking to the old domains, but each links to the other with anchor text like "Neighborhood1 Cityname real estate". That's pretty much the extent of the link profile. The new website will be a more comprehensive search portal where many neighborhoods and cities can be searched. The domain name is a nonsense word .com not related to actual key words. The structure will be like newdomain.com/cityname/neighborhood-name/ where the neighborhood real estate listings are that would replace the old websites, and I'd 301 the old sites to the appropriate internal directories of the new site. The content on the old websites is all on the home page of each, at least the content for searches that matter to me and rank well, and I read an article suggesting that Google assigns additional authority for top level pages (can I link to that here?). I'd be 301-ing each old domain from a top level to a 3rd level interior page like www. newdomain/cityname/neighborhood1/. The new site is better than the old sites by a wide margin, especially on mobile, but I don't want to lose all my top positions for some tough phrases. I'm not running analytics on the old sites in question, but each of the old sites has extensive past history with AdWords (which I don't run any more). So in theory Google knows these old sites are good quality.
Intermediate & Advanced SEO | | Gogogomez0 -
Significantly reducing number of pages (and overall content) on new site - is it a bad idea?
Hi Mozzers - I am looking at new site (not launched yet) - it contains significantly fewer pages than the previous site - 35 pages rather than 107 before - content on the remaining pages is plentiful but I am worried about the sudden loss of a significant "chunk" of the website - significantly cutting the size of a website must surely increase the risks of post-migration performance problems? Further info - the site has run an SEO contract with a large SEO firm for several years. They don't appear to have done anything beyond tinkering with homepage content - all the header and description tags are the same across the current website. 90% of site traffic currently arrives on the homepage. Content quality/volume isn't bad across most of the current site. Thanks in advance for your input!
Intermediate & Advanced SEO | | McTaggart0 -
Why Is Google Indexing These Product Pages On Shopify?
How can we communicate to Google the exact product pages we'd like indexed on our site? We're an apparel company that uses Shopify as our ecommerce platform. Website is sportiqe.com. Currently, Google is indexing all types of different pages on our site. **Example of a product page we want indexed: ** Product Page: sportiqe.com/products/PRODUCT-TITLE (Like This) **Examples of product pages being indexed: ** sportiqe.myshopify.com/products/PRODUCT-TITLE sportiqe.com/collections/COLLECTION-NAME/products/PRODUCT-TITLE See attached for an example of how two different "Boston Celtics Grateful Dead" shirts are being indexed. Any suggestions? We've used both Shopify and Google Webmaster tools to set our preferred domain (sportiqe.com). We've also added this snippet of code to our site three months ago thinking that would do the trick... {% if template == 'product' %}{% if collection %} {% endif %}{% endif %} sKwNZOl
Intermediate & Advanced SEO | | farmiloe0 -
How can Google index a page that it can't crawl completely?
I recently posted a question regarding a product page that appeared to have no content. [http://www.seomoz.org/q/why-is-ose-showing-now-data-for-this-url] What puzzles me is that this page got indexed anyway. Was it indexed based on Google knowing that there was once content on the page? Was it indexed based on the trust level of our root domain? What are your thoughts? I'm asking not only because I don't know the answer, but because I know the argument is going to be made that if Google indexed the page then it must have been crawlable...therefore we didn't really have a crawlability problem. Why Google index a page it can't crawl?
Intermediate & Advanced SEO | | danatanseo0