Indexing a several millions pages new website
-
Hello everyone,
I am currently working for a huge classified website who will be released in France in September 2013.
The website will have up to 10 millions pages. I know the indexing of a website of such size should be done step by step and not in only one time to avoid a long sandbox risk and to have more control about it.
Do you guys have any recommandations or good practices for such a task ? Maybe some personal experience you might have had ?
The website will cover about 300 jobs :
- In all region (= 300 * 22 pages)
- In all departments (= 300 * 101 pages)
- In all cities (= 300 * 37 000 pages)
Do you think it would be wiser to index couple of jobs by couple of jobs (for instance 10 jobs every week) or to index with levels of pages (for exemple, 1st step with jobs in region, 2nd step with jobs in departements, etc.) ?
More generally speaking, how would you do in order to avoid penalties from Google and to index the whole site as fast as possible ?
One more specification : we'll rely on a (big ?) press followup and on a linking job that still has to be determined yet.
Thanks for your help !
Best Regards,
Raphael
-
Hello everyone,
Thanks for sharing your experience and your answers, it's greatly appreciated.
The website is build in order to avoid cookie cutter pages : each page will have unique content from classifieds (unique because classifieds won't be indexed in the first place, to avoid having too much pages).
The linking is as well though in order for each page to have permanents internal links in a logical way.
I understand from your answers that it is better to take time and to index the site step by step : mostly according to the number and the quality of classifieds (and thus the content) for each jobs/locality. It's not worth to index pages without any classifieds (and thus unique content) as they will be cut off by Google in a near future.
-
I really don't think Google likes it when you release a website that big. It would much rather you build it slowly. I would urge you to have main pages and noindex the sub categories.
-
We worked in partnership with a similar large scale site last year and found the exact same. Google simply cut off 60% of our pages out of the index as they were cookie cutter.
You have to ensure that pages have relevant, unique and worthy content. Otherwise if all your doing is replacing the odd word here and there for the locality and job name its not going to work.
Focus on having an on going SEO campaign for each target audience be that for e.g. by job type / locality / etc.
-
If you plan to get a website that big indexed you will need to have a few things in order...
First, you will need thousands of deep links that connect to hub pages deep within the site. These will force spiders down there and make them chew their way out through the unindexed pages. These must be permanent links. If you remove them then spiders will stop visiting and google will forget your pages. For a 10 million page site you will need thousands of links hitting thousands of hub pages.
Second, for a site this big.... are you going to have substantive amounts of unique content? If your pages are made from a cookie cutter and look like this....
"yada yada yada yada yada yada yada yada SEO job in Paris yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada send application to Joseph Blowe, 11 Anystreet, Paris, France yada yada yada yada yada yada yada yadayada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada"
.... then Google will index these pages, then a few weeks to a few months later your entire site might receive a Panda penalty and drop from google.
Finally... all of those links needed to get the site in the index... they need to be Penguin proof.
It is not easy to get a big site in the index. Google is tired of big cookie cutter sites with no information or yada yada content. They are quickly toasted these days.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
No Index No follow instead of Rel canoncical on product pages
Hi all, we handle our product pages no with rel canonical now, we have 1 url that is indexed http://www.prams.net/cam-combi-family the other colours have different urls like http://www.prams.net/cam-combi-family-3-in-1-pram-reversible-seat-car-seat-grey-d which canonicalize to the indexed page. Google still crawls all those pages. For crawl budget reasons we want to use "no index, no follow" instead on these pages (the pages for the other colours)? Google would then crawl fewer pages more often? Does this make sense? Are their any downsides doing it? Thanks in advance Dieter
Intermediate & Advanced SEO | | Storesco1 -
Should I write a new page or a blog post?
I am trying to rank for a local SEO term on a website for a national company. Should I write an optimized blog post, or optimized site page? Does it make a difference? Thanks!
Intermediate & Advanced SEO | | aj6130 -
How to check if the page is indexable for SEs?
Hi, I'm building the extension for Chrome, which should show me the status of the indexability of the page I'm on. So, I need to know all the methods to check if the page has the potential to be crawled and indexed by a Search Engines. I've come up with a few methods: Check the URL in robots.txt file (if it's not disallowed) Check page metas (if there are not noindex meta) Check if page is the same for unregistered users (for those pages only available for registered users of the site) Are there any more methods to check if a particular page is indexable (or not closed for indexation) by Search Engines? Thanks in advance!
Intermediate & Advanced SEO | | boostaman0 -
Is it worth redirecting an old domain name which was hacked to my new website?
I had a website which got hacked and malware added to it. I have since closed that website down but I still have the domain name. That domain name prior to the malware was incredibly well ranking for its niche and had a good range of high quality links to it and a domain age of 6 years. I'm now creating a new website which is similar to the old one (the same but with a different platform and layout). Is it a good or bad idea to redirect the old domain name to the new website?
Intermediate & Advanced SEO | | james.rose0 -
Whats the best way to remove search indexed pages on magento?
A new client ( aqmp.com.br/ )call me yestarday and she told me since they moved on magento they droped down more than US$ 20.000 in sales revenue ( monthly)... I´ve just checked the webmaster tool and I´ve just discovered the number of crawled pages went from 3.260 to 75.000 since magento started... magento is creating lots of pages with queries like search and filters. Example: http://aqmp.com.br/acessorios/lencos.html http://aqmp.com.br/acessorios/lencos.html?mode=grid http://aqmp.com.br/acessorios/lencos.html?dir=desc&order=name Add a instruction on robots.txt is the best way to remove unnecessary pages of the search engine?
Intermediate & Advanced SEO | | SeoMartin10 -
Duplicate content on index.htm page
How do I avoid duplicate content on the index.htm page . I need to redirect the spider from the /index.htm file to the main root of http://www.manandhisvan.com.au and hence avoid duplicate content. Does anyone know of a foolproof way of achieving this without me buggering up the complete site Cheers Freddy
Intermediate & Advanced SEO | | Fatfreddy0 -
Landing page indexed and ranking in less then 24 hours
Hi, I got a landing page which went up last night about 11pm. Its been indexed and ranked since then. Its a EMD and has about 600 words of unqiue content. It currently sits on page 9 for what I would say is a non competitive term (the top result is not an EMD and has 10 backlinks from the same site, which has no PR). Now my question is this: Would you say that page 9 is the given position Google thinks this website should sit at? Or because its so new could I very much expect some more movement? Basically up the rankings? Cheers
Intermediate & Advanced SEO | | activitysuper0 -
Creating new pages for geo targeted keywords
What's the best way to go about creating new pages for geo targeted keywords for a business? I ask because, these keywords are areas that they provide service to, but the brick and mortar business is not located. I've already run into problems trying to put too many locations onto one page. What's the best way to attack content for these new pages in order to get these geo keywords in there?
Intermediate & Advanced SEO | | MichaelWeisbaum0