Indexing a new website with several million pages
-
Hello everyone,
I am currently working for a huge classifieds website that will be launched in France in September 2013.
The website will have up to 10 million pages. I know that a site of this size should be indexed step by step rather than all at once, to reduce the risk of a long sandbox period and to keep more control over the process.
Do you have any recommendations or good practices for such a task? Maybe some personal experience you could share?
The website will cover about 300 jobs:
- In all regions (= 300 * 22 pages)
- In all departments (= 300 * 101 pages)
- In all cities (= 300 * 37,000 pages)
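To put numbers on the three levels, here is a quick sketch of the arithmetic using only the figures above. The function and variable names are just illustrative. It shows that the city level accounts for nearly all of the pages, so the two upper levels are small enough to release first.

```python
# Figures from the post: about 300 jobs, 22 regions, 101 departments, ~37,000 cities.
JOBS = 300
LOCALITIES = {"region": 22, "department": 101, "city": 37_000}


def pages_per_level(jobs, localities):
    """Return the number of job/locality pages at each geographic level."""
    return {level: jobs * count for level, count in localities.items()}


counts = pages_per_level(JOBS, LOCALITIES)
total = sum(counts.values())

for level, n in counts.items():
    # Share of the whole site represented by this level.
    print(f"{level:<12}{n:>12,} pages ({n / total:.2%} of total)")
print(f"{'total':<12}{total:>12,} pages")
```

Since the job and city counts are approximate, the total is only an order of magnitude.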
Do you think it would be wiser to index a few jobs at a time (for instance, 10 jobs every week) or to index by page level (for example, a first step with jobs by region, a second step with jobs by department, etc.)?
More generally, how would you proceed in order to avoid penalties from Google and get the whole site indexed as fast as possible?
One more detail: we'll rely on (hopefully substantial) press coverage and on a link-building campaign that has yet to be defined.
Thanks for your help!
Best Regards,
Raphael
-
Hello everyone,
Thanks for sharing your experience and your answers; it's greatly appreciated.
The website is built to avoid cookie-cutter pages: each page will have unique content drawn from classifieds (unique because the classifieds themselves won't be indexed at first, to avoid having too many pages).
The internal linking is also designed so that each page has permanent internal links arranged in a logical way.
I understand from your answers that it is better to take time and index the site step by step, mostly according to the number and quality of classifieds (and thus the content) for each job/locality. It's not worth indexing pages without any classifieds (and thus without unique content), as Google will drop them in the near future.
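One way this gating could look in practice is a rough sketch like the one below, which picks the robots meta tag for a job/locality page from its number of classifieds. The threshold of 5 and the function name are assumptions for illustration, not something decided in this thread; the `noindex, follow` value itself is a standard robots meta directive.

```python
MIN_CLASSIFIEDS = 5  # hypothetical threshold; tune it to what counts as "enough" content


def robots_meta(classified_count, min_classifieds=MIN_CLASSIFIEDS):
    """Return the robots meta tag for a job/locality page.

    Pages with enough classifieds are indexable; thinner pages are kept
    out of the index but their links can still be followed.
    """
    if classified_count >= min_classifieds:
        content = "index, follow"
    else:
        content = "noindex, follow"
    return f'<meta name="robots" content="{content}">'


print(robots_meta(0))
print(robots_meta(12))
```

With a rule like this, pages would enter the index on their own as classifieds accumulate, rather than on a fixed calendar.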
-
I really don't think Google likes it when you release a website that big. It would much rather you build it slowly. I would urge you to get the main pages indexed and noindex the subcategories.
-
We worked in partnership with a similar large-scale site last year and found exactly the same thing. Google simply dropped 60% of our pages from the index because they were cookie-cutter.
You have to ensure that pages have relevant, unique and worthwhile content. If all you're doing is replacing the odd word here and there for the locality and job name, it's not going to work.
Focus on having an ongoing SEO campaign for each target audience, e.g. by job type, locality, etc.
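As a rough way to spot cookie-cutter pages before launch, you could compare the text of two pages using word shingles and Jaccard similarity. This is only a sketch of one common near-duplicate check, not how Google measures it; the template text, shingle size and function names are made up for the example.

```python
def shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) in a text."""
    words = text.lower().split()
    # Texts shorter than `size` words yield a single, shorter shingle.
    return {tuple(words[i:i + size]) for i in range(max(len(words) - size + 1, 1))}


def jaccard(text_a, text_b, size=3):
    """Jaccard similarity of two texts' shingle sets (0.0 = disjoint, 1.0 = identical)."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    return len(a & b) / len(a | b)


# Hypothetical template where only the city name changes between pages.
template = "Find {job} jobs in {city} and apply online today with our free service"
paris = template.format(job="plumber", city="Paris")
lyon = template.format(job="plumber", city="Lyon")

print(f"similarity: {jaccard(paris, lyon):.2f}")
```

Pairs of pages that score close to 1.0 differ only by the odd word and are the ones most likely to be treated as duplicates.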
-
If you plan to get a website that big indexed, you will need to have a few things in order...
First, you will need thousands of deep links that connect to hub pages deep within the site. These will force spiders down there and make them chew their way out through the unindexed pages. These must be permanent links. If you remove them, spiders will stop visiting and Google will forget your pages. For a 10 million page site you will need thousands of links hitting thousands of hub pages.
Second, for a site this big... are you going to have substantive amounts of unique content? If your pages are made from a cookie cutter and look like this...
"yada yada yada yada yada yada yada yada SEO job in Paris yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada send application to Joseph Blowe, 11 Anystreet, Paris, France yada yada yada yada yada yada yada yadayada yada yada yada yada yada yada yada yada yada yada yada yada yada yada yada"
... then Google will index these pages, and a few weeks to a few months later your entire site might receive a Panda penalty and drop from Google.
Finally... all of those links needed to get the site into the index... they need to be Penguin-proof.
It is not easy to get a big site into the index. Google is tired of big cookie-cutter sites with no information or yada yada content. They get toasted quickly these days.