Monday, October 4, 2010

Facebook: How Not to Manage Infrastructure!

The Facebook Data Center





With more than 500 million active users, Facebook is the busiest site on the Internet and has built an extensive infrastructure to support this rapid growth. The social networking site was launched in February 2004, initially out of Facebook founder Mark Zuckerberg’s dorm room at Harvard University and using a single server. The company’s web servers and storage units are now housed in data centers around the country.



Each data center houses thousands of computer servers, which are networked together and linked to the outside world through fiber optic cables. Every time you share information on Facebook, the servers in these data centers receive the information and distribute it to your network of friends.



We’ve written a lot about Facebook’s infrastructure, and have compiled this information into a series of Frequently Asked Questions. Here’s the Facebook Data Center FAQ (or “Everything You Ever Wanted to Know About Facebook’s Data Centers”).



How Big is Facebook’s Internet Infrastructure?

Facebook is currently the world’s most popular web site, with more than 690 billion page views each month, according to metrics from Google’s DoubleClick service. Facebook currently accounts for about 9.5 percent of all Internet traffic, slightly more than Google, according to HitWise.



Facebook requires massive storage infrastructure to house its enormous stockpile of photos, which grows steadily as users add 100 million new photos every day. People share more than 30 billion pieces of content on Facebook each month. In addition, the company’s infrastructure must support platform services for more than 1 million web sites and 550,000 applications using the Facebook Connect platform.



To support that huge activity, Facebook operates at least nine data centers on both coasts of the United States, and is in the process of building its first company-built data center in Oregon. Although more than 70 percent of Facebook’s audience is in other countries, none of the company’s data centers are located outside the United States.



For most of its history, Facebook has managed its infrastructure by leasing “wholesale” data center space from third-party landlords. Wholesale providers build the data center, including the raised-floor technical space and the power and cooling infrastructure, and then lease the completed facility. In the wholesale model, users can occupy their data center space in about five months, rather than the 12 months needed to build a major data center. This has allowed Facebook to scale rapidly to keep pace with the growth of its audience.



In January 2010 Facebook announced plans to build its own data centers, beginning with a facility in Prineville, Oregon. This typically requires a larger up-front investment in construction and equipment, but allows greater customization of power and cooling infrastructure.



Where are Facebook’s Data Centers Located?



Facebook currently leases space in about six different data centers in Silicon Valley, located in Santa Clara and San Jose, and at least one in San Francisco. The company has also leased space in three wholesale data center facilities in Ashburn, Virginia. Both Santa Clara and Ashburn are key data center hubs, where hundreds of fiber networks meet and connect, making them ideal for companies whose content is widely distributed.



Facebook’s first company-built data center is nearing completion in Prineville, Oregon. If Facebook’s growth continues at the current rate, it will likely require a larger network of company-built data centers, as seen with Google, Microsoft, Yahoo and eBay.



How Big Are Facebook’s Server Farms?





A rendering of an aerial view of the Facebook data center in Prineville, Oregon.

As Facebook grows, its data center requirements are growing along with it. The new data center in Oregon is a reflection of this trend.



In the data centers where it currently operates, Facebook typically leases between 2.25 megawatts and 6 megawatts of power capacity, or between 10,000 and 35,000 square feet of space. Due to the importance of power for data centers, most landlords now price deals using power as a yardstick, with megawatts replacing square feet as the primary benchmark for real estate deals.
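To see why power has become the yardstick, it can help to turn the lease figures above into a power density. The following is a rough sketch in Python, not a description of how any landlord actually prices a deal. It assumes that the smallest power figure pairs with the smallest floor area and the largest with the largest, which the figures above do not state, and the function name is only illustrative.

```python
# Figures quoted above for a typical leased Facebook footprint.
MIN_POWER_MW, MAX_POWER_MW = 2.25, 6.0
MIN_AREA_SQFT, MAX_AREA_SQFT = 10_000, 35_000


def watts_per_sqft(power_mw: float, area_sqft: float) -> float:
    """Convert a lease's power capacity and floor area into watts per square foot."""
    return power_mw * 1_000_000 / area_sqft


# Assumption: the low end of the power range goes with the low end of the
# space range, and the high end with the high end. This pairing is a guess.
small_lease = watts_per_sqft(MIN_POWER_MW, MIN_AREA_SQFT)
large_lease = watts_per_sqft(MAX_POWER_MW, MAX_AREA_SQFT)

print(f"Small lease: {small_lease:.0f} W/sq ft")
print(f"Large lease: {large_lease:.0f} W/sq ft")
```

Under that pairing, both ends of the range land at roughly 170 to 225 watts per square foot, so a deal quoted in megawatts says far more about what a tenant can actually run than the floor area does.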



Facebook’s new data center in Oregon will be much, much larger. The facility was announced as being 147,000 square feet. But as construction got rolling, the company announced plans to add a second phase to the project, which will add another 160,000 square feet. That brings the total size of the Prineville facility to 307,000 square feet of space – larger than two Wal-Mart stores.

 
 
 
 
 
 
How Many Servers Does Facebook Have?








This chart provides a dramatic visualization of Facebook’s infrastructure growth. It documents the number of servers used to power Facebook’s operations.



“When Facebook first began with a small group of people using it and no photos or videos to display, the entire service could run on a single server,” said Jonathan Heiliger, Facebook’s vice president of technical operations.



Not so anymore. Technical presentations by Facebook staff suggest that as of June 2010 the company was running at least 60,000 servers in its data centers, up from 30,000 in 2009 and 10,000 back in April 2008.
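For a sense of how steep that curve is, here is a back-of-the-envelope calculation in Python. It uses only the two dated endpoints (April 2008 and June 2010) and assumes smooth exponential growth between them, which is a simplification and not something Facebook has stated. The variable names are illustrative.

```python
import math

# Server counts quoted above. Only these two figures carry a month.
SERVERS_APRIL_2008 = 10_000
SERVERS_JUNE_2010 = 60_000
MONTHS_ELAPSED = 26  # April 2008 to June 2010

# Overall multiple between the two data points.
growth_multiple = SERVERS_JUNE_2010 / SERVERS_APRIL_2008

# Assumption: growth was smoothly exponential between the endpoints.
monthly_rate = growth_multiple ** (1 / MONTHS_ELAPSED) - 1
doubling_months = math.log(2) / math.log(1 + monthly_rate)

print(f"Growth multiple: {growth_multiple:.1f}x")
print(f"Implied monthly growth: {monthly_rate:.1%}")
print(f"Implied doubling time: {doubling_months:.1f} months")
```

Under that assumption the server count grew sixfold and doubled roughly every ten months, which fits the reported doubling from 30,000 to 60,000 between 2009 and mid-2010.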



There are companies with more servers (see Who Has the Most Web Servers? for details). But the growth curve shown on the chart doesn’t even include any of the servers that will populate the Oregon data center – which may be the first of multiple data centers Facebook builds to support its growth.



What Kind of Servers Does Facebook Use?

Facebook doesn’t often discuss which server vendors it uses. In 2007 it was buying a lot of servers from Rackable (now SGI), and is also known to have purchased servers from Dell, which customizes servers for its largest cloud computing customers.



Facebook VP of Technical Operations Jonathan Heiliger has sometimes been critical of major server vendors’ ability to adapt their products to the needs of huge infrastructures like those at Facebook, which don’t need many of the features designed for complex enterprise computing requirements. “Internet scale” companies can achieve better economics with bare bones servers that are customized for specific workloads.



At a conference earlier this year, Heiliger identified multi-core server vendors Tilera and SeaMicro as “companies to watch” for their potential to provide increased computing horsepower in a compact energy footprint.



But reports that Facebook planned to begin using low-power processors from ARM - which power the iPhone and many other mobile devices - proved to be untrue. “Facebook continuously evaluates and helps develop new technologies we believe will improve the performance, efficiency or reliability of our infrastructure,” Heiliger said. “However, we have no plans to deploy ARM servers in our Prineville, Oregon data center.”





A look at the fully-packed server racks inside a Facebook data center facility.

What Kind of Software Does Facebook Use?

Facebook was developed from the ground up using open source software. The site is written primarily in the PHP programming language and uses a MySQL database infrastructure. To accelerate the site, the Facebook Engineering team developed a program called HipHop to transform PHP source code into C++ and gain performance benefits.



Facebook has one of the largest MySQL database clusters anywhere, and is the world’s largest user of memcached, an open source caching system. Memcached was an important enough part of Facebook’s infrastructure that CEO Mark Zuckerberg gave a tech talk on its usage in 2009.



Facebook has built a framework that uses RPC (remote procedure calls) to tie together infrastructure services written in any language, running on any platform. Services used in Facebook’s infrastructure include Apache Hadoop, Apache Cassandra, Apache Hive, FlashCache, Scribe, Tornado, Cfengine and Varnish.



How Much Does Facebook Spend on Its Data Centers?

An analysis of Facebook’s spending with data center developers indicates that the company is now paying about $50 million a year to lease data center space, compared to about $20 million when we first analyzed its leases in May 2009.



The $50 million a year covers leases only, and doesn’t include the cost of the Prineville project, which has been estimated at between $180 million and $215 million. It also doesn’t include Facebook’s investments in server and storage hardware, which are substantial.



Facebook currently leases most of its data center space from four companies: Digital Realty Trust, DuPont Fabros Technology, Fortune Data Centers and CoreSite Realty.



Here’s what we know about Facebook’s spending on its major data center commitments:



•Facebook is paying $18.1 million a year for 135,000 square feet of data center space it leases from Digital Realty Trust (DLR) in Silicon Valley and Virginia, according to data from the landlord’s June 30 quarterly report to investors.

•The social network is also leasing data center space in Ashburn, Virginia from DuPont Fabros Technology (DFT). Although the landlord has not published the details of Facebook’s leases, data on the company’s largest tenants reveals that Facebook represents about 15 percent of DFT’s annualized base rent, which works out to about $21.8 million per year.

•Facebook has reportedly leased 5 megawatts of critical load – about 25,000 square feet of raised-floor space – at a Fortune Data Centers facility in San Jose.

•In March, Facebook agreed to lease an entire 50,000 square foot data center that was recently completed by CoreSite Realty in Santa Clara.

•Facebook also hosts equipment in a Santa Clara, Calif. data center operated by Terremark Worldwide (TMRK), a Palo Alto, Calif. facility operated by Equinix (EQIX) and at least one European data center operated by Telecity Group. These are believed to be substantially smaller footprints than the company’s leases with Digital Realty and DuPont Fabros.

That adds up to an estimated $40 million for the leases with Digital Realty and DuPont Fabros. When you add in the cost of space for housing equipment at Fortune, CoreSite, Terremark, Switch and Data, Telecity and other peering arrangements to distribute content, we arrive at an estimate of at least $50 million in annual data center costs for Facebook.
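The arithmetic behind that estimate can be laid out in a few lines of Python. This is only a sketch built from the figures quoted above. The per-square-foot rent and the landlord-wide rent total are derived values that neither landlord has published in this form, and the variable names are illustrative.

```python
# Annual lease figures quoted above, in millions of US dollars.
DIGITAL_REALTY_RENT = 18.1
DUPONT_FABROS_RENT = 21.8
TOTAL_ESTIMATE = 50.0  # the "at least $50 million" figure

DIGITAL_REALTY_SQFT = 135_000
DFT_RENT_SHARE = 0.15  # Facebook's share of DFT's annualized base rent

# The two largest landlords together.
two_largest = DIGITAL_REALTY_RENT + DUPONT_FABROS_RENT

# What is left for Fortune, CoreSite, Terremark and the smaller sites,
# if the total is taken at its minimum of $50 million.
implied_other = TOTAL_ESTIMATE - two_largest

# Derived values, not reported by either landlord.
rent_per_sqft = DIGITAL_REALTY_RENT * 1_000_000 / DIGITAL_REALTY_SQFT
implied_dft_total = DUPONT_FABROS_RENT / DFT_RENT_SHARE

print(f"Two largest leases: ${two_largest:.1f}M")
print(f"Implied for all other sites: at least ${implied_other:.1f}M")
print(f"Digital Realty rent: ${rent_per_sqft:.0f} per sq ft per year")
print(f"Implied DFT annualized base rent: ${implied_dft_total:.0f}M")
```

The two big leases sum to $39.9 million, so the $50 million floor implies at least about $10 million a year across the remaining sites.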



Facebook’s costs remain substantially less than what some other large cloud builders are paying for their data center infrastructure. Google spent $2.3 billion on its custom data center infrastructure in 2008, while Microsoft invests $500 million in each of its new data centers. Those numbers include the facilities and servers.



How Many People Are Needed to Run Facebook’s Data Centers?

As is the case with most large-scale data centers, Facebook’s facilities are highly automated and can be operated with a modest staff, usually no more than 20 to 50 employees on site. Facebook has historically maintained a ratio of 1 engineer for every 1 million users, although recent efficiencies have boosted that ratio to 1 engineer for every 1.2 million users.
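As a rough illustration of what that ratio change means, the Python sketch below applies both ratios to the 500 million active users cited at the top of this article. The resulting headcounts are purely implied figures for illustration, not staffing numbers reported by Facebook.

```python
# "More than 500 million active users", per the opening of this article.
ACTIVE_USERS = 500_000_000

# Users supported per engineer, before and after the recent efficiencies.
USERS_PER_ENGINEER_OLD = 1_000_000
USERS_PER_ENGINEER_NEW = 1_200_000

# Assumption: the ratio applies uniformly to the whole user base.
engineers_old = ACTIVE_USERS / USERS_PER_ENGINEER_OLD
engineers_new = ACTIVE_USERS / USERS_PER_ENGINEER_NEW
reduction = 1 - engineers_new / engineers_old

print(f"Implied engineers at 1 per 1.0M users: {engineers_old:.0f}")
print(f"Implied engineers at 1 per 1.2M users: {engineers_new:.0f}")
print(f"Relative reduction: {reduction:.1%}")
```

In other words, moving from 1 million to 1.2 million users per engineer means the same audience can be supported with about one-sixth fewer engineers.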



Facebook’s construction project in Prineville is expected to create more than 200 jobs during its 12-month construction phase, and the facility will employ at least 35 full-time workers and dozens more part-time and contract employees.