Amazon’s Cloud Raining Gifts for 2010




By Adam Kraut

January 20, 2010 | Inside the Box | Over the 2009 holiday shopping season, Amazon.com processed an estimated 73 items per second and 6.3 million items on a typical day. That may not come as much of a surprise. But what you may not know is that it’s all powered by cloud computing from Amazon Web Services (AWS). And Amazon isn’t even the biggest AWS customer, according to CTO Werner Vogels. Amazon’s Infrastructure as a Service is so far ahead in the cloud that competitors are struggling to keep pace. Still, AWS is not about to rest on its laurels. Fortunately for the rapidly growing number of AWS users, it is aggressively innovating and improving its services. One might even say that during the 2009 holiday season, it left us some presents under the tree.

One of the major barriers to cloud adoption in enterprise IT departments, as with running data through any hosted service, is security. Much of the resistance stems from IT staff protecting their jobs or from downright paranoia. As my colleague Chris Dagdigian said at Bio-IT World Europe (see “The C Word,” Bio•IT World, Nov 2009), “It’s very funny to see people demanding security practices on the cloud that they’re unable to run in-house.”

To address those customers’ concerns, Amazon recently announced its Virtual Private Cloud (VPC) service. VPC offers a secure bridge between existing IT infrastructure and cloud resources through a Virtual Private Network (VPN) connection. With VPC enabled, your company’s security services and firewalls can extend to cover AWS compute resources running on Elastic Compute Cloud (EC2). While this should ease concerns at pharmaceutical companies and other security-conscious IT departments, it does come with the caveat of VPN overhead. If your internal IT bandwidth is poor, it will become the bottleneck when pushing terabytes of data through VPC.

The Data Flow Problem

Another major announcement from AWS is the Import/Export service for Amazon’s Simple Storage Service (S3). Let’s say you have 200 TB of data you want to load into S3 for analysis on EC2. No matter how fat your link to Amazon, it’s going to take a long time to move that data. Dagdigian alluded to this service last April at the Bio-IT World Expo, when he said: “If the ingest problem can be solved… I see petabyte-scale datasets that would flock to utility storage services.” Amazon has answered the call with S3 Import/Export. You put your data on USB or SATA hard drives and send it to Amazon through standard mail. Amazon takes those disks and physically loads them up in their datacenter running S3. After a couple of days, your data are available for processing on EC2 or distributing to customers and colleagues.
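To put a number on "a long time," here is a back-of-the-envelope sketch in Python. The function name, the chosen link speeds, and the assumption of a perfectly saturated link with no protocol overhead are mine, not Amazon's; real transfers will be slower.

```python
def transfer_days(terabytes: float, link_mbps: float) -> float:
    """Days needed to push `terabytes` (decimal TB) through a link of
    `link_mbps` megabits per second, assuming the link stays fully
    saturated and ignoring protocol overhead (an idealized best case)."""
    bits = terabytes * 1e12 * 8           # TB -> bytes -> bits
    seconds = bits / (link_mbps * 1e6)    # megabits/s -> bits/s
    return seconds / 86_400               # seconds per day


# Illustrative link speeds only; pick the one closest to your own uplink.
for mbps in (100, 1_000, 10_000):
    print(f"{mbps:>6} Mbit/s: {transfer_days(200, mbps):.1f} days")
```

Even with a dedicated 1 Gbit/s link running flat out, 200 TB works out to roughly 18.5 days, which is why shipping disks starts to look attractive.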

When it comes to raw data, S3 is a fantastic solution, but what about all the data living in relational databases? Amazon knows that an infinitely scalable relational database is impossible to engineer. As an alternative, AWS has offered SimpleDB, a non-relational distributed database service. In addition, there’s Elastic Block Store (EBS), which provides elastic disk storage that many customers use underneath their own self-managed MySQL, Postgres, and Oracle instances. Amazon realized its customers were spending too much time managing MySQL on top of EC2 and EBS. Relational Database Service (RDS) gives AWS users an API for a self-contained MySQL database instance without having to launch new EC2 servers or deal with EBS volumes and snapshots. Currently in beta, RDS supports up to 20 databases per customer, each allowing up to 1 TB of storage. There’s nothing to install, configure, or tune. Simply issue a few commands to launch a fully functional database server with the same on-demand pricing we’ve come to appreciate from AWS.

Perhaps the most interesting announcement from AWS is a new pricing model called Spot Instances. While the 10 cents/hour on-demand pricing of EC2 is what initially made the service so popular, James Hamilton, vice president of AWS, calls Spot Instances “a fundamental innovation in how computation is sold.” Spot pricing allows customers to bid on instances, effectively balancing the peak and off-peak capacity of EC2. Under this model, the spot market drives EC2 pricing. If demand is low, you pay less; if demand is high, you pay more. Workloads with soft time constraints, such as compression, encryption, and exhaustive sampling, can be processed at a potentially lower cost than standard EC2 rates.
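The trade-off can be illustrated with a toy simulation. This is a deliberately simplified model, not Amazon's actual billing logic: it assumes the job only makes progress in hours when the spot price is at or below your bid, that you are charged the spot price for those hours, and that the job simply waits otherwise. The hourly prices are invented for illustration.

```python
from typing import Sequence, Tuple

ON_DEMAND_RATE = 0.10  # dollars/hour, the on-demand figure quoted above


def run_spot_job(hours_needed: int, bid: float,
                 spot_prices: Sequence[float]) -> Tuple[float, int]:
    """Return (total_cost, wall_clock_hours) for a job that needs
    `hours_needed` hours of compute under the simplified model:
    run (and pay the spot price) only when spot price <= bid."""
    cost = 0.0
    hours_done = 0
    for elapsed, price in enumerate(spot_prices, start=1):
        if price <= bid:          # our bid clears the market this hour
            cost += price
            hours_done += 1
            if hours_done == hours_needed:
                return cost, elapsed
        # otherwise the job sits idle and costs nothing this hour
    raise ValueError("price series ended before the job finished")


# Hypothetical hourly spot prices in dollars (made up for this sketch).
prices = [0.03, 0.04, 0.12, 0.05, 0.03, 0.09, 0.04]
cost, elapsed = run_spot_job(hours_needed=4, bid=0.06, spot_prices=prices)
print(f"spot: ${cost:.2f} over {elapsed} hours; "
      f"on-demand: ${4 * ON_DEMAND_RATE:.2f} over 4 hours")
```

In this made-up scenario the job costs less than on-demand but takes an extra hour of wall-clock time, which is exactly the bargain a soft-deadline workload can accept.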

Amazon is actively engaged in making life easier in the AWS world. According to Vogels, it’s not just about enterprise cost savings but agility in the cloud. “This is not a standard product that is finished. It’s been a continuous improvement process since Day 1.”

Adam Kraut is a scientific consultant at the BioTeam. He can be reached at kraut@bioteam.net.




