Enki Blog

Managed Cloud Blog

Feb 22
2012

Why overallocation makes cloud computing services impossible to compare

Posted by: Eric Novikoff

Tagged in: Cloud Usage

Various recent user surveys and performance measurements done on cloud computing systems show great variability over time as well as between services, which affects both the perceived reliability of the system and the effective price paid for cloud computing.   The root cause of this performance variation is overallocation of resources.  This blog entry will explore what overallocation is and how to minimize its effects.

First of all, let's remember that cloud computing's biggest pluses - savings and scalability - come from the fact that customers are accessing shared resources.   Because resources are shared, any bottlenecks in the system will affect all users' performance when the system is stressed.   Studies done at the University of Adelaide showed that Amazon's performance can vary by up to 10x over the course of the day, possibly due to demands on the EBS system, or simply overtaxing the gigabit ethernet connections on the individual servers.

Let's look at how this affects pricing.  For example, let's assume that you are paying $0.10/hr for a cloud instance.  If you log in (on Linux) you can use the 'top' command to see the share of time spent waiting on I/O, shown as iowait (the "wa" field).   Perhaps it's 10%.  This would mean that your instance is spending 10% of its time waiting for the network, and effectively you are paying $0.11/hr ($0.10 / 0.9) for one promised unit of the instance's performance.   However at 3pm, when all the schoolkids come home and get on multiplayer games, the iowait jumps to 90% (yes, our customers have measured this).  Now you are paying $1.00/hr ($0.10 / 0.1) for the instance's promised performance.  Perhaps you don't need that performance, but if you do, then you have to compensate by buying 9 more instances, and you will indeed be paying $1.00/hr.
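The arithmetic in this example can be written down as a small sketch. It assumes, as the example does, that time spent in iowait is simply lost, so the price per usable hour is the list price divided by the fraction of time that is not spent waiting. The function name is made up for illustration.

```python
def effective_hourly_cost(list_price: float, iowait_fraction: float) -> float:
    """Price per hour of usable instance time when a fraction is lost to iowait.

    Assumption: iowait time is entirely unusable, as in the example above.
    """
    if not 0.0 <= iowait_fraction < 1.0:
        raise ValueError("iowait_fraction must be in [0, 1)")
    return list_price / (1.0 - iowait_fraction)


# The two cases from the text: $0.10/hr at 10% and at 90% iowait.
for iowait in (0.10, 0.90):
    cost = effective_hourly_cost(0.10, iowait)
    print(f"iowait {iowait:.0%}: ${cost:.2f}/hr per usable unit")
```

The same list price therefore buys roughly nine times less usable compute at the afternoon peak than it does off-peak.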

The example above covered unintentional (or at least unplanned) network overallocation, but the other three resources that cloud vendors sell - storage, CPU, and RAM - can also be overallocated.   Like network bandwidth, storage can be overallocated simply by placing too many demands on the centralized storage system or local disk (depending on the cloud architecture - see my blog on cloud architecture).  The network and storage overallocations are hard for the vendor to address rapidly because they would require hardware changes.  However, depending on the hypervisor the vendor chooses, both CPU and RAM can be intentionally overallocated for a variety of purposes, usually associated with reducing costs and/or reducing pricing.

Here's how this works.    If it supports overallocation, the hypervisor allows setting two allocation parameters: a "limit" or maximum size of the resource, and a "reservation" or minimum size.  When a new instance is allocated, it is given the reservation amount, and then as it needs more, its usage can grow up to the limit.    This allows many more instances to be allocated - actually over-allocated - on a server than would actually fit based on the promised instance sizes (the limits).   Until the instance actually needs more resources, it operates the same way as if it were not overallocated.  But when it asks for more resources and the server has none to spare, the hypervisor has two choices: deny them (which can cause the software to crash or operate very slowly) or move the instance to another server that has the resources available.    Moving the instance (or "motioning" it as some vendors like to say) causes congestion on the network and uses up CPU resources, resulting in slower performance for instances on the respective machines.   And since both deciding that the move is necessary (generally by monitoring a prolonged resource shortage) and doing the move take time, there is a delay between when the resources are needed and when they are available, which may affect some applications, especially those that experience peaky loads.
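A minimal sketch of the limit/reservation idea, using RAM as the resource. The host size, the 1 GB reservation and the 4 GB limit are invented numbers, and the class and function names are hypothetical; no real hypervisor API is being modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceSpec:
    reservation_gb: float  # minimum the hypervisor hands out at allocation
    limit_gb: float        # maximum the customer was promised


def instances_that_fit(host_gb: float, per_instance_gb: float) -> int:
    """How many instances fit if each one is sized at per_instance_gb."""
    return int(host_gb // per_instance_gb)


def can_grow(host_gb, usage_gb, index, extra_gb, spec) -> bool:
    """True if instance `index` can grow by extra_gb on this host."""
    if usage_gb[index] + extra_gb > spec.limit_gb:
        return False  # request exceeds the instance's own limit
    return sum(usage_gb) + extra_gb <= host_gb  # host must still have free RAM


spec = InstanceSpec(reservation_gb=1.0, limit_gb=4.0)  # illustrative values
host_gb = 64.0                                         # illustrative host size

packed_by_limit = instances_that_fit(host_gb, spec.limit_gb)
packed_by_reservation = instances_that_fit(host_gb, spec.reservation_gb)
print(packed_by_limit, packed_by_reservation)

# Every instance starts at its reservation; the host is already full, so the
# first request for more RAM has to be denied or the instance has to be moved.
usage = [spec.reservation_gb] * packed_by_reservation
print(can_grow(host_gb, usage, 0, 1.0, spec))
```

With these numbers the host holds four times as many instances when packed by reservation as when packed by limit, and the very first growth request already cannot be satisfied locally.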

Despite its shortcomings, overallocation is common practice among cloud providers, and is likely responsible for some of the extremely low prices that some providers offer.   Whether that overallocation is intentional or the result of overtaxed resource bottlenecks like networking or SAN controllers, performance can vary widely between cloud providers based on hardware design and overallocation policies, yet these policies affect performance in complex ways and are rarely if ever documented by the provider.

How do you avoid suffering the penalties of low performance and excessive cost per unit of usable resources that overallocation can cause?   If you choose a cloud provider based on price, you will likely be suffering from overallocation, which requires that you set up an auto-scaling capability in your software as well as the cloud management system, so that additional instances can be allocated automatically as the load exceeds the available resources, in order to keep performance constant.   While this may address performance issues, it will not solve the problem of cost per usable unit of compute.    It can also lead to some very complex architectures, which I have seen deployed in Amazon to get around its widely varying performance.  In the end you may not see any cost savings from such workarounds since they inevitably have a cost of implementation and operation.
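To see why auto-scaling restores performance but not cost, here is a rough sketch of the scaling decision itself. It assumes each instance delivers one unit of performance minus its iowait share, and it uses whole percentages and cents so that the rounding is exact. The function name and the prices are illustrative only.

```python
def instances_needed(required_units: int, iowait_percent: int) -> int:
    """Instances required to deliver required_units of promised performance.

    Assumption: an instance at N% iowait delivers (100 - N)% of one unit.
    """
    if not 0 <= iowait_percent < 100:
        raise ValueError("iowait_percent must be in [0, 100)")
    usable_percent = 100 - iowait_percent
    # Integer ceiling division: round up to a whole instance.
    return -((-required_units * 100) // usable_percent)


PRICE_CENTS_PER_HOUR = 10  # the $0.10/hr instance used earlier in this post

for iowait_percent in (0, 10, 90):
    count = instances_needed(1, iowait_percent)
    total_cents = count * PRICE_CENTS_PER_HOUR
    print(f"{iowait_percent}% iowait: {count} instance(s), {total_cents} cents/hr")
```

The auto-scaler keeps the delivered performance constant, but the bill scales with the instance count, which is exactly the cost-per-usable-unit problem.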

The alternative is to choose a provider that does not intentionally overallocate resources, and addresses performance bottlenecks aggressively.    

May 24
2011

Want to get a big bang out of the Cloud? Don't think linearly!

Posted by: Eric Novikoff

Tagged in: Cloud Usage

One of the huge advantages of the cloud is its ability to cut the time to market for new ideas, creating more agile enterprises.   While rapid provisioning and access to a larger set of resources than you might otherwise have contributes to this, the biggest win comes from realizing that the cloud eliminates the need to think linearly.

Over the last 10 years or so, software development methodologies have evolved from a "waterfall" analogy where design was followed by development then by test, then by deployment (to put it simply) to an agile, concurrent model where software releases became much smaller and all of the activities above happened at the same time.  So instead of:

design->develop->test->deploy

organizations realized that they can be much more customer responsive and get more done by developing software as follows, in which their team is broken up into smaller groups that work on multiple releases at once, speeding up release times by 4 (in this example):

design4->develop4->test4->deploy4
develop3->test3->deploy3->design7
test2->deploy2->design6->develop6
deploy1->design5->develop5->test5

However, the cloud allows you to apply this same discovery to your deployments, not just your development cycle.  What does this mean?   Well, for most growing internet enterprises, deployment isn't static: as their code evolves, so does their deployment architecture.   

Let's look at a little example.  We have a customer who is bringing their internet service from beta through ramp-up to large volumes.   As they do so, their deployment architecture has changed from single-instance MySQL to multi-instance MySQL to Mongo+MySQL and finally Mongo-only.   Here's how their IT staff laid out their IT architecture plans:

prototype MySQL -> test MySQL -> production MySQL -> prototype MySQLmulti
-> test MySQLmulti -> production MySQLmulti -> prototype Mongo -> test Mongo
-> production Mongo

If there were any problems in any of the prototyping or test phases for their deployment architecture, they'd run the risk of falling behind their code or the demands of their customer base and having a site meltdown or constraining their business.   But they don't need to think this way.

Even with a small team, they can simply develop each of the deployment architectures concurrently by creating a separate cloud instance for each one and making progress as their teams' knowledge base evolves, rather than being constrained by limited amounts and configurations of hardware for hosting the deployments.

prototype MySQL -> test MySQL -> production MySQL

|--> prototype MySQLmulti -> test MySQLmulti -> production MySQLmulti

|--> prototype Mongo -> test Mongo -> production Mongo

The organization has the option of trading off staffing for project schedule on the deployment phase, while avoiding any potential budget problems by using small cloud instances when needed to speed development of their deployments.   As a result, they no longer have to worry about being "taken by surprise" by a sudden hockey-stick in their usage that overwhelms their current-generation of deployment architecture.   And their IT operations staff no longer needs to worry about taking the blame for falling behind the market.
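The schedule difference between the linear plan and the concurrent one comes down to a sum versus a maximum. The sketch below uses made-up phase durations in weeks (none of these numbers come from the customer) and assumes all three tracks can start at the same time with enough staff to work them.

```python
# Hypothetical phase durations, in weeks, for each deployment architecture.
tracks = {
    "MySQL":      {"prototype": 2, "test": 1, "production": 1},
    "MySQLmulti": {"prototype": 3, "test": 2, "production": 1},
    "Mongo":      {"prototype": 4, "test": 2, "production": 1},
}


def track_weeks(phases: dict) -> int:
    """Total time for one architecture, phases done one after another."""
    return sum(phases.values())


def sequential_weeks(all_tracks: dict) -> int:
    """Linear plan: every track waits for the previous one to finish."""
    return sum(track_weeks(p) for p in all_tracks.values())


def concurrent_weeks(all_tracks: dict) -> int:
    """Concurrent plan: each track gets its own cloud instances."""
    return max(track_weeks(p) for p in all_tracks.values())


print("sequential:", sequential_weeks(tracks), "weeks")
print("concurrent:", concurrent_weeks(tracks), "weeks")
```

In practice the gain is smaller than the sum-versus-max gap suggests, since a small team cannot give every track full attention; that is the staffing-for-schedule trade-off described above.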

This applies equally well to testing new markets with mini-versions of new products, and a whole host of other processes that used to be linear but can now be done in parallel, thanks to the cloud.

Apr 26
2011

PC World Learns SLAs Matter When Buying Cloud

Posted by: Eric Novikoff

Tagged in: Cloud Usage

PC World wrote about what your business can learn from the Amazon cloud outage, noting that you should examine the SLAs you get from your cloud provider as an indicator of the level of reliability of their cloud product, as well as looking at diversification and simply deciding what is mission critical in your company.

"And while you're negotiating those deals with one or more cloud providers, take a minute to examine your service level agreements (SLAs) with any provider. SLAs should set out how your providers are rewarded when things go right, and how you're compensated when things go wrong."

"Especially if you're working with a local service provider which is working with an Amazon, a Google, or another major public cloud infrastructure vendor, make sure those SLAs spell out who is responsible for what should things go awry. It's worth the extra time and effort early in the relationship to make sure those SLAs are clear, comprehensive and iron-clad."

"If something goes wrong, you don't want your business to languish offline while your vendors pass the buck for responsibility for the outage. This is the very definition of when you want one throat to choke, and you want to make sure it's clear to whom that throat belongs."

I'd like to offer a few summary thoughts that expand on the article based on our experience here at ENKI:

- For truly mission critical applications, going onsite is an expensive and unnecessary step... a return to a past most of us would rather put behind us. Instead, diversify your cloud deployments across different geographies.

- True fault-tolerant diversification requires at a minimum that your application be set up to maintain data currency across multiple deployments. You will want to look at the way databases work, files are stored, and how you will compensate for delays in replication whether you choose an active/active or active/standby DR solution.

- As the article points out, SLAs matter, but are you getting the right SLAs? The companies who suffered from this outage didn't get any disaster recovery/business continuance SLAs from Amazon and that should have been a red flag for them. To solve this problem, you will need to develop in-house IT expertise on DR/BC and dedicate resources to it, or choose a different cloud vendor or operations services provider that can do this for you.

- DR/BC is expensive because it requires some thought (i.e., labor) and duplicated hardware. This means it isn't a no-brainer and you'll want to assess exactly how much protection each of your applications needs and what your budget is to provide it.


Cloud is, unfortunately, not yet a panacea that offers enterprise-class reliability for free. Instead, it is a way to reduce the cost and headaches of managing your own hardware. You still need experienced IT staff to manage your deployment - whether they are in-house or outsourced.  ENKI's outsourced operations services are designed to help companies match their cloud deployment to their business needs and then manage that deployment to deliver the SLAs that are required.

Feb 10
2011

RSA Conference study to reveal cloud frustration

Posted by: Eric Novikoff

Tagged in: Cloud Usage

"Overall, 73 percent of respondents said they need new skills to deal with cloud," according to a Frost & Sullivan study of 10,000 security professionals that will be presented at the upcoming RSA Conference.

I think a big part of the confusion is the hiding of infrastructure design that cloud vendors engage in.    Part of the hiding is clearly to retain intellectual property ownership, but when you take into account discoveries like the recent one at Stony Brook that Rackspace cloud compute power doesn't scale with instance size (in fact, it's flat), the dark underbelly of the obfuscation comes into view: sowing FUD (fear, uncertainty, and doubt) among customers for competitive advantage.   There are cloud providers (ENKI included) that are transparent about infrastructure design and resource allocation, but they are in the minority and don't include "former bulk hosting companies" like Rackspace.

For security to be effective, the security design cannot be based on assumptions about the infrastructure but rather an accurate understanding of what elements are involved.  There's no way to secure unknown infrastructure as a user or consultant - you have to rely on the cloud vendor's promises and guarantees, which in many cases are nebulous.

Oct 29
2010

Is Amazon's Free Computing Really Free?

Posted by: Eric Novikoff

Tagged in: Cloud Usage

Amazon.com announced that it is giving some cloud computing away for free, presumably to get developers hooked on its service. 

What I've seen with countless startup/entrepreneur customers in our company's cloud services is that this kind of offer can be more of a curse than a boon for the startup:

  • The successful startups inevitably need far more resources than the fraction of a physical core that the Amazon micro-instance offer provides, so this offer really wouldn't have any effect on the finances of a startup.  If the startup used this offer as a criterion to choose a provider, they would be ignoring the factors that correlate with successful deployments.
  • The startup customers of ours that scaled successfully understood how to architect their systems for growth and institute processes that allowed them to provide a reliable service, such as change control, deployment management, incident response, resource management, and performance management. These skills came from staff members who were not developers, so creating an offer to appeal to developers works against the long-term success of a startup by placing operations in the hands of people without the skill-set and motivations that result in successful production IT management. Until self-service managed cloud offerings appear that eliminate the need for these skill-sets and experience, they are still required to scale successfully, or they have to be outsourced.
  • By limiting the offer with restrictions in so many dimensions (due to Amazon's rather complex offering), this "free" offer places the burden on the user to compute how to stay within these bounds (actually, an impossible task), which ends up "rewarding" the successful startup with large and unexpected bills.   I've seen people at countless Cloud Camps complaining about this "surprise bill" effect.  To some extent, this is inevitable with cloud computing and is even a benefit in the sense that customers can take advantage of scalability (if they design for it and enable it), but the "freeness" of the offer is fundamentally misleading. The alternative for the user would be to resolve to stay within the bounds of the "free" instance, which throws away the benefits of cloud computing over, say, colocation (not to mention that you can get a lot more horsepower from a colocated server per dollar despite its other limitations).
  • The aim of this offer - to lock customers in with a free tidbit - works out in Amazon's favor but some of our most successful startup customers (some of whom came to us from Amazon) ended up deploying architectures that were not compatible with Amazon's architectural restrictions and performance limitations.  In other words, vendor lock-in wouldn't or didn't serve them. One very successful customer of ours is currently spending $1M/mo with Amazon and desperately struggling with how to move out of its dependency on Amazon.

None of my points are meant to say Amazon is a bad choice since that's clearly not the case, but the right answer to what cloud provider to choose for a company is "it depends" and what it depends on shouldn't be reduced to the price of one free micro-instance.

What ENKI would like to offer instead of a "free micro-instance" is our SSP (Startup Success Program) which offers up to 50% discounts on the first 32 cores (equivalent to 96 of Amazon's ECUs) as well as other benefits, in exchange for allowing us to assist you in managing your services so that they are reliable, cost-effective, and appropriately performing for your target market, as well as participating in joint marketing which can benefit your brand.   If you are interested, please contact us.

May 05
2010

Why Cloud hasn't made "going down" a thing of the past

Posted by: Eric Novikoff

Tagged in: Cloud Usage

Yesterday I read about how Foursquare had major downtime.   They're a customer of Amazon's AWS cloud, yet they had downtime comparable to the conspicuous disaster in 2007 at 365 Main that took down non-cloud-hosted Craigslist, Yelp, Technorati and SixApart for up to a day.   What happened?  Wasn't cloud computing supposed to solve this problem?

In this case, there was apparently a problem in the AWS datacenter that hosts Foursquare's services.   However, as in the case in 2007, Amazon's downtime was far less than that of Foursquare.   This brings up two myths of cloud computing: 1) Cloud computing never goes down; and 2) Your site will never go down if it's hosted on cloud computing.   Both are terribly and dangerously false.

Hardware will always fail, suffer from misconfigurations, or otherwise be the victim of human failures, so from the user's point of view cloud computing will never reach 100% availability.   In fact, when Amazon says that its cloud is reliable, they speak of their entire cloud, not a particular end-customer's application hosting.   The key is what your cloud computing provider does when the hardware fails.  Amazon does very little, simply offering you the ability to restart your software on another machine.  Others - such as us - will restart it for you.  How quickly the restart happens, and how much it demands of your software, depends on the cloud you choose.

But that's only the beginning of true application reliability. As we saw with Foursquare yesterday and the 365 Main event in 2007, if your software isn't written and deployed in a way to be tolerant of failure, the time needed to bring it back up can be a major disaster for your business.  This preparation for downtime is called DR, or disaster recovery, and the 365 Main event highlighted that many companies hadn't given it much thought.  DR can start with as simple a preparation as writing your software to be resistant to database corruption caused by downtime or perhaps adding monitoring that shows whether it's working properly, or it can extend all the way to having a live "warm" standby site that can take over if your primary site fails.  Cloud computing can make these options possible or affordable, but it does not guarantee them.  So simply placing your site in a cloud doesn't guarantee uptime, even if it does put it in a datacenter that is professionally managed.

Every day, as I meet potential customers, read advertisements from other cloud companies, or catch up on cloud computing blogs, I've been making a mental list of the "myths of cloud computing" that I've been hearing from them.  These myths are dangerous: they produce a mismatch of expectations between cloud customers and vendors that can injure everyone - especially the cloud user who expected that their cloud-hosted web site would produce far more professional results than it actually does.

A partial list of the myths I plan to explore is:

  1. Cloud computing never goes down (I'll follow up on this article in more depth and make it part of my Cloud 101 class.)
  2. Computing resources in the cloud are infinite
  3. Cloud computing is nearly free
  4. Cloud computing will make my software compliant with regulations and certifications

If you have any more you'd like me to discuss, I'd love to hear from you.

As with any myth, there is a kernel of truth in each of these assertions, which is the reason cloud computing is so attractive.  But how much of these benefits you actually get depends both on you and your cloud vendor.

Oct 08
2009

Moving your Development and Test to the Cloud

Posted by: Eric Novikoff

Tagged in: Cloud Usage

I got an email yesterday from some cloud vendors offering a seminar on "Why you should move dev and test to the cloud."  The two vendors specifically suggested moving them to the cloud even if you weren't deployed to the cloud. For our cloud customers, I have been advocating this for quite a while, but I would never suggest moving test, and to a lesser degree, dev to the cloud if your deployment platform wasn't the cloud.  My disagreement is based on the importance that I put on keeping your live site or other service up and running.

The real recommendation I have, based on our experience supporting our customers, is that your dev and test environments should be as much like production as possible (with any obvious required differences such as different levels of access control.)   So, if you're deployed to cloud, you should test in an identical cloud environment.  If in colo, you should test in an identical colo environment.  That way, any environment-dependent problems will surface during test, instead of during deployment.  And we find that more than 50% of the failed deployments that our customers have are the result of having a difference in their environments, whether it's a different PHP config file, different memory size, or different set of libraries on the systems.    This applies to development as well.
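One inexpensive way to catch this kind of drift before a release is to compare a description of each environment. The sketch below is only an illustration: the setting names and values are invented, and how you collect them (from config files, package managers, or your cloud's API) is left open.

```python
def environment_diff(production: dict, test: dict) -> dict:
    """Return {setting: (production_value, test_value)} for every mismatch.

    A setting missing from one side shows up with None on that side.
    """
    keys = production.keys() | test.keys()
    return {
        key: (production.get(key), test.get(key))
        for key in sorted(keys)
        if production.get(key) != test.get(key)
    }


# Hypothetical environment descriptions, for illustration only.
production = {
    "php_memory_limit": "256M",
    "ram_mb": 4096,
    "libraries": ("libxml2", "openssl"),
}
test = {
    "php_memory_limit": "128M",
    "ram_mb": 4096,
    "libraries": ("libxml2",),
}

for setting, (prod_value, test_value) in environment_diff(production, test).items():
    print(f"{setting}: production={prod_value!r} test={test_value!r}")
```

An empty result does not prove the environments are identical, only that the settings you chose to record agree, so the list of recorded settings should grow each time a deployment fails for an environment-related reason.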

Yes, the cloud offers some great advantages for "disposable infrastructure," letting you set up and destroy test or dev environments at your leisure.  But if their use leads to a bad deployment and downtime, any advantages of cloud dev or test are lost.    The nice thing about our AppLogic-based cloud or PrimaCloud is that you can copy an entire virtual datacenter from your production environment and make it your test or dev environment very easily.

All of this ease of creating and destroying infrastructure doesn't eliminate the need for good process.  Do you have configuration management and version control tools in place?   Do you have a process with approvals for releasing from dev to test?   How about from test to your production environment?  How do you decide how much testing is enough, and of what type?  What process is in place if you discover critical bugs after release? Can you roll back your production environment easily?   The cloud facilitates answering these questions, but as you can surmise, it doesn't answer them!

Copyright © 2011 ENKI LLC