Enki Blog

Managed Cloud Blog

Tags >> Business Strategy
Apr 27, 2011

How Amazon's Cloud Failure Shows The Way For Future Use Of The Cloud

Posted by: Eric Novikoff

Tagged in: Business Strategy

In my last blog article, I wrote about what had happened to cause so many customers at Amazon's EC2 service to go down last week.   Now, I'd like to share my observations and learnings about this event gleaned from running our own cloud computing service and practice.

I have written extensively in the past about how cloud services have not progressed to the point where companies can do without IT operations, despite the apparent ease of starting an instance in the cloud.   Unfortunately, this observation is at the root of the problems that Amazon's customers experienced last week.

Let's look at how cloud computing has turned traditional SaaS provisioning upside-down in the last few years.   Back when I worked at NetSuite, and at other companies before that, once an application was completed, a deployment organization was built to provide it reliably and capably to customers.   While the pyramid on the left below is a bit exaggerated, a mature organization providing a reliable SaaS offering would grow its customer base over time by expanding the in-house deployment (IT) group which managed its systems, and by buying and maintaining more and more hardware in expensive purpose-built datacenters.  This provided good to excellent reliability because the company built its IT group with expensive hard-to-find (especially during the dot com boom) experienced IT experts.

[Figure: deployment organization pyramids, traditional (left) and cloud-era (right)]

However, today a software startup or even a growing SaaS company can "forget" about worrying much about IT since it can be managed at the click of a mouse.   The eye-hand-mind cycle of selecting a server instance, clicking to get it started, and being rewarded with an apparently perfectly functioning system leads the user to think everything is perfect; that all eventualities have been handled.  Especially if that hand is that of a developer or other staff not experienced in production IT concerns.  As a result, the company's management allows resources to go into developing the application instead of the IT infrastructure.  So far, so good.

The problems with this approach appear when management thinks that Cloud infrastructure is its IT strategy and eliminates the "Deployment Strategy" layer of the traditional IT stack.  After all, why keep all those people around when you can just tell Amazon what to do on the web?  This has resulted in the deployment organization you see on the right, which looks really good to the developer-founders and their investors, since more resources go to developing the app and everyone saves money and keeps equity compared to doing things the traditional way.

Of course, cloud vendors have not been remiss in taking up this value proposition and selling it back to entrepreneurs, who in turn think that it is an IT strategy, when in fact it is merely a cloud sales strategy. Cloud computing has literally turned IT "upside down", but as we saw last week, many of Amazon's customers didn't realize what the consequences could be. Without anyone in their organizations thinking about application deployment management, they became vulnerable to failures, performance problems, and downtime due to release management issues.

The solution is simple - isn't it? Just go back to the way things were (left pyramid) but replace the infrastructure with cloud. In fact, this is what the more successful users of cloud - who of course you don't hear about since they haven't experienced disastrous failures - have done. However, in talking to prospects and customers, I've found that this hybrid approach doesn't get them the savings they expected from cloud computing. They may save 10%-30% on infrastructure, and they definitely reduce their time to market, but they still have a heavy burden of IT operations, resulting in net savings in the 10%-20% range that leaves them disappointed with cloud economics.

Instead, what is wanting here is a more 21st century approach to building an IT strategy for your SaaS company, similar to the hardware-eliminating infrastructure cloud revolution. This comes down to using two approaches, which can be combined: outsourcing and automation.

For the smaller company or entrepreneur who still needs access to the level of IT competence that will ensure successful deployments, outsourcing is a godsend. There are an increasing number of break-fix operations services companies that will manage your applications in the cloud for you, though fewer of them will actually advise you and fully manage the deployment and ongoing maintenance. This is why we introduced our PrimaCare service to complement our own cloud services; it is essentially an insurance policy on your deployment to keep it from failing and make it successful. By outsourcing IT on a per-deployment/per-VM basis, IT costs can be controlled through the same on-demand and pay-per-use methods that made cloud computing successful.

The other tool available to the SaaS provider is automation. Amazon's CloudFormation is great, but it doesn't automate the business processes associated with managing the cloud. While commercial solutions like RightScale or open-source tools like Puppet and Chef can help, there is as yet no fully automated cloud management workflow engine, though the up-and-coming Kaavo Imod system is very promising. However, the first order of business is to determine what processes you use for code updates, platform updates, security management, backup, disaster recovery, etc. While some of these processes look like something that might come from your cloud vendor, unless you determine what you need and then assess what the vendor provides, you will never know the holes that need to be plugged. An operations services vendor like ENKI can assist you with professional services to guide you through process discovery for your critical processes and the best way to automate them.

The resulting deployment pyramid is a hybrid of the old-school and Web 2.0 models, in which the middle layer - focused on operations services - has been restored but modified to complement the cloud delivery model and deliver the capability and maturity model (CMM) that SaaS customers have come to expect from their providers:

[Figure: the future deployment organization pyramid, with the operations services layer restored]

If it sounds like I'm leaning heavily on outsourced services, it is because in general I've found that rapidly growing entrepreneurial companies don't have the resources to solve these problems on their own, and the recent Amazon meltdown underscores that observation. While it's nice to be able to solve every problem yourself, it's clearly better to get them solved sooner than you would by waiting until you've grown enough (or had enough bad experiences) to do so yourself. If your organization can leverage the skills, processes, and staff of a mature IT organization, on-demand, why develop it yourself, especially if the costs are lower? Outsourced operations services are considerably more expensive than not having any operations services in your deployment model - but quite a bit less expensive than going it alone or suffering a repeating series of downtime events.

These observations are confirmed by the reduction in downtime we have seen ENKI's operations services customers experience, typically adding another half-nine or more of reliability to their deployments.

Apr 26, 2011

Did the Failure at Amazon Reveal a Problem with Cloud Computing?

Posted by: Eric Novikoff

Tagged in: Business Strategy

It's been a few days now since Amazon's Virginia datacenter failed and took down hundreds of internet companies, including several high-visibility startup companies and platform-as-a-service providers. There has been a lot of talk about how this failure, which "was never supposed to happen", is a blow to cloud computing. However, I think it's more of a blow to the freewheeling marketing practices for cloud computing, and to those consumers of cloud who thought that it eliminated their responsibility to worry about infrastructure reliability. These words are a bit harsh, I know, but they reflect the reality that many of Amazon's customers have been remiss in managing their IT responsibilities. The silver lining to this event is that I think it will bring about changes in how cloud is deployed and, more importantly, used, which will continue to drive its successful adoption.

To review what we know about the event, Amazon's storage subsystem encountered a failure (which I surmise was due to a code update) that caused it to replicate customers' data, causing storage slowdowns, until the storage system was full, which then caused complete failure. Somehow this failure crossed multiple "availability zones" which were not supposed to all fail at the same time. With no storage accessible to them, applications failed. Even as late as today, some companies were saying that they were still missing data.

How could this happen?  Well from an infrastructure point of view, the problem is clearly due to monoculture.  Monoculture is a term used in agriculture to describe planting your entire farm with the same crop.  If an insect that likes that crop attacks your farm, you lose your whole crop.   Similarly, Amazon, in order to get good economy of scale and minimize manual labor, has filled their datacenter with the same hardware, hooked up the same way, loaded with the same software, replicated over and over.   Introduce a software bug into a release of their infrastructure management systems, and the automated distribution of the release will spread to the entire datacenter - or wider, depending on maintenance policies. 

However, this does not explain why so many of Amazon's customers were taken by surprise by the failure.  From their expressions of surprise, it's pretty clear that many assumed that Amazon's assurances that its infrastructure was reliable meant that they wouldn't have any problems.  However, from the first day that Amazon offered their service, they were quite clear that while their service was likely to remain up almost all the time, individual servers or even availability zones were not guaranteed to be that reliable, and that Amazon expected its customers to engineer around this limitation.  Amazon's weak uptime guarantee of three and a half nines backed by a small percentage discount that users could apply for at the end of the year communicates even more clearly that they had no plans in place to keep individual servers running at high availability, nor did they plan to shoulder heavy financial responsibility for long outages.   Even the introduction of automated failover features did little to improve this, since their extremely long provisioning times and questionable availability of resources at peak times meant that customers could experience noticeable failures.
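To put "three and a half nines" in concrete terms, here is a minimal sketch that converts an availability percentage into permitted downtime per year. It assumes the conventional reading of three and a half nines as 99.95% and a 365-day year; the function name is illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def annual_downtime_hours(availability_percent: float) -> float:
    """Hours per year a service can be down and still meet the given availability."""
    return (1.0 - availability_percent / 100.0) * HOURS_PER_YEAR


# Assumed conventional percentages for each "nines" label.
levels = [
    ("three nines", 99.9),
    ("three and a half nines", 99.95),
    ("four nines", 99.99),
]

for label, percent in levels:
    hours = annual_downtime_hours(percent)
    print(f"{label:>24} ({percent}%): {hours:.2f} hours of downtime per year")
```

Under these assumptions, three and a half nines still leaves room for more than four hours of downtime a year, and each added half-nine roughly halves the permitted downtime.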

Of course, nobody expected multiple availability zones to fail at once, so even if their applications had failover to another zone in the datacenter, they expected to enjoy high uptime.   But they should have expected a full-datacenter failure: there are many modes of systems failure that take out an entire state-of-the-art datacenter, including accidental destruction of fiber lines to the datacenter, systemic power failures such as the one a few years ago in the generators at 365 Main in San Francisco that took out a raft of popular startups, or a more recent Amazon full-datacenter failure due to a car crashing into a power line.  I think the message here is that modern data centers are reliable, but not infinitely reliable.  And similarly, while Amazon's services simplify the access to virtual infrastructure, that ease of access masks the fact that it is built out of physical infrastructure that has definite vulnerabilities which have not been automated away as completely as the hassle of setting up your own physical server.

Puzzlingly, it seems that this simple understanding did not reach the management teams at the companies that were suffering the most from last week's outage. This could be because the marketing of cloud seems to follow an unwritten rule that the drawbacks and vulnerabilities will not be discussed by vendors, who prefer to emphasize the low prices of cloud computing - which paradoxically are enabled by ignoring vulnerabilities that experienced IT shops would not let pass. When pressed, vendors often say that they won't release proprietary information for security purposes or to protect their intellectual property. However, experienced IT folk will know that varying degrees of these limitations are a basic feature of any infrastructure deployment, and must be engineered around at the deployment and application software layers. So the inescapable conclusion is that the management teams at many of Amazon's customers either chose to ignore the problem or felt that the costs of mitigating it were not worth the potentially small improvement in reliability. In other words, there was a gap between what Amazon could reasonably provide in the way of reliability and what these companies needed, and they did not fill it, either intentionally or unintentionally. In my next blog, I will explore how this has come to be, and how this problem will be solved by a combination of changes in both cloud customers and cloud vendors.

Jan 09, 2011

The virtual physical server problem and why it's so expensive for cloud customers

Posted by: Eric Novikoff

Tagged in: Business Strategy

In my experience doing sales for ENKI, I've begun to form an idea of why companies are not seeing the savings from virtualization and cloud computing that they have been expecting.  I call it the "virtual physical server problem" because I think they're treating their virtual servers like physical ones, from specifying them all the way through managing them.

It all begins with specifying the server.  I find that most of my prospects don't know how much computing they really need.  But they think they do ... and the conversation goes something like this:

Me: "How much horsepower do you need?"

Prospect: "Oh, I'm running on an 8-core, 16GB server and it works well, so I really need 8 cores and 16GB."

Me: "Have you done any monitoring or used your systems performance tools to check that?"

Prospect: "No, everything is working fine, so why should I do that?"

Since the industry average even for virtualized servers is less than 30% utilization, this prospect will end up spending three times what he needs to in the cloud. Even considering that cloud is often 50% more expensive than leasing a server (in part because it delivers other benefits like scalability and reliability), this prospect is throwing away a potential 55% savings by specifying his cloud deployment to look like his physical deployment.
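The 55% figure can be reproduced with a small calculation. This is a sketch under simplifying assumptions that are mine, not a pricing model: cloud cost scales linearly with the capacity purchased, and the right-sized deployment buys exactly the measured 30% with no extra headroom. The function and variable names are hypothetical.

```python
def relative_cloud_cost(utilization: float, cloud_premium: float, right_sized: bool) -> float:
    """Return cloud cost as a multiple of the current leased-server cost.

    Assumes (for illustration only) that cost scales linearly with the
    capacity purchased and that no extra headroom is kept.
    """
    capacity_bought = utilization if right_sized else 1.0
    return capacity_bought * (1.0 + cloud_premium)


UTILIZATION = 0.30    # fraction of the physical server actually used
CLOUD_PREMIUM = 0.50  # cloud priced 50% above an equivalent lease

copied = relative_cloud_cost(UTILIZATION, CLOUD_PREMIUM, right_sized=False)
sized = relative_cloud_cost(UTILIZATION, CLOUD_PREMIUM, right_sized=True)

print(f"Copying the physical spec: {copied:.2f}x the lease cost")
print(f"Sizing to measured usage:  {sized:.2f}x the lease cost")
print(f"Savings versus the lease:  {1.0 - sized:.0%}")
```

Copying the 8-core, 16GB spec into the cloud costs 1.5 times the lease, while sizing to measured usage costs 0.45 times the lease, which is the 55% savings. In practice you would keep some headroom above measured utilization, so the real savings would be somewhat lower.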

The next place that the virtual physical server problem costs organizations is in deployment. On physical servers, applications are designed to avoid hitting the resource limits of the server and crashing, and scaling is an arduous affair in which reserve servers must always be running, since they can't be allocated on demand. Surprisingly, prospects often order the same amount of "extra" servers for the cloud as they would in co-location. If, for example, an application only needed 10x the base server count for 10% of the time, the deployment would cost more than five times what would actually be necessary if the scalability of the cloud were appropriately used. With PrimaCloud, it is even possible to scale on end-user SLA compliance rather than the common CPU utilization, allowing you only to pay for what you need to meet your SLAs.
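The cost of always-on reserve capacity in the 10x example can be worked through with a simple server-hour model. This is a sketch under assumptions of my own: billing is per server-hour at a flat rate, scaling is instantaneous, and the application needs exactly the base server count outside the peak. The function names are hypothetical.

```python
def static_cost(peak_multiple: float) -> float:
    """Average servers running (in units of the base count) when peak capacity is always on."""
    return peak_multiple


def elastic_cost(peak_multiple: float, peak_fraction: float) -> float:
    """Average servers running when peak capacity is only on during the peak."""
    off_peak = (1.0 - peak_fraction) * 1.0
    on_peak = peak_fraction * peak_multiple
    return off_peak + on_peak


PEAK_MULTIPLE = 10.0  # peak needs 10x the base server count
PEAK_FRACTION = 0.10  # the peak lasts 10% of the time

always_on = static_cost(PEAK_MULTIPLE)
on_demand = elastic_cost(PEAK_MULTIPLE, PEAK_FRACTION)

print(f"Always-on reserve servers: {always_on:.1f}x base")
print(f"Scaled on demand:          {on_demand:.1f}x base")
print(f"Overspend factor:          {always_on / on_demand:.2f}x")
```

In this model the always-on deployment runs 10 base units continuously, while the elastic one averages 1.9, so the shorter or rarer the peak, the larger the overspend from provisioning for it permanently.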

The last place that the virtual physical server problem impacts cost is in management. This is the single largest cost factor that prevents cloud from delivering savings, and it's invisible to most consumers of cloud, because they're locked into virtual physical thinking. Companies that move to the cloud from colo - or even move from one cloud to another - all end up hiring administrators for their applications and "cloud datacenters", which costs about the same as managing physical servers. Let's face it, most applications don't need a true 24x7 Ops team, which can often consist of as many as 6 or more people if you want to avoid burnout. Instead, they just need someone to be there when there is a problem or a code push. These companies have virtualized their computers, but not their IT, and so the bulk of their costs are still there. I can't tell you how many times I've spoken to a prospect who ended up going with Amazon and heard that they didn't save any money.

The solution is to treat your cloud deployment as Virtual IT - everything is in the cloud, and everything scales as you need it: servers, IT operations, costs.

Aug 29, 2010

Cloud Superman or Business Superstar?

Posted by: Eric Novikoff

Tagged in: Business Strategy

As we head into VMWorld week, the focus of my attention is going to be on all the cool new cloud technology that will be shown off there, and what our competition has come up with. However, I've been talking to a number of prospects for our managed cloud computing services in the last week, and it struck me that none of them asked me anything about all the wonderful technology ENKI deploys into its PrimaCloud service. How can this be: the world is drowning in hype about cloud computing, and our next slate of customer partners don't care about cloud at all? (Of course, they DO care about what the technology can give them!)

At the same time, I was interviewing a potential VP of Bus Dev for ENKI who asked me to characterize our customers.   This made me realize that the central question for our prospects is:

Do you want to be a Cloud Superman or a Business Superstar?

Our customers want to be business superstars. Most of them view the necessary evil of operations as something they haven't been able to get away from until they found ENKI, and that if they had their choice, they'd be focused on imagining the best way to grow their business 10x in the next year, not what the infrastructure needs to be to accomplish that, or how much knowledge of cloud computing they have amassed in the process.  They know that their total cost of operations includes the consequences of all the mistakes one can make in managing software deployments in cloud or colo infrastructure.  Often, they are experienced executives who have already made those mistakes, much like we did at startups and enterprises many years ago, and are not eager to repeat them.

We have a new customer, Krome Photos, who has an amazing outsourced photo-retouching/editing service that you can use to get what you previously needed a professional photographer to give you - but with your own pictures, and at a fraction of the cost.   He gave us his entire operations responsibility, and we've expanded his infrastructure to keep up with his ever-accelerating growth.  Now, he's preparing to charge for his beta product - almost unheard of in the Web2.0 world.   I think of him as a Business Superstar!

To be sure, we get a lot of contacts from people who are worried about a 5-cent difference in CPU-per-hour cost, or whether we use Sun or NetApp storage systems, or whether they might lose some control over their production environment if they can't physically touch the servers or log in to the IPMI interface.   Often those people don't become our customers or are impossible for us to provide our exceptional services to, since they aren't asking us to help them with what we do best, which is to take responsibility for their production IT deployments.

For example, we have one customer who thought his business was "too large" for a smaller cloud company like ENKI.   But he wanted to try us out anyway to compare us to Amazon.   So, he put his database in our cloud and his business logic layer into Amazon.  The result is that if there is network congestion or Amazon suffers one of its regular "5pm doldrums", his application goes down.   And he points the finger at ENKI.  Unfortunately, as much as we'd like to, there is little we can do to save him from this situation because it's an architectural nightmare.  He gets to be a Cloud Superman, but his business suffers.   And he's not our biggest customer, but our smallest!

So, are you a cloud superman, or a business superstar?   Drop me a line with our comment form below!


If you happen to be at VMWorld, come visit us at a breakfast seminar we're putting on with one of our vendor-partners, Voltaire, which makes outstanding hardware and truly understands cloud infrastructure. We'll be discussing "Designing Managed Clouds for Growth" on Wednesday September 1 at the Marriott next door to VMWare. Register at www.voltaire.com/vmworld-seminar

Copyright © 2011 ENKI LLC