Enki Blog

Managed Cloud Blog

Apr 28
2011

The Cloud Ecosystem's Conspiracy of Silence

Posted by: Eric Novikoff

Tagged in: Commentary

After last week's meltdown at Amazon, a lot of people (including me) are talking about what needs to change in cloud computing to provide users with a greater degree of confidence in cloud and the vendors that provide it.   So far, I have focused on the customers of cloud as having a great deal of influence over the levels of service they experience, since ultimately they have the power whether it is in how they use the service or which provider they choose.  The more informed they are, the better the cloud ecosystem will perform.

However, there is a major problem with this approach (nothing's simple, right?)  Customers can relatively easily inform themselves (or hire employees, consultants, or professional services) to help them make the best use of cloud services.  However, what they cannot easily do is find out all the limitations, gotchas, constraints, tradeoffs, and misrepresentations that cloud vendors suffer from, much like the colo and hosting vendors have for over a decade.   For example, Amazon is essentially a "black box" so even a very IT-literate customer can't effectively engineer around their limitations.   And Amazon has a very supportable position in not releasing all its secrets to the world for fear of losing its competitive edge.  This obfuscation in the name of protection of intellectual property and market position goes well beyond Amazon, however.  Based on my experiences, every cloud vendor and managed services provider also is part of this conspiracy of silence.  And amazingly, what I hear from investors and analysts about problems with various cloud technologies is not published on the internet or spoken of in conferences.  There are many reasons for this, not the least of which are gag clauses that equipment and software vendors write into their contracts.

For example, here at ENKI we use a number of branded products and technologies to provide our services.    The sales/service/evaluation contracts we signed with these vendors specifically prevent us from sharing lists of bugs, performance analyses, or other damaging information about their products with the public.   Again, it's quite reasonable that vendors we do business with be protected from incorrect or malicious information or misinformation about their products.  However, if their products have serious flaws - something that is going to be universally true about anything that is new or changing rapidly - then our customers have a right to know the risks they are taking with using those products in our services.  But we have no way to let them know except to call them individually.   Our only choice is to stop or avoid using the vendor's product, which is something we have done a few times in our history.  We can't even announce why we are discontinuing the use of the product, according to most of our contracts.

Every day, I see ENKI's competitors touting this or that product or technology, many of which we have already discarded as fatally flawed from the perspective of reliability, security, or usability, yet we cannot have an open discussion about them on a public forum (though we're happy to do so individually!). And I think about the hapless customers signing up to use their services, only to face business-critical limitations later. Just today I got three marketing emails in my inbox from competitors using software systems to provide cloud services that suffer from horrific bugs.

While all this secrecy is understandable, and for the moment legally correct and enforceable, it is a disservice to the cloud-using and cloud-selling community.   You can't really choose a technology based on only its positive features!  Let's face it: every piece of software and hardware has flaws, but the ones that persist still offer enough value to keep people using them.  This isn't just true of cloud, but many IT products, in particular large, expensive software systems (which I'll also have to let remain unnamed) suffer from long-running outstanding bugs and terrible service.   I only see two ways out of this dilemma: either something blows up as it did last week, or software/cloud/hardware vendors permit and encourage a more open dialog as many Web2.0 companies have bravely begun doing. 

What can you, the cloud/IT/software-buying public do about it?  Not a lot, but you can start by letting go of the expectation of perfection, which drives vendors to try to hide bugs and problems.   An easy way to do this is to look at your vendor from a relationship point of view: when the inevitable problems crop up, are they willing and capable of responding?    This is no more - and no less - than you'd expect from your own IT department, right?

All this secrecy puts us at ENKI in an uncomfortable bind, since our corporate values are based on openness and transparency as a means of honoring our customers.   So if you have questions about a cloud technology please contact me.  I'll be happy to share what I know.  But please, don't expect me to sign my name to it!

Apr 27
2011

How Amazon's Cloud Failure Shows The Way For Future Use Of The Cloud

Posted by: Eric Novikoff

Tagged in: Business Strategy

In my last blog article, I wrote about what had happened to cause so many customers at Amazon's EC2 service to go down last week.   Now, I'd like to share my observations and learnings about this event gleaned from running our own cloud computing service and practice.

I have written extensively in the past about how cloud services have not progressed to the point where companies can do without IT operations, despite the apparent ease of starting an instance in the cloud.   Unfortunately, this observation is at the root of the problems that Amazon's customers experienced last week.

Let's look at how cloud computing has turned traditional SaaS provisioning upside-down in the last few years.   Back when I worked at NetSuite, and at other companies before that, once an application was completed, a deployment organization was built to provide it reliably and capably to customers.   While the pyramid on the left below is a bit exaggerated, a mature organization providing a reliable SaaS offering would grow its customer base over time by expanding the in-house deployment (IT) group which managed its systems, and by buying and maintaining more and more hardware in expensive purpose-built datacenters.  This provided good to excellent reliability because the company built its IT group with expensive hard-to-find (especially during the dot com boom) experienced IT experts.

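[Figure: DeploymentOrgs. The traditional deployment pyramid (left) and the cloud-era deployment organization (right).]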

However, today a software startup or even a growing SaaS company can "forget" about worrying much about IT since it can be managed at the click of a mouse.   The eye-hand-mind cycle of selecting a server instance, clicking to get it started, and being rewarded with an apparently perfectly functioning system leads the user to think everything is perfect; that all eventualities have been handled.  Especially if that hand is that of a developer or other staff not experienced in production IT concerns.  As a result, the company's management allows resources to go into developing the application instead of the IT infrastructure.  So far, so good.

The problems with this approach appear when management thinks that Cloud infrastructure is its IT strategy and eliminates the "Deployment Strategy" layer of the traditional IT stack.  After all, why keep all those people around when you can just tell Amazon what to do on the web?  This has resulted in the deployment organization you see on the right, which looks really good to the developer-founders and their investors, since more resources go to developing the app and everyone saves money and keeps equity compared to doing things the traditional way.

Of course, cloud vendors have not been remiss in taking up this value proposition and selling it back to entrepreneurs, who in turn think that it is an IT strategy, when in fact it is merely a cloud sales strategy. Cloud computing has literally turned IT "upside down", but as we saw last week, many of Amazon's customers didn't realize what the consequences could be. Without anyone in their organizations thinking about application deployment management, they became vulnerable to failures, performance problems, and downtime due to release management issues.

The solution is simple - isn't it?   Just go back to the way things were (left pyramid) but replace the infrastructure with cloud.  In fact, this is what the more successful users of cloud - who of course you don't hear about since they haven't experienced disastrous failures - have done.  However, in talking to prospects and customers, this hybrid approach doesn't get them the savings they expected from cloud computing.   They may save 10%-30% on infrastructure, and they definitely reduce their time to market, but they still have a heavy burden of IT operations, resulting in net savings in the 10%-20% range that leaves them disappointed with cloud economics.

Instead, what is wanting here is a more 21st century approach to building an IT strategy for your SaaS company, similar to the hardware-eliminating infrastructure cloud revolution. This comes down to using two approaches, which can be combined: outsourcing and automation.

For the smaller company or entrepreneur who still needs access to the level of IT competence that will ensure successful deployments, outsourcing is a godsend. There are an increasing number of break-fix operations services companies that will manage your applications in the cloud for you, though the number that will actually advise you and fully manage the deployment and ongoing maintenance is smaller. This is why we introduced our PrimaCare service to complement our own cloud services, which is essentially an insurance policy on your deployment to keep it from failing and make it successful. By outsourcing IT on a per-deployment/per-VM basis, IT costs can be controlled through the same on-demand and pay-per-use methods that made cloud computing successful.

The other tool available to the SaaS provider is automation. Amazon's CloudFormation is great, but doesn't automate business processes associated with managing the cloud. While commercial solutions like RightScale or open-source tools like Puppet and Chef can help, there is as yet no fully automated cloud management workflow engine, though the up-and-coming Kaavo Imod system is very promising. However, the first order of business is to determine what processes you use for code updates, platform updates, security management, backup, disaster recovery, etc. While some of these processes look like something that might come from your cloud vendor, unless you determine what you need and then assess what the vendor provides, you will never know the holes that need to be plugged. An operations services vendor like ENKI can assist you with professional services to guide you through process discovery for your critical processes and the best way to automate them.
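To make the "determine what you need, then assess what the vendor provides" step concrete, here is a minimal Python sketch of a process gap check. The list of required processes comes from the paragraph above; the vendor's coverage is a made-up example, and the function name `find_gaps` is purely illustrative.

```python
# Operational processes named in the article that every deployment needs an answer for.
REQUIRED_PROCESSES = {
    "code updates",
    "platform updates",
    "security management",
    "backup",
    "disaster recovery",
}


def find_gaps(required, vendor_provides):
    """Return the required processes the vendor does not cover, sorted by name."""
    return sorted(set(required) - set(vendor_provides))


# Hypothetical coverage for an imaginary cloud vendor (an assumption, not real data).
vendor_provides = {"backup", "platform updates"}

for process in find_gaps(REQUIRED_PROCESSES, vendor_provides):
    print(f"Not covered by vendor: {process}")
```

Even a list this crude forces the question of who owns each remaining process: in-house staff, an operations services provider, or nobody.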

The resulting deployment pyramid is a hybrid of the old-school and Web 2.0 models, in which the middle layer - focused on operations services - has been restored but modified to complement the cloud delivery model and deliver the capability and maturity model (CMM) that SaaS customers have come to expect from their providers:

[Figure: FutureDeploymentOrgs. The hybrid deployment pyramid, with the operations services layer restored.]

If it sounds like I'm leaning heavily on outsourced services, it is because in general I've found that rapidly growing entrepreneurial companies don't have the resources to solve these problems on their own, and the recent Amazon meltdown underscores that observation. While it's nice to be able to solve every problem yourself, it's clearly better to get them solved sooner than you would by waiting until you've grown enough (or had enough bad experiences) to do so yourself. If your organization can leverage the skills, processes, and staff of a mature IT organization, on-demand, why develop it yourself, especially if the costs are lower? Outsourced operations services are considerably more expensive than not having any operations services in your deployment model - but quite a bit less expensive than going it alone or suffering a repeating series of downtime events.

These observations are confirmed by the reduction in downtime we have seen ENKI's operations services customers experience, typically adding another half-nine or more of reliability to their deployments.

Apr 26
2011

Did the Failure at Amazon Reveal a Problem with Cloud Computing?

Posted by: Eric Novikoff

Tagged in: Business Strategy

It's been a few days now since Amazon's Virginia datacenter failed and took down hundreds of internet companies, including several high-visibility startup companies and platform-as-a-service providers. There has been a lot of talk about how this failure, which "was never supposed to happen", is a blow to cloud computing. However, I think it's more of a blow to the freewheeling marketing practices for cloud computing, and to those consumers of cloud who thought that it eliminated their responsibility to worry about infrastructure reliability. These words are a bit harsh, I know, but they reflect the reality that many of Amazon's customers have been remiss in managing their IT responsibilities. The silver lining to this event is that I think it will bring about changes in how cloud is deployed and, more importantly, used, that will continue to drive its successful adoption.

To review what we know about the event, Amazon's storage subsystem encountered a failure (which I surmise was due to a code update) that caused it to replicate customers' data, causing storage slowdowns, until the storage system was full, which then caused complete failure. Somehow this failure crossed multiple "availability zones" which were not supposed to all fail at the same time. With no storage accessible to them, applications failed. Even as late as today, some companies were saying that they were still missing data.

How could this happen?  Well from an infrastructure point of view, the problem is clearly due to monoculture.  Monoculture is a term used in agriculture to describe planting your entire farm with the same crop.  If an insect that likes that crop attacks your farm, you lose your whole crop.   Similarly, Amazon, in order to get good economy of scale and minimize manual labor, has filled their datacenter with the same hardware, hooked up the same way, loaded with the same software, replicated over and over.   Introduce a software bug into a release of their infrastructure management systems, and the automated distribution of the release will spread to the entire datacenter - or wider, depending on maintenance policies. 

However, this does not explain why so many of Amazon's customers were taken by surprise by the failure.  From their expressions of surprise, it's pretty clear that many assumed that Amazon's assurances that its infrastructure was reliable meant that they wouldn't have any problems.  However, from the first day that Amazon offered their service, they were quite clear that while their service was likely to remain up almost all the time, individual servers or even availability zones were not guaranteed to be that reliable, and that Amazon expected its customers to engineer around this limitation.  Amazon's weak uptime guarantee of three and a half nines backed by a small percentage discount that users could apply for at the end of the year communicates even more clearly that they had no plans in place to keep individual servers running at high availability, nor did they plan to shoulder heavy financial responsibility for long outages.   Even the introduction of automated failover features did little to improve this, since their extremely long provisioning times and questionable availability of resources at peak times meant that customers could experience noticeable failures.
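For a sense of what an uptime guarantee of this kind permits, here is a small Python sketch that converts an availability figure into a yearly downtime budget. It assumes "three and a half nines" is read in the conventional way as 99.95%, and it ignores leap years; the function name is illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability, period_hours=HOURS_PER_YEAR):
    """Return the downtime budget implied by an availability fraction (0..1)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_hours


# Assumed readings of the common "nines" shorthand.
levels = [
    ("three nines", 0.999),
    ("three and a half nines", 0.9995),
    ("four nines", 0.9999),
]

for label, availability in levels:
    hours = allowed_downtime_hours(availability)
    print(f"{label} ({availability:.2%}): {hours:.2f} hours of downtime per year")
```

Under that reading, 99.95% allows roughly 4.4 hours of downtime a year while still meeting the guarantee, which is a useful number to hold up against what your own application can tolerate.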

Of course, nobody expected multiple availability zones to fail at once, so even if their applications had failover to another zone in the datacenter, they expected to enjoy high uptime.   But they should have expected a full-datacenter failure: there are many modes of systems failure that take out an entire state-of-the-art datacenter, including accidental destruction of fiber lines to the datacenter, systemic power failures such as the one a few years ago in the generators at 365 Main in San Francisco that took out a raft of popular startups, or a more recent Amazon full-datacenter failure due to a car crashing into a power line.  I think the message here is that modern data centers are reliable, but not infinitely reliable.  And similarly, while Amazon's services simplify the access to virtual infrastructure, that ease of access masks the fact that it is built out of physical infrastructure that has definite vulnerabilities which have not been automated away as completely as the hassle of setting up your own physical server.
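The gap between "failover to another zone" and "a full-datacenter failure" can be put into rough numbers. The Python sketch below is a simplified model, not a description of Amazon's architecture: it assumes zones fail independently of each other except for a single common-mode event (a bad software release, a cut fiber line, a power failure) that takes out every zone at once. All probabilities are hypothetical.

```python
def combined_availability(zone_availability, zones, common_mode_failure=0.0):
    """Availability of an application that stays up if at least one zone is up.

    zone_availability: probability that a single zone is up, considered on its own
    zones: number of zones the application can fail over between
    common_mode_failure: probability of an event that takes down all zones together
    """
    all_zones_down_independently = (1.0 - zone_availability) ** zones
    # The application is up only if no common-mode event occurs
    # AND at least one zone survives its own independent failures.
    return (1.0 - common_mode_failure) * (1.0 - all_zones_down_independently)


# Hypothetical figures chosen only to illustrate the effect.
independent_only = combined_availability(0.999, zones=2)
with_common_mode = combined_availability(0.999, zones=2, common_mode_failure=0.0005)

print(f"Two zones, independent failures only: {independent_only:.6f}")
print(f"Two zones, with a shared failure mode: {with_common_mode:.6f}")
```

In this toy model the second zone buys almost nothing once a shared failure mode exists, because the common-mode term dominates the result. That is the argument for spreading deployments across separate datacenters or geographies rather than across zones that share hardware, software, and release processes.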

Puzzlingly, it seems that this simple understanding did not reach the management teams at the companies that were suffering the most from last week's outage. This could be because the marketing of cloud seems to follow an unwritten rule that the drawbacks and vulnerabilities will not be discussed by vendors, who prefer to emphasize the low prices of cloud computing - which paradoxically are enabled by ignoring vulnerabilities that experienced IT shops would not let pass. When pressed, vendors often say that they won't release proprietary information for security purposes or to protect their intellectual property. However, experienced IT folk will know that varying degrees of these limitations are a basic feature of any infrastructure deployment, and must be engineered around at the deployment and application software layers. So the inescapable conclusion is that the management teams at many of Amazon's customers either chose to ignore the problem or felt that the costs of mitigating it were not worth the potentially small improvement in reliability. In other words, there was a gap between what Amazon could reasonably provide in the way of reliability, and what these companies needed, that was not filled by them, either intentionally or unintentionally. In my next blog, I will explore how this has come to be, and how this problem will be solved by a combination of changes in both cloud customers and cloud vendors.

Apr 26
2011

PC World Learns SLAs Matter When Buying Cloud

Posted by: Eric Novikoff

Tagged in: Cloud Usage

PC World wrote about what your business can learn from the Amazon Cloud Outage, noting that you should examine the SLAs you get from your cloud provider as an indicator of the level of reliability of their cloud product, as well as looking at diversification and simply deciding what is mission critical in your company.

"And while you're negotiating those deals with one or more cloud providers, take a minute to examine your service level agreements (SLAs) with any provider. SLAs should set out how your providers are rewarded when things go right, and how you're compensated when things go wrong."

"Especially if you're working with a local service provider which is working with an Amazon, a Google, or another major public cloud infrastructure vendor, make sure those SLAs spell out who is responsible for what should things go awry. It's worth the extra time and effort early in the relationship to make sure those SLAs are clear, comprehensive and iron-clad."

"If something goes wrong, you don't want your business to languish offline while your vendors pass the buck for responsibility for the outage. This is the very definition of when you want one throat to choke, and you want to make sure it's clear to whom that throat belongs."

I'd like to offer a few summary thoughts that expand on the article based on our experience here at ENKI:

- For truly mission critical applications, going onsite is an expensive and unnecessary step... a return to a past most of us would rather put behind us. Instead, diversify your cloud deployments across different geographies.

- True fault-tolerant diversification requires at a minimum that your application be set up to maintain data currency across multiple deployments. You will want to look at the way databases work, files are stored, and how you will compensate for delays in replication whether you choose an active/active or active/standby DR solution.

- As the article points out, SLAs matter, but are you getting the right SLAs? The companies who suffered from this outage didn't get any disaster recovery/business continuance SLAs from Amazon and that should have been a red flag for them. To solve this problem, you will need to develop in-house IT expertise on DR/BC and dedicate resources to it, or choose a different cloud vendor or operations services provider that can do this for you.

- DR/BC is expensive because it requires some thought (i.e., labor) and duplicated hardware. This means it isn't a no-brainer and you'll want to assess exactly how much protection each of your applications needs and what your budget is to provide it.
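
The per-application assessment in the last point can start as a back-of-the-envelope calculation. The Python sketch below compares the downtime loss that DR is expected to avoid against what the DR setup costs per year. Every figure is invented for illustration, and the `App` fields and `dr_net_benefit` function are hypothetical names, not part of any ENKI tool.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    loss_per_outage_hour: float    # assumed business cost of one hour of downtime
    expected_outage_hours: float   # assumed downtime per year without DR
    dr_cost_per_year: float        # assumed yearly DR cost (labor plus duplicated hardware)
    dr_outage_reduction: float     # assumed fraction of outage hours DR avoids (0..1)


def dr_net_benefit(app):
    """Yearly loss avoided by DR minus the yearly cost of providing it."""
    avoided_loss = (
        app.loss_per_outage_hour * app.expected_outage_hours * app.dr_outage_reduction
    )
    return avoided_loss - app.dr_cost_per_year


# Hypothetical applications with made-up numbers.
apps = [
    App("billing", 5000.0, 8.0, 20000.0, 0.9),
    App("internal wiki", 50.0, 8.0, 20000.0, 0.9),
]

for app in apps:
    benefit = dr_net_benefit(app)
    verdict = "worth protecting" if benefit > 0 else "hard to justify"
    print(f"{app.name}: net benefit {benefit:,.0f} per year ({verdict})")
```

The inputs are guesses, but writing them down per application makes it obvious which deployments deserve an active/active or active/standby setup and which can live with a slower restore.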


Cloud is, unfortunately, not yet a panacea that offers enterprise-class reliability for free. Instead, it is a way to reduce the cost and headaches of managing your own hardware. You still need experienced IT staff to manage your deployment - whether they are in-house or outsourced. ENKI's outsourced operations services are designed to help companies match their cloud deployment to their business needs and then manage that deployment to deliver the SLAs that are required.




Copyright © 2011 ENKI LLC