Apr 27, 2011
In my last blog article, I wrote about what caused so many of Amazon's EC2 customers to go down last week. Now I'd like to share the observations and lessons I have drawn from this event, informed by running our own cloud computing service and practice.
I have written extensively in the past about how cloud services have not progressed to the point where companies can do without IT operations, despite the apparent ease of starting an instance in the cloud. Unfortunately, this observation is at the root of the problems that Amazon's customers experienced last week.
Let's look at how cloud computing has turned traditional SaaS provisioning upside-down in the last few years. Back when I worked at NetSuite, and at other companies before that, once an application was completed, a deployment organization was built to provide it reliably and capably to customers. While the pyramid on the left below is a bit exaggerated, a mature organization providing a reliable SaaS offering would grow its customer base over time by expanding the in-house deployment (IT) group that managed its systems, and by buying and maintaining more and more hardware in expensive purpose-built datacenters. This provided good to excellent reliability because the company staffed its IT group with experienced IT experts, who were expensive and hard to find (especially during the dot-com boom).

However, today a software startup or even a growing SaaS company can "forget" about IT, since it can apparently be managed at the click of a mouse. The eye-hand-mind cycle of selecting a server instance, clicking to get it started, and being rewarded with an apparently perfectly functioning system leads the user to think that everything is perfect and that all eventualities have been handled, especially if that hand belongs to a developer or other staff member not experienced in production IT concerns. As a result, the company's management allows resources to go into developing the application instead of the IT infrastructure. So far, so good.
The problems with this approach appear when management thinks that Cloud infrastructure is its IT strategy and eliminates the "Deployment Strategy" layer of the traditional IT stack. After all, why keep all those people around when you can just tell Amazon what to do on the web? This has resulted in the deployment organization you see on the right, which looks really good to the developer-founders and their investors, since more resources go to developing the app and everyone saves money and keeps equity compared to doing things the traditional way.
Of course, cloud vendors have not been remiss in taking up this value proposition and selling it back to entrepreneurs, who in turn think that it is an IT strategy, when in fact it is merely a cloud sales strategy. Cloud computing has literally turned IT "upside down", but as we saw last week, many of Amazon's customers didn't realize what the consequences could be. Without anyone in their organizations thinking about application deployment management, they became vulnerable to failures, performance problems, and downtime due to release management issues.
The solution is simple - isn't it? Just go back to the way things were (left pyramid) but replace the infrastructure with cloud. In fact, this is what the more successful users of cloud - whom of course you don't hear about, since they haven't experienced disastrous failures - have done. However, in talking to prospects and customers, I have found that this hybrid approach doesn't get them the savings they expected from cloud computing. They may save 10%-30% on infrastructure, and they definitely reduce their time to market, but they still carry a heavy burden of IT operations, resulting in net savings in the 10%-20% range that leave them disappointed with cloud economics.
Instead, what is needed here is a more 21st-century approach to building an IT strategy for your SaaS company, similar to the hardware-eliminating infrastructure cloud revolution. This comes down to two approaches, which can be combined: outsourcing and automation.
For the smaller company or entrepreneur who still needs access to the level of IT competence that will ensure successful deployments, outsourcing is a godsend. There is an increasing number of break-fix operations services companies that will manage your applications in the cloud for you, though fewer of them will actually advise you and fully manage the deployment and ongoing maintenance. This is why we introduced our PrimaCare service to complement our own cloud services; it is essentially an insurance policy on your deployment to keep it from failing and make it successful. By outsourcing IT on a per-deployment/per-VM basis, IT costs can be controlled through the same on-demand and pay-per-use methods that made cloud computing successful.
The other tool available to the SaaS provider is automation. Amazon's CloudFormation is great, but it doesn't automate the business processes associated with managing the cloud. While commercial solutions like RightScale or open-source tools like Puppet and Chef can help, there is as yet no fully automated cloud management workflow engine, though the up-and-coming Kaavo IMOD system is very promising. However, the first order of business is to determine what processes you use for code updates, platform updates, security management, backup, disaster recovery, etc. While some of these processes look like something that might come from your cloud vendor, unless you determine what you need and then assess what the vendor provides, you will never know which holes need to be plugged. An operations services vendor like ENKI can assist you with professional services to guide you through process discovery for your critical processes and the best way to automate them.
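The "determine what you need, then assess what the vendor provides" step can be sketched as a simple gap check. This is only an illustration in Python: the required processes are the ones listed above, while the `vendor_provides` list and the `find_gaps` helper are hypothetical and do not describe what any real cloud vendor covers.

```python
# Processes the SaaS provider needs covered (taken from the list in the text).
REQUIRED_PROCESSES = [
    "code updates",
    "platform updates",
    "security management",
    "backup",
    "disaster recovery",
]


def find_gaps(required, vendor_provides):
    """Return the required processes the vendor does not cover, in order."""
    covered = set(vendor_provides)
    return [process for process in required if process not in covered]


# Hypothetical vendor coverage, for illustration only.
vendor_provides = ["backup", "platform updates"]

gaps = find_gaps(REQUIRED_PROCESSES, vendor_provides)
for process in gaps:
    print("Not covered by vendor:", process)
```

Whatever ends up in `gaps` is the set of holes that either your own staff, an automation tool, or an operations services vendor has to plug.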
The resulting deployment pyramid is a hybrid of the old-school and Web 2.0 models, in which the middle layer - focused on operations services - has been restored but modified to complement the cloud delivery model and deliver the Capability Maturity Model (CMM) level that SaaS customers have come to expect from their providers:

If it sounds like I'm leaning heavily on outsourced services, it is because in general I've found that rapidly growing entrepreneurial companies don't have the resources to solve these problems on their own, and the recent Amazon meltdown underscores that observation. While it's nice to be able to solve every problem yourself, it's clearly better to get them solved sooner than you would by waiting until you've grown enough (or had enough bad experiences) to do so yourself. If your organization can leverage the skills, processes, and staff of a mature IT organization on demand, why develop them yourself, especially if the costs are lower? Outsourced operations services are considerably more expensive than not having any operations services in your deployment model - but quite a bit less expensive than going it alone or suffering a repeating series of downtime events.
These observations are confirmed by the reduction in downtime we have seen ENKI's operations services customers experience, typically adding another half-nine or more of reliability to their deployments.
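To put "half a nine" in concrete terms, here is a minimal Python sketch. It assumes that nines are counted on a logarithmic scale (number of nines = -log10 of the unavailable fraction), so adding half a nine divides downtime by roughly 3.16. Some people instead use "half-nine" to mean a trailing 5 (for example 99.95%), which would only halve downtime. The 99.9% starting point is purely illustrative and is not a figure from ENKI's customers.

```python
import math

HOURS_PER_YEAR = 365 * 24  # ignoring leap years


def nines(availability):
    """Number of nines for an availability fraction, e.g. 0.999 -> ~3.0."""
    return -math.log10(1.0 - availability)


def availability_from_nines(n):
    """Inverse of nines(): 3.5 nines -> ~0.99968."""
    return 1.0 - 10.0 ** (-n)


def annual_downtime_hours(availability):
    """Expected hours of downtime per year at a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


before = 0.999  # illustrative "three nines" deployment (assumption)
after = availability_from_nines(nines(before) + 0.5)  # add half a nine

print(f"before: {before:.4%} available, "
      f"{annual_downtime_hours(before):.2f} h/year down")
print(f"after:  {after:.4%} available, "
      f"{annual_downtime_hours(after):.2f} h/year down")
```

Under these assumptions, a three-nines deployment would go from roughly 8.8 hours of downtime per year to roughly 2.8 hours.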









