Opsani: What's Next? Continuous Optimization
You may not have heard of Opsani, but their launch of Optune at KubeCon this week is a big deal, and if you build software that runs in the cloud, you should be paying attention.
At Zetta, we invest in early stage, AI-first startups with B2B business models. We spend a lot of time thinking about the potential (and risks) associated with AI. While much attention goes into the debate about whether AI will destroy jobs or merely enhance human productivity, we get more excited about applying AI to tackle problems that were too hard for humans in the first place. Think of problems like personalized medicine, climate change, urban transformation… and perhaps more prosaically, but still profound: application performance tuning.
Performance tuning is a problem that is too complex for human beings to do well. The cloud and mobile applications that delight us, consume us, and support our every working hour, applications with millions or even billions of users, chronically run slower and cost more than their workloads require. Why?
Partially, it’s because no human has all the right expertise. Choosing the right infrastructure configuration is at the intersection of two kinds of knowledge.
Infrastructure knowledge alone could take a lifetime to master: compute, memory, cache, storage, network (bandwidth and latency), thread management, job placement, database configuration, application runtime, Java garbage collection, and more. Even DevOps experts who’ve made a career out of managing infrastructure won’t have expertise across every layer of the application, data, and cloud infrastructure stack.
The other kind of knowledge is of the application workload itself, because there is no “one true infrastructure” that performs best for all applications. The developer who wrote the code has the best chance of profiling its performance, debugging it, and eliminating its bottlenecks. But that software engineer is very rarely also an expert in infrastructure; they’ll usually optimize their code in the context of a fixed infrastructure dictated by DevOps. Top tech companies have performance engineers who bridge both kinds of knowledge, but there are never enough of them to go around, and as generalists, their knowledge is broad but usually not deep.
Risk aversion is another obstacle. Most tech companies routinely (and correctly!) prioritize reliability over performance and cost. By over-spending and over-provisioning, we buy peace of mind. But the truth is that well-optimized infrastructure personalized to the workload can deliver the same or better reliability, at higher performance, for less cost, if only it were possible to discover the formula.
A final issue is the pace of change. Modern application workloads change constantly: daily or weekly releases of new code, constant user growth, seasonal or unpredictable shifts in behavior patterns, and new infrastructure options from our favorite cloud providers. If you spent the weeks or months necessary to precision-tune your infrastructure, the result would be obsolete as soon as you finished.
Opsani offers the solution: continuous optimization with Optune. Optune is an engine based on deep reinforcement learning techniques that continuously examines millions of configuration combinations to identify the optimal combination of resources and parameter settings. The result is infrastructure tuned precisely to the workload and goals of the application, whether that’s cost, performance, or the sweet spot in between, defined the way you think about your business metrics.
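To make the idea of searching a configuration space concrete, here is a minimal sketch. It is not Optune's algorithm: it swaps in plain random search for deep reinforcement learning, and the setting names, the toy measure() model, and the throughput-per-dollar goal are all invented for illustration.

```python
import itertools
import random

# Hypothetical tunable settings; names and values are illustrative only.
SEARCH_SPACE = {
    "cpu_cores": [1, 2, 4, 8],
    "memory_gb": [2, 4, 8, 16],
    "gc_threads": [1, 2, 4],
}


def measure(config):
    """Stand-in for a real measurement: (requests_per_sec, dollars_per_hour).

    A real system would apply the config and observe live metrics.
    This toy model is made up so the example runs on its own.
    """
    throughput = 100 * min(
        config["cpu_cores"], config["memory_gb"] / 2, 2 * config["gc_threads"]
    )
    cost = 0.04 * config["cpu_cores"] + 0.005 * config["memory_gb"]
    return throughput, cost


def score(config):
    """The goal being optimized here: throughput per dollar (an assumption)."""
    throughput, cost = measure(config)
    return throughput / cost


def random_search(space, trials, seed=0):
    """Sample `trials` random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(trials):
        candidate = {name: rng.choice(values) for name, values in space.items()}
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best_config, best_score = candidate, candidate_score
    return best_config, best_score


def exhaustive_best(space):
    """Try every combination; only feasible for tiny spaces like this one."""
    names = list(space)
    configs = (
        dict(zip(names, combo)) for combo in itertools.product(*space.values())
    )
    return max(configs, key=score)
```

This toy space has only 48 combinations, so exhaustive_best() can check them all. With millions of combinations and a workload that keeps changing, trying everything stops being practical, which is the gap a learned, continuously running search is meant to fill.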
Importantly, Optune isn’t one and done. If you already believe in Continuous Integration and Continuous Delivery, the natural next step is Continuous Optimization: CI/CD/CO. Optune lives with your application, continuously monitoring workload, performance, and infrastructure options, tweaking and tuning for the maximum price/performance.
Why is Zetta so excited about optimization? In part, because if we as tech companies can spend more efficiently, we can collectively achieve more and serve our customers and users better. But we actually believe performance boosts will matter far more than spending efficiency. Small differences in performance make immense differences in how easy or hard an application is to use. Faster applications are more delightful, more convenient, and more engaging. The customers and users we serve find more value in applications that respond instantly and intuitively. Performance is a tide that lifts all business metrics and propels a company toward its core mission.
As an industry, we’ve been handicapped in achieving the best performance for our customers because performance tuning is a problem too hard for human beings. It’s time to give the machines a chance to solve the problem for us!