Chris Buckley

Chris is never satisfied with simply achieving success; he is constantly looking for the next edge and is always focused on business outcomes in a commercial context, having solved perplexing problems within the most complex global environments for nearly two decades.

Since joining us at Virtual Clarity, he has developed distributed systems platform strategy and served as lead architect for a global enterprise private cloud.

What you might not expect is that he combined being a part-time racing driver with being the founding co-chair of the Open Data Center Alliance’s infrastructure workgroup (he still races, by the way). Chris is guided by a passion for identifying the right problems to solve from a business perspective, so feel free to challenge him.


Service Management in a containerised environment: what is the impact?

As a business, we are often asked whether the use of containers by enterprises drives a need for change in the approach to service management.

In this series of blogs, we seek to answer this by considering:

  1. The nature of containers
  2. The technology needed to run containerised applications
  3. The characteristics of the types of applications which are a good fit to run in containers
  4. The impact this all has on service management

In the first blog, we addressed the nature of containers and the technology needed to run containerised applications. Having a container platform is all very well, but what sort of applications are likely to run on it?

Whilst containers as a concept have been around for many years, the current surge of adoption has largely been driven by the existence of Docker (software for containerising applications and then running those containers, first released in 2013) and, more recently, Kubernetes (a way of managing containers at scale).

Docker originated as a way of packaging and deploying micro-service-based applications, particularly those that aligned to the “12 factor” principles. One of the key relevant principles is “VI. Processes - Execute the app as one or more stateless processes” – here each container can be considered ephemeral with all persisted state being held in some other backing store. Applying the Cattle vs Pets analogy, 12 factor containerised applications tend to give us lots of cattle. If one container has a failure, you just fire up another. If it has a bug, you fix at source, release a new container to the library and replace all your existing instances with that one. More recently, Docker has begun to address use cases where people want state persistence inside each container, or to fix issues in place (i.e. treat as pets), but its roots are firmly in the world of cattle farming.
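
To make the stateless principle concrete, here is a minimal sketch of the kind of process that suits this model: configuration comes from the environment and the only persisted state lives in an external backing store, so any individual container is disposable. This is our own illustration rather than anything from the 12 factor documentation, and the choice of Redis and the names used are assumptions.

    # A 12-factor-style stateless process (illustrative sketch only): config is
    # read from the environment and all persisted state lives in an external
    # backing store, so any single container instance is disposable "cattle".
    import os

    import redis  # third-party redis-py client, assumed to be available

    # Config from the environment, not baked into the container image
    store = redis.Redis(
        host=os.environ.get("STATE_STORE_HOST", "localhost"),
        port=int(os.environ.get("STATE_STORE_PORT", "6379")),
    )

    def record_visit(user_id: str) -> int:
        """Record a visit and return the running total.

        The container itself holds nothing worth preserving, so a failed or
        buggy instance can simply be replaced by a fresh one.
        """
        return int(store.incr(f"visits:{user_id}"))

    if __name__ == "__main__":
        print(record_visit("demo-user"))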

Kubernetes was created by Google to enable the management of containers at scale. In the world of cloud-scale services and containerised, micro-service-architected software, where you might once have had tens of thousands of servers you now have hundreds of thousands of containers, removing any hope of managing these manually. Kubernetes provides the container platform orchestration needed to make operation at this scale possible – and hence makes it easier to build and operate applications in this way.
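
As a hedged illustration of what managing containers at this scale looks like in practice, the sketch below uses the Kubernetes Python client to declare a desired replica count for a hypothetical deployment; the platform, rather than a human operator, then reconciles the running containers towards that state. The deployment name and namespace are assumptions.

    # Declaring desired state to Kubernetes rather than managing individual
    # containers by hand. Requires the official `kubernetes` Python client and
    # a valid kubeconfig; the deployment name and namespace are hypothetical.
    from kubernetes import client, config

    config.load_kube_config()  # use load_incluster_config() when running inside the cluster
    apps = client.AppsV1Api()

    # Ask for five replicas of the "web-frontend" deployment; Kubernetes
    # schedules, restarts and replaces the underlying containers to keep the
    # running state in line with this declaration.
    apps.patch_namespaced_deployment_scale(
        name="web-frontend",
        namespace="default",
        body={"spec": {"replicas": 5}},
    )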

So, in summary:

  • Container platforms are complex, though the building and running of them may be someone else’s problem (e.g. in Containers as a Service environments)
  • But they also enable a style of application architecture (micro-services at internet scale) that hitherto has been prohibitively difficult to manage for all except the most technically adept organisations. This is certainly not the only architecture possible, but for many organisations it is likely a new one
  • None of this eliminates the need for all the old favourites, such as patch management, capacity planning, etc. but it may well change how, when and by whom these need to be undertaken

But how does this all impact service management?

For argument’s sake, we’ll start from the premise that the desired operating model includes a separation of responsibilities, with some infrastructure-oriented function owning and operating the container platform itself and, in general, individual developers or development teams being responsible for the construction and maintenance of the containers that will run upon it. Also, in order to enable agility and dynamic scaling, etc., platform services will be exposed via self-service APIs.
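
As a sketch of what self-service via API might look like under this premise – using Kubernetes namespaces as the unit of isolation purely for illustration, with hypothetical names and labels – a development team could request its own slice of a shared platform programmatically rather than by raising a ticket:

    # A development team requesting an isolated slice of a shared container
    # platform through an API rather than a ticket. The namespace model, names
    # and labels here are assumptions for illustration only.
    from kubernetes import client, config

    config.load_kube_config()
    core = client.CoreV1Api()

    core.create_namespace(
        client.V1Namespace(
            metadata=client.V1ObjectMeta(
                name="team-payments-dev",           # hypothetical team namespace
                labels={"owner": "team-payments"},  # hypothetical ownership label
            )
        )
    )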

As an aside, however, note that just having a container platform, and using containers, doesn’t create any fundamental change to how you approach operations – that must be planned and implemented as an addition to the new technology. Formalising your RACI here is key.

Theoretically, at least, this separation provides a clean boundary between the concerns of infrastructure and development teams. The job of the infrastructure team is to ensure that containers can be initiated on an appropriate compute host (with access to appropriate storage, networking, etc.), while that of the development team is to ensure that, once a container is there, it provides the intended application functionality. Old problems such as the provision of required application software libraries are now clearly the responsibility of development, whilst the underlying platform environment can be maintained largely independently of the applications that run on it.

Now, in reality, containers and container platforms don’t provide quite such a clean interface (Netflix discuss some of the problems they have seen here; note their recommendation that if you are running a container platform yourself, having kernel expertise available is critical to managing some forms of performance issue).

That aside, based on our operating premise above and the types of architecture that containers are particularly well suited to, there are some consequences:

  • The “development deliverable” should now be a validated and complete software container rather than individual artefacts that need further installation on an OS
  • The development tool chain will need to provide some, ideally standardised and repeatable, way of integrating software and configuration artefacts into container templates and then testing them as a unit. There is clearly an opportunity here for a centrally provided container “bakery” service that constructs containers which also meet desired security controls, etc. (see the bakery sketch after this list)
  • Unless every development team has its own container platform (which would seem inefficient at best for most applications), even in DevOps-oriented environments individual development teams should not have any direct control over the configuration of the container platform itself. That is not to say that configuration can’t be changed to meet needs, rather that in a multi-tenant platform there needs to be an expectation that the requests of all teams must be considered holistically.
  • Monitoring and management systems need to become container aware, particularly with respect to the potential for ephemeral containers or auto-scaling systems. A traditional centralised CMDB trying to record every instance is unlikely to be viable here, though we may still need to be aware of what was impacted if a particular server were to go down. Information is best sourced, when needed, from the source of truth for each platform – and rarely is that the CMDB (which is typically downstream of some domain-specific system); see the node-impact sketch after this list. Also, aggregated platform logs are likely at least as useful for determining platform health as those from a single node.
  • KPIs used for assessing platform performance likely need to change. For example, a traditional host uptime percentage KPI makes little sense for a large-scale container hosting system. Perhaps more relevant would be KPIs addressing how long it takes to instantiate a new container instance once a request has been submitted, how often the required resources for those instantiations aren’t available, the availability of interface APIs, etc. (see the KPI sketch after this list)
  • At scale, configuration policy management is key. Container management systems are essentially policy engines. Policy needs to be version-managed as much as application code does.
  • Infrastructure must be designed and configured to be robust to real-time activities, e.g. a process asking for more container instances to be started than there is capacity for.
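
As a sketch of the container “bakery” idea mentioned above – the build path, image names, registry URL and the stand-in security check are all assumptions, and a real bakery would normally live inside CI tooling – a central service might build, check and publish images on behalf of development teams:

    # Sketch of a central container "bakery": build an image from a team's
    # source, apply an organisation-defined check, then publish to a shared
    # registry. Paths, names and the check itself are hypothetical.
    import docker  # official Docker SDK for Python, assumed to be installed

    client = docker.from_env()

    # Build the image from the team's repository checkout (hypothetical path and tag)
    image, build_logs = client.images.build(
        path="./payments-service",
        tag="registry.example.com/payments-service:1.4.2",
    )

    def passes_security_controls(img) -> bool:
        """Stand-in for organisation-wide controls (scanning, approved base
        images, licence checks, etc.) applied before anything is published."""
        return "maintainer" in (img.labels or {})

    if passes_security_controls(image):
        # Push the validated image to the shared registry (hypothetical URL)
        client.images.push("registry.example.com/payments-service", tag="1.4.2")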
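
To illustrate the point about sourcing information from the platform rather than a CMDB, the node-impact sketch below asks the Kubernetes API directly what would be affected if a particular host were lost. The node name is an assumption.

    # Asking the platform itself, rather than a downstream CMDB, what would be
    # impacted if a particular host went down. Requires the official
    # `kubernetes` Python client; the node name is hypothetical.
    from kubernetes import client, config

    config.load_kube_config()
    core = client.CoreV1Api()

    node_name = "worker-node-17"  # hypothetical host under investigation

    pods = core.list_pod_for_all_namespaces(
        field_selector=f"spec.nodeName={node_name}"
    )

    for pod in pods.items:
        print(f"{pod.metadata.namespace}/{pod.metadata.name} would be impacted")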
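
And as a rough KPI sketch, instantiation time can be derived from data the platform already records – here the gap between a pod being created and reporting Ready – rather than from host uptime. The namespace is an assumption, and this is only an approximation of time-to-instantiate.

    # Deriving a "time to instantiate" KPI from timestamps the platform already
    # records (pod creation vs. the Ready condition). The namespace is
    # hypothetical and this is an approximation, not a definitive measure.
    from kubernetes import client, config

    config.load_kube_config()
    core = client.CoreV1Api()

    durations = []
    for pod in core.list_namespaced_pod("payments-dev").items:  # hypothetical namespace
        created = pod.metadata.creation_timestamp
        ready_at = [
            c.last_transition_time
            for c in (pod.status.conditions or [])
            if c.type == "Ready" and c.status == "True"
        ]
        if created and ready_at:
            durations.append((ready_at[0] - created).total_seconds())

    if durations:
        print(f"mean seconds from request to Ready: {sum(durations) / len(durations):.1f}")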

Core to all of these is the need for a well-structured approach to governance and technical standards, to ensure that the container platform doesn’t just become another sprawling source of long-term technical debt and excess cost.

In summary, using containers can have a significant impact on how you approach service management, but that impact is more closely linked to accompanying (and, in our view, necessary) operating model change than it is to the technology per se.

At Virtual Clarity, we’ve successfully implemented thousands of containers in large enterprises. Let us help.
