Entries in docker (3)

Monday
Apr 25, 2016

The Joy of Deploying Apache Storm on Docker Swarm

This is a guest repost from Baqend Tech on deploying and redeploying an Apache Storm cluster on top of Docker Swarm instead of on VMs. It's an interesting topic because Wolfram Wingerath called the experience "a real joy", which is not a phrase you hear often in tech. Curious, I asked what made using containers such a better experience than using VMs. Here's his reply:

Being pretty new to Docker and Docker Swarm, I'm sure there are many good and bad sides I am not aware of yet. From my point of view, however, the thing that makes deployment (and operation in general) on top of Docker way more fun than on VMs or even on bare metal is that Docker abstracts away heterogeneity and many issues. Once you have Docker running, you can start something like a MongoDB or a Redis server with a single-line statement. If you have a Docker Swarm cluster, you can do the same, but Docker takes care of distributing the thing you just started to some server in your cluster. Docker even takes care of downloading the correct image in case you don't have it on your machine right now. You also don't have to fight as much with connectivity issues, because every machine can reach every other machine as long as they are in the same Docker network. As demonstrated in the tutorial, this even goes for distributed setups, as long as you have an _overlay_ network.

When I wrote the lines you were quoting in your email, I had a situation in the back of my head that had occurred a few months back, when I had to set up and operate an Apache Storm cluster with 16+ nodes. There were several issues, such as my inexperience with AWS (coming from OpenStack) and strange connectivity problems relating to Netty (used by Storm) and AWS hostname resolution, that had not occurred in my OpenStack setup and eventually cost us several days and several hundred bucks to fix. I really think that you can shield yourself from problems like that by using Docker, simply because your environment remains the same: Docker.
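
To make the "single-line statement" Wolfram mentions concrete, here is a minimal sketch (mine, not from Wolfram's tutorial) using the docker-py SDK; the image names, service name, and network name are illustrative:

```python
import docker

client = docker.from_env()

# Against a single engine: one call pulls the image if needed and starts Redis.
client.containers.run("redis:latest", detach=True, name="redis")

# Against a Swarm manager: the same idea expressed as a service, and Swarm
# decides which node in the cluster it lands on.
client.services.create("mongo:latest", name="mongo")

# An overlay network lets containers on different Swarm nodes reach each other
# by name, which is what the distributed setup in the tutorial relies on.
client.networks.create("storm-net", driver="overlay")
```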

On to the tutorial...

Click to read more ...

Wednesday
Dec 16, 2015

How Does the Use of Docker Affect Latency?

A great question came up on the mechanical-sympathy list that many others probably have as well: 

I keep hearing about [Docker] as if it is the greatest thing since sliced bread, but I've heard anecdotal evidence that low latency apps take a hit. 

Who better to answer than Gil Tene, Vice President of Technology, CTO, and Co-Founder of Azul Systems? Like Stephen Curry draining a deep transition three, Gil can always be counted on for his insight.

And here's Gil's answer:

Putting aside questions of taste and style, and focusing on the effects on latency (the original question), the analysis from a pure mechanical point of view is pretty simple: Docker uses Linux containers as a means of execution, with no OS virtualization layer for CPU and memory, and with optional (on by default) virtualization layers for I/O.

CPU and Memory

From a latency point of view, Docker's (and any other Linux container's) CPU and memory latency characteristics are pretty much indistinguishable from Linux itself. But the same things that apply to latency behavior in Linux apply to Docker.

If you want clean & consistent low latency, you'll have to do the same things you need to do on non-dockerized and non-containerized Linux for the same levels of consistency. E.g. if you needed to keep the system as a whole under control (no hungry neighbors), you'll have to do that at the host level for Docker as well.

If you needed to isolate sockets or cores and choose which processes end up where, expect to do the same for your docker containers and/or the threads within them.

If you were numactl'ing or doing any sort of directed numa-driven memory allocation, the same will apply.

And some of the stuff you'll need to do may run counter to how some people want to deploy docker, but if you are really interested in consistent low latency, you'll probably need to break out the toolbox and use the various cgroups, tasksets and other cool stuff to assert control over how things are laid out. But if/when you do, you won't be able to tell the difference (in terms of CPU and memory latency behaviors) between a dockerized process and one that isn't.
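
As an illustration of that kind of explicit placement (my sketch, not Gil's; the image name and core/node IDs are hypothetical), the same cpuset and cgroup knobs are exposed directly at container creation, e.g. via the docker-py SDK:

```python
import docker

client = docker.from_env()

# Pin the container to two isolated cores and a single NUMA node, and cap its
# memory: roughly the container-level equivalent of taskset + numactl + a cgroup limit.
client.containers.run(
    "latency-sensitive-app:latest",  # hypothetical image
    detach=True,
    name="pinned-app",
    cpuset_cpus="2,3",   # only schedule on cores 2 and 3
    cpuset_mems="0",     # only allocate memory from NUMA node 0
    mem_limit="4g",      # hard memory cap, enforced via cgroups
)
```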

I/O

Disk I/O

I/O behavior under various configurations is where most of the latency overhead questions (and answers) usually end up. I don't know enough about disk i/o behaviors and options in docker to talk about it much. I'm pretty sure the answer to anything throughput and latency sensitive for storage will be "bypass the virtualization and volumes stuff, and provide direct device access to disks and mount points".
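
For what it's worth, here is a hedged sketch (not from Gil's answer) of what direct device access and plain mount points can look like with the docker-py SDK; the device path and image name are hypothetical:

```python
import docker

client = docker.from_env()

# Bypass volume drivers: pass a raw block device straight through to the
# container and bind-mount a host directory with nothing in between.
client.containers.run(
    "storage-app:latest",  # hypothetical image
    detach=True,
    devices=["/dev/nvme0n1:/dev/nvme0n1:rwm"],                   # host device : container device : permissions
    volumes={"/mnt/fastdisk": {"bind": "/data", "mode": "rw"}},  # plain bind mount of a host path
)
```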

Networking

The networking situation is pretty clear: If you want one of those "land anywhere and NAT/bridge with some auto-generated networking stuff" deployments, you'll probably pay dearly for that behavior in terms of network latency and throughput (compared to bare metal dedicated NICs on normal linux). However, there are options for deploying docker containers (again, may be different from how some people would like to deploy things) that provide either low-overhead or essentially zero-latency-overhead network links for docker. Start with host networking and/or use dedicated IP addresses and NICs, and you'll do much better than the bridged defaults. But you can go to things like Solarflare's NICs (which tend to be common in bare metal low latency environments already), and even do kernel bypass, dedicated spinning-core network stack things that will have a latency behavior no different (on Docker) than if you did the same on bare metal Linux.
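
Host networking in particular is a one-flag change; a minimal sketch (again mine, not Gil's), assuming a docker-py client and a hypothetical image:

```python
import docker

client = docker.from_env()

# network_mode="host" shares the host's network stack with the container, so
# there is no bridge or NAT hop between the application and the NIC.
client.containers.run(
    "latency-sensitive-app:latest",  # hypothetical image
    detach=True,
    network_mode="host",
)
```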


Docker (which is "userland as a unit") is not about packing lots of things into a box. Neither is guest-OS-as-a-unit virtualization. Sure, they can both be used for that (and often are), but the biggest benefit they both give is the ability to ship around a consistent, well-captured configuration, and the ability to develop, test, and deploy that exact same configuration. That in turn translates into being able to easily manage deployment and versioning (including rollbacks), and being able to do cool things like elastic sizing, etc. There are configuration tools (puppet/chef/...) that can be used to achieve similar results on bare metal as well, of course (assuming they truly control everything in your image), but the ability to pack up your working stuff as a bunch of bits that can "just be turned on" is very appealing.

I know people who use virtualization even with a single guest per host (e.g. an AWS r3.8xlarge instance type is probably that right now), and people who use Docker the same way (a single container per host). In both cases, it's about configuration control and how things get deployed, and not at all about packing things into a smaller footprint.

The low latency thing then becomes a "does it hurt?" question. And Docker hurts a lot less than hypervisor or KVM based virtualization does when it comes to low latency, and with the right choices for I/O (dedicated NICs, cores, and devices), it becomes truly invisible.

On HackerNews

Wednesday
Aug 19, 2015

The Microsoft Take on Containers and Docker

This is a guest repost by Mark Russinovich, CTO of Microsoft Azure (and novelist!). We all benefit from a vibrant competitive cloud market and Microsoft is part of that mix. Here's a good container overview along with Microsoft's plan of attack. Do you like their story? Is it interesting? Is it compelling?

You can’t have a discussion on cloud computing lately without talking about containers. Organizations across all business segments, from banks and major financial service firms to e-commerce sites, want to understand what containers are, what they mean for applications in the cloud, and how to best use them for their specific development and IT operations scenarios.

From the basics of what containers are and how they work, to the scenarios they’re being most widely used for today, to emerging trends supporting “containerization”, I thought I’d share my perspectives to better help you understand how to best embrace this important cloud computing development to more seamlessly build, test, deploy and manage your cloud applications.

Containers Overview

In abstract terms, all of computing is based upon running some “function” on a set of “physical” resources, like processor, memory, disk, network, etc., to accomplish a task, whether a simple math calculation, like 1+1, or a complex application spanning multiple machines, like Exchange. Over time, as the physical resources became more and more powerful, often the applications did not utilize even a fraction of the resources provided by the physical machine. Thus “virtual” resources were created to simulate underlying physical hardware, enabling multiple applications to run concurrently – each utilizing fractions of the physical resources of the same physical machine.

We commonly refer to these simulation techniques as virtualization. While many people immediately think virtual machines when they hear virtualization, that is only one implementation of virtualization. Virtual memory, a mechanism implemented by all general purpose operating systems (OSs), gives applications the illusion that a computer’s memory is dedicated to them and can even give an application the experience of having access to much more RAM than the computer has available.

Containers are another type of virtualization, also referred to as OS Virtualization. Today’s containers on Linux create the perception of a fully isolated and independent OS to the application. To the running container, the local disk looks like a pristine copy of the OS files, the memory appears only to hold files and data of a freshly-booted OS, and the only thing running is the OS. To accomplish this, the “host” machine that creates a container does some clever things.

The first technique is namespace isolation. Namespaces include all the resources that an application can interact with, including files, network ports and the list of running processes. Namespace isolation enables the host to give each container a virtualized namespace that includes only the resources that it should see. With this restricted view, a container can’t access files not included in its virtualized namespace regardless of their permissions because it simply can’t see them. Nor can it list or interact with applications that are not part of the container, which fools it into believing that it’s the only application running on the system when there may be dozens or hundreds of others.

For efficiency, many of the OS files, directories and running services are shared between containers and projected into each container's namespace. Only when an application makes changes inside its container, for example by modifying an existing file or creating a new one, does the container get distinct copies from the underlying host OS – but only of those portions changed, using Docker's "copy-on-write" optimization. This sharing is part of what makes deploying multiple containers on a single host extremely efficient.
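
You can see the copy-on-write behavior from the outside: a container's writable layer records only the paths it has actually changed. A hedged sketch (not from the article) using the docker-py SDK:

```python
import docker

client = docker.from_env()

# Start a container that changes exactly one file and then exits.
container = client.containers.run(
    "ubuntu:16.04",
    command=["bash", "-c", "echo hello > /etc/motd"],
    detach=True,
)
container.wait()  # let the command finish

# diff() lists only the paths this container changed relative to its image;
# everything else is still served from the shared, read-only layers.
print(container.diff())

container.remove()
```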

Second, the host controls how much of the host's resources can be used by a container. Governing resources like CPU, RAM and network bandwidth ensures that a container gets the resources it expects and that it doesn't impact the performance of other containers running on the host. For example, a container can be constrained so that it cannot use more than 10% of the CPU. That means that even if the application within it tries, it can't access the other 90%, which the host can assign to other containers or keep for its own use. Linux implements such governance using a technology called "cgroups." Resource governance isn't required in cases where containers placed on the same host are cooperative, allowing for standard OS dynamic resource assignment that adapts to the changing demands of application code.
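
For example, the 10% constraint above maps directly onto the cgroup CFS quota that Docker exposes at container creation; a minimal sketch using the docker-py SDK (the image is just an example):

```python
import docker

client = docker.from_env()

# Of every 100 ms scheduling period, this container's processes may run for at
# most 10 ms in total, i.e. roughly 10% of one CPU; RAM is capped via the
# memory cgroup.
client.containers.run(
    "nginx:latest",
    detach=True,
    cpu_period=100000,  # CFS period in microseconds
    cpu_quota=10000,    # CFS quota in microseconds (10% of the period)
    mem_limit="256m",
)
```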

The combination of instant startup that comes from OS virtualization and reliable execution that comes from namespace isolation and resource governance makes containers ideal for application development and testing. During the development process, developers can quickly iterate. Because its environment and resource usage are consistent across systems, a containerized application that works on a developer's system will work the same way on a different production system. The instant start and small footprint also benefit cloud scenarios, since applications can scale out quickly and many more application instances can fit onto a machine than if they were each in a VM, maximizing resource utilization.

Comparing a similar scenario that uses virtual machines with one that uses containers highlights the efficiency gained by the sharing. In the example shown below, the host machine has three VMs. In order to provide the applications in the VMs complete isolation, they each have their own copies of OS files, libraries and application code, along with a full in-memory instance of an OS. Starting a new VM requires booting another instance of the OS, even if the host or existing VMs already have running instances of the same version, and loading the application libraries into memory. Each application VM pays the cost of the OS boot and the in-memory footprint for its own private copies, which also limits the number of application instances (VMs) that can run on the host.

App Instances on Host

The figure below shows the same scenario with containers. Here, containers simply share the host operating system, including the kernel and libraries, so they don’t need to boot an OS, load libraries or pay a private memory cost for those files. The only incremental space they take is any memory and disk space necessary for the application to run in the container. While the application’s environment feels like a dedicated OS, the application deploys just like it would onto a dedicated host. The containerized application starts in seconds and many more instances of the application can fit onto the machine than in the VM case.

Containers on Host

Docker’s Appeal

Click to read more ...