Docker: Sorry, you're just going to have to learn about it. Today we begin

Docker, meet hype. Hype, meet Docker. Now: Let's have a sit down here and see if we can work through your neuroses. For those of you who don't yet know about Docker, it is a much-hyped Silicon Valley startup productising (what a horrible unword) Linux containers into something that's sort of easy to use. Containers aren't a …

  1. Graham 24

    Differences from virtualisation?

    "Done properly, applications that are jailed cannot impinge upon the resources of other applications that reside in other jails, cannot see their storage and shouldn't even be aware that those other jails exist."

    For a hypervisor, I could say:

    "Done properly, operating systems that are virtualised cannot impinge upon the resources of other operating systems that reside in other virtual machines, cannot see their storage and shouldn't even be aware that those other virtual machines exist."

    In the spirit of enquiry, how does a container differ from a "full fat" virtual machine?

    I'm thinking of this from the point of view of the application - presumably each application "sees" a certain number of CPU cores, a certain amount of RAM, a certain amount of disk space and so on. That sounds a lot like a standard virtual machine to me. There must be a difference though.

    Is it just that the containers share the RAM associated with the "parent" operating system, so there's some efficiency and performance gains, or are there some specific technical differences aside from performance?

    1. DanDanDan

      Re: Differences from virtualisation?

      "Is it just that the containers share the RAM associated with the "parent" operating system, so there's some efficiency and performance gains, or are there some specific technical differences aside from performance?"

      Effectively, that's my understanding of it. There's much less overhead with a container vs a full-fat VM. The difference is likely very substantial, if my dabblings in virtualisation are anything to go by.

      1. Warm Braw

        Re: Differences from virtualisation?

        The big difference is that for every system call in a virtual machine (leaving optimisations aside), the hypervisor has to intercept the mode-change and then attempt to emulate kernel mode for the guest operating system while leaving the processor actually running in user mode (this involves a lot of mode switching as the guest OS will typically be executing privileged instructions at this point).

        In a container, the real (only) OS executes the system call and emulates nothing - a big performance win there alone (never mind the memory for multiple copies of the operating system, page tables, competing attempts to manipulate the translation lookaside buffer, etc, etc).

        The downside, of course, is that you can't run, say, Windows + Linux on the same machine - but that's not really an issue except in development environments.
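
        Roughly speaking, a container is just namespaces plus a chroot. You can sketch one from a shell (untested; /srv/rootfs stands in for some extracted root filesystem):

          # start a shell in fresh PID/network/mount/UTS namespaces, chrooted
          # into its own filesystem tree - every syscall it makes goes straight
          # to the host kernel, with nothing intercepted or emulated
          sudo unshare --pid --net --mount --uts --fork chroot /srv/rootfs /bin/sh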

      2. etmsreec

        Re: Differences from virtualisation?

        That was certainly the explanation I heard from Red Hat this week - the CPU overhead for docker is less than 10% (typically about 5%, I think the figure was) and the memory overhead is much less too.

        Plus, if there are multiple docker instances running from the same image, they share a single on-disk copy of the libraries (via the common image layers) rather than each storing its own. If it's a different image on each instance, the instances load their own.
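
        For example (image name made up):

          # both containers run from the same read-only image layers, so the
          # shared libraries exist once on disk (and once in the page cache)
          docker run -d --name app1 myimage
          docker run -d --name app2 myimage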

    2. Trevor_Pott Gold badge

      Re: Differences from virtualisation?

      There are three more parts to the series I penned. One of those parts is "containers versus hypervisors." It got rather long for a single article, sorry mate. It'll be in the follow-up pieces!

    3. Tom 38

      Re: Differences from virtualisation?

      Historically, the difference has been that it's easier to manage the resources a VM uses, since everything is virtual. With containers, nothing is virtual; there is simply a management layer that prevents you from doing certain things.

      When you write a file in a VM, the write goes to the virtual disk device, and the hypervisor eventually translates that into a write(2) call on the host. When you write a file in a container, there is no intermediate step: the process in the container directly invokes write(2). The same applies to every syscall.

      This directness is what makes containers so much more efficient than VMs, but it is also what prevented them from being used as much - with a VM you can closely constrain how much resource each VM uses, whereas with early containers a single container could easily use up all the resources on the box.

      More modern container technologies like libcontainer and FreeBSD jails virtualise access to certain resources in order to control how much of each resource a container can use. This gives vastly less control than a VM would: e.g. in Docker and jails you can control the cpuset and CPU shares that the processes in the container see, and Docker can additionally control how much memory the container can use, but it cannot do things like ballooning or overallocation (iirc). Interestingly, Docker with LXC gives you far more resource controls than Docker with libcontainer.

      Since the resource control is quite basic, the overhead of providing it is much less than on a VM. However, since the resource control is quite limited, you cannot do things like allocating a fair share of IOPS, so a container that uses all of your disk IOPS will still starve all your other containers. Hypervisors like KVM let you specify IOPS and throughput limits on virtual disk devices.
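
      For the curious, the Docker knobs look something like this (values and image name illustrative; flag names as in current Docker):

        # pin to CPUs 0-3, set a relative CPU weight, cap memory at 512MB
        docker run --cpuset-cpus="0-3" --cpu-shares=512 --memory=512m myimage
        # note: no flag to cap disk IOPS, hence the noisy-neighbour problem above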

    4. Nate Amsden

      Re: Differences from virtualisation?

      One major one in my experience is networking. When deploying LXC it seemed impractical to have multiple network segments (and thus routing tables) existing on a host that has LXC. So for example we have our non-production VMs on one VLAN and production on another; both have different default gateways (IPs on the switches). In my testing I was not able to have both co-exist on a single LXC host. I read that it *might* be possible at the time (6 months ago, on Ubuntu 12.04) but highly experimental and complex, so I decided to not even bother, as those two things are not something I am interested in trying to support.
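
      For reference, per-container networking in an LXC 1.x config looks roughly like this (bridge name and addresses made up) - the catch is that each VLAN needs its own bridge and gateway arrangement on the host:

        # /var/lib/lxc/mycontainer/config
        lxc.network.type = veth
        lxc.network.link = br-prod           # host bridge on the production VLAN
        lxc.network.ipv4 = 10.1.0.10/24
        lxc.network.ipv4.gateway = 10.1.0.1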

      Another difference is the fully abstracted isolation in a VM versus a container, which gives you things like VM mobility between storage and servers without downtime - at least I haven't seen that happen with containers of any type.

      We have deployed 6 containers (across 3 LXC hosts) for a very specific production workload (all are load balanced, running the same application stack). I could see us expanding container use for very specific production workloads in the future to better leverage the hardware, though I don't see a point in using it for non-production stuff, as it is not flexible enough (routing tables). Our non-production stuff isn't redundant, since it is non-prod, so a failure would be more of a pain to deal with if it can't be transparently recovered; scheduled downtime (e.g. a kernel update, since there is only 1 kernel across all containers on a given host) would be a pain as well, for the same reason.

      I also do not like that I cannot query CPU/memory usage on a per-container basis using regular linux tools, as they are not container-aware (there may be tools that can get this info; I haven't seen them myself). Same goes for network traffic - if I query SNMP on a linux container for network data I get no data. I also do not like how the LXC "parent" host shows all of the PIDs from all of the containers, which makes it very difficult to sort through things. For us it is manageable, since the workloads are very specific, but for this reason too I wouldn't want to deploy a dozen different random containers on a host that weren't tightly controlled.

      Since deploying containers, aside from the annoyances on the management side, they have been flawless and have served their original purpose: unlocking a massive amount of CPU (24 cores / 48 threads of Xeon E5-2695v2 per host) for our main e-commerce platform in production, which comes with a $15k/server/year license. Average CPU went from ~40% to about 3% (the new systems have much more horsepower than the previous VMs). Peak CPU hasn't gone above about 25%, so these have a good 2 or 3 years of growth in them I bet, which was the plan. Massively undersubscribed most of the time, but that's fine - the cost of the license alone justifies that easily, as does knowing we'll not run out of capacity on that application for a VERY long time under any circumstances.

      Unlike some orgs we are not one of the ones that likes to destroy and re-create resources on a regular basis. These containers will have a life span measured in years, like our VMs, ideally anyway.

      1. Trevor_Pott Gold badge

        Re: Differences from virtualisation?

        Insightful as usual, Mr. Amsden. Thanks!

  2. thondwe

    MainFrame

    So how long a journey have we got to go before we realise what we always wanted was a Main Frame?

    1. I Am Spartacus
      Coat

      Re: MainFrame

      Have an up-vote. When can I get VM/CMS on my pc and relive the glory days in the 80's?

      Mine's the one with JCL for fun and profit in the pocket.

      1. Geoff Stevens

        Re: MainFrame

        I Am Spartacus wrote:

        [...]When can I get VM/CMS on my pc and relive the glory days in the 80's?[...]

        You can do it right now:

        http://gunkies.org/wiki/Installing_VM/370_on_Hercules

      2. Mike Schwab

        Re: MainFrame

        Hercules and VM/370 should run just fine on your Raspberry Pi or better.

    2. Anonymous Coward
      Anonymous Coward

      Re: MainFrame

      I don't think there was ever a question that we all wanted a mainframe. IBM just refused to price it reasonably, so we've had to try to recreate it on open systems.

      1. Michael Wojcik Silver badge

        Re: MainFrame

        IBM just refused to price it reasonably

        Considering how much money IBM has made from the 360-to-z line, I think they priced it very reasonably. Those may not have been prices everyone was willing (or able) to pay, but clearly they were what a large market would accept.

        Could IBM have sold (or leased) mainframe hardware and software for less? Sure. Would that have been closer to the optimal price for maximizing their profits over the long term? Impossible to say. What we can say is that they did pretty well.

        And yes, that left enough of a market to provide the incentives to create commercial and free competitors.

        That's how capitalism works.

        1. Trevor_Pott Gold badge

          Re: MainFrame

          Capitalism doesn't work so much as "devolves". But that's a discussion for another time...

  3. This post has been deleted by its author

    1. Charlie Clark Silver badge

      Re: FreeBSD for servers

      FreeBSD is already in use for a lot of servers. But it's never had the mindshare of the me-too crowd so it's traditionally been less appealing to the outsourcers who are looking to deskill, and thus reduce wages, wherever possible. Maybe the systemd mess will encourage a few more frustrated sysadmins to give it a go.

      It's a similar story with Postgres vs. MySQL.

    2. Doctor Syntax Silver badge

      Re: FreeBSD for servers

      I've had a brief look at the FreeBSD derivative PC-BSD. My initial reaction was that something needs to be done about the Byzantine software management. There are packages and PBIs (push button installers) and a stack of stuff that just sits in the ports tree and isn't visible to the local system's software management. Clearly that stuff needs to be compiled up into PBIs and made accessible via a single management tool that will list the whole of what's available irrespective of whether it's a package or a PBI.
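
      From memory (commands approximate), the same piece of software can arrive three different ways:

        pkg install nginx                               # binary package
        pbi_add -r nginx                                # PBI, fetched from the PBI repo
        cd /usr/ports/www/nginx && make install clean   # ports tree, invisible to the other two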

      As Debian on BSD kernels seems to have been one of the casualties of systemd maybe some of the folk who were working on that will turn their attention to either PC-BSD or an alternative derivative. After all, they should have the aptitude for it.

      1. DanDanDan
        Thumb Up

        Re: FreeBSD for servers

        "After all, they should have the aptitude for it."

        heh, I see what you did there.

    3. Tom 38

      Re: FreeBSD for servers

      I'm a huge FreeBSD advocate, I run it at home, on my desktop, on my PVR, on my firewall, on my laptop, and on 800+ servers at work, where we heavily use FreeBSD jails.

      We're ditching it at work :(

      The main reason is that it's not possible to find sysadmins with good enough FreeBSD experience (well, with "any" FreeBSD experience tbh), and the Linux sysadmins we do hire do not like working with FreeBSD and find it difficult to update and maintain the machines - whenever anything goes wrong, they just shrug and say "oh, it's BSD".

      However, the next biggest reason is that FreeBSD jails don't provide enough resource limitation technologies (see my post above), and so frequently you can have one poorly running application negatively affecting all the others.

      Our new platform is Linux (CentOS) + KVM, deploying a single application to a single VM. Docker (currently) seems to have most of the issues that jails have, but perhaps a combination of the two will be in my future, using docker to deploy multiple applications to a single VM.

  4. Charlie Clark Silver badge

    Nice succinct history of containers but it seems to be missing some of the excellent work that Sun did with its containers.

  5. Anonymous Coward
    Anonymous Coward

    For the non-sysadmins, like your friendly local developers, Docker comes with some major advantages over traditional isolation techs (which, for most people, means virtualisation):

    1) Containers as text files

    Dockerfiles are easy to write, powerful and intuitive for anyone with a background in Vagrant or Chef/Puppet. Of course, being text, they plug straight into your source management suite. (A minimal example is sketched at the end of this post.)

    2) Cached builds

    Containers are built iteratively, with each step cached. Each step is itself a full container, and each can start near-instantly, so iterative changes to environments during development are incredibly fast. No mucking about with snapshots or expensive rebuilds.

    3) Interactive building

    Unsure how you should build the container? Just do it interactively. Then you commit the changes, and hey presto you've got a working container ready to ship and guaranteed to work as it did when you committed it.

    This makes working with Docker, as a developer, an absolute pleasure. It's very rapid and it aligns with the conventions we're all used to. That's why the Silicon Valley hipster types are excited.
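
    By way of illustration, a minimal Dockerfile might look like this (base image and app names made up):

      # Dockerfile - each instruction below becomes one cached layer
      FROM ubuntu:14.04
      RUN apt-get update && apt-get install -y python
      COPY app.py /srv/app.py
      CMD ["python", "/srv/app.py"]

    Build it with "docker build -t myapp ." - change one line and only the layers after it get rebuilt. The interactive route is "docker run -it ubuntu:14.04 /bin/bash", poke around, then "docker commit" the result.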

    1. Skymonrie

      This comment alone has given me the final push I need to give docker a proper try. Why don't more people say this, instead of the "one day, it will even do your laundry" style reviews?

    2. Graham 24

      Define an entire operating environment with a text file?

      Sounds interesting...

      Can you post an example text file?

      1. This post has been deleted by its author

    3. PushF12

      And...

      5) Docker disintermediates distribution people and packaging subsystems.

      Software built for Docker runs on many Linux platforms with a reduced overhead investment in any particular Linux distribution. Many engineering teams have a DEB guy or an RPM guy, which is a payroll slot that can be recovered for primary development.

      Ever had an upstream package maintainer refuse to fix a bug in one of your dependencies because "reasons" or put their hand out when you ask for a distro package to be updated? Many expensively uncooperative middlemen can now be removed from your engineering lifecycle.

      This is why CoreOS is a thing and why Microsoft is eagerly interested in supporting this technology. Docker has the potential to greatly reduce the value and competitive advantage that companies like Canonical and Red Hat provide in the Linux ecosystem.

      1. Anonymous Coward
        Anonymous Coward

        Or as Linus Torvalds put it (roughly) during the last DebConf: Docker has the potential to reduce fragmentation in the Linux corner, because if you get most applications (especially the more complex ones) to ship in Docker containers, the choice of distribution is really just that - a choice, not a dependency.

      2. Havin_it
        Thumb Up

        @PushF12

        I haven't even read all of your post yet, but I'm giving you an instant uppy for the ten-dollar word "disintermediates".

  6. Henry Wertz 1 Gold badge

    Difference between this and virtualization

    In a virtual environment, you have either a Type I (bare metal) hypervisor or a Type II (one that runs on top of an OS). VirtualBox, for instance, is Type II. Either way, you end up taking a speed hit for any kernel code, although with modern tech like VT-x it's much lower.

    First, assume zero overhead. Your application generates requests. The requests are processed by the virtual machine's kernel and passed to the virtual machine's drivers. The drivers get the data to the hypervisor, the hypervisor passes the requests along to the real kernel, and the real kernel's drivers finish the requests. There can also be double caching, as the VM kernel and the real kernel both cache data.

    In a container, your app generates requests; there's a little overhead while some layer vets the requests to ensure nothing breaks out of the jail; then they're passed to the kernel and the drivers finish the requests. Far fewer steps.

    In reality, virtio network and disk drivers can cut the virtual machine driver overhead down quite a bit; without them, the virtual machine drivers and hypervisor are faffing about with various registers and whatever else, emulating a real network card, SATA controller, IDE controller or SCSI controller. You also usually have to statically allocate RAM to VMs, whereas with containers you can set RAM usage limits but otherwise just have a pool of available RAM.
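
    Those RAM usage limits are just cgroups under the hood - e.g. with the v1 memory controller (group name made up):

      # cap a group of processes at 512MB; otherwise they draw from the shared pool
      mkdir /sys/fs/cgroup/memory/mybox
      echo $((512*1024*1024)) > /sys/fs/cgroup/memory/mybox/memory.limit_in_bytes
      echo $$ > /sys/fs/cgroup/memory/mybox/tasks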

    1. Alan_Peery

      VM memory not completely static

      > You also usually have to statically allocate RAM to VMs

      Not entirely true. With techniques like memory ballooning, the OS may think it has 4GB when it's really only being given 2GB - and memory ballooning isn't terribly damaging to VM performance when done correctly. There's also memory compression, and VM swapping. After a quick glance through, this article (http://media.kingston.com/images/branded/MKP_339_VMware_vSphere4.0_whitepaper.pdf) looks like a good if somewhat dated survey.

      There is also the possibility of hot-adding memory (or CPUs) to a running VM. It takes a conjunction of hypervisor and guest OS support, but it's shipping in VMware now: http://searchvmware.techtarget.com/tip/VMware-vSphere-hot-add-RAM-and-hot-plug-CPU-Not-so-hot-but-still-cool

  7. W. Anderson

    Trevor Pott needs to clarify - technically - his claim that Virtuozzo is the "best" containerization, unless he is referring just to its (supposedly) working better than others in a Microsoft Windows environment.

    I have implemented FreeBSD Jails containerization for several years, in various configurations - especially with individual IP addresses and networking setups - with excellent results. Experience with Solaris "Containers" has yielded similar results.

    Maybe Mr. Pott has little or no "real world" experience with high-level containerization outside his Microsoft PC playground.

    1. Trevor_Pott Gold badge

      No, I mean it in an absolute sense. Virtuozzo provide the best container tech at the moment. It's the most fully instrumented, the most stable, the most secure, the easiest to use. Even more so than Docker. Currently, they set the bar for excellence.

      For the record, I'm a Linux admin by trade. I was a Windows admin for 20 years, but we largely parted ways about 3 years back. I'd been using Linux for about 15 years in production at that point, but about three years ago it became over 90% of new installs. Today, Windows administration makes up less than 20% of the systems I oversee. And that is dropping.

      The sad part is, it's the Windows customers who bring in the real money. Linux customers are - in my experience - cheap barstewards who don't call you in until something is right good and broken. Windows clients are quite used to the idea of needing regular monthly maintenance.

  8. Peter Johnston 1

    Are you aware that you can do this natively in Google Cloud Platform? It has a Container Engine based on Kubernetes.

    1. Trevor_Pott Gold badge

      American Public Cloud evangelism is outside the scope of this series. Though if you've a yen to be the NSA's plaything, by all means, assist in the destruction of the privacy and civil liberties of your customers.

  9. PAT MCCLUNG

    Docker

    Don't worry about these money freaks. We'll run over them in a few months.

  10. launcap Silver badge
    Unhappy

    OpenVZ

    I use Proxmox (http://pve.proxmox.com/wiki/Main_Page) at home. Initially, most of my linux stuff was done via OpenVZ.

    After several years, I now have no OpenVZ containers - only KVM virtual machines. Why, I hear you cry?

    OpenVZ sucks for running a regular linux distro in - they all seemed to want special tweaks and post-install hacks to work properly. And don't even try to use the various update mechanisms in an OpenVZ container - they *will* break and leave your container an unsupportable mess that can no longer be patched.

    KVM uses more CPU/RAM/IOPS, but at least the distributions tend to work correctly. And some even support virtio properly...
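
    (For what it's worth, a quick way to check whether a guest is actually using virtio:)

      # run inside the KVM guest - expect virtio_net / virtio_blk in the output
      lsmod | grep virtio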

  11. The Original Steve

    AppV

    I'm sure I'm wrong, but what is the difference between Docker and AppV exactly?

    1. Trevor_Pott Gold badge

      Re: AppV

      Funnily enough, I am working on an article about exactly that.
