Docker 1.10: Security and User Namespaces
I’ve had some amount of ownership and code in the Docker engine since the 1.3.0 release way back in October 2014. But there is definitely something very special about the latest 1.10 release, given that a feature I’ve been working on since last spring is now officially part of the Docker runtime! I’ve already written a long blog post about user namespaces in Docker when the feature became part of the 1.9-era experimental builds a few months ago, so I won’t bore you with all the details here. To be clear, that information still holds true, with a small update at the top to reflect some final design changes that were merged for the graduation from experimental to master.
In general it seems as if user namespaces (among other great security additions) are definitely still a very hot topic. The blog post I referenced above has the highest number of readers of anything I’ve ever written, still averaging 75-100 hits a day, even several months after publishing it! And as much as it would be nice to say “we’re done,” there is still work to do to add a Phase 2 implementation of user namespaces that allows custom mappings to be provided per container. One of the strongest use cases is support for a multi-tenant container cloud where each tenant could receive a non-overlapping user and group ID space, distinct from that of any other tenant. The key sticking point, as I mentioned in the prior blog post, is the ownership of layers, which are shared between containers today. For each container to have its own “view” of what the file ownership is, we are looking to the Linux filesystem community for some notion of “ID shift” mount capability. An interesting proof-of-concept is underway with the overlay filesystem in this systemd issue on GitHub. Hopefully you’ll be hearing about progress on this front as we move through 2016.
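To make the multi-tenant idea a bit more concrete, here is a minimal Python sketch of what "non-overlapping ID spaces" means in practice. This is purely illustrative and is not how Docker implements anything: the tenant names, the base offset of 100000, the range size of 65536, and the helper names (IdRange, allocate_ranges, overlaps) are all my own assumptions. The only real-world detail it leans on is the name:start:count line format used by /etc/subuid and /etc/subgid.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class IdRange:
    """A contiguous block of host IDs assigned to one (hypothetical) tenant."""
    tenant: str
    start: int   # first host ID in the block
    count: int   # number of IDs in the block

    @property
    def end(self) -> int:
        # Last host ID in the block (inclusive).
        return self.start + self.count - 1

    def to_host(self, container_id: int) -> int:
        # Map an ID as seen inside the container (0 = root) to its host ID.
        if not 0 <= container_id < self.count:
            raise ValueError(f"container ID {container_id} is outside the mapped range")
        return self.start + container_id

    def subid_line(self) -> str:
        # Render in the name:start:count format used by /etc/subuid and /etc/subgid.
        return f"{self.tenant}:{self.start}:{self.count}"


def allocate_ranges(tenants, base=100000, size=65536):
    # Hand each tenant the next block of `size` IDs; base and size are assumed values.
    return [IdRange(name, base + i * size, size) for i, name in enumerate(tenants)]


def overlaps(a: IdRange, b: IdRange) -> bool:
    # Two inclusive ranges overlap when each one starts before the other ends.
    return a.start <= b.end and b.start <= a.end


ranges = allocate_ranges(["tenant-a", "tenant-b", "tenant-c"])
for r in ranges:
    print(r.subid_line())
```

With these assumed values, container root (ID 0) for the first tenant lands on host ID 100000 and for the second tenant on 165536, so neither is host root and no two tenants ever share a host ID. The hard part described above, shared image layers whose on-disk ownership would need to look different to each tenant, is exactly what a sketch like this does not solve.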
As Stefan Berger, IBM security researcher, and I shared in our Tokyo OpenStack Summit talk in October, the security features of Docker—and containers in general—should be viewed as a layered set of capabilities, with different features providing different aspects of isolation and protection. Maybe the most important takeaway from the Docker 1.10 release, as well as analogous advancements in the Linux kernel community, is that security is incrementally improving for containers as time marches on. With the addition of seccomp and, more importantly, a default seccomp profile, future PID cgroup controls, user namespaces, and the existing AppArmor/SELinux LSMs, cgroups and namespace isolation mechanisms, having a compelling answer for the age-old “but are containers as secure as VMs?” is getting easier all the time.
Make sure to read Jessie Frazelle’s blog post on Docker.com with more detail about all the security features you’ll find in Docker 1.10, including a demonstration of the user namespaces feature.
As a final plug, if you’re looking for a container service that already uses user namespaces by default, the IBM Bluemix container service has enabled this feature in production, offering the current Phase 1 isolation from host root. If you haven’t had a chance, check out the Bluemix container service and let us know what you think; you can easily sign up for a free 30-day trial!
If you want to use seccomp with Docker, there’s an open source tool [1] that will auto-generate profiles for your apps based on the system calls they need.
[1] DockerSlim – http://dockersl.im
2018. Need that phase 2.