User Namespaces: 2017 Status Update and Additional Resources
Maybe you ended up here by following the link from the Docker Captain’s video series entry, “User Namespaces, Part 1”. Or maybe you just happened across it while you were on my blog. Either way, this post will update you on the current status of user namespace support in Docker as well as provide links to additional resources that are available to learn more.
Not much has changed over the past year since Docker 1.10 was released with user namespaces support promoted out of experimental. Just as we called it the “phase 1” implementation at the time, very little has happened in the engine itself to lead towards a “phase 2” because of reliance on Linux kernel upstream work which is still underway. As a quick reminder, in the video as well as in past blog posts on the topic, “phase 2” is focused on the requested capability to provide a unique user namespace mapping per container rather than per daemon instance as it is implemented today.
However, even with the delay in making progress towards “phase 2”, a few nice improvements and relaxed restrictions have landed since that initial support in Docker 1.10 and are worth mentioning:
- As long as you have a kernel newer than 3.19, the --read-only flag is now compatible with user namespaces. The client UI restriction has been removed from the code; however, if your kernel still prevents a remount with changed mount flags (required for this feature) you will get an error when using --read-only with user namespaces enabled on your daemon.
- The Docker client UI will no longer prevent sharing namespaces with other containers when user namespaces are enabled. This means that you can share the network or IPC namespace with other containers using the flags already provided in the Docker client and API. A rewrite of the namespace joining code in runc was required to make this possible. You still will not be able to use host namespace capabilities like --pid=host because the host and container are not in the same user namespace.
- The Docker daemon itself is now able to be run inside a user namespace. Thanks to Serge Hallyn for doing much of the work to make this possible.
- Privileged containers are now available even when the daemon is running with user namespaces enabled. As you can imagine, the privileged containers will not be user namespaced processes. To make sure this is understood, you must provide the flag --userns=host to clearly delineate that the container will be running in the daemon process user namespace (which, unless you are using the feature from the previous bullet, will be the host system “default” user namespace that is not remapped at all). Another caveat is that the filesystem of the container will already have its files remapped to the user namespace ranges being used by the daemon. Changes (new files, chown operations, etc.) will be “zero-based”, and if the result is then committed (e.g. docker commit) there will be a mix of remapped and non-remapped ownership in the resultant container filesystem. The same would be true for any mounted volumes as well. This is a known issue and is only truly solved with the work happening upstream in the Linux kernel for “phase 2.” Thanks to Liron Levin from Twistlock for providing this PR and getting it through the process.
- In addition to these more significant changes, a lot of bug fixes went into the past few releases to clean up corner cases with user namespaces and various graphdrivers or other use cases. We also added the string “userns” to the security options section of
The rest of the restrictions on a user-namespaced process are detailed in the documentation and remain in effect at this time. Most if not all of them are related to known Linux kernel restrictions on user namespaces, so it is unlikely that work can happen in the Docker engine (or lower layers) to effectively remove them at this time.
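The mixed-ownership caveat in the privileged-container bullet above can be easier to see with numbers. The sketch below is a toy model, not Docker code: the range start 100000 and size 65536 are assumed values for illustration, and the function names are hypothetical. The overflow UID 65534 is the kernel default that is shown for IDs with no mapping in a user namespace.

```python
# Toy model of file ownership under a daemon-wide user namespace remap.
# REMAP_BASE and REMAP_COUNT are assumed values, not read from a real daemon.
REMAP_BASE = 100000   # first host UID of the daemon's remapped range (assumed)
REMAP_COUNT = 65536   # size of that range (assumed)
OVERFLOW_UID = 65534  # kernel default reported for IDs that have no mapping


def on_disk_uid(container_uid, userns_host=False):
    """Host UID recorded on disk when a container process creates a file."""
    if userns_host:
        # --userns=host: no remapping, so ownership is written "zero-based".
        return container_uid
    if not 0 <= container_uid < REMAP_COUNT:
        raise ValueError("UID is outside the remapped range")
    return REMAP_BASE + container_uid


def uid_seen_in_userns(host_uid):
    """UID that a user-namespaced container sees for a file on disk."""
    if REMAP_BASE <= host_uid < REMAP_BASE + REMAP_COUNT:
        return host_uid - REMAP_BASE
    return OVERFLOW_UID  # unmapped host IDs appear as the overflow UID


# A file that container root created while the daemon was remapping IDs.
layer = {"/etc/passwd": on_disk_uid(0)}
# A privileged --userns=host container adds a file as root before a commit.
layer["/opt/new-file"] = on_disk_uid(0, userns_host=True)

for path, host_uid in sorted(layer.items()):
    print(path, host_uid, uid_seen_in_userns(host_uid))
```

With these assumed values, the file created under --userns=host is stored as UID 0 on disk. That UID has no mapping in the daemon's range, so a later user-namespaced container would see it as owned by 65534 rather than by its own root.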
I’ve tried to collect useful resources that exist on the topic or that provide further details on current status of ongoing work. Feel free to comment below with any other resources you think might be useful to add and I can update the post with additional links.
- My original blog post on the topic from October 2015 when user namespace support went into experimental around the Docker 1.9 release. Some design changes were made by the time Docker 1.10 released the capability outside of experimental, but for better or worse it is still the most read blog post on my site!
- The updated blog post from February 2016 with corrections and changes to the functionality, written when user namespace support graduated from experimental and was released officially in Docker 1.10.
- The official Docker engine documentation on user namespace support.
- The Linux man page on user namespaces. This man page has important information on Linux kernel restrictions around the use of user namespaces. Related man pages: the subordinate ID range system, documented in separate pages for /etc/subuid and /etc/subgid.
- The current Linux kernel discussion on a filesystem shifting driver (shiftfs) for use in concert with user namespaces to provide a shifted view of an underlying filesystem. This is required for “phase 2” support as we need a way to cache all container filesystem layers under normal ownership ranges, but allow there to be “views” on these filesystems via different ID ranges (the ID range of a user namespaced process running in a container).
- An additional blog post from me on the complexity of access to the Docker daemon UNIX socket from a user namespaced container process. This was the second of two posts on the topic after getting some valuable feedback on the initial approach in this post.
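As a companion to the /etc/subuid and /etc/subgid man pages listed above, here is a minimal sketch of how a subordinate ID entry translates into a container-to-host ID mapping. The name:start:count line format comes from those man pages; the sample entry, the helper names, and the choice to use only the first range for a user are my own simplifications for illustration, not how the Docker engine is implemented.

```python
from typing import List, NamedTuple, Optional


class SubIdRange(NamedTuple):
    name: str   # login name (or numeric ID) that owns the range
    start: int  # first host ID in the range
    count: int  # number of IDs in the range


def parse_subid(text: str) -> List[SubIdRange]:
    """Parse /etc/subuid or /etc/subgid content: one name:start:count per line."""
    ranges = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        name, start, count = line.split(":")
        ranges.append(SubIdRange(name, int(start), int(count)))
    return ranges


def to_host_id(container_id: int, ranges: List[SubIdRange],
               name: str) -> Optional[int]:
    """Map an ID inside the namespace to a host ID, or None if unmapped.

    Simplification: only the first range found for `name` is considered.
    """
    for r in ranges:
        if r.name == name:
            if 0 <= container_id < r.count:
                return r.start + container_id
            return None
    return None


# Sample content; the user name and range below are assumed for illustration.
sample = "dockremap:100000:65536\n"
ranges = parse_subid(sample)

print(to_host_id(0, ranges, "dockremap"))      # container root
print(to_host_id(1000, ranges, "dockremap"))   # an ordinary container user
print(to_host_id(70000, ranges, "dockremap"))  # outside the range
```

On a real host you would pass the contents of /etc/subuid (for user IDs) or /etc/subgid (for group IDs) to parse_subid instead of the sample string.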
Meet Me At DockerCon!
Do you want to discuss user namespaces (or any other container engine topic really) in more detail? I’d love to meet up at DockerCon in Austin the week of April 17th. If you haven’t registered yet, please feel free to use my Docker Captain code “CaptainPhil” to receive 10% off your registration cost. Make sure to register soon before DockerCon sells out!