User Namespaces: Sharing the Docker UNIX socket
4/20/2016 Note: Please read my follow-up blog post on this topic based on the comments regarding open access to all Docker APIs via the methods discussed below.
A recent question I received asked for ideas on sharing the Docker UNIX socket when you have user namespaces enabled in the Docker daemon. Given that sharing the Docker daemon’s UNIX socket is the recommended and preferred method for allowing in-container tools to interact with the Docker daemon, it’s an important question to try to answer.
Why is this a problem?
So, the UNIX socket created by the daemon, located by default at /var/run/docker.sock, is owned by host root, with docker group ownership. On your host, adding a user to the docker group grants that user read/write access to the socket for API communication with the daemon. If you take the usual path of mounting the daemon’s UNIX socket into your container (using -v /var/run/docker.sock:/var/run/docker.sock) when user namespaces are enabled on the daemon, your container’s root uid (or any other container uid/gid) will have no access at all to the UNIX socket. This is because no ID in the container’s remapped range matches host root or any host member of the docker group. Since the access bits on the socket are rw-rw----, there is no easy solution short of leaving the UNIX socket unprotected via its user/group ownership on the host, which is a bad idea!
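You can see this concretely by mounting the socket into a container on a userns-enabled daemon and checking the ownership from inside (a quick sketch; busybox is just an illustrative image):

```shell
# On a daemon started with --userns-remap, mount the socket and look at it
# from inside a container; busybox here is just an illustrative image.
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock busybox \
    ls -l /var/run/docker.sock
# Host uid 0 and the docker group have no mapping into the container's ID
# range, so the socket typically shows up owned by the overflow IDs
# (nobody/nogroup), and any connection attempt is denied.
```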
What can I do?
There are several potential solutions, but one interesting solution, if you are open to having a proxy container running, is to use a new feature about to arrive in Docker 1.11, allowing the use of --privileged with a new parameter, --userns=host, while user namespaces are enabled on the Docker daemon. [Thanks to Liron Levin of Twistlock for working on this feature and getting it through the review process!]
With this feature, I can run a privileged container alongside my unprivileged containers which need access to the Docker daemon UNIX socket. The privileged container can expose a TCP endpoint intended only for other containers to use (not port-mapped to the host) [Note that this is still accessible to someone with rudimentary Linux skills; see the comments below from Brian Krebs], and use a simple tool like socat to pass traffic from the TCP endpoint to the mounted UNIX socket.
Rather than explain it fully in this post, I put together a simple GitHub project which contains a Dockerfile for building a small container with this capability. The README.md explains how to run it and link to it from other unprivileged containers. The concept is quite simple, and this solution requires only a small change to the containers which need to use the Docker API: setting the DOCKER_HOST environment variable to point to the proxy/privileged container.
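The project itself isn’t reproduced here, but the shape of the setup is roughly this (the dockerproxy image name, the port, and the socat invocation are my own illustrative stand-ins, not the project’s actual contents):

```shell
# A privileged, host-userns proxy container (Docker 1.11+); it publishes no
# host port and internally runs something along the lines of:
#   socat TCP-LISTEN:2375,fork UNIX-CONNECT:/var/run/docker.sock
docker run -d --name dockerproxy --privileged --userns=host \
    -v /var/run/docker.sock:/var/run/docker.sock dockerproxy

# An unprivileged container then only needs DOCKER_HOST pointed at the proxy:
docker run --rm --link dockerproxy:proxy -e DOCKER_HOST=tcp://proxy:2375 \
    dockerclient docker version
```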
I’m not going to have Docker 1.11 available; what other options do I have?
You could set up a TCP endpoint for your host-located Docker daemon that can be accessed from any container. The downside is that the Docker API is now accessible from any system which has a route to your host. You could protect it with firewall/iptables rules, but containerizing the proxied TCP endpoint effectively handles this for you. If you want to use a TCP endpoint, make sure you consider using TLS certificate access to the endpoint, which I describe in another post here on my blog.
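For reference, enabling a TLS-protected TCP endpoint directly on the daemon looks roughly like this (the certificate paths are illustrative; the full certificate setup is covered in that other post):

```shell
# Docker 1.11-era daemon invocation; the *.pem files are hypothetical paths
# to certificates you would generate first (e.g. with OpenSSL).
docker daemon -H unix:///var/run/docker.sock -H tcp://0.0.0.0:2376 \
    --tlsverify \
    --tlscacert=/etc/docker/ca.pem \
    --tlscert=/etc/docker/server-cert.pem \
    --tlskey=/etc/docker/server-key.pem
```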
If you are more adventurous, you might be interested in setting up a proxy UNIX socket on the host system which is owned by the remapped root that will be used inside containers. As long as you are sure you will be using root inside the container to access the “proxy” UNIX socket, you can run a socat process on the host, telling it the uid, gid, and mode bits to set on the listening UNIX socket. This proxy will pass traffic to the “real” Docker daemon UNIX socket, and of course requires root privileges on the host to set up.
To find the remapped root UID that will be used inside all containers, we simply look at the range assigned to the remapping user: the username provided to --userns-remap on the Docker daemon invocation. This user will have an entry in /etc/subuid; the first numeric entry will be the root UID. In my case I can run the following command and see that the dockremap user (the user created if you provide default to the --userns-remap flag) has the following range assigned:

$ cat /etc/subuid
dockremap:231072:65536
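If you would rather script that lookup than eyeball it, the base UID is simply the second colon-delimited field of the entry. A minimal sketch (it parses a sample line here; on a real host you would grep the dockremap entry out of /etc/subuid):

```shell
# Extract the remapped root UID from a subuid-style entry.
# On a real host: entry=$(grep '^dockremap:' /etc/subuid)
entry="dockremap:231072:65536"
remap_uid=$(echo "$entry" | cut -d: -f2)
echo "$remap_uid"   # prints 231072
```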
Now the command shown below will start socat with a container root-owned socket listening and passing traffic to the real Docker daemon UNIX socket:

sudo socat UNIX-LISTEN:/var/run/docker-userns.sock,user=231072,group=231072,mode=0600,fork \
    UNIX-CLIENT:/var/run/docker.sock
I can now mount the “userns” socket file into an unprivileged container and have full access to the Docker API:
$ docker run -ti -v /var/run/docker-userns.sock:/var/run/docker.sock dockerclient docker version
Client:
 Version:      1.10.0-dev
 API version:  1.22
 Go version:   go1.5.2
 Git commit:   8ed14c2
 Built:        Wed Dec  9 01:04:08 2015
 OS/Arch:      linux/amd64
 Experimental: true

Server:
 Version:      1.11.0-rc3
 API version:  1.23
 Go version:   go1.5.3
 Git commit:   eabf97a
 Built:        Fri Apr  1 22:26:46 2016
 OS/Arch:      linux/amd64
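One practical note: the socat invocation above only lives as long as the shell it was started from. If the proxy socket should survive logouts and reboots, a small systemd unit is one way to persist it (the unit name and paths here are my own, not from any official packaging):

```shell
# Install a hypothetical systemd unit that keeps the proxy socket running.
sudo tee /etc/systemd/system/docker-userns-proxy.service <<'EOF'
[Unit]
Description=Remapped-root proxy socket for the Docker daemon
After=docker.service

[Service]
ExecStart=/usr/bin/socat UNIX-LISTEN:/var/run/docker-userns.sock,user=231072,group=231072,mode=0600,fork UNIX-CLIENT:/var/run/docker.sock

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now docker-userns-proxy
```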
Success! Our socat proxy on the host is allowing us access to the Docker API via a socket owned by the remapped root inside the container.
In conclusion, there are a few reasonable ways to handle restricted access to the Docker daemon UNIX socket when user namespaces are enabled for containers. I would be interested to hear if anyone else has other ideas or solutions; send me feedback in the comments section below!
Do you have other questions about practical use of user namespaces in Docker? I’d be happy to do a continuing “user namespaces Q&A” series here on my blog. Post your questions in the comments or point me at your question on Twitter.
> I can run a privileged container alongside my unprivileged containers which need access to the Docker daemon UNIX socket.
Doesn’t this implicitly provide docker.sock access to all containers on the same docker network?
Access to docker.sock allows compromise of the host itself, by running privileged containers or containers with sensitive volume mounts.
> The privileged container can expose a TCP endpoint only for other containers to use
Note that container ports are available to processes on the host itself; not only could all other containers access this docker socket proxy, but all other local users as well (by sending traffic to the privileged container’s TCP endpoint). This effectively removes the “root user / root group only” access control applied by the file system.
These are excellent and reasonable points, and maybe I should have prefixed this entire blog post with the same caveat that the official Docker documentation has about protecting docker.sock appropriately no matter what method for providing access is used.
Specifically to your points, the first can be mitigated by not allowing privileged containers to be spawned (more on the “how” at the end of my comment). If only non-privileged containers can be spawned, then user namespaces will protect access to any filesystem or any exploit that requires real administrative privilege on the host. Specifically, I could use “-v /:/host” with “rm -fR” as the command, and nothing would happen as the remapped root in the container would have no access to those files.
To your second point, I meant a TCP listener in the container that is not exposed to the host (specifically, I don’t use -p PORT:PORT). This prevents local host users from sending traffic without their own iptables magic, which means they already have root on the host, which means they already have access to docker.sock anyway. It does mean other containers who “know” this endpoint now have Docker daemon access, so again, this requires understanding your environment and the controls that exist outside of what is covered in the blog post.
However, given these are important issues, and there are features appearing in Docker to start to mitigate this, I think it makes sense for me to write an addendum post, and hopefully even some demonstration code, to use the authorization plugin feature now in the engine to show how filtering criteria can be used in a plugin to restrict functionality via the API endpoint. A simple demonstration of this would be to expand the “socat”-style proxy forwarder to add a header/marker to the request that this is an “unprivileged” request. Then an appropriately-written AuthZ plugin can use this header to restrict the API access to a subset of the commands appropriate for the use case. In some cases, this desire to access the Docker socket is for read-only/information purposes, and therefore a very narrow “inspect+ps”-like API filter could be placed via the AuthZ plugin. I will hopefully be able to pull that together soon to make sure readers understand the need to strongly consider access to docker.sock, as without any filtering it is most definitely “root” on the host.
I look forward to your next post. This level of technical discussion is a pleasure.
Response in kind,
> protecting docker.sock appropriately no matter what method for providing access
Ah yes, TLS or an auth proxy on the socket would enable full protection. If that is in place, and you’re willing to restart the docker engine, you can enable TCP support directly via the -H flag to the docker daemon. That does require modifying the host and not just running a container, so I see the purity value of your approach.
Note that to initialize the AuthZ plugin, you are also required to restart the engine.
Regarding your comment,
> TCP listener in the container that is not exposed to the host
Container ports running on that host are always routable from the host, even if not published via -p or --expose. Example follows.
environment:

docker run -d --privileged --volume /var/run/docker.sock:/var/run/docker.sock sh -c 'nc -lk 8080 | socat - /var/run/docker.sock'
I’m a user on your hypothetical system, without root or docker access:
$ ps aux | grep root | grep nc
root 25464 0.0 0.0 9132 932 pts/12 S+ 21:32 0:00 nc -l 8080
$ cat /proc/25464/net/fib_trie | grep -B1 "/32 host LOCAL"
|-- 172.17.0.2
/32 host LOCAL
$ nmap 172.17.0.2 | grep open
8080/tcp open http-proxy
$ nc 172.17.0.2 8080
GET /events HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json
Server: Docker/1.10.3 (linux)
Date: Thu, 14 Apr 2016 04:35:44 GMT
Transfer-Encoding: chunked
14c
{"status":"create","id":"9e54c1299c745a458bc1b73f0ad3396788de2d0c5fcf6f1f737a51f158f2ea65","from":"python","Type":"container","Action":"create","Actor":{"ID":"9e54c1299c745a458bc1b73f0ad3396788de2d0c5fcf6f1f737a51f158f2ea65","Attributes":{"image":"python","name":"focused_euclid"}},"time":1460608546,"timeNano":1460608546411825413}
Access to docker.sock without root, sudo, or group access via your TCP endpoint confirmed.
Oh man, I stand corrected! :) So, I was forgetting that of course there is a default route via docker0 (in the standard bridge mode) to the container IP subnet(s), which is of course what the iptables rules route traffic via when -p or EXPOSE *is* used.
So, even without an exposed port, someone with rudimentary Linux skills can potentially find the endpoint as you show. I will add that as a note to the blog, as I definitely contradict it in the current text. Remediation is possible with either more advanced networking (an overlay with more complex traffic rules between the host and container networks) or some brute-force blocking of host traffic to a known “endpoint for Docker API”, but those are not going to be naturally available or achievable for the casual user.
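As one concrete example of the brute-force option, a host firewall rule can reject host-originated traffic to a known proxy port on the default bridge (the port here is illustrative, and a root user on the host can of course remove the rule):

```shell
# Reject host-local processes reaching the proxy port over docker0.
# Container-to-container traffic traverses the FORWARD chain, not OUTPUT,
# so other containers can still reach the endpoint.
sudo iptables -I OUTPUT -o docker0 -p tcp --dport 2375 -j REJECT
```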
Thanks for your response; I may not be extremely fast with the next iteration of this idea (the AuthZ example), but it will be hopefully sooner rather than later!