In a restricted (secure) setup YARN executes the tasks of computation frameworks like Spark in a secured Linux or Windows container. The tasks are executed in the local context of the user submitting the application, not in the context of the yarn user or some other system user. This brings certain constraints for the system setup.
How is YARN actually able to impersonate the calling user on the local OS level? This post aims to give some background information to help answer such questions about secure containers. Only Linux systems are considered here, not Windows.
YARN uses the LinuxContainerExecutor.java class for secure container execution on Linux systems. The class uses a native executable, container-executor, to launch a container as the user who submitted the application.
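Switching the NodeManager to this executor is done in yarn-site.xml. A typical configuration sketch could look like the following (the group value is an example and must match the group owning the container-executor binary):

```xml
<!-- yarn-site.xml: use the secure Linux executor instead of the default -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <!-- group that owns the setuid container-executor binary -->
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>hadoop</value>
</property>
```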
During container launch the executor broadly does the following:
- Create the container work dir (yarn.nodemanager.local-dirs) and log dir (yarn.nodemanager.log-dirs) accessible by the child process (calling user)
- Copy the script files from the NM to the work directory
- Set up the environment
- Execute the launch script with execlp, replacing the current process image (forked from the NM) with that of the container launch script
The execlp() function replaces the current process image with a new process image specified by file. The new image is constructed from a regular, executable file called the new process image file. No return is made because the calling process image is replaced by the new process image.
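The same replace-the-process-image semantics can be observed with the shell's exec builtin, which is built on the same exec family of system calls. A minimal illustrative sketch (not YARN code):

```shell
#!/bin/sh
echo "pid before: $$"
# exec replaces this shell's process image: same PID, no fork,
# and nothing below this line ever runs after a successful exec
exec sh -c 'echo "pid after:  $$"'
echo "never printed"
```

Running this prints the same PID twice: the process was not forked, its image was swapped out, just as the NM's forked launcher is swapped for the container's launch script.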
For the submitting user, container-executor ensures that:
- It is NOT the root user
- Its UID is above the configured minimum (typically 1000)
- It is not on the banned user list
So root and other “privileged” users are not able to create container resources through YARN. Privileged users in this context means users with a UID below the configured minimum of typically 1000, which includes system users like http. Furthermore, banned users are not allowed to execute containers either. Typical default banned users are hdfs, yarn, mapred, and bin.
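These checks are driven by the container-executor.cfg file, which the setuid binary reads itself (it must be owned by root as well). A typical, hedged example; the values shown are the common defaults:

```properties
# container-executor.cfg -- read directly by the setuid binary
# group the NodeManager runs as
yarn.nodemanager.linux-container-executor.group=hadoop
# users that may never run containers
banned.users=hdfs,yarn,mapred,bin
# reject any user whose UID is below this value
min.user.id=1000
```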
For YARN to be able to create and execute the container with the privileges of the submitting user, it leverages the setuid capabilities of Linux. For this to work, the container-executor executable needs to be owned by root and by the hadoop group. The permissions of the file need to be set to ---Sr-s---, which corresponds to the numeric value 6050.
Through the setuid bit the unprivileged yarn user is able to run the executable with root privileges.
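Setting this up by hand could look like the following sketch. The binary path is an assumption and varies by distribution; the group must match the configured NM group:

```shell
# assumed location; adjust for your distribution
CE=/usr/lib/hadoop-yarn/bin/container-executor

chown root:hadoop "$CE"   # owner root, group hadoop
chmod 6050 "$CE"          # ---Sr-s---: setuid + setgid, group may execute

# verify: should report 6050 and root:hadoop
stat -c '%a %U:%G' "$CE"
```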
It is important to note that if the file is stored on a mount point that has setuid disabled (nosuid), the NM will not be able to obtain root privileges through that executable. A typical error you would see is:
Error setting supplementary groups for user ambari-qa: Operation not permitted
If the filesystem is mounted with nosuid, or configured that way in /etc/fstab, the setuid mechanism is blocked.
See here for details around the fstab entries: https://en.wikipedia.org/wiki/Fstab
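Whether the mount hosting the binary carries the nosuid option can be checked quickly with findmnt (the binary path below is an assumed example):

```shell
# print the mount options of the filesystem containing the binary
findmnt -n -o OPTIONS --target /usr/lib/hadoop-yarn/bin/container-executor

# or test for nosuid explicitly
if findmnt -n -o OPTIONS --target /usr/lib/hadoop-yarn/bin/container-executor \
     | grep -qw nosuid; then
  echo "nosuid is set: setuid binaries cannot elevate on this mount"
fi
```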
The complete error message:
Diagnostics: Application application_1485879664852_0006 initialization failed (exitCode=255) with output:
main : command provided 0
main : run as user is ambari-qa
main : requested yarn user is ambari-qa
Error setting supplementary groups for user ambari-qa: Operation not permitted
Another constraint for secure YARN containers that follows from this behavior is that the submitting user needs to exist locally on the machine running the NodeManager. There is no need for local home directories for these users; they simply need a UID. The container's executables and logs are stored in the yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs directories.
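Creating such a local account without a home directory is a one-liner. A sketch, where the user name and UID are examples; the UID must be at or above the configured minimum:

```shell
# -M: do not create a home directory; -s: no interactive login needed
useradd -M -u 1500 -s /sbin/nologin alice

# the numeric UID is all YARN needs to run containers as this user
id -u alice
```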