本文作者Yang Huahui是hulu北京大数据平台研发工程师, Voidbox的主要开发者, 目前专注于Docker和YARN的相关技术. 他加入hulu仅一年, 如此快速的成长令人刮目相看.
1. Voidbox Motivation
YARN is the distributed resource management system in Hadoop 2.0, which is able to schedule cluster resources for diverse high-level applications such as MapReduce, Spark. However, nowadays, all existing framework on top of YARN are designed with assumption of specific system environment. How to support user applications with arbitrary complex environment dependencies is still an open question. Docker gives the answer.
Docker is a very popular container virtualization technology. It provides a way to run almost any application isolated in a container. Docker is an open platform for developing, shipping, and running applications. Docker automates the deployment of any application as a lightweight, portable, self-sufficient container that will run virtually anywhere.
In order to integrate the unique advantages of Docker and YARN, the Hulu engineering team developed Voidbox. Voidbox enables any application encapsulated in docker image running on YARN cluster along with MapReduce and Spark. Voidbox brings the following benefits:
Ease creating distributed application
Voidbox handles most common issues in distributed computation system, say it, cluster discovery, elastic resource allocation, task coordination, disaster recovery. With its well-designed interface, it’s easy to implement a distributed application.
Without Voidbox, we need to create and maintain dedicated VM for application with complex environment even though the VM image is huge and not easy to deploy. With Voidbox, we could easily get resource allocated and make app run right the time we need it. Additional maintenance work is eliminated.
Improve cluster efficiency
As we could deploy Spark/MR and all kinds of Voidbox applications from different department together, we could maximize cluster usage.
Thus, YARN as a big data operating platform has been further consolidated and enhanced.
Voidbox supports Docker container-based DAG(Directed Acyclic Graph) tasks in execution. Moreover, Voidbox provides several ways to submit applications considering demands of the production environment and the debugging environment. In addition, Voidbox can cooperate with Jenkins, GitLab and private Docker Registry to set up a set of developing, testing, automatic release process.
2.1 YARN Architecture Overview
YARN enables multiple applications to share resources dynamically in a cluster. Here is the architecture of applications running in YARN cluster:
Figure 1. YARN Architecture
As shown in figure 1, a client submits a job to Resource Manager. The Resource Manager performs its scheduling function according to the resource requirements of the application. Application Master is responsible for the application tasks scheduling and execution of an application’s lifecycle.
Functionality of each modules:
Resource Manager: Responsible for resource management and scheduling in cluster.
NodeManager: Running on the compute nodes in cluster, taking care of task execution in the individual machine, collecting informations and keeping heartbeat with Resource Manager.
Application Master: Takes care of requesting resources from YARN, then allocates resources to run tasks in Container.
Container: Container is an abstract notion which incorporates elements such as memory, cpu, disk, network etc.
HDFS: Distributed file system in YARN cluster.
2.2 Voidbox Architecture Design
In Voidbox architecture, YARN is responsible for the cluster’s resource management. Docker acts as the task execution engine above of the operating system, cooperating with Docker Registry. Voidbox helps to translate user programming code into Docker container-based DAG tasks, apply for resources according to requirements and deal with DAG in execution.
Figure 2. Voidbox Architecture
As shown in figure 2, each box stands for one machine with several modules running inside. To make the architecture more clearly, we divide them into three parts, and functionality of Voidbox modules and Docker modules:
Voidbox Client: The client program. Through Voidbox Client, users can submit a Voidbox application, stop it, and so on. By the way, Voidbox application contains several Docker jobs and a Docker job contains one or more Docker tasks.
Voidbox Master: Actually, it’s an application master in YARN, and takes care of requesting resources from YARN, then allocates resources to Docker tasks.
Voidbox Driver: Responsible for task scheduling of a single Voidbox application. Voidbox supports Docker container-based DAG task scheduling and between tasks we can insert some other codes. So Voidbox Driver should handle the order scheduling of DAG task dependencies and execute the user’s code.
Voidbox Proxy: The bridge between YARN and Docker engine, responsible for transiting commands from YARN to Docker engine, such as start or kill Docker container, etc.
State Server: Maintaining the informations of Docker engine’s health status, providing the list of machines which can run Docker container. So Voidbox Master can apply for resources more efficiently.
Docker Registry: Docker image storage, acting as an internal version control tool of Docker image.
Docker Engine: Docker container execution engine, obtaining specified Docker image from Docker Registry and launching Docker container.
Jenkins: Cooperating with GitLab, when application codes update, Jenkins will take care of automated testing, packaging, generating the Docker image and uploading to Docker Registry, to complete the application automatically release process.
2.3 Running Mode
Voidbox provides two application running modes: yarn-cluster mode and yarn-client mode.
In yarn-cluster mode, the control component and resource management component are running in the YARN cluster. After we submit the Voidbox application, Voidbox Client can quit at any time without affecting the running time of application. It’s for the production environment.
In yarn-client mode, the control component is running in Voidbox Client, and other components are in the cluster. Users can see much more detailed logs about the application’s status. When Voidbox Client quits, the application in cluster will exit too. So it’s more convenient for debugging.
Here we briefly introduce the implementation architecture of the two modes:
Figure 3. yarn-cluster mode
As shown in figure 3, Voidbox Master and Voidbox Driver are both running in the cluster. Voidbox Driver is responsible for controlling the logic and Voidbox Master takes care of application resource management.
Figure 4. yarn-client mode
As shown in figure 4, Voidbox Master is running in the cluster, and Voidbox Driver is running in Voidbox Client. Users can submit Voidbox application in IDE for debugging.
2.4 Running Procedure
Here are the procedures of submitting a Voidbox application and its lifecycle:
Users write a Voidbox application by Voidbox SDK and generate a java archive, then submit it to the YARN cluster by Voidbox Client;
After receiving Voidbox application, Resource Manager will allocate resources for Voidbox Master, then launch it.
Voidbox Master starts Voidbox Driver, the latter will decompose Voidbox application into several Docker jobs(a job contains one or more Docker tasks). Voidbox Driver calls Voidbox Master interface to launch the Docker tasks in compute nodes.
Voidbox Master requests resources from Resource Manager, and Resource Manager allocates some YARN containers according to the YARN cluster status. Voidbox Master launches Voidbox Proxy in YARN container, and the latter is responsible for communication with Docker engine to start the Docker container.
User’s Docker task is running in Docker container, and the log output to a local file. User can see real-time application logs through YARN Web Portal.
After all Docker tasks are done, the logs will be aggregate to HDFS, so user still can get the application logs by history server.
2.5 Docker integrating with YARN in resource management
YARN acts as a uniform resource manager in the cluster, and is responsible for resource management on all machines. Docker as a container engine also has the function of resource management. So how to integrate their resource management function is particularly important.
In YARN, the user task can only run in the YARN container, while Docker container can only be handled by Docker engine. This case would get out of the management of YARN and damage the unified management and scheduling principle of YARN, which could produce resource leaks risk issue. In order to enable YARN to manage and schedule Docker container, we need to build a proxy layer between YARN and Docker engine. This is why Docker Proxy is introduced. Through Voidbox Proxy, YARN can manage the container lifecycle including start, stop, etc.
In order to understand Voidbox Proxy more clearly, we take stopping Voidbox application as an example. When a user needs to kill Voidbox application, YARN will recycle all the resources of the application. At this point, YARN will send a kill signal to the related machines. The corresponding Voidbox Proxy will catch the kill signal, then stop Docker container in Docker engine to do the resource recycling. So with the help of Voidbox Proxy, it can not only stop YARN container, but also stop the Docker container to avoid resources leaks issue(This is the problem existing in open source version, see YARN-1964).
3. Fault Tolerance
Although Docker has some stable releases, the enterprise production environment has a variety versions of operating system or kernel, so it brings unstable factors. We consider multiple levels in Voidbox fault-tolerant design to ensure Voidbox’s high availability.
Voidbox Master fault tolerance
If Resource Manager finds Voidbox Master crashes, it will notify NodeManager to recycle all the YARN containers belonging to this Voidbox application, then restart Voidbox Master.
Voidbox Proxy fault tolerance
If Voidbox Master finds Voidbox Proxy crashes, it will recycle Docker containers on behalf of Voidbox Proxy.
Docker container fault tolerance
Each Voidbox application can configure the maximum retry times on failure, when the Docker container crashes, Voidbox Master will do some work according to the exit code of Docker container.
4. Programming model
4.1 DAG Programming model
Voidbox Provides Docker container-based DAG programming model. A sample would look similar to this:
Figure 5. Docker container-based DAG programming model
As shown in figure 5, there are four jobs in this Voidbox application, and each job can configure its requirements of CPU, Memory, Docker image, parallelism and so on. Job3 will start when job1 and job2 both complete. Job1, job2 and job3 make a stage, so user can insert their codes after this stage is done, and finally start running job4.
4.2 Shell mode to submit one task
In most cases, we would like to run a single Docker container-based task without programming. So Voidbox supports shell mode to describe and submit the Docker container-based task, actually it’s a implementation based on DAG programming mode.The example usage of Voidbox in shell mode:
-docker_image centos \
-shell_command “echo Hello Voidbox” \
-container_memory 1000 \
The shell script above will submit a task to run “echo Hello Voidbox” in a docker image named ‘centos’, and the resource requirement is 1000Mb memory, 2 cpu virtual cores.
5. Voidbox in Action
At present we can run Docker, MapReduce, Spark and other applications in YARN cluster. There has been lots of short tasks using Voidbox within HULU.
Automation testing process
Cooperating with Jenkins, GitLab and private Docker registry, when the application codes update, Jenkins will complete automatic test, package program, regenerate Docker image and push it to the private Docker Registry. It’s a process of development, testing and automatically release.
Complex tasks in parallel
Test Framework is used to do some testings to detect the availability of some components. The project is implemented by Ruby/Java and has complex dependencies. So we maintain two layers of Docker image, the first layer is the system software as a base image, and the second layer is the business level. We publish a test framework Docker image and use some timing scheduling software to start Voidbox application regularly. Thanks to Voidbox, we solve the issues such as the complex dependencies and the multitasking parallelism.
Facematch(link:http://tech.hulu.com/blog/2014/05/03/face-match-system-overview/) is a video analysis application. It’s implemented by C and has lots of graphics libraries. That can be optimized by Voidbox: first of all we need to package all face match program into a Docker image, then write Voidbox application to handle the multiple videos. Voidbox solves the complex machine environment and the parallelism control problem.
Building complex workflow
Some tasks have a dependent with each other, such as it needs to load user behaviors first, then do the analysis of user behaviors. These two steps have successively dependencies. We use Voidbox container-based programming model to handle this case easily.
6. Different from DockerContainerExecutor in YARN 2.6.0
DockerContainerExecutor(link:https://issues.apache.org/jira/browse/YARN-1964) is released in YARN 2.6.0 and it’s alpha version. Not mature enough, and it is only an encapsulation layer above the default executor.
DockerContainerExecutor is difficult to coexist with other ContainerExecutor in one YARN cluster.
DAG programming model
Configurable container level of fault tolerance
A variety of running modes, considering development environment and production environment
Share YARN cluster resources with other Hadoop job
Graphical log view tool
7. Future work
Support more versions of YARN
Voidbox would like to support more versions in the future besides YARN 2.6.0.
Voidbox Master fault tolerance, persistent metadata to reduce the cost in case of retry
Currently, if a Voidbox Master crashes, YARN will recycle resources belonging to this Voidbox application and restart Voidbox Master to do some tasks from the very beginning. It’s not necessary to impact tasks which are already done or running. We might keep some metadatas in the State Server to reduce the cost in case of Voidbox Master on-failure.
Voidbox Master as a permanent service
Voidbox will support long running Voidbox Master to receive streaming tasks.
Support long service
Voidbox will support long running service if Voidbox Master’s downtime doesn’t influence running task.