A Complex Task Scheduling Scheme for Big Data Platforms Based on Boolean Satisfiability Problem





Institute of Electrical and Electronics Engineers Inc.



In big data processing systems, data volumes keep growing while the real-time requirements on processing and analysis become more demanding, so these systems must deliver better performance. Job scheduling plays an important role in improving overall system performance in big data processing frameworks. However, job scheduling is an NP-hard problem, and many factors must be considered. For example, jobs have dependencies among stages, so resources should not be allocated to tasks that are not ready, and there are sometimes constraints between jobs. These factors challenge the scheduling performance of big data processing and analysis systems. In this paper, we address the problem by translating it into the Boolean Satisfiability Problem (SAT), which allows it to be solved with an exact method. SAT-based scheduling is not a new approach, but in the past it was mainly used to solve static scheduling problems. A dynamic scheduling system requires every problem to be solved within a limited time, which is a challenge for SAT encoding. We build on a previous SAT solution to the Job Shop Scheduling Problem and adapt the algorithm to meet the requirements of big data processing systems. We also optimize the encoding to reduce the number of clauses, which improves solving efficiency enough to meet the performance requirements. Experimental results show that the number of clauses is reduced by more than 30% and that the time the SAT solver needs to find a solution can be reduced by more than 50%. To demonstrate its effectiveness, we have also implemented and validated the new job scheduler in Apache Hadoop YARN.
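
As a rough illustration of what translating a scheduling problem with stage dependencies into SAT can look like, the sketch below encodes a toy single-machine schedule as CNF clauses. It is not the encoding used in the paper: the time-indexed variables, the task names (`map`, `shuffle`, `reduce`), the helper names `encode` and `solve`, and the brute-force search standing in for a real SAT solver are all assumptions made for this example.

```python
from itertools import combinations, product


def encode(durations, precedences, horizon):
    """Build CNF clauses for a toy single-machine schedule (illustrative only).

    The Boolean variable var[(t, s)] is true when task t starts at slot s.
    Clauses are lists of signed integers, as in the DIMACS convention.
    """
    var = {}
    for task, dur in durations.items():
        for start in range(horizon - dur + 1):
            var[(task, start)] = len(var) + 1

    clauses = []
    for task, dur in durations.items():
        starts = [var[(task, s)] for s in range(horizon - dur + 1)]
        clauses.append(starts)  # the task starts at least once
        # ... and at most once (pairwise exclusion)
        clauses.extend([-a, -b] for a, b in combinations(starts, 2))

    # Dependency: `after` must not start before `before` has finished.
    for before, after in precedences:
        for sb in range(horizon - durations[before] + 1):
            for sa in range(horizon - durations[after] + 1):
                if sa < sb + durations[before]:
                    clauses.append([-var[(before, sb)], -var[(after, sa)]])

    # Single machine: two tasks may not occupy the same time slot.
    for a, b in combinations(durations, 2):
        for sa in range(horizon - durations[a] + 1):
            for sb in range(horizon - durations[b] + 1):
                if sa < sb + durations[b] and sb < sa + durations[a]:
                    clauses.append([-var[(a, sa)], -var[(b, sb)]])
    return var, clauses


def solve(var, clauses):
    """Exhaustive search over assignments; a stand-in for a real SAT solver."""
    for bits in product([False, True], repeat=len(var)):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            # Decode the satisfying assignment into task -> start slot.
            return {task: start for (task, start), v in var.items() if bits[v - 1]}
    return None  # unsatisfiable within the given horizon


durations = {"map": 2, "shuffle": 1, "reduce": 1}
precedences = [("map", "shuffle"), ("shuffle", "reduce")]
schedule = solve(*encode(durations, precedences, horizon=4))
print(schedule)
```

Even in this toy encoding, the number of clauses grows quickly with the number of tasks and time slots, which is the kind of cost that a more compact encoding aims to reduce. A practical scheduler would hand the clauses to a dedicated SAT solver rather than enumerate assignments.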



Big data--Social aspects, Computational complexity--Data processing, Computer software, Data integration (Computer science), Production scheduling, Computer multitasking, Combinatorial optimization, Problem solving, Electronic data processing, Job shop scheduling problems, Performance requirements, Scheduling performance, Task-scheduling, Big data, Apache Hadoop YARN



©2018 IEEE