Posted on 2024-10-31, 20:11, authored by Reza Hoseiny Farahabady, Hamid Dehghani Samani, Yidan Wang, Albert Zomaya, Zahir Tari
Apache Storm has recently emerged as an attractive fault-tolerant, open-source distributed data processing platform, chosen by many industry leaders to develop real-time applications that process huge amounts of data in a scalable manner. A key aspect of achieving the best performance in this system lies in the design of an efficient scheduler that places the components of an application, called a topology, on the available computing resources. To cope with workload fluctuations, we propose an advanced scheduler for Apache Storm that improves performance under highly dynamic conditions. While enforcing the required Quality-of-Service (QoS) of individual data streams, the controller allocates computing resources based on decisions that consider the future states of non-controllable disturbance parameters, e.g. the tuple arrival rate or the resource utilization of each worker node. The performance evaluation compares the proposed solution with two well-known alternatives, namely Storm's default scheduler and the best-effort approach (i.e. a heuristic based on the first-fit decreasing approximation algorithm). Experimental results show that the proposed controller increases overall resource utilization by 31% on average compared to the two other solutions, without a significant negative impact on the QoS enforcement level.
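To make the best-effort baseline more concrete, here is a minimal sketch of generic first-fit decreasing placement. It is not the proposed controller and not the authors' implementation. It assumes that each executor's demand is a single CPU number, that all worker nodes are identical with one fixed capacity, and that the function name `first_fit_decreasing` and the executor names in the usage are hypothetical.

```python
from typing import Dict, List, Tuple


def first_fit_decreasing(
    demands: Dict[str, float], capacity: float, num_nodes: int
) -> Tuple[Dict[str, int], List[float]]:
    """Place executors on worker nodes with first-fit decreasing (FFD).

    demands:   hypothetical per-executor CPU demand (e.g. percent of one core).
    capacity:  hypothetical CPU capacity of every (identical) worker node.
    num_nodes: number of worker nodes available.
    Returns a mapping executor -> node index and the final load of each node.
    """
    loads = [0.0] * num_nodes
    placement: Dict[str, int] = {}
    # "Decreasing": consider the largest demands first (ties keep input order).
    for name, demand in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        # "First fit": use the lowest-indexed node that still has enough room.
        for node, load in enumerate(loads):
            if load + demand <= capacity:
                loads[node] += demand
                placement[name] = node
                break
        else:
            raise ValueError(f"executor {name!r} does not fit on any node")
    return placement, loads


if __name__ == "__main__":
    # Hypothetical executors of a word-count style topology.
    demands = {"spout-0": 50, "split-0": 70, "split-1": 70, "count-0": 40, "count-1": 30}
    placement, loads = first_fit_decreasing(demands, capacity=100, num_nodes=3)
    print(placement)  # {'split-0': 0, 'split-1': 1, 'spout-0': 2, 'count-0': 2, 'count-1': 0}
    print(loads)      # [100.0, 70.0, 90.0]
```

As sketched here, the heuristic works only from the demands it is given at placement time and does not anticipate future values such as the tuple arrival rate, which is the aspect the proposed controller is described as taking into account.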
Funding
Energy-Efficient Computing: Expanding the Role of Scheduling in Cloud Data Centres