RMIT University

A Dynamic Resource Controller for Resolving Quality of Service Issues in Modern Streaming Processing Engines

conference contribution
posted on 2024-11-03, 13:53 authored by M.Reza HoseinyFarahabady, Javid Taheri, Albert Zomaya, Zahir Tari
Devising an elastic resource allocation controller for data analytical applications in virtualized data centers has recently received great attention, mainly because even a slight performance improvement can translate into huge monetary savings in practical large-scale execution. Apache Flink is among the modern stream processing run-times that provide both low-latency and high-throughput computation for executing processing pipelines over high-volume, high-velocity data items under tight latency constraints. However, an as-yet-unanswered challenge in large-scale platforms with tens of worker nodes is how to resolve run-time violations of quality of service (QoS) levels in multi-tenant data streaming platforms, particularly when the workload generated by different users fluctuates. Studies have shown that a static resource allocation algorithm (round-robin), which Apache Flink uses by default, lacks responsiveness to sudden traffic surges that occur unpredictably at run-time. In this paper, we address the problem of resource management in a Flink platform to enforce different QoS levels on shared computing resources. The proposed solution applies theoretical principles borrowed from closed-loop control theory to design a CPU and memory adjustment mechanism. Its primary goal is to fulfill the different QoS levels requested by submitted applications, and it treats resource interference as the critical performance-limiting factor. The performance evaluation compares the proposed resource allocation mechanism with two static heuristics (round-robin and class-based weighted fair queuing) on an 80-core cluster. It uses multiple traffic patterns that emulate sudden changes in the incoming workloads of low-priority streaming applications.
The experimental results confirm the stability of the proposed controller in regulating the underlying platform resources so that they smoothly follow the target values (QoS violation rates). In particular, the proposed solution achieves higher efficiency than the other heuristics. It reduces the response time of high-priority applications by 53% while maintaining the enforced QoS levels during burst traffic periods.
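To illustrate what a closed-loop resource controller of this kind does, the sketch below shows a generic discrete-time proportional-integral (PI) feedback loop. The loop adjusts one application's CPU share so that its measured QoS violation rate tracks a target value. This record does not give the paper's actual controller design, so the PI controller here is a swapped-in textbook technique, not the authors' method. The gains, bounds, the 0.05 target, the toy `violation_rate` plant model and the workload burst are all hypothetical. Only the 80-core cluster size is taken from the abstract.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Generic discrete-time PI controller (hypothetical gains and bounds)."""
    kp: float
    ki: float
    u_min: float
    u_max: float
    integral: float = 0.0

    def step(self, target: float, measured: float) -> float:
        # Positive error: the observed QoS violation rate is above its target,
        # so the controlled application should receive a larger CPU share.
        error = measured - target
        self.integral += error
        u = self.kp * error + self.ki * self.integral
        if u > self.u_max or u < self.u_min:
            # Simple anti-windup: discard this step's integral update when saturated.
            self.integral -= error
            u = min(max(u, self.u_min), self.u_max)
        return u


def violation_rate(share: float, cores: int, demand: float) -> float:
    """Toy plant model (hypothetical): fraction of work that misses its deadline
    when `demand` cores' worth of work is given `share * cores` cores."""
    return max(0.0, 1.0 - share * cores / demand)


def simulate(steps: int = 60, target: float = 0.05, cores: int = 80):
    """Run the feedback loop through a hypothetical burst where demand doubles."""
    ctrl = PIController(kp=0.05, ki=0.15, u_min=0.05, u_max=1.0)
    share = 0.1  # hypothetical initial CPU share
    violations, shares = [], []
    for k in range(steps):
        demand = 20.0 if k < steps // 2 else 40.0  # burst halfway through the run
        measured = violation_rate(share, cores, demand)  # sense
        share = ctrl.step(target, measured)              # decide and actuate
        violations.append(measured)
        shares.append(share)
    return violations, shares


if __name__ == "__main__":
    violations, shares = simulate()
    print(f"final violation rate {violations[-1]:.3f}, CPU share {shares[-1]:.3f}")
```

In this toy run, the violation rate jumps when demand doubles. The integral term then raises the CPU share until the measured rate settles back near the target, which is the "smoothly follow the target values" behavior that a static round-robin allocation cannot provide.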

Funding

A Unified Framework for Resource Management in Edge-Cloud Data Centres

Australian Research Council


History

Number

9306697

Start page

1

End page

8

Total pages

8

Outlet

IEEE 19th International Symposium on Network Computing and Applications, NCA 2020

Editors

A. Gkoulalas-Divanis, M. Marchetti, D.R. Avresky

Name of conference

IEEE 19th International Symposium on Network Computing and Applications, NCA 2020

Publisher

Institute of Electrical and Electronics Engineers

Place published

United States

Start date

2020-11-24

End date

2020-11-27

Language

English

Copyright

© 2020 IEEE.

Former Identifier

2006106222

Esploro creation date

2021-05-22
