posted on 2025-07-18, 04:30 authored by Veronika Stephanie
<p dir="ltr">The rapid growth of the Internet of Things (IoT) and Artificial Intelligence (AI) has transformed industries by enabling efficient data analytics and intelligent decision-making. Among AI techniques, deep learning models often require large datasets for effective training. However, this poses challenges in privacy-sensitive domains such as healthcare and government due to strict data-sharing regulations. Limited access to high-quality data reduces the efficacy and generalization of AI models, making collaborative learning techniques such as Federated Learning (FL) and Split Learning (SL) essential. These approaches enable multiple entities to train models without sharing raw data, preserving privacy while improving AI model performance. As a result, collaborative learning has gained popularity across industries, but significant challenges remain.</p><p dir="ltr">To begin with, privacy concerns persist even with the use of collaborative techniques. Although sharing model parameters replaces direct data sharing, the parameters can still leak information about the underlying data, exposing participants to data reconstruction or information extraction by adversaries. To mitigate these risks, we propose a differential privacy-based, privacy-preserving collaborative learning model for resource-constrained IoT devices. The proposed method perturbs the locally trained model to prevent adversaries from inferring the data used during training.</p><p dir="ltr">Moreover, ensuring data integrity during exchange processes in collaborative learning is crucial. Exchanged data, including model parameters, are vulnerable to alteration during transmission over networks, which could compromise the accuracy and reliability of collaborative learning outcomes. To address this, we develop a blockchain-based, privacy-preserving IoT data analysis model for heterogeneous collaborative learning. 
In this work, blockchain technology is introduced to mitigate data tampering during the exchange of trained models. Additionally, a model cross-validation process among participants is implemented to ensure the equitable contribution of each participant's model. The blockchain also protects the integrity of the cross-validation results.</p><p dir="ltr">In addition, resource heterogeneity among edge devices in IoT systems complicates collaborative learning. Variations in computing resources and network bandwidth between devices may cause delays or even prevent model convergence, thereby reducing overall learning efficiency. In response to this challenge, we propose weight-based asynchronous model aggregation techniques for collaborative learning. These techniques enable more efficient learning by ensuring that slow or stalled devices do not hold back model convergence, leading to improved model performance.</p><p dir="ltr">Finally, data heterogeneity poses significant challenges to collaborative learning frameworks, particularly in IoT data analytics scenarios. Imbalances in data distribution, where certain entities possess disproportionate amounts of specific data types, can skew model predictions and hinder generalization. To address this challenge, we introduce clustering-based collaborative learning techniques tailored for non-independent and identically distributed (non-IID) data. We redefine the objective function in collaborative learning for non-IID scenarios, focusing on minimizing data disparity among participants. By clustering clients according to data statistics, we aim to optimize this new objective function.</p><p dir="ltr">Overall, this thesis examines collaborative learning through four key challenges: privacy preservation, data integrity, resource heterogeneity, and data imbalance. 
To address these challenges, we propose algorithms leveraging privacy preservation techniques, blockchain-based validation, asynchronous AI model aggregation, and clustering techniques. We believe that this research represents a significant step towards building a more secure, robust, and efficient collaborative learning framework in decentralized and resource-constrained settings.</p>