Version 2: 2024-08-07, 02:35
Version 1: 2024-08-07, 02:33
Thesis
Posted on 2024-08-07, 02:35, authored by Futoon Abu Shaqra
Rapid technological advances continuously generate vast amounts of real-world temporal data, which play a vital role in IoT and smart city applications. Despite its promising capabilities, time series analysis faces significant challenges when applied to complex real-world systems such as healthcare or environmental monitoring. These systems produce irregular, high-dimensional, heterogeneous, and non-stationary data, which poses difficulties for traditional machine learning models. Such models struggle to capture the intricate relationships in this data without extensive preprocessing, which consumes time and resources and distorts temporal dependencies, ultimately reducing prediction accuracy and delaying access to information. Our research tackles this challenge by developing efficient modelling processes that seamlessly handle irregular, sporadic, high-dimensional, and heterogeneous time series data. Our goal is to advance temporal learning for modelling inconsistent multi-source time series in real-world applications while eliminating unnecessary preprocessing steps.
This thesis makes significant contributions towards effectively learning and modelling complex data. The primary objective is to bridge the gap between time series models and real-world big data, characterised by high variety, volume, and velocity. These attributes present various challenges in data modelling, particularly because real-world data is typically collected in irregular environments from diverse sources. Consequently, this work focuses on three primary areas to enhance the modelling process for real-world datasets. First, we focus on modelling heterogeneous multi-source time series, introducing Parallelised Irregularity Encoders for Forecasting with Heterogeneous Time Series (PIETS) and Parallelised Irregularity Encoders for Multi-step Forecasting with Heterogeneous Time Series (PIETS+). These novel frameworks are specifically designed to tackle the complexities of multi-source time series analysis, a significant challenge in real-world applications. Traditionally, multi-source time series have been fused either through ensemble learning models, which overlook temporal patterns and correlations within features, or by defining a fixed-size window to select specific parts of the datasets. Our proposed models, PIETS and PIETS+, demonstrate enhanced predictive capabilities for both one-step and multi-step forecasting while effectively modelling heterogeneous time series. The proposed work addresses key challenges of multi-source time series data, including (1) heterogeneity and irregularity, (2) information inconsistency, and (3) highly variable dimensions. By leveraging information from diverse data sources, our models not only capture the complexity of temporal data more effectively but also accelerate the convergence of the training process.
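The general idea of giving each source its own encoder, so that sources with different sampling times and feature dimensions are mapped to a common size before fusion, can be sketched as follows. This is a minimal illustration under stated assumptions, not the PIETS or PIETS+ architecture: the names `encode_source` and `fuse`, the exponential time-decay pooling, and the fixed projection matrices are hypothetical stand-ins for learned encoders.

```python
import math
from typing import List, Sequence, Tuple

# One observation from a source: (timestamp, feature vector of that source's own dimension).
Observation = Tuple[float, List[float]]


def encode_source(obs: Sequence[Observation], t_query: float,
                  proj: List[List[float]], decay: float = 0.5) -> List[float]:
    """Summarise one irregularly sampled source into a fixed-size embedding.

    Hypothetical encoder: observations up to t_query are weighted by
    exp(-decay * (t_query - t)), so recent readings count more however unevenly
    they were sampled. `proj` has one row per output unit and one column per
    input feature, mapping the source's own dimension to a shared size.
    """
    past = [(t, x) for t, x in obs if t <= t_query]  # ignore future readings
    if not past:
        return [0.0] * len(proj)  # a silent source contributes a zero embedding
    weights = [math.exp(-decay * (t_query - t)) for t, _ in past]
    total = sum(weights)
    dim_in = len(past[0][1])
    pooled = [sum(w * x[j] for w, (_, x) in zip(weights, past)) / total
              for j in range(dim_in)]
    return [sum(row[j] * pooled[j] for j in range(dim_in)) for row in proj]


def fuse(sources: Sequence[Sequence[Observation]], t_query: float,
         projections: Sequence[List[List[float]]]) -> List[float]:
    """Encode every source independently, then concatenate the embeddings."""
    fused: List[float] = []
    for obs, proj in zip(sources, projections):
        fused.extend(encode_source(obs, t_query, proj))
    return fused
```

Because each source is encoded independently, the per-source calls could run in parallel, and adding a source with a new feature dimension only requires a new projection rather than resampling or padding the other sources onto a shared grid.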
Next, we delve deeper into the challenge of irregularity, focusing on highly sporadic time series with consecutive unobserved values. Irregular time series, characterised by non-uniform intervals between observations that lead to sporadic sequences, have recently been addressed using neural ordinary differential equations (ODEs). In our research, we investigate the performance of ODE models on time series data with varying levels of sparsity. Building on this examination, we introduce SeqLink, a robust neural ODE architecture for modelling partially observed time series, designed to enhance the robustness of sequence representation. Unlike traditional approaches that rely solely on the hidden state generated from the last observed value, SeqLink leverages ODE latent representations derived from multiple data points, enabling it to generate robust data representations regardless of sequence length or data sparsity. Our model maintains robust continuous representations even over long timescales.
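To make the contrast between a last-hidden-state representation and one drawn from multiple latent states concrete, the sketch below evolves a latent state with an ODE across irregular gaps (an ODE-RNN-style encoder) and returns both the final state and the mean of the states at every observation. It assumes hypothetical, untrained parameters `A` and `W_IN`, a fixed-step RK4 solver, and mean pooling as a simple stand-in; it does not reproduce SeqLink's actual mechanism.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]

# Hypothetical, untrained parameters chosen only for illustration.
A = [[-0.1, 1.0], [-1.0, -0.1]]  # latent dynamics matrix (slowly decaying rotation)
W_IN = [0.8, -0.5]               # observation-to-latent weights


def dynamics(h: Vector) -> Vector:
    """Latent vector field f(h) = tanh(A h), applied between observations."""
    return [math.tanh(sum(A[i][j] * h[j] for j in range(len(h)))) for i in range(len(h))]


def rk4(f: Callable[[Vector], Vector], h: Vector, t0: float, t1: float,
        max_step: float = 0.05) -> Vector:
    """Integrate dh/dt = f(h) from t0 to t1 with fixed-step fourth-order Runge-Kutta."""
    if t1 <= t0:
        return h
    n = math.ceil((t1 - t0) / max_step)
    dt = (t1 - t0) / n
    for _ in range(n):
        k1 = f(h)
        k2 = f([x + 0.5 * dt * k for x, k in zip(h, k1)])
        k3 = f([x + 0.5 * dt * k for x, k in zip(h, k2)])
        k4 = f([x + dt * k for x, k in zip(h, k3)])
        h = [x + dt / 6 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(h, k1, k2, k3, k4)]
    return h


def observe(h: Vector, x: float) -> Vector:
    """Fold a scalar observation into the latent state with a simple tanh update."""
    return [math.tanh(hi + wi * x) for hi, wi in zip(h, W_IN)]


def encode(seq: Sequence[Tuple[float, float]]) -> Tuple[Vector, Vector]:
    """Return (last latent state, mean of the latent states at every observation)."""
    h = [0.0, 0.0]
    t_prev = seq[0][0]
    states: List[Vector] = []
    for t, x in seq:
        h = rk4(dynamics, h, t_prev, t)  # evolve continuously across the gap
        h = observe(h, x)                # incorporate the new observation
        states.append(h)
        t_prev = t
    pooled = [sum(s[i] for s in states) / len(states) for i in range(len(h))]
    return states[-1], pooled
```

In this toy, the last state carries information about earlier observations only through the decaying dynamics, so after long gaps it is dominated by the most recent observation, whereas the pooled vector keeps a direct contribution from every observation.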
Finally, we extend our contribution to irregular streaming time series and continual learning. Because real-world applications involve a continuous flow of data, exploiting this information in real time is crucial. Existing studies on continual learning often propose models that require buffering lengthy sequences, which can impede the responsiveness of the inference system. Furthermore, these models are typically tailored to regularly sampled sequences, an assumption that is often unrealistic in real-world scenarios. To address these challenges, we introduce ODEStream: a Buffer-Free Online Learning Framework with an ODE-based Adaptor for Streaming Time Series Forecasting. This novel approach harnesses neural ODEs, allowing ODEStream to adapt to data irregularity and concept drift without relying on complex frameworks. Concept drift refers to the phenomenon in which the statistical properties of the target variable, which the model aims to predict, change over time due to factors such as shifts in the underlying data distribution or evolving relationships between input and output variables. Our method mitigates performance degradation over time by learning how the dynamics of the sequence change, providing a streamlined solution for real-time analysis.
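A buffer-free update loop can be sketched with a deliberately tiny model: between observations the forecast follows the mean-reverting ODE dx/dt = theta * (mu - x), whose closed-form solution handles any irregular gap, and the level `mu` is updated with one gradient step per arriving observation, which is then discarded. This is a toy under stated assumptions, not ODEStream; the class `OnlineODEForecaster` and its parameters `theta`, `mu`, and `lr` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class OnlineODEForecaster:
    """Toy buffer-free forecaster (hypothetical, for illustration only).

    Between observations the prediction follows dx/dt = theta * (mu - x),
    solved in closed form, so irregular gaps need no resampling. Only the
    most recent observation is kept in memory.
    """
    theta: float = 1.0  # fixed reversion rate (assumed, not learned here)
    mu: float = 0.0     # level learned online
    lr: float = 0.1     # learning rate for the single SGD step per point
    last_t: Optional[float] = None
    last_x: Optional[float] = None

    def _decay(self, t: float) -> float:
        """exp(-theta * gap): how much of the last observation survives to time t."""
        return math.exp(-self.theta * (t - self.last_t))

    def predict(self, t: float) -> float:
        """Closed-form ODE solution from the last observation to time t."""
        if self.last_t is None:
            return self.mu
        return self.mu + (self.last_x - self.mu) * self._decay(t)

    def update(self, t: float, x: float) -> float:
        """Forecast x at time t, take one SGD step on the squared error, then drop x."""
        pred = self.predict(t)
        err = pred - x
        # d(pred)/d(mu) is 1 for the first point and (1 - decay) afterwards.
        dpred_dmu = 1.0 if self.last_t is None else 1.0 - self._decay(t)
        self.mu -= self.lr * 2.0 * err * dpred_dmu
        self.last_t, self.last_x = t, x  # constant memory: no sequence buffer
        return pred
```

Feeding it a stream whose level shifts from 0 to 5 at irregular timestamps (for example, gaps alternating between 0.5 and 1.5) shows `mu` moving towards the new level after the shift, a minimal form of adaptation to concept drift, while memory use stays constant because only the latest observation is retained.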
The proposed methods were evaluated on a range of important tasks and real-world benchmark datasets, and compared against recent state-of-the-art approaches in the field.