RMIT University

A framework for automated anomaly detection in high frequency water-quality data from in situ sensors

journal contribution
posted on 2024-11-02, 12:16 authored by Catherine Leigh, Omar Alsibai, Rob Hyndman, Sevvandi Kandanaarachchi, Olivia King, James McGree, Catherine Neelamraju, Jennifer Strauss, Priyanga Talagala, Ryan Turner, Kerrie Mengersen, Erin Peterson
Monitoring the water quality of rivers is increasingly conducted using automated in situ sensors, enabling timelier identification of unexpected values or trends. However, the data are confounded by anomalies caused by technical issues, for which the volume and velocity of data preclude manual detection. We present a framework for automated anomaly detection in high-frequency water-quality data from in situ sensors, using turbidity, conductivity and river level data collected from rivers flowing into the Great Barrier Reef. After identifying end-user needs and defining anomalies, we ranked anomaly importance and selected suitable detection methods. High priority anomalies included sudden isolated spikes and level shifts, most of which were classified correctly by regression-based methods such as autoregressive integrated moving average models. However, incorporation of multiple water-quality variables as covariates reduced performance due to complex relationships among variables. Classifications of drift and periods of anomalously low or high variability were more often correct when we applied mitigation, which replaces anomalous measurements with forecasts for further forecasting, but this inflated false positive rates. Feature-based methods also performed well on high priority anomalies and were similarly less proficient at detecting lower priority anomalies, resulting in high false negative rates. Unlike regression-based methods, however, all feature-based methods produced low false positive rates and have the benefit of not requiring training or optimization. Rule-based methods successfully detected a subset of lower priority anomalies, specifically impossible values and missing observations. We therefore suggest that a combination of methods will provide optimal performance in terms of correct anomaly detection, whilst minimizing false detection rates. Furthermore, our framework emphasizes the importance of communication between end-users and anomaly detection
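The rule-based checks and the mitigation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the valid range and the threshold are hypothetical, and a naive "last accepted value" forecast stands in for the autoregressive integrated moving average models named in the abstract.

```python
from typing import List, Optional, Sequence


def rule_based_flags(values: Sequence[Optional[float]],
                     lower: float, upper: float) -> List[str]:
    """Label each observation as 'missing', 'impossible' or 'ok'.

    `lower` and `upper` are assumed sensor limits (hypothetical values),
    not limits taken from the paper.
    """
    flags = []
    for v in values:
        if v is None:
            flags.append("missing")
        elif v < lower or v > upper:
            flags.append("impossible")
        else:
            flags.append("ok")
    return flags


def forecast_flags(values: Sequence[float], threshold: float,
                   mitigate: bool = True) -> List[bool]:
    """Flag observations that differ from a one-step forecast by more
    than `threshold`.

    The forecast here is simply the last accepted value, a stand-in
    for a fitted time-series model. With `mitigate=True`, an anomalous
    observation is replaced by its forecast before the next forecast
    is made, as the abstract describes for mitigation.
    """
    flags = [False]          # no forecast is available for the first point
    last = values[0]
    for obs in values[1:]:
        forecast = last
        is_anomaly = abs(obs - forecast) > threshold
        flags.append(is_anomaly)
        # Mitigation: carry the forecast forward instead of the anomaly.
        last = forecast if (is_anomaly and mitigate) else obs
    return flags


if __name__ == "__main__":
    print(rule_based_flags([1.0, None, -3.0, 2000.0, 5.0], 0.0, 1000.0))
    spike = [10.0, 11.0, 50.0, 12.0, 11.0]
    print(forecast_flags(spike, threshold=5.0, mitigate=True))
    print(forecast_flags(spike, threshold=5.0, mitigate=False))
```

In this toy series, mitigation keeps the isolated spike from contaminating the next forecast, so only the spike itself is flagged. After a sustained shift, the same replacement rule would keep flagging every later observation, which is one way a mitigation step can raise false positive rates.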

Journal

Science of the Total Environment

Volume

664

Start page

885

End page

898

Total pages

14

Publisher

Elsevier BV

Place published

Netherlands

Language

English

Copyright

© 2019 Elsevier B.V. All rights reserved.

Former Identifier

2006096990

Esploro creation date

2020-06-22

Fedora creation date

2020-04-20
