RMIT University

Augmented reality, deep learning and vision-language query system for construction worker safety

journal contribution
posted on 2024-11-03, 11:22 authored by Haosen Chen, Lei Hou, Shaoze Wu, Guomin Zhang, Yang Zhou, Sungkon Moon, Muhammed Bhuiyan
Low situational awareness contributes to safety incidents in construction. Existing Deep Learning (DL)-based applications lack the capability to provide context-specific and interactive feedback that is essential for workers to fully understand their surrounding environments. This paper proposes the Visual Construction Safety Query (VCSQ) system. The system encompasses real-time Image Captioning (IC), safety-centric Visual Question Answering (VQA), and keyword-based Image-Text Retrieval (ITR), integrated with head-mounted Augmented Reality (AR) devices. System validation includes benchmarks and real-world images. The ITR module posted high recall rates of 0.801 and 0.835 for Recall@5 and @10. The VQA module achieved an 89.7% accuracy rate, and the IC module had a SPICE score of 0.449. Feasibility tests and surveys confirmed the system's practical advantages in different construction scenarios. This study establishes an integration roadmap adaptable to future advancements in interactive DL and immersive AR.
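For readers unfamiliar with the Recall@5 and Recall@10 figures quoted above, the following is a minimal sketch of how Recall@K is commonly computed for retrieval tasks. It assumes the usual convention that a query counts as a hit when at least one ground-truth item appears among the top K results; the abstract does not state the paper's exact evaluation protocol, and the function names and toy data here are hypothetical.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Return 1.0 if any relevant item is in the top-k ranked results, else 0.0."""
    top_k = ranked_ids[:k]
    return 1.0 if any(item in relevant_ids for item in top_k) else 0.0


def mean_recall_at_k(queries, k):
    """Average the per-query hit indicator over all queries."""
    hits = [recall_at_k(ranked, relevant, k) for ranked, relevant in queries]
    return sum(hits) / len(hits)


# Hypothetical toy data: (ranked retrieval results, set of ground-truth matches).
queries = [
    (["img3", "img7", "img1"], {"img1"}),  # match found at rank 3
    (["img2", "img5", "img9"], {"img4"}),  # no match retrieved
]

print(mean_recall_at_k(queries, 1))
print(mean_recall_at_k(queries, 3))
```

Under this convention, a larger K can only keep or raise the score, which is consistent with the reported Recall@10 being higher than Recall@5.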

Related Materials

  1. DOI - Is published in 10.1016/j.autcon.2023.105158
  2. ISSN - Is published in 09265805

Journal

Automation in Construction

Volume

157

Number

105158

Start page

1

End page

15

Total pages

15

Publisher

Elsevier BV

Place published

Netherlands

Language

English

Copyright

© 2023 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Former Identifier

2006126706

Esploro creation date

2023-12-09
