posted on 2024-11-02, 17:54, authored by Caixia Yan, Qinghua Zheng, Xiaojun Chang, Minnan Luo, Chung-Hsing Yeh, Alex Hauptmann
Most existing object detection models are restricted to detecting objects from previously seen categories, an approach that tends to become infeasible for rare or novel concepts. Accordingly, in this paper, we explore object detection in the context of zero-shot learning, i.e., Zero-Shot Object Detection (ZSD), to concurrently recognize and localize objects from novel concepts. Existing ZSD algorithms are typically based on a strict mapping-transfer strategy that suffers from a significant visual-semantic gap. To bridge the gap, we propose a novel Semantics-Preserving Graph Propagation model for ZSD based on Graph Convolutional Networks (GCN). More specifically, we develop a graph construction module to flexibly build category graphs by leveraging diverse correlations between category nodes; this is followed by two semantics-preserving graph propagation modules that enhance both category and region representations. Benefiting from the multi-step graph propagation process, both the semantic description and structural knowledge exhibited in prior category graphs can be effectively leveraged to boost the generalization capability of the learned projection function. Experiments on existing seen/unseen splits of three popular object detection datasets demonstrate that the proposed approach performs favorably against state-of-the-art ZSD methods.
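To make the idea of propagating information over a category graph concrete, here is a minimal sketch in plain Python. It is not the authors' model: the abstract does not specify how the graph is built or how the propagation layers are parameterized. The sketch assumes edges are drawn by thresholding the cosine similarity between category embeddings, uses the standard symmetric GCN normalization with self-loops, and omits learned weights and nonlinearities. The function names, the threshold, and the toy two-dimensional embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_category_graph(embeddings, threshold=0.5):
    """Adjacency matrix with self-loops.

    Assumption: two categories are linked when the cosine similarity of
    their embeddings reaches the threshold. The paper may use other cues.
    """
    n = len(embeddings)
    return [
        [
            1.0 if i == j or cosine(embeddings[i], embeddings[j]) >= threshold else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}, as in a standard GCN."""
    deg = [sum(row) for row in adj]  # never zero because of the self-loops
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def propagate(adj_norm, features, steps=2):
    """Multi-step propagation: each step mixes a node with its neighbors.

    No learned weight matrices or activations are applied here.
    """
    h = features
    n, dim = len(h), len(h[0])
    for _ in range(steps):
        h = [
            [sum(adj_norm[i][k] * h[k][d] for k in range(n)) for d in range(dim)]
            for i in range(n)
        ]
    return h


# Toy, made-up embeddings for three categories: two similar, one unrelated.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
adj = build_category_graph(embeddings)
h = propagate(normalize(adj), embeddings, steps=2)
```

In this toy setup the first two categories are connected and end up sharing an averaged representation, while the third, which has no neighbors, keeps its original embedding. This is the sense in which structural knowledge in a category graph can carry information from seen to related unseen categories.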