RMIT University

Mining Inter-Video Proposal Relations for Video Object Detection

conference contribution
posted on 2024-11-03, 14:47 authored by Mingfei Han, Yali Wang, Xiaojun Chang, Yu Qiao
Recent studies have shown that aggregating context from proposals in different frames can clearly enhance the performance of video object detection. However, these approaches mainly exploit intra-video proposal relations, i.e., relations within a single video, while ignoring inter-video proposal relations among different videos, which can provide important discriminative cues for recognizing confusing objects. To address this limitation, we propose a novel Inter-Video Proposal Relation module. Based on a concise multi-level triplet selection scheme, this module learns effective object representations by modeling relations among hard proposals from different videos. Moreover, we design a Hierarchical Video Relation Network (HVR-Net), which integrates intra-video and inter-video proposal relations in a hierarchical fashion. This design progressively exploits both intra- and inter-video contexts to boost video object detection. We evaluate our method on the large-scale video object detection benchmark ImageNet VID, where HVR-Net achieves state-of-the-art results. Code and models are available at https://github.com/youthHan/HVRNet.
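To make the hierarchical design concrete, below is a minimal PyTorch sketch of how intra-video and inter-video proposal relations could be chained. This is an illustrative assumption, not the released HVR-Net implementation (see the repository linked above); the module names, feature dimension, attention mechanism, and the video-triplet layout are hypothetical choices made only for the sketch.

```python
# Minimal sketch of hierarchical intra-/inter-video proposal relations.
# NOT the authors' code (see https://github.com/youthHan/HVRNet); names,
# dimensions and the triplet layout are illustrative assumptions only.
import torch
import torch.nn as nn


class ProposalRelation(nn.Module):
    """Relation module: each query proposal attends to a set of support proposals."""

    def __init__(self, dim=1024, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query, support):
        # query:   (B, Nq, dim) proposal features to be enhanced
        # support: (B, Ns, dim) proposal features providing context
        out, _ = self.attn(query, support, support)
        return self.norm(query + out)  # residual aggregation


class HierarchicalRelation(nn.Module):
    """Intra-video relation first, then inter-video relation over a video triplet."""

    def __init__(self, dim=1024, heads=8):
        super().__init__()
        self.intra = ProposalRelation(dim, heads)
        self.inter = ProposalRelation(dim, heads)

    def forward(self, proposals):
        # proposals: (V, N, dim) -- V videos in a triplet (e.g. anchor,
        # positive, hard negative), N proposals sampled per video.
        V, N, D = proposals.shape
        # 1) intra-video: proposals attend to proposals of the same video
        intra = self.intra(proposals, proposals)                # (V, N, dim)
        # 2) inter-video: proposals of each video attend to the pooled
        #    proposals of all videos in the triplet
        pooled = intra.reshape(1, V * N, D).expand(V, -1, -1)   # (V, V*N, dim)
        return self.inter(intra, pooled)                        # (V, N, dim)


if __name__ == "__main__":
    feats = torch.randn(3, 16, 1024)        # triplet of videos, 16 proposals each
    enhanced = HierarchicalRelation()(feats)
    print(enhanced.shape)                    # torch.Size([3, 16, 1024])
```

In this sketch the intra-video step plays the role of the usual within-video context aggregation, while the inter-video step exposes each proposal to hard proposals from the other videos in the triplet, mirroring the progressive intra-then-inter aggregation described in the abstract.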

History

Start page

431

End page

446

Total pages

16

Outlet

Proceedings of the 16th European Conference on Computer Vision (ECCV 2020)

Editors

Andrea Vedaldi, Horst Bischof, Thomas Brox, Jan-Michael Frahm

Name of conference

ECCV 2020 (LNCS, vol. 12366)

Publisher

Springer

Place published

Switzerland

Start date

2020-08-23

End date

2020-08-28

Language

English

Copyright

© Springer Nature Switzerland AG 2020

Former Identifier

2006109342

Esploro creation date

2021-08-28
