RMIT University
ZSTAD: Zero-Shot Temporal Activity Detection

conference contribution
posted on 2024-11-03, 14:36 authored by Lingling Zhang, Xiaojun Chang, Jun Liu, Minnan Luo, Sen Wang, Zongyuan Ge, Alexander Hauptmann
An integral part of video analysis and surveillance is temporal activity detection, which simultaneously recognizes and localizes activities in long untrimmed videos. The most effective current methods for temporal activity detection are based on deep learning, and they typically perform very well when large-scale annotated videos are available for training. In real applications, however, these methods are limited by the lack of training videos for certain activity classes and by time-consuming data annotation. To address this challenging problem, we propose a novel task setting called zero-shot temporal activity detection (ZSTAD), in which activities never seen during training can still be detected. We design an end-to-end deep network based on R-C3D as the architecture for this solution. The proposed network is optimized with an innovative loss function that considers the embeddings of activity labels and their superclasses while learning the common semantics of seen and unseen activities. Experiments on both the THUMOS'14 and Charades datasets show promising performance in detecting unseen activities.
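The core zero-shot idea described above is that a detected segment's visual feature, once projected into a semantic label-embedding space, can be matched against embeddings of classes never seen in training. The minimal sketch below illustrates that matching step only; the function names and the cosine-similarity scoring are illustrative assumptions, not the paper's actual R-C3D architecture or its superclass-aware loss.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def zero_shot_predict(segment_feat, label_embs):
    # Score a (projected) segment feature against every activity-label
    # embedding and return the index of the best match. The winning
    # class may be one never seen in training, since only its semantic
    # embedding is required at test time -- the essence of ZSTAD.
    scores = [cosine(segment_feat, emb) for emb in label_embs]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy example: three activity labels embedded in a 4-d semantic space.
labels = [[1.0, 0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0, 0.0],
          [0.5, 0.5, 0.0, 0.0]]
feature = [0.9, 0.1, 0.0, 0.0]
print(zero_shot_predict(feature, labels))  # -> 0
```

In the full method, both the projection of visual features and the label-embedding space are learned jointly, with superclass embeddings encouraging seen and unseen activities to share common semantics.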

Funding

Towards data-efficient future action prediction in the wild

Australian Research Council

History

Start page

876

End page

885

Total pages

10

Outlet

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020)

Name of conference

CVPR 2020

Publisher

IEEE

Place published

United States

Start date

2020-06-14

End date

2020-06-19

Language

English

Copyright

© 2020 IEEE.

Former Identifier

2006109335

Esploro creation date

2021-08-28
