RMIT University

Annotation efficient cross-modal retrieval with adversarial attentive alignment

conference contribution
Posted on 2024-11-03, 14:46; authored by Po-Yao Huang, Guoliang Kang, Wenhe Liu, Xiaojun Chang, Alexander Hauptmann
Visual-semantic embeddings are central to many multimedia applications, such as cross-modal retrieval between visual data and natural-language descriptions. Conventionally, learning a joint embedding space relies on large parallel multimodal corpora. Because massive human annotation is expensive to obtain, there is strong motivation to develop versatile algorithms that learn from large corpora with fewer annotations. In this paper, we propose a novel framework that leverages regional semantics automatically extracted from un-annotated images as additional weak supervision for learning visual-semantic embeddings. The proposed model employs adversarial attentive alignments to close the inherent heterogeneous gaps between the annotated and un-annotated portions of the visual and textual domains. To demonstrate its effectiveness, we conduct extensive experiments on sparsely annotated multimodal corpora. The experimental results show that the proposed model outperforms state-of-the-art visual-semantic embedding models by a significant margin on cross-modal retrieval tasks over the sparse Flickr30k and MS-COCO datasets. Notably, despite using only 20% of the annotations, the proposed model achieves performance competitive with benchmarks trained on the complete annotations (Recall@10 > 80.0% for 1K and > 70.0% for 5K text-to-image retrieval).
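Recall at K (R@K), the metric quoted in the abstract, is the fraction of queries whose ground-truth match appears among the top K retrieved candidates. The following is a minimal NumPy sketch of that standard definition, not the authors' evaluation code; the function name `recall_at_k` and the similarity-matrix layout (queries as rows, candidates as columns) are illustrative assumptions.

```python
import numpy as np


def recall_at_k(sim: np.ndarray, gt: np.ndarray, k: int) -> float:
    """Fraction of queries whose ground-truth candidate ranks in the top k.

    Assumed layout (illustrative): sim[i, j] is the similarity between
    query i (e.g. a caption) and candidate j (e.g. an image), and gt[i]
    is the column index of the correct candidate for query i.
    """
    gt = np.asarray(gt)
    # Rank candidates per query by descending similarity; keep the top k.
    topk = np.argsort(-sim, axis=1, kind="stable")[:, :k]
    # A query is a hit if its ground-truth index appears in its top k.
    hits = (topk == gt[:, None]).any(axis=1)
    return float(hits.mean())


if __name__ == "__main__":
    # Toy 3-query x 3-candidate similarity matrix; query i matches candidate i.
    sim = np.array([[0.9, 0.1, 0.2],
                    [0.3, 0.2, 0.8],
                    [0.1, 0.7, 0.4]])
    gt = np.arange(3)
    for k in (1, 2, 3):
        print(f"R@{k} = {recall_at_k(sim, gt, k):.3f}")
```

For text-to-image retrieval on Flickr30k or MS-COCO, where each image is paired with five captions, one would typically pass a caption-by-image similarity matrix with `gt = np.arange(num_captions) // 5`, assuming the captions are stored grouped by image.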


Related Materials

  1. DOI - Is published in 10.1145/3343031.3350894
  2. ISBN - Is published in 9781450368896 (urn:isbn:9781450368896)

Start page

1758

End page

1767

Total pages

10

Outlet

Proceedings of the 27th ACM International Conference on Multimedia (MM 2019)

Name of conference

MM 2019: Session 4A: Cross-Modal Retrieval

Publisher

Association for Computing Machinery

Place published

United States

Start date

2019-10-21

End date

2019-10-25

Language

English

Copyright

© 2019 Association for Computing Machinery.

Former Identifier

2006109382

Esploro creation date

2021-08-29
