RMIT University

M3: Multi-level dataset for Multi-document summarization of Medical studies

conference contribution
posted on 2024-11-03, 15:27 authored by Yulia Otmakhova, Cornelia Verspoor, Timothy Baldwin, Antonio Jose Jimeno Yepes, Jey Lau
We present M3 (Multi-level dataset for Multi-document summarisation of Medical studies), a benchmark dataset for evaluating the quality of summarisation systems in the biomedical domain. The dataset contains sets of multiple input documents and target summaries at three levels of complexity: documents, sentences, and propositions. It also includes several levels of annotation: biomedical entities, the direction and strength of the relations between them, and the discourse relationships between the input documents (“contradiction” or “agreement”). We showcase usage scenarios of the dataset by testing 10 generic and domain-specific summarisation models in a zero-shot setting, and introduce a probing task based on counterfactuals to test whether models are aware of the direction and strength of the conclusions generated from input studies.

Start page

3916

End page

3930

Total pages

15

Outlet

Proceedings of the Findings of the Association for Computational Linguistics

Editors

Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang

Name of conference

EMNLP 2022

Publisher

Association for Computational Linguistics

Place published

United States

Start date

2022-12-07

End date

2022-12-11

Language

English

Copyright

© 2022 Association for Computational Linguistics

Former Identifier

2006125107

Esploro creation date

2023-08-26
