RMIT University

Fine-tuning large language models for improved health communication in low-resource languages

journal contribution
posted on 2025-03-02, 23:38 authored by Nhat Bui, Giang Nguyen, Nguyen Nguyen, Bao Vo, Luan Vo, Thong Huynh, Kwok Hung Tang, Van Nhiem Tran, Tuyen Huynh, Huy Quang Nguyen, Minh Dinh
<h4>Background</h4><p dir="ltr">This study presents a methodology for compiling training datasets to fine-tune Large Language Models (LLMs) for healthcare information in Vietnamese, a low-resource language. The objective is to narrow the gap in access to medical information and improve healthcare communication in developing countries by adapting LLMs to the linguistic nuances and domain needs of the target language.</p><h4>Method</h4><p dir="ltr">The methodology involves selecting a base model, compiling a domain-specific dataset, and fine-tuning the model on that dataset. Three open-source models were selected. The dataset, comprising approximately 337,000 prompt-response pairs in Vietnamese, was compiled from existing datasets, data crawled from Vietnamese online medical forums, and content distilled from Vietnamese medical textbooks. The three models were fine-tuned using the Low-Rank Adaptation (LoRA) and Quantized Low-Rank Adaptation (QLoRA) techniques. Model performance was evaluated using BERTScore, ROUGE-L, and the “LLM-as-a-Judge” method.</p><h4>Results</h4><p dir="ltr">The fine-tuned models outperformed their base versions on all three evaluation measures (BERTScore, ROUGE-L, and “LLM-as-a-Judge”), confirming the effectiveness of the fine-tuning process. This study details the process of fine-tuning open-source LLMs for health information inquiries in Vietnamese, demonstrating its potential to improve healthcare communication in low-resource languages. Deploying the fine-tuned LLM on-premise enhances data privacy and security. However, the significant computing power and cost required pose challenges, especially for organizations in developing countries.</p><h4>Conclusion</h4><p dir="ltr">This case study highlights the unique challenges faced by developing countries that use low-resource languages. Initiatives are needed to bridge healthcare gaps in underserved areas and contribute to global health equity.</p>
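The abstract names ROUGE-L as one of the evaluation measures. As a rough illustration only, the sketch below computes a ROUGE-L F1 score from the longest common subsequence (LCS) of two token sequences. It assumes lowercased whitespace tokenization and equal weighting of precision and recall; the function names `lcs_length` and `rouge_l_f1` are hypothetical, and this is not the authors' evaluation code, which may tokenize Vietnamese text and weight the score differently.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the common subsequence ending before both tokens.
                curr.append(prev[j - 1] + 1)
            else:
                # Carry over the best result from skipping one token.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 between a reference answer and a model answer.

    Assumption: whitespace tokenization and beta = 1 (plain F1).
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of the model answer that matches
    recall = lcs / len(ref)      # share of the reference that is covered
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, comparing a base model and its fine-tuned version would amount to averaging `rouge_l_f1(reference, model_answer)` over a held-out set of prompt-response pairs for each model.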

Journal: Computer Methods and Programs in Biomedicine
Volume: 263
Number: 108655
Start page: 108655
End page: 108655
Publisher: Elsevier BV
Language: en
Copyright: © 2025 The Author(s).