Publication
EMNLP 2024
Paper
HealthAlignSumm: Utilizing Alignment for Multimodal Summarization of Code-Mixed Healthcare Dialogues
Abstract
As generative AI progresses, collaboration between doctors and AI scientists is leading to the development of personalized models that streamline healthcare tasks and improve productivity. Summarizing doctor-patient dialogues has become important, helping doctors understand conversations faster and improving patient care. While previous research has mostly focused on text data, incorporating visual cues from patient interactions allows doctors to gain deeper insights into medical conditions. Most of this research has also centered on English datasets, but real-world conversations often mix languages for better communication. To address the lack of resources for multimodal summarization of code-mixed dialogues in healthcare, we developed the MCDH dataset. Additionally, we created HealthAlignSumm, a new model that integrates visual components with the BART architecture. This represents a key advancement in multimodal fusion, applied within both the encoder and decoder of the BART model. Our work is the first to use alignment techniques, including state-of-the-art algorithms like Direct Preference Optimization, on encoder-decoder models with synthetic datasets for multimodal summarization. Through extensive experiments, we demonstrated the superior performance of HealthAlignSumm across several metrics, validated by both automated assessments and human evaluations.
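For readers unfamiliar with Direct Preference Optimization (the alignment algorithm named in the abstract), the sketch below shows the standard DPO loss for a single preference pair. This is a generic illustration of the published DPO objective, not the authors' implementation; the input values and the beta setting are assumptions for demonstration only.

```python
import math

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed token log-probability of the chosen
    (preferred) or rejected summary under the trainable policy or a
    frozen reference model. beta scales the implicit reward margin.
    """
    # Implicit reward margin: how much more the policy prefers the
    # chosen output over the rejected one, relative to the reference.
    logits = beta * ((policy_chosen_lp - ref_chosen_lp)
                     - (policy_rejected_lp - ref_rejected_lp))
    # -log sigmoid(logits): small when the policy already prefers
    # the chosen summary, large when it prefers the rejected one.
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# Illustrative log-probabilities (hypothetical values):
loss_good = dpo_loss(-10.0, -20.0, -12.0, -18.0)  # policy favors chosen
loss_flat = dpo_loss(-10.0, -10.0, -12.0, -12.0)  # no preference learned
```

A larger margin toward the chosen summary yields a smaller loss, so minimizing it steers the encoder-decoder policy toward the preferred outputs without training a separate reward model.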