SALVATORE CONTINO

DR-Minerva: A Multimodal Language Model Based on Minerva for Diagnostic Information Retrieval

Authors: Siragusa, Irene; Contino, Salvatore; Pirrone, Roberto
Publication year: 2025
Type: Conference proceedings contribution published in a volume
OA Link: http://hdl.handle.net/10447/668610

Abstract

This paper describes the development of Minerva Diagnostic Retriever (DR-Minerva), a Visual Language Model specialized in the medical domain. Prompted with a textual input containing the patient’s information along with a CT or MR scan, the model reports the body part and the scanning modality of the given image. The model relies on the Flamingo architecture, which is well known for its in-context and few-shot learning capabilities, and it encodes textual data using Minerva, a novel Large Language Model trained on English and Italian data. Model performance is improved by fine-tuning and by using external knowledge through a Retrieval Augmented Generation approach: at inference time, the retrieved examples are injected into the model in the form of in-context learning. The authors developed a rearranged version of the MedPix® multi-modal medical dataset, which was used for the development and testing of the model as well as for retrieval. A detailed description of the system is reported along with the experimental results, which are discussed thoroughly. The dataset and models used are available on GitHub (https://github.com/CHILab1/MedPix-2.0).
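
The retrieval-augmented in-context step mentioned in the abstract can be pictured with a minimal Python sketch. It is not the authors' implementation. The embedding vectors, the toy records in `STORE`, the helper names (`cosine`, `retrieve`, `build_prompt`) and the `<image>` placeholder are all illustrative assumptions, loosely modelled on Flamingo-style interleaved image-text prompts. The sketch only shows the general idea: rank stored examples by similarity to the query, then prepend the top ones to the prompt as few-shot demonstrations.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query_vec, store, k=2):
    """Return the k stored examples most similar to the query embedding."""
    ranked = sorted(store, key=lambda ex: cosine(query_vec, ex["vec"]), reverse=True)
    return ranked[:k]

def build_prompt(examples, patient_info):
    """Assemble a few-shot prompt: retrieved examples first, then the open query."""
    parts = [
        f"<image>{ex['text']} Modality: {ex['modality']}. Body part: {ex['body_part']}."
        for ex in examples
    ]
    # The query ends with an open field that the model is expected to complete.
    parts.append(f"<image>{patient_info} Modality:")
    return "\n".join(parts)

# Toy, made-up records standing in for an indexed dataset (NOT real MedPix data).
STORE = [
    {"vec": [1.0, 0.0, 0.0], "text": "Adult with cough.", "modality": "CT", "body_part": "thorax"},
    {"vec": [0.0, 1.0, 0.0], "text": "Adult with headache.", "modality": "MR", "body_part": "head"},
    {"vec": [0.9, 0.1, 0.0], "text": "Adult with abdominal pain.", "modality": "CT", "body_part": "abdomen"},
]

if __name__ == "__main__":
    top = retrieve([1.0, 0.0, 0.0], STORE, k=2)
    print(build_prompt(top, "65-year-old patient with chest pain."))
```

In a real system the vectors would come from an image or text encoder, and each `<image>` slot would be paired with the corresponding scan. Both are left out here to keep the sketch self-contained.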