Efficient AI for Medical Imaging Analysis
Project Member(s): Chang, X.
Funding or Partner Organisation: Commonwealth Scientific and Industrial Research Organisation (Data61) (Data61 CSIRO)
Start year: 2023
Summary: Traditional medical report generation (MRG) approaches are based on an encoder-decoder framework. Among encoder-decoder frameworks, the Transformer has achieved state-of-the-art performance in many fields. However, the effectiveness of Transformer-based models depends heavily on parameter initialization, so pretraining research, especially Vision-Language Pretraining (VLP), has advanced performance on many vision-language tasks. Two difficulties, however, hinder the adoption of pretraining in MRG. On the one hand, images in the medical domain differ fundamentally from general-domain images, so it is infeasible to apply general-domain captioning models to the medical domain directly; moreover, medical image annotation is a costly and time-consuming process, which leaves only limited gold-standard medical image-text pairs. On the other hand, pre-trained language representation models such as BERT capture a general language representation from large-scale corpora but lack domain-specific knowledge. When writing a report, ophthalmologists make inferences with prior clinical knowledge, and this knowledge has been neglected in prior medical report generation methods. We therefore aim to propose knowledge-enhanced pretraining for medical multi-modal tasks.
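One way knowledge enhancement is often realised in such models is to let visual features attend over a bank of clinical knowledge embeddings before decoding the report. The sketch below is purely illustrative and assumes hypothetical shapes and names (`knowledge_enhanced_context`, a random knowledge bank); it is not the project's actual design:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def knowledge_enhanced_context(image_feats, knowledge_embs):
    """Attend from visual region features over a clinical knowledge bank
    and fuse the retrieved context back into the features.

    image_feats:    (R, d) array -- R visual regions (assumed shape)
    knowledge_embs: (K, d) array -- K knowledge entries (hypothetical bank)
    """
    d = image_feats.shape[1]
    # Scaled dot-product attention scores between regions and knowledge entries
    scores = image_feats @ knowledge_embs.T / np.sqrt(d)   # (R, K)
    weights = softmax(scores, axis=-1)                     # rows sum to 1
    context = weights @ knowledge_embs                     # (R, d)
    # Residual fusion: knowledge-aware features passed on to the decoder
    return image_feats + context

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))    # 4 regions, 8-dim features
kb = rng.normal(size=(16, 8))    # 16 knowledge entries
fused = knowledge_enhanced_context(img, kb)
print(fused.shape)  # (4, 8)
```

In a full encoder-decoder pipeline, `fused` would replace the raw visual features as the encoder memory that the report decoder cross-attends to.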
FOR Codes: Expanding knowledge in the biomedical and clinical sciences, Computer vision