Caption generation with transformer models across multiple medical imaging modalities

dc.audience.educationlevel: Investigadores/Researchers [es_MX]
dc.contributor.advisor: Santos Díaz, Alejandro
dc.contributor.author: Vela Jarquin, Daniel
dc.contributor.cataloger: dnbsrp [es_MX]
dc.contributor.committeemember: Soenksen, Luis Ruben
dc.contributor.committeemember: Montesinos Silva, Luis Arturo
dc.contributor.committeemember: Ochoa Ruiz, Gilberto
dc.contributor.department: School of Engineering and Sciences [es_MX]
dc.contributor.institution: Campus Monterrey [es_MX]
dc.contributor.mentor: Tamez Peña, José Gerardo
dc.date.accepted: 2023-06
dc.date.accessioned: 2023-07-17T20:49:08Z
dc.date.available: 2023-07-17T20:49:08Z
dc.date.issued: 2023-06
dc.description: https://orcid.org/0000-0001-5235-7325 [es_MX]
dc.description.abstract: Caption generation is the process of automatically producing text that describes the relevant features of an image. It applies to very diverse domains, including healthcare. Medicine is characterized by a vast amount of visual information in the form of X-rays, magnetic resonance images, ultrasound, and CT scans, among others. Descriptive texts generated for this kind of visual information can help medical professionals better understand the pathologies and cases presented to them and could ultimately allow them to make more informed decisions. In this work, I explore the use of deep learning to address the problem of caption generation in medicine. I propose a Transformer model architecture for caption generation and evaluate its performance on a dataset composed of medical images spanning multiple modalities and anatomies. Deep learning models, particularly encoder-decoder architectures, have shown increasingly favorable results in translating from one information modality to another. Usually, the encoder extracts features from the visual data, and the decoder then uses these features to iteratively generate a natural-language sequence that describes the image. Various deep learning architectures have been proposed for caption generation. The most popular architectures in recent years have involved recurrent neural networks (RNNs), long short-term memory (LSTM) networks and, only recently, Transformer-type architectures. The Transformer architecture has shown state-of-the-art performance in many natural language processing tasks, such as machine translation, question answering, summarization and, more recently, caption generation. Attention mechanisms allow Transformers to better grasp the meaning of each word in a sentence within its particular context. All these characteristics make Transformers well suited to caption generation.
In this thesis, I present the development of a deep learning model based on the Transformer architecture that generates captions for medical images of different modalities and anatomies, with the ultimate goal of helping professionals improve medical diagnosis and treatment. The model is tested on the MedPix online database, a compendium of medical imaging cases, and the results are reported. In summary, this work provides a valuable contribution to the field of automated medical image analysis. [es_MX]
dc.description.degree: Master of Science in Computer Science [es_MX]
dc.format.medium: Text [es_MX]
dc.identificator: 1||12||1203||120304 [es_MX]
dc.identifier.citation: Vela Jarquin, D. (2023). Caption generation with transformer models across multiple medical imaging modalities (Master's thesis). Instituto Tecnológico de Monterrey. [es_MX]
dc.identifier.cvu: 1154114 [es_MX]
dc.identifier.orcid: https://orcid.org/0000-0001-5624-8791 [es_MX]
dc.identifier.scopusid: 57215617169 [es_MX]
dc.identifier.uri: https://hdl.handle.net/11285/651044
dc.language.iso: eng [es_MX]
dc.publisher: Instituto Tecnológico y de Estudios Superiores de Monterrey [es_MX]
dc.relation.isFormatOf: acceptedVersion [es_MX]
dc.rights: openAccess [es_MX]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0 [es_MX]
dc.subject.classification: PHYSICAL-MATHEMATICAL AND EARTH SCIENCES::MATHEMATICS::COMPUTER SCIENCE::ARTIFICIAL INTELLIGENCE [es_MX]
dc.subject.keyword: Artificial Intelligence [es_MX]
dc.subject.keyword: Deep Learning [es_MX]
dc.subject.keyword: Transformer [es_MX]
dc.subject.keyword: MedPix [es_MX]
dc.subject.keyword: Captioning [es_MX]
dc.subject.keyword: Caption generation [es_MX]
dc.subject.keyword: Medical Imaging [es_MX]
dc.subject.keyword: Automatic Medical Interpretation [es_MX]
dc.subject.keyword: Healthcare [es_MX]
dc.subject.keyword: Natural Language Processing [es_MX]
dc.subject.keyword: Image Captioning [es_MX]
dc.subject.lcsh: Technology [es_MX]
dc.title: Caption generation with transformer models across multiple medical imaging modalities [es_MX]
dc.type: Master's thesis

Files

Original bundle

Name: Final Thesis v5.pdf
Size: 2.79 MB
Format: Adobe Portable Document Format
Description: Master's thesis PDF

Name: carta autorizacon.pdf
Size: 366.83 KB
Format: Adobe Portable Document Format

Name: Firmas.pdf
Size: 130.19 KB
Format: Adobe Portable Document Format
Description: Signature sheet

Name: Autoria.pdf
Size: 97.33 KB
Format: Adobe Portable Document Format
Description: Authorship letter

License bundle

Name: license.txt
Size: 1.3 KB
Format: Item-specific license agreed upon to submission

