Beyond images: ConvNeXt vs. vision-language models for automated breast density classification in screening mammography

dc.audience.educationlevel: Researchers
dc.contributor.advisor: Santos Díaz, Alejandro
dc.contributor.author: Molina Román, Yusdivia
dc.contributor.cataloger: emipsanchez
dc.contributor.committeemember: Menasalvas Ruiz, Ernestina
dc.contributor.committeemember: Tamez Pena, José
dc.contributor.committeemember: Montesinos Silva, Luis Arturo
dc.contributor.department: School of Engineering and Sciences
dc.contributor.institution: Campus Estado de México
dc.date.accepted: 2025-06
dc.date.accessioned: 2025-06-16T06:43:53Z
dc.date.issued: 2025-06
dc.description: https://orcid.org/0000-0001-5235-7325
dc.description.abstract: This study evaluates and compares the effectiveness of different deep learning approaches for automated breast density classification according to the BI-RADS system. Specifically, it examines two distinct architectures: ConvNeXt, a CNN-based model, and BioMedCLIP, a vision-language model that integrates textual information through token-based labels. Using mammographic images from TecSalud at Tecnológico de Monterrey, the study assesses these models across three learning paradigms: zero-shot classification, linear probing with token-based descriptions, and fine-tuning with numerical class labels. The experimental results show that although vision-language models offer theoretical advantages in interpretability and zero-shot capability, CNN-based architectures with end-to-end fine-tuning currently deliver superior performance on this specialized medical imaging task. ConvNeXt achieves an accuracy of up to 0.71 and an F1 score of 0.67, compared with BioMedCLIP's best accuracy of 0.57, obtained with linear probing. A detailed analysis of classification patterns revealed that all models had difficulty distinguishing between adjacent breast density categories, particularly heterogeneously dense tissue. This challenge mirrors known difficulties in clinical practice, where even experienced radiologists exhibit inter-observer variability in density assessment. The performance gap between the models was further examined through loss curve analysis and confusion matrices, which revealed specific strengths and limitations of each approach. A key limitation of BioMedCLIP stemmed from the insufficient semantic richness of the textual tokens representing each density class. When category distinctions relied on subtle linguistic differences, such as "extremely" versus "heterogeneously", the model struggled to form robust alignments between visual features and textual descriptions.
The research contributes to the growing body of knowledge on AI applications in breast imaging by systematically comparing traditional and multimodal approaches under consistent experimental conditions. The findings highlight both the current limitations and the future potential of vision-language models in mammographic analysis, suggesting that enhanced textual descriptions and domain-specific adaptations could bridge the performance gap while preserving the interpretability benefits of multimodal approaches for clinical applications.
dc.description.degree: Master of Science in Computer Science
dc.format.medium: Text
dc.identificator: 339999
dc.identifier.citation: Molina Román, Y. (2025). Beyond images: ConvNeXt vs. vision-language models for automated breast density classification in screening mammography [Master's thesis]. Instituto Tecnológico y de Estudios Superiores de Monterrey. Retrieved from https://hdl.handle.net/11285/703752
dc.identifier.orcid: https://orcid.org/0009-0005-3623-8971
dc.identifier.uri: https://hdl.handle.net/11285/703752
dc.language.iso: eng
dc.publisher: Instituto Tecnológico y de Estudios Superiores de Monterrey
dc.relation: SECIHTI
dc.relation: Ministerio de Economía y Transformación Digital de España
dc.relation: Instituto Tecnológico y de Estudios Superiores de Monterrey
dc.relation.isFormatOf: acceptedVersion
dc.rights: openAccess
dc.rights.uri: https://creativecommons.org/licenses/by-sa/4.0
dc.subject.classification: ENGINEERING AND TECHNOLOGY::TECHNOLOGICAL SCIENCES::COMPUTER TECHNOLOGY::ARTIFICIAL INTELLIGENCE
dc.subject.classification: ENGINEERING AND TECHNOLOGY::TECHNOLOGICAL SCIENCES::MEDICAL TECHNOLOGY::OTHER
dc.subject.classification: ENGINEERING AND TECHNOLOGY::TECHNOLOGICAL SCIENCES::OTHER TECHNOLOGICAL SPECIALTIES::OTHER
dc.subject.keyword: Breast Density Classification
dc.subject.keyword: Deep learning
dc.subject.keyword: Mammography
dc.subject.keyword: Vision-language models
dc.subject.keyword: BioMedCLIP
dc.subject.keyword: ConvNeXt
dc.subject.lcsh: Technology
dc.subject.lcsh: Science
dc.title: Beyond images: ConvNeXt vs. vision-language models for automated breast density classification in screening mammography
dc.type: Master's thesis

Files

Original bundle

Name: MolinaRoman_TesisMaestria_pdfa.pdf
Size: 5.31 MB
Format: Adobe Portable Document Format
Description: Master's thesis

Name: MolinaRoman_ActaGradoDeclaracionAutoria_pdfa.pdf
Size: 368.99 KB
Format: Adobe Portable Document Format
Description: Degree certificate

Name: MolinaRoman_CartaAutorizacion_pdf.pdf
Size: 127.38 KB
Format: Adobe Portable Document Format
Description: Authorization letter

License bundle

Name: license.txt
Size: 1.28 KB
Format: Item-specific license agreed upon to submission
