Beyond images: convnext vs. vision-language models for automated breast density classification in screening mammography
| dc.audience.educationlevel | Investigadores/Researchers | |
| dc.contributor.advisor | Santos Díaz, Alejandro | |
| dc.contributor.author | Molina Román, Yusdivia | |
| dc.contributor.cataloger | emipsanchez | |
| dc.contributor.committeemember | Menasalvas Ruiz, Ernestina | |
| dc.contributor.committeemember | Tamez Peña, José | |
| dc.contributor.committeemember | Montesinos Silva, Luis Arturo | |
| dc.contributor.department | School of Engineering and Sciences | |
| dc.contributor.institution | Campus Estado de México | |
| dc.date.accepted | 2025-06 | |
| dc.date.accessioned | 2025-06-16T06:43:53Z | |
| dc.date.issued | 2025-06 | |
| dc.description | https://orcid.org/0000-0001-5235-7325 | |
| dc.description.abstract | This study evaluates and compares the effectiveness of different deep learning approaches for automated breast density classification according to the BI-RADS system. Specifically, the research examines two distinct architectures: ConvNeXt, a CNN-based model, and BioMedCLIP, a vision-language model that integrates textual information through token-based labels. Using mammographic images from TecSalud at Tecnológico de Monterrey, the study assesses these models across three distinct learning paradigms: zero-shot classification, linear probing with token-based descriptions, and fine-tuning with numerical class labels. The experimental results demonstrate that while vision-language models offer theoretical advantages in terms of interpretability and zero-shot capabilities, CNN-based architectures with end-to-end fine-tuning currently deliver superior performance for this specialized medical imaging task. ConvNeXt achieves an accuracy of up to 0.71 and an F1 score of 0.67, compared to BioMedCLIP's best performance of 0.57 accuracy with linear probing. A comprehensive analysis of classification patterns revealed that all models encountered difficulties in distinguishing between adjacent breast density categories, particularly heterogeneously dense tissue. This challenge mirrors known difficulties in clinical practice, where even experienced radiologists exhibit inter-observer variability in density assessment. The performance discrepancy between models was further examined through detailed loss curve analysis and confusion matrices, revealing specific strengths and limitations of each approach. A key limitation in BioMedCLIP's performance stemmed from insufficient semantic richness in the textual tokens representing each density class. When category distinctions relied on subtle linguistic differences, such as "extremely" versus "heterogeneously", the model struggled to form robust alignments between visual features and textual descriptions. The research contributes to the growing body of knowledge on AI applications in breast imaging by systematically comparing traditional and multimodal approaches under consistent experimental conditions. The findings highlight both the current limitations and the future potential of vision-language models in mammographic analysis, suggesting that enhanced textual descriptions and domain-specific adaptations could bridge the performance gap while preserving the interpretability benefits of multimodal approaches for clinical applications. | |
| dc.description.degree | Master of Science in Computer Science | |
| dc.format.medium | Texto | |
| dc.identificator | 339999 | |
| dc.identifier.citation | Molina Román Y. (2025). Beyond images: convnext vs. vision-language models for automated breast density classification in screening mammography [Tesis maestría]. Instituto Tecnológico y de Estudios Superiores de Monterrey. Recuperado de: https://hdl.handle.net/11285/703752 | |
| dc.identifier.orcid | https://orcid.org/0009-0005-3623-8971 | |
| dc.identifier.uri | https://hdl.handle.net/11285/703752 | |
| dc.language.iso | eng | |
| dc.publisher | Instituto Tecnológico y de Estudios Superiores de Monterrey | |
| dc.relation | SECIHTI | |
| dc.relation | Ministerio de Economía y Transformación Digital de España | |
| dc.relation | Instituto Tecnológico y de Estudios Superiores de Monterrey | |
| dc.relation.isFormatOf | acceptedVersion | |
| dc.rights | openAccess | |
| dc.rights.uri | https://creativecommons.org/licenses/by-sa/4.0 | |
| dc.subject.classification | INGENIERÍA Y TECNOLOGÍA::CIENCIAS TECNOLÓGICAS::TECNOLOGÍA DE LOS ORDENADORES::INTELIGENCIA ARTIFICIAL | |
| dc.subject.classification | INGENIERÍA Y TECNOLOGÍA::CIENCIAS TECNOLÓGICAS::TECNOLOGÍA MÉDICA::OTRAS | |
| dc.subject.classification | INGENIERÍA Y TECNOLOGÍA::CIENCIAS TECNOLÓGICAS::OTRAS ESPECIALIDADES TECNOLÓGICAS::OTRAS | |
| dc.subject.keyword | Breast Density Classification | |
| dc.subject.keyword | Deep learning | |
| dc.subject.keyword | Mammography | |
| dc.subject.keyword | Vision-language models | |
| dc.subject.keyword | BioMedCLIP | |
| dc.subject.keyword | ConvNeXt | |
| dc.subject.lcsh | Technology | |
| dc.subject.lcsh | Science | |
| dc.title | Beyond images: convnext vs. vision-language models for automated breast density classification in screening mammography | |
| dc.type | Tesis de maestría |
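
To make the learning paradigms named in the abstract concrete, the sketch below shows how the zero-shot setting could be set up with BioMedCLIP, and how a ConvNeXt backbone could be prepared for end-to-end fine-tuning on the four BI-RADS density classes. This is a minimal illustration, not the thesis implementation: the checkpoint identifier, the prompt wording, and the ConvNeXt-Tiny backbone choice are assumptions rather than details taken from the work.

```python
# Illustrative sketch only. The model checkpoint, prompt wording, and backbone
# size are assumptions; they are not taken from the thesis.
import torch
import torchvision
import open_clip
from PIL import Image

# Assumed BiomedCLIP checkpoint on the Hugging Face hub.
MODEL_ID = "hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224"

# Token-based descriptions for the four BI-RADS density classes (wording assumed).
PROMPTS = [
    "a mammogram showing almost entirely fatty breast tissue",
    "a mammogram showing scattered areas of fibroglandular density",
    "a mammogram showing heterogeneously dense breast tissue",
    "a mammogram showing extremely dense breast tissue",
]


def zero_shot_density(image_path: str) -> int:
    """Return the index (0-3) of the most similar BI-RADS density prompt."""
    model, preprocess = open_clip.create_model_from_pretrained(MODEL_ID)
    tokenizer = open_clip.get_tokenizer(MODEL_ID)
    model.eval()

    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    text = tokenizer(PROMPTS)

    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(text)
        # Cosine similarity between the image embedding and each class prompt.
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)
        text_features = text_features / text_features.norm(dim=-1, keepdim=True)
        probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

    return int(probs.argmax(dim=-1).item())


def convnext_for_density(num_classes: int = 4) -> torch.nn.Module:
    """ImageNet-pretrained ConvNeXt-Tiny with its classifier head replaced,
    ready for end-to-end fine-tuning with numerical class labels."""
    model = torchvision.models.convnext_tiny(weights="IMAGENET1K_V1")
    in_features = model.classifier[2].in_features
    model.classifier[2] = torch.nn.Linear(in_features, num_classes)
    return model
```

In the linear-probing paradigm also described in the abstract, the same frozen BioMedCLIP image encoder would feed a single trainable linear layer, rather than being compared against the text prompts directly.
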
Files
Original bundle
| Name | Size | Format | Description |
| MolinaRoman_TesisMaestria_pdfa.pdf | 5.31 MB | Adobe Portable Document Format | Tesis Maestría |
| MolinaRoman_ActaGradoDeclaracionAutoria_pdfa.pdf | 368.99 KB | Adobe Portable Document Format | Acta de Grado |
| MolinaRoman_CartaAutorizacion_pdf.pdf | 127.38 KB | Adobe Portable Document Format | Carta Autorización |
License bundle
| Name | Size | Format | Description |
| license.txt | 1.28 KB | Item-specific license agreed upon to submission | |