Development of a type two diabetes predictive model for Mexicans applying to electronic health records dataset retrieved from National Public Data (ENSANUT 2018)

dc.audience.educationlevel: Researchers
dc.contributor.advisor: Noguez Monroy, Juana Julieta
dc.contributor.author: Fregoso Aparicio, Luis Martín
dc.contributor.cataloger: puemcuervo
dc.contributor.committeemember: Cantú Ortiz, Francisco Javier
dc.contributor.committeemember: González Mendoza, Miguel
dc.contributor.committeemember: García García, José Antonio
dc.contributor.department: School of Engineering and Sciences
dc.contributor.institution: Campus Estado de México
dc.contributor.mentor: Montesinos Silva, Luis Arturo
dc.date.accepted: 2021-12-02
dc.date.accessioned: 2022-05-31T17:22:58Z
dc.date.available: 2022-05-31T17:22:58Z
dc.date.issued: 2021-12-02
dc.description: https://orcid.org/0000-0002-6000-3452
dc.description.abstract: Diabetes mellitus is a chronic and severe disease that occurs when blood glucose levels rise above normal limits because the patient's body cannot produce insulin, produces an insufficient amount of it, or cannot use the insulin it produces efficiently. The American Diabetes Association establishes a diagnosis of diabetes when HbA1c is greater than or equal to 6.5%, when basal fasting blood glucose (GB) is greater than or equal to 126 mg/dL, or when blood glucose 2 hours after an oral glucose tolerance test with 75 g of glucose (SOG) is greater than or equal to 200 mg/dL. Type 2 diabetes (T2D), formerly known as adult-onset diabetes, is a form of diabetes characterized by high blood sugar, insulin resistance, and a relative lack of insulin. In Mexico, 10.4% of the population had diabetes in 2016, compared with 7% in 2006. In recent years, machine learning has been used to create predictive models for the onset of type 2 diabetes, which makes it feasible to develop one for the Mexican population. Such a model should be able to detect undiagnosed diabetics, using a national public dataset covering type 2 diabetes mellitus in Mexico (ENSANUT 2018). The objective is to develop a predictive model of type 2 diabetes for Mexicans as a support tool that helps primary care physicians make a timely diagnosis, preventing the onset of diabetes or its complications by detecting diabetes early and with higher accuracy than the few existing Mexican models. A systematic review of 91 studies was performed to identify candidate machine learning techniques and features for building novel type 2 diabetes predictive models, following the PRISMA methodology combined with the methodology of Keele University and Durham University. The review found that tree-based machine learning algorithms produced the best predictive models.
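The diagnostic criteria above amount to a simple rule: a patient meets the ADA definition if any one of the three measurements reaches its cut-off (HbA1c ≥ 6.5%, fasting glucose ≥ 126 mg/dL, or 2-hour OGTT glucose ≥ 200 mg/dL). The following minimal sketch encodes that rule. The function and constant names are hypothetical and are not taken from the thesis; missing measurements are treated as absent, and the sketch deliberately ignores the clinical requirement to confirm an abnormal result with repeat testing, so it illustrates the thresholds rather than a diagnostic procedure.

```python
from typing import Optional

# ADA diagnostic cut-offs; every comparison is inclusive (">=").
# Constant names are hypothetical, chosen for this illustration only.
HBA1C_PERCENT_CUTOFF = 6.5
FASTING_GLUCOSE_MG_DL_CUTOFF = 126.0
OGTT_2H_GLUCOSE_MG_DL_CUTOFF = 200.0


def meets_ada_threshold(
    hba1c_percent: Optional[float] = None,
    fasting_glucose_mg_dl: Optional[float] = None,
    ogtt_2h_glucose_mg_dl: Optional[float] = None,
) -> bool:
    """Return True if any available measurement reaches an ADA diabetes cut-off.

    Missing measurements (None) are skipped. This is a threshold check only:
    it does not model the repeat testing used to confirm a clinical diagnosis.
    """
    checks = []
    if hba1c_percent is not None:
        checks.append(hba1c_percent >= HBA1C_PERCENT_CUTOFF)
    if fasting_glucose_mg_dl is not None:
        checks.append(fasting_glucose_mg_dl >= FASTING_GLUCOSE_MG_DL_CUTOFF)
    if ogtt_2h_glucose_mg_dl is not None:
        checks.append(ogtt_2h_glucose_mg_dl >= OGTT_2H_GLUCOSE_MG_DL_CUTOFF)
    # any([]) is False, so a patient with no measurements is not flagged.
    return any(checks)


if __name__ == "__main__":
    print(meets_ada_threshold(hba1c_percent=6.5))  # True: HbA1c exactly at the cut-off
    print(meets_ada_threshold(hba1c_percent=5.9, fasting_glucose_mg_dl=110.0))  # False
```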
Five candidate models were considered for diabetes classification: Decision Tree, Random Forest, Gradient Boosting Tree, K-Nearest Neighbors, and Logistic Regression. The database selected for the model is the National Health and Nutrition Survey (ENSANUT 2018), which describes the general health and nutrition conditions of a representative sample of the Mexican population. It is divided into several datasets, which were joined through a unique ID built from the values of their variables. The target, HEMGLICLASS, is a binary categorical variable in which 0 corresponds to a healthy person and 1 to a diabetic person; the complete database has 11,639 samples and 55 attributes. After cleaning and balancing the diabetic and healthy classes, the final database has 21,696 observations and 26 variables, consisting of the respondents' categorized eating habits and their corresponding blood chemistry test values. After model selection and optimization on the ENSANUT database among the techniques identified in the systematic review, the Random Forest classifier obtained the best prediction metrics and can be interpreted by physicians. The proposed model is a Random Forest with default hyperparameters that uses fifteen attributes from the original ENSANUT database. As in classical models, these attributes include blood test measurements, and they add new features such as the weekly intake of vegetables and fruits, which acts as a protective factor, and an excessive intake of meat, dairy products, or sweets, which acts as a risk enhancer. The finished model was validated on the actual data to ensure that accuracy and AUC (ROC) remain at or above 90%, and three further metrics (F1-score, precision, and sensitivity) were also estimated. The results are: accuracy 0.90 ± 0.154, F1-score 0.86 ± 0.286, precision 0.94 ± 0.069, sensitivity 0.87 ± 0.294, and AUC (ROC) 0.92 ± 0.191.
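The abstract reports each metric as a value ± a spread. One plausible reading, assumed here and not stated in the abstract, is mean ± standard deviation across cross-validation folds. The minimal sketch below computes the five reported metrics for a single fold from 0/1 labels (0 = healthy, 1 = diabetic, matching the HEMGLICLASS coding) and predicted scores, then summarizes per-fold values. All function names are hypothetical, the Random Forest itself is not reproduced, and AUC (ROC) is computed with the pairwise (Mann-Whitney) formulation, which gives the same value as the area under the ROC curve.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, sensitivity (recall) and F1 for 0/1 labels (1 = diabetic)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against empty denominators by reporting 0.0, a common convention.
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC (ROC) as the probability that a random positive outscores a random negative.

    Ties count as one half. This O(P * N) pairwise version is fine for a sketch.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_pm_std(fold_values: List[float]) -> Tuple[float, float]:
    """Summarize per-fold scores as (mean, sample standard deviation)."""
    return mean(fold_values), stdev(fold_values)


if __name__ == "__main__":
    y_true = [1, 1, 0, 0]  # toy labels, not thesis data
    print(classification_metrics(y_true, [1, 0, 0, 0]))
    print(roc_auc(y_true, [0.9, 0.2, 0.5, 0.1]))  # 0.75: 3 of 4 positive/negative pairs ordered correctly
```

In practice, `accuracy_score`, `precision_score`, `recall_score`, `f1_score`, and `roc_auc_score` from `sklearn.metrics` compute the same quantities; the standard-library version only makes the definitions explicit.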
To test whether the new model predicts better than the Olimpia Arellano-Campos model, a test of equality of means with unknown variances was performed, using Student's t as the test statistic and the p-value as the rejection criterion. The resulting p-value of 0.00572 indicates an improvement in the model's predictive capacity. Finally, the relevance of this model lies in the possibility of anticipating a diagnosis before symptoms appear and, in the long term, even anticipating the development of chronic complications. The model reflects the complexity inherent in detecting diabetes while providing a tool that is as simple as possible to support physicians in making a diagnosis. Ideally, onset would be predicted before a prediabetic stage can be identified; this model offers the possibility of generating a diagnosis close to that stage.
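The comparison with the Arellano-Campos model is described as a test of equality of means with unknown variances evaluated with Student's t statistic. The minimal sketch below assumes Welch's unequal-variance variant and assumes that the two compared samples are per-fold scores of each model; neither detail is stated in the abstract, and all function names are hypothetical. It uses only the standard library, obtaining the two-sided p-value from the regularized incomplete beta function through the classic Numerical Recipes continued fraction. If the test was one-sided (new model superior), the one-sided p-value is half the two-sided value when t has the expected sign.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def _beta_cont_frac(a: float, b: float, x: float,
                    max_iter: int = 1000, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a: float, b: float, x: float) -> float:
    """I_x(a, b), evaluated with the continued fraction on its fast-converging side."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cont_frac(a, b, x) / a
    return 1.0 - front * _beta_cont_frac(b, a, 1.0 - x) / b


def t_two_sided_p(t: float, df: float) -> float:
    """Two-sided p-value of Student's t distribution: I_{df/(df+t^2)}(df/2, 1/2)."""
    return regularized_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))


def welch_t_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float]:
    """Welch's t-test for equal means with unknown, possibly unequal variances.

    Returns (t statistic, Welch-Satterthwaite degrees of freedom, two-sided p-value).
    """
    se2_a = variance(a) / len(a)  # squared standard error of each sample mean
    se2_b = variance(b) / len(b)
    if se2_a + se2_b == 0.0:
        raise ValueError("both samples have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df, t_two_sided_p(t, df)


if __name__ == "__main__":
    # Toy per-fold accuracies for two models; illustrative only, not thesis data.
    print(welch_t_test([0.91, 0.89, 0.92, 0.90], [0.84, 0.86, 0.83, 0.85]))
```

When SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test; passing `equal_var=True` instead gives the pooled-variance Student test, which is the other common reading of "unknown variances".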
dc.description.degree: Master of Science in Computer Science
dc.format.medium: Text
dc.identificator: 7||33||3304||120320
dc.identifier.citation: Fregoso Aparicio, L. M. (2021). Development of a type two diabetes predictive model for Mexicans applying to Electronic Health Records dataset retrieved from National Public Data (ENSANUT 2018) [Unpublished master's thesis]. Instituto Tecnológico y de Estudios Superiores de Monterrey.
dc.identifier.cvu: 962778
dc.identifier.orcid: https://orcid.org/0000-0003-4986-5745
dc.identifier.uri: https://hdl.handle.net/11285/648435
dc.language.iso: eng
dc.publisher: Instituto Tecnológico y de Estudios Superiores de Monterrey
dc.relation.isFormatOf: published version
dc.rights: openAccess
dc.rights.uri: http://creativecommons.org/licenses/by/4.0
dc.subject.classification: ENGINEERING AND TECHNOLOGY::TECHNOLOGICAL SCIENCES::COMPUTER TECHNOLOGY::MEDICAL CONTROL SYSTEMS
dc.subject.keyword: Diabetes
dc.subject.keyword: Random Forest
dc.subject.keyword: Predictive Model
dc.subject.keyword: Machine Learning
dc.subject.keyword: Electronic Health Records
dc.subject.lcsh: Science
dc.title: Development of a type two diabetes predictive model for Mexicans applying to electronic health records dataset retrieved from National Public Data (ENSANUT 2018)
dc.type: Master's thesis

Files

Original bundle

Name: Thesis Luis Fregoso.pdf
Size: 2.33 MB
Format: Adobe Portable Document Format
Description: Master's thesis

Name: CartaAutorizacionTesis-Luis Fregoso.pdf
Size: 118.08 KB
Format: Adobe Portable Document Format
Description: Authorization letter

Name: 03_Thesis_Firmas Luis Fregoso_JN_LM_JAGG_MG_FC_FIRMADO.pdf
Size: 249.65 KB
Format: Adobe Portable Document Format
Description: Signature page

Name: Autoria_Luis Fregoso.pdf
Size: 146.68 KB
Format: Adobe Portable Document Format
Description: Authorship letter

License bundle

Name: license.txt
Size: 1.3 KB
Format: Item-specific license agreed to upon submission
