Ciencias Exactas y Ciencias de la Salud

Permanent URI for this collection: https://hdl.handle.net/11285/551039

This collection contains theses and degree projects from the master's programs of the Schools of Engineering and Sciences and of Medicine and Health Sciences.


Comparing Databases for the Prediction of Student’s Academic Performance using Data Science on the Novel Educational Model Tec21 at Tecnológico de Monterrey
(Instituto Tecnológico y de Estudios Superiores de Monterrey, 2021-06) Lara Castor, Miguel Andrés; HERNANDEZ GRESS, NEIL; 21847; Hernández Gress, Neil; tolmquevedo, emipsanchez; Batres Prieto, Rafael; Garza Villareal, Sara Elena; Escuela de Ingeniería y Ciencias; Campus Monterrey; Ceballos Cancino, Héctor Gibrán
Many studies have addressed the prediction of students' academic performance using Data Science. Students with poor academic performance, as well as students who drop out, have a large impact on the graduation rates, reputation, and finances of an educational institution. These studies take advantage of the digitization of students' admission and academic data and of increasing computational power. However, since August 2019 Tecnológico de Monterrey has been making such predictions using entrance tests called Initial Evaluations. Unfortunately, the Initial Evaluations did not provide useful predictions for the students of the fall 2019 semester. Therefore, this study aimed to compare the Initial Evaluations and the admission data using Data Science models to predict students' academic performance. The admission data comprised five databases: Initial Evaluations, Emotions, Curriculum, Admission Exam, and Grades of the first semester. A methodology similar to the Cross Industry Standard Process for Data Mining was used to compare the models based on admission data with the models based only on Initial Evaluations. A large number of experiments were carried out combining different admission data, feature reduction techniques, and classification models. The experiments showed that the models based on admission data predict students' academic performance with higher accuracy than the models based only on Initial Evaluations. Nevertheless, some variables of the Initial Evaluations were relevant to the models based on admission data. Moreover, the accuracy of the experiments was within the range of the results reported in related studies. The results of this study indicate that the Initial Evaluations provide useful information for the prediction of students' academic performance in the domain of Data Science.
Characterisation of visitors and description of their navigation behaviour using web log mining techniques
(Instituto Tecnológico y de Estudios Superiores de Monterrey, 2021-02) Huidobro Espejel, Alicia; MONROY BORJA, RAUL; 12232; Monroy Borja, Raúl; hermlugo, emipsanchez; Loyola González, Octavio; Graff Guerrero, Mario; Escuela de Ingeniería y Ciencias; Campus Estado de México; Cervantes González, Bárbara
The value of a company’s website depends on visitors performing actions that add value for the company. Those actions are called conversions. We present techniques both for characterising website visitors in terms of the conversions they make and for describing their navigation behaviour in an abstract way, with the aim of making it more amenable to interpretation. Existing web analytics techniques have not been designed to highlight the distinguishing characteristics of a class of visitors. There are no approaches for characterising classes of visitors that take into account specific business goals; further, the navigation behaviour of a visitor, let alone a class of visitors, is conveyed as a sequence of visited pages, without giving this an abstract meaning. In this thesis, we introduce a means of characterising website visitors. To find what the different segments of visitors have or do not have in common, we first separate visitor sessions in terms of conversions and then, for each class, mine patterns that contrast it with the others. We also introduce a simplified description of visitor navigation behaviour. Our technique works by identifying frequently occurring subsequences of visited pages, called "rules", and then shrinking a session by replacing those rules with a symbol that is given a representative name. Further, we extend this to an entire class of visitors, creating a graph that collects the class sessions, summarising the class navigation behaviour and enabling an easier contrast of classes. Our results show that a few patterns are enough to characterise a visitor class; since each class is associated with a conversion, an expert can easily draw conclusions as to what makes two classes different from one another. Also, with our abstract representation, a session can be shrunk so that the behaviour of an entire visitor class can be depicted in a moderately small graph.
Further work is concerned with incorporating information from other sales channels and completing the analysis provided by existing techniques.
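
The session-shrinking step described in this abstract can be illustrated with a short Python sketch. This is a minimal illustration only: the rule names, the page paths, and the greedy longest-match strategy are assumptions made here, and the thesis's actual rule-mining and replacement procedure is not specified in the abstract.

```python
from typing import Dict, List, Tuple


def shrink_session(session: List[str], rules: Dict[Tuple[str, ...], str]) -> List[str]:
    """Replace each occurrence of a rule (a page subsequence) with its symbol.

    Greedy left-to-right scan that prefers the longest rule matching at each
    position. This matching strategy is an assumption for illustration.
    """
    # Ignore empty rules and try longer rules first so a long rule wins
    # over a shorter rule that is its prefix.
    ordered = sorted(
        (item for item in rules.items() if item[0]),
        key=lambda item: len(item[0]),
        reverse=True,
    )
    shrunk: List[str] = []
    i = 0
    while i < len(session):
        for pattern, symbol in ordered:
            if tuple(session[i:i + len(pattern)]) == pattern:
                shrunk.append(symbol)
                i += len(pattern)
                break
        else:
            # No rule starts at this position: keep the page unchanged.
            shrunk.append(session[i])
            i += 1
    return shrunk


# Hypothetical rules and session, for illustration only.
RULES = {
    ("/home", "/products"): "BROWSE_CATALOGUE",
    ("/cart", "/checkout", "/thanks"): "PURCHASE",
}
session = ["/home", "/products", "/item/42", "/cart", "/checkout", "/thanks"]
shrunk = shrink_session(session, RULES)
print(shrunk)  # ['BROWSE_CATALOGUE', '/item/42', 'PURCHASE']
```

In this sketch a six-page session collapses to three symbols, which is the kind of reduction that would make a per-class navigation graph smaller.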
Occupancy Estimation in Enclosed Spaces using an Indirect Approach, laying the Foundations to Build an IoT Architecture
(Instituto Tecnológico y de Estudios Superiores de Monterrey, 2021) Vela Miam, Irving Andree; Ceballos Cancino, Héctor Gibrán; 223871; Ceballos Cancino, Héctor Gibrán; tolmquevedo, emipsanchez; Dávila Delgado, Juan Manuel; Hernandez Gress, Neil; Escuela de Ingeniería y Ciencias; Campus Monterrey; Alvarado Uribe, Joanna
The buildings industry accounts for 30% to 40% of total energy consumed worldwide, and with most of this energy coming from fossil fuels, improving energy efficiency is critical to reducing the harmful effects of this industry on the environment. Timely information about the number of occupants has been identified as a significant contributor to improving energy efficiency. Existing work on occupancy detection and estimation falls into one of the following categories: (1) direct approaches based on sensors and cameras to measure occupancy directly, and (2) indirect approaches based on environmental data to derive the occupancy information. Due to cost and privacy issues, indirect approaches are preferred for most use cases. This thesis focused on estimating occupancy in buildings’ indoor spaces using environmental variables and Machine Learning techniques. Specifically, the use of temperature, humidity, and pressure information was proposed to estimate the level of occupancy. Additionally, feature selection and time resolution selection steps were used to achieve high accuracy. In the process, it was necessary to generate a dataset with occupancy information from two different locations with contrasting characteristics. This dataset is an essential contribution, as no other dataset suitable for estimating occupancy using the proposed environmental variables is publicly accessible. Likewise, a review of IoT platforms was carried out to identify the components required to build an occupancy estimation system. Among the contributions, it is reported that an accuracy of at least 98% can be achieved using this approach and a kNN model. Also, a theoretical architecture for an occupancy estimation system using AWS IoT Core was documented. Finally, the generated dataset was made publicly accessible through the Mendeley Data repository.
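
As a rough illustration of the indirect approach in this abstract, the sketch below classifies an occupancy level from temperature, humidity, and pressure readings with a plain k-nearest-neighbours vote. All readings, labels, and the choice of k = 3 are hypothetical, and the features are left unscaled for brevity; the thesis's dataset, feature selection, and time-resolution steps are not reproduced here.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

# (temperature in °C, relative humidity in %, pressure in hPa)
Reading = Tuple[float, float, float]


def knn_predict(train_x: Sequence[Reading], train_y: Sequence[str],
                query: Reading, k: int = 3) -> str:
    """Return the majority label among the k training readings nearest to query."""
    # Rank training readings by Euclidean distance to the query reading.
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical readings, for illustration only (not from the thesis dataset).
train_x = [
    (21.0, 40.0, 1013.0), (21.2, 41.0, 1013.1), (21.1, 40.5, 1013.0),
    (23.5, 48.0, 1013.4), (23.8, 49.0, 1013.5), (24.0, 50.0, 1013.5),
]
train_y = ["empty", "empty", "empty", "occupied", "occupied", "occupied"]

prediction = knn_predict(train_x, train_y, (23.6, 48.5, 1013.4))
print(prediction)  # occupied
```

With real sensor data the features would normally be scaled before computing distances, since pressure, humidity, and temperature vary over very different ranges.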
Wind Resource Assessment with Microscale Models and a Machine Learning Method
(Instituto Tecnológico y de Estudios Superiores de Monterrey, 2020-12) Quiroga Novoa, Pedro Fernando; HUERTAS BOLAÑOS, MARIA ELENA; 333833; Probst Oleszewski, Oliver Matthias.; tolmquevedo, emipsanchez; Huertas Bolaños, Maria Elena; Escuela de Ingeniería y Ciencias; Campus Monterrey; Preciado Arreola, José Luis
Wind energy has been gaining prominence among renewable energy sources, as it is an affordable and increasingly reliable technology. Precise evaluation of the wind resource is a fundamental factor in guaranteeing the continued development of wind energy projects. As installed capacity increases, new wind farms increasingly have to be installed on more complex terrain. Therefore, the methodologies that have traditionally been used to predict mean wind speed will be subject to greater uncertainty, given the limitations of the models under these challenging conditions. A more demanding energy industry requires further investigation of reliable and robust methodologies to assess available resources accurately. In this master's thesis, two approaches to predicting average wind speed in complex terrain were evaluated: wind flow models and statistical methods. Regarding the wind flow models, one year of on-site measurements was used to validate two well-known microscale models, the Wind Atlas Analysis and Application Program (WAsP) and the WindSim model. The performance of each model was evaluated using a cross-prediction methodology. The second approach corresponds to a machine learning method called k-nearest neighbor (k-NN) regression. As its name implies, measurements from neighboring sites were used to predict the mean speed at a target site. Terrain and climatic features were used as predictors in this method. By using the statistical method, the prediction error was reduced to 1.29%. Further improvements in accuracy were achieved by implementing a weight-based ensemble model combining the WAsP model and the k-NN regression, with an overall percentage error of 1.06%, compared with the 5.09% and 4.31% obtained with the WAsP model and the WindSim model, respectively.
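
To make the idea of a weight-based ensemble concrete, the following Python sketch combines two mean wind speed predictions with weights inversely proportional to each model's validation error, and then computes a percentage error against a measurement. The inverse-error weighting, the definition of percentage error, and every number below are assumptions for illustration; the abstract does not state how the thesis derived its weights or its overall error.

```python
from typing import List, Sequence


def percentage_error(predicted: float, measured: float) -> float:
    """Absolute error relative to the measured value, in percent."""
    return abs(predicted - measured) / measured * 100.0


def inverse_error_weights(errors: Sequence[float]) -> List[float]:
    """Weights proportional to 1/error, normalised to sum to 1 (assumed scheme)."""
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [w / total for w in inverse]


def ensemble(predictions: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of the individual model predictions."""
    return sum(p * w for p, w in zip(predictions, weights))


# Hypothetical values, for illustration only.
measured = 8.00             # measured mean wind speed at the target site, m/s
wasp_prediction = 8.40      # flow-model prediction, m/s
knn_prediction = 7.95       # k-NN regression prediction, m/s
validation_errors = [5.0, 1.25]  # assumed validation errors (%), flow model then k-NN

weights = inverse_error_weights(validation_errors)
combined = ensemble([wasp_prediction, knn_prediction], weights)
combined_error = percentage_error(combined, measured)
print(weights, combined, combined_error)
```

Under these assumed numbers the model with the lower validation error receives the larger weight, so the combined prediction sits closer to the k-NN estimate.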
Identifying models of DNA polymorphisms associated with Alzheimer’s disease using step-wise and genetic algorithms from GWAS data
(Instituto Tecnológico y de Estudios Superiores de Monterrey, 2019-05) Romero Rosales, Brissa Lizbeth; ROMERO ROSALES, BRISSA LIZBETH; 861461; TREVIÑO ALVARADO, VICTOR MANUEL; 205076; Treviño Alvarado, Víctor Manuel; Vallejo Clemente, Edgar Emmanuel; Moreno Treviño, María Guadalupe; Escuela de Ingeniería y Ciencias; Campus Monterrey
Alzheimer's disease is a neurodegenerative disorder that involves cognitive deterioration accompanied by memory loss and inability to reason, affecting the patient's ability to carry out daily activities. This disorder is caused by genetic, environmental, and lifestyle factors. Determining the genetic factors is very important because the disease could then be predicted and therefore treated before it appears. However, despite research efforts and many putative detections using univariate analyses, only the APOE gene has been extensively validated as a risk factor associated with late-onset Alzheimer's disease. Thus, the problem of missing heritability arises, implying that a single gene does not determine the heritability of a disorder, but that the combined effect of several genes could better explain it. Genome-Wide Association Studies (GWAS) traditionally use univariate techniques to determine the association between markers and diseases. This research proposes the use of machine learning techniques based on GWAS data to identify sets of polymorphisms that maximize discrimination between cases and controls. This document explains the traditional strategies and theoretical bases that support this research. It presents previous works that apply multivariate methods for the prediction of different diseases and treatments, and their most representative characteristics are taken as the basis for a new solution. The proposed methodology includes obtaining genetic data and a pre-processing stage. Afterward, the process involves several quality control procedures that filter samples and SNPs to reduce the number of false positives and false negatives. Next, a chi-squared association test with kinship correction is performed to pre-select markers. Predictive models are built using wrapper and embedded computational methods. The first wrapper method used is BSWiMS, which is based on statistics and on forward and backward selection procedures to generate a logistic model. Its best AUC was 0.689. The second wrapper method is based on stochastic search: an ensemble of Genetic Algorithms coupled to a Support Vector Machine classifier, followed by a Forward Selection, which achieved a maximum AUC of 0.716. The third algorithm used is LASSO, one of the best-known embedded methods, which uses L1-regularization and performs feature selection during the training stage of the model. This classifier achieved an AUC of 0.8005. This study incorporates the analysis of poorly classified samples in predictive models as a strategy to build more predictive models. The best result, obtained with the mixed model of the variants of previous models, outperformed the others with an AUC of 0.842. This result is promising since the model generated with LASSO showed the highest discrimination between classes, based solely on genetic data. The biological relevance of the markers of the models is presented through their association with their respective gene. The models replicated variants previously associated with Alzheimer's disease, especially on chromosome 19 close to the APOE gene.
Visualization and machine learning techniques to support web traffic analysis
(Instituto Tecnológico y de Estudios Superiores de Monterrey) Gómez-Herrera, Fernando; Monroy, Raúl; Campus Estado de México; Monroy, Raúl
Web Analytics (WA) services are one of the main tools that marketing experts use to measure the success of an online business. Thus, it is extremely important to have tools that support WA analysis. Nevertheless, we observed that there has not been much change in how services display traffic reports. Regarding the trustworthiness of the information, Web Analytics Services (WAS) face the problem that more than half of Internet traffic is Non-Human Traffic (NHT). As a result, online reports can be misleading and marketing budget can be wasted. Some research has been done, yet most of the work involves intrusive methods and does not take advantage of the information provided by current WAS. In the present work, we provide tools that can help the marketing expert to get better reports, to have useful visualizations, and to ensure the trustworthiness of the traffic. First, we propose a new Visualization Tool. It helps to show the website performance in terms of a preferred metric and enables us to identify potential online strategies based on it. Second, we use Machine Learning Binary Classification (BC) and One-Class Classification (OCC) to get more reliable information by identifying NHT and abnormal traffic. Marketing analysts can then contrast NHT against their current reports. Third, we show how Pattern Extraction algorithms (like PBC4cip's miner) can help to conduct traffic analysis (once visitor segmentation is done) and to propose new strategies that may improve the online business. Later on, the patterns can be used in the Visualization Tool to analyze the traffic in detail. We confirmed the usefulness of the Visualization Tool by using it to analyze bot traffic we generated. NHT shared a very similar linear navigation path, in contrast with the more complex human path. Furthermore, BC and OCC (BaggingTPMiner) worked successfully in the detection of well-known bots and abnormal traffic. We achieved a ROC AUC of 0.844 and 0.982 for each approach, respectively.
