Ciencias Exactas y Ciencias de la Salud

Permanent URI for this collection: https://hdl.handle.net/11285/551039

This collection contains the theses and graduate research projects of the Master's programs of the School of Engineering and Sciences and the School of Medicine and Health Sciences.


Search Results

Now showing 1 - 10 of 14
  • Tesis de maestría
    Voice fraud mitigation: developing a deep learning system for detecting cloned voices in telephonic communications
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-12-03) Villicaña Ibargüengoyti, José Rubén; Montesinos Silva, Luis Arturo; emimmayorquin; Santos Díaz, Alejandro; Mantilla Caeiros, Alfredo Víctor; School of Engineering and Sciences; Campus Ciudad de México
    This study addresses the growing threat of voice fraud carried out with cloned voices in phone calls, a problem that can compromise personal security in many ways. The primary goal of this work is to develop a deep learning-based detection system for distinguishing between real and cloned voices in Spanish, focusing on calls made over telephone lines. To achieve this, a dataset was generated from real and cloned audio samples in Spanish, and the captured audio was processed to simulate various telephone codecs and noise levels. Two deep learning models, a convolutional neural network (named Vanilla CNN in this project) and a transfer learning approach (MobileNetV2), were trained using spectrograms derived from the audio data. The results indicate high accuracy in identifying real and cloned voices, reaching up to 99.97%. Additional validations were performed under the different noise types and codecs included in the dataset. These findings highlight the effectiveness of the proposed architectures. Additionally, an ESP32 audio kit was integrated with Amazon Web Services to implement voice detection during phone calls. This study contributes to voice fraud detection research focused on the Spanish language.
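    A minimal sketch of the kind of pipeline this abstract describes: converting an audio clip into a log-mel spectrogram and classifying it with a MobileNetV2 transfer-learning head. The sample rate, mel-band count, input size, and frozen-backbone choice are illustrative assumptions, not values reported in the thesis.
    ```python
    # Hypothetical sketch: log-mel spectrogram extraction + MobileNetV2 transfer-learning head.
    # Sample rate, mel-band count, and input size are assumptions, not values from the thesis.
    import numpy as np
    import librosa
    import tensorflow as tf

    def audio_to_logmel(path, sr=8000, n_mels=64):
        """Load audio at a telephone-band sample rate and return a log-mel spectrogram."""
        y, _ = librosa.load(path, sr=sr)
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
        return librosa.power_to_db(mel, ref=np.max)

    def build_classifier(input_shape=(96, 96, 3)):
        """Binary real-vs-cloned classifier on spectrogram images via MobileNetV2 features."""
        base = tf.keras.applications.MobileNetV2(
            include_top=False, weights="imagenet", input_shape=input_shape)
        base.trainable = False  # transfer learning: freeze the pretrained backbone
        model = tf.keras.Sequential([
            base,
            tf.keras.layers.GlobalAveragePooling2D(),
            tf.keras.layers.Dense(1, activation="sigmoid"),
        ])
        model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
        return model
    ```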
  • Tesis de maestría / master thesis
    Object detection-based surgical instrument tracking in laparoscopy videos
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-12) Guerrero Ramírez, Cuauhtemoc Alonso; Ochoa Ruiz, Gilberto; emipsanchez; González Mendoza, Miguel; Hinojosa Cervantes, Salvador Miguel; Falcón Morales, Luis Eduardo; School of Engineering and Sciences; Campus Monterrey; Medina Pérez, Miguel Ángel
    Minimally invasive surgery (MIS) has transformed surgery by offering numerous advantages over traditional open surgery, such as reduced pain, minimized trauma, and faster recovery times. However, endoscopic MIS procedures remain highly operator-dependent, demanding significant skill from the surgical team to ensure a positive postoperative outcome for the patient. Computer vision techniques such as reliable surgical instrument detection and tracking can be leveraged for applications such as intraoperative decision support, surgical navigation assistance, and surgical skill assessment, which can significantly improve patient safety. The aim of this work is to implement a Multiple Object Tracking (MOT) benchmark model for the task of surgical instrument tracking in laparoscopic videos. To this end, a new dataset, m2cai16-tool-tracking, is introduced, based on the m2cai16-tool-locations dataset and specifically designed for surgical instrument tracking. This dataset includes both bounding box annotations for instrument detection and unique tracking ID annotations for multi-object tracking. This work employs ByteTrack, a state-of-the-art multiple object tracking algorithm that follows the tracking-by-detection paradigm. ByteTrack predicts tool positions and associates object detections across frames, allowing consistent tracking of each instrument. The object detection step is performed using YOLOv4, a state-of-the-art object detection model known for real-time performance. YOLOv4 is first trained on the m2cai16-tool-locations dataset to establish a baseline performance and then on the custom m2cai16-tool-tracking dataset, allowing the detection performance on the custom dataset to be compared with that on an existing object detection dataset. YOLOv4 generates bounding box predictions for each frame in the laparoscopic videos. These detections serve as input for the ByteTrack algorithm, which assigns unique tracking IDs to each instrument to maintain their trajectories across frames. YOLOv4 achieves robust object detection performance on the m2cai16-tool-locations dataset, obtaining a mAP50 of 0.949, a mAP75 of 0.537, and a mAP50:95 of 0.526, with a real-time inference speed of 125 fps. However, detection performance on the m2cai16-tool-tracking dataset is slightly lower, with a mAP50 of 0.839, mAP75 of 0.420, and mAP50:95 of 0.439, suggesting that differences in data partitioning impact detection accuracy. This lower detection accuracy for the tracking dataset likely affects the tracking performance of ByteTrack, reflected in a MOTP of 76.4, MOTA of 56.6, IDF1 score of 22.8, and HOTA score of 23.0. Future work could focus on improving the object detection performance to enhance tracking quality. Additionally, including appearance-based features in the tracking step could improve the association accuracy of detections across frames and help maintain consistent tracking even in challenging scenarios like occlusions. Such improvements could enhance tracking reliability to better support surgical tasks.
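    A simplified sketch of the tracking-by-detection association step that ByteTrack builds on: per-frame detections are matched to existing tracks by IoU using the Hungarian algorithm. This is not the full ByteTrack algorithm (no Kalman motion model and no two-stage split of high- and low-score detections); the IoU threshold is an assumption.
    ```python
    # Simplified tracking-by-detection association sketch (not the full ByteTrack algorithm:
    # no Kalman motion model and no high/low-score two-stage split). Threshold is an assumption.
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def iou(a, b):
        """IoU of two boxes in (x1, y1, x2, y2) format."""
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter + 1e-9)

    def associate(tracks, detections, iou_thresh=0.3):
        """Match existing track boxes to new detections; return (track_idx, det_idx) pairs."""
        if not tracks or not detections:
            return []
        cost = np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])
        rows, cols = linear_sum_assignment(cost)
        return [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= iou_thresh]
    ```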
  • Tesis de maestría / master thesis
    Machine translation for suicide detection: validating Spanish datasets using machine and deep learning models
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2024-11) Arenas Enciso, Francisco Ariel; Zareel, Mahdi; emipsanchez; García Ceja, Enrique Alejandro; Roshan Biswal, Rajesh; School of Engineering and Sciences; Sede EGADE Monterrey
    Suicide is a complex health concern that affects not only individuals but society as a whole. The application of traditional strategies to prevent, assess, and treat this condition has proven inefficient in a modern world in which interactions mainly take place online. Thus, in recent years, multidisciplinary efforts have explored how computational techniques could be applied to automatically detect, from textual input, individuals who desire to end their lives. Such methodologies rely on two main technical approaches: text-based classification and deep learning. Further, these methods rely on datasets labeled with relevant information, often sourced from clinically curated social media posts or healthcare records, and more recently, public social media data has proven especially valuable for this purpose. Nonetheless, research on the application of computational algorithms for detecting suicide or suicidal ideation is still an emerging field of study. In particular, investigations on this topic have recently considered specific factors, like language or socio-cultural context, that affect the causality, rationality, and intentionality of an individual's manifestation, to improve the assessment made on textual data. Consequently, problems such as the lack of data in non-Anglo-Saxon contexts suitable for computational detection of suicidal ideation remain unresolved. This thesis therefore addresses the limited availability of suicide ideation datasets in non-Anglo-Saxon contexts, particularly for Spanish, despite its global significance as a widely spoken language. The research hypothesizes that machine-translated Spanish datasets can yield comparable results (within a ±5% performance range) to English datasets when training machine learning and deep learning models for suicide ideation detection. To test this, multiple machine translation models were evaluated, and the two best-performing models were selected to translate an English dataset of social media posts into Spanish. The English and translated Spanish datasets were then processed through a binary classification task using SVM, Logistic Regression, CNN, and LSTM models. Results demonstrated that the translated Spanish datasets achieved performance close to the original English set across all classifiers, with variations in accuracy, precision, recall, F1-score, ROC AUC, and MCC remaining within the hypothesized ±5% range. For example, the SVM classifier on the translated Spanish sets achieved an accuracy of 90%, closely matching the 91% achieved on the original English set. These findings confirm that machine-translated datasets can serve as effective resources for training ML and DL models for suicide ideation detection in Spanish, thereby supporting the viability of extending suicide detection models to non-English-speaking populations. This contribution provides a methodological foundation for expanding suicide prevention tools to diverse linguistic and cultural contexts, potentially benefiting health organizations and academic institutions interested in psychological computation.
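    A minimal sketch of the binary text-classification setup described above, using a TF-IDF representation with a linear SVM from scikit-learn. The feature settings and the placeholder texts and labels are illustrative assumptions, not the thesis's translated dataset.
    ```python
    # Minimal TF-IDF + linear SVM sketch for binary text classification.
    # The example texts and labels are placeholders, not data from the thesis.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.pipeline import make_pipeline

    texts = ["texto de ejemplo uno", "texto de ejemplo dos"]  # placeholder Spanish posts
    labels = [0, 1]  # 0 = non-ideation, 1 = ideation (placeholder labels)

    clf = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), min_df=1),
        LinearSVC(),
    )
    clf.fit(texts, labels)
    print(clf.predict(["otro texto de ejemplo"]))
    ```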
  • Tesis de maestría
    Caption generation with transformer models across multiple medical imaging modalities
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2023-06) Vela Jarquin, Daniel; Santos Díaz, Alejandro; dnbsrp; Soenksen, Luis Ruben; Montesinos Silva, Luis Arturo; Ochoa Ruiz, Gilberto; School of Engineering and Sciences; Campus Monterrey; Tamez Peña, José Gerardo
    Caption generation is the process of automatically providing text excerpts that describe relevant features of an image. This process is applicable to very diverse domains, including healthcare. The field of medicine is characterized by a vast amount of visual information in the form of X-rays, magnetic resonance images, ultrasound, and CT scans, among others. Descriptive texts generated to represent this kind of visual information can help medical professionals achieve a better understanding of the pathologies and cases presented to them and could ultimately allow them to make more informed decisions. In this work, I explore the use of deep learning to address the problem of caption generation in medicine. I propose the use of a Transformer model architecture for caption generation and evaluate its performance on a dataset composed of medical images that range across multiple modalities and represented anatomies. Deep learning models, particularly encoder-decoder architectures, have shown increasingly favorable results in the translation from one information modality to another. Usually, the encoder extracts features from the visual data, and these features are then used by the decoder to iteratively generate a sequence in natural language that describes the image. In the past, various deep learning architectures have been proposed for caption generation. The most popular architectures in recent years involved recurrent neural networks (RNNs), long short-term memory (LSTM) networks, and only recently, Transformer-type architectures. The Transformer architecture has shown state-of-the-art performance in many natural language processing tasks such as machine translation, question answering, summarization, and, not long ago, caption generation. The use of attention mechanisms allows Transformers to better grasp the meaning of words in a sentence in a particular context. All these characteristics make Transformers ideal for caption generation. In this thesis I present the development of a deep learning model based on the Transformer architecture that generates captions for medical images of different modalities and anatomies, with the ultimate goal of helping professionals improve medical diagnosis and treatment. The model is tested on the MedPix online database, a compendium of medical imaging cases, and the results are reported. In summary, this work provides a valuable contribution to the field of automated medical image analysis.
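    A minimal PyTorch sketch of the encoder-decoder idea described above: image features serve as the memory of a Transformer decoder that generates a caption token by token. The dimensions, vocabulary size, start-token convention, and greedy decoding loop are assumptions for illustration, not the thesis's actual architecture.
    ```python
    # Minimal encoder-decoder captioning sketch: image features feed a Transformer decoder.
    # Dimensions, vocabulary size, and greedy decoding are illustrative assumptions.
    import torch
    import torch.nn as nn

    class CaptionDecoder(nn.Module):
        def __init__(self, vocab_size=1000, d_model=256, nhead=8, num_layers=4):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
            self.decoder = nn.TransformerDecoder(layer, num_layers)
            self.out = nn.Linear(d_model, vocab_size)

        def forward(self, tokens, image_features):
            # tokens: (batch, seq_len) token ids; image_features: (batch, n_patches, d_model)
            tgt = self.embed(tokens)
            seq_len = tokens.size(1)
            causal = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
            hidden = self.decoder(tgt, image_features, tgt_mask=causal)
            return self.out(hidden)  # (batch, seq_len, vocab_size) logits

    # Greedy decoding from dummy image features (stand-in for a visual encoder's output).
    model = CaptionDecoder()
    features = torch.randn(1, 49, 256)
    tokens = torch.zeros(1, 1, dtype=torch.long)  # assume id 0 is the start token
    for _ in range(10):
        logits = model(tokens, features)
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
        tokens = torch.cat([tokens, next_id], dim=1)
    ```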
  • Tesis de maestría
    The use of multispectral images and deep learning models for agriculture: the application on Agave
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2022-12) Montán López, José Alberto; FALCON MORALES, LUIS EDUARDO; 168959; Falcón Morales, Luis Eduardo; puelquio, emipsanchez; Sánchez Ante, Gildardo; Roshan Biswal, Rajesh; Sossa Azuela, Juan Humberto; Escuela de Ingeniería y Ciencias; Campus Estado de México
    Agave is an important plant for Mexico, a country considered a center of biological diversity for agave; in addition, one variety is used for the production of tequila, an important product that brings revenue to the country. Demand for this product has led farmers to pay more attention to plantation volume, at the expense of quality. Several solutions already exist in the agricultural field, such as weed identification and species classification, that use aerial imagery together with machine and deep learning and reach good results. However, there are few solutions applied directly to agaves to monitor their health. Moreover, there is no public agave dataset suited to the purpose of this work; for this reason, we collected data using a drone equipped with a multispectral camera capable of capturing five channels at different wavelengths of the light spectrum. The dataset covers 7 ha of agave plantation in the five channels provided by the multispectral camera, as well as three vegetation indices computed from the multispectral bands. In this work, we explore the use of recent deep learning (DL) algorithms as well as traditional machine learning (ML) algorithms to segment agaves by health status using aerial multispectral images. In the experiments, we found that the ML algorithms were able to segment only one of the two agave classes defined. The DL experiments allowed us to determine the training image size, with 500x500 performing best for this problem. Experiments for both types of algorithms were run with many channel combinations, such as using only the vegetation indices or using all bands available in the dataset. The Vision Transformer (ViT) Segmenter model reached an accuracy of 92.96% using the vegetation-index data, while the best ML algorithm, Random Forest, reached 88.06% accuracy using the five bands captured by the drone. We also tested the models on traditional RGB images to compare them against multispectral images and determine whether this type of technology provides an actual advantage. The results show that when the health variable is introduced, i.e., when there are two agave classes, models with access to the additional bands obtain better results. Thus, the use of multispectral images increases the performance of all models, both ML and DL, for identifying more than one class of agave.
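    A short sketch of the vegetation-index step mentioned above. NDVI, computed from the near-infrared and red bands, is a common example of such an index; the abstract does not name the three indices actually used, so treat this as an assumption.
    ```python
    # NDVI sketch: a common vegetation index computed from near-infrared and red bands.
    # Whether NDVI is one of the three indices used in the thesis is not stated in the abstract.
    import numpy as np

    def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
        """NDVI = (NIR - Red) / (NIR + Red), with a small epsilon to avoid division by zero."""
        nir = nir.astype(np.float64)
        red = red.astype(np.float64)
        return (nir - red) / (nir + red + 1e-9)

    # Example with random stand-in bands (a real capture would come from the drone's camera).
    nir_band = np.random.rand(500, 500)
    red_band = np.random.rand(500, 500)
    index_map = ndvi(nir_band, red_band)
    ```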
  • Tesis de maestría
    The identification of DoS and DDoS attacks to IoT devices in software defined networks by using machine learning and deep learning models
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2022-05) Almaraz Rivera, Josué Genaro; PEREZ DIAZ, JESUS ARTURO; 31169; Pérez Díaz, Jesús Arturo; puelquio/mscuervo; Trejo Rodríguez, Luis Ángel; Botero Vega, Juan Felipe; School of Engineering and Sciences; Campus Monterrey; Cantoral Ceballos, José Antonio
    This thesis project explores and improves the current state of the art in detection techniques for Distributed Denial of Service (DDoS) attacks against Internet of Things (IoT) devices in Software Defined Networks (SDN), which, as far as we know, remains a major problem for network providers and data centers. Our solution to this problem started with the selection of strong Machine Learning (ML) and Deep Learning (DL) models from the current literature (such as Decision Trees and Recurrent Neural Networks) and their evaluation under three feature sets from our balanced version of the Bot-IoT dataset, in order to assess the effects of different variables and avoid the dependencies produced by the Argus flow data generator. With this evaluation we achieved an average accuracy greater than 99% for binary and multiclass classification, leveraging the categories and subcategories present in the Bot-IoT dataset, for the detection and identification of DDoS attacks based on Transport (UDP, TCP) and Application layer (HTTP) protocols. To extend the capacity of this Intrusion Detection System (IDS), we did a research stay in Colombia with Universidad de Antioquia and in collaboration with Aligo (a cybersecurity company from Medellín). There, we created a new dataset based on real normal and attack traffic to physical IoT devices: the LATAM-DDoS-IoT dataset. We conducted binary and multiclass classifications with the DoS and DDoS versions of this new dataset, obtaining average accuracies of 99.967% and 98.872%, respectively. We then performed two additional experiments combining it with our balanced version of the Bot-IoT dataset, applying transfer learning and dataset concatenation, showing the differences between both domains and the level of generalization we accomplished. Finally, we deployed our extended IDS (as a functional app built in Java and connected to our own cloud-hosted Python REST API) into a real-time simulated SDN environment based on the Open Network Operating System (ONOS) controller and Mininet. We obtained a best accuracy of 94.608%, where 100% of the flows identified as attackers were correctly classified and 91.406% of the attack flows were detected. This app can be further enhanced with an Intrusion Prevention System (IPS) as a mitigation strategy to stop the identified attackers.
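    A minimal sketch of the flow-classification step: a decision tree trained on tabular flow features, as in the ML baselines mentioned above. The synthetic features and labels are placeholders, not the Bot-IoT or LATAM-DDoS-IoT feature sets.
    ```python
    # Minimal flow-classification sketch: a decision tree on tabular flow features.
    # Feature values and labels below are synthetic placeholders, not the thesis's datasets.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(0)
    X = rng.random((1000, 4))           # stand-in features, e.g. packet count, bytes, duration, rate
    y = rng.integers(0, 2, size=1000)   # 0 = normal flow, 1 = attack flow (placeholder labels)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = DecisionTreeClassifier(max_depth=8, random_state=0).fit(X_train, y_train)
    print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
    ```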
  • Tesis de maestría
    Motor imagery analysis with deep learning for potential application in motor impairment rehabilitation
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2022) Lomelín Ibarra, Vicente Alejandro; CANTORAL CEBALLOS, JOSE ANTONIO; 261286; Cantoral Ceballos, José Antonio; emipsanchez; School of Engineering and Sciences; Campus Monterrey; Gutierrez Rodriguez, Andrés Eduardo
    Motor imagery is a complex mental task that represents muscular movement without the execution of muscular action, involving the cognitive processes of motor planning and sensorimotor proprioception of the body. The signals of the motor imagery mental process are found in the cortical areas of sensory and motor processing of the brain. Since this mental task shows behavior similar to that of motor execution, it is used to create rehabilitation routines for patients with some form of motor skill impairment. Due to the nature of this mental task, its execution is complicated and usually requires training the subject to perform it adequately. The mental task has also proven to vary among subjects, making it difficult to create a general method to process the signals. EEG signal acquisition provides a non-invasive method to acquire the electrical potentials generated by neural activity. The technique provides good temporal resolution but poor spatial resolution, acquiring signals from every area of the brain. This leads to the problem of mixing signals from different cognitive processes, so filtering and feature extraction are required to isolate the desired signals. Because of this problem, the classification of these signals in scenarios such as Brain-Computer Interface systems tends to perform poorly. Deep learning has proved to improve classification by identifying patterns corresponding to the signal of interest. Throughout this thesis project for the Computer Science Master's Program, different deep learning architectures were designed to classify the execution of motor imagery. For this work, a variety of representations of the EEG signal were prepared to serve as inputs for the models, including image-based spectrograms, 2D and 3D matrix arrangements, and 1D vectors. In addition, the generated samples use a channel-selection process to limit the information to the region of interest over the motor cortex. This work also considers an asymmetric hemispheric channel selection in order to represent the state of different areas of the motor cortex independently during the execution of the mental task. The best results were observed with a single-channel spectrogram representation of the signal as input to a CNN model, with a reported classification accuracy of 93.3%. Promising results were also obtained with the 1D CNN models, with a classification accuracy of 86.12%. Although not as high, promising results were also observed with the 2D CNN models using 2D and 3D matrices as input, with reported accuracies that outperformed the state of the art. Lastly, sequential models that analyze the signal as a time series also outperformed the state of the art with the devised asymmetrical 9- and 5-channel selections.
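    A short sketch of one of the signal representations described above: turning a single EEG channel into a spectrogram that can feed an image-based CNN. The sampling rate, trial length, and window size are assumptions, not the thesis's settings.
    ```python
    # Single-channel EEG spectrogram sketch. Sampling rate and window length are assumptions.
    import numpy as np
    from scipy.signal import spectrogram

    fs = 250                               # assumed EEG sampling rate in Hz
    signal = np.random.randn(4 * fs)       # stand-in for a 4-second motor-imagery trial
    freqs, times, Sxx = spectrogram(signal, fs=fs, nperseg=fs // 2)
    log_spec = 10 * np.log10(Sxx + 1e-12)  # log power, the image-like input for a CNN
    ```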
  • Tesis de maestría
    Detection of suspicious attitudes on video using neuroevolved shallow and deep neural networks models
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2021-11) Flores Munguía, Carlos; Terashima Marín, Hugo; puemcuervo/tolmquevedo; Oliva, Diego; Ortiz Bayliss, Jose Carlos; School of Engineering and Sciences; Campus Monterrey
    The analysis of surveillance camera footage is a critical task usually limited by the people devoted to video supervision, their knowledge, and their judgment. Security guards protect other people from events that can compromise their security, such as robbery, extortion, fraud, and vehicle theft, making them an essential part of this type of protection system; if they are not paying attention, crimes may be overlooked. Nonetheless, different approaches have arisen to automate this task. These methods are mainly based on machine learning and benefit from neural networks that extract underlying information from input videos. However, despite how competent those networks have proved to be, developers must face the challenging task of defining the architecture and hyperparameters that allow the network to work adequately and optimize the use of computational resources. If the architecture and hyperparameters are not selected adequately, the neural network's performance may suffer significantly. No matter the type of neural network used (shallow, dense, convolutional, 3D convolutional, or recurrent), hyperparameter selection is usually performed using the empirical knowledge of the designer, or with the help of automated approaches like Random Search or Bayesian Optimization. However, such methods suffer from problems like poor coverage of the solution space, especially when the space has many dimensions, or the need to evaluate the models many times, under a diverse set of hyperparameters, to gather information about the objective function. This work proposes a model that generates, through a genetic algorithm, neural networks for behavior classification in videos. The genetic algorithm allows the hyperparameter solution space to be explored in different directions simultaneously. Two types of neural networks are evolved as part of the thesis work: shallow networks and deep networks, the latter based on dense layers and 3D convolutions. Each type of network takes a distinct kind of input: the evolution of people's poses and video sequences, respectively. Shallow neural networks are generated by NeuroEvolution of Augmenting Topologies (NEAT), while CoDeepNEAT generates the deep networks. NEAT uses a direct encoding, meaning that each node and connection in the network is directly represented in the chromosome. In contrast, CoDeepNEAT uses an indirect encoding based on the cooperative coevolution of blueprints and modules. The networks are trained and tested on the Kranok-NV dataset, where they exhibited better results than their competitors on various standard metrics.
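    A simplified sketch of hyperparameter search with a genetic algorithm, the general mechanism the thesis builds on. This is not NEAT or CoDeepNEAT (which evolve network topologies with direct and indirect encodings); it only evolves a flat hyperparameter vector, and the fitness function is a placeholder for actual training and validation.
    ```python
    # Simplified genetic-algorithm sketch for hyperparameter search.
    # This is NOT NEAT/CoDeepNEAT (which evolve topologies); it evolves a flat parameter vector,
    # and the fitness function is a placeholder for real model training and validation.
    import random

    def fitness(params):
        """Placeholder: in practice, train a network with these hyperparameters and return accuracy."""
        lr, hidden_units, dropout = params
        return -abs(lr - 0.01) - abs(hidden_units - 128) / 1000 - abs(dropout - 0.3)

    def random_individual():
        return (random.uniform(1e-4, 1e-1), random.randint(16, 512), random.uniform(0.0, 0.7))

    def mutate(params):
        lr, hidden, drop = params
        return (lr * random.uniform(0.5, 2.0),
                max(16, hidden + random.randint(-32, 32)),
                min(0.9, max(0.0, drop + random.uniform(-0.1, 0.1))))

    population = [random_individual() for _ in range(20)]
    for generation in range(10):
        population.sort(key=fitness, reverse=True)
        parents = population[:5]                       # keep the fittest individuals
        population = parents + [mutate(random.choice(parents)) for _ in range(15)]
    best = max(population, key=fitness)
    ```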
  • Tesis de maestría
    Attention YOLACT++: achieving robust and real-time medical instrument segmentation in endoscopic procedures.
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2021-04) Ángeles Cerón, Juan Carlos; Chang Fernández, Leonardo; 345979; Chang Fernández, Leonardo; emipsanchez; González Mendoza, Miguel; Alí, Sharib; Escuela de Ingeniería y Ciencias; Campus Monterrey; Ochoa Ruiz, Gilberto
    Image-based tracking of laparoscopic instruments via instance segmentation plays a fundamental role in computer- and robot-assisted surgery by aiding surgical navigation and increasing patient safety. Despite its crucial role in minimally invasive surgeries, accurate tracking of surgical instruments is a challenging task for two main reasons: 1) the complex surgical environment, and 2) the lack of model designs with both high accuracy and speed. Previous attempts in the field have prioritized robust performance over real-time speed, rendering them unfeasible for live clinical applications. In this thesis, we propose the use of attention mechanisms to significantly improve the recognition capabilities of YOLACT++, a lightweight single-stage instance segmentation architecture, which we target at medical instrument segmentation. To further improve the performance of the model, we also investigated the use of custom data augmentation and anchor optimization via a differential evolution search algorithm. Furthermore, we investigate the effect of multi-scale feature aggregation strategies in the architecture. We perform ablation studies with Convolutional Block Attention and Criss-Cross Attention modules at different stages in the network to determine an optimal configuration. Our proposed model, CBAM-Full + Aug + Anch, drastically outperforms the previous state of the art in commonly used robustness metrics in medical segmentation, achieving 0.435 MI_DSC and 0.471 MI_NSD while running at 69 fps, which is more than 12 points more robust in both metrics and 14 times faster than the previous best model. To our knowledge, this is the first work that explicitly focuses on both real-time performance and improved robustness.
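    A minimal PyTorch sketch of the channel-attention half of a Convolutional Block Attention Module (CBAM), the kind of attention this thesis inserts into YOLACT++. It follows the commonly published CBAM formulation; the spatial-attention branch and the thesis's exact placement and configuration are not reproduced here.
    ```python
    # Minimal channel-attention sketch in the style of CBAM (spatial attention omitted).
    # This follows the commonly published formulation, not the thesis's exact configuration.
    import torch
    import torch.nn as nn

    class ChannelAttention(nn.Module):
        def __init__(self, channels, reduction=16):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
            )

        def forward(self, x):
            # x: (batch, channels, height, width)
            avg = self.mlp(x.mean(dim=(2, 3)))                 # global average pooling branch
            mx = self.mlp(x.amax(dim=(2, 3)))                  # global max pooling branch
            scale = torch.sigmoid(avg + mx).unsqueeze(-1).unsqueeze(-1)
            return x * scale                                   # reweight feature channels

    features = torch.randn(2, 64, 32, 32)
    attended = ChannelAttention(64)(features)
    ```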
  • Tesis de maestría
    Pre-diagnosis of diabetic retinopathy implementing supervised learning algorithms using an ocular fundus Latin-American dataset for cross-data validation
    (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2021-02) De la Cruz Espinosa, Emanuel; FUENTES AGUILAR, RITA QUETZIQUEL; 229297; Fuentes Aguilar, Rita Quetziquel; emipsanchez; García González, Alejandro; Ochoa Ruiz, Gilberto; Abaunza González, Hernán; School of Engineering and Sciences; Campus Monterrey
    Nowadays diabetes is a disease with worldwide presence and a high mortality rate, causing a large social and economic impact. One of the major negative effects of diabetes is visual loss due to diabetic retinopathy (DR). To prevent this condition, it is necessary to identify referable patients by screening for DR, complemented with Optical Coherence Tomography (OCT), another study used for early detection of blindness that performs several longitudinal scans at a series of lateral locations to generate a map of reflection sites in the sample and displays it as a two-dimensional image, achieving transmission images in turbid tissue. Regrettably, the number of ophthalmologists and OCT devices is not enough to provide adequate health care to the diabetic population. Although AI systems capable of performing DR screening exist, they do not target the assessment specifically at the macular area, considering visible and proliferative anomalies, which are signs of severe damage and late intervention. This work presents three supervised machine learning algorithms: a Random Forest (RF) classifier, a Convolutional Neural Network (CNN) model, and a transfer learning (TL) pretrained model, able to sort fundus images into the three classes with which an exclusive fundus image database is labeled. Processing techniques such as channel splitting, color space transforms, histogram- and spatial-based filters, and data augmentation are used to detect the presence of diabetic retinopathy. The stages of this work are: debugging of a publicly available dataset, macular segmentation and cropping, data pre-processing, feature extraction, model training, and test and validation performance evaluation with an exclusive Latin-American dataset, considering accuracy, sensitivity, and specificity as metrics. The best results achieved are 61.22% accuracy, 86.67% sensitivity, and 89.47% specificity.
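    A short sketch of two of the pre-processing steps mentioned above: channel splitting and histogram-based contrast enhancement (CLAHE) with OpenCV. The use of the green channel and the CLAHE parameters are common choices in fundus-image work and are assumptions here, not values specified in the thesis.
    ```python
    # Fundus pre-processing sketch: channel splitting + CLAHE contrast enhancement.
    # The choice of the green channel and the CLAHE parameters are illustrative assumptions.
    import cv2
    import numpy as np

    def preprocess_fundus(image_bgr: np.ndarray) -> np.ndarray:
        """Split color channels and apply CLAHE to the green channel (common in fundus analysis)."""
        _, green, _ = cv2.split(image_bgr)                     # OpenCV images are B, G, R
        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
        return clahe.apply(green)

    # Example with a synthetic stand-in image (a real pipeline would load a fundus photograph).
    dummy = (np.random.rand(512, 512, 3) * 255).astype(np.uint8)
    enhanced = preprocess_fundus(dummy)
    ```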
Unless otherwise specified, these materials are shared under the following terms: Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND) http://www.creativecommons.mx/#licencias