Doctoral thesis

Environmental monitoring to estimate indoor occupancy levels based on Semi-supervised machine learning and data fusion for building management

Abstract

Occupancy information is essential for space management, for energy efficiency, and, during the COVID-19 pandemic, for crowd control. Obtaining labeled data is challenging because of hardware limitations, privacy considerations, and the associated costs. Furthermore, venues larger than 200 m² require data fusion techniques. This thesis therefore focuses mainly on exploring the potential of Semi-Supervised Learning (SSL), which needs only a small amount of labeled data alongside a large amount of unlabeled data, to estimate occupancy levels in enclosed spaces. The study presents an empirical comparison of supervised ML models, SSL models, and data fusion techniques in real-life university classrooms and offices (uncontrolled conditions) at the University of the West of England, Bristol, UK, and Tecnologico de Monterrey, Mexico. Data were collected for three weeks in each scenario using an in-house developed Internet of Things (IoT) device that measures air temperature, relative humidity, and atmospheric pressure. The ground truth records were gathered through manual logging of occupancy levels. Dataset sizes averaged 2350 entries, with only 280 labeled instances per dataset. Support Vector Machine (SVM), Random Forest (RF), and Multi-Layer Perceptron (MLP) were used to define a performance baseline for supervised ML. Self-Training (ST) and Label Propagation (LP) were tested for SSL. In addition, several feature fusion methods were explored, including Chi-squared, ANOVA F-test, Spearman and Kendall’s Tau correlation, Mutual Information, Averages, Recursive Feature Elimination, and Principal Component Analysis. The models were evaluated using Accuracy, Precision, Recall, F1-score, Confusion Matrix, and High-Quality Supervised Baseline. ST outperformed the baseline models (SVM, RF, MLP), reaching a highest average accuracy of 90.96%, against 86.66% for SVM.
Furthermore, the data fusion results indicated that the Chi-squared approach to feature fusion outperformed the others, with an average F1-score of 95% and an average accuracy of 99%. These results demonstrate the effectiveness of SSL for indirect occupancy estimation while reducing the need for extensive data collection and labeling.
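
To make the Self-Training setup described in the abstract more concrete, the following is a minimal sketch using scikit-learn. It is not the thesis's code or configuration. The synthetic three-feature dataset merely stands in for the temperature, humidity, and pressure readings, the dataset and labeled-subset sizes (2350 and 280) are taken from the abstract, and the three occupancy classes, the 0.9 confidence threshold, and the SVM settings are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the sensor data: 3 features, 3 assumed occupancy levels.
X, y = make_classification(
    n_samples=2350, n_features=3, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Keep labels for only 280 training rows; scikit-learn marks unlabeled rows with -1.
rng = np.random.default_rng(0)
labeled_idx = rng.choice(len(y_train), size=280, replace=False)
y_semi = np.full_like(y_train, -1)
y_semi[labeled_idx] = y_train[labeled_idx]

# Supervised baseline: an SVM trained on the labeled rows only.
baseline = SVC(probability=True, random_state=0)
baseline.fit(X_train[labeled_idx], y_train[labeled_idx])

# Self-Training: the same SVM repeatedly pseudo-labels the unlabeled rows
# it is confident about (assumed threshold 0.9) and retrains on them.
self_training = SelfTrainingClassifier(
    SVC(probability=True, random_state=0), threshold=0.9
)
self_training.fit(X_train, y_semi)

acc_baseline = accuracy_score(y_test, baseline.predict(X_test))
acc_st = accuracy_score(y_test, self_training.predict(X_test))
print(f"Supervised SVM accuracy: {acc_baseline:.3f}")
print(f"Self-Training accuracy:  {acc_st:.3f}")
```

On synthetic data the two scores need not reproduce the gap reported in the abstract; the sketch only shows how a small labeled subset and a larger unlabeled pool are passed to the two kinds of model.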

Description

https://orcid.org/0000-0002-2460-3442

The user is obliged to use the services and content provided by the University, in particular printed materials and electronic resources, in accordance with current legislation, the principles of good faith, and generally accepted practice, without thereby contravening public order, especially where, for the proper performance of their activity, they need to reproduce, distribute, communicate, and/or make available fragments of works that are printed or capable of being in analog or digital format, whether on paper or in electronic form. Law 23/2006, of 7 July, amending the revised text of the Intellectual Property Law, approved

License