Doctoral thesis

Architecture for a named entity recognition and relation extraction model using word embeddings variations for building a dynamic skills taxonomy

Abstract

In this work, we present an architecture that uses Natural Language Processing (NLP) techniques to extract meaningful insights from unstructured documents and maintain a dynamic taxonomy of skills. NLP methods such as Named Entity Recognition (NER) and Relation Extraction (RE) enable computers to find entities of interest, such as skills and occupations, in unstructured documents about the jobs industry and the current and future state of job Knowledge, Skills and Abilities (KSAs). A taxonomy of skills seeks to reflect the relations between occupations and the skills, knowledge and abilities required to perform them. It also aims to account for current and future changes in those relations. To do so, a Relation Extraction model is proposed. This model is trained to find relations between entities such as skills and occupations, which it does by learning, in general terms, how skills, knowledge, abilities and occupations relate. These skills and occupations form the basis of a hierarchically organized structure of concepts that is visualized as a taxonomy. Current skills taxonomies are static and often built on collected data that quickly becomes outdated. Reports from the World Economic Forum (WEF) and the Organisation for Economic Co-operation and Development (OECD) signal mismatches between current KSAs and future requirements due to emerging occupations and re-skilling needs. The architecture presented in this thesis enables a dynamic taxonomy capable of reflecting increasing, declining and mismatched skills in relation to distinct occupations. The results of its application are promising in terms of the model's performance and accuracy. The architecture has also proven effective as an end-to-end pipeline covering every stage: text collection, pre-processing, natural language processing and final visualization.
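
As a rough illustration of the flow the abstract describes (find entities, relate them, aggregate the relations into a taxonomy), the following is a minimal sketch and not the thesis's implementation. It assumes toy lexicons in place of a trained NER model and same-sentence co-occurrence in place of a trained RE model; the names `SKILLS`, `OCCUPATIONS`, `find_entities`, `extract_relations` and `build_taxonomy` are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicons; a trained NER model would replace these lookups.
SKILLS = {"python", "sql", "machine learning"}
OCCUPATIONS = {"data scientist", "database administrator"}


def find_entities(sentence):
    """Return (label, term) pairs for every lexicon term found in the sentence."""
    lowered = sentence.lower()
    found = []
    for label, lexicon in (("SKILL", SKILLS), ("OCCUPATION", OCCUPATIONS)):
        for term in sorted(lexicon):
            # Whole-word match so that "sql" does not match inside other words.
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                found.append((label, term))
    return found


def extract_relations(sentence):
    """Pair each occupation with each skill in the same sentence.

    Assumption: co-occurrence is only a stand-in for a trained RE model.
    """
    entities = find_entities(sentence)
    skills = [term for label, term in entities if label == "SKILL"]
    occupations = [term for label, term in entities if label == "OCCUPATION"]
    return [(occ, "requires", skill) for occ in occupations for skill in skills]


def build_taxonomy(documents):
    """Count how often each occupation-skill relation appears across documents."""
    counts = defaultdict(lambda: defaultdict(int))
    for doc in documents:
        # Naive sentence split on terminal punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            for occupation, _, skill in extract_relations(sentence):
                counts[occupation][skill] += 1
    return {occ: dict(skills) for occ, skills in counts.items()}


docs = [
    "A data scientist needs Python and machine learning. "
    "A database administrator needs SQL.",
    "Every data scientist should know SQL and Python.",
]
taxonomy = build_taxonomy(docs)
for occupation, skills in sorted(taxonomy.items()):
    print(occupation, skills)
```

Because the counts are recomputed from whatever documents are supplied, rerunning `build_taxonomy` on a newer collection and comparing the counts gives a crude view of increasing or declining skills per occupation, which is the dynamic behaviour the abstract targets.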

Description

https://orcid.org/0000-0002-6000-3452
