Architecture for a named entity recognition and relation extraction model using word embeddings variations for building a dynamic skills taxonomy
Abstract
In this work, we present an architecture for extracting meaningful insights from unstructured documents through Natural Language Processing (NLP) techniques in order to maintain a dynamic taxonomy of skills. NLP methods such as Named Entity Recognition (NER) and Relation Extraction (RE) enable computers to find entities of interest, such as skills and occupations, in unstructured documents related to the jobs industry and to the current and future state of job Knowledge, Skills and Abilities (KSA). The organization of a skills taxonomy seeks to reflect the relations between occupations and the skills, knowledge and abilities required to perform them, and to account for current and future changes in those relations. To do so, a Relation Extraction model is proposed. This model is trained to find relations between entities such as skills and occupations by acquiring a general understanding of how skills, knowledge, abilities and occupations relate. These skills and occupations form the base of a hierarchically organized structure of concepts visualized as a taxonomy. Current skills taxonomies are static and often built upon collected data that grows stale quickly. Reports from the World Economic Forum (WEF) and the Organisation for Economic Co-operation and Development (OECD) signal mismatches between current KSAs and future requirements due to emerging occupations and re-skilling needs. The architecture presented in this thesis enables a dynamic taxonomy capable of reflecting increasing, declining and mismatched skills in relation to distinct occupations. The results of its application are promising in terms of the model's performance and accuracy. The architecture has also proven effective in providing an end-to-end pipeline covering all stages from text collection and pre-processing to natural language processing and final visualization.
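To make the NER and RE steps of the pipeline concrete, the sketch below shows a deliberately simplified, dictionary-based stand-in for both tasks: entities are matched against small hypothetical gazetteers of skills and occupations, and every occupation–skill pair found in the same text is emitted as a candidate relation. The thesis instead trains statistical NER and RE models on word embeddings; the gazetteers, the `requires` relation label, and the co-occurrence pairing rule here are illustrative assumptions only.

```python
import re

# Hypothetical gazetteers -- the thesis trains NER models on embeddings
# rather than using fixed lists; these are for illustration only.
SKILLS = {"python", "machine learning", "sql"}
OCCUPATIONS = {"data scientist", "software engineer"}

def find_entities(text, vocabulary, label):
    """Return (surface_form, label) for each vocabulary phrase in text."""
    found = []
    lowered = text.lower()
    for phrase in vocabulary:
        for m in re.finditer(re.escape(phrase), lowered):
            # Keep the original casing of the matched span.
            found.append((text[m.start():m.end()], label))
    return found

def extract_relations(text):
    """Pair every OCCUPATION with every SKILL found in the same text,
    a crude co-occurrence stand-in for a trained Relation Extraction model."""
    skills = find_entities(text, SKILLS, "SKILL")
    occupations = find_entities(text, OCCUPATIONS, "OCCUPATION")
    return [(occ, "requires", sk)
            for occ, _ in occupations
            for sk, _ in skills]

posting = "We seek a Data Scientist with Python and SQL experience."
print(extract_relations(posting))
```

Relations extracted this way from many postings over time can be aggregated to populate and update the taxonomy, which is where the dynamic character of the approach comes from.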
Description
https://orcid.org/0000-0002-6000-3452