Exact Sciences and Health Sciences
Permanent URI for this collection: https://hdl.handle.net/11285/551039
This collection contains theses and degree projects from the master's programs of the School of Engineering and Sciences and the School of Medicine and Health Sciences.
Search Results
- Intent discovery from conversational logs to prepare a student admission chatbot for Tecnológico de Monterrey (Instituto Tecnológico y de Estudios Superiores de Monterrey, 2021-05) Treviño Lozano, Rolando; Hernández Gress, Neil; tolmquevedo; Alvarado Uribe, Joanna; Castro Sánchez, Noé Alejandro; Escuela de Ingeniería y Ciencias; Campus Monterrey; Ceballos Cancino, Héctor Gibrán
Online chat services allow companies to attend to their customers and resolve problems or doubts about a specific topic. Lately, conversational bots have been adopted in this domain, broadening attention capacity while easing interactions between users and the company and reducing the workload of agents, which increases productivity and service quality. Designing a chatbot is a time-consuming task, as the designer has to provide the core concepts, known as intents, that the conversational bot will respond to, along with example sentences and their respective answers. We propose a framework that takes as input conversational transcripts between prospects and agents and transforms them, through regular expressions, into a tabular dataset of the conversations in log format, which eases their analysis and representation. The dataset is then converted into a TF-IDF word representation, which serves as input for unsupervised machine learning algorithms, namely Non-negative Matrix Factorization for topic modeling and K-Means for utterance clustering, to discover possible intents. These intents can then be passed on to the design of a knowledge base. This last step of intent discovery supports an iterative process in which new conversations are processed to identify changes in the intents or the addition of new ones. Results demonstrate that it is possible to cluster the utterances and find clusters that align to a possible intent out of a list of possible intents; this list is subject to change over time to continuously improve intent discovery.
A cosine similarity threshold was set at 0.47 to differentiate correctly aligned clusters from those not aligned; 18 intents out of 55 were correctly aligned with an initial intents list, and a total of 35 different intents were captured by the clustering process. No directly comparable research was found in the literature: other works in the domain assume an already curated and labeled dataset to begin working on classifying the intents rather than discovering them during the knowledge base design, and they do not take into account the whole process of transforming the raw conversations into a tabular, processed dataset.
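The alignment step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the cosine similarity was computed, so this example assumes that each cluster is summarized by the centroid of its TF-IDF vectors and compared against a TF-IDF vector of each candidate intent's description. The utterances, intent names, and helper functions (`build_idf`, `tfidf`, `centroid`, `cosine`, `align_cluster`) are hypothetical; only the 0.47 threshold comes from the abstract. It uses the Python standard library only; in practice a library such as scikit-learn provides TF-IDF, NMF, and K-Means implementations.

```python
import math
from collections import Counter

THRESHOLD = 0.47  # cosine similarity threshold reported in the abstract


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; the thesis may differ)."""
    return text.lower().split()


def build_idf(documents):
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def tfidf(text, idf):
    """Sparse TF-IDF vector as a dict; terms missing from idf are ignored."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def centroid(vectors):
    """Mean of a list of sparse vectors."""
    total = Counter()
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def align_cluster(utterances, intents, idf, threshold=THRESHOLD):
    """Return (intent_name, similarity), or (None, similarity) below threshold."""
    cluster_vec = centroid([tfidf(u, idf) for u in utterances])
    best_name, best_sim = None, 0.0
    for name, description in intents.items():
        sim = cosine(cluster_vec, tfidf(description, idf))
        if sim > best_sim:
            best_name, best_sim = name, sim
    if best_sim < threshold:
        return None, best_sim
    return best_name, best_sim


# Hypothetical clusters, standing in for the output of a K-Means step.
tuition_cluster = [
    "how much is the tuition",
    "what is the tuition cost",
    "tuition cost per semester",
]
greeting_cluster = ["hello good morning", "hi there"]

# Hypothetical initial intents list (name -> short description).
intents = {
    "tuition_cost": "tuition cost",
    "scholarships": "scholarship requirements",
}

idf = build_idf(tuition_cluster + greeting_cluster + list(intents.values()))
tuition_match = align_cluster(tuition_cluster, intents, idf)
greeting_match = align_cluster(greeting_cluster, intents, idf)
print(tuition_match)
print(greeting_match)
```

In this sketch a cluster whose best similarity falls below the threshold is returned with no intent name. Such clusters are the ones a designer would review as candidates for new intents or discard as noise.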