Improved Kidney Stone Recognition Through Attention and Feature Fusion Strategies
Abstract
Urolithiasis is the second most common kidney disease, and its incidence is expected to rise in the coming years. The disease refers to the formation, in the urinary tract (kidneys, ureters, and bladder), of crystalline accretions from minerals dissolved in urine that cannot be expelled. Many practitioners consider identifying the kidney stone type crucial because it allows them to prescribe a proper treatment to eliminate the stones and, most importantly, to avoid future relapses. For diagnostic purposes, morpho-constitutional analysis (MCA) is the reference for ex-vivo stone characterisation. It consists of two complementary analyses: first, a visual examination of the stone under the microscope to describe the crystalline structure in different regions of the stone; second, Fourier-transform infrared (FTIR) spectroscopy, which provides the biochemical composition of the stone. Current clinical practice for removing kidney stones makes increasing use of laser techniques that fragment the stone, such as "dusting". These techniques reduce intervention time and trauma for the patient, but at the expense of losing important information about the morphology of the stone, which can lead to an incomplete or incorrect diagnosis. To overcome this issue, a few experts visually identify the stone type on screen during the procedure. This visual recognition by urologists is operator dependent and requires a great deal of experience because of the high similarity between classes. AI techniques that assess endoscopic images could therefore lead to automated, operator-independent in-vivo recognition. It has been shown that kidney stone classification is feasible on ex-vivo data, with tightly controlled scenes and image acquisition conditions. The literature also shows that in-vivo classification is feasible using deep-learning architectures.
This thesis presents a deep learning method for extracting and fusing information from kidney stone fragments acquired from different endoscope viewpoints. Surface and section images of the fragments are used jointly to train the classifier, and attention layers are added at the end of each convolutional block to improve the discriminative power of the features. The approach is specifically designed to mimic the morpho-constitutional analysis performed ex vivo by biologists, who visually identify kidney stones by inspecting both views. Adding attention mechanisms to the backbone improved the results of single-view extraction backbones by 4% on average. Moreover, compared with the state of the art, fusing the deep features improved the overall results by up to 11% in terms of kidney stone classification accuracy.
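The two ideas in this paragraph, attention applied to convolutional features and fusion of surface and section features, can be illustrated with a minimal standard-library sketch. The abstract does not specify the attention mechanism or the fusion operator, so the choices here are assumptions made only for illustration: a squeeze-and-excitation style channel gate stands in for the attention layer, and simple concatenation stands in for the fusion step. The function names, the toy feature maps, and the weight matrices are all hypothetical.

```python
import math


def global_avg_pool(fmap):
    """Average each channel of a C x H x W feature map (nested lists) to one value."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def channel_attention(fmap, weights):
    """Rescale each channel by a sigmoid gate computed from pooled channel statistics.

    `weights` is a hypothetical C x C matrix standing in for learned parameters.
    This is an assumed, squeeze-and-excitation style gate, not the thesis's layer.
    """
    pooled = global_avg_pool(fmap)
    gates = []
    for row in weights:
        z = sum(w * p for w, p in zip(row, pooled))
        gates.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid gate in (0, 1)
    return [[[g * v for v in r] for r in ch] for g, ch in zip(gates, fmap)]


def fuse_views(surface_fmap, section_fmap, w_surface, w_section):
    """Apply attention to each view, pool to a vector, and concatenate both vectors."""
    f_sur = global_avg_pool(channel_attention(surface_fmap, w_surface))
    f_sec = global_avg_pool(channel_attention(section_fmap, w_section))
    return f_sur + f_sec  # list concatenation: assumed fusion operator


# Toy 2-channel, 2x2 feature maps standing in for backbone outputs of each view.
surface = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
section = [[[2.0, 2.0], [2.0, 2.0]], [[1.0, 1.0], [1.0, 1.0]]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder for learned gate weights

fused = fuse_views(surface, section, identity, identity)
print(len(fused))  # length of the fused descriptor (channels of both views)
```

In a full model, the fused vector would feed a classification head, and the gate weights would be learned jointly with the backbone; both are omitted here to keep the sketch self-contained.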
Description
https://orcid.org/0000-0002-9896-8727