Improved Detection of Urolithiasis Using High-Resolution Computed Tomography Images by a Vision Transformer Model

Article information

Int Neurourol J. 2023;27(Suppl 2):S99-103

Publication date (electronic) : 2023 November 30

doi : https://doi.org/10.5213/inj.2346292.146

Hyoung Sun Choi¹, Jae Seoung Kim², Taeg Keun Whangbo¹, Sung Jong Eun³

¹Department of Computer Science, Gachon University, Seongnam, Korea

²Health IT Research Center, Gachon University Gil Medical Center, Incheon, Korea

³Digital Health Industry Team, National IT Industry Promotion Agency, Jincheon, Korea

Corresponding author: Taeg Keun Whangbo Department of Computer Science, Gachon University, 1342 Seongnam-daero, Sujeong-gu, Seongnam 13120, Korea Email: tkwhangbo@gachon.ac.kr

Co-corresponding author: Sung Jong Eun Digital Health Industry Team, National IT Industry Promotion Agency, 10 Jeongtong-ro, Deoksan-eup, Jincheon 27872, Korea Email: sjeun@nipa.kr

Received 2023 October 5; Accepted 2023 November 11.

Abstract

Purpose

Urinary stones cause lateral abdominal pain and are a prevalent condition among younger age groups. The diagnosis typically involves assessing symptoms, conducting physical examinations, performing urine tests, and utilizing radiological imaging. Artificial intelligence models have demonstrated remarkable capabilities in detecting stones. However, due to insufficient datasets, the performance of these models has not reached a level suitable for practical application. Consequently, this study introduces a vision transformer (ViT)-based pipeline for detecting urinary stones, using computed tomography images with augmentation.

Methods

The super-resolution convolutional neural network (SRCNN) model was employed to enhance the resolution of a given dataset, followed by data augmentation using CycleGAN. Subsequently, the ViT model facilitated the detection and classification of urinary tract stones. The model’s performance was evaluated using accuracy, precision, and recall as metrics.

Results

The deep learning model based on ViT showed superior performance compared to other existing models. Furthermore, the performance increased with the size of the backbone model.

Conclusions

The study proposes a way to utilize medical data to improve the diagnosis of urinary tract stones. SRCNN was used for data preprocessing to enhance resolution, while CycleGAN was utilized for data augmentation. The ViT model was utilized for stone detection, and its performance was validated through accuracy, precision, and recall. It is anticipated that this research will aid in the early diagnosis and treatment of urinary tract stones, thereby improving the efficiency of medical personnel.

Keywords: Deep learning; Ureteral calculi; Urolithiasis; Machine learning; Artificial intelligence

INTRODUCTION

Urinary stones can cause flank (lateral abdominal) pain that presents intermittently, temporarily subsiding before reappearing. In men, the pain may radiate to the lower abdomen, testicle, and scrotum, while in women, it may extend to the vagina. When stones are located near the bladder, symptoms of bladder irritation, such as frequent urination, may also occur. Severe pain can be accompanied by nausea, vomiting, and abdominal swelling, and urinary stones may also cause hematuria [1-3]. The primary cause of urinary tract stones is reduced water intake, which leaves urinary crystals in the urine for extended periods and increases the likelihood of stone formation. Urinary tract stones are three times more common in men than in women and are often seen in individuals in their 20s to 40s. The prevalence of this condition is particularly high in mountainous, desert, and tropical regions, suggesting that diet, temperature, and humidity may contribute to its development. Temperature and season are significant factors in the formation of urinary tract stones; during summer, concentrated urine from excessive sweating can facilitate stone production. Additionally, excessive sun exposure can lead to an overproduction of vitamin D, and a high intake of animal protein can increase the risk of urinary stones by raising the excretion of secondary calcium, phosphates, and uric acid [4,5].

Urinary tract stones can be diagnosed through the identification of the patient’s symptoms, physical examination, urinalysis, or radiographic imaging. Patients with suspected ureteral stones often experience severe pain in the back, and when a urinary tract infection is present, urinalysis may reveal an elevated white blood cell count in the urine. It is important to note that while a simple radiographic scan can confirm the presence of a stone, it cannot diagnose stones that are not radiopaque. Furthermore, ureteral stones may be obscured by pelvic bones or masked by fecal matter or other organs, complicating their detection. Therefore, to ascertain the presence of ureteral stones, hospitals may utilize angiography or computed tomography (CT) [6].

In the field of deep learning vision, high-performing technologies such as the convolutional neural network (CNN) [7] and transfer learning have been developed, along with the vision transformer (ViT) model [8], which has set new state-of-the-art records in many areas. Numerous studies employing artificial intelligence have been conducted in the medical and healthcare fields. Research has been carried out to detect kidney or urinary stones [9-12], manage urinary tract stones [13], predict the risk of urinary tract stones [14], and enhance the resolution of CT images using CNN [15]. However, the ViT model tends to underperform when fine-tuning with limited datasets. Therefore, in this paper, we propose a pipeline that utilizes medical data to determine the presence or absence of urinary tract stones with the ViT model.

MATERIALS AND METHODS

In this study, a total of 150 images were utilized, comprising 100 images of urinary stones and 50 images not containing urinary stones. These images were obtained from patients who visited Sejong Chungnam National University Hospital and were verified by medical experts. The collected data were anonymized and, for research purposes, stored on a separate storage device in DICOM format. Fig. 1 shows sample CT scans used in the research.

Fig. 1.

Sample images from the dataset: normal (A), stone (B), and ureter stone (C).

For stone detection, a ViT backbone model was used, which required resizing the input images to 224×224 pixels. A notable feature of the ViT model is that it can achieve strong performance when fine-tuned on high-resolution images, irrespective of the resolution of the images used for pre-training. To improve input quality, histogram equalization [16] was applied. Histogram equalization redistributes the pixel values of an image toward a more uniform distribution, which enhances image contrast and can improve model performance.
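The contrast adjustment described above, which spreads pixel values toward a uniform distribution, is commonly implemented as histogram equalization. The following is a minimal NumPy sketch of that idea, not the authors' preprocessing code. It assumes an 8-bit grayscale input; the function name `equalize_histogram` and the synthetic low-contrast image are illustrative only.

```python
import numpy as np


def equalize_histogram(image: np.ndarray) -> np.ndarray:
    """Equalize an 8-bit grayscale image via its cumulative histogram."""
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # CDF value at the darkest occupied bin
    total = image.size
    if total == cdf_min:               # constant image: nothing to stretch
        return image.copy()
    # Map each gray level so that the output CDF is approximately linear.
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[image]


# Synthetic low-contrast image (assumption: stands in for a resized CT slice).
rng = np.random.default_rng(0)
low_contrast = rng.integers(100, 140, size=(224, 224), dtype=np.uint8)
equalized = equalize_histogram(low_contrast)
print(low_contrast.min(), low_contrast.max(), "->", equalized.min(), equalized.max())
```

In practice, DICOM data are usually stored at more than 8 bits per pixel, so a windowing or rescaling step would be needed before a lookup table of this size applies.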

Furthermore, extensive research has been conducted in the field of medical CT imaging, involving super-resolution techniques to enhance image resolution. Building on this research, the super-resolution CNN (SRCNN) [17] was employed in this experiment to increase the resolution of the data used. SRCNN is a technology that upscales low-resolution images to high resolution, restoring fine details and enhancing the overall image resolution. This improvement in data quality contributes to the enhancement of the model’s performance. The SRCNN employs bicubic interpolation to upscale a single low-resolution image to the desired size. It transforms the target low-resolution image, denoted as Y, into F(Y), with the goal of identifying a mapping function F that enables F(Y) to closely approximate the original high-resolution image X. The process adheres to the following steps:

F₁(Y)=max(0,W₁×Y+B₁) (1)

F₂(Y)=max(0,W₂×F₁(Y)+B₂) (2)

F(Y)=W₃×F₂(Y)+B₃ (3)

First, patch extraction and representation: patches are extracted from the low-resolution image Y by the first layer of the CNN, and each patch is represented as a high-dimensional vector (Eq. 1). Next, nonlinear mapping: the high-dimensional vectors produced in Eq. 1 are nonlinearly mapped to other high-dimensional vectors (Eq. 2). Finally, reconstruction: the high-resolution patch representations produced by the preceding steps are aggregated to construct the final high-resolution image (Eq. 3). The resulting output should closely approximate the ground-truth image X.
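The three steps in Eqs. (1) to (3) can be sketched as a forward pass in plain NumPy. This is an illustration under stated assumptions rather than the configuration used in this study: the kernel sizes (9, 1, 5) and channel counts (64, 32) follow the commonly cited SRCNN setup, the weights are random instead of trained, the operator between W and the image is treated as a convolution, and the input is assumed to have already been upscaled by bicubic interpolation. The helper names `conv2d`, `srcnn_forward`, and `init_params` are hypothetical.

```python
import numpy as np


def conv2d(x, w, b):
    """'Same'-padded 2D convolution (cross-correlation form used by CNNs).

    x: (C_in, H, W), w: (C_out, C_in, k, k), b: (C_out,)
    """
    c_out, c_in, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for dy in range(k):
        for dx in range(k):
            window = xp[:, dy:dy + h, dx:dx + wd]          # (C_in, H, W)
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], window)
    return out + b[:, None, None]


def srcnn_forward(y, params):
    """Apply the three SRCNN layers to an already-upscaled image y."""
    (w1, b1), (w2, b2), (w3, b3) = params
    f1 = np.maximum(0.0, conv2d(y, w1, b1))    # Eq. (1): patch extraction
    f2 = np.maximum(0.0, conv2d(f1, w2, b2))   # Eq. (2): nonlinear mapping
    return conv2d(f2, w3, b3)                  # Eq. (3): reconstruction


def init_params(rng, sizes=((1, 64, 9), (64, 32, 1), (32, 1, 5))):
    """Random (untrained) weights; sizes are (C_in, C_out, kernel) per layer."""
    return [(rng.normal(0.0, 1e-3, (co, ci, k, k)), np.zeros(co))
            for ci, co, k in sizes]


rng = np.random.default_rng(0)
y = rng.random((1, 32, 32))        # stand-in for a bicubic-upscaled slice
params = init_params(rng)
out = srcnn_forward(y, params)
print(out.shape)
```

Training would then minimize the mean squared error between F(Y) and X over a set of image pairs, which is omitted here.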

Data Augmentation

High-quality data play an important role in improving model performance. Therefore, extensive research has focused on data augmentation. This field has progressed from elementary techniques, such as random horizontal or vertical rotations and shifts of images, to more sophisticated methods including aspect ratio adjustments, image flipping, and the use of generative models.

Among them, CycleGAN [18] is a powerful model used to convert images into different styles within the framework of generative adversarial networks. The key idea behind this model is to effectively transform data from one domain to another while maintaining consistency with the original image. CycleGAN has been employed in various fields, including art, style transformation, photo coloring, and landscape transformation, significantly aiding in the creation of images with diverse styles and enhancing data diversity.
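The "consistency with the original image" mentioned above is usually enforced through a cycle-consistency term: translating an image to the other domain and back should recover the input. The sketch below illustrates only that term with toy, hand-written mappings. The functions `G`, `F_exact`, and `F_poor` are hypothetical stand-ins for trained generator networks, and the adversarial losses that a full CycleGAN also needs are omitted.

```python
import numpy as np


def cycle_consistency_loss(x, y, G, F):
    """Mean L1 cycle loss: |F(G(x)) - x| + |G(F(y)) - y|."""
    forward = np.mean(np.abs(F(G(x)) - x))    # domain X -> Y -> X
    backward = np.mean(np.abs(G(F(y)) - y))   # domain Y -> X -> Y
    return forward + backward


def G(img):
    """Toy 'generator' from domain X to domain Y (assumption, not a network)."""
    return 2.0 * img + 1.0


def F_exact(img):
    """Exact inverse of G: the cycle reproduces the input."""
    return (img - 1.0) / 2.0


def F_poor(img):
    """Imperfect inverse of G: the cycle drifts away from the input."""
    return img / 2.0


rng = np.random.default_rng(0)
x = rng.random((8, 8))
y = rng.random((8, 8))
loss_exact = cycle_consistency_loss(x, y, G, F_exact)
loss_poor = cycle_consistency_loss(x, y, G, F_poor)
print(loss_exact, loss_poor)
```

A low cycle loss indicates that the translated images still carry the content of their sources, which is the property that makes generated images usable as additional training data.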

Vision Transformer

In this experiment, the ViT model played a crucial role in addressing the limitations of the existing CNN architecture regarding spatial structures. ViT segments images into small patches and converts these patches into vectors to represent the spatial structure of the images. This method relies on the attention mechanism and transformer architecture, facilitating efficient learning of both global and local features.
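The patch-to-vector step described above can be sketched as follows. The 224×224 input size comes from this study; the 16×16 patch size and 768-dimensional embedding are assumptions based on the standard ViT-B/16 configuration, and the projection weights, class token, and position embeddings are random placeholders rather than trained parameters.

```python
import numpy as np


def image_to_patches(image, patch_size):
    """Split an (H, W, C) image into flattened non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)      # (gh, gw, p, p, C)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def embed_patches(patches, w_proj, cls_token, pos_embed):
    """Linearly project patches, prepend a class token, add positions."""
    tokens = patches @ w_proj                       # (N, D)
    tokens = np.vstack([cls_token, tokens])         # (N + 1, D)
    return tokens + pos_embed


rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))                   # stand-in for a CT slice
patches = image_to_patches(image, 16)               # 14 x 14 = 196 patches
dim = 768                                           # assumed embedding width
w_proj = rng.normal(0.0, 0.02, (patches.shape[1], dim))
cls_token = np.zeros((1, dim))
pos_embed = rng.normal(0.0, 0.02, (patches.shape[0] + 1, dim))
tokens = embed_patches(patches, w_proj, cls_token, pos_embed)
print(patches.shape, tokens.shape)
```

The resulting token sequence is what the transformer encoder's attention layers operate on; the class token's final state is typically passed to the MLP head for the stone versus no-stone decision.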

The advantage of ViT lies in its ability to learn efficiently while preserving the immutability of images. Consequently, this paper introduces a model designed to detect the presence or absence of urinary stones by effectively learning the spatial structure of CT images. The model identifies urinary tract stones by taking into account the spatial structure of the image, and Fig. 2 illustrates the architecture of this model. This approach allows ViT models to be utilized in experiments for efficient learning and to enhance the performance of urinary tract stone classification.

Fig. 2.

Model architecture. MLP, multilayer perceptron.

Model Evaluation

In this study, a 5-fold cross-validation method was employed to assess the model's performance. Cross-validation is a technique used to evaluate the performance and generalization capabilities of machine learning models. It is particularly useful for small datasets, providing a more reliable evaluation than a single training-test split. The process involves dividing the entire dataset into k subsets (folds); each fold serves once as the test set while the remaining k−1 folds are used for training, so that every data point appears in the test set exactly once.
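As an illustration of the fold construction, the sketch below generates train and test indices for 5 folds over 150 samples, matching the dataset size reported in this study. It is a plain shuffled split; whether the authors stratified folds by class is not stated, so no stratification is assumed. The function name `k_fold_indices` and the seed are illustrative.

```python
import numpy as np


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx


splits = list(k_fold_indices(n_samples=150, k=5))
for fold, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"fold {fold}: train={len(train_idx)} test={len(test_idx)}")
```

With a 100 to 50 class imbalance, a stratified variant that preserves the class ratio in each fold may give more stable per-fold scores.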

In this process, the model’s performance is measured on each fold, and the average of these results serves as the final performance indicator. The study employed accuracy, precision, and recall scores as performance evaluation metrics to assess the model’s effectiveness. Accuracy denotes the proportion of correct predictions out of all predictions made, precision indicates the proportion of positive predictions that are correct, and recall reflects the proportion of actual positive instances that were correctly identified. Furthermore, the diagnostic performance of the model in this study was validated using this method.
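The three metrics defined above can be computed from the counts of true and false positives and negatives. The sketch below uses a small made-up label vector purely for illustration (1 = stone, 0 = no stone); it does not reproduce any result from this study.

```python
import numpy as np


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = positive)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))      # stones correctly detected
    tn = int(np.sum(~y_true & ~y_pred))    # normals correctly rejected
    fp = int(np.sum(~y_true & y_pred))     # normals flagged as stones
    fn = int(np.sum(y_true & ~y_pred))     # stones that were missed
    return {
        "accuracy": (tp + tn) / y_true.size,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels and predictions for ten images.
y_true = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In a cross-validation setting, this function would be applied to each fold's test predictions and the per-fold values averaged.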

RESULTS

A pipeline for detecting the presence and absence of urinary stones was developed using a ViT model. To evaluate the model’s performance, accuracy, precision, and recall were used as indicators. Additionally, to ensure the model’s generalizability, 5-fold cross-validation was employed. Table 1 presents the results of the cross-validation comparison among different deep learning and machine learning models.

Table 1.

Comparison of model performance

DISCUSSION

Urinary tract stones cause flank pain and various other symptoms, and they occur with particularly high frequency in men and in young people in their 20s to 40s. The primary cause is an increase in urine concentration due to insufficient water intake. Additionally, environmental factors play a role, particularly in highland, desert, and tropical regions. Given these factors, urinary tract stones are regarded as a significant medical issue. The diagnosis of urinary tract stones may involve evaluating the patient's symptoms, conducting a physical examination, performing a urine test, or utilizing imaging techniques. However, standard radiography may not always detect all stones, occasionally necessitating more advanced imaging methods such as angiography or CT scans.

This study introduces a pipeline designed to leverage medical data for the diagnosis of urinary tract stones. The dataset employed in this research comprised 150 CT images sourced from Sejong Chungnam National University Hospital. These included 100 images from patients diagnosed with urinary tract stones and 50 images from patients without urinary tract stones.

In the data preprocessing phase, the SRCNN model was utilized to enhance image resolution. This preprocessing step enhances the model’s performance and facilitates learning that takes into account the spatial structure.

Data augmentation plays an important role in improving the performance of the model, particularly through the application of technologies like CycleGAN to increase data diversity. Such improvements bolster the models’ generalization capabilities, allowing them to leverage a wide array of data styles.

The ViT model learns spatial structure by segmenting images into patches and transforming them into vectors, thereby enabling efficient learning while preserving the images’ immutability. Utilizing this model, we developed a system for detecting urinary tract stones, which demonstrated effective performance in our experimental results.

A 5-fold cross-validation method was used to evaluate the model's performance, with accuracy, precision, and recall as evaluation indicators. Averaged over the 5 folds, the ViT model achieved a precision of 0.9396, a recall of 0.9476, and an accuracy of 0.93. These outcomes are anticipated to validate the model's diagnostic capabilities and aid in the early diagnosis and treatment of patients with urinary tract stones.

Notes

Conflict of Interest

No potential conflict of interest relevant to this article was reported.

AUTHOR CONTRIBUTION STATEMENT

· Conceptualization: TKW

· Data curation: HSC

· Formal analysis: JSK

· Methodology: TKW

· Project administration: SJE

· Visualization: SJE

· Writing - original draft: HSC

· Writing - review & editing: JSK

References

1. Fwu CW, Eggers PW, Kimmel PL, Kusek JW, Kirkali Z. Emergency department visits, use of imaging, and drugs for urolithiasis have increased in the United States. Kidney Int 2013;83:479–86.

2. Sakhaee K, Maalouf NM, Sinnott B. Clinical review. Kidney stones 2012: pathogenesis, diagnosis, and management. J Clin Endocrinol Metab 2012;97:1847–60.

3. Kim A, Ahn J, Choi WS, Park HK, Kim S, Paick SH, et al. What is the cause of recurrent urinary tract infection? Contemporary microscopic concepts of pathophysiology. Int Neurourol J 2021;25:192–201.

4. Parivar F, Low RK, Stoller ML. The influence of diet on urinary stone disease. J Urol 1996;155:432–40.

5. Dauw CA, Alruwaily AF, Bierlein MJ, Asplin JR, Ghani KR, Wolf JS Jr, et al. Provider variation in the quality of metabolic stone management. J Urol 2015;193:885–90.

6. Andrabi Y, Patino M, Das CJ, Eisner B, Sahani DV, Kambadakone A. Advances in CT imaging for urolithiasis. Indian J Urol 2015;31:185–93.

7. LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, et al. Backpropagation applied to handwritten zip code recognition. Neural Comput 1989;1:541–51.

8. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai XH, Unterthiner T, et al. An image is worth 16x16 words: transformers for image recognition at scale. In: ICLR 2021; 2021 May 3-7.

9. Ma FZ, Sun T, Liu LY, Jing HY. Detection and diagnosis of chronic kidney disease using deep learning-based heterogeneous modified artificial neural network. Future Gener Comput Syst 2020;111:17–26.

10. Parakh A, Lee HK, Lee JH, Eisner BH, Sahani DV, Do SH. Urinary stone detection on CT images using deep convolutional neural networks: evaluation of model performance and generalization. Radiol Artif Intell 2019;1:e180066.

11. Eun SJ, Yun MS, Whangbo TK, Kim KH. A study on the optimal artificial intelligence model for determination of urolithiasis. Int Neurourol J 2022;26:210–8.

12. Park JM, Eun SJ, Na YG. Development and evaluation of urolithiasis detection technology based on a multimethod algorithm. Int Neurourol J 2023;27:70–6.

13. Anastasiadis A, Koudonas A, Langas G, Tsiakaras S, Memmos D, Mykoniatis I, et al. Transforming urinary stone disease management by artificial intelligence-based methods: a comprehensive review. Asian J Urol 2023;10:258–74.

14. Oh KJ. Risk factors for urinary stone. J Korean Med Assoc 2020;63:660–7.

15. Umehara K, Ota J, Ishida T. Application of super-resolution convolutional neural network for enhancing image resolution in chest CT. J Digit Imaging 2018;31:441–50.

16. Ayesha AS, Ghous BN, Mashal T, Adnan H. Investigation of histogram equalization filter for CT scan image enhancement. Biomedical Engineering 2019;31:1950038.

17. Dong C, Loy CC, He K, Tang X. Image super-resolution using deep convolutional networks. IEEE Trans Pattern Anal Mach Intell 2016;38:295–307.

18. Zhu JY, Park TS, Isola P, Efros AA. Unpaired image-to-image translation using cycle-consistent adversarial networks. In: 2017 IEEE International Conference on Computer Vision (ICCV); 2017 Oct 22-9; Venice, Italy. 2017. p. 2242-51. doi: 10.1109/ICCV.2017.244.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Table 1.

Comparison of model performance

Cross validation	Model	Accuracy	Precision	Recall
1	SVM	0.72	0.69	0.74
1	ResNet-50	0.82	0.779	0.811
1	ViT-B	0.92	0.921	0.953
2	SVM	0.75	0.72	0.766
2	ResNet-50	0.84	0.821	0.879
2	ViT-B	0.91	0.938	0.939
3	SVM	0.76	0.743	0.781
3	ResNet-50	0.846	0.842	0.848
3	ViT-B	0.93	0.947	0.981
4	SVM	0.71	0.712	0.708
4	ResNet-50	0.866	0.841	0.893
4	ViT-B	0.95	0.931	0.944
5	SVM	0.82	0.811	0.84
5	ResNet-50	0.88	0.873	0.877
5	ViT-B	0.94	0.961	0.921
Average score	SVM	0.752	0.7352	0.767
Average score	ResNet-50	0.8504	0.8312	0.8616
Average score	ViT-B	0.93	0.9396	0.9476

SVM, support vector machine; ViT, vision transformer.
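The average scores in Table 1 are the arithmetic means of the five per-fold values. As a quick check, the short script below recomputes them from the per-fold numbers copied from the table; the dictionary layout and the name `table1` are illustrative only.

```python
# Per-fold scores copied from Table 1 (folds 1 to 5, in order).
table1 = {
    "SVM": {
        "accuracy": [0.72, 0.75, 0.76, 0.71, 0.82],
        "precision": [0.69, 0.72, 0.743, 0.712, 0.811],
        "recall": [0.74, 0.766, 0.781, 0.708, 0.84],
    },
    "ResNet-50": {
        "accuracy": [0.82, 0.84, 0.846, 0.866, 0.88],
        "precision": [0.779, 0.821, 0.842, 0.841, 0.873],
        "recall": [0.811, 0.879, 0.848, 0.893, 0.877],
    },
    "ViT-B": {
        "accuracy": [0.92, 0.91, 0.93, 0.95, 0.94],
        "precision": [0.921, 0.938, 0.947, 0.931, 0.961],
        "recall": [0.953, 0.939, 0.981, 0.944, 0.921],
    },
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Average each metric over the five folds, rounded to four decimals.
averages = {
    model: {metric: round(mean(scores), 4) for metric, scores in metrics.items()}
    for model, metrics in table1.items()
}

for model, row in averages.items():
    print(model, row)
```

The recomputed means agree with the "Average score" rows of the table for all three models.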