INTRODUCTION
Urolithiasis is among the most common urological conditions, affecting approximately 10% of adults worldwide and often recurring throughout life [1,2]. Timely and accurate detection is essential because stones can cause severe pain, urinary tract infection, or renal impairment. Abdominal computed tomography (CT) is considered the gold standard imaging modality for stone detection, although its diagnostic performance can vary depending on image quality and radiologist expertise [3].
Recent advances in artificial intelligence (AI) and deep learning have enabled automated detection and classification of urological diseases, including urolithiasis [4-6]. Convolutional neural networks combined with transfer learning (TL) have achieved high accuracy when trained on large-scale labeled datasets. However, such models typically require thousands of annotated images, which are rarely available in real-world clinical practice due to the high cost and labor-intensive nature of data labeling [7].
To overcome these limitations, self-supervised learning (SSL) has emerged as a powerful alternative. SSL enables representation learning from unlabeled data by defining proxy tasks such as contrastive learning [8]. SimCLR, a representative SSL method, learns visual features by maximizing similarity between augmented views of the same image and minimizing similarity between different images [9]. When combined with transfer learning, SSL can markedly improve generalization performance in small-data scenarios [10].
The objective of this study was to develop and evaluate a data-efficient deep learning framework integrating SSL and TL for urolithiasis detection using limited CT scans. We hypothesized that SSL pretraining could extract meaningful feature representations that enhance model robustness even under data-constrained conditions.
MATERIALS AND METHODS
Dataset and Labeling
This study used 20 abdominal CT scans for urolithiasis detection. All images were anonymized before analysis and reconstructed into 3-dimensional volumes. Each case was labeled by an experienced urologist as either stone present (positive) or normal. Training was performed on axial slices containing relevant urinary structures. Representative CT slices from the dataset are shown in Fig. 1.
Preprocessing and Image Augmentation
CT images were resampled to 512 × 512 pixels, and intensity values were clipped to a window of -200 to 1,000 HU. Standard z-score normalization was then applied. To improve generalization, data augmentation techniques including random rotation (±10°), horizontal flipping, and brightness adjustment were employed.
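The windowing, normalization, and two of the augmentations described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes z-score statistics are computed per slice, models brightness adjustment as an additive shift on the normalized image, and uses a 0.5 flip probability and a ±0.1 brightness range, none of which are specified in the text. Resampling to 512 × 512 and the ±10° rotation are omitted for brevity, and all helper names are hypothetical.

```python
import math
import random

# HU window stated in the text
HU_MIN, HU_MAX = -200.0, 1000.0


def window_hu(image, lo=HU_MIN, hi=HU_MAX):
    """Clip every pixel value (in HU) to the [lo, hi] window."""
    return [[min(max(v, lo), hi) for v in row] for row in image]


def zscore(image):
    """Z-score normalization; per-slice statistics are an ASSUMPTION."""
    values = [v for row in image for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var) or 1.0  # avoid division by zero on a constant slice
    return [[(v - mean) / std for v in row] for row in image]


def hflip(image):
    """Horizontal flip: reverse the pixel order in each row."""
    return [list(reversed(row)) for row in image]


def adjust_brightness(image, delta):
    """Brightness adjustment modeled as an additive shift (ASSUMPTION)."""
    return [[v + delta for v in row] for row in image]


def preprocess(image):
    """HU windowing followed by z-score normalization."""
    return zscore(window_hu(image))


def augment(image, rng, p_flip=0.5, max_delta=0.1):
    """Random flip plus brightness jitter; p_flip and max_delta are ASSUMED.
    The ±10° random rotation from the text is omitted in this sketch."""
    if rng.random() < p_flip:
        image = hflip(image)
    return adjust_brightness(image, rng.uniform(-max_delta, max_delta))
```

A typical call would be `augment(preprocess(slice_hu), random.Random(0))`, applied only to training slices so that validation slices see windowing and normalization alone.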
Self-Supervised Pretraining
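A SimCLR-based contrastive learning approach [10] was used for SSL pretraining. Augmented image pairs were generated from unlabeled CT slices and encoded with a ResNet50 backbone. The model learned to maximize similarity for positive pairs and minimize it for negative pairs using the following loss function:

$$\ell_{i,j} = -\log \frac{\exp\left(\mathrm{sim}(z_i, z_j)/\tau\right)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp\left(\mathrm{sim}(z_i, z_k)/\tau\right)}$$

where sim(z_i, z_j) denotes cosine similarity between the embeddings z_i and z_j of a positive pair, τ is a temperature parameter, N is the number of images in a minibatch (yielding 2N augmented views), and the indicator 1[k ≠ i] equals 1 when k ≠ i and 0 otherwise. The overall SSL and transfer learning pipeline is illustrated in Fig. 2.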
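The contrastive objective described above corresponds to the NT-Xent (normalized temperature-scaled cross-entropy) loss introduced with SimCLR. A minimal pure-Python sketch follows. It assumes that the 2N embeddings in a batch are ordered so that views 2k and 2k + 1 come from the same image, and it uses τ = 0.5 only as a placeholder because the text does not report the temperature. The function names are hypothetical; a practical implementation would operate on batched tensors produced by the ResNet50 encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity sim(u, v) between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nt_xent(z, tau=0.5):
    """Mean NT-Xent loss over 2N embeddings.

    ASSUMPTION: z[2k] and z[2k + 1] are the two augmented views of image k.
    tau = 0.5 is a placeholder; the text does not report the temperature.
    """
    n_views = len(z)
    if n_views % 2:
        raise ValueError("expected an even number of views (2N)")
    total = 0.0
    for i in range(n_views):
        j = i ^ 1  # positive partner: 0<->1, 2<->3, ...
        numerator = math.exp(cosine(z[i], z[j]) / tau)
        # Denominator sums over every k != i (the indicator term), positive included.
        denominator = sum(
            math.exp(cosine(z[i], z[k]) / tau) for k in range(n_views) if k != i
        )
        total += -math.log(numerator / denominator)
    return total / n_views  # average over all 2N anchors
```

For two perfectly aligned positive pairs whose negatives are orthogonal, each anchor's loss reduces to log(1 + 2e^(-2)) at τ = 0.5; swapping the partners so that each positive pair is orthogonal raises the loss to log(2 + e^2), which is the behavior the objective is designed to produce.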
Transfer Learning and Fine-Tuning
After SSL pretraining, transfer learning was applied using labeled CT data. Lower convolutional layers were frozen, while upper layers and the linear classifier were fine-tuned with a learning rate of 1 × 10⁻⁴ using the Adam optimizer. Training was conducted with 5-fold cross-validation, ensuring patient-level separation between training and validation datasets.
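Patient-level separation in 5-fold cross-validation means that all slices from one patient fall in the same fold, so no patient contributes to both training and validation. The sketch below assigns whole patients to folds in round-robin order after a seeded shuffle. The function name, the seed, and the round-robin scheme are assumptions for illustration; the sketch does not stratify folds by stone status, and layer freezing and the Adam optimizer are framework-specific and not shown.

```python
import random


def patient_level_folds(patient_ids, k=5, seed=0):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation
    such that no patient contributes slices to both sides of a split.

    patient_ids[i] is the (hypothetical) patient identifier of slice i.
    Seeded shuffle + round-robin assignment is an ASSUMED scheme; it does
    not stratify folds by stone status.
    """
    patients = sorted(set(patient_ids))
    random.Random(seed).shuffle(patients)
    fold_of = {p: n % k for n, p in enumerate(patients)}  # whole patients per fold
    for f in range(k):
        train_idx = [i for i, p in enumerate(patient_ids) if fold_of[p] != f]
        val_idx = [i for i, p in enumerate(patient_ids) if fold_of[p] == f]
        yield train_idx, val_idx
```

With 20 patients and k = 5, each validation fold holds all slices from exactly 4 patients, and every patient is used for validation exactly once.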
Evaluation Metrics
Performance metrics included accuracy, precision, recall, F1-score, and AUC. All values were reported as mean ± standard deviation. Statistical significance was assessed using the paired t-test with a threshold of P < 0.05.
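The metrics listed above can be computed from binary labels and model scores as sketched below. This is an illustrative plain-Python version with hypothetical helper names. AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties counted as one half), which equals the area under the ROC curve; the standard deviation uses the sample (n - 1) denominator; and the paired t statistic is applied to per-fold scores of two models. The text does not state which of these conventions were used.

```python
import math


def confusion(y_true, y_pred):
    """Counts (tp, fp, fn, tn) for binary labels where 1 = stone present."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1-score (0.0 when undefined)."""
    tp, fp, fn, tn = confusion(y_true, y_pred)
    acc = (tp + tn) / len(y_true)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return {"accuracy": acc, "precision": prec, "recall": rec, "f1": f1}


def auc(y_true, scores):
    """AUC as P(positive score > negative score), ties counted as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_sd(values):
    """Mean and sample standard deviation (n - 1 denominator, ASSUMED)."""
    m = sum(values) / len(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))
    return m, sd


def paired_t(a, b):
    """Paired t statistic for per-fold scores a and b; df = len(a) - 1."""
    diffs = [x - y for x, y in zip(a, b)]
    m, sd = mean_sd(diffs)
    return m / (sd / math.sqrt(len(diffs)))
```

Converting the t statistic to a P value requires the t distribution (for example, `scipy.stats.ttest_rel` performs the whole test); with 5 folds (4 degrees of freedom), |t| above about 2.776 corresponds to a two-sided P < 0.05.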
RESULTS
A comparison of model performance across the 3 learning strategies is summarized in Table 1. The proposed SSL+TL model achieved the best performance, with an AUC of 0.95 ± 0.02 and an F1-score of 0.91 ± 0.03; its AUC was significantly higher than those of both the random initialization model (AUC, 0.72 ± 0.04) and the TL-only model (AUC, 0.85 ± 0.03) (P < 0.05). The receiver operating characteristic curves comparing the 3 learning strategies are shown in Fig. 3. These findings suggest that SSL pretraining enhances feature generalization even under limited data conditions.
Training stability analysis indicated that the SSL+TL model converged within 30 epochs with a steadily decreasing loss, whereas the TL-only model showed greater variance and a tendency toward overfitting. Cross-validation demonstrated consistent performance across folds (AUC, 0.93–0.97).
Feature representation analysis using t-distributed stochastic neighbor embedding (t-SNE) revealed clear separation between normal and stone clusters in the SSL+TL model, whereas the random initialization and TL-only models showed substantial overlap (Fig. 4).
Misclassification analysis showed that most errors occurred in cases with very small stones (<2 mm) or in regions containing confounding structures such as bowel gas and vascular calcifications, where similar intensity patterns led to confusion.
DISCUSSION
This study demonstrated that combining self-supervised and transfer learning enables robust urolithiasis detection from a small CT dataset. Unlike conventional supervised models that require large labeled datasets, the proposed SSL+TL framework effectively leveraged unlabeled data to pretrain meaningful feature representations.
The SSL+TL model achieved an AUC of 0.95 and an F1-score of 0.91, outperforming both baseline approaches. The pretraining phase allowed the model to learn morphological and contextual features of stones, facilitating efficient fine-tuning with limited labeled data. Consistent cross-validation results and faster convergence further indicate enhanced model stability and reduced overfitting.
From a clinical standpoint, these results suggest that AI models can feasibly be developed even in data-limited environments such as small hospitals. SSL provides a data-efficient foundation for future extensions to tasks such as stone size and location analysis or to other urological imaging applications.
Study limitations include the small dataset size, single-center design, and binary classification restricted to stone presence. Future research should incorporate multi-institutional datasets, evaluate advanced SSL variants (e.g., BYOL, MoCo), and validate the framework within real-world clinical workflows.
In conclusion, the proposed SSL+TL-based framework offers a practical and data-efficient solution for AI model development under data scarcity, presenting a promising direction for medical image analysis in urology.