Development of Decision Support Formulas for the Prediction of Bladder Outlet Obstruction and Prostatic Surgery in Patients With Lower Urinary Tract Symptom/Benign Prostatic Hyperplasia: Part II, External Validation and Usability Testing of a Smartphone App
Abstract
Purpose
We aimed to externally validate our previously developed models for predicting bladder outlet obstruction (BOO) and the need for prostatic surgery using 2 independent data sets from tertiary referral centers, and to validate a mobile app implementing these models through usability testing.
Methods
Formulas and nomograms predicting whether a subject has BOO and needs prostatic surgery were validated with an external validation cohort from Seoul National University Bundang Hospital and Seoul Metropolitan Government-Seoul National University Boramae Medical Center between January 2004 and April 2015. A smartphone-based app was developed, and 8 young urologists were enrolled for usability testing to identify any human factor issues of the app.
Results
A total of 642 patients were included in the external validation cohort. Among patients judged to require prostatic surgery, no significant differences in major baseline parameters were found between the original (n=1,179) and external validation cohorts, except for the maximal flow rate. For requiring prostatic surgery, predictions in the validation cohort showed a sensitivity of 80.6%, a specificity of 73.2%, a positive predictive value of 49.7%, a negative predictive value of 92.0%, and an area under the receiver operating characteristic curve of 0.84. The calibration plot indicated good correspondence between predicted and observed probabilities, and the decision curve also showed a high net benefit. Similar results were seen in the external validation cohort for the prediction of BOO. Overall, the usability test demonstrated that the app was user-friendly, with no major human factor issues.
Conclusions
External validation of this newly developed prediction model demonstrated a moderate level of discrimination, adequate calibration, and high net benefit for predicting both having BOO and requiring prostatic surgery. In addition, a smartphone app implementing the prediction model was user-friendly, with no major human factor issues.
INTRODUCTION
Lower urinary tract symptoms (LUTS)/benign prostatic hyperplasia (BPH) is not a life-threatening malignant disease. Treatment practices are generally not standardized even within a single center, because decision-making may vary according to patients’ preferences and the surgeon’s subjective judgment [1,2]. Bladder outlet obstruction (BOO) is one of the most important components to assess in patients with LUTS/BPH. However, it is difficult to perform urodynamic studies routinely in all patients to diagnose BOO [3].
In our previous study, we developed nomograms to predict having BOO and requiring prostatic surgery, based on urodynamically determined BOO, using the urodynamics database we have built over the last 13 years (since 2004) [4]. These prediction models are expected to help physicians in decision-making and in informing patients of their risk of requiring prostatic surgery. As a next step, the transportability of the models to different but related populations must be confirmed. Successful implementation of prediction models in clinical practice generally requires validation of their performance [5], and accurate estimation of model performance is needed to judge model reproducibility and transportability [6].
Clinical prediction models are commonly developed to facilitate diagnostic or prognostic probability estimation in daily medical practice [6]. For such formulas and nomograms to become a widespread part of routine practice, a user-friendly tool is needed, and any such app or software should be validated through usability testing among potential end-users. In this study, we aimed to externally validate the prediction models for both having BOO and requiring prostatic surgery using 2 independent data sets from tertiary referral centers, and to validate a mobile app implementing these models through usability testing.
MATERIALS AND METHODS
Subjects and Established Prediction Model
A retrospective review of medical and surgical records was performed for 2,544 consecutive patients who had undergone a urodynamic study for LUTS between January 2004 and April 2015 at 2 tertiary referral centers: Seoul National University Bundang Hospital (SNUBH) and Seoul Metropolitan Government-Seoul National University Boramae Medical Center (SMG-SNU BMC). The study design and the use of patients’ information stored in the hospital database were approved by the Institutional Review Board of SNUBH (B-1410-272-404) and SMG-SNU BMC (26-2014-99). This study was also approved by the Seoul National University Hospital (SNUH) (H-1406-119-591).
All datasets of the development cohort at SNUH and the external validation cohort at SNUBH and SMG-SNU BMC were constructed in the same manner. Patients with LUTS due to the following causes were excluded: urethral stricture, bladder stone, genitourinary infection or inflammation, previous genitourinary surgery, genitourinary radiation, urinary diversion, genitourinary malignancy, or a neurologic condition. The original prediction model was developed using age, the voiding and storage subscores of the International Prostate Symptom Score (IPSS) questionnaire, the quality of life (QoL) item of the IPSS, maximal flow rate (Qmax) on free uroflowmetry, postvoid residual (PVR) volume, total prostate volume (TPV) assessed by transrectal ultrasonography, and the BOO index (BOOI) from a pressure-flow study (PFS). All urodynamic procedures were conducted in accordance with the International Continence Society standardization, following the same protocol and using the same urodynamic instruments (UD-2000 or Solar, Medical Measurement System, Enschede, the Netherlands) [7]. Patients missing any IPSS item or uroflowmetry parameter were excluded, as were those with a voided volume of less than 120 mL on free uroflowmetry.
Details regarding the development of the prediction formulas and nomograms have been presented in our previous study [4]. Briefly, using multivariable logistic regression analysis, 2 formulas including age, Qmax, and TPV were developed for calculating the probability of having BOO: 1 for when TPV is available and 1 for when it is not. A total of 4 formulas including age, IPSS scores, the IPSS QoL score, TPV, and BOOI were generated for calculating the probability of prostatic surgery: 1 for when all of these variables are available, and 3 for cases in which TPV and/or BOOI are missing. These formulas and nomograms were validated in the external validation cohort.
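To make the fallback logic concrete, the sketch below shows one way a set of logistic formulas can be chosen according to which predictors are available, and how the chosen linear predictor is converted into a probability with the inverse logit. The coefficients are hypothetical placeholders, not the published ones [4], and the 95% CI assumes a Wald interval on the logit scale computed from a supplied standard error of the linear predictor; the method actually used in the app may differ.

```python
import math
from typing import Dict, Optional, Tuple

# HYPOTHETICAL coefficients for illustration only; the published coefficients
# are reported in the development study [4] and are not reproduced here.
# Each formula maps predictor names to coefficients plus an intercept.
FORMULAS: Dict[str, Dict[str, float]] = {
    "boo_with_tpv": {"intercept": -2.0, "age": 0.02, "qmax": -0.10, "tpv": 0.03},
    "boo_without_tpv": {"intercept": -1.0, "age": 0.02, "qmax": -0.12},
}


def expit(x: float) -> float:
    """Inverse logit: converts a linear predictor into a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def select_formula(inputs: Dict[str, Optional[float]]) -> str:
    """Choose the formula with the most predictors, all of which are available."""
    usable = [
        name
        for name, coef in FORMULAS.items()
        if all(inputs.get(k) is not None for k in coef if k != "intercept")
    ]
    if not usable:
        raise ValueError("required predictors are missing")
    return max(usable, key=lambda name: len(FORMULAS[name]))


def predict(
    inputs: Dict[str, Optional[float]], se_lp: Optional[float] = None
) -> Tuple[str, float, Optional[Tuple[float, float]]]:
    """Return (formula used, predicted probability, optional 95% CI).

    se_lp, the standard error of the linear predictor, is an assumed input;
    the CI is a Wald interval on the logit scale, back-transformed.
    """
    name = select_formula(inputs)
    coef = FORMULAS[name]
    lp = coef["intercept"] + sum(
        beta * inputs[k] for k, beta in coef.items() if k != "intercept"
    )
    ci = None
    if se_lp is not None:
        ci = (expit(lp - 1.96 * se_lp), expit(lp + 1.96 * se_lp))
    return name, expit(lp), ci
```

For example, `predict({"age": 70, "qmax": 8.0, "tpv": None}, se_lp=0.3)` falls back to the formula without TPV and returns an interval that brackets the point estimate. Selecting the most complete usable formula mirrors the idea of using the full model whenever TPV is present.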
External Validation
In the demographic comparison between the development and external validation cohorts, the Student t-test and the chi-square test were used. We assessed the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of the prediction model in the external validation cohort. Discrimination was quantified using the receiver operating characteristic curve and summarized as the area under the curve (AUC); calibration was assessed with the Hosmer-Lemeshow goodness-of-fit test. A calibration plot was used to show graphically the extent to which the nomogram overestimated or underestimated the observed proportion of patients who required prostatic surgery. A decision curve analysis was performed to assess the value of the prediction model in terms of net benefit across a range of probability thresholds.
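The threshold-based measures and the AUC described above can be sketched in a few lines. The 0.5 cut-off is an assumption (the cut-off used in the study is not stated here), and the AUC uses the Mann-Whitney formulation: the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative case, which equals the area under the empirical ROC curve.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], p_pred: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV at an (assumed) probability cut-off.

    Assumes every cell of the 2x2 table needed for each ratio is non-empty.
    """
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, p_pred):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def auc(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """AUC as P(random positive outranks random negative); ties count 1/2."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of patients, which is fine for a cohort of a few hundred; a rank-based version scales better for larger data sets.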
Mobile App
An app named BPH Probability Calculator was developed for smartphones and tablet devices for the Android (Google, Mountain View, CA, USA) and iOS (Apple, Cupertino, CA, USA) platforms, as well as a web-based PC version (accessible at bph.snu.ac.kr) with the assistance of Neozensoft, Inc. (Seoul, Korea). Only the smartphone app was validated in this study.
When the app is started, the startup screen appears and automatically switches to a disclaimer screen listing several warnings about using the app; after agreeing, the user proceeds to the main screen. The app provides 2 interrelated probabilities: having BOO and requiring prostatic surgery. To use its main functions, the user enters the patient’s information. The probability of having BOO is calculated from age, Qmax, and PVR, with or without TPV. The probability of prostatic surgery is predicted using age, Qmax, and PVR, with or without TPV and/or BOOI; when BOOI is not available, the probability of prostatic surgery can still be calculated using the predicted probability of BOO obtained from the first calculator. Age, Qmax, and PVR are mandatory inputs and are marked with ‘*’ as required fields; if any of them is missing, the calculate button is not activated. If a value outside the normal range is entered, an error message is displayed. A clear button at the bottom of the screen resets all values at once, and an ‘OK’ button in the input window makes it easier to move to the next item. After the user enters the required values and presses the calculate button, the app calculates the probability of having BOO or requiring prostatic surgery using the corresponding formula and displays it as a percentage, together with its 95% confidence interval (CI) for reference.
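The input rules described above (required fields marked with ‘*’, a calculate button that stays disabled while any of them is missing, and an error message for out-of-range values) can be illustrated with the following sketch. The field names and plausibility ranges are hypothetical; the limits used in BPH Probability Calculator are not reported.

```python
from typing import Dict, List, Optional

# Fields marked '*' in the app.
REQUIRED = ("age", "qmax", "pvr")

# HYPOTHETICAL plausibility ranges, for illustration only.
RANGES = {
    "age": (18, 120),
    "qmax": (0, 100),   # mL/s
    "pvr": (0, 2000),   # mL
    "tpv": (5, 500),    # mL
    "booi": (-50, 300),
}


def validate(form: Dict[str, Optional[float]]) -> List[str]:
    """Return error messages; an empty list means inputs are acceptable."""
    errors = []
    for field in REQUIRED:
        if form.get(field) is None:
            errors.append(f"{field} is required")
    for field, value in form.items():
        if value is None or field not in RANGES:
            continue  # optional field left blank, or unknown field
        low, high = RANGES[field]
        if not low <= value <= high:
            errors.append(f"{field} out of range ({low}-{high})")
    return errors


def calculate_enabled(form: Dict[str, Optional[float]]) -> bool:
    """The calculate button is active only when validation passes."""
    return not validate(form)
```

Optional fields such as TPV and BOOI may be left empty without blocking the calculation, which matches the fallback formulas described in the Methods.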
For convenience in moving between the 2 interrelated calculators, a button labeled ‘move to BPH surgery’ jumps directly to the prostatic surgery calculator using the calculated probability of having BOO. When this button is used, the information already entered in the BOO calculator is automatically transferred to the prostatic surgery calculator. A mailing button at the top of the page allows the results to be sent directly by e-mail. The smartphone app does not collect any protected health information; entered information is not saved and is used only for calculation.
Usability Testing
After development of the app, usability testing was conducted between November 2015 and February 2016 at the Medical Device Usability Testing Center of the Biomedical Research Institute, SNUH. The test environment resembled a typical clinic office: 150 lux lighting, 32 dB noise, 25°C temperature, 36.4% relative humidity, and a flat floor. To further simulate a real outpatient clinic, participants were free to talk while using the smartphone. Following a predetermined scenario, the usability test comprised 3 phases: (1) identification of usability issues, (2) a usability questionnaire, and (3) face-to-face interviews.
Eight young urologists were invited to evaluate the BOO probability calculator and the prostatic surgery probability calculator and to identify any human factor issues. All participants provided informed consent before taking part. Because all participants were very familiar with Android or iOS smartphones, no prior training on the smartphone operating system was needed. Following the usability test plan, 1 examiner and 1 observer conducted the test according to a predefined procedure. Participants were asked to use both the Korean and English versions. In addition to exploring the app freely, participants completed several semistructured tasks, such as calculating the probabilities, entering a value outside the normal range, and checking the app’s response to a missing value. Participants were encouraged to think aloud and communicate with staff during the test.
After evaluating the app, each participant completed a questionnaire using a 4-point Likert scale: 1, very difficult; 2, difficult; 3, easy; and 4, very easy. The questionnaire consisted of 20 questions, 5 on the BOO probability calculator, 8 on the prostatic surgery calculator, and 7 on overall ease of use (Table 1), followed by a subjective evaluation section in which participants described their opinions in narrative form and proposed improvements to ease of use.
Statistical Analysis
Continuous variables are presented as mean±standard deviation and categorical variables as numbers with percentages. Two-sided P-values <0.05 were considered statistically significant, and 95% CIs were calculated. Statistical analyses were performed with IBM SPSS Statistics ver. 23.0 (IBM Co., Armonk, NY, USA) and the R statistical package system (R Foundation for Statistical Computing, Vienna, Austria; http://www.R-project.org).
The criterion for passing the usability test was more than 90% of items having an average value of 3 points or more. The results of each item in the usability test were reported as median and range.
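The passing rule can be expressed as a short check over per-item mean scores. The data layout (item ID mapped to the list of 1-4 ratings from all participants) is an assumption for illustration, and the strict "more than 90%" comparison follows the wording of the criterion above.

```python
from statistics import mean
from typing import Dict, List


def passes_usability(
    scores: Dict[str, List[int]], cutoff: float = 3.0, required_share: float = 0.9
) -> bool:
    """True if more than `required_share` of items have a mean score >= cutoff.

    `scores` maps each questionnaire item to its Likert ratings (1-4).
    """
    item_means = [mean(ratings) for ratings in scores.values()]
    share = sum(m >= cutoff for m in item_means) / len(item_means)
    return share > required_share
```

With 20 items, this rule tolerates a single item with a mean below 3 points.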
RESULTS
Original Development Cohort Versus External Validation Cohort
A total of 642 patients (417 from SNUBH and 225 from SMG-SNU BMC) were included in this external validation study. BOOI was missing in 18 patients (2.8%), TPV in 22 patients (3.4%), and both in 2 patients (0.3%). Age did not differ significantly between the original and validation groups. However, compared with the original group, the validation group had smaller prostate volumes (original vs. validation: 47.5±23.2 mL vs. 39.0±22.9 mL, P<0.001) and lower BOOI values (33.2±23.2 vs. 25.8±30.7, P<0.001) (Table 2).
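As a rough consistency check, the reported P-value for TPV can be approximately reproduced from the summary statistics with a pooled-variance Student t-test. This sketch assumes the full cohort sizes (1,179 and 642), although TPV was missing in some patients, and it uses a normal approximation to the t distribution, which is reasonable at about 1,800 degrees of freedom.

```python
import math
from typing import Tuple


def pooled_t_from_summary(
    m1: float, s1: float, n1: int, m2: float, s2: float, n2: int
) -> Tuple[float, int]:
    """Student's (pooled-variance) t statistic and df from means, SDs, sizes."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


def two_sided_p_normal(t: float) -> float:
    """Two-sided P-value using the standard normal as a large-df approximation."""
    return math.erfc(abs(t) / math.sqrt(2))


# TPV (mL), original vs. validation cohort, assuming full cohort sizes.
t, df = pooled_t_from_summary(47.5, 23.2, 1179, 39.0, 22.9, 642)
p_value = two_sided_p_normal(t)
```

With these inputs the t statistic is about 7.5, far beyond the two-sided 0.001 threshold, in line with the reported P<0.001.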
Decision Versus Actual Surgery
In the original development cohort, 744 of the 1,179 patients (63.1%) were judged to require prostatic surgery, and 627 of them (84.3%) actually underwent surgery. In the external validation cohort, 158 patients (25.2%) were judged to require prostatic surgery, and 111 of them (70.3%) actually underwent surgery. Among patients judged to require prostatic surgery, no significant differences in major baseline parameters were found between the original and validation groups, except for Qmax (10.9±4.3 mL/s vs. 12.9±5.0 mL/s, P<0.001). TPV did not differ significantly between the 2 cohorts in this subgroup (54.0±24.3 mL vs. 52.5±27.6 mL). BOOI was slightly higher in the external validation cohort, but the difference was not statistically significant (39.7±24.1 vs. 43.1±28.1).
External Validation of the Accuracy of the Nomograms
In the external validation cohort, the BOO prediction model including TPV showed a sensitivity of 24.0%, a specificity of 97.8%, a PPV of 72.0%, and an NPV of 84.4%. For the BOO prediction model without TPV, the sensitivity was 11.5%, the specificity was 96.9%, the PPV was 47.4%, and the NPV was 82.1%. The model incorporating all variables had an AUC of 0.785 (95% CI, 0.725–0.845), whereas the model without TPV had an AUC of 0.648 (95% CI, 0.576–0.721) (Fig. 1). The calibration plots showed good agreement between expected and observed rates in both models (Fig. 2). The decision curves also showed a high net benefit across the entire spectrum of probability thresholds in both models (Fig. 3).
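A calibration plot of this kind is typically built by sorting patients by predicted risk, splitting them into groups (often deciles), and plotting the mean predicted probability against the observed event rate in each group; the same groups feed the Hosmer-Lemeshow statistic named in the Methods. The sketch below assumes that grouping scheme and predicted probabilities strictly between 0 and 1. The closed-form chi-square tail used here holds only for even degrees of freedom, such as the 8 obtained with 10 groups.

```python
import math
from typing import List, Sequence, Tuple


def calibration_table(
    y: Sequence[int], p: Sequence[float], groups: int = 10
) -> List[Tuple[float, float, int]]:
    """(mean predicted, observed rate, n) per risk group: calibration-plot points."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    table = []
    for g in range(groups):
        idx = order[g * len(order) // groups:(g + 1) * len(order) // groups]
        if not idx:
            continue
        expected_rate = sum(p[i] for i in idx) / len(idx)
        observed_rate = sum(y[i] for i in idx) / len(idx)
        table.append((expected_rate, observed_rate, len(idx)))
    return table


def hosmer_lemeshow(y, p, groups: int = 10) -> Tuple[float, int]:
    """Hosmer-Lemeshow chi-square statistic and its degrees of freedom (g - 2)."""
    table = calibration_table(y, p, groups)
    stat = 0.0
    for expected_rate, observed_rate, n in table:
        observed = n * observed_rate
        expected = n * expected_rate
        stat += (observed - expected) ** 2 / (n * expected_rate * (1 - expected_rate))
    return stat, len(table) - 2


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df % 2 != 0:
        raise ValueError("df must be even")
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= (x / 2) / i
        total += term
    return math.exp(-x / 2) * total
```

A model that overestimates risk at the high end shows up as upper-decile points lying below the diagonal of the calibration plot.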
Similar results were seen for the prostatic surgery prediction model. In the external validation cohort, it showed a sensitivity of 80.6%, a specificity of 73.2%, a PPV of 49.7%, and an NPV of 92.0%, with an AUC of 0.84 (95% CI, 0.838–0.913) (Fig. 1B). The calibration plot showed good correspondence between the predicted probability of prostatic surgery and the actual rate of requiring surgery (Fig. 2C). Overall model performance was excellent, with a scaled Brier score of 0.28 and an explained variance (R²) of 0.36. The decision curve also showed a high net benefit across the entire spectrum of probability thresholds (Fig. 3C).
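The scaled Brier score rescales the Brier score (the mean squared difference between predicted probability and outcome) against a non-informative model that assigns every patient the observed prevalence, so 0 means no improvement over that baseline and 1 means perfect prediction. A minimal sketch, assuming outcomes coded 0/1 and a prevalence strictly between 0 and 1:

```python
from typing import Sequence


def brier(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def scaled_brier(y: Sequence[int], p: Sequence[float]) -> float:
    """1 - Brier / Brier_max, with Brier_max = prevalence * (1 - prevalence)."""
    prevalence = sum(y) / len(y)
    return 1 - brier(y, p) / (prevalence * (1 - prevalence))
```

Because the reference score depends on prevalence, the scaled version is easier to compare across cohorts with different surgery rates than the raw Brier score.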
Usability Testing
Fig. 4 shows the user interface of the app. The mean age of the participants was 35.0±4.2 years (range, 28–41 years), and they were male urology residents or fellows with average work experience of 4.6±2.0 years (range, 2–8 years) in urological patient care. None had hearing or visual impairments.
In the questionnaire, all 20 items (100%) had a mean score above 2 points (difficult). The mean item score was 3.3±0.4 (Table 1). The questionnaire met the predefined passing criterion that at least 90% of items have a mean score of 3 points or more.
Question 19 (Q19) had the lowest score, with a mean of 2.2 points; it asked whether the user could recognize if the results were based on all parameters or on incomplete information. Overall, the usability test demonstrated that the app was user-friendly and that no major human errors occurred in its use.
DISCUSSION
In this study, we externally validated the overall accuracy of previously proposed prediction models for BOO and prostatic surgery. These models showed good discriminative ability in an independent patient cohort, were well calibrated, and had high net benefits. They could be used to support physicians in decision-making and in informing patients of their risk of undergoing prostatic surgery in routine clinical use among the general population.
In this external validation study, the prediction model for requiring prostatic surgery showed high accuracy, with a sensitivity of 80.6% and a specificity of 73.2%. Sensitivity and specificity are the preferred measures of test validity because PPV and NPV depend directly on the prevalence of the condition in the population; all other factors being equal, NPV increases as prevalence decreases [8]. In the present study, the NPV was very high (92.0%), consistent with the lower proportion of patients judged to require BPH surgery in the validation group than in the original group among patients who underwent a urodynamic study (25.2% vs. 63.1%).
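The dependence of predictive values on prevalence follows directly from Bayes’ theorem. In the sketch below, the sensitivity (80.6%), specificity (73.2%), and proportion judged to require surgery (25.2%) are the validation-cohort values reported in the Results; the second call, at the original cohort’s 63.1%, is only an illustration of the prevalence effect and not a study result.

```python
from typing import Tuple


def predictive_values(sens: float, spec: float, prevalence: float) -> Tuple[float, float]:
    """PPV and NPV from sensitivity, specificity and prevalence (Bayes' theorem)."""
    tp = sens * prevalence              # true positives per patient
    fp = (1 - spec) * (1 - prevalence)  # false positives per patient
    tn = spec * (1 - prevalence)        # true negatives per patient
    fn = (1 - sens) * prevalence        # false negatives per patient
    return tp / (tp + fp), tn / (tn + fn)


ppv, npv = predictive_values(0.806, 0.732, 0.252)
ppv_hi, npv_hi = predictive_values(0.806, 0.732, 0.631)
```

The first call gives a PPV of about 0.50 and an NPV of about 0.92, close to the reported 49.7% and 92.0%; with the same sensitivity and specificity, a prevalence of 63.1% would lower the NPV to about 0.69.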
The calibration plot for requiring prostatic surgery showed very high agreement, and the decision curve analysis showed high net benefits at all probability thresholds. We also examined whether our predictive models made accurate predictions of the need for prostatic surgery even without TPV or BOOI. These models are expected to be of great help for surgical decision-making in real clinical practice. However, because the external validation group included few patients with a high predicted probability (i.e., 70%–80% or more) when BOOI or prostate volume was missing, these results should be interpreted with caution. More cases should be added and the nomogram revised in the near future.
The external validity of the BOO prediction model was high, with an AUC of 0.785, but it decreased to 0.648 when TPV was absent. The specificity of the BOO prediction model was 97.8% with TPV and 96.9% without TPV. In contrast, its sensitivity was relatively low, at only 24.0% and 11.5%, respectively. Several studies have explored the correlation between BOO and TPV. Although TPV can be an important factor in causing BOO, no strict relationship has been found between these parameters [9]. Not only prostate volume but also morphological and/or functional changes, such as intravesical protrusion, the urethral prostatic angle, and the ratio of the transitional zone volume, have been reported to affect BOO [10-12]. The combination of high specificity and low sensitivity for predicting BOO in this study suggests that age, Qmax, PVR, and TPV are the minimum necessary factors associated with BOO, while other factors may also contribute.
The calibration plot of the BOO prediction model showed acceptable overall accuracy, although it tended to underestimate risk at low predicted probabilities and overestimate it at high predicted probabilities. Thus, conservative judgment is required when using the model to predict BOO in high-probability patients, such as those with a very large prostate or a very low Qmax. The decision curve analysis supported this: the net benefit became negative at a threshold probability of approximately 80%. For the BOO prediction model without TPV, although the calibration plot did not deviate much from the ideal line, the decision curve analysis showed that the net benefit was not consistently high across thresholds. Although prostate volume is not strictly related to BOO, this suggests that predicting BOO without TPV is less accurate. When a PFS cannot be performed for any reason, obtaining TPV may therefore help in predicting BOO.
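Net benefit in decision curve analysis weighs each false positive by the odds of the threshold probability, pt/(1 - pt). At a threshold of 80%, one false positive therefore cancels the benefit of 4 true positives, which is why a model that overestimates risk in that range can fall below zero net benefit. A minimal sketch, with "treat all" as the usual reference strategy:

```python
from typing import Sequence


def net_benefit(y: Sequence[int], p: Sequence[float], pt: float) -> float:
    """Net benefit of treating patients whose predicted risk >= pt:
    TP/n - FP/n * pt / (1 - pt)."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= pt and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= pt and yi == 0)
    return tp / n - fp / n * pt / (1 - pt)


def net_benefit_treat_all(y: Sequence[int], pt: float) -> float:
    """Reference strategy: treat everyone regardless of predicted risk."""
    prevalence = sum(y) / len(y)
    return prevalence - (1 - prevalence) * pt / (1 - pt)
```

A decision curve is obtained by evaluating both functions over a grid of thresholds (for example 0.01 to 0.99) and plotting them together with the "treat none" line at zero.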
Many algorithms and formulas have been developed to support medical professionals in the diagnosis or management of specific diseases or conditions [13]. However, applying such models directly in daily practice is not easy, and a mobile or PC app can help physicians use them in decision-making for patients with specific conditions. We first developed formulas for calculating the probabilities of both BOO and requiring prostatic surgery, with the goal of creating a smartphone-based mobile health app and PC-based software to facilitate their widespread use. Such tools have advantages over conventional paper-based methods: first, data entry and computation are fast and efficient; second, they are easier, simpler, and more user-friendly. Accordingly, the number of health care mobile apps predicting risks or prognoses is rapidly increasing. However, an app should be tested and validated among potential end-users before it is used in clinical practice [14,15]. A poorly constructed user interface can lead to user errors, with potentially serious consequences for the ultimate decision. Therefore, usability testing is as important as developing the formulas.
Our study has a few limitations. First, the baseline characteristics of patients differed between the original development cohort and the external validation cohort, although no significant differences were found in the clinical parameters of patients who required BPH surgery. This likely reflects variation across referral centers in patient population, disease severity, socioeconomic status, and/or proximity to primary or secondary referral centers. Patients with more severe symptoms requiring surgery may be referred to an institution similar to the one where the development cohort was assembled. The percentages of patients who received a PFS and who required surgery were 71.8% and 24.7%, respectively. However, no differences were found in the major parameters of patients who required surgery, such as BOOI, TPV, total IPSS, and the IPSS voiding subscore. This indicates that patients referred to the 3 centers differed in clinical characteristics, but that decisions about prostatic surgery were made on the basis of similar criteria at these 3 independent institutions. Therefore, the criteria for surgical decision-making in the development cohort do not appear to have been problematic.
Second, the usability test was performed with a relatively small number of participants at the end of user-interface development, and the PC version of the software was not tested. Usability tests in medical device development are of 2 types: formative and summative. A formative test is conducted during development to identify a product’s strengths and shortcomings, whereas a summative test is conducted on the final design to validate that it can be used safely and effectively. The present test was a summative test, for which at least 25 users are recommended to reveal subtle interaction issues that might be missed with fewer participants [16]. However, during app development, the authors circulated and reviewed prototype versions more than 10 times before the usability test, so we believe that most user errors and interface errors were corrected before testing.
In conclusion, external validation of the newly developed probability models demonstrated a moderate level of discrimination, adequate calibration, and high net benefits for predicting both having BOO and requiring surgery. Moreover, usability testing showed that the smartphone app was user-friendly and that no major human errors occurred in its use.
Notes
Fund/Grant Support
This study was supported by a grant from the Seoul National University Hospital Research Fund (34-2014-0100).
Research Ethics
The study design and the use of patients’ information stored in the hospital database were approved by the Institutional Review Board of Seoul National University Bundang Hospital (B-1410-272-404) and Seoul Metropolitan Government-Seoul National University Boramae Medical Center (26-2014-99). This study was also approved by the Seoul National University Hospital (H-1406-119-591).
Conflict of Interest
MSC, a member of the Editorial Board of INJ, is the first author of this article. However, he played no role whatsoever in the editorial evaluation of this article or the decision to publish it. No potential conflict of interest relevant to this article was reported.
Acknowledgements
Kwi-Shik Kim and Yu-Kyung Lee assisted with database management. Banseok Han and Youngah Kim performed the usability test.