INTRODUCTION
Lower urinary tract symptoms (LUTS)/benign prostatic hyperplasia (BPH) is not a life-threatening malignant disease. Treatment practices are generally not standardized within a center because decision-making may vary according to patients’ preferences and the subjective judgment of the surgeon [1, 2]. Bladder outlet obstruction (BOO) is one of the most important components to assess in patients with LUTS/BPH. However, it is difficult to implement urodynamic studies to diagnose BOO routinely in all patients [3].
In our previous study, we developed a nomogram to predict having BOO and requiring prostatic surgery based on urodynamically determined BOO using our urodynamics database that we have developed over the last 13 years (since 2004) [4]. These prediction models are expected to help support physicians in decision-making and in informing patients of their risk of requiring prostatic surgery. As a next step, it is necessary to confirm the transportability of the model into different but related populations. The successful implementation of prediction models in clinical practice generally requires validation of their performance [5]. The successful estimation of model performance is needed to make judgments regarding model reproducibility and model transportability [6].
Clinical prediction models are commonly developed to facilitate diagnostic or prognostic probability estimations in daily medical practice [6]. A user-friendly modality is necessary for the use of such formulas and nomograms to become a widespread part of routine practice. However, any such app or software should be validated through usability testing among potential end-users. In this study, we aimed to externally validate the prediction model for both having BOO and requiring prostatic surgery using 2 independent data sets from tertiary referral centers and to validate a mobile app for using this model through usability testing.
MATERIALS AND METHODS
Subjects and Established Prediction Model
A retrospective review of medical and surgical records was performed for 2,544 consecutive patients who had undergone a urodynamic study for LUTS between January 2004 and April 2015 at 2 tertiary referral centers: Seoul National University Bundang Hospital (SNUBH) and Seoul Metropolitan Government-Seoul National University Boramae Medical Center (SMG-SNU BMC). The study design and the use of patients’ information stored in the hospital database were approved by the Institutional Review Board of SNUBH (B-1410-272-404) and SMG-SNU BMC (26-2014-99). This study was also approved by the Seoul National University Hospital (SNUH) (H-1406-119-591).
All datasets of the development cohort at SNUH and the external validation cohort at SNUBH and SMG-SNU BMC were constructed in the same manner. Patients with LUTS due to the following causes were excluded: urethral stricture, bladder stone, genitourinary infection or inflammation, previous genitourinary surgery, genitourinary radiation, urinary diversion, genitourinary malignancy, or a neurologic condition. The original prediction model was developed including age, the voiding and storage subscores of the International Prostate Symptom Score (IPSS) questionnaire, the quality of life (QoL) item of the IPSS, maximal flow rate (Qmax) of free flowmetry, postvoid residual (PVR) volume, total prostate volume (TPV) assessed through transrectal ultrasonography, and the BOO index (BOOI) from a pressure-flow study (PFS). All procedures of the urodynamic study were conducted in accordance with the standardization of the International Continence Society and following the same protocol with the same urodynamic instrument (UD-2000 or Solar, Medical Measurement System, Enschede, the Netherlands) [7]. We excluded any patients missing at least one of the IPSS items and uroflowmetry parameters. We also excluded cases with a voided volume of less than 120 mL in free flowmetry.
Details regarding the development of the prediction formulas and nomograms have been presented in our previous study [4]. Briefly, using multivariate logistic regression analysis, 2 formulas, including age, Qmax, and TPV, were developed for calculating the probability of having BOO, with 1 formula for when TPV is available and 1 for when TPV is not available. A total of 4 formulas, including age, IPSS scores, the IPSS QoL score, TPV, and BOOI, were generated for calculating the probability of prostatic surgery, with 1 formula for when all the above variables are available, and the other 3 formulas for cases where TPV and/or BOOI are missing. These formulas and nomograms were validated in the external validation cohort.
External Validation
In the demographic comparison between the development and the external validation cohorts, the Student t-test and the chi-square test were used. We assessed the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of the prediction model on the external validation cohort. In addition, the predictive accuracy of this model was quantified using a receiver operating characteristic curve, and summarized using the area under the curve (AUC) and the Hosmer-Lemeshow goodness-of-fit test. A calibration plot was used to obtain a graphical representation of the extent of overestimation or underestimation of those who actually required surgery versus those for whom the nomogram predicted that prostatic surgery would be necessary. A decision curve analysis was performed to assess the value of the prediction model in terms of net benefit according to different probability thresholds.
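The threshold-based measures described above can be illustrated with a minimal Python sketch. It is not the analysis code used in this study (which was run in SPSS and R); the function names and the toy data in the usage note are hypothetical. It assumes a list of predicted probabilities and a list of observed binary outcomes, and it uses the standard decision curve definition of net benefit, TP/n − (FP/n) × pt/(1 − pt), where pt is the probability threshold.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # predicted positive, outcome present
    fp: int  # predicted positive, outcome absent
    fn: int  # predicted negative, outcome present
    tn: int  # predicted negative, outcome absent


def confusion_at_threshold(probs, outcomes, threshold):
    """Classify each patient as positive when the predicted probability
    is at or above the threshold, then count the four cells."""
    tp = fp = fn = tn = 0
    for p, y in zip(probs, outcomes):
        predicted_positive = p >= threshold
        if predicted_positive and y:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


def _ratio(numerator, denominator):
    # Return NaN instead of raising when a cell total is empty.
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(c):
    """Sensitivity, specificity, PPV, and NPV from a confusion table."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
    }


def net_benefit(c, threshold):
    """Net benefit at one probability threshold (0 < threshold < 1)."""
    n = c.tp + c.fp + c.fn + c.tn
    return c.tp / n - (c.fp / n) * threshold / (1 - threshold)
```

For example, with the toy inputs `probs = [0.9, 0.8, 0.3, 0.2, 0.6, 0.1]` and `outcomes = [1, 1, 1, 0, 0, 0]`, calling `confusion_at_threshold(probs, outcomes, 0.5)` gives 2 true positives, 1 false positive, 1 false negative, and 2 true negatives. Repeating `net_benefit` over a grid of thresholds traces a decision curve.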
Mobile App
An app named BPH Probability Calculator was developed for smartphones and tablet devices for the Android (Google, Mountain View, CA, USA) and iOS (Apple, Cupertino, CA, USA) platforms, as well as a web-based PC version (accessible at bph.snu.ac.kr) with the assistance of Neozensoft, Inc. (Seoul, Korea). Only the smartphone app was validated in this study.
When the app is started, the startup screen appears. It automatically switches to the disclaimer screen. There are a few warnings about using the app; after agreeing, the user can go to the main screen. This app provides 2 interrelated probabilities for having BOO and requiring prostatic surgery. The user is expected to provide the patient’s information to use the main functions of the app. The probability of having BOO is calculated from the clinical parameters of age, Qmax, and PVR, with or without TPV. The probability of prostatic surgery is predicted using age, Qmax, and PVR, with or without TPV and/or BOOI. When BOOI is not available, it can still be calculated by the predicted BOO probability obtained from the first part. Age, Qmax, and PVR are mandatory input items for a prediction to be generated, so these are marked with ‘*’ to indicate that they are required input fields. If any of these items are missing, the calculate button is not activated. If an excessively large number outside the normal range is entered, an error message is displayed. The clear button is at the bottom of the screen and resets all the values at once. There is an ‘OK’ button in the input window, making it easier to move to the next item. After the user enters the required values and presses the calculate button, this app calculates the probabilities of having BOO or requiring prostatic surgery through the use of a formula. The answer is displayed as a percentage of predicted probability. The 95% confidence interval (CI) of the predicted probability is also provided for reference.
For the convenience of conversion between the 2 interrelated calculators, a button labeled ‘move to BPH surgery’ was made to jump directly to the calculator for prostatic surgery for a given value of probability of having BOO. If this button is used, the information previously entered in the calculator for having BOO is also automatically entered into the calculator for prostatic surgery. A mailing button at the top of the page was created so that the results can be received directly via e-mail. The smartphone app does not collect any protected health information. The provided information is not saved; it is used only for calculation.
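The input handling described in this section (required fields, a range check, and a probability obtained from a logistic regression formula) can be sketched as follows. This is an illustration only and not the app's source code. The plausibility ranges, function names, and any intercept or coefficients passed in are hypothetical placeholders; the fitted coefficients are reported in the previous study [4] and are not reproduced here.

```python
import math

# Fields the app marks with '*' as mandatory.
REQUIRED_FIELDS = ("age", "qmax", "pvr")

# Hypothetical plausibility ranges, chosen only for illustration.
PLAUSIBLE_RANGES = {
    "age": (18, 120),   # years
    "qmax": (0, 100),   # mL/sec
    "pvr": (0, 2000),   # mL
    "tpv": (0, 500),    # mL
}


def can_calculate(inputs):
    """Mirror the calculate button: active only when every required
    field has a value."""
    return all(inputs.get(name) is not None for name in REQUIRED_FIELDS)


def range_errors(inputs):
    """Return one error message per entered value outside its range."""
    errors = []
    for name, value in inputs.items():
        if value is None or name not in PLAUSIBLE_RANGES:
            continue
        low, high = PLAUSIBLE_RANGES[name]
        if not low <= value <= high:
            errors.append(f"{name} must be between {low} and {high}")
    return errors


def logistic_probability(inputs, intercept, coefficients):
    """Generic logistic regression prediction:
    p = 1 / (1 + exp(-(intercept + sum(beta_k * x_k))))."""
    linear = intercept + sum(
        beta * inputs[name] for name, beta in coefficients.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))
```

Keeping the completeness check separate from the range check matches the two behaviors described above: a missing required item disables calculation, whereas an out-of-range entry produces an error message.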
Usability Testing
After we developed this app, usability testing was conducted between November 2015 and February 2016 at the Medical Device Usability Testing Center of the Biomedical Research Institute, SNUH. The usability testing was conducted in an environment similar to a typical clinic office: 150 lux lighting, 32 dB noise, 25°C temperature, 36.4% relative humidity, and a flat floor. To further simulate the environment of a real outpatient clinic, participants were free to communicate verbally while using the smartphone device. According to a predetermined scenario, the usability test comprised 3 phases: (1) identification of usability issues, (2) a usability questionnaire, and (3) face-to-face interviews by interviewers.
Eight young urologists were invited for usability testing to evaluate the BOO probability calculator and the prostatic surgery probability calculator, and to identify if there were any human factor issues. All subjects provided informed consent for inclusion before they participated in the study. Since all participants were very familiar with Android or iOS smartphones, there was no need for prior education on the smartphone operating system. Based on the usability test plan, 1 examiner and 1 observer led the test according to the procedure that had been developed prior to the test. Participants were asked to use both the Korean and English versions. Along with allowing a free exploration of the app, participants were asked to complete several semistructured tasks, such as calculating the probabilities, entering an excessively large number outside of the normal range, and checking the response of the app to a missing value. The participants were encouraged to communicate with staff via a think-aloud approach during the study.
After evaluating the app, each participant was asked to fill out a questionnaire, which included Likert-scale response options to assess the app, with the following 4 options: 1, very difficult; 2, difficult; 3, easy; and 4, very easy. The questionnaire consisted of a total of 20 questions and subjective opinions, of which 5 were on the BOO probability calculator, 8 on the prostatic surgery calculator, and 7 on the overall convenience of use (Table 1). In the subjective evaluation section, participants described their subjective opinions narratively and made proposals for improving the ease of use.
Statistical Analysis
Numerical or categorical variables are presented as the mean±standard deviation or as absolute numbers with percentages. Two-sided P-values <0.05 were considered to indicate statistical significance, and 95% CIs were calculated. The statistical analysis was performed with IBM SPSS Statistics ver. 23.0 (IBM Co., Armonk, NY, USA) and the R statistical package system (R Foundation for Statistical Computing, Vienna, Austria; http://www.R-project.org).
The criterion for passing the usability test was more than 90% of items having an average value of 3 points or more. The results of each item in the usability test were reported as median and range.
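The pass criterion and the per-item summary can be expressed as a short Python sketch. The function names and data layout (a mapping from questionnaire item to the list of participants' Likert scores) are hypothetical. The sketch reads "more than 90%" as a strict inequality, which is an assumption about how the boundary case is handled.

```python
from statistics import mean, median


def summarize_item(scores):
    """Median and range of one questionnaire item, as reported."""
    return {"median": median(scores), "range": (min(scores), max(scores))}


def passes_usability_test(item_scores, min_mean=3.0):
    """True when more than 90% of items have a mean score of at least
    min_mean. Integer arithmetic avoids floating-point edge cases at
    exactly 90%."""
    n_items = len(item_scores)
    n_ok = sum(1 for scores in item_scores.values() if mean(scores) >= min_mean)
    return n_ok * 10 > n_items * 9
```

With 20 items, this reading requires at least 19 items to reach a mean of 3 points; exactly 18 of 20 (90%) would not pass.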
DISCUSSION
In this study, we externally validated the overall accuracy of previously proposed prediction models for BOO and prostatic surgery. These prediction models showed an excellent discriminant ability in the independent patient cohort. In addition, the models were well calibrated and had high net benefits. They could be used to support physicians in decision-making and in informing patients of their risk of undergoing prostatic surgery in routine clinical use among the general population.
In this external validation study, the prediction model for requiring prostatic surgery showed high values for accuracy, with a sensitivity of 80.6% and a specificity of 73.2%. The validity of the test is measured by sensitivity and specificity [8]. This is because the PPV and NPV depend directly on the prevalence of the disease in the population. Assuming all other factors remain constant, NPVs will increase as prevalence decreases [8]. In the present study, the NPVs were very high (92.0%). This means that the rate of patients requiring BPH surgery in the validation group was lower than in the original group among patients who underwent a urodynamic study.
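The dependence of the predictive values on prevalence follows from Bayes' rule and can be checked numerically. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the sensitivity and specificity are the values reported above, and the prevalence values in the usage note are illustrative rather than taken from the cohorts.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity, specificity, prevalence):
    """Negative predictive value from Bayes' rule."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    # Sensitivity and specificity as reported for the surgery model;
    # the prevalence grid is illustrative only.
    for prevalence in (0.10, 0.25, 0.40):
        print(prevalence, round(npv(0.806, 0.732, prevalence), 3))
```

With a sensitivity of 80.6% and a specificity of 73.2% held fixed, an illustrative prevalence of about 25% yields an NPV near 92%, and the NPV falls as the prevalence rises.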
The calibration plot for requiring prostatic surgery showed very high agreement. A decision curve analysis also showed high net benefits at all probability thresholds. Our predictive models were analyzed to determine whether they made accurate predictions of whether prostatic surgery was required, even without TPV or BOOI. These models are expected to be of great help for surgical decision-making in real clinical practice. However, the external validation group contained few patients with a high predicted probability (i.e., 70%–80% or more) when BOOI or prostate volume was absent, so these results should be interpreted with caution. It is necessary to add more cases and revise the nomogram in the near future. The external validity of the prediction model of BOO was high, with an AUC of 79.5%. However, it decreased to 64.8% when TPV was absent. The specificity of the prediction model of BOO was 97.8%, and that of the model without TPV was 96.7%. In contrast, the sensitivity of the prediction model for BOO was relatively low, with or without TPV, with values of only 24.0% and 11.5%, respectively. Several studies have explored the correlation between BOO and TPV. Although TPV can be an important factor in causing BOO, no strict relationship has been found between these parameters [9]. It has been reported that not only prostate volume, but also morphological and/or functional changes, such as intravesical protrusion, the urethral prostatic angle, and the ratio of the transitional zone volume, affect BOO [10-12]. The combination of high specificity and low sensitivity for the prediction of BOO in this study indicates that age, Qmax, PVR, and TPV are the minimum necessary factors that cause BOO, while other factors may act indirectly.
The calibration plot of the prediction model of BOO showed that the overall accuracy was acceptable, although it displayed a tendency towards underestimation at low predicted probabilities and overestimation at high predicted probabilities. This means that conservative judgments are required when using these models to predict BOO in high-probability patients with a very large prostate or a very low Qmax. This was also confirmed by the decision curve analysis. The net benefit changed to a negative value at a probability of approximately 80%. In the BOO prediction model without TPV, although the calibration plot did not deviate much from the ideal line, the decision curve analysis showed that the net benefit was not high for all thresholds. Although the prostate volume is not strictly related to BOO, it can be inferred that predicting BOO without TPV is not accurate. In the case of the absence of a PFS for various reasons, obtaining information on the TPV may have clinical benefits for predicting BOO.
Many algorithms or formulas have been developed to support medical professionals in the diagnosis or management of certain diseases or conditions [13]. However, directly applying such models in daily practice is not an easy task. A mobile or PC app can be used to support physicians in decision-making about patients with specific conditions. We first developed formulas for calculating the probabilities of both BOO and requiring prostate surgery, with the goal of creating a smartphone-based mobile health app and PC-based software to facilitate the widespread use of the formulas. Using such technologies is advantageous compared to conventional paper-based methods. First, data entry and computation are very efficient and fast. Second, such methods are easier, simpler, and more user-friendly. Therefore, the number of health care mobile apps predicting risk factors or prognoses is rapidly increasing. However, an app should be tested and validated among potential end-users before it comes into use in clinical practice [14, 15]. If the user interface of the software is not well constructed, user errors can occur, which may lead to potentially serious consequences in the ultimate decision-making process. Therefore, usability testing is as important as developing the formulas.
Our study has a few limitations. First, differences were found in the baseline characteristics of patients between the original development cohort and the external validation cohort. However, no significant differences were found in the clinical parameters of patients who required BPH surgery. It is suggested that this reflects the environmental variation across referral centers, which differ in terms of the patient population, disease severity, socioeconomic status, and/or proximity to primary or secondary referral centers. Patients with more severe symptoms requiring surgery may be referred to an institution similar to the institution where the development cohort was created. It was also observed that the percentages of patients who received PFS and required surgery were 71.8% and 24.7%, respectively. However, no differences were found in the major parameters of patients who required surgery, such as BOOI, TPV, total IPSS, and the IPSS voiding subscale score. This means that patients who were referred to the 3 centers differed in their clinical characteristics, but that the decision about prostatic surgery was made on the basis of similar criteria in these 3 independent institutions. Therefore, it seems that there was no problem regarding the criteria for surgical decision-making in the development cohort.
Second, in this study, a usability test was performed among a relatively small number of participants at the end of the user-interface development. Additionally, the PC version of the software was not tested. There are 2 different types of usability tests in medical device development: formative and summative usability tests. A formative test is conducted during the development of the medical device to identify product strengths and shortcomings. The present test was performed as a summative test, where there should be at least 25 users to reveal subtle interaction issues that might not be identified in a test involving fewer participants [16]. However, during the development of the apps, the authors circulated and reviewed development prototype versions more than 10 times before the usability test. Therefore, we believe that most user errors and interface errors were corrected before the test.
In conclusion, external validation of the newly developed probability models demonstrated a moderate level of discrimination, adequate calibration, and high net benefits for predicting both having BOO and requiring surgery. Moreover, usability testing showed that the smartphone app was user-friendly and that no major human errors occurred in its use.