Logistic models are everywhere in scientific research, including observational studies, clinical and cancer research, and health care investigations. In most cases, authors report a table of odds ratios (ORs), that is, the point estimate of each OR with its 95% CI, and perhaps the area under the curve (AUC) of a receiver operating characteristic (ROC). However, other powerful statistical tools can make a logistic model more thorough and complete, that is, more vivid and comprehensive, and therefore clinically more helpful to our patients, clinicians, and researchers.
Logistic models are also applicable to count data, for instance, the number of breast cancer deaths. In addition, the approach is robust to a considerable proportion of zeros in the data, making it a strong alternative to the negative binomial, Tobit, and Poisson models. Moreover, its applications reach remarkably different areas such as the banking service industry (fraud detection), hotel booking, gaming, data mining and machine learning (neural networks, AI), presidential elections, and more.
Using available data from a published study on head and neck cancer, we present several statistical tools that have matured in recent years, though they were invented earlier, such as the forest plot, the ROC curve and its AUC, the nomogram, and decision curve analysis, as well as bootstrap resampling and cross-validation, to enhance the presentation and understanding of a logistic model.
We provide 5 figures relevant to the logistic model, with interpretations of how to read them statistically and clinically. Along the way, we introduce the key statistical concepts needed to understand a logistic model.
It is our hope that, as more clinicians and researchers make the effort to use more or all of these statistical tools, research data can be used to its full potential and better benefit our research and patients. Otherwise, we may largely waste valuable data.
METHODS AND RESULTS
Part 1: A Forest Plot Makes a Logistic Model Easier to Read and Gives a Vivid Overall Picture of a Study -
Logistic models have become popular in many fields, including observational studies and clinical or cancer research. We can do more than report 1 or 2 odds ratios: we can use additional tools to enhance the model, improve our understanding of it, and visualize it with more information and higher confidence. A forest plot is one such tool, though it is more often associated with meta-analysis.
Our logistic regression: the outcome variable is dose-limiting toxicity (yes or no); the predictors (also called explanatory variables or independent variables) are Race+Sex+Age+Smoke+ECOG Performance Status+BMI+Sarcopenia.
Key concepts: proportion, odds, raw and adjusted odds ratios.
Raw odds ratio (1 vs. 2)=Odds in category 1/Odds in category 2; for example, 1=White and 2=non-White, or 1=Male and 2=Female. "Raw" means we focus on one variable alone, in contrast to the so-called adjusted ratio, in which other variables are involved.
Adjusted odds ratios are derived from the logistic regression coefficients, taken for one variable (age, for example) at a time while holding the other variables fixed (constant).
We see the following for the variable age (>=70 vs. <70):
Coef S.E. Wald Z Pr(>|Z|)
Age: 0.9421 0.3680 2.56 0.0105
The (mean) OR for age >=70 versus <70 years=exp (0.9421) ∼2.565, so older patients have 2.565 times the odds of experiencing the toxicity, with a statistically significant P value of 0.0105! Correspondingly, the 95% CI of the OR is exp (coefficient+/-1.96*SE)=exp (0.9421+/-1.96*0.3680)=(1.247, 5.277), which contains the point estimate 2.565 (the central point on the log scale), where exp denotes powers of e=2.71828182845... (Euler's number, a constant like π).
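The arithmetic above can be reproduced in a few lines of Python (a minimal sketch using only the coefficient and SE reported in the table; the variable names are our own):

```python
import math

coef, se = 0.9421, 0.3680  # coefficient and SE for Age (>=70 vs. <70)

or_point = math.exp(coef)            # adjusted odds ratio
ci_low = math.exp(coef - 1.96 * se)  # lower 95% confidence limit
ci_high = math.exp(coef + 1.96 * se)  # upper 95% confidence limit

print(round(or_point, 3), round(ci_low, 3), round(ci_high, 3))
```

The printed values, 2.565 and (1.247, 5.277), match the OR and 95% CI quoted above.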
For an OR <1, such as the BMI OR=0.97, the interpretation is that, relative to BMI <30 (the reference group), patients with BMI >=30 are less likely to experience the toxicity; but since the 95% CI includes OR=1.0 (the value when there is no group difference at all), the P value tells the same story -- no statistical significance.
To make a forest plot, we put all the ORs and their 95% CIs in one graph. The ORs in Figure 1 are adjusted ORs, that is, ORs obtained with the other covariates in the model (we take one variable at a time for consideration and call the others covariates). The raw OR can be obtained from the event counts; for example, the raw OR of White versus non-White=(91/230)/(5/24) ∼1.899, while the adjusted OR is 1.98.
In a logistic model, the values of the predictors can be any values, from negative infinity to positive infinity, depending upon their type: categorical, continuous, nominal, or ordinal. Why? Because p (outcome Y=1)=1/[1+exp (- (β0 + β1x1 + β2x2 + β3x3 + ...))]; if any x value approaches positive infinity, then p (Y=1) ∼1/[1 + exp (-∞)] ∼1/[1 + 0]=1; if any x approaches negative infinity, then p (Y=1) ∼1/[1 + ∞] ∼0. This is a remarkable advantage compared with ordinary linear regression, where a very large value of a variable drives the prediction out of range. Here ∼ reads "approximately," replacing the equal sign when infinity is involved.
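The boundedness argument above can be illustrated with a short sketch of the logistic (sigmoid) function; the extreme values of the linear predictor are illustrative only:

```python
import math

def p_yes(z):
    """Logistic (sigmoid) function: p(Y=1) for linear predictor z."""
    return 1.0 / (1.0 + math.exp(-z))

# Even extreme linear predictors still map into the interval (0, 1):
print(p_yes(-20))  # very close to 0
print(p_yes(0))    # exactly 0.5
print(p_yes(20))   # very close to 1
```

No matter how large or small z becomes, the predicted probability stays strictly between 0 and 1.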
A hint -- in daily research, a forest plot can carry other clinical measurements in place of the OR, such as the risk ratio (relative risk), risk difference, effect size (for instance, Cohen d=mean difference/SD), Pearson coefficient r, and so on.
Part 2: Add the ROC Curve and AUC to Evaluate and Summarize the Model: Something Like an R-squared Value in a Linear Regression Model -
Beyond estimating the OR of each predictor to compare their contributions, we can assess the model as a whole with the ROC curve and the area under the ROC curve (AUC). For the 7 predictors in the model, we have AUC=0.706 (Fig. 2), a moderately good indication that the 7 variables together are good predictors of the toxicity.
The process of creating the ROC curve provides another evaluation of the model -- the optimal cutoff point (where specificity+sensitivity is maximized, that is, the best combination of the two) for predicting the toxicity of the treatment.
Key concepts: sensitivity, specificity.
Mathematically, sensitivity is the predicted probability that a patient is flagged as experiencing toxicity given that the patient does have the toxicity from the treatment; in the current data, it is 68.8%. Specificity is the predicted probability that a patient is flagged as not experiencing the toxicity given that the patient did not experience it; in this case, it is 63.9% -- another indication that this is a moderately good predictive model.
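As a sketch of how sensitivity, specificity, and the AUC are computed, the snippet below uses a small set of hypothetical predicted probabilities (not the study data); the AUC is computed via its rank interpretation -- the probability that a randomly chosen positive case scores above a randomly chosen negative one:

```python
# Hypothetical predicted probabilities and true outcomes (1 = toxicity).
probs = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.35, 0.3, 0.2, 0.1]
truth = [1,   1,   0,   1,   0,    1,   0,    0,   0,   0]

cutoff = 0.5
tp = sum(p >= cutoff and y == 1 for p, y in zip(probs, truth))  # true positives
fn = sum(p < cutoff and y == 1 for p, y in zip(probs, truth))   # false negatives
tn = sum(p < cutoff and y == 0 for p, y in zip(probs, truth))   # true negatives
fp = sum(p >= cutoff and y == 0 for p, y in zip(probs, truth))  # false positives

sensitivity = tp / (tp + fn)  # P(flagged toxic | truly toxic)
specificity = tn / (tn + fp)  # P(flagged non-toxic | truly non-toxic)

# AUC: probability a random positive scores higher than a random negative.
pos = [p for p, y in zip(probs, truth) if y == 1]
neg = [p for p, y in zip(probs, truth) if y == 0]
auc = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))

print(sensitivity, specificity, auc)
```

Sweeping the cutoff from 0 to 1 and plotting sensitivity against 1-specificity traces out the ROC curve.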
The more predictors included in a model, the larger the AUC; but be careful -- the SE (or equivalently SD) of each predictor can grow because more variables now share the model, and hence the "effective" sample size for each predictor is reduced!
For example, if we include only 3 predictors in the model -- race, sex, and age -- the output parameters for age become:
Coef S.E. Wald Z Pr(>|Z|)
Age: 0.8775 0.3196 2.75 0.0065
The trade-offs: the OR for age >=70 years versus <70 years is exp (0.8775)=2.405 <2.565, the OR when the model included 7 predictors; but the 95% CI is now (1.285, 4.499), narrower than the previous (1.247, 5.277), because the SE is now smaller (we have more degrees of freedom -- effectively a larger sample size shared among only 3 predictors!).
Besides, the P value is now 0.0065, more significant than the 0.0105 obtained with 7 predictors! On the other hand, the area under the ROC curve drops to 0.611 (61.1%), compared with 0.706 for 7 predictors.
As a quick reminder, we may compare this AUC to the R-squared value in a linear regression, which indicates how large a proportion of the variance in the data (linked to SD or SE, the uncertainties) is explained.
We may also consider the AUC a summary metric for the whole logistic model: how well the model can predict, or distinguish between, 'yes' and 'no' for the outcome variable (a type of classification).
Part 3: The Nomogram, a Visualization Tool That Makes the Logistic Model More Clinical and Provides an Insightful Perspective for Clinicians -
A nomogram is a visualization of a complex mathematical model such as the logistic model discussed above or a Cox survival model; it is also a powerful way to examine an individual patient and the likelihood of the event of interest (experiencing the toxicity). Therefore, the nomogram is a valuable resource for decision-making.
The algorithm: the coefficients in the logistic regression may be considered weights, indicating the relative importance of each predictor. If we take the most influential variable, smoke, as the reference (assigned scores ranging from 0 to 100; other values are possible, but these are chosen for general mathematical convenience), then all the other covariates receive scores relative to it according to their coefficients (Fig. 3A).
In our model, race non-White is assigned a value of 0 and White 50; female scores 0 and male 2; age younger than 70 years scores 0 and 70 or older 69; interestingly, a former smoker scores 0, a never smoker 38, and a current smoker the maximum value of 100; ECOG performance status 0 or 1 is assigned 0 and status >=2 a score of 21; BMI smaller than 30 scores 2 and BMI >=30 scores 0; finally, sarcopenia scores 35 if present, otherwise 0.
Let us calculate the total score for a particular patient -- a White (50) male (2) patient, younger than 70 years (0), a former smoker (0), with a BMI smaller than 30 (2), an ECOG performance score of 1 (0), and sarcopenia (35). The total score (total points) is 50+2+0+0+2+0+35=89; then, from the 2 bottom lines, we can read off a predicted value of about 0.28=28%, the probability that this patient will experience the toxicity (Fig. 3B).
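The point-adding step above amounts to a lookup table. The sketch below encodes the per-category points transcribed from the text (the dictionary structure and variable names are ours, not part of the published nomogram):

```python
# Per-category nomogram points, transcribed from Figure 3A as described.
points = {
    "race": {"non-White": 0, "White": 50},
    "sex": {"female": 0, "male": 2},
    "age": {"<70": 0, ">=70": 69},
    "smoke": {"former": 0, "never": 38, "current": 100},
    "ecog": {"0-1": 0, ">=2": 21},
    "bmi": {"<30": 2, ">=30": 0},
    "sarcopenia": {"no": 0, "yes": 35},
}

# The worked example: White male, <70, former smoker, ECOG 0-1,
# BMI <30, with sarcopenia.
patient = {"race": "White", "sex": "male", "age": "<70", "smoke": "former",
           "ecog": "0-1", "bmi": "<30", "sarcopenia": "yes"}

total = sum(points[var][cat] for var, cat in patient.items())
print(total)  # 50 + 2 + 0 + 0 + 0 + 2 + 35 = 89
```

The total of 89 points is then converted to a predicted probability by the bottom scales of the nomogram.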
In another algorithm, the absolute value of the coefficient serves as the weight.
We let the variable age take the maximum value of 100 for patients 70 years or older (it has the largest coefficient) and 59 for those younger than 70; scaled against that maximum coefficient, the other categories are assigned accordingly: sarcopenia yes=80, no=59; BMI <30=59 and >=30=58; ECOG performance status 0 or 1=47 and >=2=59; former smoker 0, never smoker 23, and current smoker a scaled value of 59; female 59, male 60; non-White 59 and White 89.
For the patient with ID 36, we get total points ∼465; the individual score for each predictor can be read off at the vertical lines (in red). His predicted probability of experiencing the toxicity is ∼0.672=67.2%. The probability of not experiencing the toxicity is 100%-67.2%=32.8%; therefore, the odds (yes vs. no) of experiencing the toxicity are 67.2%/32.8%=672/328 ∼2.049.
If we apply the rules of Figure 3A to this patient -- a White male, a current smoker, younger than 70 years, with BMI smaller than 30, with sarcopenia, and with an ECOG performance score of 2 or greater -- his total score is 210, which corresponds approximately to a predicted probability of ∼0.67=67%! We see that different statistical algorithms can give the same prediction for the same person.
In addition, we may apply the nomogram to survival data analysis, where Cox hazard ratios, overall survival, or disease-free survival play the major roles.
Part 4: Decision Curve Analysis (DCA) -- a Tool to Evaluate a Predictive Logistic Model, Supplement the ROC, and Account for Clinical Utility -
DCA estimates the clinical "net benefit" of predictive models (or diagnostic tests). It includes 2 extreme strategies -- treat all patients (the "All" line) and treat no patients (the "None" line) -- and compares them with one or more predictive models.
We may consider treating a patient without the "event" (a control) as a "cost," and treating a patient with the event (a true positive case -- experiencing toxicity, in our case) as a "benefit."
Net benefit is evaluated at the minimum probability of the event at which intervention would be warranted: net benefit=(sensitivity) x (prevalence)-(1-specificity) x (1-prevalence) x (the odds at the threshold probability).
"The threshold probability" of an event is where a patient would opt for treatment after weighing the relative harms of a false-positive and a false-negative prediction. A patient or clinician would have a threshold value (horizontal axis, 0% to 100%) in his/her mind and consult with the informative DCA curve (Fig. 4).
On the DCA curves, we can easily compare different treatment modalities and models. After combining other factors such as financial burden or clinical convenience, clinicians can make better decisions.
A DCA may be applied to diagnostic tests, risk assessment, or cancer screening tests in clinical practice.
Part 5: Bootstrap -- A Method That Resamples the Data Many Times at Random, Trying to Uncover the Mechanism (The Model, Probability Distribution, etc.) That Generated the Data
Suppose the current data are a good representative sample of this type of patient; then bootstrapping can serve as a validation of the data and model. In this method, we resample the original data at random 1000 times; each time we either draw, for example, 70% of the current data, that is, n=272*0.7 ∼190, without replacement (no repetition of values from the original data), or we draw the full size of the data (n=272) with replacement (some values can be chosen more than once). After each resampling, we compute the bootstrap statistics -- the new parameters (mean, SD, coefficients, and so on) -- and then calculate the bias (the mean difference between the new parameters and their original counterparts) and the SE of the estimates (the SD of the bootstrapped samples).
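The resampling loop can be sketched as follows. The 2x2 data, the log odds ratio statistic, and the seed are hypothetical stand-ins for the study's logistic coefficients (which would require refitting the full model on each resample):

```python
import math
import random

random.seed(1)

# Hypothetical 2x2 data as (group, outcome) pairs; 1 = toxicity.
data = [(1, 1)] * 40 + [(1, 0)] * 60 + [(0, 1)] * 20 + [(0, 0)] * 80

def log_odds_ratio(sample):
    """log OR of the outcome for group 1 vs. group 0 in one sample."""
    a = sum(1 for g, y in sample if g == 1 and y == 1)
    b = sum(1 for g, y in sample if g == 1 and y == 0)
    c = sum(1 for g, y in sample if g == 0 and y == 1)
    d = sum(1 for g, y in sample if g == 0 and y == 0)
    return math.log((a * d) / (b * c))

# Full-size resampling with replacement, repeated 1000 times.
estimates = [log_odds_ratio([random.choice(data) for _ in data])
             for _ in range(1000)]

boot_mean = sum(estimates) / len(estimates)
bias = boot_mean - log_odds_ratio(data)               # bootstrap bias
boot_se = (sum((e - boot_mean) ** 2 for e in estimates)
           / (len(estimates) - 1)) ** 0.5             # bootstrap SE
print(round(bias, 4), round(boot_se, 4))
```

The bias should be close to zero and the bootstrap SE close to the analytic SE of the log OR, mirroring the comparison made for the age coefficient below.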
For example, for age 70 or older, the coefficient=0.9421, while the bootstrap bias=0.04393767 and the bootstrap SE of the coefficient=0.3907821; both the bias and the difference between the bootstrap SE and the model-based SE (0.3680) are small, supporting the stability of the estimates.
The bias-adjusted coefficient=0.9421+0.04393767=0.9860377, so the OR for age 70 or older versus younger than 70=exp (0.9860377) ∼2.68; and the 95% CI on the OR=exp (coefficient+/-1.96*SE)=exp (0.9860377+/-1.96*0.3907821)=(1.246, 5.766), which is quite close to the original (1.247, 5.277).
Remember that each of the 1000 resamples yields its own set of means, SDs, and coefficients, and we validate the original model by averaging over the 1000 estimates!
When the current data are not a good representation of the toxicity problem, the bootstrapping results will deviate far from the logistic model built on the original data, telling us that the data are not valid.
Part 6: To Validate an Established Logistic Model, We Apply Internal Cross-validation; Of Course, External Validation Is Needed for Generalization -
After a logistic model has been established, we want to validate it and generalize it more broadly. Ideally, we would find an external data source and apply our model to that data. If the model shows good predictive power there (confirming whether a patient will experience the toxicity or not), we gain more confidence to extend the model to other settings.
In the current situation, we can only validate the established model with our present data, using a method called (internal) k-fold cross-validation. Here we let k=10 (a common practice; other values are possible), so we separate the original data at random into 10 equal-sized subsamples (folds). We take 9 subsamples as the training data and build a model, as usual, on these 9 folds; we then apply the trained (logistic) model to the remaining fold to test (validate) it, that is, to count how many patients are predicted correctly (based on the yes or no of the outcome variable, toxicity). We repeat this whole process 100 times (think of it as enlarging or amplifying the original data) and take the average proportion of correctly predicted patients. Our 10-fold cross-validation gives an accuracy of about 65.17%; that is, about 2 in 3 patients are predicted correctly (again, moderately good).
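The fold-splitting and accuracy-averaging logic can be sketched as below; the synthetic risk scores and the mean-score cutoff "classifier" are illustrative stand-ins for the fitted logistic model:

```python
import random

random.seed(7)

# Hypothetical data: (risk_score, toxicity); higher scores tend to be toxic.
data = ([(random.gauss(1.0, 1.0), 1) for _ in range(100)] +
        [(random.gauss(-1.0, 1.0), 0) for _ in range(100)])
random.shuffle(data)

def kfold_accuracy(data, k=10):
    """Average test-fold accuracy over k folds (one pass through the data)."""
    fold_size = len(data) // k
    accuracies = []
    for i in range(k):
        test = data[i * fold_size:(i + 1) * fold_size]
        train = data[:i * fold_size] + data[(i + 1) * fold_size:]
        # "Training" step: use the mean training score as the cutoff --
        # a stand-in for fitting the logistic model on the 9 folds.
        cutoff = sum(score for score, _ in train) / len(train)
        correct = sum((score >= cutoff) == (label == 1)
                      for score, label in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k

print(round(kfold_accuracy(data), 3))
```

In practice, this whole k-fold pass is itself repeated (100 times in our analysis, with a fresh random partition each time) and the accuracies are averaged.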
In addition, if the model is for survival data analysis, such as a multivariate Cox model, the statistical principles above do not change. Further investigation indicated that with k=15 we obtained the highest accuracy, ∼65.83%.
DISCUSSIONS
In the current article, we briefly introduced 6 statistical tools applied to a logistic model based on head and neck cancer data. To make full use of valuable clinical data (and be more cost-effective), we recommend applying as many of these methods as possible, if not all, to analyze our data and validate our conclusions. We may consider each tool one of the blind men, seeing only a part of the truth behind the data; only together can the clinical manifestations hidden in the data -- the elephant -- be revealed to us. Wherever we find discrepancies between the methods, we see how far our conclusions deviate from the truth about the treatment regimens or diseases.
We may consider the logistic model, with the form logit (p)=log (odds of p)=log (p/(1-p))=β0+β1x1+β2x2+β3x3+..., a natural extension of the general linear model of Y on the X variables, Y=β0+β1x1+β2x2+β3x3+.... Here, we transform the outcome variable Y (binomial, 0 or 1, or any pair) into a probability, in the range 0 to 1.
Let p=p (Y=1) such that logit (p)
=log (p (Y=1)/p (Y=0))
=log (p (Y=1)/(1-p (Y=1))).
Y can be continuous, like blood glucose for diabetes; a discrete number, like EPIC-26 for quality-of-life measures in prostate cancer; or a categorical (nominal) variable, such as the binomial one we discuss in this paper -- toxicity, yes or no.
In this paper, we define (categorize) patients into 2 groups -- "dose-limiting toxicity: 0=no, 1=yes" -- and then model the data with a logistic model given X (x1=race, x2=sex, x3=age, x4=smoke, x5=ECOG performance status, x6=BMI, and x7=sarcopenia):
log [probability of Y=yes (toxicity) / probability of Y=no (no toxicity) | X]
=beta0+beta1*race+beta2*sex+beta3*age+beta4*smoke+beta5*ECOG performance status+beta6*BMI+beta7*Sarcopenia, where beta0 is the intercept term and the other betas are the regression coefficients of the covariates race, sex, and so on.
If we want to find the OR for race, consider the terms for race alone (setting the other betas to zero for simplicity), so that
log (p/(1-p))
= log (P(Y=yes)/(1-P(Y=yes)))
= log (P(Y=yes)/P(Y=no))
= log (odds of Y=yes vs. no)
= beta0+beta1*race.
The OR for the explanatory variable race
= (odds when race=1) / (odds when race=0)
= exp (beta0+beta1*1) / exp (beta0+beta1*0)
= exp (beta1), where, without loss of generality, race=0=non-White and race=1=White.
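A quick numeric check of this derivation: beta0 is an arbitrary illustrative intercept, and beta1 is chosen so that exp (beta1) matches the adjusted OR of 1.98 from Part 1:

```python
import math

# With any intercept beta0, the ratio of the odds at race = 1 and
# race = 0 collapses to exp(beta1).
beta0, beta1 = -1.3, math.log(1.98)  # beta0 arbitrary; exp(beta1) = 1.98

odds_white = math.exp(beta0 + beta1 * 1)     # odds when race = 1 (White)
odds_nonwhite = math.exp(beta0 + beta1 * 0)  # odds when race = 0 (non-White)

print(round(odds_white / odds_nonwhite, 2))  # 1.98, independent of beta0
```

Changing beta0 to any other value leaves the printed ratio unchanged, which is exactly why the exponentiated coefficient is the OR.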
That is, the exponential of the coefficient is the OR for that covariate. If we instead let race=1=non-White, then race=0=White, and the OR is inverted, from 1.98 (White vs. non-White) to 1/1.98 ∼0.51 (non-White vs. White)! Hence, the OR is a relative quantity, and we always need a reference category or level!
When Y is ordinal (such as pain scores or financial toxicity) or nominal with more than 2 categories (for instance, 3 types of treatment), the logistic model still applies, though it is used less often and requires a larger, or much larger, sample size. Again, in these models, predictors can be of any type: categorical (nominal), continuous, or ordinal.
CONCLUSIONS
Statistical analysis plays a key role in clinical and health care research. On one hand, a better understanding of the primary statistical concepts is of great importance to clinicians; on the other hand, the lack of appropriate statistical education presents a huge barrier that prevents them from grasping the statistics behind the concepts.
In the present article, we provided an example with graphs to help readers, especially clinicians, appreciate the concepts and value of cancer data analysis. We introduced some important statistical tools, including forest plots, ROC curves, nomograms, decision curve analysis, bootstrap resampling, and cross-validation. It is our hope that promoting the application of these statistical tools will enhance data analysis in practice.
A statement -- the authors aim to promote the statistical methods/tools presented in this manuscript; the models and results should be considered tutorial in nature and are therefore not intended for clinical practice, consultation, or real prediction.