Thresholds and accuracy in screening tools for early detection of psychopathology.


BACKGROUND: The accuracy of any screening instrument designed to detect psychopathology among children is ideally assessed through rigorous comparison to ‘gold standard’ tests and interviews. Such comparisons typically yield estimates of what we refer to as ‘standard indices of diagnostic accuracy’, including sensitivity, specificity, positive predictive value (PPV), and negative predictive value. However, whereas these statistics were originally designed to detect binary signals (e.g., diagnosis present or absent), screening questionnaires commonly used in psychology, psychiatry, and pediatrics typically result in ordinal scores. Thus, a threshold or ‘cut score’ must be applied to these ordinal scores before accuracy can be evaluated using such standard indices. To better understand the tradeoffs inherent in choosing a particular threshold, we discuss the concept of ‘threshold probability’. In contrast to PPV, which reflects the probability that a child whose score falls at or above the screening threshold has the condition of interest, threshold probability refers specifically to the likelihood that a child whose score is equal to a particular screening threshold has the condition of interest.
METHOD: The diagnostic accuracy and threshold probability of two well-validated behavioral assessment instruments, the Child Behavior Checklist Total Problem Scale and the Strengths and Difficulties Questionnaire total scale, were examined in relation to a structured psychiatric interview in three de-identified datasets.
RESULTS: Although both screening measures were effective in identifying groups of children at elevated risk for psychopathology in all samples (odds ratios ranged from 5.2 to 9.7), children who scored at or near the clinical thresholds that optimized sensitivity and specificity were unlikely to meet criteria for psychopathology on gold standard interviews.
CONCLUSIONS: Our results are consistent with the view that screening instruments should be interpreted probabilistically, with attention to where along the continuum of positive scores an individual falls.
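
ILLUSTRATION: The distinction between PPV and threshold probability drawn above can be made concrete with a minimal Python sketch. The scores, diagnoses, and cut score below are invented for illustration and are not taken from the datasets analyzed here; the function names are likewise hypothetical. Under these assumptions, the sketch shows how the probability of the condition among all children scoring at or above a cut score can be noticeably higher than the probability among children scoring exactly at the cut score.

```python
def ppv(scores, has_condition, threshold):
    """Estimate P(condition | score >= threshold) from paired observations."""
    flagged = [c for s, c in zip(scores, has_condition) if s >= threshold]
    if not flagged:
        raise ValueError("no observations at or above the threshold")
    return sum(flagged) / len(flagged)


def threshold_probability(scores, has_condition, threshold):
    """Estimate P(condition | score == threshold) from paired observations."""
    at_cut = [c for s, c in zip(scores, has_condition) if s == threshold]
    if not at_cut:
        raise ValueError("no observations exactly at the threshold")
    return sum(at_cut) / len(at_cut)


# Hypothetical toy data (NOT from the study): ordinal screening scores and
# binary gold-standard diagnoses (1 = condition present, 0 = absent).
toy_scores = [0, 0, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5]
toy_diagnosis = [0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1]
toy_cut_score = 3  # assumed cut score, chosen only for illustration

toy_ppv = ppv(toy_scores, toy_diagnosis, toy_cut_score)
toy_tp = threshold_probability(toy_scores, toy_diagnosis, toy_cut_score)

print(f"PPV at cut score {toy_cut_score}: {toy_ppv:.2f}")
print(f"Threshold probability at cut score {toy_cut_score}: {toy_tp:.2f}")
```

In this toy example, half of the children who screen positive have the condition, but only one in four of those scoring exactly at the cut score do, because the higher scorers raise the PPV. This mirrors the qualitative pattern described in the RESULTS, not its magnitudes.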