An empirical investigation into the role of subjective prior probability in searching for potentially missing items
There are many examples from the scientific literature of visual search tasks in which the length, scope and success rate of the search have been shown to vary according to the searcher's expectations of whether the search target is likely to be present. This phenomenon has major practical implications, for instance in cancer screening, when the prevalence of the condition is low and the consequences of a missed disease diagnosis are severe. We consider this problem from a subjective Bayesian perspective to explain how a low prior probability, subjectively assessed by the searcher, might affect the extent of the search. We show how the searcher's posterior probability that the target is present depends on the prior probability and the proportion of possible target locations already searched, and also consider the implications of imperfect search, when the probabilities of false-positive and false-negative decisions are non-zero. The theoretical results are applied to two studies of radiologists' visual assessment of pulmonary lesions on chest radiographs. Further application areas in diagnostic medicine and airport security are also discussed.
1. Introduction
Visual search is a common task both in everyday life and in the scientific research environment [1,2]. A particularly common scenario is one in which the search ‘target’ may or may not be present. This arises in applications ranging from quality control in engineering [3] and baggage scanning in airport security [4] to the examination of medical images for broken bones [5] and cancerous lesions [6]. In each case, there is a trade-off between the accuracy of the search and the time taken to complete the search task.
The question to be addressed in this paper is as follows: how long should a searcher continue to search, given that the target has not yet been found? At what point should the search be terminated and the target declared to be absent? Although it might appear desirable to conduct a thorough search of all possible locations, enabling the searcher to determine with certainty whether the target is present, in certain scenarios a search of this kind may be too time-consuming or too expensive. This is a particular concern if there are many such search tasks to be completed in succession, as might be the case in a busy airport or in a disease screening programme [7]. Additionally, the search strategy may be inefficient [8], leaving possible target locations unexplored, or the decision as to what constitutes a target might be subject to error [9].
It appears reasonable to think that the time spent searching might depend on the individual's level of anticipation that the target is present, which can be regarded as a prior probability. It appears natural to interpret this as a subjective Bayesian probability [10], which is gradually modified as the searcher assimilates information about the presence or absence of the target during the search.
This paper assesses how this probability might change according to the length of the search, and consequently indicates when the searcher might terminate the search entirely, as might be the case if the posterior probability falls below a suitably small value. The topic is introduced in §2 by means of a simple example based on just two possible locations, which is then extended in §3 to the case of multiple locations. Section 4 discusses the case in which the searcher's ability to identify the target is subject to error. Section 5 evaluates the approach in an analysis of data from two published studies of the assessment of pulmonary lesions on chest radiographs, and §6 is a concluding discussion of further implications for applied research.
2. A simple example
Suppose initially that there are only two possible locations (‘cell 1’ and ‘cell 2’) in which the object of interest (‘target’) may be hidden. Let $Z$ denote the event that ‘the target is present in one of the cells’, and $Z^c$ its complement (‘the target is absent’). Let $X_i$ denote the event that ‘the target is in cell i’ (i=1,2), and $X_i^c$ its complement.
We assume that either exactly one of the cells contains the target, which occurs with prior probability π=P(Z)∈(0,1), or neither of the cells contains the target, with prior probability 1−π. We also assume that the cells are superficially identical, so that if the target is present, it is equally likely to lie in either of the cells (i.e. $P(X_1\mid Z)=P(X_2\mid Z)=1/2$).
Initially, we assume that the searcher operates without error: searching a particular cell establishes with certainty whether that cell contains the target. In §4, we will investigate the implications of relaxing this assumption. Clearly, if cell 1 contains the target, we have P(Z|X1)=1. Suppose that cell 1 has been searched and does not contain the target.
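Application of Bayes's theorem gives
$$P(Z\mid X_1^c)=\frac{P(X_1^c\mid Z)\,P(Z)}{P(X_1^c\mid Z)\,P(Z)+P(X_1^c\mid Z^c)\,P(Z^c)}=\frac{\pi/2}{\pi/2+(1-\pi)}=\frac{\pi}{2-\pi}.$$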
For an intuitive explanation of this result, we might consider the outcome of a series of ‘trials’ in each of which the target is present with probability π=0.5. In the long run, the target would be in cell 1 in one in every four trials; of the other three, two would be scenarios in which both cells were empty, and in the remaining one, cell 2 would contain the target. If the first cell is empty, the first of these cannot occur, and the other three will occur with equal probability, giving the ‘answer’ of 1/3.
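The counting argument above can be checked with a minimal sketch. It assumes the error-free, two-cell setting of this section; the function name `two_cell_posterior` is introduced here purely for illustration. Exact fractions are used so that the result is not obscured by rounding.

```python
from fractions import Fraction


def two_cell_posterior(prior):
    """P(target present | cell 1 searched and found empty), error-free search.

    `prior` is the prior probability that the target is present in one of
    the two cells; if present, it is equally likely to be in either cell.
    """
    prior = Fraction(prior)
    if not 0 < prior < 1:
        raise ValueError("prior must lie strictly between 0 and 1")
    # Joint probabilities of the mutually exclusive scenarios that remain
    # possible once cell 1 is known to be empty.
    in_cell_2 = prior / 2   # target present and located in cell 2
    absent = 1 - prior      # target absent altogether
    # The scenario 'target in cell 1' (probability prior / 2) is ruled out.
    return in_cell_2 / (in_cell_2 + absent)


print(two_cell_posterior(Fraction(1, 2)))  # 1/3
```

For any prior π the function returns π/(2−π), which is what Bayes's theorem gives under these assumptions.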
3. Extension to the case of multiple cells
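Consider now the case in which there are n cells, of which m have been searched without the target being found, i.e. we observe $X_1^c\cap\dots\cap X_m^c$. Writing $p=m/n$ for the proportion of cells searched, Bayes's theorem gives
$$P(Z\mid X_1^c\cap\dots\cap X_m^c)=\frac{(1-p)\pi}{1-p\pi}. \qquad (3.1)$$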
Figure 1 shows how equation (3.1) varies with p for fixed values of π (R code used to create the figures is available as the electronic supporting information, where the effect of changing parameter values can be seen [12]).
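As a rough illustration of the behaviour plotted in figure 1, the following sketch tabulates the posterior probability (1−p)π/(1−pπ) for a few priors. It is written in Python rather than the R of the supporting information, and the function name `posterior_present` and the chosen grid of values are assumptions made for this example only.

```python
def posterior_present(p, prior):
    """Posterior probability that the target is present after an error-free
    search of a proportion `p` of the locations has found nothing."""
    if not (0.0 <= p <= 1.0):
        raise ValueError("p must lie in [0, 1]")
    if not (0.0 < prior < 1.0):
        raise ValueError("prior must lie strictly between 0 and 1")
    return (1.0 - p) * prior / (1.0 - p * prior)


if __name__ == "__main__":
    proportions = (0.0, 0.25, 0.5, 0.75, 0.9, 0.99, 1.0)
    for prior in (0.01, 0.1, 0.5, 0.9, 0.99):
        row = [round(posterior_present(p, prior), 3) for p in proportions]
        print(f"prior={prior}: {row}")
```

With p=0 the function returns the prior unchanged, and with p=1 it returns zero, as expected for a complete error-free search that finds nothing.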
If the prior probability π is high, the posterior probability will remain high except when an extremely thorough search is made. If π is large (0.99, say), searching a proportion p=π of possible locations gives a posterior probability of $\pi/(1+\pi)\approx 0.497$, i.e. only about one half.
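Keeping π fixed, for small values of p, the function $f(p)=(1-p)\pi(1-p\pi)^{-1}$ is approximately quadratic: $f(p)\approx\pi-\pi(1-\pi)p-\pi^2(1-\pi)p^2$ by Taylor expansion. The second derivative of f with respect to p is
$$f''(p)=-\frac{2\pi^2(1-\pi)}{(1-p\pi)^3},$$
which is negative for all $p\in[0,1]$ and $\pi\in(0,1)$, so f is concave in p.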
The probability (3.1) drops below a fixed threshold t<π when
$$p>\frac{\pi-t}{\pi(1-t)}.$$
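The threshold condition can be made concrete with a short sketch. It assumes the error-free posterior (1−p)π/(1−pπ) of this section and solves for the proportion p at which that posterior equals the threshold t; the function name `proportion_to_reach` is hypothetical.

```python
def proportion_to_reach(threshold, prior):
    """Proportion of locations that must be searched (without success) for
    the error-free posterior (1 - p) * prior / (1 - p * prior) to fall to
    `threshold`. Requires 0 < threshold < prior < 1."""
    if not (0.0 < threshold < prior < 1.0):
        raise ValueError("need 0 < threshold < prior < 1")
    return (prior - threshold) / (prior * (1.0 - threshold))


if __name__ == "__main__":
    # Illustrative values only: how much searching is needed to bring the
    # posterior down to 5% for several priors.
    for prior in (0.1, 0.5, 0.9, 0.99):
        p_needed = proportion_to_reach(0.05, prior)
        print(f"prior={prior}: search proportion > {p_needed:.4f}")
```

The required proportion increases with the prior, so a searcher who starts with a high expectation must cover nearly all locations before reaching a low posterior probability.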
Similar results apply if, instead of regarding each cell as equally likely to contain the target, probabilities are allowed to vary between cells, with cells searched in decreasing order of probability. The prior probabilities might be represented as a non-increasing function π(x), where x∈[0,1] and $\int_0^1\pi(x)\,dx=\pi$.
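One possible way to write the analogue of (3.1) for location-varying prior probabilities is sketched below. It rests on two assumptions made for this illustration: that π(x) integrates to the overall prior π over [0,1], and that locations are searched in increasing order of x, so that a proportion p searched corresponds to the interval [0,p]. The symbol Π(p) is introduced here and does not appear elsewhere in the paper.

```latex
% Assumptions made for this sketch:
%   \int_0^1 \pi(x)\,dx = \pi, and locations are searched in increasing
%   order of x, so the searched region is [0, p].
\[
  \Pi(p) = \int_0^p \pi(x)\,dx, \qquad
  P\bigl(Z \mid \text{no target found in } [0,p]\bigr)
    = \frac{\pi - \Pi(p)}{1 - \Pi(p)} .
\]
% Check: with a constant density \pi(x) = \pi we have \Pi(p) = p\pi,
% which recovers (1-p)\pi / (1 - p\pi), i.e. equation (3.1).
```

Because π(x) is non-increasing, Π(p) rises fastest at the start of the search, so under these assumptions the posterior falls more quickly early on than in the equal-probability case.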
4. Extension in the presence of decision error
In practice, a search can rarely be carried out without the possibility of errors arising. For example, the assessment of medical imaging is generally subject to both recognition errors (in which the eye fixates on the target but the searcher fails to realize that an identification needs to be made) and decision errors (in which the searcher realizes that the object under scrutiny may be the target, but incorrectly identifies its nature) [6]. Visual search error, when the searcher fails to make a visual assessment of some parts of the region, is also possible and would correspond to a reduction in the proportion p of possible target locations searched. Such errors may occur through human error (e.g. fatigue or lack of sufficient expertise) or through factors that are generally beyond the searcher's control (e.g. if the quality of the image as a whole is poor, or the target itself is to some extent hidden or distorted). In this section, we extend previous results to allow for both ‘false positive’ and ‘false negative’ errors: situations in which, respectively, the searcher decides a target is present when it is not, and the searcher fails to identify a target that is present.
$X_i$ still denotes the event that ‘the target is in cell i’, but now this is not directly observable. Instead, either $Y_i$, the event that the searcher decides that cell i contains the target, or its complement $Y_i^c$ is observed.
Corresponding algebraic expressions to those in §§2 and 3 can be derived. Appendix A contains details of the derivation. In the case of only two cells, the posterior probabilities in the case that cell 1 is empty are
As in §3, we can extend these formulae to the case in which a proportion p of the cells have been observed to be empty (i.e. we observe $Y_1^c\cap\dots\cap Y_m^c$, where $p=m/n$).
Figures 2 and 3 show how the posterior probability varies with respect to p, q, r and π. The impact of decision errors clearly increases the longer the search continues (i.e. as p increases), both because a larger number of assessments provides a greater chance of a false positive decision occurring and, more importantly, because if the target is present, its location is more likely to have been assessed if p is large. The lower panel of figure 2 shows that changing q and r has a larger effect on the posterior probability when π is nearer 0.5 than either of the extremes, which is intuitively reasonable: with equivocal prior opinion, the searcher's decision depends primarily on the observed data.
Figure 3 shows, for fixed, non-zero q and r, how the searcher's posterior probability changes as the search develops. It can be compared to figure 1, which represents the case in which q=r=0. The dependence on r can be seen particularly at high prevalence levels when the proportion searched is also high. In general, the posterior probabilities (4.5) and (4.6) depend on both the false positive and false negative error rates, even though these expressions are probabilities conditional only on a set of observations in which the target has not been found.
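The dependence on both error rates can be illustrated with a minimal sketch under stated assumptions, which may differ in detail from the derivation in appendix A. Here q is taken to be the per-cell false positive probability P(Y_i | X_i^c), r the per-cell false negative probability P(Y_i^c | X_i), decisions on different cells are treated as independent, and every assessed cell has been declared empty. Under these assumptions the posterior depends on q and r only through the ratio r/(1−q); the function name `posterior_with_errors` is hypothetical.

```python
def posterior_with_errors(p, prior, q, r):
    """Posterior probability that the target is present when a proportion
    `p` of cells has been assessed and all were declared empty.

    Assumptions of this sketch (not taken verbatim from the paper):
      q = P(cell declared to contain target | cell is empty)  (false positive)
      r = P(cell declared empty | cell contains target)       (false negative)
      decisions on different cells are independent.
    """
    if not (0.0 <= p <= 1.0 and 0.0 < prior < 1.0):
        raise ValueError("need 0 <= p <= 1 and 0 < prior < 1")
    if not (0.0 <= q < 1.0 and 0.0 <= r <= 1.0):
        raise ValueError("need 0 <= q < 1 and 0 <= r <= 1")
    # Likelihood ratio contributed by the cell that holds the target, if it
    # was among those assessed: r for a miss versus (1 - q) for a correct
    # 'empty' decision on a genuinely empty cell.
    rho = r / (1.0 - q)
    shrink = p * (1.0 - rho)
    return prior * (1.0 - shrink) / (1.0 - prior * shrink)


if __name__ == "__main__":
    # Illustrative error rates only.
    for p in (0.0, 0.5, 0.9, 1.0):
        print(p, round(posterior_with_errors(p, 0.5, q=0.05, r=0.1), 4))
```

Setting q=r=0 recovers the error-free posterior (1−p)π/(1−pπ), while at p=1 the sketch gives πr/(πr+(1−π)(1−q)), which is non-zero whenever r>0.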
The overall pattern of results from figures 2 and 3 demonstrates that π has a much larger effect than either q or r at almost all levels of p, especially when q and r are small, as would be the case in most applications. One important exception presents itself: when a complete search has been made. In this case, in the absence of decision error, the prior probability is irrelevant, as the posterior probability is either one or zero, depending on whether the target has been found. However, figure 3 shows that this does not apply when decision error is present. The figure is plotted on the domain [0,1], with the posterior probability at p=1 given by
5. Application
We apply the results above to two studies by Reed et al. that examined the effect of prevalence expectations on the performance of radiologists' visual assessment of pulmonary lesions on chest radiographs [5,14]. The designs of the studies were similar: both required 22 experienced radiologist readers, with a minimum of 6 years' post-registration experience, to assess a sample of 30 radiographs, 15 of which contained abnormalities and 15 of which were ‘normal’. In most scenarios, prevalence information was provided to readers in advance (either
Here, we use the total duration data for the scenarios in which numerical prevalence information was provided. We use data only from the search of the ‘normal’ images, as in most cases search of abnormal images was interrupted by the finding of at least one lesion. Therefore, we consider only cases when, unknown to the reader, no target was present. Both studies found statistically significantly increased scrutiny times at higher levels of prevalence, with average reading times rising from 8–10 s per image at the lowest prevalence level to 17–18 s per image at the highest (table 1, calculated from results given in the two papers). The results we have derived in this paper can be used to provide additional insight into the findings of the two studies of Reed et al. In particular, we use the average search duration data to estimate the proportion of the radiograph that is consistent with adequate search by readers at the different prevalence levels.
Table 1.
Results and estimated proportions of locations searched, using data from the two studies of Reed et al.
For clarity of exposition, we consider the results in §3 (no decision error), although the method is readily extended to the more general case in §4. Three major assumptions are required. Firstly, readers are assumed to be ‘Bayesian readers’ in the sense that, given prevalence information, their behaviour is assumed to follow the principles already described in this paper. The decision as to when to cease searching is based on the stated prevalence allied to information already accumulated during the search, as represented by the likelihood, resulting in the posterior probabilities shown in figure 4. Secondly, readers are assumed to finish searching when their posterior probability of a lesion being present reaches a particular (unknown) level π⋆, which is the same regardless of the experimental condition. Thirdly, the rate at which information is accumulated during the search is assumed to be constant over time, so that, in two separate searches, the ratio of the two scrutiny times is equal to the ratio of the proportions of possible locations searched.
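Let i=1,2,3 index the three prevalence levels $\pi_i$ (30%, 50% and 73%, respectively). Let $p_i$ be the proportion of possible target locations searched when the search ends, and $t_i$ the time in seconds at which this occurs. The assumptions imply that, for each i≠j,
$$\frac{(1-p_i)\pi_i}{1-p_i\pi_i}=\frac{(1-p_j)\pi_j}{1-p_j\pi_j}=\pi^\star \quad\text{and}\quad \frac{p_i}{p_j}=\frac{t_i}{t_j}.$$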
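To show how the three assumptions of this section fit together, the following sketch computes the proportion searched implied by a given stopping level π⋆ at each prevalence level, using the error-free posterior of §3. The stopping level used below (0.2) is a hypothetical value chosen only for illustration, not an estimate from the studies, and the function name `implied_proportion` is likewise an assumption.

```python
def implied_proportion(prior, stop_level):
    """Proportion of locations searched at which the error-free posterior
    (1 - p) * prior / (1 - p * prior) has fallen to `stop_level`.

    Returns 0.0 when the prior is already at or below the stopping level,
    i.e. when a 'Bayesian reader' would not begin searching at all.
    """
    if not (0.0 < prior < 1.0 and 0.0 < stop_level < 1.0):
        raise ValueError("prior and stop_level must lie strictly in (0, 1)")
    if stop_level >= prior:
        return 0.0
    return (prior - stop_level) / (prior * (1.0 - stop_level))


if __name__ == "__main__":
    priors = (0.30, 0.50, 0.73)  # prevalence levels quoted in the text
    stop_level = 0.20            # HYPOTHETICAL pi-star, for illustration only
    proportions = [implied_proportion(pi, stop_level) for pi in priors]
    for pi, p in zip(priors, proportions):
        print(f"prevalence={pi:.2f}: implied proportion searched={p:.3f}")
    # Under the constant-rate assumption, ratios of proportions searched
    # equal ratios of scrutiny times, so these are the model's predicted
    # time ratios relative to the lowest prevalence level.
    print([round(p / proportions[0], 3) for p in proportions])
```

In an analysis of real data the stopping level would be chosen so that the predicted time ratios match the observed ones as closely as possible; the sketch only shows the forward calculation for one assumed value.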
6. Discussion
This article provides a theoretical basis for understanding why duration of search depends on the searcher's initial expectation that the target of the search is present. That the posterior probability, when viewed as a function of the proportion searched p for fixed prior probability π, is concave is revealing, and might be seen as a ‘law of increasing returns’: the more of the region has already been searched, the greater the return, in terms of change in posterior probability, in searching a fixed proportion extra.
In treating the assessment of different cells as independent, we have not considered issues such as peripheral vision; nor have we addressed the scenario in which decision error may vary according to location, and may also be associated with the location-varying prior probability that the target is present. To achieve the latter would be algebraically cumbersome but no more difficult in principle than the scenario we have considered. We have concentrated on probabilities conditional on the event that, in the opinion of the searcher, no target has been found, and not investigated in detail the consequences of a ‘false positive’ assessment. This is primarily because in the examples that motivate this work, a ‘positive’ decision would typically lead to swift further action being taken (such as conclusive investigation of a suspect package during baggage screening) that radically alters the nature of the search task. Nevertheless, the extremely low prevalence of suspicious packages is sufficient to greatly reduce the probability of detection for imperfect searches.
There is evidence that performance can be improved by deliberately adding artificial targets to increase ‘vigilance’ [15]: inducing the observers to use a higher prior probability π as a means to increase p, in the terminology of the current paper. The value of doing so will depend on the trade-off between the respective costs of false positive and false negative decisions in any application, and is perhaps best suited to those in which false negatives, or ‘misses’, are costly errors.
There is also substantial medical literature to suggest that, in certain circumstances, so-called ‘prevalence expectations’ may affect diagnostic accuracy, and thus either the duration and thoroughness of the visual search or the way in which the findings of the search are interpreted [5,7,9]. When the prevalence of events is at an extreme, either high or low, the risk is that the searcher may be so influenced by prior expectations that scant attention is paid at all to the results of the search itself. Researchers should therefore also be aware that predictions of rare events may be influenced in the same way, perhaps making them unhelpfully conservative [16].
The application of our theoretical results to the findings in the two papers by Reed et al. has some limitations. We assume that the outcomes of different reads are independent, so readers' decisions are not influenced by their assessment of the cases already seen allied to the expected prevalence level. Framing prevalence information in terms of ‘expected’ or ‘population’ prevalence, rather than ‘sample’ prevalence, should help to prevent this. Other limitations relate to the three stated assumptions in §5 that underlie the analysis. In particular, although the model fits the experimental findings well, there is no intrinsic evidence that the readers, even subconsciously, follow Bayesian reasoning in making their decisions. Our results can show no more than that their search times are consistent with such reasoning. The novelty lies in providing an estimate of the proportion of locations searched corresponding to such a model, given the assumptions, which might be tested experimentally in future research and used as a means of highlighting possible reasons behind inadequate search when prevalence expectations are low.
In screening programmes, the prevalence and incidence of disease are typically very low. For example, in the National Lung Screening Trial, the annual incidence of lung cancer diagnosed by low-dose computed tomography was 0.6% [17], while a recent pilot trial in the UK assumed a 5 year incidence of 5% in a high-risk population [18]. The results in §§3 and 4 apply to assessments of individuals. Screening programmes generally assess many thousands of individuals, which in conjunction with the low prevalence of disease has the effect of producing an extremely high false positive rate, a result that has been discussed in detail elsewhere [19,20]. This well-established finding is consistent with results in §4, which show that a small false positive rate can have a substantial effect on the posterior probability, even when the diagnosis is negative.
The relationship between duration of search and diagnostic accuracy in this context is less clear, and blinded studies in this area are difficult to conduct without changing the nature of the search task [7]. There is some evidence that duration of search is associated with greater diagnostic accuracy when conducting clinical examinations for breast cancer, with a guideline of 3 min per breast recommended in order to achieve complete search, but the relationship between the duration of search and the extent of the search (in terms of possible target locations searched) remains unclear [21]. The two studies of Reed et al. considered in §5 used high disease prevalences, and so these results cannot be extrapolated to the lower prevalence levels that would be observed in the context of screening.
A natural extension of this work would be to link the Bayesian model for the decision-making process, proposed here, to the rich class of optimal observer models that relate to the way in which the visual search is conducted [22,23]. The results could also be extended to the case of multiple targets [24], or to settings in which the searcher is explicitly told the proportion of items searched. These are all possible topics for future research.