Impact of Surrogate Assessments on High-Recall Retrieval
Adam Roegiest, Gordon V. Cormack, Charles L. A. Clarke & Maura R. Grossman
Accepted for publication at SIGIR 2015.
Abstract
We are concerned with the effect of using a surrogate assessor to train a passive (i.e., batch)
supervised-learning method to rank documents for subsequent review, where the
effectiveness of the ranking will be evaluated using a different assessor
deemed to be authoritative. Previous studies suggest
that surrogate assessments may be a reasonable proxy for authoritative
assessments for this task. Nonetheless, concern persists in some application
domains, such as electronic discovery, that errors in surrogate training
assessments will be amplified by the learning method, materially degrading
performance. We demonstrate, through a re-analysis of data used in previous
studies, that with passive supervised-learning methods, using surrogate assessments
for training can substantially impair classifier performance relative to using
the same deemed-authoritative assessor
for both training and assessment. In particular, using a single surrogate to
replace the authoritative assessor for training often yields a ranking that must be traversed
to a much greater depth to achieve the same level of recall as the ranking that would
have resulted had the authoritative assessor been used for training. We also
show that steps can be taken to mitigate, and sometimes overcome, the impact of surrogate
assessments for training: relevance assessments may be diversified
through the use of multiple surrogates; and a more liberal view of relevance
can be adopted by having the surrogate label borderline documents as relevant.
With these steps, rankings derived from surrogate assessments can match, and
sometimes exceed, the performance of the ranking that would have been achieved
had the authority been used for training. Finally, we show that our results
still hold when the roles of
surrogate and authority are interchanged, indicating that the results may simply reflect
differing conceptions of relevance between surrogate and authority, rather than
the authority having special skill or knowledge that the surrogate lacks.
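
To make the comparison of rankings concrete, the following minimal Python sketch computes how deep a ranking must be traversed to reach a target recall, as judged by the authoritative assessor. It is an illustration only: the function name `depth_for_recall`, the document identifiers, and the toy rankings are hypothetical and are not code or data from the paper.

```python
import math


def depth_for_recall(ranking, relevant, target_recall):
    """Return how many top-ranked documents must be reviewed to reach target_recall.

    ranking: list of document ids, best first (hypothetical classifier output).
    relevant: set of ids the authoritative assessor judged relevant.
    target_recall: fraction of the relevant documents to find, e.g. 0.75.
    Returns None if the ranking does not contain enough relevant documents.
    """
    needed = math.ceil(target_recall * len(relevant))
    if needed == 0:
        return 0  # nothing to find, so no review is required
    found = 0
    for depth, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            found += 1
        if found >= needed:
            return depth
    return None


# Toy data (assumed for illustration): four documents deemed relevant by the authority.
authority_relevant = {"d1", "d2", "d3", "d4"}

# Hypothetical rankings from classifiers trained on authority vs. surrogate labels.
ranking_authority_trained = ["d1", "d2", "d3", "x1", "d4", "x2", "x3", "x4"]
ranking_surrogate_trained = ["d1", "x1", "d2", "x2", "x3", "d3", "x4", "d4"]

print(depth_for_recall(ranking_authority_trained, authority_relevant, 0.75))  # 3
print(depth_for_recall(ranking_surrogate_trained, authority_relevant, 0.75))  # 6
```

In this toy case the surrogate-trained ranking must be reviewed to twice the depth to reach 75% recall, which is the kind of gap the abstract describes; the actual magnitudes reported in the paper are not reproduced here.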