All runs consisted of three passes over progressively smaller subsets of the collection:

(a) On-line logistic regression over the English ClueWeb09 collection, using all substrings of the query as binary features (alphabetic only, case insensitive). Only the first 35K bytes of each page were used. All 50 topics were processed in a single pass.

(b) Same as (a), but over only the enwp (Wikipedia) documents.

(c) Naive Bayes classifier, using binary byte 4-grams as features (no preprocessing at all, except for selection of the first 35K bytes of each page). Each topic was processed separately.

A short illustrative sketch of both feature schemes appears at the end of this description.

Training data:

  base run:
    very relevant: first-ranked document from (b)
    relevant: second-ranked document from (b)
    not relevant: 6,000 pages selected at random from the full English collection

  relfeed runs:
    very relevant: as per qrels
    relevant: as per qrels
    not relevant: 6,000 pages selected at random from the full English collection

  Note: very relevant examples were given double weight (trained twice).

Validation data: None. This is an automatic run, but we did compose 67 queries of our own that we used for pilot experiments.

"Test" data: The Naive Bayes classifier from (c) was run on the top 10K documents from (a) plus the top 10K documents from (b). Overall, the top-scored 1,000 documents were submitted to NIST.

P.S. Yes, indeed, we used spam filtering methods. The logistic regression was modified for speed and to process all 50 topics simultaneously. The Naive Bayes was an unmodified spam filter, run using the TREC spam filter toolkit. We knew from previous experiments that Naive Bayes is more robust to training noise than logistic regression; this seemed to be confirmed in our pilot experiments, which is why we used it for the relevance feedback pass.
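
For concreteness, here is a minimal Python sketch of the two feature representations used in the passes above. It is illustrative only: in particular, the way a query substring is matched against the page text, and the latin-1 decoding, are assumptions made for this sketch, not a description of the actual run code.

    # Illustrative sketch only -- not the run code.  How query substrings are
    # matched against a page, and the latin-1 decoding, are assumptions made
    # for this example.

    import re

    MAX_BYTES = 35_000  # only the first 35K bytes of each page are used


    def query_substring_features(query, page_bytes):
        # Pass (a)/(b): binary features are substrings of the query; here we
        # assume a feature fires when that substring occurs in the page text,
        # with both sides reduced to lowercase alphabetic characters.
        text = re.sub(r"[^a-z]+", " ",
                      page_bytes[:MAX_BYTES].decode("latin-1").lower())
        q = re.sub(r"[^a-z]+", " ", query.lower()).strip()
        substrings = {q[i:j] for i in range(len(q))
                      for j in range(i + 1, len(q) + 1)}
        return {s for s in substrings if s.strip() and s in text}


    def byte_4gram_features(page_bytes):
        # Pass (c): overlapping byte 4-grams of the raw page, with no
        # preprocessing other than truncation to the first 35K bytes.
        raw = page_bytes[:MAX_BYTES]
        return {raw[i:i + 4] for i in range(len(raw) - 3)}

In the actual runs, binary features of this kind were fed to the on-line logistic regression for passes (a) and (b), and to the unmodified spam filter's Naive Bayes for pass (c).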