Non-readers (Computers and Reviewers)

Ted Underwood’s machine learning algorithm predicts the probability that a given work belongs to the set of works reviewed in a group of prestigious publications. For poetry from 1820 to 1899, the prediction is almost 80% accurate. (To arrive at an accuracy figure, the probabilities have to be collapsed into two categories: more likely to have been reviewed than not, and the reverse.)
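To make the collapsing step concrete for myself, here is a minimal sketch in Python. The probabilities and review labels are invented for illustration, the function names `collapse` and `accuracy` are my own, and the 0.5 cutoff is my assumption about what "more likely than not" means. None of this is taken from Underwood's code.

```python
def collapse(probabilities, threshold=0.5):
    """Turn each predicted probability into a yes/no guess.

    True means "more likely to have been reviewed than not".
    The 0.5 threshold is an assumption, not a documented setting.
    """
    return [p > threshold for p in probabilities]


def accuracy(predicted, actual):
    """Fraction of works where the guess matches what really happened."""
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


# Invented example: five works, the model's probabilities,
# and whether each work was in fact reviewed.
probs = [0.91, 0.35, 0.62, 0.48, 0.20]
was_reviewed = [True, False, False, False, True]

labels = collapse(probs)
score = accuracy(labels, was_reviewed)
print(labels, score)
```

Notice what the collapse throws away: a work scored at 0.91 and one scored at 0.62 both become a plain "reviewed" guess, so the accuracy figure says nothing about how confident the model was.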

In its training phase, the algorithm does something a human could never do: it identifies the groups of words whose presence or absence is most likely to sort a work into the correct category, reviewed or unreviewed (an approximation of prestigious or not prestigious). Computers are good at counting; humans are not. Even though what the computer does is very different from human reading, some of the differences between the prestigious words and the non-prestigious words are recognizable to humans. As Underwood puts it, “All of this boils down to a fairly clear contrast between embodied lyric subjectivity and an older mode of poetic authority that is more didactic, sentimental, and collective” (84). Of course, Underwood chooses the most representative passages for his article. Across the whole dataset, contrasts in diction might not be so clear to modern readers, or to past reviewers. We also have no idea what went on in the minds of reviewers.
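The counting described above can be imitated on a toy scale. The sketch below is not Underwood's model; it swaps in a much simpler technique, a smoothed log-odds score based on whether each word is present in a text. The four "poems" are lines I made up, and the function names are hypothetical. It only shows the kind of bookkeeping a computer can do across thousands of texts and a human cannot.

```python
import math
from collections import Counter


def presence_counts(texts):
    """Count how many texts contain each word at least once."""
    counts = Counter()
    for text in texts:
        counts.update(set(text.lower().split()))
    return counts


def log_odds(reviewed, unreviewed):
    """Score each word: positive leans reviewed, negative leans unreviewed."""
    r_counts = presence_counts(reviewed)
    u_counts = presence_counts(unreviewed)
    scores = {}
    for word in set(r_counts) | set(u_counts):
        # Add-one smoothing so a word missing from one group
        # does not cause a division by zero.
        r_rate = (r_counts[word] + 1) / (len(reviewed) + 2)
        u_rate = (u_counts[word] + 1) / (len(unreviewed) + 2)
        scores[word] = math.log(r_rate / u_rate)
    return scores


# Invented lines, loosely echoing the contrast Underwood describes.
reviewed = ["my eyes ache in the dark", "the dark sea at my feet"]
unreviewed = ["we praise the noble nation", "let virtue guide the nation"]

scores = log_odds(reviewed, unreviewed)
for word in sorted(scores, key=scores.get, reverse=True):
    print(f"{word:8s} {scores[word]:+.2f}")
```

In this toy case "dark" and "my" score positive, "nation" scores negative, and "the" scores zero because it appears everywhere. A real model weighs thousands of words together rather than one at a time, which is part of why its choices are only sometimes legible to a human reader.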

That said, the predictive value of diction might bolster the idea, put forth by Arnold Bennett, that to the reviewer, “The narrative everywhere discloses … the merits and defects of the writer; no author ever lived who could write a page without giving himself away” (97). Diction is apparent even from small samples. So if reviewers were compelled to sample a work only briefly before deciding whether it would be reviewed, diction might have been a useful feature to register (whether consciously or not). According to Bennett, the title page is also extremely useful to the reviewer, but Underwood strips this kind of material from his texts. I wonder if a model could be trained on the smaller dataset of paratexts. Would it be too difficult to encode aesthetic and material features?
