Authorship Attribution with Function Word N-Grams

  • Russell Clark Johnson

    Student thesis: Doctoral ThesisDoctor of Philosophy

    Abstract

    Prior research has considered the sequential order of function words, after the contextual words of the text have been removed, as a stylistic indicator of authorship. This research describes an effort to enhance authorship attribution accuracy based on this same information source with alternate classifiers, alternate n-gram construction methods, and a genetically tuned configuration. The approach is original in that it is the first time that probabilistic versions of Burrows's Delta have been used. Instead of using z-scores as an input for a classifier, the z-scores were converted to probabilistic equivalents (since z-scores cannot be subtracted, added, or divided without the possibility of distorting their probabilistic meaning); this adaptation enhanced accuracy. Multiple versions of Burrows's Delta were evaluated; this includes a hybrid of the Probabilistic Burrows's Delta and the version proposed by Smith & Aldridge (2011); in this case accuracy was enhanced when individual frequent words were evaluated as indicators of style. Other novel aspects include alternate n-gram construction methods; a reconciliation process that allows texts of various lengths from different authors to be compared; and a GA selection process that determines which function (or frequent) words (see Smith & Rickards, 2008; see also Shaker, Corne, & Everson, 2007) may be used in the construction of function word n-grams.
    Date of AwardJan 1 2013
    Original languageEnglish
    SupervisorMichael J Laszlo (Supervisor), Amon B. Seagull (Advisor) & Wei Li (Advisor)

    Cite this

    '