Classifying Financial Software Development Documentation in Knowledge Discovery Systems

  • Roman Melnyk

    Student thesis: Doctoral Thesis, Doctor of Philosophy

    Abstract

    Organization of documents for quick and easy access is an essential component of knowledge management (KM) in technical teams. One effective way of organizing these documents is to catalog them by a fixed set of specific knowledge categories. For large-scale technical companies, the number of categories can reach thousands or even tens of thousands, which makes cataloging especially useful. Text classification (TC) is a sophisticated process that involves data pre-processing, transformation, dimensionality reduction, application of classification techniques, classifier evaluation, and classifier validation. To date, TC remains a prominent research topic, and in practice it still depends on human labor rather than on machine learning. It is a relatively new area of research and remains in an early phase. The goal of this study was to develop and evaluate a prototype model that used machine learning algorithms to classify technical documentation in a KM system for a financial institution. Using a design-science research methodology, the study was carried out in seven phases: identify a proper data representation technique, develop a prototype text classification model, collect stakeholder feedback, analyze survey data, validate the model internally based on stakeholder input, review the final prototype via focus group discussion, and analyze focus group data. A prototype model that uses machine learning algorithms was developed to address the problematic aspects of classifying technical documentation in a KM system. Traditional models and several state-of-the-art deep learning models were compared and evaluated. An online survey was designed and administered to collect qualitative and quantitative data to evaluate the Perceived Usefulness and the Perceived Ease-of-Use of the model. Finally, the model was updated and the final prototype was reviewed via a focus group to solicit more detailed responses regarding the development of the model and to answer some key questions. The focus group discussion found the model to be useful and easy to use. The panel's comments showed that the model could effectively save time in finding the needed information in the documents and help with day-to-day work. This research contributes to the field of computer information systems and KM by examining several promising contemporary TC methods to determine whether they can provide satisfactory classification performance in knowledge discovery of financial technical documentation.
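    The TC stages named in the abstract (pre-processing, transformation, dimensionality reduction, classification, and evaluation) can be illustrated with a minimal sketch. This is not the prototype developed in the thesis: it assumes a bag-of-words representation and swaps in a multinomial Naive Bayes classifier as a simple stand-in for the traditional models the study compared. The documents, the two category names, the stopword list, and all function names are hypothetical, and the evaluation is a plain hold-out accuracy on two unseen documents rather than a full validation procedure.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stopword list, kept tiny for illustration.
STOPWORDS = {"the", "a", "an", "of", "to", "for", "and", "in", "on", "is"}

def preprocess(text):
    """Pre-processing: lowercase, tokenize on letters, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def select_vocabulary(token_lists, max_terms):
    """Dimensionality reduction: keep the terms with the highest document frequency."""
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    ranked = sorted(doc_freq, key=lambda t: (-doc_freq[t], t))  # ties broken alphabetically
    return set(ranked[:max_terms])

def transform(tokens, vocabulary):
    """Transformation: bag-of-words counts restricted to the selected vocabulary."""
    return Counter(t for t in tokens if t in vocabulary)

def train(vectors, labels, vocabulary):
    """Classification: fit multinomial Naive Bayes with add-one smoothing."""
    docs_per_class = Counter(labels)
    term_counts = defaultdict(Counter)
    for vec, label in zip(vectors, labels):
        term_counts[label].update(vec)
    model = {}
    for label, n_docs in docs_per_class.items():
        total = sum(term_counts[label].values())
        log_prior = math.log(n_docs / len(labels))
        log_likelihood = {
            t: math.log((term_counts[label][t] + 1) / (total + len(vocabulary)))
            for t in vocabulary
        }
        model[label] = (log_prior, log_likelihood)
    return model

def classify(model, vocabulary, text):
    """Return the category with the highest posterior log-score for a raw text."""
    vec = transform(preprocess(text), vocabulary)
    def score(label):
        log_prior, log_likelihood = model[label]
        return log_prior + sum(n * log_likelihood[t] for t, n in vec.items())
    return max(sorted(model), key=score)

def accuracy(model, vocabulary, texts, labels):
    """Evaluation: fraction of held-out documents assigned the expected category."""
    hits = sum(classify(model, vocabulary, t) == y for t, y in zip(texts, labels))
    return hits / len(labels)

# Hypothetical toy corpus; a real KM system would hold far more documents and categories.
train_texts = [
    "Deploy the payment service to the staging cluster",
    "Rollback procedure for a failed release deploy",
    "Release checklist for the trading service deploy",
    "Index tuning for the ledger database query",
    "Database schema migration for the accounts table",
    "Slow query analysis on the ledger database",
]
train_labels = ["deployment"] * 3 + ["database"] * 3
test_texts = ["Deploy a release to the cluster", "Tune a slow database query"]
test_labels = ["deployment", "database"]

token_lists = [preprocess(t) for t in train_texts]
vocabulary = select_vocabulary(token_lists, max_terms=8)
vectors = [transform(tokens, vocabulary) for tokens in token_lists]
model = train(vectors, train_labels, vocabulary)
held_out_accuracy = accuracy(model, vocabulary, test_texts, test_labels)
print("held-out accuracy:", held_out_accuracy)
```

    Document-frequency cutoff is only one of many possible reduction strategies, chosen here because it needs no external library. With thousands of categories, as the abstract describes, both the representation and the classifier would need to be far richer than this toy setup.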
    Date of Award: Jan 1 2021
    Original language: English
    Supervisor: Martha Marie Snyder (Supervisor), Junping Sun (Advisor) & Ling Wang (Advisor)
