SODA Symposium
Tuesday, December 13, 2022
Title: "Rehabilitation of open-ends: Creating a codebook for open-ends using machine learning techniques and human intervention that then can be used to drive action"
Abstract: Open-ends (open-ended survey questions) pose a well-known problem in survey research. Language can yield extremely rich responses, including surfacing aspects of a question or issue that the researcher might not have known to look for, but analyzing text is costly and labor-intensive. As a result, researchers tend to include open-ends as an afterthought, to use them minimally, or to avoid them altogether. Computational methods can potentially help, but they often raise concerns about whether their results are as trustworthy and actionable as the results obtained from other kinds of responses.
We will talk about the approaches we have been taking to the analysis of open-ends, which combine automation with human intervention in order to strike a balance between automation and trustworthiness. Two experiments were run independently on the same set of 16,648 Reddit responses to a question asking people who had considered suicide why they ultimately did not end their lives. The first experiment introduced human intervention at the start: a machine learning process that included word clouds and TF-IDF (term frequency-inverse document frequency) techniques helped human coders develop an actionable codebook. The second experiment used topic modeling, an unsupervised machine learning approach, to extract latent categories from the open-ends; these categories then guided a step-by-step content analysis protocol in which subject-matter experts identified category labels and descriptions. At the symposium we will compare and contrast the results of the two experiments and, more generally, discuss the potential for techniques of this kind to bring open-ends out of the shadows in survey research.
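As a rough illustration of the kind of TF-IDF step described for the first experiment, the Python sketch below ranks terms across a set of responses so that coders can review the most distinctive ones (for example, as input to a word cloud) while drafting codebook categories. It is a minimal, standard-library-only sketch and not the presenters' actual pipeline: the function names `tokenize`, `tfidf_scores`, and `top_terms`, the stopword list, the unsmoothed `log(N / df)` weighting, and the heuristic of summing weights across responses are all illustrative assumptions. The four responses are invented toy text, not drawn from the study data.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase a response and split it into alphabetic tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_scores(
    responses: list[str], stopwords: frozenset[str] = frozenset()
) -> list[dict[str, float]]:
    """Return one {term: tf-idf weight} dict per response.

    tf  = count of the term in the response / number of tokens in the response
    idf = log(N / df), where N is the number of responses and df is the number
          of responses containing the term (unsmoothed; an assumption here).
    A term that appears in every response therefore gets weight 0.
    """
    docs = [[t for t in tokenize(r) if t not in stopwords] for r in responses]
    n_docs = len(docs)
    # Document frequency: count each term at most once per response.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty responses
        scores.append(
            {term: (c / total) * math.log(n_docs / df[term]) for term, c in counts.items()}
        )
    return scores


def top_terms(scores: list[dict[str, float]], k: int = 5) -> list[str]:
    """Sum tf-idf weights across all responses and return the k highest-scoring terms.

    Summing per-response weights is one simple corpus-level heuristic, assumed here.
    """
    totals: Counter = Counter()
    for doc_scores in scores:
        totals.update(doc_scores)  # Counter.update adds values for matching keys
    return [term for term, _ in totals.most_common(k)]


if __name__ == "__main__":
    # Invented toy responses, for illustration only.
    responses = [
        "I thought about my family and my kids",
        "My dog needed me",
        "I thought about my family",
        "Hope that things would get better",
    ]
    stopwords = frozenset({"i", "my", "me", "and", "about", "that"})
    print(top_terms(tfidf_scores(responses, stopwords), k=4))
    # ['dog', 'needed', 'thought', 'family']
```

In a human-in-the-loop workflow of this kind, a ranked list like this would only be a starting point: coders read the responses behind each high-weight term and decide which ones merit a codebook category. A library implementation such as scikit-learn's `TfidfVectorizer` would typically replace this hand-rolled version; note that its default IDF is smoothed, so its weights will differ from the ones computed here.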