a) Determine suitable approaches towards the construction of AI systems.
b) Determine ethical challenges which are distinctive to AI and issues that may arise with such rapidly developing technologies.
d) Communicate clearly and effectively using the technical language of the field and constructively engage with different stakeholders.
Original Paper

A Framework for Applying Natural Language Processing in Digital Health Interventions

Burkhardt Funk1*, PhD; Shiri Sadeh-Sharvit2,3*, PhD; Ellen E Fitzsimmons-Craft4, PhD; Mickey Todd Trockel3, PhD; Grace E Monterubio4, MA; Neha J Goel2,3, MA; Katherine N Balantekin4,5, PhD; Dawn M Eichen4,6, PhD; Rachael E Flatt2,3,7, BSc; Marie-Laure Firebaugh4, LMSW; Corinna Jacobi8, PhD; Andrea K Graham9, PhD; Mark Hoogendoorn10, PhD; Denise E Wilfley4, PhD; C Barr Taylor2,3, MD

1Leuphana University, Institute of Information Systems, Lueneburg, Germany
2Palo Alto University, Center for m2Health, Palo Alto, CA, United States
3Stanford University, Department of Psychiatry and Behavioral Sciences, Stanford, CA, United States
4Washington University in St Louis, Department of Psychiatry, St Louis, MO, United States
5University at Buffalo, Department of Exercise and Nutrition Sciences, Buffalo, NY, United States
6University of California San Diego, Department of Pediatrics, San Diego, CA, United States
7University of North Carolina at Chapel Hill, Department of Psychology and Neurosciences, Chapel Hill, NC, United States
8Technische Universität, Institute of Clinical Psychology and Psychotherapy, Dresden, Germany
9Northwestern University, Department of Medical Social Sciences, Chicago, IL, United States
10Vrije Universiteit, Department of Computer Science, Amsterdam, Netherlands
*these authors contributed equally

Corresponding Author:
Burkhardt Funk, PhD
Leuphana University
Institute of Information Systems
Universitaetsallee 1
Lueneburg, 21335
Germany
Phone: 49 4131 677 1593
Email:
[email protected]

Abstract

Background: Digital health interventions (DHIs) are poised to reduce target symptoms in a scalable, affordable, and empirically supported way. DHIs that involve coaching or clinical support often collect text data from 2 sources: (1) open correspondence between users and the trained practitioners supporting them through a messaging system and (2) text data recorded during the intervention by users, such as diary entries. Natural language processing (NLP) offers methods for analyzing text, augmenting the understanding of intervention effects, and informing therapeutic decision making.

Objective: This study aimed to present a technical framework that supports the automated analysis of both types of text data often present in DHIs. This framework generates text features and helps to build statistical models to predict target variables, including user engagement, symptom change, and therapeutic outcomes.

Methods: We first discussed various NLP techniques and demonstrated how they are implemented in the presented framework. We then applied the framework in a case study of the Healthy Body Image Program, a Web-based intervention trial for eating disorders (EDs). A total of 372 participants who screened positive for an ED received a DHI aimed at reducing ED psychopathology (including binge eating and purging behaviors) and improving body image. These users generated 37,228 intervention text snippets and exchanged 4285 user-coach messages, which were analyzed using the proposed model.

Results: We applied the framework to predict binge eating behavior, resulting in an area under the curve between 0.57 (when applied to new users) and 0.72 (when applied to new symptom reports of known users). In addition, initial evidence indicated that specific text features predicted the therapeutic outcome of reducing ED symptoms.

J Med Internet Res 2020 | vol. 22 | iss. 2 | e13855 | p.
1 https://www.jmir.org/2020/2/e13855 (page number not for citation purposes) Funk et al, JOURNAL OF MEDICAL INTERNET RESEARCH

Conclusions: The case study demonstrates the usefulness of a structured approach to text data analytics. NLP techniques improve the prediction of symptom changes in DHIs. We present a technical framework that can be easily applied in other clinical trials and clinical presentations and encourage other groups to apply the framework in similar contexts.

(J Med Internet Res 2020;22(2):e13855) doi: 10.2196/13855

KEYWORDS
Digital Health Interventions Text Analytics (DHITA); digital health interventions; eating disorders; guided self-help; natural language processing; text mining

Introduction

Digitally delivered interventions for mental disorders have the potential to reduce the mental health burden worldwide [1]. Efficacious online and mobile phone app–based programs can overcome barriers to treatment such as stigma, reach, access, cost, and the scarcity of professionals trained in empirically supported interventions [2]. Furthermore, digital health interventions (DHIs) are more scalable, potentially allowing one professional to manage a large number of individuals [3].

As DHIs are increasingly used, new data analytics capabilities are needed to evaluate treatment outcomes and mechanisms of engagement and symptom reduction [4]. Most DHIs collect structured data that are pertinent to assessing adherence to the intervention and symptom change over time, including symptom severity scales, number of sessions completed, and number of times the program was accessed [5]. Digital guided self-help interventions, a type of DHI, also incorporate a trained practitioner (coach) who facilitates the user’s learning of the intervention material, monitors progress, and helps troubleshoot barriers to change. This allows for the collection of rich, in-depth text data that could augment the understanding of intervention efficacy and inform the development and refinement of future programs.
Such datasets include texts generated through direct communication between users and their facilitators on a digital platform. Another source of information is the text that users record during the intervention, for example, free-text diary entries and posts authored in intervention-related group chats and discussion boards [6]. Data analytic approaches, therefore, could benefit from an overarching perspective on the methods used to study the text data emerging from technology-delivered programs.

Hereafter, we provide a brief review of the use of text analytics methods in DHIs. Then, we propose a framework for applying natural language processing (NLP) in this field and demonstrate its application in a test case of an online intervention for eating disorders (EDs), delivered as part of the Healthy Body Image (HBI) Program trial [7].

Methods

Natural Language Processing in Mental Health Interventions

NLP is a rapidly evolving interdisciplinary field that studies human language content and its use in predicting human behavior [8]. NLP uses computational models to analyze unstructured, user-generated text to identify patterns and related outcomes (eg, a change in target symptoms) [9]. If proven effective, NLP models may ultimately enable the design of automated chatbots in person-machine communication [10]. Although the use of NLP in consumer and online search behavior is well established [11], it has only recently been utilized in mental health research [12]. Text data analytics can inform clinical decisions, particularly when professionals have many data points at their disposal but each characteristic has only weak predictive power [13].
Using NLP models, researchers have shown, for instance, that text communications can predict an increase in psychiatric symptoms [14], that text data in electronic medical records can effectively predict treatment outcomes [5], and that patients’ reviews of the care they receive can provide important insights for stakeholders [15]. Furthermore, when analyzing text data, machine learning algorithms demonstrated greater accuracy than mental health professionals in distinguishing between suicide notes written by suicide completers and controls [16]. A similar approach has also been utilized in understanding medical risks through NLP of electronic medical records [17].

NLP strategies have also been applied to analyze text data from social media in the context of mental health. For instance, Coppersmith et al [18] detected quantifiable signals of mental disorders through analyses of text data available on Twitter. NLP has also been used effectively to predict outcomes from text messages exchanged with a crisis intervention service [8]. Computational discourse analysis methods have likewise been employed to develop insights into what constitutes effective counseling text conversations [19]. Similarly, by analyzing patterns of the words, sentiments, topics, and style of messages used, Hoogendoorn et al [12] found a correlation between several text features and social anxiety in an online treatment.

However, research on the clinical applicability of NLP models is still in its early stages [10]. For example, Miner et al [20] have shown that currently available smartphone-based conversational agents (eg, Apple’s Siri), which many individuals use to search for health information [21], are not equipped to respond effectively to users’ inquiries about mental health. Considering the potential of text data to inform and enrich both clinicians and clients, the development and refinement of NLP tools should be a significant public health priority.
Proposed Framework

NLP offers a useful set of tools for analyzing text data generated in DHIs and for building predictive models. NLP can clarify the mechanisms mediating the effects of online interventions as well as improve and personalize DHIs, leading ultimately to further automation of technology-delivered programs and lower costs [22].

Free text in DHIs comes from 2 sources. First, information about users’ thoughts, emotions, and behaviors is collected via open-ended questions embedded within the program (eg, “Hey [user], after learning about triggers, can you identify two of your common triggers for binge eating?”). Applying NLP techniques to this type of text data makes it possible to build predictive models, for instance, for calculating individual mood symptoms and symptom trajectories [23]. Second, in guided self-help interventions, users and coaches exchange messages for problem solving, engaging users, providing supplemental information, and individualizing the intervention.

In DHIs, each text snippet, that is, a free-text segment, is associated with a specific user and has a unique time stamp. Figure 1 depicts an example user journey and shows the time interval a user spends within a DHI. Each filled symbol on the timeline represents a text snippet, where the shape and color reflect the text class (eg, a message from a user). Text snippets are not the only elements of a user’s journey; structured touchpoints (indicated by open circles in Figure 1) complete the data associated with a specific user. A touchpoint is, broadly speaking, an interaction of the user with the DHI.
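As a rough illustration of the data model described here, the following Python sketch stores text snippets and nontext touchpoints as records that carry a user and a time stamp, and assembles one user's journey by sorting those records chronologically. The class, field, and function names (`TextClass`, `Touchpoint`, `TextSnippet`, `kind`, `build_journey`) and the sample events are hypothetical and are not taken from the framework itself; the four text classes simply mirror the symbols listed in the Figure 1 caption.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class TextClass(Enum):
    """Text classes named in the Figure 1 caption (identifiers are assumptions)."""
    DIARY = "diary"
    COACH_MESSAGE = "coach_message"
    USER_MESSAGE = "user_message"
    EXERCISE = "exercise"


@dataclass(frozen=True)
class Touchpoint:
    """A structured, nontext interaction of a user with the DHI."""
    user_id: str
    timestamp: datetime
    kind: str  # hypothetical label, e.g. "symptom_scale"


@dataclass(frozen=True)
class TextSnippet:
    """A free-text segment tied to one user and one time stamp."""
    user_id: str
    timestamp: datetime
    text_class: TextClass
    text: str


def build_journey(events, user_id):
    """Return all events of one user, ordered by time stamp."""
    own = (e for e in events if e.user_id == user_id)
    return sorted(own, key=lambda e: e.timestamp)


# Made-up events for two users, deliberately out of chronological order.
events = [
    TextSnippet("u1", datetime(2020, 1, 3, 9, 0), TextClass.USER_MESSAGE,
                "I skipped breakfast again."),
    Touchpoint("u1", datetime(2020, 1, 1, 8, 0), "symptom_scale"),
    TextSnippet("u2", datetime(2020, 1, 2, 10, 0), TextClass.DIARY,
                "Felt calmer today."),
    TextSnippet("u1", datetime(2020, 1, 2, 18, 30), TextClass.COACH_MESSAGE,
                "How did the exercise go?"),
]

journey = build_journey(events, "u1")
for event in journey:
    print(event.timestamp.isoformat(), type(event).__name__)
```

Keeping snippets and structured touchpoints on a single timeline is one possible design choice; it makes it straightforward to relate a text snippet to the nearest structured touchpoint of the same user.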
Besides text messages exchanged between users and coaches, touchpoints include symptom severity scales.

Figure 1. Text fragments along an example user journey of a specific user i (vertical dots refer to other users); open circles refer to other nontext touchpoints and the interaction of the user with the digital health intervention; upward pointing triangles refer to fragments from diaries; red squares refer to the messages sent by coaches; black squares refer to the messages sent by users; and downward pointing triangles refer to the data collected within specific exercises (eg, deep breathing).

The analysis of texts in DHIs encompasses 2 steps (Figure 2). The first step, feature engineering, concentrates on preprocessing the text data to identify structured features, because free text cannot be used directly by machine learning algorithms. These features form a numerical vector of typically fixed length that represents each snippet and can be used to estimate statistical models. In the second step, predictive modeling, models are constructed to infer and predict either short-term symptom change or overall therapeutic outcomes. Information