Natural Language Processing Methods for Analyzing Human Behavior: Modeling & Evaluation
Event Description
Abstract: This dissertation addresses the methodological disconnect between Natural Language Processing (NLP) and human-centric analysis by shifting the unit of analysis from the document to human behavior in two broad respects: (i) time-ordering, modeling documents as sequential, person-indexed behavioral observations; and (ii) person-level semantics, evaluating and explaining models by their latent structure of psychological constructs rather than solely by their predictive accuracy against narrow proxy measures. First, we consider the most basic implication of treating language as a person's behavior when measuring their psychological constructs: the relationship between language sample size and a model's predictive performance. We empirically show that state-of-the-art transformers are often over-parameterized for typical NLP dataset sizes and can be reduced in dimensionality without performance loss. Establishing the author as the unit of analysis naturally allows us to treat their behavior as a time-ordered sequence. Second, we introduce a longitudinal evaluation framework that establishes ecologically valid evaluation settings, namely cross-sectional and prospective generalization, and separates the model's error into within-person dynamics and between-person differences. We demonstrate that traditional NLP evaluations based on random document splits can yield reversed conclusions under these ecologically valid generalization settings. To address this, we develop models that capture the trajectory of mental states (e.g., mood shifts) rather than static traits. Third, moving to person-level semantics, we evaluate the latent structure of large language models using a novel machine behavior analytic framework. We find that while GPT-4 achieves high predictive correlation with self-reports, its latent symptom structure diverges from clinical understanding.
Finally, we propose a method for modeling multidimensional behaviors that embeds concurrent behavioral signals alongside language to predict future states. Taken together, this work suggests that operationalizing language as behavior develops NLP methods into a rigorous instrument for valid psychological inquiry.
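The evaluation settings and error decomposition named in the abstract can be illustrated with a small sketch. This is not the dissertation's implementation; it is a minimal example under our own assumptions: each record is a hypothetical dict with `person`, `time`, and `y` keys, the prospective split uses a single global time cutoff, and the decomposition is the standard sum-of-squares identity that splits mean squared error into a between-person part (each person's mean error) and a within-person part (deviations around that mean).

```python
import random
from collections import defaultdict


def random_split(records, test_frac=0.2, seed=0):
    """Conventional split: documents shuffled regardless of author or time."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    return shuffled[:cut], shuffled[cut:]


def cross_sectional_split(records, test_people):
    """Held-out people: no document by a test person appears in training."""
    train = [r for r in records if r["person"] not in test_people]
    test = [r for r in records if r["person"] in test_people]
    return train, test


def prospective_split(records, cutoff_time):
    """Train on observations before the cutoff, test on those at or after it."""
    train = [r for r in records if r["time"] < cutoff_time]
    test = [r for r in records if r["time"] >= cutoff_time]
    return train, test


def decompose_mse(records, preds):
    """Return (between, within) with between + within == total MSE.

    With e_it = pred - y and ebar_i the mean error of person i:
    sum_it e_it^2 = sum_i n_i * ebar_i^2 + sum_it (e_it - ebar_i)^2
    """
    errors = defaultdict(list)
    for r, p in zip(records, preds):
        errors[r["person"]].append(p - r["y"])
    n = sum(len(es) for es in errors.values())
    between = 0.0
    within = 0.0
    for es in errors.values():
        mean_e = sum(es) / len(es)
        between += len(es) * mean_e ** 2  # person-level (mean) error
        within += sum((e - mean_e) ** 2 for e in es)  # dynamics around it
    return between / n, within / n
```

For example, with two people observed at two time points each, a model can have equal between-person and within-person error even though its total error looks like a single number; a random document split mixes both components, whereas the cross-sectional and prospective splits probe them under different generalization conditions.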
Speaker: Adithya Ganesan
Location: Zoom meeting (ID: 99021939129, Passcode: 569493)