Recently, large-scale language data combined with modern machine learning techniques have shown strong value for studying human psychology and behavior. For example, language alone has been shown to be predictive of mental health, personality, and health behaviors. However, many applications of such language-based assessments also have readily available and important data beyond language (i.e., extra-linguistic data): when predicting the subjective well-being of a community from its tweets, for instance, one can also take into account the community's age, education, and other demographic attributes. Language may capture some characteristics while extra-linguistic variables capture others. We believe that effectively integrating linguistic and extra-linguistic data can yield benefits beyond what either offers independently.
In this thesis, we develop methods that effectively integrate extra-linguistic data with language data, focused primarily on social scientific applications. The central challenge is reconciling the size and heterogeneity of language data, which is often sparse and noisy, with extra-linguistic variables, which are often low-dimensional and dense. First, we consider structured extra-linguistic data, such as socioeconomic variables (income and education rates) and demographics (age, gender, etc.), and propose two integration methods, residualized controls (RC) and residualized factor adaptation (RFA), for county-level prediction tasks. Demonstrating techniques that integrate information at both the model level and the data level, we find consistently strong improvements over naively combining features, for example, improving county-level well-being predictions by over 12%. Next, we consider unstructured extra-linguistic data. In the first part, we incorporate social network connections and language over time to propose a novel metric for quantifying the stickiness of words: their ability to spread across friendship connections in a social network over time (in other words, to stick in one's vocabulary after seeing friends use it). We identify which language features are more likely to disseminate through friendships and show that such a metric is useful for predicting who will become friends and what content will spread. In addition, we analyze language content over time by proposing a novel dynamic, content-specific topic modeling technique that can identify different sub-domains of a thematic scope and can be used to track societal shifts in concerns or views.
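To make the residualized-controls idea concrete, below is a minimal sketch in Python using scikit-learn ridge regression: a model is first fit on the extra-linguistic controls, and a second model is then fit on language features to predict what the controls miss. The variable names, the choice of ridge regression, and the toy county-level data are illustrative assumptions, not the thesis' exact implementation.

# Minimal sketch of a residualized-controls (RC) style combination:
# fit a controls-only model first, then fit a language model on its residuals.
import numpy as np
from sklearn.linear_model import Ridge

def fit_residualized_controls(X_controls, X_language, y, alpha=1.0):
    """Fit a controls-only model, then a language model on its residuals."""
    control_model = Ridge(alpha=alpha).fit(X_controls, y)
    residuals = y - control_model.predict(X_controls)      # what controls miss
    language_model = Ridge(alpha=alpha).fit(X_language, residuals)
    return control_model, language_model

def predict_residualized_controls(models, X_controls, X_language):
    control_model, language_model = models
    # Final prediction: controls-based estimate plus language-based correction.
    return control_model.predict(X_controls) + language_model.predict(X_language)

# Toy usage with random county-level data (purely illustrative).
rng = np.random.default_rng(0)
n_counties = 200
X_controls = rng.normal(size=(n_counties, 4))    # e.g. age, income, education
X_language = rng.normal(size=(n_counties, 50))   # e.g. topic/word frequencies
y = X_controls @ rng.normal(size=4) + 0.1 * (X_language @ rng.normal(size=50))
models = fit_residualized_controls(X_controls, X_language, y)
y_hat = predict_residualized_controls(models, X_controls, X_language)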
George Em Karniadakis received his SM and PhD from the Massachusetts Institute of Technology. He was appointed lecturer in the Department of Mechanical Engineering at MIT in 1987 and subsequently joined the Center for Turbulence Research at Stanford/NASA Ames. He then joined Princeton University as an assistant professor in the Department of Mechanical and Aerospace Engineering and as associate faculty in the Program in Applied and Computational Mathematics. He was a visiting professor in the Aeronautics Department at Caltech in 1993 and joined Brown University as an associate professor of applied mathematics in the Center for Fluid Mechanics in 1994. Since becoming a full professor in 1996, he has also served as a visiting professor and senior lecturer of Ocean/Mechanical Engineering at MIT. He is a fellow of the AAAS (2018), the Society for Industrial and Applied Mathematics (2010), the American Physical Society (2004), and the American Society of Mechanical Engineers (2003), and an associate fellow of the American Institute of Aeronautics and Astronautics (2006). He received the Alexander von Humboldt Award in 2017, the Ralph E. Kleinman Prize (2015), the J. Tinsley Oden Medal (2013), and the Computational Fluid Dynamics Award (2007) from the US Association for Computational Mechanics. His h-index is 103, and he has been cited over 52,000 times.
Abstract:
Karniadakis will present a new approach to developing a data-driven, learning-based framework for predicting outcomes of physical and biological systems governed by PDEs and for discovering hidden physics from noisy data. He will introduce a deep learning approach based on neural networks (NNs) and generative adversarial networks (GANs). He will also introduce new NNs that learn functionals and nonlinear operators from functions and corresponding responses for system identification. Unlike other approaches that rely on big data, this framework learns from small data by exploiting the information provided by physical conservation laws, which are used to obtain informative priors or to regularize the neural networks. He will demonstrate the power of physics-informed neural networks (PINNs) for several inverse problems in fluid mechanics, solid mechanics, and biomedicine, including wake flows, shock tube problems, material characterization, and brain aneurysms, where traditional methods fail due to a lack of boundary and initial conditions or material properties. He will also present a new NN, DeepM&Mnet, which uses DeepONets as building blocks for multiphysics problems, and he will demonstrate its unique capability in a 7-field hypersonics application.
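As a rough illustration of the physics-informed idea described above, the sketch below trains a small network on a few noisy observations while penalizing the residual of a governing equation, so the physics acts as a regularizer. The 1D Poisson problem, the PyTorch implementation, and all hyperparameters are illustrative assumptions, not the systems or architectures discussed in the talk.

# Minimal sketch of a physics-informed neural network (PINN) for the toy
# problem u''(x) = f(x) on [0, 1], with f chosen so that u(x) = sin(pi x).
import torch

torch.manual_seed(0)

net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)

def pde_residual(x):
    """Residual u''(x) - f(x), computed with automatic differentiation."""
    x = x.requires_grad_(True)
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    f = -(torch.pi ** 2) * torch.sin(torch.pi * x)
    return d2u - f

# "Small data": a handful of noisy observations of u, plus boundary values.
x_data = torch.tensor([[0.1], [0.4], [0.7]])
u_data = torch.sin(torch.pi * x_data) + 0.01 * torch.randn_like(x_data)
x_bc = torch.tensor([[0.0], [1.0]])    # u = 0 at both ends
x_col = torch.rand(64, 1)              # collocation points for the PDE term

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(5000):
    opt.zero_grad()
    loss_data = ((net(x_data) - u_data) ** 2).mean()
    loss_bc = (net(x_bc) ** 2).mean()
    loss_pde = (pde_residual(x_col) ** 2).mean()   # physics as regularization
    loss = loss_data + loss_bc + loss_pde
    loss.backward()
    opt.step()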