Making sense of Twitter @ Bloomberg presented by Daniel Preotiuc-Pietro

ABSTRACT: The Bloomberg Terminal has provided ways for investors and journalists to sift through and understand the immense volume of tweets and discover financially relevant content ever since the SEC approved the use of Twitter for company disclosures back in 2013.

In the first part of the talk, I will showcase how tweets impact financial markets and how Bloomberg uses Natural Language Processing methods to identify financially relevant tweets that move the markets. Our processing pipeline feeds directly to clients and to journalists in the newsroom, and powers several news analytics products offered by the company, including trending companies and consumer sentiment for publicly traded equities.

However, understanding user pragmatic intent in individual tweets would allow us to gain deeper insights and enable new applications. I will present several recent research studies focused on understanding intent including identifying complaints and the roles with which vulgarity is used in social media and how these can help improve applications such as sentiment analysis and hate speech detection.

BIO: Daniel Preotiuc-Pietro is a Senior Research Engineer and Team Lead at Bloomberg LP, where he works on analyzing and building models for real-world, large-scale social media and news mining and information extraction. His research interests are focused on understanding the social and temporal aspects of text, especially from social media, with applications in domains such as Social Psychology, Law, Political Science and Journalism. Several of his research studies were featured in the popular press, including the Washington Post, BBC, New Scientist, Scientific American and FiveThirtyEight. He is a co-organizer of the Natural Legal Language Processing workshop series. Prior to joining Bloomberg LP, Daniel was a postdoctoral researcher at the University of Pennsylvania with the interdisciplinary World Well Being Project and obtained his PhD in Natural Language Processing and Machine Learning at the University of Sheffield, UK.

Hidden Biases. Ethical Issues in NLP, and What to Do about Them presented by Dirk Hovy of Bocconi University

ABSTRACT: Through language, we fundamentally express who we are as humans. This property makes text a fantastic resource for research into the complexity of the human mind, from social sciences to humanities. However, it is exactly that property that also creates some ethical problems. Texts reflect the authors' biases, which get magnified by statistical models. This has unintended consequences for our analysis: If our data is not reflective of the population as a whole, if we do not pay attention to the biases contained, we can easily draw the wrong conclusions, and create disadvantages for our users.

In this talk, I will discuss several types of biases that affect NLP models, their sources, and potential countermeasures: (1) Bias stemming from data, i.e., selection bias (if our texts do not adequately reflect the population we want to study), label bias (if the labels we use are skewed) and semantic bias (the latent stereotypes encoded in embeddings); (2) Biases deriving from the models themselves, i.e., their tendency to amplify any imbalances that are present in the data; (3) Design bias, i.e., the biases arising from our (the researchers') decisions about which topics to analyze, which data sets to use, and what to do with them. For each bias, I will provide examples and discuss the possible ramifications for a wide range of applications, and various ways to address and counteract these biases, ranging from simple labeling considerations to new types of models.

BIO: Dirk Hovy is an associate professor of Computer Science in the department of marketing at Bocconi University. He received his PhD from the University of Southern California in Los Angeles, where he worked as a research assistant at the Information Sciences Institute.

He works in Natural Language Processing (NLP), a subfield of artificial intelligence. His research focuses on computational social science. His interests include integrating sociolinguistic knowledge into NLP models, using large-scale statistics to model the interaction between people's socio-demographic profile and their language use, and ethics for data science and algorithmic fairness.

CSE 600 Talk: Haibin Ling - Computer Vision Research and Applications


Abstract: Having been intensively studied for over half a century, computer vision has evolved into a broad research area and become mature in many applications. In this talk, we will summarize our work in computer vision in both core vision topics and application-oriented ones. In particular, for core vision problems, we will report studies on visual tracking, visual matching and visual detection; for applications, we will describe our work on medical image analysis, intelligent transportation, smart projector systems and preliminary work on material property prediction.

Bio: Haibin Ling received the BS and MS degrees from Peking University in 1997 and 2000, respectively, and the PhD degree from the University of Maryland, College Park, in 2006. From 2000 to 2001, he was an assistant researcher at Microsoft Research Asia. From 2006 to 2007, he worked as a postdoctoral scientist at the University of California, Los Angeles. In 2007, he joined Siemens Corporate Research as a research scientist. From 2008 to 2019, he worked as a faculty member of Temple University. In fall 2019, he joined the Department of Computer Science of Stony Brook University, where he is currently a SUNY Empire Innovation Professor. His research interests include computer vision, augmented reality, medical image analysis, and human-computer interaction. He received the Best Student Paper Award at ACM UIST in 2003, and the NSF CAREER Award in 2014. He serves as Associate Editor for several journals including IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), Pattern Recognition (PR), and Computer Vision and Image Understanding (CVIU). He has served or will serve as Area Chair for CVPR 2014, 2016, 2019 and 2020.

Hyperscale Verification in Microsoft Azure talk by Nikolaj Bjorner

Abstract: Cloud providers are increasingly embracing network verification for managing complex datacenter network infrastructure. Microsoft's Azure cloud infrastructure integrates the SecGuru tool, which leverages the Z3 Satisfiability Modulo Theories solver, for checking network access control lists. It also integrates a verifier, built on both custom verification algorithms and Z3, that checks the correctness of forwarding tables in Azure datacenters. These tools ensure that the network is configured to preserve desired intent over hundreds of thousands of network devices. We describe our experiences building and running SecGuru for network verification in Azure.

Finally, we mention recent advances in Z3, including a distributed version of Z3 that scales with Azure's elastic cloud. It integrates recent advances in lookahead and distributed SAT solving for Z3's engines for SMT. Another recent advance is the integration of DNNs to learn variable branching strategies for high-performance SAT solvers, including MiniSAT, Glucose and Z3's SAT solver.

Bio: Nikolaj Bjorner is a Principal Researcher at Microsoft Research, Redmond, working in the area of Automated Theorem Proving and Software Engineering. His current main line of work is around the state-of-the-art theorem prover Z3, which is used as a foundation of several software engineering tools. Z3 received the 2015 ACM SIGPLAN Software System Award, the award for most influential tool paper in the first 20 years of TACAS in 2014, and a test-of-time award at ETAPS 2018. Together with Leonardo de Moura, he received the CADE 2019 Herbrand Award for contributions to SMT and applications. Previously, he developed the Distributed File System Replication (DFSR) and Remote Differential Compression (RDC) protocols, part of Windows Server since 2005. Before that, he worked on distributed file sharing systems at a startup, and on program synthesis and transformation systems at the Kestrel Institute. He received his Master's and PhD degrees in computer science from Stanford University.

Talk by Zhenhua Liu to be followed by AI Institute updates


Abstract: Decision making under uncertainty has been studied extensively in multiple communities. Recently, online optimization has gained popularity, partly because of the promising performance guarantees it achieves by incorporating predictions. In this talk, I will provide an overview of our work on algorithm designs for online optimization and its applications. Then, I will talk about our recent work in ACM SIGMETRICS 2019 on choosing predictions and control algorithms simultaneously and dynamically. Finally, I will discuss some ongoing efforts and collaboration opportunities.

Bio: Zhenhua Liu is currently an assistant professor in the Department of Applied Mathematics and Statistics at Stony Brook University. He is also affiliated with the Department of Computer Science, the AI Institute and the Smart Energy Technology Cluster. He received his PhD degree in Computer Science from the California Institute of Technology. His current research interests include cloud computing, online optimization and learning, smart grid, market design and distributed control. His research combines rigorous analysis and system design, and goes from theory to prototype and eventually to industry to make a real impact.

A talk by Jerome Zhengrong Liang entitled, Machine Learning from Original Images to Texture Patterns: A Paradigm Shift from Non-Medical Application to Medical Diagnosis.

Abstract: Artificial intelligence (AI) research for medical diagnosis started soon after humans began to use computers, initially with artificial neural networks (ANNs) and now with convolutional neural networks (CNNs). ANNs have mainly been explored to classify experts' handcrafted features extracted from the original (or raw) images, while CNNs have mainly been applied directly to the raw images for both extracting abstract features and classifying those features.

Experimental evidence has shown that a CNN can be trained on a large number of raw images with experts' scores (or labels) to match or even surpass the experts' performance in both non-medical and medical diagnosis applications. However, the performance of both the CNN models and the experts on medical diagnosis dropped dramatically when the labels of the raw images were replaced by the corresponding pathology reports.

Accumulated medical knowledge reveals that lesion heterogeneity is a footprint of lesion evolution and ecology, and that this heterogeneity is an indicator of lesion progress and response to medical intervention. The heterogeneity can be reflected by the image contrast distribution (or texture patterns) across the lesion volume. Image textures have been shown to be an effective descriptor of lesion heterogeneity for computer-aided diagnosis.

Can we map the raw images into texture patterns (or images) and train a CNN to learn from the texture images? This question is the central theme of this presentation, with application to CT Colonography or virtual colonoscopy, a game from AlphaGo to PolypGo.

Bio: Jerome Zhengrong Liang, PhD, IEEE Fellow
Imaging Research and Informatics Laboratory
Department of Radiology, Stony Brook University

Stony Brook University Northern California Alumni Chapter - Institute for AI-Driven Discovery and Innovation Panel

Join us for a Northern California Alumni and Friends luncheon followed by a panel discussion, celebrating the Institute for AI-Driven Discovery and Innovation, moderated by Fotis Sotiropoulos, Dean, College of Engineering and Applied Sciences.

Panel Discussion with:
Richard Bravman '78, Chief Strategy Officer, Affinity Solutions
Jalal Mahmud, PhD '08, Master Inventor, IBM Watson
Reza Raji '86, CEO, Xenio Systems
Andrew Protter, PhD '83, Chief Scientific Officer, Auansa Inc.

Moderated by:
Fotis Sotiropoulos, Dean, College of Engineering and Applied Sciences

Click here for more information and to register.

The Antonija Prelec Memorial Committee, in collaboration with Stony Brook University Libraries, is very excited to bring you the 2019 Prelec Memorial Lecture! This year, we are pleased to announce that our speaker is Patricia Flatley Brennan, RN, PhD, Director of the National Library of Medicine.

No registration required. Find more information here.

The Challenges of Machine Learning in Adversarial Settings by Patrick McDaniel, Pennsylvania State University

Abstract: Advances in AI and machine learning have enabled new applications and services to interpret and process inputs in previously unthinkable complex environments. Autonomous cars, data analytics, adaptive communication and self-aware software systems are now revolutionizing markets by achieving or exceeding human performance. In this talk, I consider the evolving use of machine learning in security-sensitive contexts and explore why many systems are vulnerable to nonobvious and potentially dangerous manipulation. Here, we examine sensitivity in any application whose misuse might lead to harm, for instance forcing an adaptive network into an unstable state, crashing an autonomous vehicle or bypassing an adult content filter. I explore the use of machine learning in this area, particularly in light of recent discoveries in the creation of adversarial samples and defenses against them, and speculate on future attacks on machine learning. The talk concludes with a discussion of the technological and societal challenges we face as a result of current and future advances in intelligent computing.

Bio: Patrick McDaniel is the William L. Weiss Professor of Information and Communications Technology and Director of the Institute for Networking and Security Research in the School of Electrical Engineering and Computer Science at the Pennsylvania State University. Professor McDaniel is also a Fellow of the IEEE and ACM and the director of the NSF Frontier Center for Trustworthy Machine Learning. He also served as the program manager and lead scientist for the Army Research Laboratory's Cyber-Security Collaborative Research Alliance from 2013 to 2018. Patrick's research centrally focuses on a wide range of topics in computer and network security and technical public policy. Prior to joining Penn State in 2004, he was a senior research staff member at AT&T Labs-Research.