Jeffrey Heinz: The Emphasis on Linguistics in AI

Part nine of our AI Researcher Profile series invites Professor Jeffrey Heinz of the Department of Linguistics and the Institute for Advanced Computational Science to discuss his research interests in linguistics and its role in artificial intelligence.

AI Institute: What piqued your interest in the intersection of linguistics and artificial intelligence?

Professor Heinz: I was actually interested in this very early on. So when I was in high school, I was reading philosophers like Bertrand Russell, and he asked this question– how is it that humans, whose contact with the world is so fragile, brief, and limited, can know as much as they know? I always thought about that. I’d walk into math class, and I’d learn something about sine or cosine, and I’d walk out of the class knowing something I didn’t know before. What changed? Somehow my brain has changed, but I don’t know how it’s changed. 

When you think about learning a language, it’s the same question. Children become very proficient in their native language by the age of 3 or 4. It’s incredible. The amount of data, for example, that current language technologies like ChatGPT use to do what they do is way more than what children have. It’s hundreds, even millions, of times more data. They have access to the whole internet, and children don’t have access to that. So, it’s a very interesting question of how you can come to know something, to learn a generalization about something. I became interested in the language side of it because that’s a very concrete form of the question– how does a child come to know their own language? That seems to involve some kind of generalization ability, and they have to learn that somehow.

AI: In your career, you’ve been across the U.S. You obtained your PhD at the University of California Los Angeles, taught at the University of Delaware, and now you teach at Stony Brook University. What brought you to Stony Brook? 

PH: I came to Stony Brook University because at the University of Delaware, I was the only computational linguist in the linguistics and cognitive science department. Stony Brook University, in multiple ways, was investing in computational linguistics and computational science more generally. In the Department of Linguistics, we now have five faculty members who are experts in different aspects of computational and mathematical linguistics. There’s no other linguistics department in the country or in North America, maybe even in the world, that has that many computational linguists.

We’re really offering students here something they can’t get anywhere else. Students benefit from synthesizing knowledge from different people with different perspectives. If you’re the only computational linguist in your department, then students are getting something, but only from you. It’s very valuable to get it from different people. I think that’s really one of the things that makes our department so different.

The other thing was the Institute for Advanced Computational Science, which I’m also a core faculty member of. I was excited about that because it’s a multidisciplinary institute where people from different disciplines are always using computational methods in their work. Actually, many of them are using machine learning and different kinds of AI-style techniques. So, it was also very exciting to be part of that group.

I didn’t know this when I was in the process of getting hired here, but right when I was joining, I happened to go to this meeting they had in New York City with different SUNY people and the chancellor at the time, and they were talking about investing in AI. The AI Institute that Steven Skiena runs was started just a few years after I got here, and that was another exciting development for me because these are the kinds of things that I want to be a part of. So I am very grateful to be an affiliate of that institute. 

AI: Which research projects have been the most interesting in your career thus far? Why? 

PH: I've been fortunate to work on a lot of interesting projects, both within linguistics and in other disciplines. I’ve been able to work on robotic planning and control, the protein folding problem, pediatric rehabilitation– bringing in machine learning ideas to help children with disabilities, things like that. Those have been fascinating projects.

What I’m most interested in is the fundamental mathematics and computer science of sequences. Languages are made up of sequences because words unfold over time. There are sequential patterns. We can also think about sequential functions. So you can imagine if you’re translating from English to French, there’s a function that takes a sequence and transforms it into another sequence.

A lot of mathematics that you study in school is all about numerical functions. You’re taking numbers and getting other numbers, then you graph these parabolas or lines. If you think about it, numerical functions have been studied for thousands of years. People talk about polynomial functions, trigonometric functions, exponential functions. We can integrate, we can differentiate, we can take the inverse. We have all this vocabulary about it– we know about it. 

Well, what about functions from sequences to sequences? What’s the vocabulary for that? What’s the study of that? Actually, it’s much younger– maybe 70 years old. It’s very recent. It’s a very rich field. We know a lot about it– but it’s also still very young. The reason why I care about it is that a lot of the work I do is about trying to characterize and classify different kinds of sequential functions and patterns. Then we say here’s a big space of possible sequential functions or patterns– where are the natural language ones, the kinds we see in English, Spanish, or the translation from English to Japanese? What kind of transformation is that? What kind of sequential function is that?

Then we can also ask how we can learn those mathematical functions from data. The general result we find is that, out of all the possibilities, the natural language ones actually tend to cluster– they cluster in a certain way. That’s where we want to focus our attention. They’re not arbitrary– they have particular patterns. A lot of my research consists of this. It’s theoretical and abstract, but I like that.
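For readers new to this framing, here is a minimal sketch in Python (with invented toy examples, not drawn from Heinz’s own work) of the two kinds of objects described above: a sequential pattern, which classifies strings, and a sequential function, which maps strings to strings.

```python
# A sequential *pattern* classifies sequences. Toy example: strings
# over {a, b} that never contain two b's in a row.
def matches_pattern(word: str) -> bool:
    return "bb" not in word

# A sequential *function* maps one sequence to another. Toy example:
# double every symbol (purely illustrative, not a natural-language rule).
def double_each(word: str) -> str:
    return "".join(ch + ch for ch in word)

print(matches_pattern("abab"))  # True
print(matches_pattern("abba"))  # False
print(double_each("abc"))       # aabbcc
```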

AI: Can you provide an example of what reduplication is? 

PH: Reduplication is a morphological process that is common in many of the world’s languages. English doesn’t really have a bona fide morphological process of reduplication, so it seems “exotic” to us, but it’s not exotic at all to most people on the planet. An example of reduplication is in Indonesian, where the way you say "woman" is "wanita," and if you want to say the plural of woman, so "women", you would say "wanita wanita." You would literally just duplicate the word and say it twice. In our case, when we want to make the plural of "cat", we add “s” and get "cats." We have suffixation and prefixation in English, but many languages have reduplication.
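Written as string functions, the two pluralization strategies look like this– a minimal Python sketch using the examples above:

```python
# Total reduplication, as in Indonesian pluralization: copy the whole word.
def total_reduplication(word: str) -> str:
    return word + " " + word

# English pluralization (grossly simplified here): add the suffix "s".
def english_plural(word: str) -> str:
    return word + "s"

print(total_reduplication("wanita"))  # wanita wanita
print(english_plural("cat"))          # cats
```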

Actually, in English we have it too, just not in a canonical construction. We have what people call schm-reduplication. So we say things like "cafe-schmafe", or "milk-schmilk". It’s like we are reduplicating the form, but then we put in the "schm", replacing the first consonant of the word. You can use it on all kinds of things– "printer-schminter", "bottle-schmottle". It’s a very productive process. What does it mean in English? I don’t know exactly what it means in English. It doesn’t have a pure morphological function.
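A rough sketch of that transformation in code– assuming, simplistically, that "the first consonant" means everything up to the first vowel:

```python
VOWELS = set("aeiou")

# Schm-reduplication (simplified): repeat the word, replacing the
# initial consonant cluster of the copy with "schm".
def schm_reduplicate(word: str) -> str:
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1  # skip past the word-initial consonants
    return word + "-schm" + word[i:]

print(schm_reduplicate("cafe"))     # cafe-schmafe
print(schm_reduplicate("milk"))     # milk-schmilk
print(schm_reduplicate("printer"))  # printer-schminter
print(schm_reduplicate("bottle"))   # bottle-schmottle
```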

In other languages, it has a morphological meaning. Oftentimes it can mean pluralization, it can mean the habitual aspect of a verb– it has different meanings. Over 70% of the world’s languages have some form of reduplication. 

As a sequential function, in English, we go from "bottle" to "schmottle", or "cafe" to "schmafe". So what kind of a function is that? If it was a numerical function, we’d want to know if it’s a line, a parabola– what is it? Well, we want to know what kind of thing this is too. 

The other kind of reduplication that exists is something called partial reduplication. That’s where instead of saying "wanita wanita", you say "wa wanita", or in some languages you might say "wani wanita". You just copy the first part, the "wa" or the "wani," instead of the whole word. Partial reduplication was sort of always easy to handle because it was bounded in size, but total reduplication means duplicating the whole word, and words can get longer and longer, which makes it computationally more difficult to deal with. Given some work we did here over the past few years, it’s more within reach than before.
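The bounded/unbounded contrast is easy to see as code– a toy sketch in which the partial case only ever copies a fixed-length prefix, while the total case copies the whole word, however long it gets:

```python
# Partial reduplication: copy only a bounded prefix (here the first
# k symbols, as in "wa wanita" or "wani wanita").
def partial_reduplication(word: str, k: int = 2) -> str:
    return word[:k] + " " + word

# Total reduplication: copy the entire word, so the amount of material
# to remember grows with the length of the input.
def total_reduplication(word: str) -> str:
    return word + " " + word

print(partial_reduplication("wanita"))     # wa wanita
print(partial_reduplication("wanita", 4))  # wani wanita
print(total_reduplication("wanita"))       # wanita wanita
```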

AI: How does computational linguistics approach reduplication?

PH: I would say there’s a split between what we knew in the 20th century and what we know now in the 21st century, specifically about reduplication.

So in the 20th century, reduplication– especially what’s called total reduplication, like the Indonesian case where "wanita" goes to "wanita wanita"– was considered to be very computationally complex, because it was something you could not do with what’s called a finite-state machine. It required a more complex computational device to produce that kind of sequence-to-sequence transformation.
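One way to see the finite-state point concretely: a machine can copy a bounded prefix by keeping one state for each prefix it might have read so far, but that state count grows quickly with the prefix length, and copying an entire word of unbounded length would require unboundedly many states. A rough back-of-the-envelope sketch (my own illustration, not from any particular paper):

```python
# States needed to remember every possible prefix of length up to k
# over an alphabet of size s: 1 + s + s^2 + ... + s^k.
def states_to_remember_prefixes(s: int, k: int) -> int:
    return sum(s ** i for i in range(k + 1))

# Bounded (partial) copying is finite-state, though expensive; total
# reduplication would need k as large as the longest word, i.e. unbounded.
for k in range(1, 6):
    print(k, states_to_remember_prefixes(26, k))
```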

In the 21st century, we’ve realized that reduplication is more complex, but not as complex as we may have thought it to be. It actually has many properties in common with other kinds of processes. We sort of had a coarse-grained classification scheme before, and now we have a bunch of finer-grained classification schemes. As a result, what looked like an outlier before is now actually within the net, so to speak. It’s not that they were wrong before; it’s just that we have a much finer-grained understanding of these different types of functions.

Computational linguistics has always talked about reduplication as something that was difficult to deal with in language technology for morphology, but now we have a better understanding of it.

AI: In your very recent research paper, “Regular and polyregular theories of reduplication,” you explore the computational side of reduplication. Can you share some details about it?  

PH: I mentioned how we have total reduplication and partial reduplication, and there’s a lot of variation in the kinds of reduplication patterns we see cross-linguistically. Linguists have tried to develop different theories that account for the kinds of reduplication that we see and don’t see in the world’s languages. 

In this paper we classified those linguistic theories according to their computational properties– whether those theories would make the reduplication patterns "regular" or "polyregular". If you have a regular function, what that means is that it’s like saying that something is a linear function. If you have a polyregular function, it’s beyond that– it’s like it’s polynomial. That’s what those terms are about, and strikingly, some linguistic theories are actually polyregular, which means they’re more powerful than we need them to be. Suppose you think about a production process. We want to produce something like "wanita wanita". So, how do you represent that? Do you represent it as "woman-plural", and then you produce "wanita wanita"?

There are some theories that would have a marker telling you that you have to duplicate, and you would have one of those markers for each copy to make. You would have "wanita", then you’d have "marker marker marker", and then you’d have to produce "wanita" not twice, but four times– once for the original, and then once for each marker. That’s what makes it polyregular.
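A worked sketch of that growth (my own toy rendering of the marker idea, not the paper’s notation): with n markers the stem gets produced n + 1 times, so the input grows only linearly in n while the output grows like n times the stem length– polynomial rather than linear growth, which is what "polyregular" captures.

```python
# Toy rendering of a marker-based theory: each duplication marker adds
# one more copy of the stem, so n markers yield n + 1 copies.
def produce(stem: str, n_markers: int) -> str:
    return " ".join([stem] * (n_markers + 1))

for n in range(4):
    out = produce("wanita", n)
    print(n, repr(out), "length:", len(out))
# Input length grows like |stem| + n, but output length grows like
# (n + 1) * |stem|: polynomial (roughly quadratic), not linear.
```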

We do see triplication in the world’s languages, and we do sometimes see cases where people duplicate more than once, but it’s not clear that we need to have these polyregular theories of language.

AI: What kind of future are you hoping to see between linguistics and artificial intelligence? Do you have specific goals that you are working towards? 

PH: I personally would like to see AI pay more attention to linguistics. I think a lot of AI right now is trying to solve problems with lots of data, but I think we need to try to solve problems with small amounts of data. Humans generalize very quickly from small amounts of data, and I think that’s a hallmark of our intelligence. It can also get us in trouble– we can stereotype, we can make mistakes, and things like that– but we do generalize from very few examples. That’s very different from the modern AI technology that learns from millions of examples. I think we should try to focus on learning from small amounts of data in an accurate and reliable way. 

One of the areas I work on is called formal grammar, or formal language theory. One of the reasons why I like it is that the models of language we construct are completely interpretable. A lot of the language technology that’s used today uses methods that are not interpretable. We don’t really understand how they go from A to B. I would hope AI would give more consideration to these interpretable models, or at least make interpretability one of its higher priorities.
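To make "interpretable" concrete: in formal language theory, a model of a sound pattern can be as simple as the set of adjacent symbol pairs a language permits, and that set can be read directly off a handful of examples. A minimal sketch of the idea (a bigram, or "strictly 2-local", grammar; the training words are invented for illustration):

```python
# Learn a strictly 2-local grammar: the set of adjacent symbol pairs
# (bigrams) attested in a few example words. "#" marks word edges.
def learn_bigrams(words):
    grammar = set()
    for w in words:
        padded = "#" + w + "#"
        grammar.update(zip(padded, padded[1:]))
    return grammar

# A word is accepted only if every one of its bigrams was observed.
def accepts(grammar, word):
    padded = "#" + word + "#"
    return all(pair in grammar for pair in zip(padded, padded[1:]))

g = learn_bigrams(["banana", "nab"])
print(sorted(g))           # the whole model is human-readable
print(accepts(g, "bana"))  # True: every adjacent pair was observed
print(accepts(g, "bb"))    # False: the pair (b, b) was never observed
```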

As for my own personal goals in research, I’m actually working on these things. I’m working on the kinds of patterns we see in natural language, the kinds of transformations we see in natural language, using interpretable methods that can be learned from small amounts of data. That’s a lot of what motivates my research. Students, faculty, collaborators across the university, the region, the world, and so on– we’re working on these kinds of problems.