A Benchmark to Aid Research Curbing Gun Violence

Gun violence is tragically prevalent in the United States. As of early May, there have been over 200 mass shootings in the U.S. this year. The casualties from gun violence are heartbreaking for friends, family, and even strangers. With AI technology, there is hope that one day we can reduce the number of casualties caused by gun violence.

“CCTV-Gun,” a research paper that Stony Brook University’s Professor Ling of the Department of Computer Science contributed to, works toward bringing this possibility to fruition. Recognizing a handgun in surveillance footage is a very different problem for an AI system than it is for a person watching in real time.

Handgun detection in real-world closed-circuit television (CCTV) images is difficult to pull off. One factor is image quality: handguns are small, so only a few pixels in any frame are relevant, making the weapon hard to see. Whether the image is in color or not can also affect detection.

“Nowadays you can use your iPhone to get very high quality images. In CCTV, it's not yet like that. In a lot of CCTVs, they don’t even have color images. They are just grayscale images. That also makes things a little more difficult,” says Ling.

Other challenges include occlusion and similarity to other objects. A gunman's hand blocks a large portion of the handgun, making it hard to see. And because a handgun is rather small, it is easy to mistake it for another small object.

“Handguns, when they are in action, are mostly occluded. Only a small portion shows,” Ling says, grabbing his computer mouse, the majority of it covered with his hand. “For example, if I’m holding a mouse like this, it’s hard to say whether it’s a handgun or it’s just a mouse. So that’s very difficult.” 

Ling’s team compiled an annotated benchmark for handgun detection in real-world CCTV images, named CCTV-Gun, by extracting images from three existing datasets made up exclusively of security camera footage containing handguns. The extracted frames were then annotated to mark both the handgun and the person holding it.

An annotated image from the Monash Gun Dataset: the handgun is annotated with a red box, while the person holding it is annotated with a green box. The blur makes the handgun difficult to detect.
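To make the annotation scheme concrete, here is a minimal, hypothetical sketch of what one annotated frame could look like. The file name, field names, and pixel coordinates below are illustrative assumptions, not the actual CCTV-Gun file format.

```python
# Hypothetical annotation record for a single CCTV frame (illustrative only).
# Each object gets an axis-aligned bounding box, written here as
# [x_min, y_min, width, height] in pixels.
annotation = {
    "image": "frame_000123.jpg",  # placeholder file name
    "objects": [
        {"label": "handgun", "bbox": [412, 238, 18, 12]},   # only a few pixels wide
        {"label": "person",  "bbox": [360, 150, 140, 310]},  # the person holding it
    ],
}

# A detector evaluated on the benchmark would predict boxes like these,
# and its predictions would be scored by how well they overlap the
# annotated boxes (for example, via intersection-over-union).
for obj in annotation["objects"]:
    print(obj["label"], obj["bbox"])
```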

 

In addition to the annotated benchmark, the paper reports a standard intra-dataset evaluation and introduces a new cross-dataset evaluation protocol, testing whether the models would generalize better than the COCO pre-trained model.

The intra-dataset protocol involved training and testing five detection models separately on each of the three datasets: the Monash Gun Dataset, the US Real-time Gun Detection Dataset, and the UCF Crime Scene Dataset. The models performed best on the Monash Gun Dataset, which has the clearest frames, reinforcing how difficult detection becomes under blur, occlusion, and similar conditions.

As for the cross-dataset protocol, it involved training the models on two of the datasets, testing them on the third, and then comparing the fine-tuned versions to the pre-trained model. The models performed best when trained on the US Real-time Gun Detection Dataset and UCF Crime Scene Dataset, then tested on the Monash Gun Dataset. When they were fine-tuned on the Monash Gun Dataset, they performed better than the pre-trained model.
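As a rough illustration of how such a cross-dataset protocol can be organized (not the paper's actual code), the sketch below enumerates the train/test splits; the dataset abbreviations and helper functions are placeholders.

```python
# Illustrative sketch of a cross-dataset evaluation: train on two of the
# three CCTV datasets, evaluate on the held-out third, and compare a
# detector fine-tuned on CCTV data against one pre-trained only on COCO.
# The names train_detector, load_coco_pretrained, and evaluate are
# placeholders, not functions from the paper's code.

DATASETS = ["MGD", "USRT", "UCF"]  # Monash, US Real-time, UCF Crime Scene

for held_out in DATASETS:
    train_sets = [d for d in DATASETS if d != held_out]
    print(f"Train on {train_sets}, test on {held_out}")
    # fine_tuned = train_detector(train_sets)   # fine-tuned on CCTV imagery
    # baseline = load_coco_pretrained()         # generic COCO-only detector
    # print(evaluate(fine_tuned, held_out), evaluate(baseline, held_out))
```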

With this paper, researchers like Ling hope to encourage others to further develop technology that can help curb the mass shooting crisis in the U.S.

“One possibility, if we have a very successful and accurate gun detection algorithm, is to detect in real time that there is someone showing a gun. It could then immediately send an alert to whoever is in charge and also relevant staff,” says Ling. “For example, if someone brings a gun to an elementary school, the surveillance camera could detect this, automatically close certain gates, and send an alert to teachers and students so that they can be better prepared. Meanwhile, they could alert the police so that they can come as soon as possible.” 

-Sara Giarnieri, Communications Assistant