Evaluating the Vulnerability of Deepfake Detectors to Multimodal Adversarial Attacks*

Session Number

1

Advisor(s)

Dr. Pooya Khorrami, Ms. Danielle Sullivan, MIT Lincoln Laboratory

Location

A121

Discipline

Computer Science

Start Date

15-4-2026 10:15 AM

End Date

15-4-2026 11:00 AM

Abstract

The rapid spread of AI-generated video has made robust deepfake detection essential for public trust and media integrity. Although current detectors perform well on standard benchmarks, prior studies show they remain highly vulnerable to adversarial perturbations. Existing research, however, primarily evaluates single-modality attacks, leaving open the question of whether coordinated audio-visual perturbations can further compromise detector reliability. This project investigates how multimodal adversarial attacks, which manipulate both the audio and video streams, impact modern deepfake detectors. We compare the effectiveness of these coordinated attacks against single-modality perturbations and clean data, quantifying how multimodal manipulation exacerbates detector vulnerabilities. Additionally, we explore potential defenses, including adversarial training, to assess their ability to improve robustness against such attacks. Through this work, we aim to improve the real-world reliability of deepfake detectors and support media authenticity in the face of malicious adversaries.
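To make the attack setting concrete, the sketch below shows one way a coordinated audio-visual perturbation could be mounted with a PGD-style loop that ascends the detector's loss in both modalities at once. The ToyAVDetector, tensor shapes, class labels, and epsilon/step-size budgets are illustrative assumptions for this sketch, not the project's actual detectors or attack parameters.

```python
# Minimal sketch (assumptions noted above): joint PGD perturbation of audio and video
# against a stand-in multimodal detector. Not the project's actual models or settings.
import torch
import torch.nn as nn

class ToyAVDetector(nn.Module):
    """Stand-in audio-visual deepfake detector (assumed architecture)."""
    def __init__(self):
        super().__init__()
        self.video_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 64 * 64, 128), nn.ReLU())
        self.audio_net = nn.Sequential(nn.Flatten(), nn.Linear(16000, 128), nn.ReLU())
        self.head = nn.Linear(256, 2)  # two classes: real vs. fake

    def forward(self, video, audio):
        fused = torch.cat([self.video_net(video), self.audio_net(audio)], dim=1)
        return self.head(fused)

def multimodal_pgd(model, video, audio, label, eps_v=8/255, eps_a=0.01,
                   steps=10, alpha_v=2/255, alpha_a=0.002):
    """Jointly perturb video and audio to push the detector away from the true label."""
    loss_fn = nn.CrossEntropyLoss()
    adv_v, adv_a = video.clone().detach(), audio.clone().detach()
    for _ in range(steps):
        adv_v.requires_grad_(True)
        adv_a.requires_grad_(True)
        loss = loss_fn(model(adv_v, adv_a), label)
        grad_v, grad_a = torch.autograd.grad(loss, [adv_v, adv_a])
        # Ascend the loss in both modalities, then project back into each L-inf ball.
        adv_v = (adv_v + alpha_v * grad_v.sign()).detach()
        adv_a = (adv_a + alpha_a * grad_a.sign()).detach()
        adv_v = (video + (adv_v - video).clamp(-eps_v, eps_v)).clamp(0, 1)  # keep pixels valid
        adv_a = audio + (adv_a - audio).clamp(-eps_a, eps_a)
    return adv_v.detach(), adv_a.detach()

if __name__ == "__main__":
    torch.manual_seed(0)
    model = ToyAVDetector().eval()
    video = torch.rand(1, 3, 16, 64, 64)   # (batch, channels, frames, height, width)
    audio = torch.rand(1, 16000)            # one second of 16 kHz audio
    label = torch.tensor([1])               # assume "fake" is class 1
    adv_video, adv_audio = multimodal_pgd(model, video, audio, label)
    print(model(video, audio).argmax(1), model(adv_video, adv_audio).argmax(1))
```

Single-modality baselines for comparison would follow the same loop with one of the two perturbation budgets set to zero, and an adversarial-training defense would fold examples generated this way back into the detector's training loss.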
