DeepFake Detection With Machine Learning
This post contains the source code for my video on deepfake detection.
Overview of the pipeline used in our approach. It contains two main blocks: a pre-processing block, where the input is transformed to a more convenient domain, and a training block, where a classifier uses the transformed features to determine whether the face is real or not. Note that input images are converted to grey-scale before the DFT.
To the best of our knowledge, no public dataset gathers images containing both artificial and real faces, so we have created our own, called Faces-HQ. To obtain a sufficient variety of faces, we downloaded and labelled images from the CelebA-HQ dataset, the Flickr-Faces-HQ dataset, the 100K Faces project and www.thispersondoesnotexist.com. In total, we collected 40K high-quality images, half of them real and the other half fake, which gives a balanced dataset.
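The pre-processing block described above can be sketched roughly as follows. This is a minimal illustration, not the original implementation: the function names, the luma weights used for grey-scaling, the small epsilon and the integer-radius binning are all assumptions made for this example.

```python
import numpy as np

def to_grey(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 image to grey-scale (standard luma weights, an assumption)."""
    return rgb[..., :3] @ np.array([0.299, 0.587, 0.114])

def azimuthal_average(spectrum: np.ndarray) -> np.ndarray:
    """Average a centred 2D spectrum over rings of equal integer radius."""
    h, w = spectrum.shape
    cy, cx = h // 2, w // 2
    y, x = np.indices(spectrum.shape)
    # Integer distance of every pixel from the centre of the shifted spectrum.
    r = np.hypot(y - cy, x - cx).astype(int)
    sums = np.bincount(r.ravel(), weights=spectrum.ravel())
    counts = np.bincount(r.ravel())
    # Guard against empty rings, which can occur for very small images.
    return sums / np.maximum(counts, 1)

def spectrum_features(grey: np.ndarray) -> np.ndarray:
    """Grey-scale image -> DFT -> log-magnitude spectrum -> 1D radial profile."""
    shifted = np.fft.fftshift(np.fft.fft2(grey))       # move zero frequency to the centre
    log_magnitude = 20 * np.log(np.abs(shifted) + 1e-8)  # epsilon avoids log(0)
    return azimuthal_average(log_magnitude)

# Example on a random stand-in for a square face crop.
image = np.random.default_rng(0).random((64, 64, 3))
features = spectrum_features(to_grey(image))
```

The resulting 1D vector is what a classifier such as an SVM or logistic regression would then be trained on.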
Results
Faces-HQ dataset. Test accuracy using SVM, logistic regression and k-means classifier under different data settings.
Detection CelebA
The CelebFaces Attributes Dataset (CelebA) is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations. The images in this dataset cover large pose variations and background clutter. CelebA offers large diversity, large quantities, and rich annotations.
Click here to go to the experiments on CelebA.
Results
Detection DeepFakeDetection (FaceForensics++)
FaceForensics++ is a forensics dataset consisting of video sequences that have been modified with different automated face manipulation methods. Additionally, it hosts the DeepFakeDetection dataset. In particular, this dataset contains 363 original sequences from 28 paid actors in 16 different scenes, as well as over 3000 videos manipulated using DeepFakes, together with their corresponding binary masks. All videos contain a trackable, mostly frontal face without occlusions, which enables automated tampering methods to generate realistic forgeries.
Results
DeepFakeDetection dataset.
Results based on frames.
Test accuracy using SVM and logistic regression classifier under different data settings.
Results based on videos. (We apply a simple majority vote over the single frame classifications).
Test accuracy using SVM and logistic regression classifier.
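The majority vote used for the video-level results can be sketched as below. The label encoding (1 for fake, 0 for real), the function name and the tie-breaking rule are assumptions for illustration; the text above does not say how ties are resolved.

```python
def video_label(frame_predictions):
    """Combine per-frame binary predictions (1 = fake, 0 = real) into one video label.

    A video is called fake only if strictly more than half of its frames
    are predicted fake; ties fall back to real (an arbitrary choice here).
    """
    if not frame_predictions:
        raise ValueError("need at least one frame prediction")
    fake_votes = sum(frame_predictions)
    return int(2 * fake_votes > len(frame_predictions))

# Example: three of five frames are classified as fake.
label = video_label([1, 0, 1, 1, 0])
```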
Datasets Faces-HQ
This repo uses and combines several datasets to form Faces-HQ:
Citation
If this work is useful for your research, please cite our paper:
@misc{durall2019unmasking,
title={Unmasking DeepFakes with simple Features},
author={Ricard Durall and Margret Keuper and Franz-Josef Pfreundt and Janis Keuper},
year={2019},
eprint={1911.00686},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Some notes on data pre-processing
Some users have difficulty getting the detection to work on new datasets. Here are some remarks:
For complex scenes, you need to run a face detection first! Our approach will not work if the face/fake is not the dominant part of the input. Try to capture the inner parts of the faces without a lot of background.
Any re-sampling or re-scaling of the input images might distort the frequency spectrum. Do NOT resize the images; resize the spectra afterwards! Also, some prominent face detectors resize their output: don't use them if you can't turn this off.
Use square input images (a non-square image might distort the radial sampling).
Plot the spectra of your input data to check whether they show the characteristic properties.
Our approach might not work on videos/images that have been compressed to a large extent, since compression impacts the spectrum.
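Two of the remarks above, using square inputs and resizing the spectra rather than the images, can be illustrated with a short sketch. The helper names and the default length of 300 are hypothetical choices for this example, and linear interpolation is used here only for simplicity.

```python
import numpy as np

def center_crop_square(image: np.ndarray) -> np.ndarray:
    """Cut the largest centred square out of an image without any resampling."""
    h, w = image.shape[:2]
    side = min(h, w)
    top = (h - side) // 2
    left = (w - side) // 2
    return image[top:top + side, left:left + side]

def resample_profile(profile: np.ndarray, length: int = 300) -> np.ndarray:
    """Bring a 1D spectrum (e.g. a radial profile) to a fixed length.

    Interpolating the spectrum instead of resizing the image keeps the
    pixel data, and therefore its frequency content, untouched.
    """
    old_positions = np.linspace(0.0, 1.0, num=len(profile))
    new_positions = np.linspace(0.0, 1.0, num=length)
    return np.interp(new_positions, old_positions, profile)

# Example: a non-square crop becomes square, and a short profile is stretched.
crop = center_crop_square(np.zeros((480, 360, 3)))
fixed = resample_profile(np.array([0.0, 1.0, 2.0]), length=5)
```

Cropping only discards pixels, so it does not alter the spectrum of the remaining region the way interpolation-based resizing would.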