Deep Fake Image and Video Detection using CNNs and RNNs
The term DeepFake combines "Deep Learning" and "Fake". It refers to taking a person in an image or video and replacing them with someone else's likeness using techniques such as deep artificial neural networks.
Large companies like Google invest heavily in fighting DeepFakes, including releasing large datasets to help train models that counter this threat. The phenomenon is rapidly spreading into the film industry and threatens to compromise news agencies. Large digital companies, including content providers and social platforms, are at the forefront of fighting DeepFakes. The GANs that generate DeepFakes get better every day and, of course, a new GAN model that incorporates everything learned so far about how existing detectors work can produce fakes that those detectors cannot catch.
!pip install -U --upgrade tensorflow
This command installs the TensorFlow library with the pip package manager, or upgrades it if it is already present. TensorFlow is an open-source machine learning framework that's used for a variety of applications, including deepfake detection. The ! at the beginning runs the line as a shell command from a Jupyter notebook or similar interactive environment. -U is the short form of --upgrade, so passing both is redundant; either one alone upgrades an existing TensorFlow installation to the latest available version.
import sys
import sklearn
import tensorflow as tf
import cv2
import pandas as pd
import numpy as np
import plotly.graph_objs as go
from plotly.offline import iplot
from matplotlib import pyplot as plt
This code imports a set of libraries commonly used for data science and machine learning tasks; here they support deepfake detection:
sys: Provides access to some variables used or maintained by the interpreter and to functions that interact strongly with the interpreter.
sklearn: An abbreviation for Scikit-learn, it is a machine learning library that contains various tools for data mining and data analysis.
tensorflow: An open-source library for numerical computation and machine learning, particularly deep neural networks.
cv2: The OpenCV library for computer vision tasks, which includes capabilities for image and video analysis.
pandas: A data manipulation and analysis library that offers data structures and operations for manipulating numerical tables and time series.
numpy: A package for scientific computing that provides support for arrays and matrices, along with a collection of mathematical functions to operate on these data structures.
plotly.graph_objs and plotly.offline.iplot: Components of the Plotly library, which is used to create interactive plots and figures for data visualization.
matplotlib.pyplot: A plotting library that is often used to generate 2D graphics and plots.
Although this block consists solely of import statements and doesn't perform any operations, it shows how the rest of the notebook approaches deepfake detection: analyzing visual data with cv2, manipulating data with pandas and numpy, applying machine learning with sklearn and tensorflow, and visualizing results with plotly and matplotlib.
import os
def get_data():
    return pd.read_csv('../input/deepfake-faces/metadata.csv')
This code defines a function, get_data, that reads the CSV file metadata.csv from the directory ../input/deepfake-faces/ using pandas (imported earlier as pd) and returns the resulting DataFrame. It loads the metadata that the rest of the deepfake detection analysis relies on.
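A slightly more defensive variant of this loader can make a wrong path easier to diagnose. The sketch below is an illustration rather than the notebook's code: the function name load_metadata and its csv_path parameter are hypothetical, and the default path is simply the one used above.

```python
import os
import pandas as pd

# Path used by get_data above; adjust it if the dataset lives elsewhere.
DEFAULT_CSV = '../input/deepfake-faces/metadata.csv'

def load_metadata(csv_path=DEFAULT_CSV):
    """Hypothetical helper: read the metadata CSV, failing early with a clear message."""
    if not os.path.isfile(csv_path):
        raise FileNotFoundError(f'metadata file not found: {csv_path}')
    return pd.read_csv(csv_path)
```

Calling load_metadata() with no argument behaves like get_data() when the file exists, and raises FileNotFoundError with the offending path otherwise.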
meta=get_data()
meta.head()
The code performs two steps:
meta = get_data(): Calls the get_data function defined above, which loads metadata.csv into a pandas DataFrame, and stores the result in the variable meta.
meta.head(): Returns the first five rows of the DataFrame meta. This is typically used to quickly inspect the first few entries of the dataset and confirm it's been loaded correctly.
real_df = meta[meta["label"] == "REAL"]
fake_df = meta[meta["label"] == "FAKE"]
sample_size = 8000
real_df = real_df.sample(sample_size, random_state=42)
fake_df = fake_df.sample(sample_size, random_state=42)
sample_meta = pd.concat([real_df, fake_df])
This code creates a balanced dataset of real and fake samples for training a deepfake detection model. It relies on the label column of the meta DataFrame, which marks each entry as REAL or FAKE.
It first filters the meta DataFrame into two separate DataFrames: real_df, containing only the real examples, and fake_df, containing only the fake examples.
It then randomly selects 8000 samples from each of them using the .sample method, setting random_state=42 so the selection is reproducible.
Finally, it combines the two sampled subsets into one DataFrame, sample_meta, using pd.concat. The result has an equal number of real and fake samples and a total of 16000 samples for use in deepfake detection.
As mentioned, instead of using all 95k images, we will use only 16000 images.
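The later cells refer to Train_set, Val_set, and Test_set, which this section does not define. One plausible way to derive them from sample_meta is a stratified split with scikit-learn's train_test_split. The sketch below is an assumption, not the original notebook's code: the 80/10/10 proportions and the toy stand-in for sample_meta are chosen purely for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for sample_meta (assumption: only 'videoname' and 'label' columns).
sample_meta = pd.DataFrame({
    'videoname': [f'video_{i}.mp4' for i in range(200)],
    'label': ['REAL'] * 100 + ['FAKE'] * 100,
})

# First carve off 20% of the rows, keeping the REAL/FAKE ratio intact.
Train_set, holdout = train_test_split(
    sample_meta, test_size=0.2, random_state=42, stratify=sample_meta['label']
)
# Then split the held-out rows evenly into validation and test sets.
Val_set, Test_set = train_test_split(
    holdout, test_size=0.5, random_state=42, stratify=holdout['label']
)

print(len(Train_set), len(Val_set), len(Test_set))
```

Passing stratify keeps the REAL and FAKE classes balanced in every subset, which matters because the sampling step above went to the trouble of balancing them.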
y = dict()
y[0] = []
y[1] = []
for set_name in (np.array(Train_set['label']), np.array(Val_set['label']), np.array(Test_set['label'])):
    y[0].append(np.sum(set_name == 'REAL'))
    y[1].append(np.sum(set_name == 'FAKE'))
trace0 = go.Bar(
    x=['Train Set', 'Validation Set', 'Test Set'],
    y=y[0],
    name='REAL',
    marker=dict(color='#33cc33'),
    opacity=0.7
)
trace1 = go.Bar(
    x=['Train Set', 'Validation Set', 'Test Set'],
    y=y[1],
    name='FAKE',
    marker=dict(color='#ff3300'),
    opacity=0.7
)
data = [trace0, trace1]
layout = go.Layout(
    title='Count of classes in each set',
    xaxis={'title': 'Set'},
    yaxis={'title': 'Count'}
)
fig = go.Figure(data, layout)
iplot(fig)
This code creates a bar chart of the number of real and fake labels in the training, validation, and test sets. It uses Plotly: go.Bar creates separate bar traces for the REAL and FAKE counts, which are obtained by counting the REAL and FAKE labels in Train_set, Val_set, and Test_set. iplot(fig) then displays the chart, showing the distribution of real and fake labels across the three sets.
The original image dataset was biased, with more fake images than real ones. Since we are only taking a sample of it, it's better to take equal proportions of real and fake images.
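One way to check such an imbalance is to count the labels before and after sampling. The snippet below is a minimal sketch on a made-up DataFrame; the 4:1 ratio of FAKE to REAL rows is invented for illustration and is not the real dataset's ratio.

```python
import pandas as pd

# Made-up metadata: 80 FAKE rows and 20 REAL rows (illustrative numbers only).
meta = pd.DataFrame({
    'videoname': [f'video_{i}.mp4' for i in range(100)],
    'label': ['FAKE'] * 80 + ['REAL'] * 20,
})

# Class counts in the full, imbalanced table.
full_counts = meta['label'].value_counts()
print(full_counts)

# Draw the same number of rows from each class, mirroring the sampling step.
sample_size = 20
balanced = pd.concat([
    meta[meta['label'] == 'REAL'].sample(sample_size, random_state=42),
    meta[meta['label'] == 'FAKE'].sample(sample_size, random_state=42),
])

# Both classes now appear equally often.
counts = balanced['label'].value_counts()
print(counts)
```

Note that sample_size cannot exceed the size of the smaller class unless sampling with replacement, so the minority class sets the upper bound for a balanced sample.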
plt.figure(figsize=(15,15))
for cur, i in enumerate(Train_set.index[25:50]):
    plt.subplot(5, 5, cur + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    # cv2.imread returns BGR; convert to RGB so matplotlib shows true colors
    img = cv2.imread('../input/deepfake-faces/faces_224/' + Train_set.loc[i, 'videoname'][:-4] + '.jpg')
    plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    if Train_set.loc[i, 'label'] == 'FAKE':
        plt.xlabel('FAKE Image')
    else:
        plt.xlabel('REAL Image')
plt.show()
This Python code snippet is creating and displaying a grid of images with labels indicating whether they are real or fake. Specifically, it is:
Setting up a figure with a size of 15x15 inches using matplotlib.pyplot aliased as plt.
Iterating over a subset of indexes from the Train_set from the 26th to the 50th element.
For each iteration, it configures a subplot within a 5x5 grid without ticks on the x and y axes and no grid lines.
It reads each image from the faces_224 directory using cv2.imread. The filename is built from the videoname column of Train_set by dropping its last four characters (the video file extension) and appending .jpg.
It uses plt.imshow to display these images in the subplots.
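Two details of this loop are worth isolating: how the image path is derived from the videoname column, and the fact that cv2.imread returns pixels in BGR order while plt.imshow expects RGB. The sketch below illustrates both with hypothetical helpers, face_path and bgr_to_rgb, and a synthetic one-pixel image; os.path.splitext is used here as a more general alternative to slicing off the last four characters.

```python
import os
import numpy as np

FACES_DIR = '../input/deepfake-faces/faces_224/'  # directory used in the loop above

def face_path(videoname, faces_dir=FACES_DIR):
    """Hypothetical helper: swap the video extension for .jpg and prepend the folder."""
    stem, _ext = os.path.splitext(videoname)
    return faces_dir + stem + '.jpg'

def bgr_to_rgb(image):
    """Reverse the last (channel) axis, turning a BGR array into RGB."""
    return image[..., ::-1]

# A single pure-blue pixel as OpenCV would store it: (B, G, R) = (255, 0, 0).
blue_bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)

print(face_path('example.mp4'))
print(bgr_to_rgb(blue_bgr)[0, 0])
```

Without the channel swap, blue and red are exchanged on screen, which noticeably distorts skin tones in face images.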