How To Use Auto-encoder As a Classifier With Fashion MNIST Dataset
The Fashion MNIST dataset consists of 70,000 28x28 grayscale images of fashion products from 10 categories, with 7,000 images per category.
The training set has 60,000 images and the test set has 10,000. It was designed as a drop-in replacement for the original MNIST handwritten digits dataset that provides a more challenging benchmark. The ten classes cover fashion items such as shirts, trousers, and sandals.
The image dimensions and the train/test split match the original MNIST dataset. The dataset is freely available online and can be loaded through either TensorFlow or Keras without downloading it to your computer manually.
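As noted above, the dataset can be loaded directly through the Keras datasets API without a manual download. A minimal sketch, assuming TensorFlow 2.x (where Keras ships as `tensorflow.keras`):

```python
# Sketch: loading Fashion MNIST via the Keras datasets API.
# The files are downloaded to ~/.keras/datasets on the first call and cached.
from tensorflow.keras.datasets import fashion_mnist

(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

print(train_images.shape)  # (60000, 28, 28)
print(test_images.shape)   # (10000, 28, 28)
```

In this tutorial we instead read the raw gzipped files by hand, which is a useful exercise in itself, but the one-liner above gives the same arrays.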
So what we want to do is build a convolutional autoencoder and use its encoder part, combined with fully connected layers, to correctly classify new samples from the test set.
# Loading the dependencies
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import gzip
from sklearn.model_selection import train_test_split
from keras.models import Model, Sequential
from keras.optimizers import RMSprop, Adam, SGD, Adadelta
from keras.layers import (Conv2D, Input, Dense, Flatten, Dropout, Reshape,
                          MaxPooling2D, UpSampling2D, Conv2DTranspose,
                          BatchNormalization)
from keras.callbacks import ModelCheckpoint, CSVLogger, EarlyStopping
from keras import regularizers
from keras import backend as K
from keras.utils import to_categorical
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
import itertools
# import os
# os.environ["TF_CPP_MIN_LOG_LEVEL"]="3"
First, we import all the necessary dependencies and libraries: pandas, numpy, matplotlib, gzip, sklearn, and keras. Next, we load the Fashion MNIST dataset, 70,000 grayscale images of clothing items, and split the training portion into train and validation sets using the train_test_split function from sklearn.

After that, we define the architecture of the autoencoder. An autoencoder is a neural network trained to reproduce its input at the output layer; it consists of an encoder that compresses the input into a low-dimensional representation and a decoder that reconstructs the input from that representation. The layers are built from Conv2D, Input, Dense, Flatten, Dropout, Reshape, MaxPooling2D, UpSampling2D, Conv2DTranspose, BatchNormalization, etc., which handle the convolution, pooling, normalization, and reconstruction of the images. We also choose an optimizer (such as RMSprop or Adam), which is responsible for updating the model's weights during training to minimize the loss.

The model is trained with the fit function on the training set and validated on the held-out set. During training, checkpoints are saved with the ModelCheckpoint callback so the best weights are kept. Once the autoencoder is trained, we keep its encoder, attach fully connected layers ending in a 10-way softmax, and train this network on the class labels. Finally, the classifier predicts labels for the test set, and its accuracy is evaluated with a confusion matrix, a classification report, and an accuracy score. This is how the autoencoder is used as a classifier for the Fashion MNIST dataset.
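The overall architecture can be sketched with the Keras functional API. The specific layer sizes below are illustrative assumptions, not the only valid choice:

```python
# Sketch of a convolutional autoencoder plus an encoder-based classifier.
# Filter counts and dense sizes are illustrative assumptions.
from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, Flatten, Dense

input_img = Input(shape=(28, 28, 1))

# Encoder: convolution + pooling compress 28x28x1 down to a 7x7x64 bottleneck.
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)

# Decoder: upsampling + convolution reconstruct the 28x28 image.
x = Conv2D(64, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='mean_squared_error')

# Classifier: reuse the encoder output and attach fully connected layers.
x = Flatten()(encoded)
x = Dense(128, activation='relu')(x)
out = Dense(10, activation='softmax')(x)
classifier = Model(input_img, out)
classifier.compile(optimizer='adam', loss='categorical_crossentropy',
                   metrics=['accuracy'])
```

Because `classifier` shares the encoder layers with `autoencoder`, training the autoencoder first initializes those layers before the classification head is fitted.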
Loading The Data
def extract_data(filename, num_images):
    with gzip.open(filename) as bytestream:
        bytestream.read(16)
        buffer = bytestream.read(28 * 28 * num_images)
        data = np.frombuffer(buffer, dtype=np.uint8).astype(np.float32)
        data = data.reshape(num_images, 28, 28)
        return data
This code defines a function called extract_data that takes two inputs: the filename of the dataset and the number of images it contains. The function opens the file with the gzip module and skips the first 16 bytes, which form the file header (metadata such as the magic number and dimensions) rather than pixel values. Next, it reads the raw pixel data for all the images into a variable called buffer. The numpy module then interprets this buffer as an array of 8-bit unsigned integers (np.uint8), which is cast to 32-bit floating point to make it suitable for training. Finally, the data is reshaped so that the number of images is the first dimension and the 28x28 pixel grid forms the remaining dimensions, and the reshaped array is returned for use in training the autoencoder classifier.
train_data = extract_data('train-images-idx3-ubyte.gz', 60000)
test_data = extract_data('t10k-images-idx3-ubyte.gz', 10000)
Here we load the Fashion MNIST images using the extract_data function defined above: the 60,000 training examples from train-images-idx3-ubyte.gz are stored in the train_data variable, and the 10,000 test examples from t10k-images-idx3-ubyte.gz in the test_data variable. This data will be used to train and test our autoencoder classifier model.
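To sanity-check the parsing logic without the real files, one can write a small synthetic file in the same layout (a 16-byte header followed by raw pixels) and run it through the same function. The function is repeated here so the snippet is self-contained, and the filename is just a placeholder for the demo:

```python
import gzip
import numpy as np

def extract_data(filename, num_images):
    with gzip.open(filename) as bytestream:
        bytestream.read(16)                      # skip the 16-byte header
        buffer = bytestream.read(28 * 28 * num_images)
        data = np.frombuffer(buffer, dtype=np.uint8).astype(np.float32)
        return data.reshape(num_images, 28, 28)

# Build a fake gzipped image file: 16 header bytes + 2 images of random pixels.
rng = np.random.default_rng(0)
fake_pixels = rng.integers(0, 256, size=2 * 28 * 28, dtype=np.uint8)
with gzip.open('fake-images.gz', 'wb') as f:
    f.write(b'\x00' * 16 + fake_pixels.tobytes())

images = extract_data('fake-images.gz', 2)
print(images.shape, images.dtype)  # (2, 28, 28) float32
```

The parsed pixels come back byte-for-byte identical to what was written, confirming that the header skip and reshape are doing what we expect.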
The process of reading labels from a dataset is quite similar to how the data itself is read, and it can be described in a few straightforward steps:
Firstly, a function is defined to open the label file through a bytestream and skip its 8-byte header. The `bytestream.read()` method is then called with the number of bytes to read, which is the size of one label (1 byte) multiplied by the total number of images in the dataset.
Next, the raw bytes held in the buffer are converted into a NumPy array. The bytes are interpreted as unsigned 8-bit integers and then cast to the int64 data type, a convenient integer type for handling label data.
Finally, note that no reshaping is required here. The resulting `labels` array is already flat, with one entry per image (60,000 entries for the training set), which is exactly the shape the rest of the pipeline expects. This streamlined process ensures efficient and accurate reading of the labels, mirroring the method used for reading the image data.
def extract_labels(filename, num_images):
    with gzip.open(filename) as bytestream:
        bytestream.read(8)
        buffer = bytestream.read(1 * num_images)
        labels = np.frombuffer(buffer, dtype=np.uint8).astype(np.int64)
        return labels
This function extracts the labels from a gzipped Fashion MNIST label file. It takes two inputs: the filename and the number of images in the dataset. First, the gzip library opens the label file, and the first 8 bytes are skipped; they form the header, which contains metadata about the file rather than label values. Then a buffer is read from the bytestream, sized as the number of images multiplied by the size of each label, which is 1 byte. The buffer is converted into a numpy array of unsigned 8-bit integers and cast to int64. This array holds one label per image and is returned for use when training the autoencoder as a classifier.
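Before these integer labels can drive a categorical cross-entropy loss, they are typically one-hot encoded; the `to_categorical` utility imported earlier does this, and the same transformation can be sketched in plain NumPy:

```python
import numpy as np

def one_hot(labels, num_classes=10):
    # NumPy equivalent of keras.utils.to_categorical for integer labels:
    # row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype=np.float32)[labels]

labels = np.array([0, 3, 9], dtype=np.int64)
encoded = one_hot(labels)
print(encoded.shape)  # (3, 10)
print(encoded[1])     # a 1.0 at index 3, zeros elsewhere
```

Each row now sums to one with a single 1.0 at the class index, which is the target format the softmax output layer of the classifier is trained against.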