TensorFlow Basics 2: Text Classification with TensorFlow Hub
This post contains the explanation and source code for part 2 of my TensorFlow videos.
This post classifies movie reviews as positive or negative using the text of the review. This is an example of binary—or two-class—classification, an important and widely applicable kind of machine learning problem.
The tutorial demonstrates the basic application of transfer learning with TensorFlow Hub and Keras.
It uses the IMDB dataset, which contains the text of 50,000 movie reviews from the Internet Movie Database. These are split into 25,000 reviews for training and 25,000 reviews for testing. Both sets are balanced, meaning each contains an equal number of positive and negative reviews.
This notebook uses tf.keras, a high-level API to build and train models in TensorFlow, and tensorflow_hub, a library for loading trained models from TFHub in a single line of code.
import os
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds
print("Version: ", tf.__version__)
print("Eager mode: ", tf.executing_eagerly())
print("Hub version: ", hub.__version__)
print("GPU is", "available" if tf.config.list_physical_devices("GPU") else "NOT AVAILABLE")
Download the IMDB dataset
The IMDB reviews dataset is available from its original source or through TensorFlow Datasets. The following code downloads it to your machine (or the Colab runtime):
# Split the training set into 60% and 40% to end up with 15,000 examples
# for training, 10,000 examples for validation and 25,000 examples for testing.
train_data, validation_data, test_data = tfds.load(
    name="imdb_reviews",
    split=('train[:60%]', 'train[60%:]', 'test'),
    as_supervised=True)
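The split sizes in the comment above follow directly from the percentages. As a quick check, here is a minimal sketch of the arithmetic in plain Python; the helper name `split_sizes` is hypothetical and not part of TensorFlow Datasets.

```python
# Sizes of the official IMDB splits, as stated earlier in the post.
TRAIN_TOTAL = 25_000
TEST_TOTAL = 25_000


def split_sizes(total, percent):
    """Return (head, tail) sizes when `total` items are cut at `percent`."""
    head = total * percent // 100
    return head, total - head


# 'train[:60%]' keeps the first 60%; 'train[60%:]' keeps the remaining 40%.
train_size, validation_size = split_sizes(TRAIN_TOTAL, 60)
print(train_size, validation_size, TEST_TOTAL)
```

Holding out part of the training set for validation keeps the 25,000 test reviews untouched until the final evaluation.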
Explore the data
Let's take a moment to understand the format of the data. Each example is a sentence representing the movie review and a corresponding label. The sentence is not preprocessed in any way. The label is an integer value of either 0 or 1, where 0 is a negative review, and 1 is a positive review.
Let's print the first 10 examples.
train_examples_batch, train_labels_batch = next(iter(train_data.batch(10)))
train_examples_batch
Let's also print the first 10 labels.
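In the notebook, evaluating `train_labels_batch` from the batch unpacked earlier displays those labels. The sketch below shows the same pattern on a tiny made-up dataset so that it runs without downloading IMDB; the review strings and the names `toy_reviews`, `toy_labels`, and `toy_data` are assumptions for illustration only.

```python
import tensorflow as tf

# Hypothetical stand-in for `train_data`: four made-up (text, label) pairs.
toy_reviews = [
    "A moving story with wonderful acting.",
    "Dull plot and flat characters.",
    "One of the best films I have seen this year.",
    "I walked out halfway through.",
]
toy_labels = [1, 0, 1, 0]  # 1 = positive review, 0 = negative review
toy_data = tf.data.Dataset.from_tensor_slices((toy_reviews, toy_labels))

# Same pattern as with the real data: take one batch of up to 10 examples.
examples_batch, labels_batch = next(iter(toy_data.batch(10)))

print(examples_batch.shape)   # number of reviews in the batch
print(labels_batch.numpy())   # integer labels, one per review
```

With the real dataset, the batch would hold 10 review strings and 10 integer labels in the same order.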