Machine Learning for Time Series
Here, we will examine various examples of time series analysis incorporating machine learning techniques.
Applying machine learning to time series analysis is still relatively new; however, it has become increasingly popular.
In previous articles, we relied primarily on statistical models to forecast time series. Building these models required a theory of the dynamics of a given time series, as well as of the statistical properties governing its noise and uncertainty. Having hypothesized these dynamics, we made predictions and assessed how certain we were about them. Both model identification and parameter estimation depended on careful thought about how best to describe the dynamics in our data.
We now turn to methods that neither propose an underlying process nor prescribe rules for it. Instead, we focus on identifying patterns that describe the behavior of a process well enough to predict the outcome of interest, such as the classification label for a time series. We will also consider unsupervised learning for time series, specifically time series clustering.
We address prediction and classification tasks with tree-based methods, and we cover clustering as well. Tree-based methods require us to generate features from the time series first because, unlike ARIMA models, trees have no inherent awareness of time.
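To make the point about time-awareness concrete, here is a minimal sketch using only the Python standard library. The helper names `summarize` and `lag1_autocorrelation` are hypothetical and are not part of tsfresh or Cesium; the sine wave is synthetic data chosen purely for illustration.

```python
import math
import random
import statistics


def lag1_autocorrelation(series):
    """Correlation between the series and itself shifted by one step."""
    mean = statistics.fmean(series)
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        return 0.0
    num = sum((a - mean) * (b - mean) for a, b in zip(series, series[1:]))
    return num / denom


def summarize(series):
    """Hypothetical helper: turn one series into a flat feature dict."""
    return {
        "mean": statistics.fmean(series),
        "std": statistics.pstdev(series),
        "lag1_autocorr": lag1_autocorrelation(series),
    }


# A smooth synthetic sine wave and the same values in random order.
wave = [math.sin(2 * math.pi * t / 50) for t in range(200)]
shuffled = wave[:]
random.Random(0).shuffle(shuffled)

wave_features = summarize(wave)
shuffled_features = summarize(shuffled)
```

The mean and standard deviation ignore ordering, so they should match for both series, while the lag-1 autocorrelation should be close to 1 for the wave and near 0 for the shuffled copy. A tree that only sees order-independent features cannot tell the two apart, which is why features that encode temporal structure matter.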
Clustering and distance-based classification can take either the raw time series or features derived from them as input. To use the time series themselves, we need a suitable distance metric, and we will examine dynamic time warping for this purpose. Applying this metric directly to the time series preserves the full chronological information instead of reducing it to a limited set of features.
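As a rough illustration of the idea, the following is a minimal sketch of the textbook dynamic programming formulation of dynamic time warping, written in plain Python. It assumes an absolute-difference cost between points and no warping window; the function name `dtw_distance` and the two toy series are made up for this example, and library implementations may differ in these choices.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance with absolute-difference point cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the cheapest cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


# The same pulse, occurring two steps later in the second series.
pulse = [0, 0, 1, 2, 1, 0, 0, 0]
delayed = [0, 0, 0, 0, 1, 2, 1, 0]

pointwise = sum(abs(x - y) for x, y in zip(pulse, delayed))
warped = dtw_distance(pulse, delayed)
```

Here the point-by-point distance is 6, while the warped distance is 0, because the alignment is allowed to stretch the flat stretches so that the two pulses line up. That tolerance to shifts in time is what makes the metric attractive for comparing whole time series.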
Time Series Classification
In this section, we transform raw electroencephalogram (EEG) time series into feature sets that machine learning algorithms can use. We then classify the EEG data using decision tree techniques.
Selecting and Generating Features
We discussed the objectives of time series feature generation in the previous article, where we illustrated feature generation for a time series data set using the tsfresh library. In this section, we generate time series features with the Cesium package as an alternative.
Cesium also provides ready-made time series data sets for analysis and research, including EEG data that originally appeared in a 2001 manuscript. The paper describes data collection and preparation in detail. The EEG data set contains five categories of time series, each made up of segments extracted from continuous EEG recordings, and the samples come from a variety of sources. The first two categories are recordings from healthy individuals, one made with the eyes open and one with the eyes closed. The next two categories are recordings from patients with epilepsy, taken during seizure-free periods from two areas of the brain not associated with seizures. The final category consists of intracranial recordings made during seizures, which is useful for studying brain behavior during a seizure. Together, the five categories support a range of research purposes, from studying normal brain function to understanding the dynamics of epilepsy.
We obtain the data set with a Cesium convenience function.
from cesium import datasets
eeg = datasets.fetch_andrzejak()
It helps to look at a few examples of the data first, to get a sense of how we might classify these time series.
import matplotlib.pyplot as plt
# Define the indices of the measurements you want to plot
indices = [0, 300, 450]
# Draw each selected measurement in its own subplot, labeled with its class
for i, index in enumerate(indices, start=1):
    plt.subplot(3, 1, i)
    plt.plot(eeg["measurements"][index])
    plt.legend([eeg["classes"][index]])
plt.show()
The plots in Figure 1 show clear differences between the classes of EEG measurements. These marked differences are not surprising, since the plots capture brain activity during different activities in healthy individuals and in patients with epilepsy.
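The plotting code above hard-codes the indices 0, 300, and 450, presumably so that each subplot shows a different class. One way to find such indices without guessing is sketched below. The helper `first_index_per_class` is hypothetical, and `toy_eeg` is a made-up stand-in that only mimics the "measurements" and "classes" keys used in the plotting code; its values and labels are not real EEG data.

```python
def first_index_per_class(labels):
    """Map each class label to the index of its first occurrence."""
    first_seen = {}
    for index, label in enumerate(labels):
        # setdefault keeps the earliest index and ignores later repeats.
        first_seen.setdefault(label, index)
    return first_seen


# Toy stand-in shaped like the fetched data set (assumed layout, fake values).
toy_eeg = {
    "measurements": [[0.1, 0.2], [0.0, 0.3], [1.5, -1.2], [2.0, -2.5]],
    "classes": ["A", "A", "B", "C"],
}

starts = first_index_per_class(toy_eeg["classes"])
```

With the real data set, passing `eeg["classes"]` to the same helper should give one representative index per class, which can then replace the hard-coded list in the plotting loop.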