Stock Market Trend Analysis With Hidden Markov Model & Long Short-Term Memory
Today I am going to show you how you can do stock market trend analysis with a Hidden Markov Model (HMM) and Long Short-Term Memory (LSTM). We are going to code a research paper that is published online; check it here.
Structure of our code base:
.
├── .gitignore
├── FIGURE
│ ├── best_iter.png
│ ├── test1.jpg
│ ├── test2.jpg
│ ├── train1.jpg
│ └── train2.jpg
├── LICENSE
├── PAPER
│ └── 2104.09700.pdf
├── README.md
├── XGB_HMM
│ ├── GMM_HMM.py
│ ├── __pycache__
│ │ ├── GMM_HMM.cpython-36.pyc
│ │ ├── evaluate_A_pi.cpython-36.pyc
│ │ ├── form_B_matrix_by_XGB.cpython-36.pyc
│ │ ├── plot_result.cpython-36.pyc
│ │ ├── predict.cpython-36.pyc
│ │ ├── re_estimate.cpython-36.pyc
│ │ └── xgb.cpython-36.pyc
│ ├── form_B_matrix_by_XGB.py
│ ├── plot_result.py
│ ├── predict.py
│ ├── re_estimate.py
│ └── xgb.py
├── dataset_code
│ ├── HMM_duoyinzi.py
│ ├── HMM_hangqing.py
│ ├── __pycache__
│ │ ├── HMM_duoyinzi.cpython-36.pyc
│ │ ├── HMM_hangqing.cpython-36.pyc
│ │ ├── combine.cpython-36.pyc
│ │ ├── pred_proba_GMM.cpython-36.pyc
│ │ ├── pred_proba_XGB.cpython-36.pyc
│ │ └── process_on_raw_data.cpython-36.pyc
│ ├── combine.py
│ ├── form_df_all.py
│ ├── pred_proba_GMM.py
│ ├── pred_proba_XGB.py
│ └── process_on_raw_data.py
├── extracted_content.ipynb
├── extracted_content_5Humane.ipynb
├── extracted_content_5Humane_modified.ipynb
├── extracted_content_modified.ipynb
├── main_single_score.py
├── main_train_model.py
├── public_tool
│ ├── __pycache__
│ │ ├── bagging_balance_weight.cpython-36.pyc
│ │ ├── combine_allow_flag.cpython-36.pyc
│ │ ├── evaluate_plot.cpython-36.pyc
│ │ ├── form_accuracy.cpython-36.pyc
│ │ ├── form_index.cpython-36.pyc
│ │ ├── form_model_dataset.cpython-36.pyc
│ │ ├── random_cut.cpython-36.pyc
│ │ └── solve_on_outlier.cpython-36.pyc
│ ├── bagging_balance_weight.py
│ ├── combine_allow_flag.py
│ ├── evaluate_plot.py
│ ├── form_accuracy.py
│ ├── form_index.py
│ ├── form_model_dataset.py
│ ├── random_cut.py
│ └── solve_on_outlier.py
├── test.ipynb
└── train_model
├── GMM_HMM.py
├── LSTM.py
├── XGB_HMM.py
├── __pycache__
│ ├── GMM_HMM.cpython-36.pyc
│ ├── LSTM.cpython-36.pyc
│ ├── XGB_HMM.cpython-36.pyc
│ ├── train_HMM_model.cpython-36.pyc
│ └── train_LSTM_model.cpython-36.pyc
├── train_HMM_model.py
└── train_LSTM_model.py
Now let’s start coding:
import pickle
import numpy as np
from dataset_code.process_on_raw_data import form_raw_dataset, df_col_quchong
from dataset_code.HMM_duoyinzi import solve2, form_model_dataset, form_model
from public_tool.evaluate_plot import evaluate_plot
import warnings
warnings.filterwarnings("ignore")
if __name__ == '__main__':
    # Load one stock's DataFrame (ID 000001.XSHE) and drop duplicate columns
    temp = pickle.load(open('save/classified by id/000001.XSHE.pkl', 'rb'))
    temp = df_col_quchong(temp)
    temp = list(temp.columns)
    # Every column from 'AccountsPayablesTDays' onward is a candidate factor
    feature_list = temp[temp.index('AccountsPayablesTDays'):]
    score_record = np.zeros(len(feature_list))
    for i in range(len(feature_list)):
        now_feature = [feature_list[i]]
        # Build the raw dataset for this single factor
        dataset, label, lengths, col_nan_record = form_raw_dataset(now_feature, label_length=3, verbose=False)
        if len(label) == 0:
            print('skip ' + now_feature[0])
            continue
        # Clean the factor values and flag the rows that are usable
        solved_dataset, allow_flag = solve2(dataset, now_feature, now_feature)
        train_X, train_label, train_lengths = form_model_dataset(solved_dataset, label, allow_flag, lengths)
        # Fit a 3-state HMM with diagonal covariance, for up to 1000 iterations
        model = form_model(train_X, train_lengths, 3, 'diag', 1000, verbose=False)
        # Evaluate the fitted model against the labels and record the score
        score = evaluate_plot(model, train_X, train_label, train_lengths)
        score_record[i] = score
        print('all:%s, now:%s, ' % (len(feature_list), i + 1) + now_feature[0] + ': score:%s' % score)
    # Persist the per-factor scores for later feature selection
    pickle.dump([score_record, feature_list], open('save/duoyinzi_solve2_score.pkl', 'wb'))
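Judging by the tree above, this script is most likely main_single_score.py. Run it from the repository root so the relative save/ paths resolve; it prints a running score per factor and writes the results to save/duoyinzi_solve2_score.pkl.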
This script loads, processes, and evaluates dataset features using machine learning models, specifically Hidden Markov Models (HMMs). It uses pickle and numpy together with custom modules from the dataset_code package and the public_tool package. It first silences warnings to keep the output clean, then loads a per-stock DataFrame from a pickle file and removes duplicate columns with df_col_quchong. Every column from 'AccountsPayablesTDays' onward is treated as a candidate factor; for each one, the loop builds a raw dataset, cleans it, fits a 3-state diagonal-covariance HMM, scores the result with evaluate_plot, and finally pickles the per-factor scores.
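To make the deduplication step concrete, here is a minimal sketch of what a column-deduplication helper like df_col_quchong might look like ("quchong" is pinyin for 去重, "remove duplicates"). The real implementation lives in dataset_code/process_on_raw_data.py; this version is an assumption:

import pandas as pd

def df_col_quchong(df: pd.DataFrame) -> pd.DataFrame:
    # Keep the first occurrence of each column name, drop the rest
    return df.loc[:, ~df.columns.duplicated()]

Likewise, form_model presumably wraps an HMM fit. Here is a minimal sketch assuming hmmlearn's GMMHMM (the repo's GMM_HMM.py suggests a GMM-HMM), taking the same positional arguments the script passes; the actual helper in dataset_code/HMM_duoyinzi.py may differ:

from hmmlearn.hmm import GMMHMM

def form_model(X, lengths, n_states, cov_type, n_iter, verbose=False):
    # X: (n_samples, n_features) stacked sequences; lengths: samples per sequence
    model = GMMHMM(n_components=n_states, covariance_type=cov_type,
                   n_iter=n_iter, verbose=verbose)
    model.fit(X, lengths)
    return model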