Using Deep Learning Neural Networks and Candlestick Chart Representation to Predict Stock Market
Enhancing Stock Market Predictions: The Role of Deep Learning and Candlestick Chart Analysis
Download the source code from the link at the end of this article.
Predicting the stock market has always been difficult because so many factors influence it: company performance, economic indicators, investor sentiment, and global events all play a role. The stakes are high, since accurate predictions can produce significant financial gains while errors can cause heavy losses. To make informed decisions, investors and traders use a range of forecasting methods, aiming to maximize returns while minimizing risk. Predicting market trends is therefore not just of academic interest; it has practical implications for the livelihoods of millions of people, the stability of economies, and the flow of capital around the world.
To predict stock movements, investors have used a variety of approaches, from fundamental analysis of financial statements to technical analysis of price charts. The unpredictability of the market and the large number of variables involved make consistent, reliable predictions very difficult. The introduction of machine learning into financial markets has shifted this picture. Because machine learning models can process and analyze very large amounts of data, they can identify patterns and trends in historical data that are not immediately apparent to humans, which may lead to more accurate predictions.
Convolutional neural networks (CNNs) are among the most widely used deep learning models. They are designed to process and analyze image data, which makes them well suited to interpreting graphical representations such as candlestick charts. Candlestick charts, which are commonly used in financial markets, give a visual representation of stock price movement over time. CNNs can analyze these charts and detect price patterns that may help predict future market behavior.
In this article, we explore how deep learning, specifically CNNs applied to candlestick charts, can help improve stock market predictions. We discuss how the two are combined to draw on the strengths of each: candlestick charts provide detailed, time-specific price data, and CNNs provide the ability to identify patterns in them. By the end of the article, you should have a better understanding of how to forecast stock movements using this approach.
Background and Literature Review
Stock Market Prediction Challenges
According to the efficient market hypothesis, stock prices fully reflect all available information, making it nearly impossible for investors to consistently achieve higher returns than the market through stock selection or market timing. Because of this, any new information is quickly incorporated into stock price, and so any attempt to forecast stock movement is flawed unless it is based on information not yet available to all traders. It creates a very significant challenge for those who attempt to forecast stock price, and suggests that higher returns can only be accomplished by taking additional risks.
Previous Approaches
Stock market predictions have been made using various methods over the years. Earlier approaches used neural networks trained with back-propagation. The technique, whose use for stock prediction dates back to the 1990s, adjusts the weights of a neural network's connections to minimize prediction errors. In one study, back-propagation was used to predict prices for randomly selected stocks from the German market. Despite its promise, this approach was limited by the simple models and computational resources available at the time.
With the rise of social media, new sources of data have emerged for stock market predictions. Public sentiment about stocks and the market can be gauged through sentiment analysis, which analyzes the tone and content of social media posts. Studies have shown that social media sentiment can provide valuable insights into market behavior; one, for example, analyzed Twitter data to predict Dow Jones Industrial Average movements. Alongside its advantages, sentiment analysis presents challenges, including the need to remove noise and irrelevant information from large volumes of text.
In another innovative approach, historical stock data was processed as audio waveforms. Audio-style representations of market data, such as the S&P 500 and CBOE, were analyzed to forecast market movements. More broadly, non-traditional data formats and advanced machine learning techniques are becoming increasingly popular in financial forecasting.
Candlestick Charts in Prediction
In technical analysis, candlestick charts provide a visual representation of price movements over a specified period. Each candlestick on the chart displays four key pieces of information: the opening price, closing price, high price, and low price. The candlestick's body spans the range between the open and the close and is drawn filled or hollow (or in contrasting colors) depending on whether the stock closed lower or higher than it opened, while the shadows extend from the body to the period's high and low.
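To make the anatomy of a single candlestick concrete, here is a minimal sketch in Python. The `Candle` class and its property names are illustrative assumptions, not part of any charting library or of the study's code.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Candle:
    """One trading period summarized by its four prices (hypothetical helper)."""
    open: float
    high: float
    low: float
    close: float

    @property
    def is_bullish(self) -> bool:
        # The period closed above where it opened.
        return self.close > self.open

    @property
    def body(self) -> Tuple[float, float]:
        # Bottom and top of the body, regardless of direction.
        return min(self.open, self.close), max(self.open, self.close)

    @property
    def upper_shadow(self) -> float:
        # Distance from the top of the body up to the period high.
        return self.high - self.body[1]

    @property
    def lower_shadow(self) -> float:
        # Distance from the bottom of the body down to the period low.
        return self.body[0] - self.low


candle = Candle(open=100.0, high=106.0, low=98.0, close=104.0)
print(candle.is_bullish, candle.body, candle.upper_shadow, candle.lower_shadow)
```

Named patterns such as the "hammer" or "doji" are essentially rules about the relative sizes of the body and the two shadows, which is why these three quantities are a convenient starting point.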
Candlestick charts have historically been used by traders to identify trends and patterns. The “hammer,” “doji,” and “engulfing” patterns are believed to predict future price movements, and some studies suggest that certain candlestick patterns can be statistically significant indicators of future price direction. One study analyzed the Brazilian stock market using sixteen different candlestick patterns, while another used wavelet-based textures to improve prediction accuracy.
Machine learning models have incorporated candlestick charts into their predictive capabilities in recent years. For example, one approach involved using a convolutional encoder to learn the patterns within candlestick charts, and subsequently using this knowledge to make stock market investments. A different study compared artificial neural networks (ANNs), support vector machines (SVMs), random forests, and naive Bayes classifiers using candlestick chart data for input.
Candlestick chart data has been successfully analyzed using traditional machine learning algorithms, such as Random Forest. Researchers have been able to achieve good predictive performance by combining these algorithms with technical indicators. Furthermore, some studies have explored the use of external data sources, such as financial news and social media sentiment, in conjunction with candlestick chart analysis.
Applying machine learning to candlestick charts combines technical analysis with data-driven modeling to predict stock market movements. The visual and time-series nature of candlestick charts provides a rich source of data for machine learning models, which can support more nuanced predictions. Candlestick charts are likely to play an increasingly important role in financial forecasting as research in this area advances.
Data Collection
The study covers the Taiwan and Indonesian stock markets, using historical data from fifty Taiwanese companies and ten Indonesian companies that were selected for their growth and prominence within their respective markets. The historical data is divided into three sets: training, testing, and independent. Training data spans January 1, 2000, to December 31, 2016, while testing data spans January 1, 2017, to June 14, 2018. The testing and independent sets are used to validate the trained models.
The Yahoo! Finance API provided comprehensive historical time-series data, including daily open, close, high, and low prices and trading volume. Well-segmented historical data of this kind is essential for accurate predictions, because it ensures that the models are trained on a variety of market conditions.
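The date-based split can be sketched as follows. The date boundaries come from the text above; the function name `assign_split` is a hypothetical helper, and the independent set is left out because its date range is not given here.

```python
from datetime import date
from typing import Optional

# Boundaries as stated in the article.
TRAIN_START, TRAIN_END = date(2000, 1, 1), date(2016, 12, 31)
TEST_START, TEST_END = date(2017, 1, 1), date(2018, 6, 14)


def assign_split(day: date) -> Optional[str]:
    """Return 'train' or 'test' for a trading day, or None if it falls outside both ranges."""
    if TRAIN_START <= day <= TRAIN_END:
        return "train"
    if TEST_START <= day <= TEST_END:
        return "test"
    return None


print(assign_split(date(2010, 5, 3)), assign_split(date(2018, 1, 2)))
```

Splitting chronologically, rather than shuffling days at random, keeps every test sample strictly later than the training data, so the model is never evaluated on a period it has effectively already seen.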
Data Preprocessing
The Matplotlib library was used to convert historical stock data into candlestick charts. This conversion is crucial because the models take images as input, and candlestick charts provide a visual representation of stock price movements. The data was segmented into five-, ten-, and twenty-day periods, which lets the study investigate how well the model predicts stock movements over different time frames.
In addition to varying the trading period, the study's authors experimented with including and excluding volume indicators in the candlestick charts. The volume indicator represents the number of shares traded during a specific period. To investigate the impact of image resolution on the models' performance, candlestick charts were generated at both 50x50 and 20x20 pixels.
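The segmentation into fixed-length trading periods can be sketched as a sliding window. The labeling rule used below (1 if the close on the day after the window is higher than the window's last close, otherwise 0) is an assumption made for illustration; the article does not spell out the exact rule. For brevity the windows hold closing prices only, whereas a real pipeline would keep the full open, high, low, and close values for charting.

```python
def make_windows(closes, period):
    """Pair each `period`-day window of closes with a rise/fall label.

    Assumed rule: label 1 if the next day's close is above the last close
    in the window, else 0.
    """
    samples = []
    for start in range(len(closes) - period):
        window = closes[start:start + period]
        next_close = closes[start + period]
        label = 1 if next_close > window[-1] else 0
        samples.append((window, label))
    return samples


closes = [10.0, 10.5, 10.2, 10.8, 11.0, 10.9, 11.3]
samples = make_windows(closes, period=5)
for window, label in samples:
    print(window, "->", label)
```

Calling `make_windows` with `period` set to 5, 10, or 20 would produce the three datasets that correspond to the trading periods described above.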
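To give a feel for what changing the resolution does, the following sketch draws a window of candles onto a square pixel grid. It is a simplified pure-Python stand-in for the Matplotlib rendering, not the study's code: the `rasterize` function is hypothetical, it produces a single-channel 0/1 image, and it does not encode candle direction or volume.

```python
def rasterize(candles, size):
    """Draw (open, high, low, close) tuples onto a size x size grid of 0/1 pixels."""
    lo = min(c[2] for c in candles)
    hi = max(c[1] for c in candles)
    span = (hi - lo) or 1.0  # avoid division by zero on a completely flat window

    def row(price):
        # Map a price to a pixel row; row 0 is the top of the image.
        return round((hi - price) / span * (size - 1))

    grid = [[0] * size for _ in range(size)]
    slot = size / len(candles)  # horizontal pixels available per candle
    for i, (o, h, l, c) in enumerate(candles):
        left = int(i * slot)
        # Leave a one-pixel gap between candles when there is room for it.
        right = max(left + 1, int((i + 1) * slot) - 1)
        mid = (left + right - 1) // 2
        # Shadow: a one-pixel-wide vertical line from the high to the low.
        for r in range(row(h), row(l) + 1):
            grid[r][mid] = 1
        # Body: a filled rectangle between the open and the close.
        for r in range(row(max(o, c)), row(min(o, c)) + 1):
            for col in range(left, right):
                grid[r][col] = 1
    return grid


window = [(10.0, 12.0, 9.0, 11.0), (11.0, 13.0, 10.5, 12.5), (12.5, 12.8, 11.0, 11.4)]
small, large = rasterize(window, 20), rasterize(window, 50)
print(len(small), len(large))
```

At 20x20 pixels a 20-day window leaves a single pixel column per candle, so body and shadow become indistinguishable in width, while 50x50 leaves room for both. This is one plausible reason resolution could affect model performance.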
Model Architecture
Several machine learning models, both modern and traditional, were used to analyze and predict stock market movements using candlestick charts. Convolutional Neural Networks (CNNs) were the primary focus, but other models were also compared, including Residual Networks (ResNet), VGG Networks, Random Forest, and K-Nearest Neighbors.
Convolutional Neural Networks (CNNs)
Among deep learning models, CNNs are particularly effective at handling image-based data. Convolutional, pooling, and fully connected layers work together to extract features from input images. Convolutional layers apply filters to the input image to capture edges, textures, and shapes. Pooling layers reduce the spatial dimensions of the resulting feature maps, making the network more efficient.
The CNN architecture consisted of four convolutional layers with ReLU activation functions and max-pooling layers. To prevent overfitting, dropout layers were added, which randomly set a fraction of their input units to 0 during training. The final layer of the network, a dense (fully connected) layer, predicted whether the stock price would rise or fall.
CNNs were used because they are designed to find patterns in image-based input such as candlestick charts. By analyzing the charts' visual features, CNNs can recognize subtle patterns.
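The interplay of convolution, activation, and pooling can be illustrated with a small pure-Python sketch. The function names, the toy 6x6 image, and the hand-written edge kernel are all assumptions for illustration; in a real CNN the kernel values are learned during training and a deep learning framework does this work far more efficiently.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, the operation used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap):
    """Replace negative responses with zero."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    return [
        [
            max(fmap[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(fmap[0]) - size + 1, size)
        ]
        for r in range(0, len(fmap) - size + 1, size)
    ]


# A 6x6 image with a vertical edge: dark on the left, bright on the right.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# A kernel that responds to a dark-to-bright transition from left to right.
vertical_edge = [[-1, 0, 1]] * 3

response = conv2d(image, vertical_edge)        # 4x4 feature map
features = max_pool(relu(response), size=2)    # 2x2 after pooling
print(response[0], features)
```

The convolution turns the 6x6 image into a 4x4 map that is large only where the edge sits, and pooling halves each dimension while keeping the strongest response in every 2x2 region.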
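Dropout itself is simple enough to sketch without a framework. The version below is the common "inverted dropout" variant, in which surviving units are scaled up during training so that nothing needs to change at inference time. The function name, the rate of 0.5, and the fixed seed are assumptions for illustration; the article does not state the rate used in the study.

```python
import random


def dropout(values, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` during training.

    Surviving units are scaled by 1 / (1 - rate) so the expected activation is
    unchanged; outside training the input passes through untouched.
    Assumes 0 <= rate < 1.
    """
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0] * 1000
dropped = dropout(activations, rate=0.5, rng=rng)
print(sum(1 for v in dropped if v == 0.0), "of", len(dropped), "units zeroed")
```

Because a different random subset of units is silenced on every training step, the network cannot rely on any single feature, which is what discourages overfitting.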
Other Models
• Residual Networks (ResNet): ResNet uses shortcuts or skip connections to jump over certain layers. It mitigates the vanishing gradient problem, where gradients used to train networks become too small, slowing down learning. When deep networks are required to capture complex patterns in data, ResNet is particularly useful.
• VGG Networks: VGG Networks are known for their simplicity and effectiveness in image recognition. They stack small 3x3 convolutional layers on top of each other, with pooling layers in between to reduce the spatial dimensions of the data. Because of their large number of parameters, VGG Networks require significant resources to train, despite their relatively straightforward architecture.
• Random Forest: This ensemble method combines multiple decision trees to improve predictive accuracy. Each decision tree in the forest is trained on a different subset of the data, and the final prediction is made by aggregating the trees' predictions (averaging for regression, majority vote for classification). Because it aggregates many models, Random Forest is less prone to overfitting than a single tree and handles datasets with many features well.
• K-Nearest Neighbors (KNN): KNN classifies a data point according to the majority class among its closest neighbors. It is a non-parametric method that makes no assumptions about the underlying data distribution. KNN can be effective on small datasets, including those where the relationship between data points is highly nonlinear.
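Of the baseline models, KNN is the easiest to show in full. The sketch below classifies a point by majority vote among its `k` nearest training points using Euclidean distance. The function name `knn_predict`, the two-dimensional toy features, and the "up"/"down" labels are assumptions for illustration; in the setting described here each feature vector would be derived from a candlestick chart.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [
    ((0.0, 0.0), "down"), ((0.1, 0.2), "down"), ((0.2, 0.1), "down"),
    ((1.0, 1.0), "up"), ((0.9, 1.1), "up"), ((1.1, 0.9), "up"),
]
print(knn_predict(train, (0.05, 0.05)), knn_predict(train, (1.0, 0.95)))
```

There is no training step at all: the "model" is the stored data, which is why KNN is cheap to set up but slows down as the dataset grows.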
Training and Evaluation
The candlestick chart images generated during preprocessing were used to train the models. The models were trained with images representing different trading periods and resolutions, both with and without volume indicators. Different combinations of these parameters were tested to determine which would yield the best predictive accuracy.
Several key performance metrics were used to evaluate the models during training:
• Accuracy: The proportion of correct predictions made by the model. A general measure of the model's effectiveness.
• Sensitivity (True Positive Rate or Recall): The proportion of actual positive cases (e.g., stock price increases) that the model correctly identifies. This metric is important for understanding the model's ability to detect positive stock price movements.
• Specificity (True Negative Rate): The proportion of actual negative cases (e.g., stock price decreases) that the model correctly identifies. Measures the model's ability to avoid false positives.
• Matthews Correlation Coefficient (MCC): A balanced measure that takes true positives, true negatives, false positives, and false negatives into account. MCC is well suited to evaluating models on imbalanced datasets, where one class may be more prevalent than another.
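All four metrics can be computed from the counts in a binary confusion matrix, as the following sketch shows. The function name `binary_metrics` and the example counts are illustrative assumptions and are not results from the study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, specificity, and MCC from confusion-matrix counts.

    Assumes at least one actual positive and one actual negative sample.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # share of actual rises that were predicted
    specificity = tn / (tn + fp)  # share of actual falls that were predicted
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when the denominator is 0.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Made-up counts for illustration only.
metrics = binary_metrics(tp=45, fp=5, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

MCC ranges from -1 to 1, where 1 means perfect prediction, 0 is no better than chance, and -1 means total disagreement. Unlike accuracy, it stays low if a model simply predicts the majority class every time.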
The study aimed to identify the best model for predicting stock market movements based on these metrics. In the evaluation process, CNNs were compared to other models across different datasets and configurations. This comprehensive comparison helped ensure that the selected model not only performed well on the training data but also generalized effectively to unseen data, making it an effective tool for predicting stock markets.
Taiwan 50 Dataset Results
On the Taiwan 50 dataset, Convolutional Neural Networks (CNNs) performed better than other models. In experiments, CNNs consistently outperformed traditional machine learning models like Random Forest and K-Nearest Neighbors (KNN) across a variety of configurations: different trading periods, image dimensions, and volume indicators included or excluded.
When the effect of different trading periods was analyzed, longer periods (20 trading days) generally led to better predictive performance. With 50x50 pixel candlestick charts including volume indicators over a 20-day trading period, CNNs reached 91.5% accuracy. The Matthews Correlation Coefficient (MCC) of 0.827 indicates a strong correlation between predicted and actual stock movements.
Including volume indicators in the candlestick charts did not always improve accuracy; some results improved when the volume indicator was excluded. On the Taiwan 50 dataset, CNNs with a 20-day trading period and 50x50 pixel charts achieved 92.2% accuracy and an MCC of 0.84 when volume was excluded.
The Taiwan 50 dataset clearly showed that CNNs performed best in predicting stock movements when configured with longer trading periods and larger image dimensions. Occasionally, excluding volume indicators improved accuracy marginally, suggesting that volume data is not always necessary for best performance.
Indonesia 10 Dataset Results
In the Indonesia 10 dataset, CNNs again outperformed other models, as in the Taiwan dataset. In the Indonesian stock data experiments, trading periods, image sizes, and volume indicators were varied.
For Indonesia 10, CNNs performed best with 20-day trading periods and 50x50 pixel candlestick charts without volume indicators. This configuration reached 92.1% accuracy and an MCC of 0.837. The results on Indonesia 10, a smaller dataset, suggest that CNNs can capture underlying patterns in stock price movements even with less data.



