Foreign Exchange Rate Prediction Using Deep Learning

Prediction of Forex Rate (USD/INR) Using LSTM & GRU

The foreign exchange (Forex) market is the largest and most crucial trading market in the world, followed by the credit market. It determines the exchange rates between the world's currencies and involves buying, selling, and exchanging currencies at current or determined prices.

The average daily trading volume of the Forex market is far higher than that of the other big stock exchanges in the world.

Here are some notable quotes on trading:

“Trading effectively is about assessing probabilities, not certainties.”

Yvan Byeajee, Paradigm Shift: How to cultivate equanimity in the face of market uncertainty

“The stock market is a device for transferring money from the impatient to the patient.”

Warren Buffett

The ability to predict the foreign exchange rate is a valuable skill in the trading business. However, predicting the foreign exchange rate is a highly complex time-series problem. As Forex prices depend on many external and political factors, their prediction is a very challenging task.

“It’s not whether you’re right or wrong that’s important, but how much money you make when you’re right and how much you lose when you’re wrong.”

Stanley Druckenmiller

Forex trading using deep learning

Deep learning models have proven to be very effective at complex financial prediction problems.

In the case of time-series problems, Recurrent Neural Networks (RNNs) have been shown to outperform traditional machine learning algorithms and Artificial Neural Networks (ANNs).

RNNs are networks with loops in their architecture. These loops allow them to retain information from previous sequences or events, which makes them well suited for time-series problems.

In this article, we will demonstrate the application of two different variants of RNNs, i.e., Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), for predicting the foreign exchange rate (USD/INR).

To learn more about the architecture and basic working principles of RNNs and LSTMs, you can refer to colah's blog; for GRUs, you can refer to the Dive into Deep Learning blog.

First, we will build an Artificial Neural Network as a base model, and later we will improve the prediction performance by applying LSTM and GRU. The full code along with the dataset demonstrated in this article is available at this repo.

The article is organized in the following order:

  1. About Dataset
  2. Data pre-processing
  3. Summary Statistics
  4. Train Test Split
  5. Data Normalization
  6. Modeling
    • ANN
    • Result of ANN
    • LSTM
    • Result of LSTM
    • GRU
    • Result of GRU
  7. Final Assessment of Model
  8. Conclusion

1. About Dataset

Data is the heart of any machine learning or deep learning project. In this case study, we have web-scraped the USD/INR foreign exchange rate for the period 26 Aug 2010 to 26 Aug 2020, i.e., 10 years, from the website in.investing.com.

Sample entries of the dataset are shown in the table below.

Image of data sample

2. Data pre-processing

For a time-series problem, we have to convert our first column, i.e., Date, into the index.

There are two ways of doing this. In the first method, while reading the CSV file in pandas you can specify index_col='Date' and parse_dates=True. With these two parameters, pandas parses the Date column during import and sets it as the index of the dataframe.
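A minimal sketch of this first method; the file name forex_usd_inr.csv is an assumed placeholder, not the original file name:

    import pandas as pd

    # Parse the Date column while reading and use it as the dataframe index.
    df = pd.read_csv('forex_usd_inr.csv', index_col='Date', parse_dates=True)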

The second method is applied after reading the data: first convert the Date column to datetime and then set it as the index of the dataframe. The code snippet demonstrating this method is shown below.
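A minimal sketch of this second method, assuming the raw column is named Date:

    # Convert the Date column to datetime and promote it to the index.
    df['Date'] = pd.to_datetime(df['Date'])
    df = df.set_index('Date')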

After this pre-processing, the dataset looks as shown in the table below.

Data after pre-processing

As per the above table, our target variable is Price, which we have to predict based on Date.

Sorting the dataframe

Next, we will sort the pandas dataframe by date in ascending order. The code snippet for sorting the dataframe is shown below.
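A one-line sketch, assuming Date is already the index of the dataframe:

    # Sort the observations chronologically (oldest first).
    df = df.sort_index(ascending=True)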

Data after sorting the dataframe
Figure. The dataframe sorted by date

As our target variable is Price, we will keep only the Price column, with Date as the index, and discard all other columns present in the dataset. A one-line sketch of this step is shown below.
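    # Retain only the target column; the Date index is preserved.
    df = df[['Price']]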

The figure below shows the distribution of Price from August 2010 to August 2020. As per the figure, we can observe three spikes in the dataset: a major spike in 2014, a second spike in early 2019, and a third one in 2020.

Forex rate price visualization
Figure. Forex USD/INR price over the given years from 2010 to 2020

3. Summary Statistics

Next, we will explore the summary statistics of the data. As per the table below, the average price in the dataset is 61.9025, the median price is around 64, and the maximum price is 76.97.
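These statistics can be reproduced with pandas' describe(); a minimal sketch:

    # Count, mean, standard deviation, min, quartiles and max of the Price series.
    print(df['Price'].describe())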

Summary stats of Price

Further, we will check how the data is distributed across the years, in percentage terms. As we can see from the figure below, the dataset covers 10 years in total, and 80% of it lies in the interval from 2010 to 2018.
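A hypothetical sketch of how this yearly share can be computed; grouping by calendar year is an assumption about how the original figure was produced:

    # Percentage of observations falling in each calendar year.
    yearly_share = df.groupby(df.index.year).size() / len(df) * 100
    print(yearly_share.round(2))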

Total data description

4. Train Test Split

As we have seen, 80% of the data lies between 2010 and 2018, so we will train the model on the date range 26 August 2010 to 26 August 2018 and use the rest for testing.

The code snippet for splitting the dataset and plotting the train and test dataframes is shown below. As per the code, we first select the split date, 26-08-2018, based on which the training and test sets are segregated.

Next, we create the training set by selecting the data points up to 26-08-2018 and the test set by retaining all the data points after the split date.
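A minimal sketch of the split; the variable names train_df and test_df are assumptions:

    import pandas as pd
    import matplotlib.pyplot as plt

    # Chronological split: everything up to the split date is training data,
    # everything after it is test data.
    split_date = pd.Timestamp('2018-08-26')
    train_df = df.loc[df.index <= split_date]
    test_df = df.loc[df.index > split_date]

    # Visualize the split.
    ax = train_df['Price'].plot(figsize=(12, 5), label='Train')
    test_df['Price'].plot(ax=ax, label='Test')
    ax.legend()
    plt.show()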

The plot below visualizes the train/test split of the dataset.

Visualization of train test split
Figure. Line plot showing the train test split of the dataset

5. Data Normalization

In this step, we will normalize the dataset using the standard scaler method. Standard scaling rescales the distribution of values so that the mean of the observed values is 0 and the standard deviation is 1.

The code for normalizing the dataset is shown below.
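A sketch of the normalization step, assuming the scaler is fitted on the training prices only so that no information from the test period leaks into training:

    from sklearn.preprocessing import StandardScaler

    # Fit on the training set, then apply the same transform to the test set.
    scaler = StandardScaler()
    train_scaled = scaler.fit_transform(train_df[['Price']])
    test_scaled = scaler.transform(test_df[['Price']])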

Next, we will segregate the training and test sets into X_train, X_test, y_train, and y_test.
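How the features and targets are constructed is not spelled out here; a common choice for a single-input model, and the assumption used in the sketches that follow, is to predict each day's scaled price from the previous day's scaled price:

    # Assumed one-step lag: X holds the previous day's scaled price, y the current day's.
    X_train, y_train = train_scaled[:-1], train_scaled[1:]
    X_test, y_test = test_scaled[:-1], test_scaled[1:]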

6. Modeling

In this step, we will build the deep learning models and fit them to the dataset to make predictions. We will first build an Artificial Neural Network (ANN) as a base model and evaluate its performance, so that its results serve as a benchmark for the LSTM and GRU models.

Artificial Neural Network (ANN)

Artificial neural networks (ANNs) are biologically inspired computational networks. ANNs are generally used in supervised learning problems in which we know the target labels of the data. An ANN consists of three main layers: an input layer, a hidden layer, and an output layer.

In our case, we have a single input, 12 neurons in the hidden layer, and an output layer that gives the result as the predicted price. We have used a shallow neural network with one hidden layer for our use case, and this will be our base model.

The architecture of the Artificial Neural Network is outlined in the figure below.

Structure of Artificial Neural Network
Figure. Structure of Artificial Neural Network

The code snippet for the ANN model is shown below.
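A minimal Keras sketch consistent with the description above (one input, 12 hidden neurons, one output); the relu activation is an assumption:

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    # Shallow baseline: (1*12 + 12) + (12*1 + 1) = 37 trainable parameters,
    # matching the parameter count reported below.
    ann_model = Sequential([
        Dense(12, activation='relu', input_shape=(1,)),
        Dense(1)
    ])
    ann_model.summary()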

ANN model parameter

As we can see, our ANN model has only 37 parameters in total, so it is quite a lightweight model, which helps lower the risk of overfitting.

The code for compiling and fitting the model is shared in the below code snippet.
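A sketch of the compile/fit step; the adam optimizer, batch size, and patience value are assumptions, while the MSE loss, RMSE metric, and early stopping on loss follow the hyper-parameters discussed next:

    from tensorflow.keras.callbacks import EarlyStopping
    from tensorflow.keras.metrics import RootMeanSquaredError

    ann_model.compile(loss='mean_squared_error', optimizer='adam',
                      metrics=[RootMeanSquaredError()])
    # Stop training once the loss stops improving.
    early_stop = EarlyStopping(monitor='loss', patience=5)
    history = ann_model.fit(X_train, y_train, epochs=100, batch_size=16,
                            callbacks=[early_stop], verbose=1)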

The epochs summary of ANN model

Now let’s talk about some of the hyper-parameters we have used in our ANN model.

  1. loss = mean_squared_error: As this is a regression problem, we use mean squared error as our loss function. The mean squared error (MSE) of an estimator measures the average of the squares of the errors, i.e., the average squared difference between the values predicted by the model and the actual values. The formula for mean squared error is shown below.
MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)², where yᵢ is the actual value, ŷᵢ is the predicted value, and n is the number of observations.

2. metric = rmse: For the metric we use Root Mean Square Error (RMSE), which is the square root of the mean squared error. It is essentially the standard deviation of the residuals (i.e., the differences between predicted and actual values). Residuals are a measure of how far the data points lie from the regression line.

3. Early stopping: Early stopping is used to stop training once the model stops showing improvement in terms of loss. Its monitor parameter specifies which quantity we want to monitor while training the model; in our case, we monitor the loss.

As we can see from the above training summary, the model stopped after 13 epochs, as it showed no further improvement.

Result of ANN

After fitting the model, we evaluate it using the R2 score, RMSE, and mean absolute error (MAE).
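A hypothetical helper that computes these metrics with scikit-learn; the adjusted R2 formula uses the number of predictors p:

    import numpy as np
    from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

    def report_metrics(y_true, y_pred, p=1):
        """Return R2, adjusted R2, MAE, and RMSE for a set of predictions."""
        r2 = r2_score(y_true, y_pred)
        n = len(y_true)
        adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
        mae = mean_absolute_error(y_true, y_pred)
        rmse = np.sqrt(mean_squared_error(y_true, y_pred))
        return r2, adj_r2, mae, rmse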

Performance metric          Score
Train R2 score              0.974
Train Adjusted R2 score     0.974
Train MAE                   0.142
Train RMSE                  0.161
Test R2 score               0.943
Test Adjusted R2 score      0.942
Test MAE                    0.054
Test RMSE                   0.071

Table. Results of the ANN model

As we can see from the above results, the Test R2 score is 0.943 and the Adjusted R2 score is 0.942, which is quite good, as R2 and Adjusted R2 are two important performance metrics for regression problems. The closer the value is to 1, the better the model. The Test RMSE of 0.071 and Test MAE of 0.054 are also relatively small errors, which indicates that the model has fitted a decent regression line to the actual prices.

Prediction plot of ANN
Figure. Prediction plot of ANN

As per the above prediction plot of the ANN, the model tracks the actual prices closely in the initial observations, but after around 400 observations its predictions drift away from the actual values.

Next, we will move to a more complex time-series model, the LSTM, and see how it performs.

LSTM

Long Short-Term Memory (LSTM) networks are another variant of RNNs. They are capable of retaining long-term dependencies among a sequence of events or data points. They were first introduced by Hochreiter & Schmidhuber (1997).

The major difference between a plain RNN and an LSTM is that an LSTM has an explicit memory unit which stores information relevant for learning a task. In LSTMs, the memory units retain pieces of information even when the sequences get really long.

The structure of LSTM cell is shown below:

Basic structure of LSTM cell
Image source

Each LSTM cell consists of three gates: the forget gate, the update gate, and the output gate.

Forget Gate: This gate controls how much information needs to be discarded from the previous cell state depending on the new input.

Update Gate: This gate updates the previous cell state by writing a new piece of information to it.

Output Gate: This gate controls how much information needs to be passed on to the next LSTM layer based on the current cell state.

The structure of an LSTM cell allows an LSTM network to have a smooth and uninterrupted flow of gradients while backpropagating. This flow is also called the constant error carousel. Due to this characteristic, LSTMs are able to mitigate the problem of vanishing and exploding gradients.

Data Pre-processing for LSTM

Before building the LSTM model, we have to pre-process the training and test sets as shown in the code snippet below.
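A sketch of this pre-processing under the one-step-lag assumption above; Keras recurrent layers expect 3-D input of shape (samples, timesteps, features):

    # Reshape the 2-D arrays to (samples, timesteps=1, features=1) for the recurrent layers.
    X_train_rnn = X_train.reshape((X_train.shape[0], 1, 1))
    X_test_rnn = X_test.reshape((X_test.shape[0], 1, 1))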

The code for building the LSTM model using the Keras framework is shown below.
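A minimal sketch matching the configuration described below (50 units, relu activation, lecun_uniform initializer, return_sequences=False); the input shape follows the reshaping above and is an assumption:

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense

    # Single LSTM layer followed by one output neuron.
    lstm_model = Sequential([
        LSTM(50, activation='relu', kernel_initializer='lecun_uniform',
             return_sequences=False, input_shape=(1, 1)),
        Dense(1)
    ])
    lstm_model.summary()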

LSTM model parameters

As we can see from the above code, we have used 50 neurons in the LSTM cell with the relu activation function, the lecun_uniform kernel initializer, and return_sequences set to False.

To understand the concept of a kernel initializer, you can refer to a good discussion on Data Science Stack Exchange.

The code for compiling and fitting the LSTM model is shown below.
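A sketch of the training step, reusing the same recipe as the ANN (MSE loss, RMSE metric, early stopping on loss); the optimizer, batch size, and patience are assumptions:

    from tensorflow.keras.callbacks import EarlyStopping
    from tensorflow.keras.metrics import RootMeanSquaredError

    lstm_model.compile(loss='mean_squared_error', optimizer='adam',
                       metrics=[RootMeanSquaredError()])
    history = lstm_model.fit(X_train_rnn, y_train, epochs=100, batch_size=16,
                             callbacks=[EarlyStopping(monitor='loss', patience=5)],
                             verbose=1)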

The epochs summary of LSTM model

As we can see from the above training summary, the LSTM model stopped training after 40 epochs due to early stopping.

Result of LSTM

The result after fitting the LSTM model is shown below.

Performance metric          Score
Train R2                    0.977
Train Adjusted R2           0.977
Train RMSE                  0.151
Train MAE                   0.132
Test R2                     0.879
Test Adjusted R2            0.879
Test RMSE                   0.102
Test MAE                    0.075

Table. Results of the LSTM predictions

As we can see from the above results, the Test R2 score is 0.879 and the Adjusted R2 score is also 0.879, which is lower than the ANN's R2 and Adjusted R2 scores. The Test RMSE of 0.102 and Test MAE of 0.075 are also higher than the ANN's RMSE and MAE. These results lead to the conclusion that the LSTM performs worse than the shallow ANN for this particular case.

Prediction plot of LSTM model
Figure LSTM prediction plot

As we can see from the above plot, the LSTM performs well for the initial observations, but in the later part it performs worse than the ANN.

Further, you can try other combinations of LSTM hyper-parameters to see whether its performance improves.

So, next we will build a GRU model to see whether its performance is better than that of the LSTM and ANN.

GRU

Gated Recurrent Units (GRUs) and LSTMs both have the ability to retain long-term dependencies over past sequences of events. The major difference between the two is that LSTMs control the exposure of the memory content (cell state), while GRUs expose the entire cell state to the other units in the network.

The LSTM unit has separate update and forget gates, while the GRU performs both of these operations together via its update gate.

The architecture of GRU is shown in below figure.

Basic structure of GRU
Image source

The code for building the GRU model using the Keras framework is shown below.
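A minimal sketch of a GRU model consistent with the parameter count reported below; the activation, initializer, and reset_after setting are assumptions, chosen so that the GRU cell contributes 3*7*(1+7+1) = 189 parameters and the output layer 8, i.e., 197 in total:

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import GRU, Dense

    # Lightweight GRU: 7 units followed by a single output neuron.
    gru_model = Sequential([
        GRU(7, activation='relu', kernel_initializer='lecun_uniform',
            reset_after=False, input_shape=(1, 1)),
        Dense(1)
    ])
    gru_model.summary()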

GRU model parameters

As we can see from the above figure, we have built a lightweight GRU model with only 7 neurons in the hidden layer, due to which the total number of model parameters is only 197.

The code for compiling and fitting the model is shown below.
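The training step mirrors the earlier models; a brief sketch with the same assumed optimizer and callback:

    from tensorflow.keras.callbacks import EarlyStopping
    from tensorflow.keras.metrics import RootMeanSquaredError

    gru_model.compile(loss='mean_squared_error', optimizer='adam',
                      metrics=[RootMeanSquaredError()])
    history = gru_model.fit(X_train_rnn, y_train, epochs=100, batch_size=16,
                            callbacks=[EarlyStopping(monitor='loss', patience=5)],
                            verbose=1)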

Epoch summary of GRU model

As we can observe from the above training summary, the model stopped training after 44 epochs due to early stopping.

Result of GRU

The result after fitting the GRU model is shown below.

Performance metric          Score
Train R2 score              0.991
Train Adjusted R2 score     0.991
Train RMSE                  0.092
Train MAE                   0.084
Test R2 score               0.967
Test Adjusted R2 score      0.967
Test RMSE                   0.054
Test MAE                    0.042

Table. Results of the GRU model

As we can see from the above results, the Test R2 score is 0.967 and the Adjusted R2 score is also 0.967, which is better than both the ANN and the LSTM. The Test RMSE of 0.054 and Test MAE of 0.042 are also considerably lower than those of the ANN and LSTM.

These results lead to the conclusion that the GRU performs far better than both the shallow ANN and the LSTM network for predicting the Forex rate.

Prediction Plot of GRU
Figure GRU prediction plot

From the above plot, we can observe that the GRU predictions are very close to the actual values; even after 400 observations, the deviation from the actual price is much smaller than for the ANN and LSTM.

So, from the above results, it is clear that our final model for foreign exchange rate prediction is the GRU. We further inverse-transform the predicted and actual values, since we had transformed the actual values during normalization with the standard scaler.
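A sketch of this inverse transform, assuming the fitted StandardScaler from the normalization step is still available as scaler:

    # Map predictions and targets back to the original USD/INR price scale.
    gru_pred_scaled = gru_model.predict(X_test_rnn)
    gru_pred_price = scaler.inverse_transform(gru_pred_scaled)
    actual_price = scaler.inverse_transform(y_test.reshape(-1, 1))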

7. Final Assessment of Model

In this step, after inverse-transforming the predicted and actual values, we finally create a pandas dataframe comprising Date, Actual Price, Predicted Price, and RMSE, as shown in the table below. The reason for creating this dataframe is to compare the summary statistics of the predicted and actual values.
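A hypothetical reconstruction of this dataframe; under the one-step-lag assumption, the per-row RMSE reduces to the absolute error of each individual prediction:

    import numpy as np
    import pandas as pd

    final_df = pd.DataFrame({
        'Date': test_df.index[1:],        # the first test point is consumed as the lag input
        'Actual_Price': actual_price.ravel(),
        'GRU_prediction': gru_pred_price.ravel(),
    })
    final_df['RMSE'] = np.sqrt((final_df['Actual_Price'] - final_df['GRU_prediction']) ** 2)
    print(final_df.describe())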

Final data prediction table
Table. Final table comprising Date, actual price in test set, prediction price and RMSE

The summary statistics of the above dataframe are shown below.

Summary statistics of final table
Table showing summary statistics of final prediction table

As we can see from the above stats table, the average RMSE is 0.195, which is quite low for this kind of data. Further, if we compare the mean, median, and maximum of the actual price with those of the GRU prediction, we observe only a slight difference between the two, which shows that the GRU model performs remarkably well for predicting Forex prices. As the model predicts prices that are very close to the actual prices, it can be used to detect price trends, which in turn can help decide whether to buy or sell the currency at a particular time in the future.

Finally, we plot the prediction results of the final dataframe with dates on the x-axis.

Final prediction plot of GRU with date

8. Conclusion

In this article, we explored the Forex dataset of the USD/INR currency exchange rate over a period of 10 years. After the necessary pre-processing, we applied an ANN model as our base model, and then applied LSTM and GRU models.

Finally, after comparing the results of these three models, we found that the GRU outperformed the ANN and LSTM for this task and showed commendable results.

There is still much scope for improvement: you can try different combinations of hyperparameters for the LSTM and GRU to improve their performance, you can apply a bidirectional LSTM model and check whether it can outperform the GRU, and, lastly, you can apply a vanilla RNN and compare its performance with the other variants for predicting the foreign exchange rate.

Lastly, as we know, the foreign exchange rate depends on different factors such as the country's balance of payments, government debt, inflation, interest rates, the current economic situation (recession, depression, or boom), the current political situation, etc. These factors, in addition to technical indicators, can be fed as inputs to the model in order to make the foreign exchange rate forecasts more realistic.

“Deep Learning is the quality learning that ‘sticks’ with you for the rest of your life…”

Michael Fullan, Joanne Quinn & Joanne McEachen

About the author

Manu Siddhartha

Hi! I am Siddhartha, an aspiring blogger with a passion for sharing my knowledge in the machine learning and data science domain. This blog is dedicated to demonstrating applications of machine learning in different domains with real-world case studies.
