Prediction of Forex Rate (USD/INR) Using LSTM & GRU
The foreign exchange (Forex) market is the largest financial market in the world, followed by the credit market. It determines the exchange rates between the world's currencies and involves buying, selling, and exchanging currencies at current or determined prices.
As we can see from the above figure, the average daily trading volume of the Forex market is far higher than that of the other big stock exchanges in the world.
Some nice quotes on trading:
“Trading effectively is about assessing probabilities, not certainties.” – Yvan Byeajee, Paradigm Shift: How to Cultivate Equanimity in the Face of Market Uncertainty
“The stock market is a device for transferring money from the impatient to the patient.” – Warren Buffett
The ability to predict the foreign exchange rate is a valuable skill in the trading business, but it is a highly complex time series problem. As forex prices depend on many external and political factors, predicting them is very challenging.
“It’s not whether you’re right or wrong that’s important, but how much money you make when you’re right and how much you lose when you’re wrong.” – Stanley Druckenmiller
Deep learning models have proven to be very efficient at complex financial analytics problems.
For time series problems in particular, Recurrent Neural Networks (RNNs) have been shown to outperform traditional machine learning algorithms and Artificial Neural Networks (ANNs).
RNNs are the networks with loops in their architecture as shown in the below figure. This feature of RNN helps them to retain information from previous sequences or events which makes them favorable for time series problems.
In this article, we will demonstrate the application of two different variants of RNNs, i.e., Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU), for prediction of the foreign exchange rate (USD/INR).
Firstly, we will build Artificial Neural Networks as a base model, and later, we will improve the performance in prediction by applying LSTM and GRU. The full code with the dataset demonstrated in this article is available at this repo.
The article is divided in the following order:
- About Dataset
- Data pre-processing
- Summary Statistics
- Train Test Split
- Data Normalization
- Model Building
- Result of ANN
- Result of LSTM
- Result of GRU
- Final Assessment of Model
1. About Dataset
The data is the heart of any machine learning or deep learning project. In this case study, we have scraped the foreign exchange rates of USD/INR for the period 26 Aug 2010 to 26 Aug 2020, i.e., 10 years, from the website in.investing.com.
Sample entries of the dataset are shown in the table below.
2. Data pre-processing
For a time series problem, we have to convert our first column, i.e., Date, to the index.
There are two methods for doing this. In the first, while reading the CSV file in pandas, you can specify index_col='Date' and parse_dates=True. With these two parameters, pandas parses your DateTime column (Date) during import and sets it as the index of the data frame.
The second method is applied after reading the data: first convert the Date column to DateTime, then set it as the index of the data frame. A code snippet demonstrating this method is shown below.
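As a sketch of both methods (the filename and sample values here are illustrative stand-ins, not the article's actual data):

```python
import pandas as pd

# Method 1: parse and index the Date column while reading the CSV
# df = pd.read_csv("USD_INR.csv", index_col="Date", parse_dates=True)

# Method 2: convert and set the index after reading
df = pd.DataFrame({"Date": ["Aug 26, 2020", "Aug 25, 2020"],
                   "Price": [73.84, 74.02]})  # stand-in for the scraped data
df["Date"] = pd.to_datetime(df["Date"])       # parse strings into Timestamps
df = df.set_index("Date")                     # use Date as the DataFrame index
print(df.index.dtype)                         # datetime64[ns]
```

Either way, the result is a DataFrame indexed by a DatetimeIndex, which makes the later sorting and date-based slicing straightforward.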
After the above pre-processing, the dataset looks like the table shown below.
As per the table, our target variable is Price, which we have to predict based on Date.
Sorting the dataframe
Next, we will sort the pandas dataframe with respect to the Date column in ascending order. The code snippet for sorting the dataframe is shown below.
As our target variable is Price, we will keep the Price column (with Date as the index) and discard all other columns present in the dataset.
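A minimal sketch of the sorting and column selection, using stand-in values:

```python
import pandas as pd

# Stand-in for the scraped dataset: Date as a DatetimeIndex, newest first
df = pd.DataFrame(
    {"Price": [73.84, 74.02, 74.31], "Open": [73.90, 74.10, 74.20]},
    index=pd.to_datetime(["2020-08-26", "2020-08-25", "2020-08-24"]),
)
df.index.name = "Date"

df = df.sort_index()      # sort by Date in ascending order
data = df[["Price"]]      # keep only the target column, discard the rest
print(data.head())
```

Sorting by the index works because Date was made the index in the previous step; `df.sort_values("Date")` would be the equivalent when Date is still an ordinary column.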
The below figure shows the distribution of Price from August 2010 to August 2020. As per the figure, we can observe three spikes in the dataset: the first, a major spike, in 2014; the second in early 2019; and the third in 2020.
3. Summary Statistics
Next, we will explore the summary statistics of the data. As per the below table, we can observe that the average price in the dataset is 61.9025 while 50% of the price is around 64 and the maximum price in the dataset is 76.97.
Further, we will check the distribution of the data in percentage terms. As we can see from the below figure, the data comprises 10 years in total, and 80% of it lies in the interval between 2010 and 2018.
4. Train Test Split
As we have seen, 80% of the data lies between 2010 and 2018, so we will train the model on the date range 26 August 2010 to 26 August 2018, and the rest will be used for testing purposes.
The code snippet for splitting the dataset and plotting the train and test data frames is shown below. As per the code, we first select the split date, 26-08-2018, based on which we segregate the training and test sets.
Next, we create the training set from the data points up to 26-08-2018 and the test set from all the data points after the split date.
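A sketch of the split, assuming the 26 August 2018 split date described above (the price values are synthetic stand-ins):

```python
import numpy as np
import pandas as pd

# Daily prices over the full 10-year range (synthetic stand-in values)
dates = pd.date_range("2010-08-26", "2020-08-26", freq="D")
data = pd.DataFrame({"Price": np.linspace(46.0, 74.0, len(dates))}, index=dates)

split_date = pd.Timestamp("2018-08-26")               # last day of training range
train = data.loc[:split_date]                         # 26 Aug 2010 .. 26 Aug 2018
test = data.loc[split_date + pd.Timedelta(days=1):]   # everything after the split
print(len(train), len(test))
```

Because the index is a DatetimeIndex, `.loc` slicing by timestamp handles the segregation cleanly, with the split date itself landing in the training set.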
The below plot shows the train-test split of the dataset.
5. Data Normalization
In this step, we will normalize the dataset using a standard scaler normalization method. Standard Scaler normalization involves rescaling the distribution of values so that the mean of observed values is 0 and the standard deviation is 1.
The code for normalizing the dataset is shown below.
Next, we will segregate the train and test set into X_train, X_test, y_train, and y_test.
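One possible construction, assuming a one-step-ahead framing in which the previous day's scaled price is the input and the next day's is the target; the original code may frame the features differently:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in train/test price series
train_prices = np.linspace(46.0, 68.0, 200).reshape(-1, 1)
test_prices = np.linspace(68.0, 74.0, 50).reshape(-1, 1)

# Fit the scaler on the training set only, then transform both sets
scaler = StandardScaler()
train_scaled = scaler.fit_transform(train_prices)  # mean 0, std 1 on train
test_scaled = scaler.transform(test_prices)

# One-step-ahead supervised framing: predict today's price from yesterday's
X_train, y_train = train_scaled[:-1], train_scaled[1:]
X_test, y_test = test_scaled[:-1], test_scaled[1:]
```

Fitting the scaler only on the training set avoids leaking test-set statistics into training.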
6. Model Building
In this step, we will build deep learning models and fit them to the dataset to make predictions. In this case study, we will first build an Artificial Neural Network (ANN) as a base model and evaluate its performance, using its result as a benchmark for the LSTM and GRU models.
Artificial Neural Network (ANN)
Artificial neural networks (ANNs) are biologically inspired computational networks. They are generally used in supervised learning problems, where we know the target labels of the data. An ANN mainly consists of three layers: input, hidden, and output.
In our case, we have a single input, 12 neurons in the hidden layer, and an output layer which gives the predicted price. We have used a shallow neural network with one hidden layer for our use case, and this will be our base model.
The architecture of Artificial Neural Network is outlined in the below figure.
The code snippet of ANN model is shown below.
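A sketch of such a model in Keras, assuming a relu hidden activation (not stated in the text); the layer sizes reproduce the 37-parameter count:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Shallow baseline: 1 input -> 12 hidden neurons -> 1 output
model = keras.Sequential([
    keras.Input(shape=(1,)),
    layers.Dense(12, activation="relu"),  # (1*12 + 12) = 24 parameters
    layers.Dense(1),                      # (12*1 + 1) = 13 parameters
])
model.summary()  # total parameters: 37
```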
As we can see, we have 37 total parameters in our ANN model, so it is quite a lightweight model, which helps lower the risk of overfitting.
The code for compiling and fitting the model is shared in the below code snippet.
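A hedged sketch of the compile-and-fit step; the optimizer, patience value, and the tiny synthetic training arrays are assumptions for illustration:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.metrics import RootMeanSquaredError

model = keras.Sequential([
    keras.Input(shape=(1,)),
    layers.Dense(12, activation="relu"),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mean_squared_error",
              metrics=[RootMeanSquaredError()])

# Stop training once the monitored loss stops improving
early_stop = EarlyStopping(monitor="loss", patience=5, restore_best_weights=True)

# Tiny synthetic stand-in for the scaled training data
X_train = np.linspace(-1, 1, 64).reshape(-1, 1)
y_train = X_train * 0.9
history = model.fit(X_train, y_train, epochs=100,
                    callbacks=[early_stop], verbose=0)
```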
Now let’s talk about some of the hyper-parameters we have used in our ANN model.
1. loss = mean_squared_error: As this is a regression problem, we have used mean_squared_error as our loss function. The mean squared error (MSE) of an estimator measures the average of the squares of the errors, i.e., the average squared difference between the values predicted by the model and the actual values.
2. metric = rmse: For the metric, we have used the Root Mean Square Error (RMSE), which is the square root of the mean squared error. It is basically the standard deviation of the residuals (the differences between predicted and actual values). Residuals are a measure of how far the data points are from the regression line.
3. Early stopping: Early stopping halts training once the model stops showing improvement in terms of loss. Its monitor parameter specifies which quantity to monitor while training; in our case, we monitor the loss.
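The loss and metric above can be written as:

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
```

where $y_i$ is the actual price, $\hat{y}_i$ the predicted price, and $n$ the number of observations.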
As we can see from the above simulation, the model stopped training after 13 epochs, as it stopped showing further improvement.
Result of ANN
After fitting the model, we performed model evaluation based on the R2 score, RMSE, and Mean Absolute Error (MAE).
| Metric | Value |
| --- | --- |
| Train R2 score | 0.974 |
| Train Adjusted R2 score | 0.974 |
| Test R2 score | 0.943 |
| Test Adjusted R2 score | 0.942 |
As we can see from the above results, the Test R2 score is 0.943 and the Adjusted R2 score is 0.942, which is quite good; in a regression problem, R2 and Adjusted R2 are two important performance metrics, and the closer their value is to 1, the better the model. The Test RMSE of 0.071 and Test MAE of 0.054 are also relatively small errors, which indicates that the model has fitted a decent regression line over the actual prices.
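The metrics above can be computed with scikit-learn along these lines (the arrays are illustrative stand-ins, not the article's actual predictions):

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

y_true = np.array([0.10, 0.25, 0.31, 0.47, 0.52])   # stand-in scaled prices
y_pred = np.array([0.12, 0.24, 0.35, 0.45, 0.50])   # stand-in predictions

r2 = r2_score(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
mae = mean_absolute_error(y_true, y_pred)

# Adjusted R2 penalizes R2 for the number of predictors p
n, p = len(y_true), 1
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
```

Note that Adjusted R2 is always at most R2, which is why the table's adjusted scores are never larger than the plain ones.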
As per the above prediction plot of the ANN, the model closely tracks the actual prices for the initial observations, but after 400 observations its predicted values drift far from the actual values.
Next, we will move to a more complex time series model, the LSTM, and see how it performs.
Long Short-Term Memory (LSTM) networks are another variant of RNNs. They are capable of retaining long-term dependencies among a sequence of events or data points. They were first introduced by Hochreiter & Schmidhuber (1997).
The major difference between RNN and LSTM is that an LSTM has an explicit memory unit which stores information relevant for learning some task. In LSTMs, the memory units retain pieces of information even when the sequences get really long.
The structure of LSTM cell is shown below:
Each LSTM cell consists of three gates: a forget gate, an update gate, and an output gate.
Forget Gate: This gate controls how much information needs to be discarded from the previous cell state depending on the new input.
Update Gate: This gate updates the previous cell state by writing a new piece of information to it.
Output Gate: This gate controls how much information needs to be passed on to the next LSTM layer based on the current cell state.
The structure of an LSTM cell allows an LSTM network to have a smooth and uninterrupted flow of gradients while backpropagating. This flow is also called the constant error carousel. Due to this characteristic LSTMs are able to solve the problem of vanishing and exploding gradients.
Data Pre-processing for LSTM
Before building the LSTM model, we have to pre-process the training and test set as shown in the below code snippet.
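A sketch of the reshaping step, since Keras recurrent layers expect 3-D input of shape (samples, timesteps, features); the arrays are stand-ins:

```python
import numpy as np

# 2-D supervised arrays (samples, features) from the earlier steps (stand-ins)
X_train = np.linspace(-1, 1, 64).reshape(-1, 1)
X_test = np.linspace(1, 1.2, 16).reshape(-1, 1)

# Reshape to (samples, timesteps=1, features=1) for the LSTM/GRU layers
X_train_lstm = X_train.reshape(X_train.shape[0], 1, 1)
X_test_lstm = X_test.reshape(X_test.shape[0], 1, 1)
print(X_train_lstm.shape)  # (64, 1, 1)
```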
The code for building LSTM model using keras framework is shown below.
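A sketch of such an LSTM model in Keras, matching the hyper-parameters the article describes (50 units, relu activation, lecun_uniform initializer, return_sequences=False); the single-timestep input shape is an assumption:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(1, 1)),            # (timesteps, features)
    layers.LSTM(50, activation="relu",
                kernel_initializer="lecun_uniform",
                return_sequences=False),  # emit only the final hidden state
    layers.Dense(1),                      # predicted price
])
model.summary()
```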
As we can see from the above code, we have used 50 neurons in the LSTM cell, with activation function relu, kernel initializer lecun_uniform, and return_sequences=False.
For understanding the concept of a kernel initializer, you can refer to a good discussion on Data Science Stack Exchange.
The code for compiling and fitting the LSTM model is shown below.
As we can see from the above plot, the training of the LSTM model stopped after 40 epochs due to early stopping.
Result of LSTM
The result after fitting the LSTM model is shown below.
| Metric | Value |
| --- | --- |
| Train Adjusted R2 | 0.977 |
| Test Adjusted R2 | 0.879 |
As we can see from the above results, the Test R2 score is 0.879 and the Adjusted R2 score is also 0.879, which is lower than the ANN's R2 and Adjusted R2 scores. The Test RMSE of 0.102 and Test MAE of 0.075 are also higher than the ANN's RMSE and MAE. These results lead to the conclusion that the LSTM's performance is lower than that of the shallow ANN for this particular case.
As we can see from the above plot, the LSTM performs well for the initial observations, but in the later part it performs worse than the ANN.
Further, you can try some other combinations of hyper-parameters of LSTM to see if its performance will improve or not.
So, next we will build a GRU model to see whether its performance will be better than that of the LSTM and ANN.
The Gated Recurrent Unit (GRU) and LSTM both have the characteristic of retaining long-term dependencies from past sequences of events. The major difference between the two is that LSTMs control the exposure of the memory content (the cell state), while GRUs expose the entire cell state to the other units in the network.
The LSTM unit has separate input and forget gates, while the GRU performs both of these operations together via its update gate.
The architecture of GRU is shown in below figure.
The code for building GRU model using keras framework is shown below.
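A sketch of such a GRU model with 7 hidden neurons; note that reset_after=False is an assumption chosen to reproduce the 197-parameter count mentioned in the text (the Keras default, reset_after=True, uses extra recurrent biases and yields a different count):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(1, 1)),             # (timesteps, features)
    layers.GRU(7, reset_after=False),      # 3*((1+7)*7 + 7) = 189 parameters
    layers.Dense(1),                       # (7*1 + 1) = 8 parameters
])
model.summary()  # total parameters: 197
```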
As we can see from the above figure, we have built a lightweight GRU model with only 7 neurons in the hidden layer, which brings the total number of model parameters down to just 197.
The code for compiling and fitting the model is shown below.
As we can observe from the above plot, the model stopped training after 44 epochs due to early stopping.
Result of GRU
The result after fitting the GRU model is shown below.
| Metric | Value |
| --- | --- |
| Train R2 score | 0.991 |
| Train Adjusted R2 score | 0.991 |
| Test R2 score | 0.967 |
| Test Adjusted R2 score | 0.967 |
As we can see from the above results, the Test R2 score is 0.967 and the Adjusted R2 score is also 0.967, which is better than both the ANN and the LSTM. The Test RMSE of 0.054 and Test MAE of 0.042 are also considerably lower than those of the ANN and LSTM.
These results lead to the conclusion that the GRU performs far better than the shallow ANN and the LSTM network for predicting the Forex rate.
From the above plot, we can observe that the GRU's predictions are very close to the actual values; even after 400 observations, its deviation from the actual price is much lower than that of the ANN and LSTM.
So, from the above results, it is clear that our final model for foreign exchange rate prediction is the GRU. We further apply the inverse transform to the predicted and actual values, since we had transformed the actual values during normalization with the standard scaler.
7. Final Assessment of Model
In this step, after inverse transforming the predicted and actual values, we finally create the pandas data frame comprising Date, Actual Price, Predicted Price, and RMSE as shown in the table below. The reason for creating this data frame is to assess the summary statistics of predicted and actual values.
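A sketch of the inverse transform and results data frame; the fitted scaler, dates, and values here are illustrative stand-ins:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Stand-ins: a fitted scaler plus scaled actual/predicted test values
scaler = StandardScaler().fit(np.linspace(46.0, 68.0, 200).reshape(-1, 1))
y_test_scaled = np.array([[0.90], [1.00], [1.10]])
y_pred_scaled = np.array([[0.88], [1.02], [1.08]])

# Undo the normalization so prices are back on the original INR scale
actual = scaler.inverse_transform(y_test_scaled).ravel()
predicted = scaler.inverse_transform(y_pred_scaled).ravel()

results = pd.DataFrame({
    "Date": pd.date_range("2018-08-27", periods=3, freq="D"),
    "Actual_Price": actual,
    "GRU_prediction": predicted,
})
# Per-row error: for a single observation this reduces to |actual - predicted|
results["RMSE"] = np.sqrt((results["Actual_Price"] - results["GRU_prediction"]) ** 2)
print(results.describe())
```

`DataFrame.describe()` then provides the summary statistics (mean, 50%, max, etc.) that the next step compares.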
The summary statistics of the above dataframe is shown below.
As we can see from the above stats table, the average RMSE is 0.195, which is quite low for this kind of dataset. Further, if we compare the average, median (50%), and maximum of the Actual Price with the GRU_prediction price, we observe only a slight difference between the two, which shows that the GRU model performs remarkably well at predicting Forex prices. As the model predicts prices very close to the actual ones, it can be used to detect price trends, which can be utilized for forecasting whether to buy or sell the currency at a particular time in the future.
Finally, we have plotted the prediction results of the final data frame with dates on the x-axis.
In this article, we have explored the Forex dataset of the USD/INR currency exchange rate over a period of 10 years. After the necessary pre-processing, we applied an ANN model as our base model, and then we applied LSTM and GRU models.
Finally, after comparing the results of these three models, we found that the GRU outperformed the ANN and LSTM for this task and showed commendable results.
Further, there is much scope left for improvement. You can try different combinations of hyperparameters in the LSTM and GRU to improve performance, apply a Bi-directional LSTM model and check whether it can outperform the GRU, or apply a vanilla RNN and compare its performance with the other variants for predicting the foreign exchange rate.
Lastly, as we know, the foreign exchange rate depends on many factors, such as the country's balance of payments, government debt, inflation, interest rates, the current economic condition (recession, depression, or boom), the current political condition, etc. We can feed these factors, in addition to technical indicators, into the model's input in order to make more realistic forecasts of the foreign exchange rate.
“Deep Learning is the quality learning that ‘sticks’ with you for the rest of your life…” – Michael Fullan, Joanne Quinn & Joanne McEachen