How To Use The C5.0 Decision Tree Algorithm In Day Trading?

In this post we discuss how to predict the pip movement over the next 5 minutes using the C5.0 decision tree algorithm from machine learning. Did you read the previous post on developing an artificial intelligence machine learning forex indicator? In that post we discussed in detail how artificial intelligence and machine learning can be used to develop a forex indicator that learns from experience and keeps modifying its behavior. Algorithmic trading has become very popular nowadays. Many hedge funds employ algorithmic trading strategies. People who specialize in algorithmic and quantitative trading are popularly called Quants.

Do you want to become a Quant?

To become a Quant you usually need an advanced degree in mathematics. But many people like us have learned how to code and how to do machine learning on their own, just by reading books on machine learning with Python and R. Read this post in which we explain why you need to learn Python, R and machine learning if you want to master algorithmic trading. Python and R are two very powerful machine learning and data analysis languages that traders now use. Developing algorithmic trading strategies and backtesting them in these languages gives you an edge when it comes to developing winning automated trading strategies. You can watch this documentary on the impact of automated trading systems on the markets.

What Are Decision Trees?

Decision trees are frequently used in machine learning and data mining. C5.0 is one algorithm that builds a decision tree automatically using the concept of entropy; there are many other algorithms that can build decision trees as well. C5.0 does not try out whole trees one after another. It grows a single tree one split at a time: at each node it chooses the attribute split that reduces entropy the most (like its predecessor C4.5, it ranks candidate splits by the gain ratio, a normalized form of this entropy reduction), and it then prunes back branches that are unlikely to help on new data. Entropy is a concept borrowed from information theory and is measured in bits. Decision trees do have some problems. The most important one is noise: a decision tree is highly susceptible to noise in the data. An overfitted tree learns the noise in the training data instead of the underlying pattern, which makes the noise problem more serious, so we need to be careful to avoid overfitting when we build the decision tree. Entropy might be a new concept for you. You can watch the video below, which explains it.

The second part of this video series explains decision trees in a little more detail, so watch it too if you want. Building a decision tree manually is a cumbersome and time-consuming process. When we use the C5.0 algorithm instead, we get the tree and its predictions in less than 1 second. This is what we want: fast results, if we are going to use the predictions in live trading.
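If you are curious how entropy drives the choice of splits, here is a minimal base-R sketch. `entropy_bits` and `split_scores` are our own hypothetical helpers, not functions from the C50 package, and the toy `direction`/`feature` data is made up. The sketch computes the entropy of a set of class labels in bits, the information gain of one candidate split, and the gain ratio used by C4.5-style trees. It illustrates the idea only; it is not C5.0's actual implementation.

```r
# Shannon entropy, in bits, of a vector of class labels
entropy_bits <- function(labels) {
  if (length(labels) == 0) return(0)                # an empty branch contributes nothing
  p <- as.numeric(table(labels)) / length(labels)   # class proportions
  p <- p[p > 0]                                     # treat 0 * log2(0) as 0
  -sum(p * log2(p))
}

# Information gain and gain ratio of splitting `labels` into two branches.
# `goes_left` is a logical vector: TRUE sends a case to the left branch.
split_scores <- function(labels, goes_left) {
  w <- c(mean(goes_left), mean(!goes_left))         # branch weights
  children <- w[1] * entropy_bits(labels[goes_left]) +
              w[2] * entropy_bits(labels[!goes_left])
  gain <- entropy_bits(labels) - children           # entropy reduction
  w <- w[w > 0]
  split_info <- -sum(w * log2(w))                   # entropy of the split itself
  c(gain = gain, gain_ratio = gain / split_info)
}

# Toy example: next-bar direction and a hypothetical feature value per bar
direction <- c("up", "up", "up", "down", "down", "down", "up", "down")
feature   <- c(0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.6, 0.4)

entropy_bits(direction)                  # 1 bit: half up, half down
split_scores(direction, feature > 0.5)   # perfect split: gain 1, gain ratio 1
split_scores(direction, feature > 0.75)  # weaker split, smaller gain
```

A tree-growing algorithm evaluates scores like these for every candidate attribute and threshold at a node, keeps the best one, and repeats the process inside each branch.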

You don’t need to know the mathematical details of how the C5.0 algorithm splits the data on its different attributes so that entropy decreases at each step. Let’s download M1 (1-minute) data and try to predict the pip movement over the next 5 minutes by building a decision tree. We divide the possible moves into 7 classes of one categorical variable: -3, -2, -1, 1, 2, 3 and 4. 1 means price will move up by less than 2 pips. 2 means price will move up by more than 2 pips and less than 6 pips. 3 means price will move up by more than 6 pips and less than 12 pips. 4 means price will move up by more than 12 pips. In the same manner, -1 means price will move down by less than 2 pips, -2 means price will move down by more than 2 pips and less than 6 pips, and -3 means price will move down by more than 6 pips.
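The class definitions above can be written as a small R helper. This is a sketch under assumptions: `pip_class` is a hypothetical name, it works directly in pips rather than in log returns as the full R listing below does, and the example assumes a 4-decimal pair such as EURUSD, where 1 pip = 0.0001.

```r
# Map a move measured in pips to the seven classes -3, -2, -1, 1, 2, 3, 4.
# cut() builds (a, b] intervals, so a move that lands exactly on a boundary
# falls into the lower class, just like the strict `>` tests in the listing.
pip_class <- function(move_pips) {
  cut(move_pips,
      breaks = c(-Inf, -6, -2, 0, 2, 6, 12, Inf),
      labels = c("-3", "-2", "-1", "1", "2", "3", "4"))
}

# Example: convert price changes to pips first (assumes 1 pip = 0.0001)
price_change <- c(-0.0015, -0.0004, 0.00005, 0.0003, 0.0013)
pip_class(price_change / 0.0001)   # classes -3, -2, 1, 2, 4
```

A move of exactly 0 pips ends up in class -1, which is also what the nested `ifelse` in the listing does.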

Building A C5.0 Decision Tree Using R

Now we will build a decision tree using the C5.0 algorithm and try to predict the pip movement over the next 5 minutes. You should have R and RStudio installed on your computer, along with the quantmod and C50 packages that the code loads, and you should be familiar with the R language if you want to understand the code below. Download the EURUSD1.csv file from MT4 and read it into R. Below is the R code that implements our algorithmic trading strategy. If we are successful in predicting the pip movement correctly, we will use this algorithm to build a binary options trading indicator. You can download this R Autoregression Indicator FREE.

> ### Decision Tree C5.0 ###
> # import the M1 data exported from MT4 (the file has no header row)
> data <- read.csv("E:/MarketData/EURUSD1.csv", header = FALSE)
> colnames(data) <- c("Date", "Time", "Open", "High",
+                     "Low", "Close", "Volume")
> x1 <- nrow(data)
> 
> # convert the data to an n-minute timeframe (n = 5 gives M5 bars)
> n <- 5
> # lookback: number of M5 bars to build
> lb <- 300
> # minimum move in pips used for the class boundaries
> pip <- 2
> 
> # data frame that will hold the M5 bars
> data1 <- data.frame(matrix(0, ncol = 6, nrow = lb))
> colnames(data1) <- c("Date", "Time", "Open", "High",
+                      "Low", "Close")
> 
> # aggregate each block of n M1 bars into one M5 bar; bar k uses M1 rows
> # from..to, and the most recent row (x1) is not included
> for (k in 1:lb)
+ {
+   from <- x1 - lb*n + n*(k-1)   # first M1 row of bar k
+   to   <- x1 - lb*n + n*k - 1   # last M1 row of bar k
+   data1[k,1] <- as.character(data[to,1])   # date of the last M1 row
+   data1[k,2] <- as.character(data[to,2])   # time of the last M1 row
+   data1[k,3] <- data[from,3]               # Open of the first M1 row
+   data1[k,6] <- data[to,6]                 # Close of the last M1 row
+   data1[k,4] <- max(data[from:to, 4])      # highest High
+   data1[k,5] <- min(data[from:to, 5])      # lowest Low
+ }
> 
> # quantmod provides xts, the TTR indicators and candleChart()
> library(quantmod)
> 
> # convert to an xts time series indexed by each bar's timestamp
> data2 <- as.xts(data1[,-(1:2)], as.POSIXct(paste(data1[,1],data1[,2]),
+                                            format='%Y.%m.%d %H:%M'))
> 
> # technical indicators (TTR package) used as the tree's input features;
> # data2[,2:4] is High, Low, Close and data2[,2:3] is High, Low
> data2$rsi <- RSI(data2$Close)
> data2$MACD <- MACD(data2$Close)
> data2$will <- williamsAD(data2[,2:4])
> data2$cci <- CCI(data2[,2:4])
> data2$STOCH <- stoch(data2[,2:4])
> data2$Aroon <- aroon(data2[, 2:3])
> data2$ATR <- ATR(data2[, 2:4])
> 
> # log return of each M5 bar (column 20 of data2)
> data2$Return <- diff(log(data2$Close))
> 
> # plot the last 2 hours of M5 candles
> candleChart(data2[,1:4],
+             theme='white', type='candles', subset='last 2 hour')
> 
> # shift the returns up by one row so that each row holds the return of
> # the NEXT bar, which is what we want to predict; the last row keeps its
> # own return because the next bar has not formed yet
> for (i in 1:(lb-1))
+ {
+   data2[i,20] <- data2[i+1,20]
+ }
> 
> data3 <- as.data.frame(data2)
> 
> # class boundaries as log returns: 2, 6 and 12 pips (1 pip = 0.0001)
> rr1 <- (pip/10000)/data1[lb, 6]
> rr2 <- 3*(pip/10000)/data1[lb, 6]
> rr3 <- 6*(pip/10000)/data1[lb, 6]
> 
> # convert the next-bar returns into the Direction factor
> nn <- ncol(data3)
> data3$Direction <- as.factor(ifelse(data3[ ,nn] > rr3, 4,
+                              ifelse(data3[ ,nn] > rr2, 3,
+                              ifelse(data3[ ,nn] > rr1, 2,
+                              ifelse(data3[ ,nn] > 0, 1,
+                              ifelse(data3[ ,nn] > -rr1, -1,
+                              ifelse(data3[ ,nn] > -rr2, -2, -3)))))))
> 
> str(data3)
'data.frame':	300 obs. of  21 variables:
 $ Open     : num  1.11 1.11 1.11 1.11 1.11 ...
 $ High     : num  1.11 1.11 1.11 1.11 1.11 ...
 $ Low      : num  1.11 1.11 1.11 1.11 1.11 ...
 $ Close    : num  1.11 1.11 1.11 1.11 1.11 ...
 $ rsi      : num  NA NA NA NA NA NA NA NA NA NA ...
 $ macd     : num  NA NA NA NA NA NA NA NA NA NA ...
 $ MACD     : num  NA NA NA NA NA NA NA NA NA NA ...
 $ will     : num  NA 0.00035 0.00028 0.00049 0.00069 ...
 $ cci      : num  NA NA NA NA NA NA NA NA NA NA ...
 $ fastK    : num  NA NA NA NA NA NA NA NA NA NA ...
 $ fastD    : num  NA NA NA NA NA NA NA NA NA NA ...
 $ STOCH    : num  NA NA NA NA NA NA NA NA NA NA ...
 $ aroonUp  : num  NA NA NA NA NA NA NA NA NA NA ...
 $ aroonDn  : num  NA NA NA NA NA NA NA NA NA NA ...
 $ Aroon    : num  NA NA NA NA NA NA NA NA NA NA ...
 $ tr       : num  NA 0.00036 0.00012 0.00026 0.00023 ...
 $ atr      : num  NA NA NA NA NA NA NA NA NA NA ...
 $ trueHigh : num  NA 1.11 1.11 1.11 1.11 ...
 $ ATR      : num  NA 1.11 1.11 1.11 1.11 ...
 $ Return   : num  3.15e-04 -3.60e-05 9.00e-05 1.71e-04 9.89e-05 ...
 $ Direction: Factor w/ 6 levels "-3","-2","-1",..: 5 3 4 4 4 3 4 3 3 3 ...
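> 
> # drop the numeric Return column; Direction is now column nn
> data3 <- data3[,-nn]
> 
> # load the C5.0 library
> library(C50)
> 
> # train a C5.0 decision tree on rows 100 to lb-1 (200 bars); leaving out
> # the first 99 rows also skips the indicators' NA warm-up period
> fit <- C5.0( Direction ~., data = data3[100:(lb-1) ,1:(nn)] )
> 
> # assess variable importance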
> C5imp(fit)
         Overall
will       100.0
tr          93.0
MACD        91.0
aroonUp     49.0
High        48.5
atr         37.0
rsi         32.0
aroonDn     29.0
trueHigh    26.5
macd        25.0
STOCH       25.0
fastK       21.0
fastD       20.0
Aroon       19.5
Low         11.5
Open         7.0
Close        2.0
cci          0.0
ATR          0.0
> 
> ## predict the class of the next 5-minute move
> pred <-predict (fit , newdata =data3[lb, 1:(nn-1)], type ="class")
> pred
[1] -1
Levels: -3 -2 -1 1 2 3
> 
> ## predict the probability of each class
> pred <-predict(fit , newdata =data3[lb, 1:(nn-1)], type ="prob")
> pred
                    -3        -2        -1          1          2           3
2016-11-04 09:14:00  0 0.1635714 0.7685714 0.04214286 0.02428571 0.001428571
> 
> data1[lb,2]
[1] "09:14"
> data1[lb,6]
[1] 1.11045
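The class probabilities printed above can be summarized in a few lines of base R. This sketch simply copies the printed probabilities by hand; `probs` and `class_range` are our own names, and the pip ranges follow the thresholds used in the R listing. Besides picking the most probable class, it adds up the three down classes to get the probability of a down move of any size.

```r
# Class probabilities copied from the predict(..., type = "prob") output above
probs <- c("-3" = 0, "-2" = 0.1635714, "-1" = 0.7685714,
           "1" = 0.04214286, "2" = 0.02428571, "3" = 0.001428571)

# Pip range of each class, following the thresholds in the R listing
class_range <- c("-3" = "down by more than 6 pips", "-2" = "down by 2 to 6 pips",
                 "-1" = "down by less than 2 pips", "1" = "up by less than 2 pips",
                 "2" = "up by 2 to 6 pips", "3" = "up by 6 to 12 pips",
                 "4" = "up by more than 12 pips")

best   <- names(which.max(probs))            # most probable class
p_down <- sum(probs[c("-3", "-2", "-1")])    # any move down, whatever its size

cat(sprintf("most likely: class %s (%s), p = %.2f\n",
            best, class_range[[best]], probs[[best]]))
cat(sprintf("probability of a down move of any size: %.2f\n", p_down))
# most likely: class -1 (down by less than 2 pips), p = 0.77
# probability of a down move of any size: 0.93
```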

In the R session above, R first aggregated the 1-minute data into 5-minute bars and then plotted the last two hours of them as the following candlestick chart.

[Figure: candlestick chart of the last 2 hours of EURUSD 5-minute bars]
It then built the decision tree with the C5.0 algorithm and predicted, with a probability of about 77%, that over the next 5 minutes price would move down by less than 2 pips (class -1). Read this post in which we show you how to code a MACD Signal Line Trailing Stop Loss MQL4 EA. Below is the summary that shows the details of the decision tree model. In the summary you can see all the branches of the decision tree. The number in parentheses after each leaf is the number of training cases that reach that leaf; when a second number appears, as in (7/2), it is the number of those cases that the leaf misclassifies.

> summary(fit)

Call:
C5.0.formula(formula = Direction ~ ., data = data3[100:(lb - 1), 1:(nn)])


C5.0 [Release 2.07 GPL Edition]  	Fri Nov 04 15:44:42 2016
-------------------------------

Class specified by attribute `outcome'

Read 200 cases (20 attributes) from undefined.data

Decision tree:

will <= -0.00123:
:...aroonUp <= 95: 2 (10)
:   aroonUp > 95:
:   :...tr <= 0.00037: -1 (3/1)
:       tr > 0.00037: 1 (5)
will > -0.00123:
:...MACD > 0.003086833:
    :...atr <= 0.000238554:
    :   :...Open <= 1.10999: -1 (2/1)
    :   :   Open > 1.10999: -2 (2)
    :   atr > 0.000238554:
    :   :...tr > 0.00063:
    :       :...aroonDn <= 5: -1 (3)
    :       :   aroonDn > 5: 1 (2/1)
    :       tr <= 0.00063:
    :       :...rsi > 71.71148: -2 (4)
    :           rsi <= 71.71148:
    :           :...High <= 1.11091:
    :               :...STOCH <= 0.7129818:
    :               :   :...fastD <= 0.4893098:
    :               :   :   :...STOCH <= 0.4705882: 2 (7/2)
    :               :   :   :   STOCH > 0.4705882: 1 (3)
    :               :   :   fastD > 0.4893098:
    :               :   :   :...Open <= 1.11048: 2 (8)
    :               :   :       Open > 1.11048: -2 (2)
    :               :   STOCH > 0.7129818:
    :               :   :...STOCH <= 0.8551427: -1 (6/1)
    :               :       STOCH > 0.8551427:
    :               :       :...rsi <= 68.0887: 1 (4)
    :               :           rsi > 68.0887:
    :               :           :...Close <= 1.10906: -1 (2)
    :               :               Close > 1.10906: 2 (2)
    :               High > 1.11091:
    :               :...tr > 0.00051:
    :                   :...macd <= 0.0344471: 1 (3)
    :                   :   macd > 0.0344471: -1 (2/1)
    :                   tr <= 0.00051:
    :                   :...aroonDn > 65: -1 (3)
    :                       aroonDn <= 65:
    :                       :...aroonDn > 0: -2 (9/1)
    :                           aroonDn <= 0:
    :                           :...tr <= 0.00022: -2 (2/1)
    :                               tr > 0.00022: -1 (3)
    MACD <= 0.003086833:
    :...tr > 0.00024:
        :...macd > -0.003408632: -1 (10/1)
        :   macd <= -0.003408632:
        :   :...aroonDn <= 65: 1 (3)
        :       aroonDn > 65:
        :       :...fastD > 0.4549763: 2 (4/1)
        :           fastD <= 0.4549763:
        :           :...STOCH > 0.07493839: -1 (11/3)
        :               STOCH <= 0.07493839:
        :               :...atr <= 0.0002172921: 2 (2)
        :                   atr > 0.0002172921: 1 (3/1)
        tr <= 0.00024:
        :...aroonUp <= 10:
            :...tr > 0.00016: 1 (10/1)
            :   tr <= 0.00016:
            :   :...Aroon <= -60: -1 (4)
            :       Aroon > -60:
            :       :...aroonDn > 45: 1 (5)
            :           aroonDn <= 45:
            :           :...fastK <= 0.212766: 1 (2)
            :               fastK > 0.212766: -1 (6)
            aroonUp > 10:
            :...trueHigh > 1.11063:
                :...macd <= -0.003419162: 1 (3/1)
                :   macd > -0.003419162: -1 (9/1)
                trueHigh <= 1.11063:
                :...High > 1.11022:
                    :...Aroon > -10: 1 (3)
                    :   Aroon <= -10:
                    :   :...rsi <= 43.17558: 1 (2)
                    :       rsi > 43.17558: -2 (2)
                    High <= 1.11022:
                    :...fastK <= 0.1308411: -2 (4/1)
                        fastK > 0.1308411:
                        :...MACD > -0.001937495:
                            :...Aroon <= 45: 1 (5/1)
                            :   Aroon > 45: -2 (2/1)
                            MACD <= -0.001937495:
                            :...Low > 1.10943: -1 (15/1)
                                Low <= 1.10943:
                                :...Aroon > -25: 1 (4)
                                    Aroon <= -25:
                                    :...Low <= 1.1092: 1 (2)
                                        Low > 1.1092: -1 (2)


Evaluation on training data (200 cases):

	    Decision Tree   
	  ----------------  
	  Size      Errors  

	    45   22(11.0%)   <<


	   (a)   (b)   (c)   (d)   (e)   (f)    <-classified as
	  ----  ----  ----  ----  ----  ----
	                                        (a): class -3
	          23     3     2     1          (b): class -2
	           2    71     2     1          (c): class -1
	           1     3    54     1          (d): class 1
	           1     3          30          (e): class 2
	                 1     1                (f): class 3


	Attribute usage:

	100.00%	will
	 93.00%	tr
	 91.00%	MACD
	 49.00%	aroonUp
	 48.50%	High
	 37.00%	atr
	 32.00%	rsi
	 29.00%	aroonDn
	 26.50%	trueHigh
	 25.00%	macd
	 25.00%	STOCH
	 21.00%	fastK
	 20.00%	fastD
	 19.50%	Aroon
	 11.50%	Low
	  7.00%	Open
	  2.00%	Close


Time: 0.0 secs
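The 11.0% error figure can be checked by hand from the confusion matrix in the summary: rows are the true classes, columns are the classes the tree assigned, and every off-diagonal count is an error. The sketch below copies those counts into a base-R matrix (the name `cm` is ours) and recomputes the error rate and the share of each true class that the tree labels correctly.

```r
classes <- c("-3", "-2", "-1", "1", "2", "3")
cm <- matrix(c(
  0,  0,  0,  0,  0, 0,   # true class -3 (no cases)
  0, 23,  3,  2,  1, 0,   # true class -2
  0,  2, 71,  2,  1, 0,   # true class -1
  0,  1,  3, 54,  1, 0,   # true class  1
  0,  1,  3,  0, 30, 0,   # true class  2
  0,  0,  1,  1,  0, 0    # true class  3
), nrow = 6, byrow = TRUE,
dimnames = list(true = classes, predicted = classes))

n_cases    <- sum(cm)                    # 200 training cases
n_errors   <- n_cases - sum(diag(cm))    # off-diagonal counts: 22 errors
error_rate <- n_errors / n_cases         # 0.11, the 11.0% in the summary

# share of each true class labelled correctly (NaN for -3, which has no cases)
recall <- diag(cm) / rowSums(cm)
round(recall, 3)
```

Because these counts come from the same 200 bars the tree was trained on, they describe the fit to the training data, not accuracy on new bars.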

This is the decision tree model that we built. It predicted that price would go down, but by less than 2 pips. In reality, price went up 1 pip instead of going down, so we need to improve on this mistake. Keep in mind that the 11% error rate in the summary is measured on the same 200 bars the tree was trained on, so it is an optimistic estimate of how the tree will do on new bars. We can improve our decision tree model either by using boosting or by using a cost function that penalizes the model for making a wrong decision. More on this in later posts. Still, we think this prediction was not bad: price did move by about 1 pip, just up instead of down. The error is not big, and we think we can improve the performance of our decision tree model further.
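Both improvements are options of the C50 package's `C5.0()` function: `trials` turns on boosting, and `costs` takes a misclassification cost matrix. The sketch below is only an illustration under assumptions: it uses a small synthetic data frame in place of `data3`, three made-up classes (`down`, `flat`, `up`) and hypothetical cost values of 4, so it is not a tuned model. As we read the C50 documentation, rows of the cost matrix are the predicted classes and columns are the true classes.

```r
library(C50)

set.seed(1)
# Synthetic stand-in for data3: two numeric features and a 3-class target
n_obs <- 300
train <- data.frame(f1 = rnorm(n_obs), f2 = rnorm(n_obs))
score <- train$f1 + rnorm(n_obs, sd = 0.5)
train$Direction <- cut(score, breaks = c(-Inf, -0.5, 0.5, Inf),
                       labels = c("down", "flat", "up"))

# 1) Boosting: up to 10 trees, each giving more weight to the cases
#    that earlier trees misclassified
fit_boost <- C5.0(Direction ~ ., data = train, trials = 10)

# 2) Cost-sensitive tree: rows = predicted class, columns = true class
lv <- levels(train$Direction)
cost_mat <- matrix(1, nrow = 3, ncol = 3, dimnames = list(lv, lv))
diag(cost_mat) <- 0
cost_mat["up", "down"] <- 4   # hypothetical: predicting "up" when price fell
cost_mat["down", "up"] <- 4   # hypothetical: predicting "down" when price rose
fit_cost <- C5.0(Direction ~ ., data = train, costs = cost_mat)

newbars    <- train[1:5, c("f1", "f2")]
pred_boost <- predict(fit_boost, newdata = newbars, type = "class")
pred_cost  <- predict(fit_cost,  newdata = newbars, type = "class")
```

Note that, according to the C50 documentation, class probabilities (`type = "prob"`) cannot be requested from a model fitted with unequal costs, so a cost-sensitive tree only gives class predictions.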

All the calculations were completed within 1 second, which means we can easily use this algorithm for trading 5-minute binary options. We just need to connect this algorithm with MT4 so that data is transferred between MT4 and R seamlessly in milliseconds, and then we get our results in less than 1 second. Did you read the post in which we showed how to connect R with MT4 for machine learning? The problem with MT4 is that the MQL4 language lacks powerful machine learning and artificial intelligence libraries. By connecting R with MT4 we can overcome this problem and do machine learning right within MT4.