In the age of big data analytics, it is difficult to think of a company that does not rely on decision sciences. Decision sciences play a pivotal role in helping organisations of all kinds make decisions that are in tune with the pulse of the market, from evaluating prospective investment opportunities to forecasting growth cycles. One of the most important algorithms used across decision sciences, big data analytics, and machine learning is the decision tree. Decision trees have become extremely popular in predictive analytics, and they are an important part of big data courses in Bangalore and the rest of the country. In this article, we take a close look at the fundamentals of decision trees.
A decision tree specifies a sequence of tests to follow when predicting the outcome of an event: given the values of the input variables, the tree determines the value of an output variable. A decision tree comprises several nodes and branches, which together determine the decision taken at each step. When we arrive at an end node where the branches stop, we say we have reached a leaf node. The main function of a leaf node is to return a probability score, which can later be transformed into decision rules. Decision trees can broadly be classified into two types: classification trees and regression trees.
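The node-and-branch structure described above can be sketched in a few lines of Python. This is a minimal, hand-built illustration, not a production implementation; the Node class, the feature names ("income", "age"), the thresholds, and the probability values are all hypothetical choices made for the example.

```python
class Node:
    """One node of a decision tree (illustrative sketch)."""
    def __init__(self, feature=None, threshold=None, left=None,
                 right=None, probability=None):
        self.feature = feature          # input variable tested at this node
        self.threshold = threshold      # split point for that variable
        self.left = left                # branch taken when value <= threshold
        self.right = right              # branch taken when value > threshold
        self.probability = probability  # set only on leaf nodes

    def is_leaf(self):
        return self.probability is not None


def predict(node, sample):
    """Follow the branches down to a leaf, then return its probability score."""
    while not node.is_leaf():
        if sample[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.probability


# Toy tree: first test income, then (for higher incomes) test age.
tree = Node(feature="income", threshold=50,
            left=Node(probability=0.2),
            right=Node(feature="age", threshold=30,
                       left=Node(probability=0.6),
                       right=Node(probability=0.9)))

print(predict(tree, {"income": 40, "age": 25}))  # follows the left branch -> 0.2
print(predict(tree, {"income": 80, "age": 45}))  # right, then right -> 0.9
```

Each prediction is just a walk from the root down one branch per decision, ending at a leaf whose probability score can then be thresholded into a decision rule.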
The first type, the classification tree, typically produces a categorical output such as yes or no. These trees are used for binary decisions, for example whether to invest in or divest from a particular company. Regression trees, on the other hand, are used where the output variable takes continuous values. For instance, a regression tree can estimate the likelihood that a particular customer will purchase a given item. The applications of decision trees, however, are far broader than these examples.
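The contrast between the two types can be seen with scikit-learn, which provides both kinds of tree. This is a sketch on made-up data, assuming scikit-learn is installed; the features (age, income), the labels, and the max_depth setting are illustrative assumptions, not a recommendation.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Hypothetical data: two features (age, income) per customer.
X = [[25, 40], [30, 60], [45, 80], [50, 30], [35, 70], [60, 90]]
y_class = [0, 0, 1, 0, 1, 1]                # yes/no decision (e.g. invest or not)
y_reg = [0.1, 0.2, 0.8, 0.15, 0.7, 0.95]    # continuous purchase likelihood

# Classification tree: output is a discrete class label.
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y_class)
# Regression tree: output is a continuous value.
reg = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y_reg)

print(clf.predict([[48, 85]]))  # a class label, 0 or 1
print(reg.predict([[48, 85]]))  # a continuous estimate
```

The two models are trained the same way; the difference lies entirely in the output: the classifier returns one of the training classes, while the regressor returns a real-valued average over the matching leaf.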
Two important terms associated with decision trees are entropy and information gain. Entropy measures the level of disorder, or impurity, in a sample: whenever we need to quantify how mixed a sample is, entropy is one of the best measures to use. Information gain, in turn, measures how much a split improves purity: given a parent node and its child nodes, it is the reduction in entropy from before the split to after it. When comparing candidate splits, the attribute that yields the highest information gain is preferred.
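Both quantities can be computed directly from label counts. The sketch below uses the standard definitions (Shannon entropy in bits, and information gain as parent entropy minus the size-weighted entropy of the children); the sample labels are made up for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Impurity of a sample: 0 for a pure sample, 1 for a 50/50 binary mix."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())

def information_gain(parent, children):
    """Reduction in entropy achieved by splitting `parent` into `children`."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes"] * 5 + ["no"] * 5        # maximally mixed: entropy = 1.0
split = [["yes"] * 4 + ["no"],           # left child: mostly "yes"
         ["no"] * 4 + ["yes"]]           # right child: mostly "no"

print(entropy(parent))                   # 1.0
print(information_gain(parent, split))   # positive: the split increases purity
```

A pure sample (all one class) has entropy 0, so a split that separates the classes perfectly would achieve an information gain equal to the parent's entire entropy.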
Decision trees have a lot of applications in the age of big data analytics. Such applications are slated to increase as the volumes of data increase and decision-making becomes a complex exercise.