Introducing a customizable and interactive Decision Tree framework written in Python
Fast Track
- Repository: https://github.com/Mereep/HDTree
- Complementary notebook: inside the examples directory of the repository or directly here (every illustration you see here will be generated in the notebook; you will be able to create them on your own)
What's inside the story?
This story will introduce yet another implementation of Decision Trees, which I wrote as part of my thesis. The work is divided into three chapters as follows:
Firstly, I will try to motivate why I decided to take the time to come up with my own implementation of Decision Trees; I will list some of its features, but also the disadvantages of the current implementation.
Secondly, I will guide you through the basic usage of HDTree using code snippets, explaining some details along the way.
Lastly, there will be some hints on how to customize and extend HDTree with your own chunks of ideas.
However, this article will not guide you through all of the basics of Decision Trees. There are plenty of resources out there [1][2][3][16]; there is no need to repeat all of that again, and others have done it better than I could. You don't need to be an expert in Decision Trees to understand this article: a basic level of understanding should be sufficient to follow along. Some experience in the ML domain is a plus, though.
Motivation & Background
For my work I came to deal with Decision Trees. My actual goal is to implement a human-centric ML model, where HDTree (Human Decision Tree, for that matter) is an optional ingredient which is used as part of an actual user interface for that model. While this story solely focuses on HDTree, I might write a follow-up describing the other components in detail.
Features of HDTree & Comparison with scikit-learn Decision Trees
Naturally, I stumbled upon the scikit-learn implementation [4] of decision trees. I guess many practitioners do. And let's make something clear from the beginning: nothing is wrong with it.
The scikit-learn implementation has a lot of pros:
- it's fast & optimized
  - The implementation is written in a dialect called Cython [5]. Cython compiles to C code (which in turn compiles to native code) while maintaining interoperability with the Python interpreter.
- it's easy to use and convenient
  - Many people in the ML domain know how to work with scikit-learn models. You will easily find help everywhere due to its large user base.
- it's battle-tested (a lot of people are using it)
  - It just works.
- it supports many pre-pruning and post-pruning [6] methods and provides many features (e.g., Minimal Cost-Complexity Pruning [3] or sample weights)
- it supports basic visualization [7]
That said, surely it also has some shortcomings:
- it's not trivial to modify, partly due to the usage of the rather uncommon Cython dialect (see advantages above)
- no way to incorporate user knowledge about the domain or to modify the learning process
- the visualization is rather minimalistic
- no support for categorical attributes / features
- no support for missing values
- the interface for accessing nodes and traversing the tree is cumbersome and not intuitive
- only binary splits (see later)
- no multivariate splits (see later)
Features of HDTree
HDTree comes with a solution to most of the shortcomings in the above list, while sacrificing many of the advantages of the scikit-learn implementation. We will come back to those points later, so don't worry if you do not understand every part of the following list yet:
✅ interact with the learning behavior
✅ core components are modular and fairly easy to extend (implement an interface)
✅ purely written in Python (more approachable)
✅ rich visualization
✅ support for categorical data
✅ support for missing values
✅ support for multivariate splits
✅ easy interface to navigate through the tree structure
✅ support for n-ary splits (> 2 child nodes)
✅ textual representations of decision paths
✅ encourages explainability by printing human-readable text
-------------------------------------------------------------------------------------------------
❌ slow
❌ not battle-tested (it will have bugs)
❌ mediocre software quality
❌ not so many pruning options (it supports some basic options, though)
⚠️ Although the disadvantages may not seem too numerous, they are critical. Let us make that clear right away: do not throw big data at it, you will wait forever. Do not use it in production, it may break unexpectedly. You have been warned! ⚠️
Some of these problems may get fixed over time. However, the training speed will probably remain slow (inference is okay, though); you would have to come up with a better solution to fix that. You are very welcome to contribute.
That said, what would be possible use cases?
- extract knowledge from your data
- test the intuition you have about your data
- understand the inner workings of decision trees
- explore alternative causal relationships regarding your learning problem
- use it as part of your more complex algorithms
- create reports and visualizations
- use it for any research-related purposes
- have an accessible platform to easily test your ideas for decision tree algorithms
Structure of Decision Trees
Although this work will not review decision trees in detail, we will recap their basic building blocks. This will provide a basis for understanding the examples later and also highlights some of HDTree's features. The following graphic is an actual output of HDTree (except for the markers).
a (Nodes)
ai: textual description of the test / split rule which was used at this node to separate the data into its children. It displays the relevant attribute(s) and the verbal description of the operation. These tests are highly configurable and can include any data-separating logic you can come up with. Development of your own custom rules is supported by implementing an interface. More on that in section 3.
aii: the score of the node measures its pureness, i.e., how well the data that passes through the node is separated. The higher the score, the better. High scores are also represented by the node's background color: the more greenish, the higher the score (white means impure, i.e., equally distributed classes). These scores direct the induction (building) process of the tree and are a modular and exchangeable component of HDTree.
aiii: the node's border indicates how many data points are passing through this node. The thicker the border, the more data is passing through that node.
aiv: list of the targets / labels (i.e., the prediction goal) of the data points passing that node. The most common class is marked.
av: optionally, a visualization can mark the path that an individual data point follows (illustrating the decisions that are made when the data point passes through the tree). This is marked with a line at the corner of the decision tree.
b (Edges)
bi: the arrows connect each possible outcome of the split (ai) with its child nodes. The more data (relative to the parent) is 'flowing' along an edge, the thicker it is drawn.
bii: each edge has a readable textual representation of the split's corresponding outcome.
Why different splits / tests?
At this point you might already wonder what is different about HDTree compared to a scikit-learn tree (or any other implementation), and why we might want to have different kinds of splits. Let's try to make this clear.
Maybe you have an intuitive understanding of the notion of feature space. All the data we are working with lies in some multi-dimensional space which is defined by the number and the type of attributes / features your data has.
The task of a classification algorithm now is to partition this space into non-overlapping regions and to assign each region a class / target / outcome, whatever you like to call it. Let's visualize that. Since our brains have trouble fiddling with high dimensionality, we will stick with a 2-dimensional example and a very simple 2-class problem as follows:
You see a very simple data set made of two dimensions (features / attributes) and two classes. The generated data points are normally distributed around the center. A street, which is just the linear function f(x) = y, separates the two classes into Class 1 (south east) and Class 2 (north west). Also, some random noise (blue data points in the orange area and vice versa) was added in order to illustrate the effects of overfitting later on.
The task of a classification algorithm like HDTree (though it can also be used for regression tasks) is to learn which class each data point belongs to. In other words: given some pair of coordinates (x, y), like (6, 2), the task is to learn whether this coordinate belongs to the orange Class 1 or the blue Class 2. A discriminative model will try to separate the feature space (here, the (x, y) axes) into blue and orange territories, respectively.
Given that data, the decision (rules) on how the data should be separated (classified) seems very easy. A reasonable human being would say (think on your own first):
"It's Class 1 if x > y, otherwise Class 2."
The perfect separation would be created by the function y = x, which is illustrated as a dashed reference line. Indeed, a large margin classifier like a support vector machine [8] would come up with a similar solution. But let's see what decision trees tend to learn instead.
The image illustrates the regions in which a standard decision tree with increasing depth would classify a data point as Class 1 (orange) or Class 2 (blue).
A decision tree will approximate the linear function using a step function
This is due to the type of test / split rule which decision trees use. They all follow the scheme attribute < threshold, which results in axis-parallel hyperplanes. In 2D space it is just like rectangles getting 'cut out'; in 3D space it would be cuboids, and so on.
Furthermore, the decision tree already starts to model the noise within the data at 8 levels, i.e., it is overfitting. While doing so, it never finds a good approximation of the real linear function. To verify this, I used a typical train/test split (2:1) of the data and evaluated the trees' scores, which are 93.84%, 93.03%, 90.81% for the test set and 94.54%, 96.57%, 98.81% for the training set (ordered by tree depth 4, 8, 16). While the test accuracy is decreasing, the training accuracy is increasing.
Having an increasing training performance and a dropping test performance is the hallmark of overfitting
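For reference, this kind of depth comparison can be reproduced with plain scikit-learn along the following lines (the data generation below is only a stand-in for the toy "street" data set shown above, not the exact data used for the reported numbers):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the toy data: points around the origin, labeled by the rule x > y,
# with a small fraction of labels flipped to simulate noise
rng = np.random.default_rng(42)
X = rng.normal(loc=0.0, scale=3.0, size=(1000, 2))
y = (X[:, 0] > X[:, 1]).astype(int)
flip = rng.random(len(y)) < 0.05
y[flip] = 1 - y[flip]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)

for depth in (4, 8, 16):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(f"depth={depth}: train={tree.score(X_train, y_train):.2%}, "
          f"test={tree.score(X_test, y_test):.2%}")
```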
The resulting decision trees are also quite complex for such a simple function. The simplest of them (depth 4), visualized by scikit-learn, already looks like this:
Quite a bummer; I will save you from the more complex ones. The next section will start by solving that problem using the HDTree package. HDTree will enable the user to incorporate knowledge about the data (like knowing it is linearly separable in this example). It will also allow you to play with alternative solutions to your problem.
Using the HDTree package 🌲
This section will guide you through the basics of HDTree. I will try to touch on some parts of its API. Please feel free to ask in the comments section or contact me if you have any questions about it. I will happily answer and extend the article if needed.
Installing HDTree 💻
This is slightly more complicated than pip install hdtree. Sorry.
- Create an empty directory and a folder named "hdtree" inside it (your_folder/hdtree)
- Download or clone the repository into the hdtree directory (not into a sub-subdirectory)
- Install the necessary dependencies / packages (numpy, pandas, graphviz, sklearn, Python ≥ 3.5)
- Add your_folder to your Python search path. This will include it in Python's import mechanism. You can then use it like a common Python package.
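For example, from a notebook or script you could make the package importable like this (the path is a placeholder and the import name is assumed from the package layout):

```python
import sys

# Adjust to the parent folder that contains the hdtree/ directory (placeholder path)
sys.path.append('/path/to/your_folder')

from hdtree import HDTreeClassifier  # import name assumed from the package layout
```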
Alternatively, you could add hdtree to the site-packages folder of your Python installation. I might add an installation file later. At the moment of writing, the code is not available in the pip repository.
All code that generates the graphics and outputs below (and also those seen previously) is inside the repository or directly hosted here.
Solving this linear problem with a one-level decision tree
Let's start with some code right away:
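The exact snippet lives in the repository's notebook; a minimal sketch of it could look like the following (the .build() factory, the precise constructor argument names and the X_street_* variables are assumptions on my part; the class names are the ones discussed below):

```python
from hdtree import HDTreeClassifier, SmallerThanSplit, EntropyMeasure

# Only allow the multivariate "a_i < a_j" split and score nodes using entropy
hdtree_linear = HDTreeClassifier(
    allowed_splits=[SmallerThanSplit.build()],   # assumed factory helper
    information_measure=EntropyMeasure(),
    attribute_names=['x', 'y'])

hdtree_linear.fit(X_street_train, y_street_train)    # the toy data set from above
print(hdtree_linear.score(X_street_test, y_street_test))
```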
Yes, the resulting tree is only one level high and has the ideal solution to the problem. Admittedly, this example is somewhat staged to show the effect. Still, I hope it makes the point clear: having an intuition about the data, or just providing the decision tree with various options to partition the feature space, it might come up with a simpler and sometimes even more accurate solution.
Imagine having to interpret the rules of the trees presented so far in order to find some actionable information. Which one can you understand first, and which one would you trust more? The complicated one which uses multiple step functions, or the small accurate tree? I guess the answer is quite simple.
But let's dig a bit into the actual code. When initializing the HDTreeClassifier, the most important thing you'll have to provide is allowed_splits. There, you provide a list containing the possible tests / split rules which the algorithm tries at construction / training time for every node in order to find a good local partition of the data.
In this case we solely supplied a SmallerThanSplit. This split does exactly what you see: it takes two attributes (it will try any combination) and separates the data in the scheme a_i < a_j, which (not too much by chance) fits the given data as well as it possibly could. This type of split is called a multivariate split, which means the split uses more than one attribute for the partition decision. This is unlike the univariate splits that are used in most other trees, like the scikit-tree (see above for details), which take exactly one attribute into account. Surely, HDTree also has options to achieve a "normal split" like the ones in the scikit-trees (the QuantileSplit family). We will learn more splits along our journey.
The other unfamiliar thing you might see in the code is the model's hyperparameter information_measure. There, you provide the model with the measure that is used to estimate the value of a single node or of a complete split (a parent node with its children). The chosen option here is based on entropy [10]. Maybe you have also heard of the Gini index, which would be another valid option. Surely enough, you can provide your very own information measures by implementing the appropriate interface. If you'd like: go ahead and implement the Gini index, which you could drop directly into the tree without re-implementing anything else, by copying the EntropyMeasure and adapting it.
Let's dig a bit deeper - The Titanic disaster 🚢
I am kind of a fan of learning by example. That's why I think I will show you some more of the tree's features using a concrete example instead of just some generated data.
The data set
We will go on using the famous 101 machine learning data set: the Titanic disaster data set. We do this for convenience: it is quite an easy data set which is not too big, but it features multiple different data types and missing values while not being entirely trivial. On top of that, it is understandable by a human and many people have already worked with it. To bring us all on the same page, the data looks like the following:
You might notice that there are all sorts of attributes: numerical, categorical, integer types and even missing values (look at the Cabin column). The task is to predict whether a passenger survived the Titanic disaster from the given passenger information. A description of the attribute meanings is given here.
When you walk through ML tutorials using that data set, they will do all sorts of preprocessing in order to be able to use it with the common machine learning models, e.g., removing missing values (NaN) by imputing [12] or dropping rows / columns, one-hot-encoding [13] categorical data (e.g., Embarked & Sex), or binning values in order to have a valid data set which the ML model accepts.
This is technically not necessary for HDTree. You can feed the data as is and it will happily accept it. Only change the data if you do some actual feature engineering. That's an actual simplification to get started.
Train the first HDTree on the Titanic data
We will just take the data as is and feed it to the model. The basic code is like the one above; however, in this example there will be way more splits allowed.
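Again, the actual snippet is in the examples notebook; a rough sketch could look like this (the .build() factories, the variable names and the max_levels parameter are assumptions; the split class names are the ones explained below):

```python
from hdtree import (HDTreeClassifier, FixedValueSplit, SingleCategorySplit,
                    TwentyQuantileRangeSplit, TwentyQuantileSplit, EntropyMeasure)

hdtree_titanic = HDTreeClassifier(
    allowed_splits=[FixedValueSplit.build(),           # S1: categorical, value == v vs. value != v
                    SingleCategorySplit.build(),       # S4: one child per category
                    TwentyQuantileRangeSplit.build(),  # S2: value inside / outside a quantile interval
                    TwentyQuantileSplit.build()],      # S3: value below / above a quantile threshold
    information_measure=EntropyMeasure(),
    attribute_names=titanic_column_names,              # column names of the Titanic data frame
    max_levels=3)

hdtree_titanic.fit(X_titanic_train, y_titanic_train)
hdtree_titanic.generate_dot_graph()                    # visualization helper; method name assumed
```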
Let's dive a bit into what we see. We have created a decision tree with three levels, which chose to use 3 out of the 4 possible SplitRules. They are marked with S1, S2 and S3, respectively. I will shortly explain what they do.
- S1: FixedValueSplit. This split works on categorical data and picks one of the possible values. The data is then partitioned into one part having that value set and one part not having it set, e.g., PClass = 1 and PClass ≠ 1.
- S2: (Twenty)QuantileRangeSplit. These work on numerical data. They divide the relevant value range of the evaluated attribute into a fixed number of quantiles [14] and span intervals which range over consecutive subsets thereof (e.g., quantile 1 to quantile 5). Each quantile includes the same number of data points. The start quantile and the end quantile (= the size of the interval) are searched so as to optimize the information measure. The data is split into (i) having the value within that interval or (ii) outside of it. Splits for different numbers of quantiles are available.
- S3: (Twenty)QuantileSplit. Similar to the range splits (S2), but these separate the data on a threshold value. This is basically what normal decision trees do, except that they commonly try every possible threshold instead of a fixed number of them.
You might have noticed that the SingleCategorySplit was not used. I will still take the chance to explain it, since it will pop up later:
- S4: SingleCategorySplit. This one works similarly to the FixedValueSplit. However, it will create a child node for every single possible value, e.g.: for the attribute PClass those would be 3 children (one each for Class 1, Class 2 and Class 3). Note that the FixedValueSplit is identical to the SingleCategorySplit if there are only two possible categories.
The individual splits are somewhat smart in regard to the data types / values they accept. Up to some extent they know under which circumstances they apply and under which they don't.
This HDTree was also trained using a 2:1 train/test split. The performance is at 80.37% train accuracy and 81.69% test accuracy. Not too bad.
Restricting the Splits
Let's assume you are, for some reason, not too happy with the decisions found. Maybe you decide that having the very first split at the top of the tree is too trivial (splitting on the attribute Sex). HDTree has you covered. The easiest solution would be to disallow the FixedValueSplit (and, for that matter, the equivalent SingleCategorySplit) from appearing at the top. That is fairly easy: you have to change the initialization of the splits as follows:
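A sketch of that change could look like this (the build_with_restrictions-style factory is an assumption on my part; min_level is the parameter the text refers to below):

```python
hdtree_titanic_2 = HDTreeClassifier(
    allowed_splits=[FixedValueSplit.build_with_restrictions(min_level=1),      # keep away from the root
                    SingleCategorySplit.build_with_restrictions(min_level=1),  # keep away from the root
                    TwentyQuantileRangeSplit.build(),
                    TwentyQuantileSplit.build()],
    information_measure=EntropyMeasure(),
    attribute_names=titanic_column_names,
    max_levels=3)

hdtree_titanic_2.fit(X_titanic_train, y_titanic_train)
```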
I will present the resulting HDTree as a whole, since we can also observe the missing split (S4) inside the newly generated tree.
By disallowing the split on the attribute sex from appearing at the root (due to the parameter min_level=1; hint: surely enough, you can also provide a max_level), we totally restructured the tree. Its performance is now at 80.37% and 81.69% (train/test). It didn't change at all, even though we have taken away the supposedly best split at the root node.
Due to the fact that decision trees are built in a greedy fashion, they will only find a locally best split for each node, which isn't necessarily the globally best option. Actually, finding an ideal solution to the decision tree problem was proven to be NP-complete [15] a long time ago.
So the best we can wish for are heuristics. Back to the example: notice that we already got a non-trivial data insight? While it is trivial to say that men will only have a low chance of survival, less so might be the finding that being a man in the first or second class (PClass) departing from Cherbourg (Embarked=C) might increase your odds of survival. Or that even if you are a man in PClass 3, but you are under 33 years old, your odds also increase? Remember: it's women and children first. It's a good exercise to make those findings on your own by interpreting the visualization. Those findings were only possible due to restricting the tree. Who knows what else might be uncovered by using different restrictions? Try it out!
As a last example of that kind, I want to show you how to restrict splits to specific attributes. This not only can be used to prevent the tree from learning unwanted correlations or to force alternative solutions, but it also narrows down the search space. Especially when using multivariate splits, this may drastically decrease the runtime. If you look back at the previous example, you might spot the node which checks for the attribute PassengerId, which might be something we do not want to model, since it at least should not contribute to the information whether a passenger survives. A check on that might be a sign of overfitting again. Let's change that by using the parameter blacklist_attribute_indices, as sketched below.
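A rough sketch of that change (the build_with_restrictions-style factory and whether the blacklist takes attribute names or indices are assumptions; blacklist_attribute_indices is the parameter named above):

```python
hdtree_titanic_3 = HDTreeClassifier(
    allowed_splits=[FixedValueSplit.build_with_restrictions(min_level=1),
                    SingleCategorySplit.build_with_restrictions(min_level=1),
                    TwentyQuantileRangeSplit.build_with_restrictions(
                        blacklist_attribute_indices=['PassengerId']),  # never split on PassengerId
                    TwentyQuantileSplit.build_with_restrictions(
                        blacklist_attribute_indices=['PassengerId'])],
    information_measure=EntropyMeasure(),
    attribute_names=titanic_column_names,
    max_levels=3)

hdtree_titanic_3.fit(X_titanic_train, y_titanic_train)
```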
You might wonder why name length would appear at all in the resulting tree. Consider that having long names (double names or (noble) titles) may indicate a wealthy background, increasing your survival chances.
Additional hint: you can always add the same SplitRule type twice. If you want to blacklist an attribute only for certain level(s) of the HDTree, just add the same SplitRule without that restriction around those level(s).
Predicting data points
As you might already have noticed, you can basically use the common scikit-learn interface to predict data: that is, the .predict(), .predict_proba() and also the .score() methods. But you can go further. There is an additional .explain_decision() method, which can print out a textual representation of a decision (hdtree_titanic_3 is supposed to be the last change we made to the tree).
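Calling it for one sample could look roughly like this (the sample variable and the exact signature of .explain_decision() are assumptions; see the notebook for the real snippet):

```python
sample = X_titanic_test[42]                       # one (hypothetical) row of the test data
print(hdtree_titanic_3.explain_decision(sample))  # exact signature assumed
```

This will print: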
Query:
{'PassengerId': 273, 'Pclass': 2, 'Sex': 'female', 'Age': 41.0, 'SibSp': 0, 'Parch': 1, 'Fare': 19.5, 'Cabin': nan, 'Embarked': 'S', 'Name Length': 41}
Predicted sample as "Survived" because of:
Explanation 1:
Step 1: Sex doesn't match value male
Step 2: Pclass doesn't match value 3
Step 3: Fare is OUTSIDE range [134.61, ..., 152.31[(19.50 is below range)
Step 4: Leaf. Vote for {'Survived'}
This even works for missing data. Let's set attribute index 2 (Sex) to missing (None):
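For example (again, the variable names are the hypothetical ones from the sketch above):

```python
sample_missing = list(sample)                            # copy the sample from above
sample_missing[2] = None                                 # attribute index 2 ('Sex') is now missing
print(hdtree_titanic_3.explain_decision(sample_missing))
```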
Query:
{'PassengerId': 273, 'Pclass': 2, 'Sex': None, 'Age': 41.0, 'SibSp': 0, 'Parch': 1, 'Fare': 19.5, 'Cabin': nan, 'Embarked': 'S', 'Name Length': 41}
Predicted sample as "Death" because of:
Explanation 1:
Step 1: Sex has no value available
Step 2: Age is OUTSIDE range [28.00, ..., 31.00[(41.00 is above range)
Step 3: Age is OUTSIDE range [18.00, ..., 25.00[(41.00 is above range)
Step 4: Leaf. Vote for {'Death'}
---------------------------------
Explanation 2:
Step 1: Sex has no value available
Step 2: Pclass doesn't match value 3
Step 3: Fare is OUTSIDE range [134.61, ..., 152.31[(19.50 is below range)
Step 4: Leaf. Vote for {'Survived'}
---------------------------------
This will print all the decision paths (of which there is more than one, because at some nodes no decision can be made!). The final result will be the most common class among all reached leaves.
... other useful things to do
You can go ahead and get a representation of the tree as text just by printing it:
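For instance:

```python
print(hdtree_titanic_3)   # textual representation of the whole tree
```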
Level 0, ROOT: Node having 596 samples and 2 children with split rule "Split on Sex equals male" (Split Score: 0.251)
-Level 1, Child #1: Node having 390 samples and 2 children with split rule "Age is within range [28.00, ..., 31.00[" (Split Score: 0.342)
--Level 2, Child #1: Node having 117 samples and 2 children with split rule "Name Length is within range [18.80, ..., 20.00[" (Split Score: 0.543)
---Level 3, Child #1: Node having 14 samples and no children with
- SNIP -
or access all nodes that are clean (have a high score):
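The exact call is in the notebook; roughly, it filters the tree's nodes by their score (the method name and threshold below are assumptions):

```python
# Hypothetical: collect all nodes whose score is above some threshold
clean_nodes = hdtree_titanic_3.get_clean_nodes(min_score=0.5)  # method name / parameter assumed
print([str(node) for node in clean_nodes])
```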
['Node having 117 samples and 2 children with split rule "Name Length is within range [18.80, ..., 20.00[" (Split Score: 0.543)',
'Node having 14 samples and no children with split rule "no split rule" (Node Score: 1)',
'Node having 15 samples and no children with split rule "no split rule" (Node Score: 0.647)',
'Node having 107 samples and 2 children with split rule "Fare is within range [134.61, ..., 152.31[" (Split Score: 0.822)',
'Node having 102 samples and no children with split rule "no split rule" (Node Score: 0.861)']
There is a lot more to explore, more than I can fit into one story (although I might extend this or that later). Feel free to check out other methods and ask whatever questions you might have in the comments. If you're interested, I will write a follow-up on more nitty-gritty details.
3. Extending HDTree
The most valuable thing which you may want to add to the system is your own custom SplitRule. A split rule really can do whatever it wants in order to separate the data. You can implement a SplitRule by implementing the AbstractSplitRule class, which is quite complicated, since you would have to handle data acceptance, performance evaluation and all of that on your own. For that reason there are Mixins within the package which you can add to your implementation, depending on the type of split you want to create, and which do most of the hard part for you.
Since the article is quite lengthy already, I will not provide a complete how-to here; I hope to write one in another article within the next weeks to come. If you have questions in the meantime, I will happily respond, though. Just drop me a message! The best approach for now is to just take one existing split rule, copy it and adapt it to your demands. If you're not too fluent in Python or OOP [11], some of the syntax may be overwhelming. Anyway, you do not need to understand every bit of it to make it work. Most of it is quite self-explanatory. Many of the functions exist just for display purposes, in order to generate human-readable texts.
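Just to give a rough idea of the shape such an extension takes (everything below except the AbstractSplitRule name is hypothetical; in practice, copy an existing split rule from the package and adapt it):

```python
from hdtree import AbstractSplitRule  # base class named above; exact import path assumed


class IsWeekendSplit(AbstractSplitRule):
    """Hypothetical split rule separating samples by a 'day of week' attribute."""

    def explain_split(self, sample) -> str:
        # display helper: return a human-readable explanation (method name assumed)
        ...

    def get_child_index_for_sample(self, sample) -> int:
        # routing logic: decide which child node a sample belongs to (method name assumed)
        ...
```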
I hope you enjoyed your time reading this story and maybe even learned something new along the way. If I made some mistakes (I surely did), or you have any questions regarding the article or the software, feel free to drop me a message.
Again, please be aware that the software will be flawed. Please do not report small bugs here, but rather in the repository. The same goes for suggestions, feature requests, etc.
Thanks again for your time!
Bibliography
[1] Wikipedia article on Decision Trees https://en.wikipedia.org/wiki/Decision_tree
[2] Medium 101 article on Decision Trees https://medium.com/machine-learning-101/chapter-3-decision-tree-classifier-coding-ae7df4284e99
[3] Breiman, Leo, Joseph H Friedman, R. A. Olshen and C. J. Stone. āClassification and Regression Trees.ā (1983).
[4] scikit-learn documentation: Decision Tree Classifier https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html?highlight=decision%20tree#sklearn.tree.DecisionTreeClassifier
[5] Cython project page https://cython.org
[6] Wikipedia article on pruning: https://en.wikipedia.org/wiki/Decision_tree_pruning
[7] sklearn documentation: plot a Decision Tree https://scikit-learn.org/stable/modules/generated/sklearn.tree.plot_tree.html
[8] Wikipedia article Support Vector Machine https://de.wikipedia.org/wiki/Support_Vector_Machine
[9] MLExtend Python library http://rasbt.github.io/mlxtend/
[10] Wikipedia Article Entropy in context of Decision Trees https://en.wikipedia.org/wiki/ID3_algorithm#Entropy
[11] Wikipedia article about OOP https://en.wikipedia.org/wiki/Object-oriented_programming
[12] Wikipedia article on imputation https://en.wikipedia.org/wiki/Imputation_(statistics)
[13] Hackernoon article about one-hot-encoding https://hackernoon.com/what-is-one-hot-encoding-why-and-when-do-you-have-to-use-it-e3c6186d008f
[14] Wikipedia Article about Quantiles https://en.wikipedia.org/wiki/Quantile
[15] Hyafil, Laurent; Rivest, Ronald L. āConstructing optimal binary decision trees is NP-completeā (1976)
[16] Hackernoon Article on Decision Trees https://hackernoon.com/what-is-a-decision-tree-in-machine-learning-15ce51dc445