A Novel Statistical Theoretical Split Metric for Decision Tree Classification
Authors: Mainak Biswas
Publishing Date: 20-01-2024
ISBN: 978-81-955020-7-3
Abstract
Decision trees (DTs) are a significant category of logical tools in machine learning (ML), used to classify both textual and numerical data. Over the years, two primary criteria for splitting DTs have been prevalent: information gain, which hinges on Shannon’s entropy, and the Gini index. Both criteria rely on the empirical probabilities of classes within the attribute space of the dataset. In this study, a novel split criterion is introduced, rooted in the principles of statistical mechanics. The measure draws inspiration from the second law of thermodynamics, which implies that in a closed system with unchanging external conditions and constant entropy, the internal energy decreases and reaches a minimum at equilibrium. The proposed split criterion was tested on four datasets, each containing at least 100 instances. The results demonstrated an overall improvement in accuracy, precision, recall, and F1-score.
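For context, the two conventional criteria mentioned in the abstract both operate on the empirical class probabilities at a node. The sketch below is a minimal illustration of those baseline measures only (it is not the proposed statistical-mechanics criterion), and the label vectors used are hypothetical examples chosen purely for demonstration.

```python
import numpy as np

def shannon_entropy(labels):
    """Empirical Shannon entropy (in bits) of a vector of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini_index(labels):
    """Empirical Gini impurity of a vector of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

# Hypothetical node labels and a candidate split for illustration only.
parent = np.array([0, 0, 0, 1, 1, 1, 1, 1])
left, right = parent[:3], parent[3:]
weights = np.array([len(left), len(right)]) / len(parent)

# Information gain = parent entropy minus the weighted child entropies.
info_gain = shannon_entropy(parent) - (
    weights[0] * shannon_entropy(left) + weights[1] * shannon_entropy(right)
)
print(f"Gini impurity of parent node: {gini_index(parent):.3f}")
print(f"Information gain of the split: {info_gain:.3f} bits")
```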
Keywords
Statistical mechanics, splitting criteria, decision tree
Cite as
Mainak Biswas, "A Novel Statistical Theoretical Split Metric for Decision Tree Classification", In: Ashish Kumar Tripathi and Vivek Shrivastava (eds), Advancements in Communication and Systems, SCRS, India, 2024, pp. 65-80. https://doi.org/10.56155/978-81-955020-7-3-6