Description:
A decision tree is a transparent model where the rules are visible and can represent the logic of classification. However, this structure might allow attackers to infer confidential information if the rules carry some sensitive information. Thus, a tree pruning methodology based on an IP truncation anonymisation scheme is proposed in this paper to prune the real IP addresses. However, the possible drawback of carelessly designed tree pruning might degrade the performance of the original tree as some information is intentionally opted out for the tree’s consideration. In this work, the 6-percent-GureKDDCup’99, full-version-GureKDDCup’99, UNSW-NB15, and CIDDS-001 datasets are used to evaluate the performance of the proposed pruning method. The results are also compared to the original unpruned tree model to observe its tolerance and trade-off. The tree model adopted in this work is the C4.5 tree. The findings from our empirical results are very encouraging and spell two main advantages: the sensitive IP addresses can be “pruned” (hidden) throughout the classification process to prevent any potential user profiling, and the number of nodes in the tree is tremendously reduced to make the rule interpretation possible while maintaining the classification accuracy.