1. Impossibility of Depth Reduction in Explainable Clustering (arXiv)

Authors : Chengyuan Deng, Surya Teja Gavva, Karthik C. S., Parth Patel, Adarsh Srinivasan

Abstract : Over the last few years, Explainable Clustering has gathered a lot of attention. Dasgupta et al. [ICML’20] initiated the study of explainable k-means and k-median clustering problems, where the explanation is captured by a threshold decision tree which partitions the space at each node using axis-parallel hyperplanes. Recently, Laber et al. [Pattern Recognition’23] made a case to consider the depth of the decision tree as an additional complexity measure of interest. In this work, we prove that even when the input points are in the Euclidean plane, any depth reduction in the explanation incurs unbounded loss in the k-means and k-median cost. Formally, we show that there exists a data set X in the Euclidean plane for which there is a decision tree of depth k−1 whose k-means/k-median cost matches the optimal clustering cost of X, but every decision tree of depth less than k−1 has unbounded cost w.r.t. the optimal cost of clustering. We extend our results to the k-center objective as well, albeit with weaker guarantees.
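
To make the notion of a threshold decision tree concrete, here is a minimal Python sketch of one: each internal node tests a single coordinate against a threshold, each leaf names a cluster, and the k-means cost is computed from the resulting assignment. The `Node` class, the helper names, and the six-point toy data set are all hypothetical illustrations; this is not the construction used in the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Internal node: route by point[axis] <= threshold.
    # Leaf node: `cluster` is set and the other fields are ignored.
    axis: int = 0
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    cluster: Optional[int] = None


def assign(node, point):
    """Follow axis-parallel threshold cuts until a leaf is reached."""
    while node.cluster is None:
        node = node.left if point[node.axis] <= node.threshold else node.right
    return node.cluster


def depth(node):
    """Number of cuts on the longest root-to-leaf path."""
    if node.cluster is not None:
        return 0
    return 1 + max(depth(node.left), depth(node.right))


def kmeans_cost(points, labels):
    """Sum of squared distances from each point to its cluster mean."""
    clusters = {}
    for p, c in zip(points, labels):
        clusters.setdefault(c, []).append(p)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        mean = [sum(p[j] for p in members) / len(members) for j in range(dim)]
        total += sum((p[j] - mean[j]) ** 2 for p in members for j in range(dim))
    return total


# Toy data (an assumption for illustration): three well-separated pairs in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

# A tree with k = 3 leaves that cuts only along the x-axis.
tree = Node(
    axis=0, threshold=2.5,
    left=Node(cluster=0),
    right=Node(axis=0, threshold=7.5, left=Node(cluster=1), right=Node(cluster=2)),
)

labels = [assign(tree, p) for p in points]
print("labels:", labels)
print("depth:", depth(tree))
print("k-means cost:", kmeans_cost(points, labels))
```

On this toy input the tree separates the three pairs, so its cost coincides with that of the natural 3-clustering. The abstract's result concerns data sets where no shallower tree can come close to the optimal cost.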

2. The Price of Explainability for Clustering (arXiv)

Authors : Anupam Gupta, Madhusudhan Reddy Pittu, Ola Svensson, Rachel Yuan

Abstract : Given a set of points in d-dimensional space, an explainable clustering is one where the clusters are specified by a tree of axis-aligned threshold cuts. Dasgupta et al. (ICML 2020) posed the question of the price of explainability: the worst-case ratio between the cost of the best explainable clustering and that of the best clustering. We show that the price of explainability for k-medians is at most 1 + H_{k−1}; in fact, we show that the popular Random Thresholds algorithm has exactly this price of explainability, matching the known lower bound constructions. We complement our tight analysis of this particular algorithm by constructing instances where the price of explainability (using any algorithm) is at least (1 − o(1)) ln k, showing that our result is best possible, up to lower-order terms. We also improve the price of explainability for the k-means problem to O(k ln ln k) from the previous O(k ln k), considerably closing the gap to the lower bounds of Ω(k). Finally, we study the algorithmic question of finding the best explainable clustering: we show that explainable k-medians and k-means cannot be approximated better than O(ln k), under standard complexity-theoretic conjectures. This essentially settles the approximability of explainable k-medians and leaves open the intriguing possibility of significantly better approximation algorithms for k-means than its price of explainability.
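
To get a feel for how close the k-medians upper bound is to the ln k lower-order behaviour mentioned in the abstract, the short Python sketch below tabulates 1 + H_{k−1} against ln k, reading H_n as the n-th harmonic number 1 + 1/2 + ... + 1/n. The function names and the chosen values of k are assumptions made for illustration; the o(1) term in the lower bound is not modelled.

```python
import math


def harmonic(n):
    """n-th harmonic number H_n = 1 + 1/2 + ... + 1/n (H_0 = 0)."""
    return sum(1.0 / i for i in range(1, n + 1))


def kmedians_price_bound(k):
    """Upper bound 1 + H_{k-1} on the k-medians price of explainability."""
    return 1.0 + harmonic(k - 1)


# Compare the upper bound with ln k for a few illustrative values of k.
for k in (2, 10, 100, 1000):
    upper = kmedians_price_bound(k)
    print(f"k={k:5d}  1 + H_(k-1) = {upper:.4f}  ln k = {math.log(k):.4f}")
```

Since H_n grows like ln n plus a constant, the two columns differ by a bounded amount as k increases, which is what "best possible, up to lower-order terms" refers to.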