Poster
Coresets for Decision Trees of Signals
Ibrahim Jubran · Ernesto Evgeniy Sanches Shayda · Ilan I Newman · Dan Feldman
Keywords: [ Machine Learning ]
Abstract:
A -decision tree (or -tree) is a recursive partition of a matrix (2D-signal) into block matrices (axis-parallel rectangles, leaves) where each rectangle is assigned a real label. Its regression or classification loss to a given matrix of entries (labels) is the sum of squared differences over every label in and its assigned label by .Given an error parameter , a -coreset of is a small summarization that provably approximates this loss to \emph{every} such tree, up to a multiplicative factor of . In particular, the optimal -tree of is a -approximation to the optimal -tree of .We provide the first algorithm that outputs such a -coreset for \emph{every} such matrix . The size of the coreset is polynomial in , and its construction takes time.This is by forging a link between decision trees from machine learning -- to partition trees in computational geometry. Experimental results on \texttt{sklearn} and \texttt{lightGBM} show that applying our coresets on real-world data-sets boosts the computation time of random forests and their parameter tuning by up to x, while keeping similar accuracy. Full open source code is provided.
Chat is not available.