[2603.06602] Khatri-Rao Clustering for Data Summarization

Dataemia
2 Min Read


[Submitted on 17 Feb 2026]

View a PDF of the paper titled Khatri-Rao Clustering for Data Summarization, by Martino Ciaperoni and 3 other authors

View PDF
HTML (experimental)

Abstract:As datasets continue to grow in size and complexity, finding succinct yet accurate data summaries poses a key challenge. Centroid-based clustering, a widely adopted approach to address this challenge, finds informative summaries of datasets in terms of few prototypes, each representing a cluster in the data. Despite their wide adoption, the resulting data summaries often contain redundancies, limiting their effectiveness particularly in datasets characterized by a large number of underlying clusters. To overcome this limitation, we introduce the Khatri-Rao clustering paradigm that extends traditional centroid-based clustering to produce more succinct but equally accurate data summaries by postulating that centroids arise from the interaction of two or more succinct sets of protocentroids.

We study two central approaches to centroid-based clustering, namely the well-established k-Means algorithm and the increasingly popular topic of deep clustering, under the lens of the Khatri-Rao paradigm. To this end, we introduce the Khatri-Rao k-Means algorithm and the Khatri-Rao deep clustering framework. Extensive experiments show that Khatri-Rao k-Means can strike a more favorable trade-off between succinctness and accuracy in data summarization than standard k-Means. Leveraging representation learning, the Khatri-Rao deep clustering framework offers even greater benefits, reducing even more the size of data summaries given by deep clustering while preserving their accuracy.

Submission history

From: Martino Ciaperoni [view email]
[v1]
Tue, 17 Feb 2026 14:16:12 UTC (8,847 KB)



Source link

Share This Article
Leave a Comment

Leave a Reply

Your email address will not be published. Required fields are marked *

error: Content is protected !!