Léna Carel, Pierre Alquier

Mixture models are among the most popular tools for model-based clustering. However, when the dimension and the number of clusters are large, the estimation as well as the interpretation of the clusters become challenging. We propose a reduced-dimension mixture model, where the parameters of the $K$ components are combinations of words from a small dictionary, say $H$ words with $H \ll K$. Including a Nonnegative Matrix Factorization (NMF) in the EM algorithm allows one to simultaneously estimate the dictionary and the parameters of the mixture. We propose the acronym NMF-EM for this algorithm. This original approach is motivated by passenger clustering from ticketing data: we apply NMF-EM to ticketing data from two Transdev public transport networks. In this case, the words are easily interpreted as typical slots in a timetable.
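
As a rough illustration of the reduced-dimension structure described above, the following sketch builds $K$ component parameter vectors as convex combinations of $H$ dictionary words. It assumes multinomial components over `d` time slots, which is an assumption made here for concreteness. The names `Phi`, `Lambda`, `Theta` and all sizes are hypothetical, and the sketch only shows the factorized parameterization, not the NMF-EM estimation procedure itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: K clusters, H dictionary words (H << K), d time slots.
K, H, d = 20, 3, 24

# Dictionary (assumed form): each of the H words is a probability vector over the d slots.
Phi = rng.dirichlet(np.ones(d), size=H)       # shape (H, d)

# Weights (assumed form): each component is a convex combination of the H words.
Lambda = rng.dirichlet(np.ones(H), size=K)    # shape (K, H)

# Component parameters: a K x d matrix of rank at most H,
# whose rows are again probability vectors.
Theta = Lambda @ Phi                          # shape (K, d)

# Free-parameter counts for an unconstrained model versus the factorized one.
n_free_full = K * (d - 1)
n_free_reduced = K * (H - 1) + H * (d - 1)
print(n_free_full, n_free_reduced)
```

Under these assumptions, the factorized model needs far fewer free parameters than one with $K$ unconstrained components, and each row of `Phi` can be read directly as a typical profile over the time slots.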