%0 Journal Article %A Saraiva, E. F. %A Pereira, C. A. B. %A Suzuki, A. K. %D 2019 %T A data-driven selection of the number of clusters in the Dirichlet allocation model via Bayesian mixture modelling %U https://tandf.figshare.com/articles/journal_contribution/A_data-driven_selection_of_the_number_of_clusters_in_the_Dirichlet_allocation_model_via_Bayesian_mixture_modelling/8949107 %R 10.6084/m9.figshare.8949107.v1 %2 https://tandf.figshare.com/ndownloader/files/16356413 %K Mixture model %K Bayesian approach %K Gibbs sampling %K Metropolis–Hastings %K split-merge update %K Kullback–Leibler divergence %K 62M05 %X

In this paper, we consider a Bayesian mixture model in which the mixture weights can be integrated out, yielding a procedure that treats the number of clusters as an unknown quantity. To determine the clusters and estimate the parameters of interest, we develop an MCMC algorithm called the sequential data-driven allocation sampler. In this algorithm, a single observation has a nonzero probability of creating a new cluster, and a set of observations may create a new cluster through split-merge moves. The split-merge moves are developed using a sequential allocation procedure based on allocation probabilities that are calculated according to the Kullback–Leibler divergence between the posterior distribution using the observations previously allocated and the posterior distribution including a ‘new’ observation. We verify the performance of the proposed algorithm on simulated data and then illustrate its use on three publicly available real data sets.

%I Taylor & Francis