Lecture Notes in Computer Science
For statistical modelling of multivariate binary data, such as text documents, data instances are typically represented as vectors over a global vocabulary of attributes. Apart from the issue of high dimensionality, this representation raises the problem that the presences and absences of different attributes are of uneven importance. This problem has been largely overlooked in the literature; however, it may create difficulties in obtaining reliable estimates of unsupervised probabilistic representation models. In
doi:10.1007/11508069_6 fatcat:644zsjnbnraqlfstjqgtwxih6q