Generalized MDL approach for data summarization

Xiaodong Zhou
Many applications identify data of interest, but they usually return only the set of records that satisfy the given criteria. Such raw results give the user little insight; a concise description is preferable to a list of individual records. Minimum Description Length (MDL) is a well-known approach to this problem. In this thesis, we extend the MDL principle to the Generalized MDL (GMDL) principle by including some "do not care" data.
We apply the MDL and GMDL principles to the problem of data summarization in both the spatial case and the hierarchical case. For the spatial case, we improve an existing top-down algorithm for high-dimensional data. For the hierarchical case, we study the GMDL problem and find that there exists a unique, non-redundant, and maximal MDL covering. We propose the MDL-Tree and GMDL-Tree algorithms to find the MDL covering and the GMDL covering, respectively, in the hierarchical case. The experimental results show that GMDL coverings have much shorter descriptions than MDL coverings in the hierarchical case.
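To make the hierarchical idea concrete, the following is a minimal sketch and not the thesis's MDL-Tree or GMDL-Tree algorithm. It assumes the hierarchy is a tree, that description length is simply the number of hierarchy nodes in the covering, and that a covering must include every interesting leaf and no uninteresting leaf. The function name `cover`, the toy hierarchy, and the leaf labels are all hypothetical, chosen only for illustration.

```python
from typing import Dict, FrozenSet, List, Set


def cover(
    tree: Dict[str, List[str]],
    root: str,
    positive: Set[str],
    dont_care: FrozenSet[str] = frozenset(),
) -> List[str]:
    """Return a smallest list of nodes that covers every positive leaf
    and no negative leaf (assumed cost: one unit per node)."""

    def leaves(node: str) -> Set[str]:
        # A node without children is a leaf of the hierarchy.
        kids = tree.get(node, [])
        if not kids:
            return {node}
        found: Set[str] = set()
        for kid in kids:
            found |= leaves(kid)
        return found

    def visit(node: str) -> List[str]:
        below = leaves(node)
        if not (below & positive):
            return []          # nothing interesting here, so spend nothing
        if below <= (positive | dont_care):
            return [node]      # no negative leaf below: one node suffices
        # A negative leaf lies below, so this node cannot be used; descend.
        chosen: List[str] = []
        for kid in tree.get(node, []):
            chosen += visit(kid)
        return chosen

    return visit(root)


# Hypothetical toy hierarchy for illustration only.
hierarchy = {
    "all": ["fruit", "veg"],
    "fruit": ["apple", "pear", "plum"],
    "veg": ["kale", "leek"],
}
interesting = {"apple", "pear", "kale"}

# Without "do not care" leaves, "plum" blocks the use of "fruit".
mdl_cover = cover(hierarchy, "all", interesting)
# Marking "plum" as "do not care" lets "fruit" stand in for its leaves.
gmdl_cover = cover(hierarchy, "all", interesting, frozenset({"plum"}))

print(mdl_cover)   # ['apple', 'pear', 'kale']
print(gmdl_cover)  # ['fruit', 'kale']
```

In this toy case, allowing one "do not care" leaf shortens the description from three nodes to two, which is the kind of saving the abstract reports for GMDL coverings.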
doi:10.14288/1.0051405 fatcat:pghikf52offkvn2466gy7o6wzy