High Speed Document Clustering in Reconfigurable Hardware

G. Covington, Charles Comstock, Andrew Levine, John Lockwood, Young Cho
2006 2006 International Conference on Field Programmable Logic and Applications  
High-performance document clustering systems enable similar documents to be automatically organized into groups. In the past, the large amount of computational time needed to cluster documents prevented practical use of such systems with a large number of documents. A full hardware implementation of the K-means clustering algorithm has been designed and implemented in reconfigurable hardware that clusters 512k documents rapidly. This implementation, uses four parallel cosine distance metrics to
more » ... cluster document vectors that each have 4000 dimensions. The synthesized hardware runs on the Field Programmable Port Extender (FPX) platform at a clock rate of 80 MHz. Although the clock rate on the Xilinx VirtexE 2000 is slower than a CPU, the implementation runs 26 times faster than an algorithmically equivalent software algorithm running on an Intel 3.60 GHz Xeon. The same architecture was used to synthesize a faster and larger design for the Xilinx Virtex4 LX200. This larger implementation can contain up to 25 parallel cosine distance metrics. The implementation synthesized with a clock rate of 250 Mhz and outperforms the equivalent software by a factor of 328.
doi:10.1109/fpl.2006.311245 dblp:conf/fpl/CovingtonCLLC06 fatcat:jxtdkaz3knbaxlqneg6lao3pue