Recommendations for Different Tasks Based on the Uniform Multimodal Joint Representation
Content curation social networks (CCSNs), such as Pinterest and Huaban, are interest driven and content centric. On CCSNs, user interests are represented by a set of boards, and a board is composed of various pins. A pin is an image with a description. All entities, such as users, boards, and categories, can be represented as a set of pins. Therefore, it is possible to implement entity representation and the corresponding recommendations on a uniform representation space from pins. Furthermore,
... lots of pins are re-pinned from others and the pin's re-pin sequences are recorded on CCSNs. In this paper, a framework which can learn the multimodal joint representation of pins, including text representation, image representation, and multimodal fusion, is proposed. Image representations are extracted from a multilabel convolutional neural network. The multiple labels of pins are automatically obtained by the category distributions in the re-pin sequences, which benefits from the network architecture. Text representations are obtained with the word2vec tool. Two modalities are fused with a multimodal deep Boltzmann machine. On the basis of the pin representation, different recommendation tasks are implemented, including recommending pins or boards to users, recommending thumbnails to boards, and recommending categories to boards. Experimental results on a dataset from Huaban demonstrate that the multimodal joint representation of pins contains the information of user interests. Furthermore, the proposed multimodal joint representation outperformed unimodal representation in different recommendation tasks. Experiments were also performed to validate the effectiveness of the proposed recommendation methods.