Network representation learning: models, methods and applications

Anuraj Mohan, K. V. Pramod
2019 SN Applied Sciences  
With the rise of large-scale social networks, network mining has become an important sub-domain of data mining. Generating an efficient network representation is a key challenge in applying machine learning to network data. Recently, representation learning methods have been widely used in various domains to generate low-dimensional latent features from complex high-dimensional data. Significant research effort has been made over the past few years to generate node representations from
graph-structured data using representation learning methods. Here, we provide a detailed study of the latest advancements in the field of network representation learning (also called network embedding). We first discuss the basic concepts and models of network embedding. We then build a taxonomy of network embedding methods based on the type of network and review the major research works under each category. Next, we cover the major datasets used in network embedding research and describe the major applications of network embedding to various network mining tasks. Finally, we suggest several directions for future research.

... and dynamic, and the embedding method should adapt to all such situations. A few surveys [22, 38, 46, 89] already cover the various approaches for network embedding. In this survey, we focus on recent methods for node embedding that are inspired by recent advancements in representation learning. We provide a taxonomy of node embedding methods based on the type of network. Networks are classified into broad categories: homogeneous networks, heterogeneous networks, attributed networks, signed networks, and dynamic networks. We discuss the common models of network representation learning and review the major works under each model for each type of network. Further, we discuss the applications of network embedding along with the datasets used in network embedding research.

Terminologies and problem definition

Definition 1 A network is a graph G = (V, E), where V is the set of vertices and e ∈ E is an edge between any two vertices. An adjacency matrix A defines the connectivity of G: A_ij = 1 if v_i and v_j are connected, else A_ij = 0.
Definition 2 A homogeneous network is a network G = (V, E), where each node v_i ∈ V belongs to the same type and each edge e_i ∈ E also belongs to the same type.
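As a minimal sketch of the adjacency matrix in Definition 1, the snippet below builds A from an edge list in plain Python. The function name `adjacency_matrix`, the four-node example graph, and the choice to treat edges as undirected (so A is symmetric) are illustrative assumptions, not taken from the paper.

```python
def adjacency_matrix(num_nodes, edges):
    """Return A with A[i][j] = 1 if nodes i and j are connected, else 0.

    Assumption: edges are undirected, so each pair is written in both
    directions and A is symmetric.
    """
    A = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        A[i][j] = 1
        A[j][i] = 1
    return A


# Hypothetical 4-node path graph: 0 - 1 - 2 - 3
edges = [(0, 1), (1, 2), (2, 3)]
A = adjacency_matrix(4, edges)

for row in A:
    print(row)
```

A dense list of lists is only practical for small graphs; large networks are usually stored in a sparse form such as an edge list or adjacency lists.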
Definition 3 An attributed network can be defined as G_A = (V, E, A, F), where V is the set of vertices, E is the set of edges, A is the adjacency matrix, and F ∈ R^(n×k); the ith row of F denotes the k-dimensional attribute vector of node i.
Definition 4 A heterogeneous network is a network G = (V, E), where each node v_i ∈ V and each edge e_i ∈ E are associated with mapping functions F(v): V → T_v and f(e): E → T_e, where T_v and T_e denote the entity and relationship types respectively.
Definition 5 A signed network is a network G = (V, E), v ∈ V, e ∈ E, where for each edge e_ij = +1 or e_ij = −1, denoting a positive link or a negative link between v_i and v_j.
Definition 6 A dynamic network can be defined as a series of snapshots G = {G_1, G_2, ..., G_n}, where G_i = (V_i, E_i) and n is the number of snapshots.
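The remaining definitions can be illustrated with small plain-Python data structures. This is only a sketch under stated assumptions: the node and edge type names ("author", "paper", "venue", "writes", "published_in"), the feature values, and the helper `new_edges` are hypothetical and do not come from the paper.

```python
# Attributed network (Definition 3): an edge list plus a feature matrix F
# with one k-dimensional row per node. Values are made up for illustration.
n, k = 3, 2
edges = [(0, 1), (1, 2)]
F = [[0.5, 1.0],
     [0.0, 2.0],
     [1.5, 0.3]]

# Heterogeneous network (Definition 4): mappings from nodes and edges to
# their types. The type names here are hypothetical.
node_type = {0: "author", 1: "paper", 2: "venue"}
edge_type = {(0, 1): "writes", (1, 2): "published_in"}

# Signed network (Definition 5): each edge carries +1 or -1.
sign = {(0, 1): +1, (1, 2): -1}

# Dynamic network (Definition 6): a list of snapshots G_i = (V_i, E_i).
snapshots = [
    ({0, 1}, {(0, 1)}),
    ({0, 1, 2}, {(0, 1), (1, 2)}),
]


def new_edges(prev, curr):
    """Edges present in snapshot `curr` but not in snapshot `prev`."""
    return curr[1] - prev[1]


added = new_edges(snapshots[0], snapshots[1])
print(added)
```

Keeping types, signs, and snapshots as separate mappings over the same node ids mirrors how the definitions extend the basic G = (V, E) structure one component at a time.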
doi:10.1007/s42452-019-1044-9 fatcat:zvlbj4qozzfw3dxoyevb6wgska