861 Hits in 6.3 sec

Unified Interpretation of Softmax Cross-Entropy and Negative Sampling: With Case Study for Knowledge Graph Embedding [article]

Hidetaka Kamigaito, Katsuhiko Hayashi
2021 arXiv   pre-print
In knowledge graph embedding, the theoretical relationship between the softmax cross-entropy and negative sampling loss functions has not been investigated.  ...  We attempted to solve this problem by using the Bregman divergence to provide a unified interpretation of the softmax cross-entropy and negative sampling loss functions.  ...  We used its 1vsAll setting for SCE-based loss functions and its negative sampling setting for NS-based loss functions. We modified LibKGE to be able to use label smoothing on the 1vsAll setting.  ... 
arXiv:2106.07250v2 fatcat:k7ahr62cgnfkpl5yq7sqnk62ba
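
The snippet contrasts the softmax cross-entropy (SCE, LibKGE's 1vsAll setting) and negative sampling (NS) losses for knowledge graph embedding. Below is a minimal PyTorch sketch of the two objective shapes; it illustrates the general loss forms only, not the paper's exact formulation, and the tensor shapes and function names are assumptions.

    import torch.nn.functional as F

    def sce_1vsall_loss(scores, target):
        # Softmax cross-entropy over all candidate entities (1vsAll-style):
        # scores is (batch, num_entities), target holds the true entity index.
        return F.cross_entropy(scores, target)

    def negative_sampling_loss(pos_score, neg_scores):
        # NS loss: push the true triple's score up and sampled corruptions down.
        # pos_score is (batch,), neg_scores is (batch, num_negatives).
        pos = F.logsigmoid(pos_score)
        neg = F.logsigmoid(-neg_scores).mean(dim=1)
        return -(pos + neg).mean()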

Unified Interpretation of Softmax Cross Entropy and Negative Sampling: With Case Study for Knowledge Graph Embedding

Hidetaka Kamigaito, Katsuhiko Hayashi
2021 Journal of Natural Language Processing  
Please refer to (Kamigaito and Hayashi 2021).  ...  Bregman divergence and the softmax function: letting Ψ(z) be a differentiable function, the Bregman divergence between distributions f and g is defined as d_{Ψ(z)}(f, g) = Ψ(f) − Ψ(g) − ∇Ψ(g)^⊤(f − g). (1) By changing Ψ(z), a variety of divergences can be expressed. The paper considers work targeting NCE (Gutmann and Hirayama  ...  ., and Gemulla, R. (2020)  ... 
doi:10.5715/jnlp.28.1336 fatcat:dyjzvh6gp5ghjdyto2tkvm67pa
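
Equation (1) in the snippet is the generic Bregman divergence. As a quick illustration (function names and values are mine, not from the paper), choosing Ψ(x) = ||x||² recovers the squared Euclidean distance:

    import numpy as np

    def bregman(f, g, psi, grad_psi):
        # d_Psi(f, g) = Psi(f) - Psi(g) - grad_Psi(g)^T (f - g)
        return psi(f) - psi(g) - grad_psi(g) @ (f - g)

    psi = lambda x: np.dot(x, x)      # Psi(x) = ||x||^2
    grad_psi = lambda x: 2.0 * x
    f, g = np.array([1.0, 2.0]), np.array([0.5, 1.0])
    assert np.isclose(bregman(f, g, psi, grad_psi), np.sum((f - g) ** 2))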

Unified Interpretation of Softmax Cross-Entropy and Negative Sampling: With Case Study for Knowledge Graph Embedding

Hidetaka Kamigaito, Katsuhiko Hayashi
2021 Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)   unpublished
We attempted to solve this problem by using the Bregman divergence to provide a unified interpretation of the softmax cross-entropy and negative sampling loss functions.  ...  In knowledge graph embedding, the theoretical relationship between the softmax cross-entropy and negative sampling loss functions has not been investigated.  ...  Acknowledgements This work was partially supported by JSPS Kakenhi Grant nos. 19K20339, 21H03491, and 21K17801.  ... 
doi:10.18653/v1/2021.acl-long.429 fatcat:cddz766fwfam5erafpekq5oqpq

A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks [article]

Kimin Lee, Kibok Lee, Honglak Lee, Jinwoo Shin
2018 arXiv   pre-print
However, deep neural networks with the softmax classifier are known to produce highly overconfident posterior distributions even for such abnormal samples.  ...  While most prior methods have been evaluated for detecting either out-of-distribution or adversarial samples but not both, the proposed method achieves state-of-the-art performance for both cases  ...  Also, we use ResNet with 34 layers and dropout rate 0. The softmax classifier is used, and each model is trained by minimizing the cross-entropy loss using SGD with Nesterov momentum.  ... 
arXiv:1807.03888v2 fatcat:kkgl5zrfdfhztk6hajgpvmhr5q
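
The snippet describes a standard supervised setup (softmax classifier, cross-entropy loss, SGD with Nesterov momentum) on a ResNet-34 backbone. A hedged PyTorch sketch follows; the learning rate, weight decay, and class count are placeholders rather than values from the paper, and torchvision's ResNet-34 does not model the dropout mentioned in the snippet.

    import torch
    import torch.nn as nn
    import torchvision

    model = torchvision.models.resnet34(num_classes=10)   # class count is a placeholder
    criterion = nn.CrossEntropyLoss()                      # softmax classifier + cross-entropy
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                                nesterov=True, weight_decay=5e-4)

    def train_step(x, y):
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        return loss.item()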

Graph-Guided Network for Irregularly Sampled Multivariate Time Series [article]

Xiang Zhang, Marko Zeman, Theodoros Tsiligkaridis, Marinka Zitnik
2022 arXiv   pre-print
This model can be interpreted as a graph neural network that sends messages over graphs that are optimized for capturing time-varying dependencies among sensors.  ...  In many domains, including healthcare, biology, and climate science, time series are irregularly sampled with varying time intervals between successive readouts and different subsets of variables (sensors  ...  Detailed description of data, scripts, and configurations along with examples of usage are also provided.  ... 
arXiv:2110.05357v2 fatcat:6tdi5aon3nd5larqkw5hy3qb3y

A comprehensive study on the prediction reliability of graph neural networks for virtual screening [article]

Soojung Yang, Kyung Hoon Lee, Seongok Ryu
2020 arXiv   pre-print
This work aims to propose guidelines for training reliable models; we thus provide methodological details and ablation studies on the following training principles.  ...  For decision making in virtual screening, researchers find it useful to interpret the output of a classification system as a probability, since such an interpretation allows them to filter out more desirable  ...  Acknowledgements We would like to thank Yongchan Kwon for his valuable comments on the effects of regularization and experimental analysis.  ... 
arXiv:2003.07611v1 fatcat:ntyjvwu4dzfvzemixhuyaotmpm
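
The snippet's point about interpreting classifier outputs as probabilities amounts to filtering candidates by predicted probability, which is only meaningful if those probabilities are calibrated. A trivial sketch of that filtering step (threshold and names are illustrative, not from the paper):

    import torch.nn.functional as F

    def filter_candidates(logits, threshold=0.9):
        # Turn logits into probabilities and keep only molecules predicted
        # "active" with high confidence; assumes a binary classification head.
        probs = F.softmax(logits, dim=-1)[:, 1]   # P(active)
        return (probs >= threshold).nonzero(as_tuple=True)[0]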

Domain-aware Visual Bias Eliminating for Generalized Zero-Shot Learning [article]

Shaobo Min, Hantao Yao, Hongtao Xie, Chaoqun Wang, Zheng-Jun Zha, Yongdong Zhang
2020 arXiv   pre-print
Recent methods focus on learning a unified semantic-aligned visual representation to transfer knowledge between two domains, while ignoring the effect of semantic-free visual representation in alleviating  ...  Specifically, we explore cross-attentive second-order visual statistics to compact the semantic-free representation, and design an adaptive margin Softmax to maximize inter-class divergences.  ...  The hyper-parameter σ is set to 0.5 in most cases, and τ will be analyzed later.  ... 
arXiv:2003.13261v2 fatcat:w2oyekgre5h2jmwiqvhlngjive
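
The "adaptive margin Softmax" mentioned in the snippet belongs to the family of margin-based softmax losses. Below is a generic additive-margin softmax sketch for orientation only; it is not the paper's adaptive formulation, and the margin and scale values are assumptions.

    import torch.nn.functional as F

    def margin_softmax_loss(logits, target, margin=0.5, scale=16.0):
        # Subtract a margin from the true-class logit before cross-entropy,
        # which enlarges inter-class divergences relative to plain softmax.
        one_hot = F.one_hot(target, num_classes=logits.size(-1)).float()
        return F.cross_entropy(scale * (logits - margin * one_hot), target)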

Drug-Drug Interaction Prediction with Wasserstein Adversarial Autoencoder-based Knowledge Graph Embeddings [article]

Yuanfei Dai, Chenhao Guo, Wenzhong Guo, Carsten Eickhoff
2020 arXiv   pre-print
In this paper, we propose a new knowledge graph embedding framework by introducing adversarial autoencoders (AAE) based on Wasserstein distances and Gumbel-Softmax relaxation for drug-drug interactions  ...  Recently, several knowledge graph embedding approaches have received increasing attention in the DDI domain due to their capability of projecting drugs and interactions into a low-dimensional feature space  ...  Conclusions The goal of this study is to find a new approach to negative sampling that improves the performance of drug-drug interaction knowledge graph embedding models.  ... 
arXiv:2004.07341v2 fatcat:zwmlp2vjvzeonp33hqy2veilwi
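
The Gumbel-Softmax relaxation mentioned above, which lets gradients flow through discrete sampling, is available directly in PyTorch; a minimal usage sketch (shapes and temperature are arbitrary):

    import torch
    import torch.nn.functional as F

    logits = torch.randn(4, 10)                         # unnormalized category scores
    soft_sample = F.gumbel_softmax(logits, tau=0.5)     # differentiable, relaxed one-hot
    hard_sample = F.gumbel_softmax(logits, tau=0.5, hard=True)  # straight-through one-hot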

Network Representation Learning: From Preprocessing, Feature Extraction to Node Embedding [article]

Jingya Zhou, Ling Liu, Wenqi Wei, Jianxi Fan
2021 arXiv   pre-print
Network representation learning (NRL) advances the conventional graph mining of social networks, knowledge graphs, and complex biomedical and physics information networks.  ...  With this unifying reference framework, we highlight the representative methods, models, and techniques used at different stages of the node embedding model learning process.  ...  These representative embedding models are often coupled with optimization techniques, such as hierarchical softmax, negative sampling, and attention mechanism, for better embedding effects.  ... 
arXiv:2110.07582v1 fatcat:gbjn3evwwzf4xkeobrsfo6hope
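
Negative sampling, one of the optimization techniques the survey highlights, replaces the full softmax over all nodes with a handful of sampled negatives. A minimal skip-gram-style sketch (tensor shapes are assumptions, not tied to any specific model in the survey):

    import torch
    import torch.nn.functional as F

    def skipgram_ns_loss(u, v_pos, v_negs):
        # u: (d,) center-node embedding; v_pos: (d,) observed context node;
        # v_negs: (k, d) negative nodes drawn from a noise distribution.
        pos = F.logsigmoid(torch.dot(u, v_pos))
        neg = F.logsigmoid(-(v_negs @ u)).sum()
        return -(pos + neg)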

Learning from Very Few Samples: A Survey [article]

Jiang Lu, Pinghua Gong, Jieping Ye, Changshui Zhang
2020 arXiv   pre-print
Few sample learning (FSL) is significant and challenging in the field of machine learning.  ...  In this context, we extensively review 300+ papers of FSL spanning from the 2000s to 2019 and provide a timely and comprehensive survey for FSL.  ...  ACKNOWLEDGMENTS The authors would like to thank the pioneer researchers in few sample learning and other related fields.  ... 
arXiv:2009.02653v2 fatcat:fytfbeifmnhbfodat6czwxsqeu

A unifying mutual information view of metric learning: cross-entropy vs. pairwise losses [article]

Malik Boudiaf, Jérôme Rony, Imtiaz Masud Ziko, Eric Granger, Marco Pedersoli, Pablo Piantanida, Ismail Ben Ayed
2021 arXiv   pre-print
Our findings indicate that the cross-entropy represents a proxy for maximizing the mutual information -- as pairwise losses do -- without the need for convoluted sample-mining heuristics.  ...  The standard cross-entropy loss for classification has been largely overlooked in DML.  ...  .: Facenet: A unified embedding for face recognition and clustering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2015) 27.  ... 
arXiv:2003.08983v3 fatcat:ip2xetx56zcgxiscf6claxthtq

Relation-aware Graph Attention Model With Adaptive Self-adversarial Training [article]

Xiao Qin, Nasrullah Sheikh, Berthold Reinwald, Lingfei Wu
2021 arXiv   pre-print
Existing message passing-based graph neural networks use edges either for graph traversal and/or selection of message encoding functions.  ...  RelGNN also adopts a self-attention mechanism to balance the importance of attribute features and topological features for generating the final entity embeddings.  ...  … + ℓ(−1, d_r(f(v_m), f(v_n))),  (8)  where θ denotes the model parameters, ℓ is usually defined as cross entropy, and v_m, v_n ∈ V are parts of a negative relationship sample, i.e.  ... 
arXiv:2102.07186v1 fatcat:qzevzifl6jgstcrnzjg6qlv6yu
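
Equation (8) in the snippet is truncated, but its ingredients are a loss ℓ over positive (+1) and negative (−1) relationship samples with a relation-aware distance d_r. The sketch below is a generic reading only: it uses plausibility scores and binary cross-entropy, and does not reproduce the paper's exact d_r or its adaptive self-adversarial sampling.

    import torch
    import torch.nn.functional as F

    def relation_loss(pos_scores, neg_scores):
        # pos_scores / neg_scores: higher means a more plausible (head, relation, tail)
        # pair; cross entropy pushes positives toward 1 and negatives toward 0.
        pos = F.binary_cross_entropy_with_logits(pos_scores, torch.ones_like(pos_scores))
        neg = F.binary_cross_entropy_with_logits(neg_scores, torch.zeros_like(neg_scores))
        return pos + neg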

Commonsense Knowledge-Aware Prompt Tuning for Few-Shot NOTA Relation Classification

Bo Lv, Li Jin, Yanan Zhang, Hao Wang, Xiaoyu Li, Zhi Guo
2022 Applied Sciences  
The model needs to make full use of the syntactic information and word meaning information learned in the pre-training stage to distinguish the NOTA category and the support sample category in the embedding  ...  In this paper, we propose the commonsense knowledge-aware prompt tuning (CKPT) method for a few-shot NOTA relation classification task.  ...  We can observe that for the model using softmax cross-entropy loss, some features of positive samples and negative samples are mixed, and the boundary between positive and negative samples is not clear  ... 
doi:10.3390/app12042185 fatcat:b3ebyubls5a6dkmbt3wipbjm7u

Learning to Make Predictions on Graphs with Autoencoders

Phi Vu Tran
2018 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA)  
We present a novel autoencoder architecture capable of learning a joint representation of both local graph structure and available node features for the multi-task learning of link prediction and node  ...  We provide a comprehensive empirical evaluation of our models on nine benchmark graph-structured datasets and demonstrate significant improvement over related methods for graph representation learning.  ...  Acknowledgment The author thanks Edward Raff and Jared Sylvester for insightful discussions, and gracious reviewers for constructive feedback on the paper.  ... 
doi:10.1109/dsaa.2018.00034 dblp:conf/dsaa/Tran18 fatcat:5esavcggqve27fk2szcyq5q3wq

Large Margin Few-Shot Learning [article]

Yong Wang, Xiao-Ming Wu, Qimai Li, Jiatao Gu, Wangmeng Xiang, Lei Zhang, Victor O.K. Li
2018 arXiv   pre-print
with very little computational overhead, demonstrating the effectiveness of the large margin principle and the potential of our method.  ...  To realize it, we develop a unified framework to learn a more discriminative metric space by augmenting the classification loss function with a large margin distance loss function for training.  ...  Although large margin methods have been widely used in machine learning, to the best of our knowledge this is the first time they have been used for few-shot learning.  ... 
arXiv:1807.02872v2 fatcat:5baju76agjf5djn5pqkb62htk4
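
The snippet describes augmenting the classification loss with a large-margin distance loss in the embedding space. A minimal sketch of such a combined objective follows; the triplet form and the weight lam are illustrative choices, not necessarily the paper's.

    import torch.nn as nn

    ce = nn.CrossEntropyLoss()
    triplet = nn.TripletMarginLoss(margin=1.0)

    def combined_loss(logits, target, anchor, positive, negative, lam=0.1):
        # Classification loss plus a margin-based distance loss on embeddings.
        return ce(logits, target) + lam * triplet(anchor, positive, negative)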
Showing results 1 — 15 out of 861 results