Conditions on Consistency of Probabilistic Tree Adjoining Grammars [article]

Anoop Sarkar
1998 arXiv   pre-print
Much of the power of probabilistic methods in modelling language comes from their ability to compare several derivations for the same string in the language. An important starting point for the study of such cross-derivational properties is the notion of _consistency_. The probability model defined by a probabilistic grammar is said to be _consistent_ if the probabilities assigned to all the strings in the language sum to one. From the literature on probabilistic context-free grammars (CFGs), we know precisely the conditions which ensure that consistency is true for a given CFG. This paper derives the conditions under which a given probabilistic Tree Adjoining Grammar (TAG) can be shown to be consistent. It gives a simple algorithm for checking consistency and gives the formal justification for its correctness. The conditions derived here can be used to ensure that probability models that use TAGs can be checked for _deficiency_ (i.e. whether any probability mass is assigned to strings that cannot be generated).
arXiv:cs/9809027v1 fatcat:kedomff4znhq7f4swwhpcuhznm
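
The consistency condition here generalizes the classical one for PCFGs (Booth and Thompson, 1973): a probabilistic grammar is consistent if the spectral radius of its first-moment ("expectation") matrix is below one. A minimal sketch of that PCFG-level check on a hypothetical one-nonterminal grammar (the paper's TAG version operates analogously on substitution and adjunction probabilities):

    # Sketch of the classical PCFG consistency check that the paper
    # generalizes to TAGs; the toy grammar below is hypothetical.
    # A PCFG is consistent if the spectral radius of its expectation
    # matrix is < 1 (rho = 1 is a borderline case).
    import numpy as np

    # rules: {lhs: [(prob, rhs_symbols)]}; lowercase strings are terminals
    rules = {"S": [(0.6, ["S", "S"]), (0.4, ["a"])]}
    nonterminals = sorted(rules)
    idx = {nt: i for i, nt in enumerate(nonterminals)}

    # M[i][j] = expected occurrences of nonterminal j in one expansion of i
    M = np.zeros((len(nonterminals), len(nonterminals)))
    for lhs, prods in rules.items():
        for prob, rhs in prods:
            for sym in rhs:
                if sym in idx:
                    M[idx[lhs], idx[sym]] += prob

    rho = max(abs(np.linalg.eigvals(M)))
    # Here rho = 1.2: S -> S S is too likely, so probability mass leaks
    # into infinite derivations and the grammar is inconsistent.
    print(rho, "consistent" if rho < 1 else "inconsistent")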

Automatic Extraction of Subcategorization Frames for Czech [article]

Anoop Sarkar, Daniel Zeman
2000 arXiv   pre-print
We present some novel machine learning techniques for the identification of subcategorization information for verbs in Czech. We compare three different statistical techniques applied to this problem. We show how the learning algorithm can be used to discover previously unknown subcategorization frames from the Czech Prague Dependency Treebank. The algorithm can then be used to label dependents of a verb in the Czech treebank as either arguments or adjuncts. Using our techniques, we are able to achieve 88% precision on unseen parsed text.
arXiv:cs/0009003v1 fatcat:xnaolebamrbcxm2aus57i4o52i
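
The abstract does not name the three statistical techniques; as an illustration of the general flavor of such filters, the sketch below keeps a (verb, frame) pair only if the frame co-occurs with the verb significantly more often than a background noise rate, using a one-sided binomial test. The counts, noise rate, and threshold are hypothetical, not the authors' formulation.

    # Hypothetical illustration of a statistical filter for candidate
    # (verb, frame) pairs: keep the frame only if it co-occurs with the
    # verb significantly more often than a background noise rate. Not
    # the authors' exact test; counts and thresholds are made up.
    from scipy.stats import binomtest

    def keep_frame(frame_count, verb_count, noise_rate=0.05, alpha=0.01):
        """One-sided binomial test: is the co-occurrence above chance?"""
        p = binomtest(frame_count, verb_count, noise_rate,
                      alternative="greater").pvalue
        return p < alpha

    # verb seen 200 times, 30 of them with the candidate frame
    print(keep_frame(30, 200))  # True: 15% is well above the 5% noise rate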

Separating Dependency from Constituency in a Tree Rewriting System [article]

Anoop Sarkar
1998 arXiv   pre-print
Moreover, the linguistic analyses presented in (Sarkar and Joshi, 1996) can be easily adopted in the current formalism. ... from the constituency gives a better formal understanding of its representation when compared to previous approaches that use tree-rewriting systems which conflate the two issues, as in (Joshi, 1990; Sarkar ...
arXiv:cs/9809028v1 fatcat:aklukbymb5eftdrvqpu2umotje

CipherDAug: Ciphertext based Data Augmentation for Neural Machine Translation [article]

Nishant Kambhatla, Logan Born, Anoop Sarkar
2022 arXiv   pre-print
We propose a novel data-augmentation technique for neural machine translation based on ROT-k ciphertexts. ROT-k is a simple letter substitution cipher that replaces a letter in the plaintext with the kth letter after it in the alphabet. We first generate multiple ROT-k ciphertexts using different values of k for the plaintext, which is the source side of the parallel data. We then leverage this enciphered training data along with the original parallel data via multi-source training to improve neural machine translation. Our method, CipherDAug, uses a co-regularization-inspired training procedure, requires no external data sources other than the original training data, and uses a standard Transformer to outperform strong data augmentation techniques on several datasets by a significant margin. This technique combines easily with existing approaches to data augmentation, and yields particularly strong results in low-resource settings.
arXiv:2204.00665v1 fatcat:nihokvsxzjhkpf5fmr56bhyykq
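
The ROT-k cipher is exactly as the abstract defines it, so a minimal implementation is short (this sketch handles only Latin letters and leaves other characters unchanged):

    # Minimal ROT-k cipher as described in the abstract: each letter is
    # replaced by the k-th letter after it in the alphabet, wrapping
    # around; non-alphabetic characters pass through unchanged.
    def rot_k(text: str, k: int) -> str:
        out = []
        for ch in text:
            if ch.isalpha():
                base = ord('a') if ch.islower() else ord('A')
                out.append(chr((ord(ch) - base + k) % 26 + base))
            else:
                out.append(ch)
        return "".join(out)

    # CipherDAug generates several such ciphertexts of the source side:
    src = "the cat sat"
    print([rot_k(src, k) for k in (1, 2, 3)])
    # ['uif dbu tbu', 'vjg ecv ucv', 'wkh fdw vdw']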

Better Neural Machine Translation by Extracting Linguistic Information from BERT [article]

Hassan S. Shavarani, Anoop Sarkar
2021 arXiv   pre-print
Adding linguistic information (syntax or semantics) to neural machine translation (NMT) has mostly focused on using point estimates from pre-trained models. Directly using the capacity of massive pre-trained contextual word embedding models such as BERT (Devlin et al., 2019) has been marginally useful in NMT because effective fine-tuning is difficult to obtain for NMT without making training brittle and unreliable. We augment NMT by extracting dense fine-tuned vector-based linguistic information from BERT instead of using point estimates. Experimental results show that our method of incorporating linguistic information helps NMT to generalize better in a variety of training contexts and is no more difficult to train than conventional Transformer-based NMT.
arXiv:2104.02831v1 fatcat:wjloxl2lsfhyxifchmb56c7fai
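
A minimal sketch of the general idea, extracting dense per-token vectors from BERT and projecting them to the NMT model's dimension; the linear projection and the 512-dimensional target size are illustrative assumptions, not the paper's exact architecture:

    # Illustrative sketch: extract dense per-token vectors from BERT and
    # project them to the NMT encoder's dimension. The Linear projection
    # and the 512-dim target are assumptions, not the paper's design.
    import torch
    from transformers import AutoModel, AutoTokenizer

    name = "bert-base-cased"
    tokenizer = AutoTokenizer.from_pretrained(name)
    bert = AutoModel.from_pretrained(name).eval()
    proj = torch.nn.Linear(bert.config.hidden_size, 512)  # to NMT dim

    def bert_features(sentence: str) -> torch.Tensor:
        """Dense vectors from BERT's last hidden layer, one per subword."""
        batch = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            hidden = bert(**batch).last_hidden_state  # (1, T, 768)
        return proj(hidden)  # (1, T, 512), ready to mix into the encoder

    print(bert_features("The cat sat on the mat .").shape)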

A Note on Typing Feature Structures

Shuly Wintner, Anoop Sarkar
2002 Computational Linguistics  
In a new implementation of XTAG (Sarkar, 2000), feature structure specifications are not evaluated as structures are being constructed; rather, they are deferred to the final stage of processing, when ...
doi:10.1162/089120102760276027 fatcat:mlqllv7bdjgphajs4v3anulh3m

Interrogating the Explanatory Power of Attention in Neural Machine Translation [article]

Pooya Moradi, Nishant Kambhatla, Anoop Sarkar
2019 arXiv   pre-print
Attention models have become a crucial component in neural machine translation (NMT). They are often implicitly or explicitly used to justify the model's decision in generating a specific token, but it has not yet been rigorously established to what extent attention is a reliable source of information in NMT. To evaluate the explanatory power of attention for NMT, we examine the possibility of yielding the same prediction but with counterfactual attention models that modify crucial aspects of the trained attention model. Using these counterfactual attention mechanisms, we assess the extent to which they still preserve the generation of function and content words in the translation process. Compared to a state-of-the-art attention model, our counterfactual attention models produce 68% of function words and 21% of content words in our German-English dataset. Our experiments demonstrate that attention models by themselves cannot reliably explain the decisions made by an NMT model.
arXiv:1910.00139v1 fatcat:eqtln62sorhsfhjbwhf7i6raza
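
One way to realize such a counterfactual probe, sketched under assumptions: swap the learned attention distribution for a uniform one at a decoding step and check whether the predicted token survives. The decoder_step interface below is a hypothetical stand-in, not the authors' code.

    # Sketch of a counterfactual-attention probe in the spirit of the
    # paper: decode a step with the learned attention weights and again
    # with a counterfactual (uniform) distribution, then check whether
    # the predicted token is preserved.
    import torch

    def uniform_counterfactual(attn: torch.Tensor) -> torch.Tensor:
        """Replace the attention distribution with a uniform one."""
        return torch.full_like(attn, 1.0 / attn.size(-1))

    def prediction_preserved(decoder_step, query, enc_states) -> bool:
        # decoder_step(query, enc_states, attn) -> vocabulary logits
        attn = torch.softmax(query @ enc_states.T, dim=-1)
        original = decoder_step(query, enc_states, attn).argmax()
        altered = decoder_step(query, enc_states,
                               uniform_counterfactual(attn)).argmax()
        return bool(original == altered)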

Pointer-based Fusion of Bilingual Lexicons into Neural Machine Translation [article]

Jetic Gū, Hassan S. Shavarani, Anoop Sarkar
2019 arXiv   pre-print
Neural machine translation (NMT) systems require large amounts of high quality in-domain parallel corpora for training. State-of-the-art NMT systems still face challenges related to out-of-vocabulary words and dealing with low-resource language pairs. In this paper, we propose and compare several models for fusion of bilingual lexicons with an end-to-end trained sequence-to-sequence model for machine translation. The result is a fusion model with two information sources for the decoder: a neural conditional language model and a bilingual lexicon. This fusion model learns how to combine both sources of information in order to produce higher quality translation output. Our experiments show that our proposed models work well in relatively low-resource scenarios, and also effectively reduce the parameter size and training cost for NMT without sacrificing performance.
arXiv:1909.07907v1 fatcat:sl5je73qkjfjbebuoucr7lhgpi
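
A common way to realize this kind of fusion, shown here as a sketch rather than the paper's exact pointer mechanism, is a learned gate that mixes the decoder's token distribution with a lexicon-derived one:

    # Generic gated fusion of two token distributions, illustrating how a
    # decoder and a bilingual lexicon can be combined. A standard gating
    # formulation, not necessarily the paper's pointer mechanism.
    import torch

    class GatedFusion(torch.nn.Module):
        def __init__(self, hidden_dim: int):
            super().__init__()
            self.gate = torch.nn.Linear(hidden_dim, 1)

        def forward(self, dec_state, p_nmt, p_lex):
            # dec_state: (B, H); p_nmt, p_lex: (B, V) distributions
            g = torch.sigmoid(self.gate(dec_state))  # (B, 1), learned mix
            return g * p_nmt + (1 - g) * p_lex       # still sums to 1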

On the Generalized Hardy-Rellich Inequalities [article]

T. V. Anoop, Ujjal Das, Abhishek Sarkar
2018 arXiv   pre-print
In this article, we look for the weight functions (say g) that admit the following generalized Hardy-Rellich type inequality: ∫_Ω g(x) u^2 dx ≤ C ∫_Ω |Δu|^2 dx, ∀ u ∈ D^2,2_0(Ω), for some constant C > 0, where Ω is an open set in R^N with N > 1. We find various classes of such weight functions, depending on the dimension N and the geometry of Ω. Firstly, we use the Muckenhoupt condition for the one-dimensional weighted Hardy inequalities and a symmetrization inequality to obtain admissible weights in certain Lorentz-Zygmund spaces. Secondly, using the fundamental theorem of integration we obtain the weight functions in certain weighted Lebesgue spaces. As a consequence of our results, we obtain simple proofs for the embeddings of D^2,2_0(Ω) into certain Lorentz-Zygmund spaces proved by Hansson and later by Brezis and Wainger.
arXiv:1801.03197v1 fatcat:7hhjqtaefzbsjolzprzdlhwcuq
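
For readability, the inequality from the abstract restated in LaTeX, followed by the classical Rellich inequality it generalizes (a standard fact, stated here only for context: the weight g(x) = N²(N−4)²/(16|x|⁴) with C = 1, valid for N ≥ 5):

    \[
      \int_{\Omega} g(x)\, u^2 \, dx \;\le\; C \int_{\Omega} |\Delta u|^2 \, dx
      \qquad \forall\, u \in \mathcal{D}^{2,2}_0(\Omega),
    \]
    \[
      \frac{N^2 (N-4)^2}{16} \int_{\mathbb{R}^N} \frac{u^2}{|x|^4}\, dx
      \;\le\; \int_{\mathbb{R}^N} |\Delta u|^2 \, dx
      \qquad (N \ge 5).
    \]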

Analysis of Semi-Supervised Learning with the Yarowsky Algorithm [article]

Gholam Reza Haffari, Anoop Sarkar
2012 arXiv   pre-print
The Yarowsky algorithm is a rule-based semi-supervised learning algorithm that has been successfully applied to some problems in computational linguistics. The algorithm was not mathematically well understood until (Abney 2004), which analyzed some specific variants of the algorithm and also proposed some new algorithms for bootstrapping. In this paper, we extend Abney's work and show that some of his proposed algorithms actually optimize (an upper bound on) an objective function based on a new definition of cross-entropy, which is in turn based on a particular instantiation of the Bregman distance between probability distributions. Moreover, we suggest some new algorithms for rule-based semi-supervised learning and show connections with harmonic functions and minimum multi-way cuts in graph-based semi-supervised learning.
arXiv:1206.5240v1 fatcat:k554g6a6abbexgvezgvdzivrre
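
The bootstrapping loop being analyzed has a simple skeleton: train on labeled seeds, label the unlabeled pool, keep only confident predictions, retrain. A sketch, where the classifier, its confidence estimate, and the 0.95 threshold are placeholders rather than the specific variants Abney formalizes:

    # Skeleton of the Yarowsky-style bootstrapping loop the paper
    # analyzes. train(X, y) returns a model; predict(model, x) returns
    # a (label, confidence) pair; both are caller-supplied placeholders.
    def yarowsky(seed_x, seed_y, unlabeled, train, predict,
                 threshold=0.95, max_iters=10):
        labeled_x, labeled_y = list(seed_x), list(seed_y)
        for _ in range(max_iters):
            model = train(labeled_x, labeled_y)
            confident, remaining = [], []
            for x in unlabeled:
                label, conf = predict(model, x)
                (confident if conf >= threshold else remaining).append((x, label))
            if not confident:                     # fixed point reached
                break
            labeled_x += [x for x, _ in confident]
            labeled_y += [y for _, y in confident]
            unlabeled = [x for x, _ in remaining]
        return train(labeled_x, labeled_y)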

Coordination in Tree Adjoining Grammars: Formalization and Implementation [article]

Anoop Sarkar and Aravind Joshi (Dept of Computer and Information Science, University of Pennsylvania)
1996 arXiv   pre-print
The complete and formal description of the parsing algorithm is given in (Sarkar and Joshi, 1996). ... Obtaining a tree structure from a derived structure built by the conjoin operation is discussed in (Sarkar and Joshi, 1996). ...
arXiv:cmp-lg/9606006v1 fatcat:dprsshmgvzaqzn2zjbcwbqroge

Semi-supervised model adaptation for statistical machine translation

Nicola Ueffing, Gholamreza Haffari, Anoop Sarkar
2007 Machine Translation  
doi:10.1007/s10590-008-9036-3 fatcat:6xk6bjz4zfag7bwfedlai5oy3e

Voting Between Multiple Data Representations for Text Chunking [chapter]

Hong Shen, Anoop Sarkar
2005 Lecture Notes in Computer Science  
This paper considers the hypothesis that voting between multiple data representations can be more accurate than voting between multiple learning models. The main contribution of this paper is that a single learning method, in our case a simple trigram Hidden Markov Model, can use voting between multiple data representations to obtain results equal to the best on the CoNLL-2000 text chunking data set. Using no additional knowledge sources, we achieved a 94.01 Fβ=1 score for arbitrary phrase identification and a 95.23 Fβ=1 score for Base NP identification. In addition, the significance comparison showed our Base NP identification score is significantly better than the previous comparable state-of-the-art score of 94.22.
doi:10.1007/11424918_40 fatcat:nqrs2s4fw5fzlfuon7zqomfr2e
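
A sketch of the voting scheme: the same tagger is trained on several chunk encodings (e.g. IOB1, IOB2, IOE2), each prediction is mapped back to one common encoding, and a per-token majority vote picks the final tag. The per-representation taggers and the to_common converters are assumed to exist.

    # Voting between data representations: run one tagger per chunk
    # encoding, map every prediction back to a common encoding, and take
    # a per-token majority vote. taggers[i] and to_common[i] are assumed.
    from collections import Counter

    def vote(sentence, taggers, to_common):
        predictions = [to_common[i](taggers[i](sentence))
                       for i in range(len(taggers))]
        final = []
        for token_tags in zip(*predictions):      # tags for one token
            tag, _ = Counter(token_tags).most_common(1)[0]
            final.append(tag)
        return final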

Training Global Linear Models for Chinese Word Segmentation [chapter]

Dong Song, Anoop Sarkar
2009 Lecture Notes in Computer Science  
Find the most plausible word segmentation y′ for an un-segmented Chinese sentence x by scoring each possible segmentation with a feature-weight vector w and the features Φ(x, y) of each candidate y: y′ = argmax_{y ∈ GEN(x)} w · Φ(x, y). Global linear models (Collins, 2002) can be trained using the perceptron (voted or averaged variants), max-margin methods, and even CRFs, by normalizing the score above to give log p(y|x).
doi:10.1007/978-3-642-01818-3_15 fatcat:fjomt3gbtjgatcs3fcoynsqyoq
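
The scoring rule above can be trained with the averaged perceptron (Collins, 2002), one of the training methods the entry mentions. A sketch, with the candidate generator GEN(x) and the sparse feature map phi(x, y) assumed to be supplied by the caller:

    # Averaged-perceptron training for a global linear model
    # (Collins, 2002): predict argmax over GEN(x) of w . phi(x, y).
    # phi returns a sparse {feature: value} dict.
    from collections import defaultdict

    def score(w, feats):
        return sum(w[f] * v for f, v in feats.items())

    def train(data, GEN, phi, epochs=5):
        w, total, n = defaultdict(float), defaultdict(float), 0
        for _ in range(epochs):
            for x, gold in data:
                pred = max(GEN(x), key=lambda y: score(w, phi(x, y)))
                if pred != gold:                  # perceptron update
                    for f, v in phi(x, gold).items():
                        w[f] += v
                    for f, v in phi(x, pred).items():
                        w[f] -= v
                for f, v in w.items():            # accumulate for averaging
                    total[f] += v
                n += 1
        return {f: v / n for f, v in total.items()}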

An Easily Extensible HMM Word Aligner

Jetic Gū, Anahita Mansouri Bigvand, Anoop Sarkar
2018 Prague Bulletin of Mathematical Linguistics  
In this paper, we present a new word aligner with built-in support for alignment types, as well as comparisons between various models and existing aligner systems. It is open-source software that can be easily extended with models of the user's own design. We expect it to serve academics as well as researchers in industry, both for word alignment and for experimenting with their own new models. This paper introduces the basic design and structure of the system; examples and demos are also provided.
doi:10.2478/pralin-2018-0008 fatcat:67jyrpoopvh2bnubyttvap5mfu
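
The core of an HMM word aligner in the classic style of Vogel et al. (1996): transition probabilities depend on the jump width between source positions, emissions are lexical translation probabilities, and decoding is Viterbi. A minimal sketch; the parameter tables t and jump are placeholders the caller must supply, not this aligner's actual API:

    # Minimal Viterbi decoder for HMM word alignment. t(f, e) is a
    # lexical translation probability and jump(d) a jump-width
    # probability; both are toy caller-supplied tables here.
    import math

    def viterbi_align(src, tgt, t, jump):
        """Return the best alignment a[j] = i of target to source words."""
        n = len(src)
        # delta[j][i]: best log-prob of aligning tgt[:j+1] with a[j] = i
        delta = [[-math.inf] * n for _ in tgt]
        back = [[0] * n for _ in tgt]
        for i in range(n):
            delta[0][i] = math.log(t(tgt[0], src[i]) / n + 1e-12)
        for j in range(1, len(tgt)):
            for i in range(n):
                best = max(range(n), key=lambda k: delta[j-1][k]
                           + math.log(jump(i - k) + 1e-12))
                delta[j][i] = (delta[j-1][best]
                               + math.log(jump(i - best) + 1e-12)
                               + math.log(t(tgt[j], src[i]) + 1e-12))
                back[j][i] = best
        a = [max(range(n), key=lambda i: delta[-1][i])]
        for j in range(len(tgt) - 1, 0, -1):   # trace back the best path
            a.append(back[j][a[-1]])
        return list(reversed(a))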