"Building a search engine for algorithms" by Suppawong Tuarob, Prasenjit Mitra, and C. Lee Giles with Martin Vesely as coordinator

Suppawong Tuarob, Prasenjit Mitra, C. Lee Giles
2014 ACM SIGWEB Newsletter  
Currently, the metadata of a pseudocode includes its caption, textual summary (generated by the document element summarization algorithm proposed by Bhatia and Mitra [2012]), and the metadata of the  ...  TUAROB, S., MITRA, P., AND GILES, C. L. 2013. A classification scheme for algorithm citation function in scholarly works. In Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries.  ... 
doi:10.1145/2559858.2559863 fatcat:ull5x5zdufel3jr6ngoijruxlm

On Summarizing Graph Streams [article]

Nan Tang, Qing Chen, Prasenjit Mitra
2015 arXiv   pre-print
Graph streams, which refer to the graph with edges being updated sequentially in a form of a stream, have wide applications such as cyber security, social networks and transportation networks. This paper studies the problem of summarizing graph streams. Specifically, given a graph stream G, directed or undirected, the objective is to summarize G as S with much smaller (sublinear) space, linear construction time and constant maintenance cost for each edge update, such that S allows many queries
... ver G to be approximately conducted efficiently. Due to the sheer volume and highly dynamic nature of graph streams, summarizing them remains a notoriously hard, if not impossible, problem. The widely used practice of summarizing data streams is to treat each element independently by e.g., hash- or sampling-based method, without keeping track of the connections between elements in a data stream, which gives these summaries limited power in supporting complicated queries over graph streams. This paper discusses a fundamentally different philosophy for summarizing graph streams. We present gLava, a probabilistic graph model that, instead of treating an edge (a stream element) as the operating unit, uses the finer grained node in an element. This will naturally form a new graph sketch where edges capture the connections inside elements, and nodes maintain relationships across elements. We discuss a wide range of supported graph queries and establish theoretical error bounds for basic queries.
arXiv:1510.02219v1 fatcat:rcszktypbbgl3gdw6n55silmyi

Abstractive Meeting Summarization Using Dependency Graph Fusion [article]

Siddhartha Banerjee, Prasenjit Mitra, Kazunari Sugiyama
2016 arXiv   pre-print
Automatic summarization techniques on meeting conversations developed so far have been primarily extractive, resulting in poor summaries. To improve this, we propose an approach to generate abstractive summaries by fusing important content from several utterances. Any meeting is generally comprised of several discussion topic segments. For each topic segment within a meeting conversation, we aim to generate a one sentence summary from the most important utterances using an integer linear
... ming-based sentence fusion approach. Experimental results show that our method can generate more informative summaries than the baselines.
arXiv:1609.07035v1 fatcat:navnfivvencbxc7jlsy6jcwuba

YASS

Prasenjit Majumder, Mandar Mitra, Swapan K. Parui, Gobinda Kole, Pabitra Mitra, Kalyankumar Datta
2007 ACM Transactions on Information Systems  
Stemmers attempt to reduce a word to its stem or root form and are used widely in information retrieval tasks to increase the recall rate. Most popular stemmers encode a large number of language-specific rules built over a length of time. Such stemmers with comprehensive rules are available only for a few languages. In the absence of extensive linguistic resources for certain languages, statistical language processing tools have been successfully used to improve the performance of IR systems. In this article, we describe a clustering-based approach to discover equivalence classes of root words and their morphological variants. A set of string distance measures are defined, and the lexicon for a given text collection is clustered using the distance measures to identify these equivalence classes. The proposed approach is compared with Porter's and Lovin's stemmers on the AP and WSJ subcollections of the Tipster dataset using 200 queries. Its performance is comparable to that of Porter's and Lovin's stemmers, both in terms of average precision and the total number of relevant documents retrieved. The proposed stemming algorithm also provides consistent improvements in retrieval performance for French and Bengali, which are currently resource-poor.
doi:10.1145/1281485.1281489 fatcat:yxgt3w2tt5f7bi3izrkcqv6txe

Recognition of Implicit Geographic Movement in Text [article]

Scott Pezanowski, Prasenjit Mitra
2022 arXiv   pre-print
to make better use of this underutilized information source, we created a corpus of statements that describe geographic movement at both a small gold-standard level verified by humans (Pezanowski and Mitra  ... 
arXiv:2201.12799v1 fatcat:5catnvp5djdunn5gx5ii43322i

Federated Unlearning with Knowledge Distillation [article]

Chen Wu, Sencun Zhu, Prasenjit Mitra
2022 arXiv   pre-print
Federated Learning (FL) is designed to protect the data privacy of each client during the training process by transmitting only models instead of the original data. However, the trained model may memorize certain information about the training data. With the recent legislation on right to be forgotten, it is crucially essential for the FL model to possess the ability to forget what it has learned from each client. We propose a novel federated unlearning method to eliminate a client's
... n by subtracting the accumulated historical updates from the model and leveraging the knowledge distillation method to restore the model's performance without using any data from the clients. This method does not have any restrictions on the type of neural networks and does not rely on clients' participation, so it is practical and efficient in the FL system. We further introduce backdoor attacks in the training process to help evaluate the unlearning effect. Experiments on three canonical datasets demonstrate the effectiveness and efficiency of our method.
arXiv:2201.09441v1 fatcat:aepyuq2qenh5dpxm42fiq2nlwy

Effectively Searching Maps in Web Documents [article]

Qingzhao Tan, Prasenjit Mitra, C. Lee Giles
2009 arXiv   pre-print
Maps are an important source of information in archaeology and other sciences. Users want to search for historical maps to determine recorded history of the political geography of regions at different eras, to find out where exactly archaeological artifacts were discovered, etc. Currently, they have to use a generic search engine and add the term map along with other keywords to search for maps. This crude method will generate a significant number of false positives that the user will need to
... ll through to get the desired results. To reduce their manual effort, we propose an automatic map identification, indexing, and retrieval system that enables users to search and retrieve maps appearing in a large corpus of digital documents using simple keyword queries. We identify features that can help in distinguishing maps from other figures in digital documents and show how a Support-Vector-Machine-based classifier can be used to identify maps. We propose map-level-metadata e.g., captions, references to the maps in text, etc. and document-level metadata, e.g., title, abstract, citations, how recent the publication is, etc. and show how they can be automatically extracted and indexed. Our novel ranking algorithm weights different metadata fields differently and also uses the document-level metadata to help rank retrieved maps. Empirical evaluations show which features should be selected and which metadata fields should be weighted more. We also demonstrate improved retrieval results in comparison to adaptations of existing methods for map retrieval. Our map search engine has been deployed in an online map-search system that is part of the Blind-Review digital library system.
arXiv:0901.3939v1 fatcat:ug6sggl3t5cj3dr7y35kgfxknu

Learning To Describe Player Form in The MLB [article]

Connor Heaton, Prasenjit Mitra
2021 arXiv   pre-print
Major League Baseball (MLB) has a storied history of using statistics to better understand and discuss the game of baseball, with an entire discipline of statistics dedicated to the craft, known as sabermetrics. At their core, all sabermetrics seek to quantify some aspect of the game, often a specific aspect of a player's skill set - such as a batter's ability to drive in runs (RBI) or a pitcher's ability to keep batters from reaching base (WHIP). While useful, such statistics are fundamentally limited by the fact that they are derived from an account of what happened on the field, not how it happened. As a first step towards alleviating this shortcoming, we present a novel, contrastive learning-based framework for describing player form in the MLB. We use form to refer to the way in which a player has impacted the course of play in their recent appearances. Concretely, a player's form is described by a 72-dimensional vector. By comparing clusters of players resulting from our form representations and those resulting from traditional sabermetrics, we demonstrate that our form representations contain information about how players impact the course of play, not present in traditional, publicly available statistics. We believe these embeddings could be utilized to predict both in-game and game-level events, such as the result of an at-bat or the winner of a game.
arXiv:2109.05280v1 fatcat:cd4atptpcvg2tlhqkcq3unv3fu

UV Absorbing Property of Ageratum conyzoides Linn Leaves

Tanaya Ghosh, Prasenjit Mitra, Prasanta Kumar Mitra
2020 Acta Scientific Pharmaceutical Sciences  
Citation: Prasanta Kumar Mitra, et al. "UV Absorbing Property of Ageratum conyzoides Linn Leaves". Acta Scientific Pharmaceutical Sciences 4.3 (2020): 51-55.  ... 
doi:10.31080/asps.2020.04.0506 fatcat:voq7fes4ozhblkwp64riyx6qya

Isolation of Anti Solar Compound from Costus Speciosus Leaves

Prasenjit Mitra, Tanaya Ghosh, Prasanta Kumar Mitra
2020 Scholars Academic Journal of Pharmacy  
Original Research Article Ultraviolet (UV) radiation is required by humans for synthesis of vitamin D in body. Vitamin D is important for formation and maintenance of bones. Vitamin D is also involved in different metabolic processes. UV radiation is, therefore, good for humans. But, UV radiation has many bad effects too. Eyes and skins are affected. Prolonged exposure of UV radiation may cause skin cancer and develop cataract. Therefore there is continuous search for anti solar compounds
... m different sources including plants and herbs. Recently we found that leaves of Costus speciosus (C. speciosus), a leafy green herb having many pharmacological properties, can absorb ultraviolet radiation. Aim of the present work was to isolate the anti solar compound from C. speciosus leaves. Leaves of C. speciosus were collected, identified by taxonomist and processed for isolation work by standard methodologies. Solvent extraction and acid hydrolysis were done. These were followed by solvent treatment and chromatographic experiments. A compound was crystallized. UV absorption property of the isolated compound was studied. The compound showed maximum ultraviolet absorption at 200 nm. The compound, therefore, may be used in the preparation of sun screen lotion as anti solar compound.
doi:10.36347/sajp.2020.v09i01.005 fatcat:ke45itbjynej5n4d7pnbrwsodm

Summarizing Situational and Topical Information During Crises [article]

Koustav Rudra, Siddhartha Banerjee, Niloy Ganguly, Pawan Goyal, Muhammad Imran, Prasenjit Mitra
2016 arXiv   pre-print
Banerjee, Mitra, and Sugiyama proposed a graph-based abstractive summarization method on news articles [2].  ... 
arXiv:1610.01561v1 fatcat:qrvz63akrnepnec77kr4r3w3la

Anti Solar Activity of Costus Speciosus Leaves of Sikkim Himalayas

Prasenjit Mitra, Tanaya Ghosh, Prasanta Kumar Mitra
2020 Scholars Academic Journal of Pharmacy  
Original Research Article Since long Costus speciosus (C. speciosus) has been used in different system of medicine for medical treatment. The plant has several pharmacological properties like anti inflammatory, anti oxidant, anti microbial, anti cancer, gastro protective, anti diabetic, anti gastric ulcer, hepato protective etc. But anti solar activity of C. speciosus leaves of Sikkim Himalaya is not known in literature. Aim of the present study was, therefore, to examine anti solar activity of C. speciosus leaves, if any and if so effect of extraction solvents on the activity. Leaves of C. speciosus were collected and identified by the taxonomist. Solvent extractions of the leaves were made separately by using ethanol, chloroform, methanol, acetone, benzene, and ethyl acetate. The extracts were separately exposed for absorption of UV ray to a spectrophotometer using UV region. Result showed that all extracts of C. speciosus leaves had UV absorption property but ethanol extract had maximum activity. Ethanol extract of C. speciosus leaves, therefore, may be further studied for isolation of the active compound responsible for UV absorbing property for its use in preparation of sun screen lotions.
doi:10.36347/sajp.2020.v09i01.002 fatcat:jcwh7v7li5efdem6oskkxonocu

Multi-document abstractive summarization using ILP based multi-sentence compression [article]

Siddhartha Banerjee, Prasenjit Mitra, Kazunari Sugiyama
2016 arXiv   pre-print
Abstractive summarization is an ideal form of summarization since it can synthesize information from multiple documents to create concise informative summaries. In this work, we aim at developing an abstractive summarizer. First, our proposed approach identifies the most important document in the multi-document set. The sentences in the most important document are aligned to sentences in other documents to generate clusters of similar sentences. Second, we generate K-shortest paths from the
... ences in each cluster using a word-graph structure. Finally, we select sentences from the set of shortest paths generated from all the clusters employing a novel integer linear programming (ILP) model with the objective of maximizing information content and readability of the final summary. Our ILP model represents the shortest paths as binary variables and considers the length of the path, information score and linguistic quality score in the objective function. Experimental results on the DUC 2004 and 2005 multi-document summarization datasets show that our proposed approach outperforms all the baselines and state-of-the-art extractive summarizers as measured by the ROUGE scores. Our method also outperforms a recent abstractive summarization technique. In manual evaluation, our approach also achieves promising results on informativeness and readability.
arXiv:1609.07034v1 fatcat:saoq4q7kh5aexkbwywjyvgsjje

Seasonal Effect on UV Absorbing Property of Ageratum conyzoides Linn Leaves

Tanaya Ghosh, Prasenjit Mitra, Prasanta Kumar Mitra
2020 Acta Scientific Pharmaceutical Sciences  
Mitra et al. isolated anti solar compounds from Murraya koenigii and Costus speciosus leaves [23, 24].  ... 
doi:10.31080/asps.2020.04.0504 fatcat:p534jd2s3rgzhpxkg57azh6nje

Protein sequence classification using feature hashing

Cornelia Caragea, Adrian Silvescu, Prasenjit Mitra
2012 Proteome Science  
Recent advances in next-generation sequencing technologies have resulted in an exponential increase in the rate at which protein sequence data are being acquired. The k-gram feature representation, commonly used for protein sequence classification, usually results in prohibitively high dimensional input spaces, for large values of k. Applying data mining algorithms to these input spaces may be intractable due to the large number of dimensions. Hence, using dimensionality reduction techniques
... be crucial for the performance and the complexity of the learning algorithms. In this paper, we study the applicability of feature hashing to protein sequence classification, where the original high-dimensional space is "reduced" by hashing the features into a low-dimensional space, using a hash function, i.e., by mapping features into hash keys, where multiple features can be mapped (at random) to the same hash key, and "aggregating" their counts. We compare feature hashing with the "bag of k-grams" approach. Our results show that feature hashing is an effective approach to reducing dimensionality on protein sequence classification tasks.
doi:10.1186/1477-5956-10-s1-s14 pmid:22759572 pmcid:PMC3380737 fatcat:hipcrki6qnhdxetl44ztv5iwcu