"Building a search engine for algorithms" by Suppawong Tuarob, Prasenjit Mitra, and C. Lee Giles with Martin Vesely as coordinator
2014
ACM SIGWEB Newsletter
Currently, the metadata of a pseudocode includes its caption, textual summary (generated by the document element summarization algorithm proposed by Bhatia and Mitra [2012]), and the metadata of the ...
TUAROB, S., MITRA, P., AND GILES, C. L. 2013. A classification scheme for algorithm citation function in scholarly works. In Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries. ...
doi:10.1145/2559858.2559863
fatcat:ull5x5zdufel3jr6ngoijruxlm
On Summarizing Graph Streams
[article]
2015
arXiv
pre-print
Graph streams, i.e., graphs whose edges are updated sequentially in the form of a stream, have wide applications in areas such as cyber security, social networks and transportation networks. This paper studies the problem of summarizing graph streams. Specifically, given a graph stream G, directed or undirected, the objective is to summarize G as S with much smaller (sublinear) space, linear construction time and constant maintenance cost for each edge update, such that S allows many queries
arXiv:1510.02219v1
fatcat:rcszktypbbgl3gdw6n55silmyi
... over G to be approximately conducted efficiently. Due to the sheer volume and highly dynamic nature of graph streams, summarizing them remains a notoriously hard, if not impossible, problem. The widely used practice of summarizing data streams is to treat each element independently, e.g., with hash- or sampling-based methods, without keeping track of the connections between elements in a data stream, which gives these summaries limited power in supporting complicated queries over graph streams. This paper discusses a fundamentally different philosophy for summarizing graph streams. We present gLava, a probabilistic graph model that, instead of treating an edge (a stream element) as the operating unit, uses the finer grained node in an element. This naturally forms a new graph sketch where edges capture the connections inside elements, and nodes maintain relationships across elements. We discuss a wide range of supported graph queries and establish theoretical error bounds for basic queries.
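As a rough illustration of summarizing a graph stream at the node level rather than the edge level, the sketch below hashes every node into a fixed number of buckets and accumulates edge weights between buckets. This is only one simple reading of the idea and is not taken from the paper: the class name `NodeHashGraphSketch`, the single CRC32 hash, and the `width` parameter are all assumptions made for the example, and gLava's actual construction and error bounds may differ.

```python
import zlib


class NodeHashGraphSketch:
    """Toy node-level sketch of a directed graph stream (hypothetical, not gLava itself)."""

    def __init__(self, width):
        # width = number of node buckets; memory is width * width counters.
        self.width = width
        self.weights = [[0] * width for _ in range(width)]

    def _bucket(self, node):
        # Assumption: one deterministic hash (CRC32) maps a node to a bucket.
        return zlib.crc32(str(node).encode("utf-8")) % self.width

    def add_edge(self, src, dst, weight=1):
        # Constant work per streamed edge: one counter update.
        self.weights[self._bucket(src)][self._bucket(dst)] += weight

    def edge_weight(self, src, dst):
        # Estimate of the total weight seen for (src, dst).
        # With non-negative weights, collisions can only inflate the value.
        return self.weights[self._bucket(src)][self._bucket(dst)]

    def out_weight(self, src):
        # Estimate of the total outgoing weight of src.
        return sum(self.weights[self._bucket(src)])


sketch = NodeHashGraphSketch(width=8)
for src, dst in [("a", "b"), ("a", "b"), ("b", "c")]:
    sketch.add_edge(src, dst)
```

Because distinct nodes may share a bucket, the estimates never fall below the true values when weights are non-negative, and a larger `width` trades memory for fewer collisions.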
Abstractive Meeting Summarization Using Dependency Graph Fusion
[article]
2016
arXiv
pre-print
Automatic summarization techniques on meeting conversations developed so far have been primarily extractive, resulting in poor summaries. To improve this, we propose an approach to generate abstractive summaries by fusing important content from several utterances. Any meeting is generally comprised of several discussion topic segments. For each topic segment within a meeting conversation, we aim to generate a one sentence summary from the most important utterances using an integer linear
arXiv:1609.07035v1
fatcat:navnfivvencbxc7jlsy6jcwuba
... programming-based sentence fusion approach. Experimental results show that our method can generate more informative summaries than the baselines.
YASS
2007
ACM Transactions on Information Systems
Stemmers attempt to reduce a word to its stem or root form and are used widely in information retrieval tasks to increase the recall rate. Most popular stemmers encode a large number of language-specific rules built over a length of time. Such stemmers with comprehensive rules are available only for a few languages. In the absence of extensive linguistic resources for certain languages, statistical language processing tools have been successfully used to improve the performance of IR systems. In
doi:10.1145/1281485.1281489
fatcat:yxgt3w2tt5f7bi3izrkcqv6txe
... this article, we describe a clustering-based approach to discover equivalence classes of root words and their morphological variants. A set of string distance measures is defined, and the lexicon for a given text collection is clustered using the distance measures to identify these equivalence classes. The proposed approach is compared with Porter's and Lovins' stemmers on the AP and WSJ subcollections of the Tipster dataset using 200 queries. Its performance is comparable to that of Porter's and Lovins' stemmers, both in terms of average precision and the total number of relevant documents retrieved. The proposed stemming algorithm also provides consistent improvements in retrieval performance for French and Bengali, which are currently resource-poor.
Recognition of Implicit Geographic Movement in Text
[article]
2022
arXiv
pre-print
to make better use of this underutilized information source, we created a corpus of statements that describe geographic movement at both a small gold-standard level verified by humans (Pezanowski and Mitra ...
arXiv:2201.12799v1
fatcat:5catnvp5djdunn5gx5ii43322i
Federated Unlearning with Knowledge Distillation
[article]
2022
arXiv
pre-print
Federated Learning (FL) is designed to protect the data privacy of each client during the training process by transmitting only models instead of the original data. However, the trained model may memorize certain information about the training data. With the recent legislation on right to be forgotten, it is crucially essential for the FL model to possess the ability to forget what it has learned from each client. We propose a novel federated unlearning method to eliminate a client's
arXiv:2201.09441v1
fatcat:aepyuq2qenh5dpxm42fiq2nlwy
... contribution by subtracting the accumulated historical updates from the model and leveraging the knowledge distillation method to restore the model's performance without using any data from the clients. This method does not have any restrictions on the type of neural networks and does not rely on clients' participation, so it is practical and efficient in the FL system. We further introduce backdoor attacks in the training process to help evaluate the unlearning effect. Experiments on three canonical datasets demonstrate the effectiveness and efficiency of our method.
Effectively Searching Maps in Web Documents
[article]
2009
arXiv
pre-print
Maps are an important source of information in archaeology and other sciences. Users want to search for historical maps to determine recorded history of the political geography of regions at different eras, to find out where exactly archaeological artifacts were discovered, etc. Currently, they have to use a generic search engine and add the term map along with other keywords to search for maps. This crude method will generate a significant number of false positives that the user will need to
arXiv:0901.3939v1
fatcat:ug6sggl3t5cj3dr7y35kgfxknu
... ll through to get the desired results. To reduce their manual effort, we propose an automatic map identification, indexing, and retrieval system that enables users to search and retrieve maps appearing in a large corpus of digital documents using simple keyword queries. We identify features that can help in distinguishing maps from other figures in digital documents and show how a Support-Vector-Machine-based classifier can be used to identify maps. We propose map-level metadata (e.g., captions and references to the maps in the text) and document-level metadata (e.g., title, abstract, citations, and how recent the publication is), and show how they can be automatically extracted and indexed. Our novel ranking algorithm weights different metadata fields differently and also uses the document-level metadata to help rank retrieved maps. Empirical evaluations show which features should be selected and which metadata fields should be weighted more. We also demonstrate improved retrieval results in comparison to adaptations of existing methods for map retrieval. Our map search engine has been deployed in an online map-search system that is part of the Blind-Review digital library system.
Learning To Describe Player Form in The MLB
[article]
2021
arXiv
pre-print
Major League Baseball (MLB) has a storied history of using statistics to better understand and discuss the game of baseball, with an entire discipline of statistics dedicated to the craft, known as sabermetrics. At their core, all sabermetrics seek to quantify some aspect of the game, often a specific aspect of a player's skill set - such as a batter's ability to drive in runs (RBI) or a pitcher's ability to keep batters from reaching base (WHIP). While useful, such statistics are fundamentally
arXiv:2109.05280v1
fatcat:cd4atptpcvg2tlhqkcq3unv3fu
... limited by the fact that they are derived from an account of what happened on the field, not how it happened. As a first step towards alleviating this shortcoming, we present a novel, contrastive learning-based framework for describing player form in the MLB. We use form to refer to the way in which a player has impacted the course of play in their recent appearances. Concretely, a player's form is described by a 72-dimensional vector. By comparing clusters of players resulting from our form representations and those resulting from traditional sabermetrics, we demonstrate that our form representations contain information about how players impact the course of play, not present in traditional, publicly available statistics. We believe these embeddings could be utilized to predict both in-game and game-level events, such as the result of an at-bat or the winner of a game.
UV Absorbing Property of Ageratum conyzoides Linn Leaves
2020
Acta Scientific Pharmaceutical Sciences
Citation: Prasanta Kumar Mitra, et al. "UV Absorbing Property of Ageratum conyzoides Linn Leaves". Acta Scientific Pharmaceutical Sciences 4.3 (2020): 51-55. ...
doi:10.31080/asps.2020.04.0506
fatcat:voq7fes4ozhblkwp64riyx6qya
Isolation of Anti Solar Compound from Costus Speciosus Leaves
2020
Scholars Academic Journal of Pharmacy
Ultraviolet (UV) radiation is required by humans for the synthesis of vitamin D in the body. Vitamin D is important for the formation and maintenance of bones and is also involved in various metabolic processes. UV radiation is therefore beneficial to humans, but it also has harmful effects on the eyes and skin: prolonged exposure to UV radiation may cause skin cancer and cataracts. There is therefore a continuous search for anti solar compounds
doi:10.36347/sajp.2020.v09i01.005
fatcat:ke45itbjynej5n4d7pnbrwsodm
... from different sources including plants and herbs. Recently we found that the leaves of Costus speciosus (C. speciosus), a leafy green herb with many pharmacological properties, can absorb ultraviolet radiation. The aim of the present work was to isolate the anti solar compound from C. speciosus leaves. Leaves of C. speciosus were collected, identified by a taxonomist, and processed for isolation by standard methodologies. Solvent extraction and acid hydrolysis were performed, followed by solvent treatment and chromatographic experiments. A compound was crystallized, and its UV absorption property was studied. The compound showed maximum ultraviolet absorption at 200 nm. The compound may therefore be used as an anti solar compound in the preparation of sunscreen lotion.
Summarizing Situational and Topical Information During Crises
[article]
2016
arXiv
pre-print
Banerjee, Mitra, and Sugiyama proposed a graph-based abstractive summarization method on news articles [2]. ...
arXiv:1610.01561v1
fatcat:qrvz63akrnepnec77kr4r3w3la
Anti Solar Activity of Costus Speciosus Leaves of Sikkim Himalayas
2020
Scholars Academic Journal of Pharmacy
Costus speciosus (C. speciosus) has long been used in different systems of medicine. The plant has several pharmacological properties, including anti-inflammatory, antioxidant, antimicrobial, anticancer, gastroprotective, antidiabetic, anti-gastric-ulcer, and hepatoprotective activity. However, the anti solar activity of C. speciosus leaves from the Sikkim Himalaya has not been reported in the literature. The aim of the present study was therefore to examine the anti solar activity of
doi:10.36347/sajp.2020.v09i01.002
fatcat:jcwh7v7li5efdem6oskkxonocu
... C. speciosus leaves, if any, and, if present, the effect of extraction solvents on that activity. Leaves of C. speciosus were collected and identified by a taxonomist. Solvent extracts of the leaves were prepared separately using ethanol, chloroform, methanol, acetone, benzene, and ethyl acetate. The UV absorption of each extract was measured separately with a spectrophotometer in the UV region. The results showed that all extracts of C. speciosus leaves absorbed UV, with the ethanol extract showing maximum activity. The ethanol extract of C. speciosus leaves may therefore be studied further to isolate the active compound responsible for the UV absorbing property, for use in the preparation of sunscreen lotions.
Multi-document abstractive summarization using ILP based multi-sentence compression
[article]
2016
arXiv
pre-print
Abstractive summarization is an ideal form of summarization since it can synthesize information from multiple documents to create concise informative summaries. In this work, we aim at developing an abstractive summarizer. First, our proposed approach identifies the most important document in the multi-document set. The sentences in the most important document are aligned to sentences in other documents to generate clusters of similar sentences. Second, we generate K-shortest paths from the
arXiv:1609.07034v1
fatcat:saoq4q7kh5aexkbwywjyvgsjje
... sentences in each cluster using a word-graph structure. Finally, we select sentences from the set of shortest paths generated from all the clusters employing a novel integer linear programming (ILP) model with the objective of maximizing information content and readability of the final summary. Our ILP model represents the shortest paths as binary variables and considers the length of the path, information score and linguistic quality score in the objective function. Experimental results on the DUC 2004 and 2005 multi-document summarization datasets show that our proposed approach outperforms all the baselines and state-of-the-art extractive summarizers as measured by the ROUGE scores. Our method also outperforms a recent abstractive summarization technique. In manual evaluation, our approach also achieves promising results on informativeness and readability.
Seasonal Effect on UV Absorbing Property of Ageratum conyzoides Linn Leaves
2020
Acta Scientific Pharmaceutical Sciences
Mitra et al. isolated anti solar compounds from Murraya koenigii and Costus speciosus leaves [23, 24]. ...
doi:10.31080/asps.2020.04.0504
fatcat:p534jd2s3rgzhpxkg57azh6nje
Protein sequence classification using feature hashing
2012
Proteome Science
Recent advances in next-generation sequencing technologies have resulted in an exponential increase in the rate at which protein sequence data are being acquired. The k-gram feature representation, commonly used for protein sequence classification, usually results in prohibitively high dimensional input spaces, for large values of k. Applying data mining algorithms to these input spaces may be intractable due to the large number of dimensions. Hence, using dimensionality reduction techniques
doi:10.1186/1477-5956-10-s1-s14
pmid:22759572
pmcid:PMC3380737
fatcat:hipcrki6qnhdxetl44ztv5iwcu
... be crucial for the performance and the complexity of the learning algorithms. In this paper, we study the applicability of feature hashing to protein sequence classification, where the original high-dimensional space is "reduced" by hashing the features into a low-dimensional space, using a hash function, i.e., by mapping features into hash keys, where multiple features can be mapped (at random) to the same hash key, and "aggregating" their counts. We compare feature hashing with the "bag of k-grams" approach. Our results show that feature hashing is an effective approach to reducing dimensionality on protein sequence classification tasks.
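The hashing step described in this abstract can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the paper's implementation: the function names `kgrams` and `hashed_kgram_counts`, the use of CRC32 as the hash function, and the bucket count are all choices made here for the example.

```python
import zlib
from collections import Counter


def kgrams(sequence, k):
    """Return all overlapping k-grams of a sequence string."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def bag_of_kgrams(sequence, k):
    """Baseline representation: one count per distinct k-gram."""
    return Counter(kgrams(sequence, k))


def hashed_kgram_counts(sequence, k, num_buckets):
    """Feature hashing: map each k-gram to a bucket and aggregate the counts.

    Assumption: CRC32 stands in for the hash function; different k-grams
    may collide in the same bucket, and their counts are simply summed.
    """
    vector = [0] * num_buckets
    for gram in kgrams(sequence, k):
        bucket = zlib.crc32(gram.encode("ascii")) % num_buckets
        vector[bucket] += 1
    return vector


protein = "MKVLAAGIVGLMKVLA"  # made-up sequence for illustration
baseline = bag_of_kgrams(protein, k=3)
hashed = hashed_kgram_counts(protein, k=3, num_buckets=16)
```

The baseline grows with the number of distinct k-grams (up to 20^k for protein alphabets), whereas the hashed vector always has `num_buckets` entries regardless of k.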