
SystemT: A Declarative Information Extraction System

Yunyao Li, Frederick Reiss, Laura Chiticariu
2011 Annual Meeting of the Association for Computational Linguistics  
This paper presents SystemT, a declarative IE system that addresses these challenges and has been deployed in a wide range of enterprise applications.  ...  Emerging text-intensive enterprise applications such as social analytics and semantic search pose new challenges of scalability and usability to Information Extraction (IE) systems.  ...  declarative approach to information extraction.  ... 
dblp:conf/acl/LiRC11 fatcat:jv3msn56nnejni33h7q7zo72ci


Rajasekar Krishnamurthy, Yunyao Li, Sriram Raghavan, Frederick Reiss, Shivakumar Vaithyanathan, Huaiyu Zhu
2009 SIGMOD Record  
By leveraging well-understood database concepts such as declarative queries and cost-based optimization, SystemT enables scalable execution of complex information extraction tasks.  ...  In this paper, we motivate the SystemT approach to information extraction.  ...  CONCLUSIONS AND FUTURE WORK In this paper, we presented SystemT, a declarative information extraction system that represents a paradigm shift in the way rule-based information extraction systems are built  ... 
doi:10.1145/1519103.1519105 fatcat:2kqncizhqbeg7et3fidoybrssu

The SystemT IDE

Laura Chiticariu, Sriram Raghavan, Frederick R. Reiss, Shivakumar Vaithyanathan, Huaiyu Zhu, Vivian Chu, Sajib Dasgupta, Thilo W. Goetz, Howard Ho, Rajasekar Krishnamurthy, Alexander Lang, Yunyao Li (+1 others)
2011 Proceedings of the 2011 international conference on Management of data - SIGMOD '11  
Our demonstration showcases SystemT IDE, the integrated development environment for SystemT, a state-of-the-art rule-based IE system from IBM Research that has been successfully embedded in multiple IBM  ...  Information Extraction (IE) - the problem of extracting structured information from unstructured text - has become the key enabler for many enterprise applications such as semantic search, business analytics  ...  INTRODUCTION Information Extraction (IE) - the problem of extracting structured information from unstructured text - has emerged as a critical building block to many enterprise applications.  ... 
doi:10.1145/1989323.1989479 dblp:conf/sigmod/ChiticariuCDGHKLLLRRVZ11 fatcat:ka2wiqpse5cjxisalzvp57ofbm

SystemT: An Algebraic Approach to Declarative Information Extraction

Laura Chiticariu, Rajasekar Krishnamurthy, Yunyao Li, Sriram Raghavan, Frederick Reiss, Shivakumar Vaithyanathan
2010 Annual Meeting of the Association for Computational Linguistics  
SystemT uses a declarative rule language, AQL, and an optimizer that generates high-performance algebraic execution plans for AQL rules.  ...  In this paper, we describe SystemT, a rule-based IE system whose basic design removes the expressivity and performance limitations of current systems based on cascading grammars.  ...  SystemT SystemT is a declarative IE system based on an algebraic framework. In SystemT, developers write rules in a language called AQL.  ... 
dblp:conf/acl/ChiticariuKLRRV10 fatcat:rbd3p25pcjeelfytco376b5u3a

Enabling enterprise mashups over unstructured text feeds with InfoSphere MashupHub and SystemT

David E. Simmen, Frederick Reiss, Yunyao Li, Suresh Thalamati
2009 Proceedings of the 35th SIGMOD international conference on Management of data - SIGMOD '09  
Our demo presents the integration of SystemT, an information extraction system from IBM Research, with IBM's InfoSphere MashupHub.  ...  Information extraction technology is a key enabler in such scenarios, using annotators to convert unstructured text into structured information that can facilitate mashup operations.  ...  SystemT employs a novel algebraic approach to information extraction, wherein annotators are expressed in a high-level declarative language called AQL.  ... 
doi:10.1145/1559845.1559999 dblp:conf/sigmod/SimmenRLT09 fatcat:hrd6xbzfufg37fl2hw2s6zpsfu

Towards a Scalable Enterprise Content Analytics Platform

Kevin S. Beyer, Vuk Ercegovac, Rajasekar Krishnamurthy, Sriram Raghavan, Jun Rao, Frederick Reiss, Eugene J. Shekita, David E. Simmen, Sandeep Tata, Shivakumar Vaithyanathan, Huaiyu Zhu
2009 IEEE Data Engineering Bulletin  
Two core components of this platform are SystemT, a high-performance rule-based information extraction engine, and Jaql, a declarative language for expressing transformations over semi-structured data.  ...  In this paper, we present our overall vision of the platform, describe how SystemT and Jaql fit into this vision, and briefly describe some of the other components that are under active development.  ...  JAQL Function Wrapper SystemT Runtime Input Adapter Output Adapter Information Extraction using SystemT SystemT is a system for rule-based information extraction that has been under development at IBM  ... 
dblp:journals/debu/BeyerEKRRRSSTVZ09 fatcat:leunst5opncbzpy7o3sgav4jkq

Giving Text Analytics a Boost

Raphael Polig, Kubilay Atasu, Laura Chiticariu, Christoph Hagleitner, H. Peter Hofstee, Frederick R. Reiss, Huaiyu Zhu, Eva Sitaridi
2014 IEEE Micro  
IBM's SystemT software is a powerful text analytics system, which offers a query-based interface to reveal the valuable information that lies within these mounds of data.  ...  We show that by using a streaming hardware accelerator implemented in reconfigurable logic, the throughput rates of the SystemT's information extraction queries can be improved by an order of magnitude  ...  IBM's SystemT software [5] couples a declarative rule language with a modular runtime based on relational algebra, augmented with special operators for information extraction primitives such as regular  ... 
doi:10.1109/mm.2014.69 fatcat:htqz25kpz5etdktyni7xagcyqy


Torsten Kilias, Alexander Löser, Periklis Andritsos
2013 Proceedings of the sixteenth international workshop on Data warehousing and OLAP - DOLAP '13  
We propose the INDREX system that enables a user for the first time to describe corpus-wide extraction tasks in a declarative language and permits the user to run interactive rule refinement queries.  ...  As a result, (1) the user can leverage this data to further adapt rules to the target domain, (2) the user does not need an additional system for rule extraction and (3) the INDREX system can leverage  ...  information extraction in a declarative fashion (e.g. by applying SQL queries) to infer rules.  ... 
doi:10.1145/2513190.2513196 dblp:conf/dolap/KiliasLA13 fatcat:xyxifmgeyjecnndwsqay36nse4

Resource-efficient regular expression matching architecture for text analytics

Kubilay Atasu
2014 2014 IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors  
SystemT: an algebraic approach to declarative information extraction  distill structured data from unstructured and semi-structured text  exploit the extracted data in your applications For years  ...  A simple SystemT information extraction rule  Find the names (regex) that are at most 20 chars after a title (dict.)  ... 
doi:10.1109/asap.2014.6868623 dblp:conf/asap/Atasu14 fatcat:gucafuwlm5danmnc5ozkhafrga

Enterprise information extraction

Laura Chiticariu, Yunyao Li, Sriram Raghavan, Frederick R. Reiss
2010 Proceedings of the 2010 international conference on Management of data - SIGMOD '10  
a System R for information extraction?"  ...  We then survey recent technological advances towards addressing these requirements, broadly categorized as: (1) Languages for specifying extraction programs in a declarative way, thus allowing database-style  ...  [3] challenged the database community to "build a System R for Information Extraction" - that is, to build an information extraction system that meets the practical needs of real-world enterprise applications  ... 
doi:10.1145/1807167.1807339 dblp:conf/sigmod/ChiticariuLRR10 fatcat:6blwd4zdebdabhuzyhuhendfey

Compiling text analytics queries to FPGAs

Raphael Polig, Kubilay Atasu, Heiner Giefers, Laura Chiticariu
2014 2014 24th International Conference on Field Programmable Logic and Applications (FPL)  
Extracting information from unstructured text data is a compute-intensive task. The performance of general-purpose processors cannot keep up with the rapid growth of textual data.  ...  We evaluate the performance, power consumption and hardware utilization of our approach for a set of different queries compiled to a Stratix IV FPGA.  ...  SystemT uses a declarative rule language called Annotation Query Language (AQL) to define the information to be extracted from a text source.  ... 
doi:10.1109/fpl.2014.6927500 dblp:conf/fpl/PoligAGC14 fatcat:scmijbpytvcynaiuhqf7lxsqky

Next generation data analytics at IBM research

Oktie Hassanzadeh, Anastasios Kementsietsidis, Benny Kimelfeld, Rajasekar Krishnamurthy, Fatma Özcan, Ippokratis Pandis
2013 Proceedings of the VLDB Endowment  
One such technology is SystemT [5] that exploits AQL, a declarative rule language for Information Extraction (IE), where an intuitive IE algebra [10] is decoupled from the runtime optimization.  ...  SystemT is also used for backend analytics in an enterprise search system driven by a comprehensive, domain adaptable search architecture developed in IBM Research [1, 9].  ... 
doi:10.14778/2536222.2536246 fatcat:dvt4wqvbpvajvlw3i25kkmmfc4


Sreeram Balakrishnan, Christine M. Robson, Lei Shi, Ioana R. Stanoi, Edison L. Ting, Shivakumar Vaithyanathan, Huahai Yang, Vivian Chu, Mauricio A. Hernández, Howard Ho, Rajasekar Krishnamurthy, Shi Xia Liu (+3 others)
2010 Proceedings of the 2010 international conference on Management of data - SIGMOD '10  
As a first step in this direction, we have built a system that extracts and integrates information from regulatory filings submitted to the U.S.  ...  The primary goal of the Midas project is to build a system that enables easy and scalable integration of unstructured and semi-structured information present across multiple data sources.  ...  We present Midas, a system that unleashes the value of information buried in SEC by extracting, conceptualizing, integrating, and aggregating data from semi-structured or text filings.  ... 
doi:10.1145/1807167.1807315 dblp:conf/sigmod/BalakrishnanCHHKLPPPRSSTVY10 fatcat:vpl4yhbkircmxf66yyk52ctnpi

A System for Extracting Sentiment from Large-Scale Arabic Social Data [article]

Hao Wang, Vijay R. Bommireddipalli, Ayman Hanafy, Mohamed Bahgat, Sara Noeman, Ossama S. Emam
2015 arXiv pre-print
First, we give an overview of the Big Data system for information extraction from multilingual social data from a variety of sources.  ...  This paper describes an enterprise system we developed for extracting sentiment from large volumes of social data in Arabic dialects.  ...  This work is a part of the effort to add support for Arabic. In SDA, a component called SystemT is used to build extractors to extract all the above-mentioned information.  ... 
arXiv:1511.04661v1 fatcat:exqxj4w5ujbizi36ibtwwm7pt4

UIMA Ruta Workbench: Rule-based Text Annotation

Peter Klügl, Martin Toepfer, Philip-Daniel Beck, Georg Fette, Frank Puppe
2014 International Conference on Computational Linguistics  
UIMA Ruta is a rule-based system designed for information extraction tasks, but it is also applicable for many natural language processing use cases.  ...  This demonstration gives an overview of the UIMA Ruta Workbench, which provides a development environment and tooling for the rule language.  ...  Acknowledgements This work was supported by the Competence Network Heart Failure, funded by the German Federal Ministry of Education and Research (BMBF01 EO1004), and is used for information extraction  ... 
dblp:conf/coling/KluglTBFP14 fatcat:cgt5t5pbpngtbcqijrsckskd5y