Active data

Anthony Simonet, Gilles Fedak, Matei Ripeanu, Samer Al-Kiswany
2013 Proceedings of the 8th Parallel Data Storage Workshop (PDSW '13)  
Data-intensive science offers new opportunities for innovation and discoveries, provided that large datasets can be handled efficiently. Data management for data-intensive science applications is challenging, requiring support for complex data life cycles, coordination across multiple sites, fault tolerance, and scalability to support tens of sites and petabytes of data. In this paper, we argue that data management for data-intensive science applications requires a fundamentally different
more » ... ment approach than the current ad-hoc task centric approach. We propose Active Data, a fundamentally novel paradigm for data life cycle management. Active Data follows two principles: data-centric and event-driven. We report on the Active Data programming model and its preliminary implementation, and discuss the benefits and limitations of the approach on recognized challenging data-intensive science use-cases.
doi:10.1145/2538542.2538566 dblp:conf/sc/SimonetFRA13 fatcat:2rdqaw2vgbfnpcuyqskypr7idq

A Blockchain-based, Semantically-enriched Software Framework for Trustworthy Decentralized Applications [article]

Thanasis G. Papaioannou, Petar Kochovski, Klevis Shkembi, Caroline Barelle, Anthony Simonet-Boulogne, Marco Ciaramella, Alberto Ciaramella, Vlado Stankovski
2022 Zenodo  
Multiple threats have been identified when citizens interact with online services such as unknown provenance of information, unknown quality of service providers, spread of fake news, fraud, personal privacy violation, centralisation of power to name a few. Blockchain has been considered as key technology to address many of these challenges; however, in reality, building trustworthy decentralized applications (Dapps) is not straightforward as much blockchain-based functionality needs to be
more » ... oped from scratch and combined with data semantics. In this paper, we propose a new software framework, namely ONTOCHAIN, that leverages semantic web and blockchain technology to build, as distinct value for the Next Generation Internet, fundamental support for trustworthy data/services exchange and trustworthy content handling. It comprises a novel protocol suite grouped into high-level application protocols, such as data provenance, reputation models, decentralised oracles, market mechanisms, ontology representation and management, privacy aware and secure data exchange, multi-source identity verification, value sharing and incentives, and lower-level core protocols that include authorisation, certification, privacy-aware data processing, cross-chain gateways, identity management, secure data exchange, and data semantics in smart contracts. We demonstrate that these protocols are already available and combined to implement interesting NGI Dapps.
doi:10.5281/zenodo.6811328 fatcat:vxsm76solnfxvat27u46dtfcd4

Exploring Trade-offs in Dynamic Task Triggering for Loosely Coupled Scientific Workflows [article]

Zhe Wang, Pradeep Subedi, Shaohua Duan, Yubo Qin, Philip Davis, Anthony Simonet, Ivan Rodero, Manish Parashar
2020 arXiv   pre-print
In order to achieve near-time insights, scientific workflows tend to be organized in a flexible and dynamic way. Data-driven triggering of tasks has been explored as a way to support workflows that evolve based on the data. However, the overhead introduced by such dynamic triggering of tasks is an under-studied topic. This paper discusses different facets of dynamic task triggers. Particularly, we explore different ways of constructing a data-driven dynamic workflow and then evaluate the
more » ... ds introduced by such design decisions. We evaluate workflows with varying data size, percentage of interesting data, temporal data distribution, and number of tasks triggered. Finally, we provide advice based upon analysis of the evaluation results for users looking to construct data-driven scientific workflows.
arXiv:2004.10381v1 fatcat:7jf3lcczpbbnbcpqigggj5ibsq

Revising OpenStack to Operate Fog/Edge Computing Infrastructures

Adrien Lebre, Jonathan Pastor, Anthony Simonet, Frederic Desprez
2017 2017 IEEE International Conference on Cloud Engineering (IC2E)  
Academic and industry experts are now advocating for going from large-centralized Cloud Computing infrastructures to smaller ones massively distributed at the edge of the network. Among the obstacles to the adoption of this model is the development of a convenient and powerful IaaS system capable of managing a significant number of remote data-centers in a unified way. In this paper, we introduce the premises of such a system by revising the OpenStack software, a leading IaaS manager in the
more » ... stry. The novelty of our solution is to operate such an Internet-scale IaaS platform in a fully decentralized manner, using P2P mechanisms to achieve high flexibility and avoid single points of failure. More precisely, we describe how we revised the OpenStack Nova service by leveraging a distributed key/value store instead of the centralized SQL backend. We present experiments that validate the correct behavior and give performance trends of our prototype through an emulation of several data-centers using the Grid'5000 testbed. In addition to paving the way to the first large-scale and Internet-wide IaaS manager, we expect this work will attract a community of specialists from both distributed system and network areas to address the Fog/Edge Computing challenges within the OpenStack ecosystem.
doi:10.1109/ic2e.2017.35 dblp:conf/ic2e/LebrePSD17 fatcat:vnswwl5yh5gbzpl47a2a5bxwpe

Deploying Distributed Cloud Infrastructures: Who and at What Cost?

Anthony Simonet, Adrien Lebre, Anne-Cecile Orgerie
2016 2016 IEEE International Conference on Cloud Engineering Workshop (IC2EW)  
Academics and industry experts are now advocating for going from large-centralized Cloud Computing (CC) infrastructures to smaller ones massively distributed at the edge of the network. Referred to as "fog/edge/local computing", such a dawning paradigm is attracting growing interest as it improves the whole services agility in addition to bringing computing resources closer to end-users. While several initiatives investigate how such Distributed Cloud Computing (DCC) infrastructures can be
more » ... ted, the economical viability of such solutions is still questionable, especially if the objective is to propose attractive prices in comparison to those proposed by giant actors such as Amazon, Microsoft and Google. In this article, we go beyond the state of the art of the current cost model of DCC infrastructures. First, we provide a classification of the different ways of deploying DCC platforms. Then, we propose a versatile cost model that can help new actors evaluate the viability of deploying a DCC solution. We illustrate the relevance of our proposal by instantiating it over three use-cases and compare them according to similar computation capabilities provided by the AWS solution. Such a study clearly shows that deploying a DCC infrastructure makes sense for telecom operators as well as new actors willing to enter the game.
doi:10.1109/ic2ew.2016.48 dblp:conf/ic2e/SimonetLO16 fatcat:nlj7nmkff5d5tg5677nmuudhb4

Active Data: A programming model to manage data life cycle across heterogeneous systems and infrastructures

Anthony Simonet, Gilles Fedak, Matei Ripeanu
2015 Future generations computer systems  
Highlights: • We present a formal model to represent the life cycle of data distributed and replicated on many systems. • We leverage this model to propose a programming model that allows users to react to life cycle progression. • We illustrate the approach with examples of applications that we programmed with this model. Abstract: The Big Data challenge consists in managing, storing, analyzing and visualizing these huge and ever growing data sets to extract sense and knowledge.
more » ... s the volume of data grows exponentially, the management of these data becomes more complex in proportion. A key point is to handle the complexity of the data life cycle, i.e. the various operations performed on data: transfer, archiving, replication, deletion, etc. Indeed, data-intensive applications span a large variety of devices and e-infrastructures, which implies that many systems are involved in data management and processing. We propose Active Data, a programming model to automate and improve the expressiveness of data management applications. We first define the concept of data life cycle and introduce a formal model that makes it possible to expose the data life cycle across heterogeneous systems and infrastructures. The Active Data programming model allows code execution at each stage of the data life cycle: routines provided by programmers are executed when a set of events (creation, replication, transfer, deletion) happens to any data. We implement and evaluate the model with four use cases: a storage cache to Amazon-S3, a cooperative sensor network, an incremental implementation of the MapReduce programming model and automated data provenance tracking across heterogeneous systems. Altogether, these scenarios illustrate the adequacy of the model to program applications that manage distributed and dynamic data sets. We also show that applications that do not leverage the data life cycle can still benefit from Active Data to improve their performance.
doi:10.1016/j.future.2015.05.015 fatcat:lvgmxiialbh4zb3pgk6vt7xziy

Using Active Data to Provide Smart Data Surveillance to E-Science Users

Anthony Simonet, Kyle Chard, Gilles Fedak, Ian Foster
2015 2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing  
Modern scientific experiments often involve multiple storage and computing platforms, software tools, and analysis scripts. The resulting heterogeneous environments make data management operations challenging; the significant number of events and the absence of data integration makes it difficult to track data provenance, manage sophisticated analysis processes, and recover from unexpected situations. Current approaches often require costly human intervention and are inherently error prone. The
more » ... difficulties inherent in managing and manipulating such large and highly distributed datasets also limit automated sharing and collaboration. We study a real world e-Science application involving terabytes of data, using three different analysis and storage platforms, and a number of applications and analysis processes. We demonstrate that using a specialized data life cycle and programming model, Active Data, we can easily implement global progress monitoring and sharing; recover from unexpected events; and automate a range of tasks.
doi:10.1109/pdp.2015.76 dblp:conf/pdp/SimonetCFF15 fatcat:7rkvxivepbhobdrqlovsgacwna

Toward a Holistic Framework for Conducting Scientific Evaluations of OpenStack

Ronan-Alexandre Cherrueau, Dimitri Pertin, Anthony Simonet, Adrien Lebre, Matthieu Simonin
2017 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)  
By massively adopting OpenStack for operating small to large private and public clouds, the industry has made it one of the largest running software projects, overgrowing the Linux kernel. However, with success comes increased complexity; facing technical and scientific challenges, developers are in great difficulty when testing the impact of individual changes on the performance of such a large codebase, which will likely slow down the evolution of OpenStack. Thus, we claim it is now time for
more » ... e scientific community to join the effort and get involved in the development of OpenStack, like it has been once done for Linux. In this spirit, we developed Enos, an integrated framework that relies on container technologies for deploying and evaluating OpenStack on any testbed. Enos allows researchers to easily express different configurations, enabling fine-grained investigations of OpenStack services. Enos collects performance metrics at runtime and stores them for post-mortem analysis and sharing. The relevance of the Enos approach to reproducible research is illustrated by evaluating different OpenStack scenarios on the Grid'5000 testbed.
doi:10.1109/ccgrid.2017.87 dblp:conf/ccgrid/CherrueauPSLS17 fatcat:kel6xjez6fg5bpx3bna5exxlpy

Energy-Aware Massively Distributed Cloud Facilities: The DISCOVERY Initiative

Frederic Desprez, Shadi Ibrahim, Adrien Lebre, Anne-Cecile Orgerie, Jonathan Pastor, Anthony Simonet
2015 2015 IEEE International Conference on Data Science and Data Intensive Systems  
doi:10.1109/dsdis.2015.58 dblp:conf/dsdis/DesprezILOPS15 fatcat:kpc2x4zl7zdxrga4bnb3tzxmxm

Reconstituted IMPDH polymers accommodate both catalytically active and inactive conformations

Sajitha A. Anthony, Anika L. Burrell, Matthew C. Johnson, Krisna C. Duong-Ly, Yin-Ming Kuo, Jacqueline C. Simonet, Peter Michener, Andrew Andrews, Justin M. Kollman, Jeffrey R. Peterson, Diane Barber
2017 Molecular Biology of the Cell  
Several metabolic enzymes undergo reversible polymerization into macromolecular assemblies. The function of these assemblies is often unclear, but in some cases they regulate enzyme activity and metabolic homeostasis. The guanine nucleotide biosynthetic enzyme inosine monophosphate dehydrogenase (IMPDH) forms octamers that polymerize into helical chains. In mammalian cells, IMPDH filaments can associate into micron-length assemblies. Polymerization and enzyme activity are regulated in part by
more » ... nding of purine nucleotides to an allosteric regulatory domain. ATP promotes octamer polymerization, whereas guanosine triphosphate (GTP) promotes a compact, inactive conformation whose ability to polymerize is unknown. Also unclear is whether polymerization directly alters IMPDH catalytic activity. To address this, we identified point mutants of human IMPDH2 that either prevent or promote polymerization. Unexpectedly, we found that polymerized and nonassembled forms of recombinant IMPDH have comparable catalytic activity, substrate affinity, and GTP sensitivity and validated this finding in cells. Electron microscopy revealed that substrates and allosteric nucleotides shift the equilibrium between active and inactive conformations in both the octamer and the filament. Unlike other metabolic filaments, which selectively stabilize active or inactive conformations, recombinant IMPDH filaments accommodate multiple states. These conformational states are finely tuned by substrate availability and purine balance, while polymerization may allow cooperative transitions between states.
doi:10.1091/mbc.e17-04-0263 pmid:28794265 pmcid:PMC5620369 fatcat:rkum2tmvnfc5bmy43nlyhf36iq

D3-MapReduce: Towards MapReduce for Distributed and Dynamic Data Sets

Haiwu He, Anthony Simonet, Julio Anjos Jose-Francisco Saray, Gilles Fedak, Bing Tang, Lu Lu, Xuanhua Shi, Hai Jin, Mircea Moca, Gheorghe Cosmin Silaghi, Asma Ben Cheikh, Heithem Abbes
2015 2015 IEEE International Conference on Smart City/SocialCom/SustainCom (SmartCity)  
Since its introduction in 2004 by Google, MapReduce has become the programming model of choice for processing large data sets. Although MapReduce was originally developed for use by web enterprises in large data-centers, this technique has gained a lot of attention from the scientific community for its applicability in large parallel data analysis (including geographic, high energy physics, genomics, etc.). So far MapReduce has been mostly designed for batch processing of bulk data. The
more » ... of D3-MapReduce is to extend the MapReduce programming model and propose an efficient implementation of this model to: i) cope with distributed data sets, i.e. that span over multiple distributed infrastructures or stored on network of loosely connected devices; ii) cope with dynamic data sets, i.e. which dynamically change over time or can be either incomplete or partially available. In this paper, we draw the path towards this ambitious goal. Our approach leverages Data Life Cycle as a key concept to provide MapReduce for distributed and dynamic data sets on heterogeneous and distributed infrastructures. We first report on our attempts at implementing the MapReduce programming model for Hybrid Distributed Computing Infrastructures (Hybrid DCIs). We present the architecture of the prototype based on BitDew, a middleware for large scale data management, and Active Data, a programming model for data life cycle management. Second, we outline the challenges in terms of methodology and present our approaches based on simulation and emulation on the Grid'5000 experimental testbed. We conduct performance evaluations and compare our prototype with Hadoop, the industry reference MapReduce implementation. We present our work in progress on dynamic data sets that has led us to implement an incremental MapReduce framework. Finally, we discuss our achievements and outline the challenges that remain to be addressed before obtaining a complete D3-MapReduce environment.
doi:10.1109/smartcity.2015.141 dblp:conf/smartcity/HeSSFTLSJMSCA15 fatcat:er5vyaxk7vgqpmqqiofqovajzq

Muscle histone deacetylase 4 upregulation in amyotrophic lateral sclerosis: potential role in reinnervation ability and disease progression

Gaëlle Bruneteau, Thomas Simonet, Stéphanie Bauché, Nathalie Mandjee, Edoardo Malfatti, Emmanuelle Girard, Marie-Laure Tanguy, Anthony Behin, Frédéric Khiami, Elhadi Sariali, Caroline Hell-Remy, François Salachas (+9 others)
2013 Brain  
doi:10.1093/brain/awt164 pmid:23824486 fatcat:pujo7u4cube4looe5tsiqn277e

A Distributed Multi-Sensor Machine Learning Approach to Earthquake Early Warning

Kevin Fauvel, Daniel Balouek-Thomert, Diego Melgar, Pedro Silva, Anthony Simonet, Gabriel Antoniu, Alexandru Costan, Véronique Masson, Manish Parashar, Ivan Rodero, Alexandre Termier
Our research aims to improve the accuracy of Earthquake Early Warning (EEW) systems by means of machine learning. EEW systems are designed to detect and characterize medium and large earthquakes before their damaging effects reach a certain location. Traditional EEW methods based on seismometers fail to accurately identify large earthquakes due to their sensitivity to the ground motion velocity. The recently introduced high-precision GPS stations, on the other hand, are ineffective to identify
more » ... edium earthquakes due to their propensity to produce noisy data. In addition, GPS stations and seismometers may be deployed in large numbers across different locations and may produce a significant volume of data, consequently affecting the response time and the robustness of EEW systems. In practice, EEW can be seen as a typical classification problem in the machine learning field: multi-sensor data are given as input, and earthquake severity is the classification result. In this paper, we introduce the Distributed Multi-Sensor Earthquake Early Warning (DMSEEW) system, a novel machine learning-based approach that combines data from both types of sensors (GPS stations and seismometers) to detect medium and large earthquakes. DMSEEW is based on a new stacking ensemble method which has been evaluated on a real-world dataset validated with geoscientists. The system builds on a geographically distributed infrastructure, ensuring an efficient computation in terms of response time and robustness to partial infrastructure failures. Our experiments show that DMSEEW is more accurate than the traditional seismometer-only approach and the combined-sensors (GPS and seismometers) approach that adopts the rule of relative strength.
doi:10.1609/aaai.v34i01.5376 fatcat:rlrhesacb5gbjff3jg6thqfuny

Scalable data management for map-reduce-based data-intensive applications: a view for cloud and hybrid infrastructures

Gabriel Antoniu, Alexandru Costan, Julien Bigot, Frédéric Desprez, Gilles Fedak, Sylvain Gault, Christian Pérez, Anthony Simonet, Bing Tang, Christophe Blanchet, Raphael Terreux, Luc Bougé (+5 others)
2013 International Journal of Cloud Computing  
As Map-Reduce emerges as a leading programming paradigm for data-intensive computing, today's frameworks which support it still have substantial shortcomings that limit its potential scalability. In this paper we discuss several directions where there is room for such progress: they concern storage efficiency under massive data access concurrency, scheduling, volatility and fault-tolerance. We place our discussion in the perspective of the current
more » ... n towards an increasing integration of large-scale distributed platforms (clouds, cloud federations, enterprise desktop grids, etc.). We propose an approach which aims to overcome the current limitations of existing Map-Reduce frameworks, in order to achieve scalable, concurrency-optimized, fault-tolerant Map-Reduce data processing on hybrid infrastructures. This approach will be evaluated with real-life bio-informatics applications on existing Nimbus-powered cloud testbeds interconnected with desktop grids.
doi:10.1504/ijcc.2013.055265 fatcat:zgckseqpkzc5ne2x3a2yub7i34

Page 647 of Florida Bar Journal Vol. 38, Issue 8 [page]

1964 Florida Bar Journal  
FR 7-4627 SIMONET, Jose, Immigration & Natz. Service, 4706 Hollyridge, San Antonio, Texas . GE 3-3396 SIMONET, William Floyd, 401 E. Robinson St., Orlando . 425-4631 SIMONHOFF, Harry, 5925 N.  ...  JE 8-1461 SICKING, Richard Anthony, 704 Ainsley Bldg., Miami . 377-1505 SICKLES, Blaine T., 2749 Wellesley Dr., Columbus, Ohio . 486-5847 SIDWELL, Benjamin C., 25 Western Union Bldg., Tampa . 229-8018  ... 