ROOT's C++ Python bindings: PyROOT, new features in ROOT 6.14, forthcoming features, new PyROOT.
doi:10.5281/zenodo.1419167
Slides from the presentation on the latest and upcoming PyROOT functionality.
doi:10.5281/zenodo.4147311
Making Grids Work
This paper presents the Integrated Toolkit, a framework which enables the easy development of Grid-unaware applications. While keeping the Grid transparent to the programmer, the Integrated Toolkit tries to optimize the performance of such applications by exploiting their inherent concurrency when executing them on the Grid. The Integrated Toolkit is designed to follow the Grid Component Model (GCM) and is therefore formed by several components, each one encapsulating a given functionality identified in the GRID superscalar runtime. Currently, a first functional prototype of the Integrated Toolkit is under development. On the one hand, we have chosen ProActive as the GCM implementation and, on the other, we have used JavaGAT as a uniform interface to abstract from the underlying Grid middleware when performing job submission and file transfer operations. Thus far, we have tested our prototype with several simple applications, showing that they maintain the same behaviour as if they were executed locally and sequentially.
doi:10.1007/978-0-387-78448-9_11 dblp:conf/coregrid/TejedorBKG07
Widespread distributed processing of big datasets has been around for more than a decade now thanks to Hadoop, but only recently have higher-level abstractions been proposed for programmers to easily operate on those datasets, e.g. Spark. ROOT has joined that trend with its RDataFrame tool for declarative analysis, which currently supports local multi-threaded parallelisation. However, RDataFrame's programming model is general enough to accommodate multiple implementations or backends: users could write their code once and execute it as is, locally or distributedly, just by selecting the corresponding backend. This abstract introduces PyRDF, a new Python library developed on top of RDataFrame to seamlessly switch from local to distributed environments in a way that is transparent to users. Programmers are provided with ergonomic interfaces, integrated with web-based services, which allow them to dynamically plug in new resources, as well as to write, execute, monitor and debug distributed applications in the most intuitive way possible.
doi:10.5281/zenodo.3599498
PyROOT is the name of ROOT's automatic Python bindings, which give access from Python to all the ROOT functionality implemented in C++. Thanks to the ROOT type system and the Cling C++ interpreter, PyROOT creates Python proxies for C++ entities on the fly, thus avoiding the need to generate static bindings beforehand. PyROOT has been enhanced and modernised to meet the demands of the HEP Python community. On the one hand, it has been redesigned on top of the new Cppyy library, in order to benefit from the modern C++ features supported by the latter. On the other hand, PyROOT is now interoperable with other tools from the Python data science ecosystem, such as NumPy and pandas, being able to expose ROOT data to those tools and vice versa. Moreover, PyROOT now customises Python language features for C++ objects so that they blend in seamlessly with the Python ecosystem.
doi:10.5281/zenodo.3599090
When processing large amounts of data, the rate at which reading and writing can take place is a critical factor. High energy physics data processing relying on ROOT is no exception. The recent parallelisation of the LHC experiments' software frameworks and the analysis of the ever-increasing amount of collision data collected by the experiments have further emphasised this issue, underlining the need to increase the implicit parallelism expressed within the ROOT I/O. In this contribution we highlight the improvements of the ROOT I/O subsystem which targeted a satisfactory scaling behaviour in a multithreaded context. The effect of parallelism on the individual steps which ROOT chains to read and write data, namely (de)compression, (de)serialisation and access to the storage backend, is discussed. Performance measurements are presented through real-life examples coming from CMS production workflows, on traditional server platforms and on highly parallel architectures such as the Intel Xeon Phi.
arXiv:1804.03326v1
The continuously increasing size of biological sequence databases has motivated the development of analysis suites that, by means of parallelization, are capable of performing faster searches on such databases. However, many of these tools are not suitable for execution on mid-to-large-scale parallel infrastructures such as computational Grids. This paper shows how COMP Superscalar can be used to effectively parallelize a sequence analysis program on the Grid. In particular, we present a sequential version of the HMMER hmmpfam tool that, when run with COMP Superscalar, is decomposed into tasks and run on a set of distributed resources, without burdening the programmer with parallelization efforts. Although performance is not a main objective of this work, we also present some test results where COMP Superscalar, using a new pre-scheduling technique, clearly outperforms a well-known parallelization of the hmmpfam algorithm.
doi:10.1016/j.procs.2010.04.296
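The underlying fragment-and-scan decomposition can be illustrated with a toy, standard-library-only sketch; this stands in for COMP Superscalar's task dispatch (which would run each fragment scan on a Grid resource) and is not the paper's actual hmmpfam port:

```python
from concurrent.futures import ThreadPoolExecutor

def scan_fragment(fragment, query):
    # One task: scan a single database fragment sequentially.
    return [seq for seq in fragment if query in seq]

def parallel_scan(database, query, n_fragments=4):
    # Split the database into fragments, one independent task each.
    size = max(1, len(database) // n_fragments)
    fragments = [database[i:i + size] for i in range(0, len(database), size)]
    # A runtime like COMP Superscalar would ship these tasks to remote
    # resources; a local thread pool stands in here.
    with ThreadPoolExecutor() as pool:
        results = pool.map(scan_fragment, fragments, [query] * len(fragments))
    return [hit for part in results for hit in part]

print(parallel_scan(["ACGT", "TTAA", "ACGG"], "ACG"))  # -> ['ACGT', 'ACGG']
```

The program keeps its sequential structure; only the task boundaries need to be identified, which mirrors how the paper decomposes hmmpfam without parallelization effort from the programmer.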
SWAN (Service for Web based ANalysis) is a platform to perform interactive data analysis in the cloud. SWAN allows users to write and run their data analyses with only a web browser, leveraging the widely adopted Jupyter notebook interface. The user code, executions and data live entirely in the cloud. SWAN makes it easier to produce and share results and scientific code, access scientific software, produce tutorials and demonstrations, and preserve analyses. Furthermore, it is also a powerful tool for non-scientific data analytics. This paper describes how a pilot of the SWAN service was implemented and deployed at CERN. Its backend combines state-of-the-art software technologies with a set of existing IT services such as user authentication, virtual computing infrastructure, mass storage, file synchronisation and sharing, specialised clusters and batch systems. The added value of this combination of services is discussed, with special focus on the opportunities offered by the CERNBox service and its massive storage backend, EOS. In particular, it is described how a cloud-based analysis model benefits from synchronised storage and sharing capabilities.
doi:10.1016/j.future.2016.11.035
Widespread distributed processing of big datasets has been around for more than a decade now thanks to Hadoop, but only recently have higher-level abstractions been proposed for programmers to easily operate on those datasets, e.g. Spark. ROOT has joined that trend with its RDataFrame tool for declarative analysis, which currently supports local multi-threaded parallelisation. However, RDataFrame's programming model is general enough to accommodate multiple implementations or backends: users could write their code once and execute it as-is, locally or distributedly, just by selecting the corresponding backend. This abstract introduces PyRDF, a new Python library developed on top of RDataFrame to seamlessly switch from local to distributed environments with no changes in the application code. In addition, PyRDF has been integrated with a service for web-based analysis, SWAN, where users can dynamically plug in new resources, as well as write, execute, monitor and debug distributed applications via an intuitive interface.
doi:10.1051/epjconf/202024503009
Python is nowadays one of the most widely used languages for data science. Its rich ecosystem of libraries, together with its simplicity and readability, is behind its popularity. HEP is also embracing that trend, often using Python as an interface language to access C++ libraries for the sake of performance. PyROOT, the Python bindings of the ROOT software toolkit, plays a key role here, since it allows C++ code to be invoked from Python automatically and dynamically, without the generation of any static wrappers beforehand. In that sense, this paper presents the efforts to create a new PyROOT with three main qualities: modern, able to exploit the latest C++ features from Python; pythonic, providing Python syntax to use C++ classes; and interoperable, able to interact with the most important libraries of the Python data science toolset.
doi:10.1051/epjconf/202024506004
SWAN (Service for Web-based ANalysis) is a CERN service that allows users to perform interactive data analysis in the cloud, in a "software as a service" model. It is built upon the widely used Jupyter notebooks, allowing users to write and run their data analysis using only a web browser. By connecting to SWAN, users have immediate access to the storage, software and computing resources that CERN provides and that they need to do their analyses. Besides providing an easier way of producing scientific code and results, SWAN is also a great tool to create shareable content. From results that need to be reproducible, to tutorials and demonstrations for outreach and teaching, Jupyter notebooks are the ideal way of distributing this content. In one single file, users can include their code, the results of the calculations and all the relevant textual information. By sharing them, users allow others to visualize, modify, personalize or even re-run all the code. In that sense, this paper describes the efforts made to facilitate sharing in SWAN. Given the importance of collaboration in our scientific community, we have brought the sharing functionality from CERNBox, CERN's cloud storage service, directly inside SWAN. SWAN users have available a new and redesigned interface where they can share "Projects": a special kind of folder containing notebooks and other files, e.g. input datasets and images. When a user shares a Project with other users, the latter can immediately see and work with the contents of that project from SWAN.
doi:10.1051/epjconf/201921407022
This talk shares our recent experiences in providing a data analytics platform based on Apache Spark for High Energy Physics, the CERN accelerator logging system and infrastructure monitoring. The Hadoop Service has started to expand its user base to researchers who want to perform analysis with big data technologies. Among many frameworks, Apache Spark is currently getting the most traction from various user communities, and new ways to deploy Spark, such as on Apache Mesos or Spark on Kubernetes, have started to evolve rapidly. Meanwhile, notebook web applications such as Jupyter offer the ability to perform interactive data analytics and visualizations without the need to install additional software. CERN already provides a web platform, called SWAN (Service for Web-based ANalysis), where users can write and run their analyses in the form of notebooks, seamlessly accessing the data and software they need. The first part of the presentation covers several recent integrations and optimizations of the Apache Spark computing platform to enable HEP data processing and CERN accelerator logging system analytics. The optimizations and integrations include, but are not limited to, access to kerberized resources, an XRootD connector enabling remote access to EOS storage, and integration with SWAN for interactive data analysis, thus forming a truly Unified Analytics Platform. The second part of the talk touches upon the evolution of the Apache Spark data analytics platform, in particular the recent work done to run Spark on Kubernetes on the virtualized and container-based infrastructure in OpenStack. This deployment model allows for elastic scaling of data analytics workloads, enabling efficient, on-demand utilization of resources in private or public clouds.
doi:10.1051/epjconf/201921407020
Designing a job management system for the Grid is a non-trivial task. While a complex middleware can provide many features, it often implies sacrificing performance. Such performance loss is especially noticeable for small jobs. A Job Manager's design also affects the capabilities of the monitoring system. We believe that monitoring a job or asking for a job status should be fast and easy, like doing a simple 'ps'. In this paper, we present the job management of XtreemOS, a Linux-based operating system to support Virtual Organizations for the Grid. This management is performed inside the Application Execution Manager (AEM). We evaluate its performance using only one job manager plus the built-in monitoring infrastructure. Furthermore, we present a set of real-world applications using AEM and its features. In XtreemOS we avoid reinventing the wheel and use the Linux paradigm as an abstraction.
doi:10.1109/grid.2010.5697954 dblp:conf/grid/NouGCTFPC10
The physics programmes of LHC Run III and the HL-LHC challenge the HEP community. The volume of data to be handled is unprecedented at every step of the data processing chain: analysis is no exception. Physicists must be provided with first-class analysis tools which are easy to use, exploit bleeding-edge hardware technologies and allow parallelism to be expressed seamlessly. This document discusses the declarative analysis engine of ROOT, RDataFrame, and gives details about how it makes it possible to profitably exploit commodity hardware as well as high-end servers and manycore accelerators, thanks to the synergy with the existing parallelised ROOT components. Real-life analyses of LHC experiments' data expressed in terms of RDataFrame are presented, highlighting the programming model provided to express them in a concise and powerful way. The recent developments which make RDataFrame a lightweight data processing framework are described, such as callbacks and I/O capabilities. Finally, the flexibility of RDataFrame and its ability to read data formats other than ROOT's are characterised; as an example, it is discussed how RDataFrame can directly read and analyse LHCb's raw data format, MDF.
doi:10.1051/epjconf/201921406029
ROOT is high energy physics' software for storing and mining data in a statistically sound way and for publishing results with scientific graphics. It has been evolving for 25 years and now provides the storage format for more than one exabyte of data; virtually all high energy physics experiments use ROOT. With another significant increase in the amount of data to be handled scheduled to arrive in 2027, ROOT is preparing for a massive upgrade of its core ingredients. As part of a review of crucial software for high energy physics, the ROOT team has documented its R&D plans for the coming years.
arXiv:2205.06121v1