Using Jupyter for reproducible scientific workflows

Marijan Beg, Juliette Belin, Thomas Kluyver, Alexander Konovalov, Min Ragan-Kelley, Nicolas Thiery, Hans Fangohr
2021 Computing in science & engineering (Print)  
This enables high-level control of simulations and computation, interactive exploration of computational results, batch processing on HPC resources, and reproducible workflow documentation in Jupyter notebooks  ...  In light of these case studies, we discuss the benefits of this approach, including progress toward more reproducible and reusable research results and outputs, notably through the use of infrastructure  ...  In this section, we discuss the benefits of using the Jupyter environment for reproducible scientific workflows.  ... 
doi:10.1109/mcse.2021.3052101 fatcat:qdac3unk75fzdophnssiv7lyha

Demo: Extending Jupyter to Support Interactive High Performance Computing

Shreyas Cholia, Matthew Henderson, Oliver Evans
2017 Figshare  
This demonstration will showcase our work in integrating the Jupyter platform with HPC resources, including extensions and modifications that enable "human in the loop" interactive supercomputing.  ...  We discuss how combining Jupyter notebooks with scientific workflows has the potential to dramatically increase reproducibility and collaboration among scientific researchers by allowing complex sequences  ...  Jupyter notebooks provide visual communication of complex ideas and computations and can be used in place of or in addition to traditional script and code based scientific analyses and tasks.  ... 
doi:10.6084/m9.figshare.5501137.v1 fatcat:zwgrmdotcvfafpylmzimjorbc4

Towards Interactive, Reproducible Analytics at Scale on HPC Systems

Shreyas Cholia, Lindsey Heagy, Matthew Henderson, Drew Paine, Jon Hays, Ludovico Bianchi, Devarshi Ghoshal, Fernando Perez, Lavanya Ramakrishnan
2020 2020 IEEE/ACM HPC for Urgent Decision Making (UrgentHPC)  
These workflows need interactive, reproducible analytics at scale. The Jupyter platform provides core capabilities for interactivity but was not designed for HPC systems.  ...  Our core platform addresses three key areas of the scientific analysis workflow: reproducibility, scalability, and interactivity.  ...  We thank Rollin Thomas for helping us with the Jupyter support at NERSC. This work and the resources at NERSC are supported by the U.S.  ... 
doi:10.1109/urgenthpc51945.2020.00011 fatcat:3riyidbo4nghdmtymzn5x5avum

Interactive Supercomputing With Jupyter

Rollin Thomas, Shreyas Cholia, Kathryn Mohror, John M. Shalf
2021 Computing in science & engineering (Print)  
ACKNOWLEDGMENTS This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S.  ...  The authors would like to thank the core Jupyter team, and especially our collaborators at U.C. Berkeley for technical guidance and support around the Jupyter and JupyterHub ecosystem.  ...  Creating a general pattern of reproducible Jupyter HPC workflows with containers similar to Binder 1 is a topic of current work.  ... 
doi:10.1109/mcse.2021.3059037 fatcat:txtv4sjbwvhkpktrjma7j6cy6e

Machine Learning Pipelines: Provenance, Reproducibility and FAIR Data Principles [article]

Sheeba Samuel, Frank Löffler, Birgitta König-Ries
2020 arXiv   pre-print
We present our preliminary results on the role of our tool, ProvBook, in capturing and comparing provenance of ML experiments and their reproducibility using Jupyter Notebooks.  ...  We investigate which factors beyond the availability of source code and datasets influence reproducibility of ML experiments. We propose ways to apply FAIR data practices to ML workflows.  ...  : Exploring Intelligent Systems" for "Digitization -explore the basics, use applications".  ... 
arXiv:2006.12117v1 fatcat:fldgjlz2o5gpfj52stkbhwvene

FAIR Workflows Data publishing of scientific protocols

João Moreira, Tobias Kuhn, Michel Dumontier, Remzi Celebi, Ahmed Hassan, Harald Schmidt, Lars Ridder, Valentina Maccatrozzo, Roel Zinkstok, Carlos Martinez
2021 Zenodo  
and systematic reviews OpenPREDICT use case Machine learning algorithm for cardiovascular drug repositioning IPython/Jupyter notebook Vasilevsky, N.  ...  of workflow systems FAIR technologies: Data Points, Nanopublications, Projectors and Accessors Prototype plug-in for IPython/Jupyter notebooks Validation Case studies and user study with drug repositioning  ... 
doi:10.5281/zenodo.5211221 fatcat:huf3jvhkyvcdxlwf6wbnyxtq7y

Notebook-as-a-VRE (NaaVRE): from private notebooks to a collaborative cloud virtual research environment [article]

Zhiming Zhao, Spiros Koulouzis, Riccardo Bianchi, Siamak Farshidi, Zeshun Shi, Ruyue Xin, Yuandou Wang, Na Li, Yifang Shi, Joris Timmermans, W. Daniel Kissling
2021 arXiv   pre-print
Jupyter can support several popular languages that are used by data scientists, such as Python, R, and Julia.  ...  We demonstrate how such a solution can enhance a legacy workflow that uses Light Detection and Ranging (LiDAR) data from country-wide airborne laser scanning surveys for deriving geospatial data products  ...  ., for managing scientific workflows and sharing their research results.  ... 
arXiv:2111.12785v1 fatcat:r5yku5fiibb7xmbmlh77nxlrry

Embedding containerized workflows inside data science notebooks enhances reproducibility [article]

Jiaming Hu, Ling-Hong Hung, Ka Yee Yeung
2018 bioRxiv   pre-print
Data science notebooks, such as Jupyter, combine text documentation with dynamically editable and executable code and have become popular for sharing computational methods.  ...  We present nbdocker, an extension that integrates Docker software containers into Jupyter notebooks. nbdocker transforms notebooks into autonomous, self-contained, executable and reproducible modules that  ...  Wes Lloyd for helpful discussions in group meetings. We would like  ... 
doi:10.1101/309567 fatcat:d6rg4lyzkrgwnnwybg7jmdg7qy

AGU2018- IN53A-03: Pangeo and Binder: Scalable, shareable and reproducible scientific computing environments for the geosciences (Invited) [article]

Joseph Hamman, Ryan Abernathey, Chris Holdgraf, Yuvi Panda, Matthew Rocklin
2018 Figshare  
Abstract: Cloud computing and containerization offer a new paradigm for scientific research by providing a platform for scalable computing and frameworks that can be used to improve reproducibility.  ...  In this presentation, we will describe how Pangeo, a community driven effort for open-source big-data approaches in the geosciences, is enabling scalable cloud-based workflows using tools such as Kubernetes  ...  Big Data: datasets are growing too rapidly and legacy software tools for scientific analysis can't handle them. This is a major obstacle to scientific progress. 2.  ... 
doi:10.6084/m9.figshare.7492661.v1 fatcat:aqxz3uxmsreotod2pspqbuspse

Scaling Reproducible Research with Jupyter

Carol Willing
2019 Zenodo  
Jupyter Notebooks have taken the scientific and open data world by storm the past five years.  ...  way, and educate others in their scientific discipline and beyond.  ...  Project Jupyter grant funders are also generously thanked for their vision and support.  ... 
doi:10.5281/zenodo.3567220 fatcat:yytsjru52fcv7cdhladwpp3pta

Using the Jupyter Notebook as a Tool for Open Science: An Empirical Study

Bernadette M. Randles, Irene V. Pasquetto, Milena S. Golshan, Christine L. Borgman
2017 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL)  
As scientific work becomes more computational and data-intensive, research processes and results become more difficult to interpret and reproduce.  ...  In this poster, we show how the Jupyter notebook, a tool originally designed as a free version of Mathematica notebooks, has evolved to become a robust tool for scientists to share code, associated computation  ...  We are grateful to Fernando Perez for discussions about the origins and goals of Jupyter Notebooks. We also thank Peter T. Darch for commenting on earlier drafts of this paper.  ... 
doi:10.1109/jcdl.2017.7991618 dblp:conf/jcdl/RandlesPGB17 fatcat:myizxudmqfcqzcm3iar5p74pnu

Reproducible Bioconductor Workflows Using Browser-Based Interactive Notebooks And Containers [article]

Reem Almugbel, Ling-Hong Hung, Jiaming Hu, Abeer M. Almutairy, Nicole E. Ortogero, Yashaswi Tamta, Ka Yee Yeung
2017 bioRxiv   pre-print
This is accomplished using Binder, which runs the notebook inside software containers, thus avoiding the need for installation of any software and ensuring reproducibility.  ...  Materials and methods: We present three different interactive Jupyter notebooks using R and Bioconductor workflows to infer differential gene expression, analyze cross-platform datasets and process RNA-seq  ...  Reproducibility of Bioinformatics workflows using Bioconductor Reproducibility is essential for verification and advancement of scientific research.  ... 
doi:10.1101/144816 fatcat:dohinknpmvadpkp4ovzph57oiu

Reproducible Bioconductor workflows using browser-based interactive notebooks and containers

Reem Almugbel, Ling-Hong Hung, Jiaming Hu, Abeer Almutairy, Nicole Ortogero, Yashaswi Tamta, Ka Yee Yeung
2017 JAMIA Journal of the American Medical Informatics Association  
Materials and methods: We present four different interactive Jupyter notebooks using R and Bioconductor workflows to infer differential gene expression, analyze cross-platform datasets, process RNA-seq  ...  The use of software containers to mirror the original software environment ensures reproducibility of results.  ...  Reproducibility of bioinformatics workflows using Bioconductor Reproducibility is essential for verification and advancement of scientific research.  ... 
doi:10.1093/jamia/ocx120 pmid:29092073 fatcat:tyzs65gzbzb3dm4aaxovkwupn4

The Story of an Open Science Experiment [article]

Sheeba Samuel
2021 figshare.com  
Slides presented for the invited speaker talk on "The Story of an Open Science Experiment" in the Max Planck Digital Library (MPDL) Open Science Days 2021 on 20th October 2021.  ...  ➢ Support ➢ Reproducibility ➢ ➢ Extensibility ➢ Ease of use  ...  capture, visualize, represent and difference of ML notebooks ➢ ReproduceMeGit A tool for analyzing the reproducibility of Jupyter Notebooks End-to-end provenance management of scientific experiments  ... 
doi:10.6084/m9.figshare.16837060.v1 fatcat:udlzn6unpjbhdpnmfgby53emkm

Jupyter and Galaxy: Easing entry barriers into complex data analyses for biomedical researchers

Björn A. Grüning, Eric Rasche, Boris Rebolledo-Jaramillo, Carl Eberhard, Torsten Houwaart, John Chilton, Nate Coraor, Rolf Backofen, James Taylor, Anton Nekrutenko, Francis Ouellette
2017 PLoS Computational Biology  
It aims to fully encompass and simplify the "raw data-to-publication" pathway and make it reproducible.  ...  First, common tools are employed to reduce primary data (sequencing reads) to a form suitable for further analyses (i.e., the list of variable sites).  ...  Acknowledgments We are grateful to the members of Galaxy development team for their help with preparation of this manuscript.  ... 
doi:10.1371/journal.pcbi.1005425 pmid:28542180 pmcid:PMC5444614 fatcat:gzdqap3ymzcfjkov2b6efbtjum