Coiled: Dask as a Service
2021
Zenodo
Pangeo Showcase seminar series talk. Coiled is a company that provides Dask as a service. In this talk we provide a short motivation for and introduction to the company and product, then give a live demo.
doi:10.5281/zenodo.4964489
fatcat:kayii7fworctvoeafoe53a4jge
Zarr in Pangeo
[article]
2019
Figshare
This presentation was given in July 2019 at the Earth Science Information Partners (ESIP) Summer Meeting held in Tacoma, Washington.
doi:10.6084/m9.figshare.9701684.v1
fatcat:zk75mlwfg5aq3d2g34ohl6ex6m
On Clustering on Graphs with Multiple Edge Types
[article]
2011
arXiv
pre-print
We study clustering on graphs with multiple edge types. Our main motivation is that similarities between objects can be measured in many different metrics. For instance, similarity between two papers can be based on common authors, where they are published, keyword similarity, citations, etc. As such, graphs with multiple edges are a more accurate model to describe similarities between objects. Each edge/metric provides only partial information about the data; recovering full information requires aggregation of all the similarity metrics. Clustering becomes much more challenging in this context, since in addition to the difficulties of the traditional clustering problem, we have to deal with a space of clusterings. We generalize the concept of clustering in single-edge graphs to multi-edged graphs and investigate problems such as: Can we find a clustering that remains good, even if we change the relative weights of metrics? How can we describe the space of clusterings efficiently? Can we find unexpected clusterings (a good clustering that is distant from all given clusterings)? If given the ground-truth clustering, can we recover how the weights for edge types were aggregated? In this paper, we discuss these problems and the underlying algorithmic challenges and propose some solutions. We also present two case studies: one based on papers on arXiv and one based on the CIA World Factbook.
arXiv:1109.1605v1
fatcat:hhkh7zl4yjdxfguson3zcxl3w4
On Clustering on Graphs with Multiple Edge Types
2013
Internet Mathematics
We study clustering on graphs with multiple edge types. Our main motivation is that similarities between objects can be measured by many different metrics. For instance, similarity between two papers can be based on common authors, where they were published, keyword similarity, citations, etc. As such, graphs with multiple edges give a more accurate model to describe similarities between objects than models using single-edge graphs. Each edge/metric provides only partial information about the data; recovering full information requires aggregation of all the similarity metrics. Clustering becomes much more challenging in this context, since in addition to the difficulties of the traditional clustering problem, we have to deal with a space of clusterings. Reducing the multidimensional space into a single dimension poses significant challenges. At the same time, the multidimensional space can contain latent structures, and searching this multidimensional space can reveal important information about the graph. We generalize the concept of clustering in single-edge graphs to multiedged graphs and investigate problems such as the following: Can we find a clustering that remains good, even if we change the relative weights of metrics? How can we describe the space of clusterings efficiently? Can we find unexpected clusterings (a good clustering that is distant from all given clusterings)? If we are given the ground-truth clustering, can we recover how the weights for edge types were aggregated?
doi:10.1080/15427951.2012.678191
fatcat:durcqj6lcjcddnzoi7vzzwwgga
Latent Clustering on Graphs with Multiple Edge Types
[chapter]
2011
Lecture Notes in Computer Science
We study clustering on graphs with multiple edge types. Our main motivation is that similarities between objects can be measured in many different metrics, and so allowing graphs with multivariate edges significantly increases modeling power. In this context the clustering problem becomes more challenging. Each edge/metric provides only partial information about the data; recovering full information requires aggregation of all the similarity metrics. We generalize the concept of clustering in single-edge graphs to multi-edged graphs and discuss how this generates a space of clusterings. We describe a meta-clustering structure on this space and propose methods to compactly represent the metaclustering structure. Experimental results on real and synthetic data are presented.
doi:10.1007/978-3-642-21286-4_4
fatcat:n22twfsjvbewhetqqsbzjiecwa
Pangeo NSF Earthcube Proposal
2017
Figshare
Rocklin. ...
The creator of Dask, Matt Rocklin of Continuum Analytics, is a PI on this proposal. Dask can be used at either a high level or low level. ...
doi:10.6084/m9.figshare.5361094.v1
fatcat:lgj5vrhhnfa45haoj7kfizfgfi
Computing an Aggregate Edge-Weight Function for Clustering Graphs with Multiple Edge Types
[chapter]
2010
Lecture Notes in Computer Science
We investigate the community detection problem on graphs in the existence of multiple edge types. Our main motivation is that similarity between objects can be defined by many different metrics and aggregation of these metrics into a single one poses several important challenges, such as recovering this aggregation function from ground-truth, investigating the space of different clusterings, etc. In this paper, we address how to find an aggregation function to generate a composite metric that best resonates with the ground-truth. We describe two approaches: solving an inverse problem where we try to find parameters that generate a graph whose clustering gives the ground-truth clustering, and choosing parameters to maximize the quality of the ground-truth clustering. We present experimental results on real and synthetic benchmarks.
doi:10.1007/978-3-642-18009-5_4
fatcat:7rjttfqrkraqxckcatfttkc7xm
AGU2018- IN53A-03: Pangeo and Binder: Scalable, shareable and reproducible scientific computing environments for the geosciences (Invited)
[article]
2018
Figshare
Abstract: Cloud computing and containerization offer a new paradigm for scientific research by providing a platform for scalable computing and frameworks that can be used to improve reproducibility. In this presentation, we will describe how Pangeo, a community driven effort for open-source big-data approaches in the geosciences, is enabling scalable cloud-based workflows using tools such as Kubernetes, Jupyter, Dask, and Xarray. We will also demonstrate how the Pangeo approach can be combined with data-proximate deployments of BinderHub, a tool that packages and deploys software onto a cloud-based JupyterHub, to make those scalable workflows easy to share and reproduce.
doi:10.6084/m9.figshare.7492661.v1
fatcat:aqxz3uxmsreotod2pspqbuspse
Computing an Aggregate Edge-Weight Function for Clustering Graphs with Multiple Edge Types
[article]
2011
arXiv
pre-print
We investigate the community detection problem on graphs in the existence of multiple edge types. Our main motivation is that similarity between objects can be defined by many different metrics and aggregation of these metrics into a single one poses several important challenges, such as recovering this aggregation function from ground-truth, investigating the space of different clusterings, etc. In this paper, we address how to find an aggregation function to generate a composite metric that best resonates with the ground-truth. We describe two approaches: solving an inverse problem where we try to find parameters that generate a graph whose clustering gives the ground-truth clustering, and choosing parameters to maximize the quality of the ground-truth clustering. We present experimental results on real and synthetic benchmarks.
arXiv:1103.0368v2
fatcat:ijqqrk54fbg43kuqotrrvkkuum
A Computational Framework for Uncertainty Quantification and Stochastic Optimization in Unit Commitment With Wind Power Generation
2011
IEEE Transactions on Power Systems
Matthew Rocklin is a Ph.D. computational mathematics student in the Computer Science Department at the University of Chicago. ...
doi:10.1109/tpwrs.2010.2048133
fatcat:cslwklj55nbt5jjrft7msgmj2u
SymPy: symbolic computing in Python
2017
PeerJ Computer Science
SymPy is an open source computer algebra system written in pure Python. It is built with a focus on extensibility and ease of use, through both interactive and programmatic applications. These characteristics have led SymPy to become a popular symbolic library for the scientific Python ecosystem. This paper presents the architecture of SymPy, a description of its features, and a discussion of select submodules. The supplementary material provides additional examples and further outlines details of the architecture and features of SymPy.
doi:10.7717/peerj-cs.103
fatcat:f2mwkmqosrd5lepcej76cwalt4
Large-scale design and refinement of stable proteins using sequence-only models
[article]
2021
bioRxiv
pre-print
(2013) ; Rocklin et al. (2017) ). ...
Library | Name of dataset | Source of designs | Source of experimental data
Library 1 | Rocklin | Rocklin et al., 2017 | Rocklin et al., 2017
Library 2 | Eva1 | Linsky et al., 2021 | This paper
Eva2 ...
doi:10.1101/2021.03.12.435185
fatcat:uxkfj2fuebb3xohix76b4dxfs4
Computational design of a synthetic PD-1 agonist
2021
Proceedings of the National Academy of Sciences of the United States of America
Programmed cell death protein-1 (PD-1) expressed on activated T cells inhibits T cell function and proliferation to prevent an excessive immune response, and disease can result if this delicate balance is shifted in either direction. Tumor cells often take advantage of this pathway by overexpressing the PD-1 ligand PD-L1 to evade destruction by the immune system. Alternatively, if there is a decrease in function of the PD-1 pathway, unchecked activation of the immune system and autoimmunity can result. Using a combination of computation and experiment, we designed a hyperstable 40-residue miniprotein, PD-MP1, that specifically binds murine and human PD-1 at the PD-L1 interface with a Kd of ∼100 nM. The apo crystal structure shows that the binder folds as designed with a backbone RMSD of 1.3 Å to the design model. Trimerization of PD-MP1 resulted in a PD-1 agonist that strongly inhibits murine T cell activation. This small, hyperstable PD-1 binding protein was computationally designed with an all-beta interface, and the trimeric agonist could contribute to treatments for autoimmune and inflammatory diseases.
doi:10.1073/pnas.2102164118
pmid:34272285
pmcid:PMC8307378
fatcat:7yp4oefx2zg2padq47z6h2jdh4
Deformation of Crystals: Connections with Statistical Physics
2017
Annual Review of Materials Research (Print)
We give a bird's-eye view of the plastic deformation of crystals aimed at the statistical physics community, and a broad introduction into the statistical theories of forced rigid systems aimed at the plasticity community. Memory effects in magnets, spin glasses, charge density waves, and dilute colloidal suspensions are discussed in relation to the onset of plastic yielding in crystals. Dislocation avalanches and complex dislocation tangles are discussed via a brief introduction to the renormalization group and scaling. Analogies to emergent scale invariance in fracture, jamming, coarsening, and a variety of depinning transitions are explored. Dislocation dynamics in crystals challenges nonequilibrium statistical physics. Statistical physics provides both cautionary tales of subtle memory effects in nonequilibrium systems, and systematic tools designed to address complex scale-invariant behavior on multiple length and time scales.
doi:10.1146/annurev-matsci-070115-032036
fatcat:llwsa3phljeedpshd25zxdom4q
Uncertainty Modeling with SymPy Stats
2012
Proceedings of the 11th Python in Science Conference
unpublished
We add a random variable type to a mathematical modeling language. We demonstrate through examples how this is a highly separable way to introduce uncertainty and produce and query stochastic models. We motivate the use of symbolics and thin compilers in scientific computing.
doi:10.25080/majora-54c7f2c8-009
fatcat:m53ycuaorbb3toihjp7ejq6ffe