
Challenges in Materials Discovery – Synthetic Generator and Real Datasets

Ronan Le Bras, Richard Bernstein, John Gregoire, Santosh Suram, Carla Gomes, Bart Selman, R. Bruce Van Dover
In addition, we provide a parameterized synthetic data generator to assess the quality of proposed approaches, as well as tools for data visualization and solution evaluation.  ...  The bottleneck of this discovery cycle lies, however, in the analysis of the materials data.  ...  Acknowledgments The authors would like to thank the reviewers for their positive feedback and constructive comments.  ... 
doi:10.1609/aaai.v28i1.8770 fatcat:gcht4dkyubc3fhit4vbbj4cmpe

Crystallography companion agent for high-throughput materials discovery [article]

Phillip M. Maffettone, Lars Banko, Peng Cui, Yury Lysogorskiy, Marc A. Little, Daniel Olds, Alfred Ludwig, Andrew I. Cooper
2021 arXiv   pre-print
It was demonstrated on a diverse set of organic and inorganic materials characterization challenges.  ...  The discovery of new structural and functional materials is driven by phase identification, often using X-ray diffraction (XRD).  ...  Accelerating materials discovery with total scattering via machine learning, the Leverhulme Trust via the Leverhulme Research Centre for Functional Materials Design, and German Research Foundation (DFG  ... 
arXiv:2008.00283v2 fatcat:3x76navyrzga7hosstyhsehcqe

Learning Co-segmentation by Segment Swapping for Retrieval and Discovery [article]

Xi Shen, Alexei A. Efros, Armand Joulin, Mathieu Aubry
2022 arXiv   pre-print
We present a simple yet surprisingly effective approach to overcome this difficulty: we generate synthetic training pairs by selecting segments in an image and copy-pasting them into another image.  ...  We find that it is crucial to predict the correspondences as an auxiliary task and to use Poisson blending and style transfer on the training pairs to generalize on real data.  ...  Acknowledgement This work was supported in part by ANR project EnHerit ANR-17-CE23-0008, project Rapid Tabasco, and IDRIS under the allocation AD011011160R1 made by GENCI.  ... 
arXiv:2110.15904v2 fatcat:yvxk6kh7sbhslm2ubmhb57deym

The ATEN Framework for Creating the Realistic Synthetic Electronic Health Record

Scott McLachlan, Kudakwashe Dube, Thomas Gallagher, Bridget Daley, Jason Walonoski
2018 Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies  
The development of the generic methods for achieving and validating realism for synthetic data was achieved by using knowledge discovery in databases (KDD), data mining enhanced with concept analysis and  ...  framework for characterizing, achieving and validating realism in Synthetic Data Generation (SDG).  ...  For Danika, Thomas, Liam and James.  ... 
doi:10.5220/0006677602200230 dblp:conf/biostec/McLachlanDGDW18 fatcat:fqioptababfxtng2oxvpw23puu

UnrealStereo: Controlling Hazardous Factors to Analyze Stereo Vision [article]

Yi Zhang, Weichao Qiu, Qi Chen, Xiaolin Hu, Alan Yuille
2018 arXiv   pre-print
The observations from synthetic images are further validated by annotating hazardous regions in real-world datasets Middlebury and KITTI (which gives a sparse sampling of the hazards).  ...  We generate a large synthetic image dataset with automatically computed hazardous regions and analyze algorithms on these regions.  ...  The discovery from synthetic images can be validated using real images, and this validation only requires a small amount of test images (hence avoiding the need for excessive annotation of real images)  ... 
arXiv:1612.04647v2 fatcat:t6memxoxv5bzloueek2ja3wszm

X-ray Scattering Image Classification Using Deep Learning [article]

Boyu Wang, Kevin Yager, Dantong Yu, Minh Hoai
2016 arXiv   pre-print
Experiments show that deep learning methods outperform previously published methods by 10% on synthetic and real datasets.  ...  To acquire enough training data for deep learning, we use simulation software to generate synthetic x-ray scattering images.  ...  X-ray Materials Discovery Dataset (XMD) X-ray Materials Discovery Dataset (XMD) [6] contains 2832 x-ray scattering images collected from thirteen x-ray scattering measurement runs.  ... 
arXiv:1611.03313v1 fatcat:jufy6be3jbct7cg6pny7lpujhe

X-Ray Scattering Image Classification Using Deep Learning

Boyu Wang, Kevin Yager, Dantong Yu, Minh Hoai
2017 2017 IEEE Winter Conference on Applications of Computer Vision (WACV)  
Experiments show that deep learning methods outperform previously published methods by 10% on synthetic and real datasets.  ...  To acquire enough training data for deep learning, we use simulation software to generate synthetic x-ray scattering images.  ...  X-ray Materials Discovery Dataset (XMD) X-ray Materials Discovery Dataset (XMD) [6] contains 2832 x-ray scattering images collected from thirteen x-ray scattering measurement runs.  ... 
doi:10.1109/wacv.2017.83 dblp:conf/wacv/WangYYH17 fatcat:c53hczftebaj7po7xxdpsrauue

Data augmentation in microscopic images for material data mining

Boyuan Ma, Xiaoyan Wei, Chuni Liu, Xiaojuan Ban, Haiyou Huang, Hao Wang, Weihua Xue, Stephen Wu, Mingfei Gao, Qing Shen, Michele Mukeshimana, Adnan Omer Abuassba (+2 others)
2020 npj Computational Materials  
Recent progress in material data mining has been driven by high-capacity models trained on large datasets.  ...  This strategy realizes the fusion of real and simulated data and the augmentation of training data in a data mining procedure.  ...  materials discoveries [2] [3] [4] [5] [6] [7] .  ... 
doi:10.1038/s41524-020-00392-6 fatcat:r7a6erirnvgbffbchmjd5cteuy

Causal Datasheet for Datasets: An Evaluation Guide for Real-World Data Analysis and Data Collection Design Using Bayesian Networks

Bradley Butcher, Vincent S. Huang, Christopher Robinson, Jeremy Reffin, Sema K. Sgaier, Grace Charles, Novi Quadrianto
2021 Frontiers in Artificial Intelligence  
To generate results for such a Causal Datasheet, a tool was developed which can generate synthetic Bayesian networks and their associated synthetic datasets to mimic real-world datasets.  ...  However, BNs have not been widely adopted by global health professionals, and in real-world applications, confidence in the results of BNs generally remains inadequate.  ...  AUTHOR CONTRIBUTIONS All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.  ... 
doi:10.3389/frai.2021.612551 fatcat:3vevt756dreqthld4cw4i6umpi

MetaBoot: a machine learning framework of taxonomical biomarker discovery for different microbial communities based on metagenomic data

Xiaojun Wang, Xiaoquan Su, Xinping Cui, Kang Ning
2015 PeerJ  
With the fast accumulation of metagenomic samples and the advance of next-generation sequencing techniques, it is now possible to qualitatively and quantitatively assess all taxa (features) in a microbial  ...  MetaBoot has been tested and compared with other methods on well-designed simulated datasets considering normal and gamma distribution as well as publicly available metagenomic datasets.  ...  The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.  ... 
doi:10.7717/peerj.993 pmid:26213658 pmcid:PMC4512773 fatcat:cgxfzhxumzcj3lrmg5qiylzc34

Predicting 3D shapes, masks, and properties of materials inside transparent containers, using the TransProteus CGI dataset

Sagi Eppel, Haoping Xu, Yi Ru Wang, Alan Aspuru-Guzik
2022 Digital Discovery  
We present TransProteus, a dataset, and methods for predicting the 3D structure and properties of materials inside transparent vessels from a single image.  ...  Manipulating materials in containers is essential in...  ...  However, the main challenge, in this case, is creating images that are realistic and general enough to capture the complexity of the real world.  ... 
doi:10.1039/d1dd00014d fatcat:gkjedljekvhblbi2o5x6tpq44u

Can synthetic data be a proxy for real clinical trial data? A validation study

Zahra Azizi, Chaoyi Zheng, Lucy Mosquera, Louise Pilote, Khaled El Emam
2021 BMJ Open  
similar between the real and synthetic datasets.  ...  were 1543 patients in the control arm that were included in our analysis. Primary and secondary outcome measures: Analyses from a study published on the real dataset were replicated on synthetic data to  ...  Acknowledgements The work in this paper was performed in collaboration with the GOING FWD consortium.  ... 
doi:10.1136/bmjopen-2020-043497 pmid:33863713 fatcat:gpu7ob2vsrekzkmtge6dlnl3ta

Do learned representations respect causal relationships? [article]

Lan Wang, Vishnu Naresh Boddeti
2022 arXiv   pre-print
It is trained purely on synthetically generated representations and can be applied to real representations, and is specifically designed to mitigate the domain gap between the two.  ...  We answer this question in three steps. First, we introduce NCINet, an approach for observational causal discovery from high-dimensional data.  ...  Department of Commerce, National Institute of Standards and Technology.  ... 
arXiv:2204.00762v2 fatcat:ozjigkai2nchlgmp44o7hbdgnq

Synthetic data in machine learning for medicine and healthcare

Richard J. Chen, Ming Y. Lu, Tiffany Y. Chen, Drew F. K. Williamson, Faisal Mahmood
2021 Nature Biomedical Engineering  
Acknowledgements This work was supported in part by internal funds from BWH Pathology, a Google Cloud Research Grant, the Nvidia GPU Grant Program and NIGMS R35GM138216 (F.M.).  ...  The content is solely the responsibility of the authors and does not reflect the official views of the National Science Foundation or the National Institutes of Health.  ...  In grounding synthetic data with biological priors, the generation of synthetic data can also be used as a tool for scientific discovery.  ... 
doi:10.1038/s41551-021-00751-8 pmid:34131324 fatcat:6m22rgym5rc4fjn6cd2tvsowbe

Generative Adversarial Networks for Creating Synthetic Free-Text Medical Data: A Proposal for Collaborative Research and Re-use of Machine Learning Models

Suranga N Kasthurirathne, Gregory Dexter, Shaun J Grannis
2021 AMIA Annual Symposium Proceedings  
There was no statistically significant difference in performance measures reported by models trained using real and synthetic datasets.  ...  Natural Language Generation metrics comparing the real and synthetic datasets demonstrated high similarity. Decision models generated using these datasets reported high performance metrics.  ...  real and synthetic datasets.  ... 
pmid:34457148 pmcid:PMC8378601 fatcat:jz4rs4zwujhonkk5sfc2uv4ika