
Going big: a large-scale study on what big data developers ask

Mehdi Bagherzadeh, Raffi Khatchadourian
2019 Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering - ESEC/FSE 2019  
To conduct the study, we develop a set of big data tags to extract big data posts from Stackoverflow; use topic modeling to group these posts into big data topics; group similar topics into categories  ...  To help these developers it is necessary to understand big data topics that they are interested in and the difficulty of finding answers for questions in these topics.  ...  [16] propose BigSift for automatic identification of the root cause of an error. BigSift uses delta debugging and data provenance to identify the root cause of an error in a Spark program. Li et al  ... 
doi:10.1145/3338906.3338939 dblp:conf/sigsoft/BagherzadehK19 fatcat:fjo23bl5tncczhrfbylr5rhi4m
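The BigSift snippet above mentions delta debugging. The Python sketch below illustrates only that idea. It is not BigSift itself and ignores its data-provenance component. It applies a simplified version of Zeller's ddmin algorithm to shrink a list of input records to a smaller subset that still makes a job fail. The `fails` predicate and the sample records are hypothetical stand-ins for re-running a Spark program on a subset of its input.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def split(items: List[T], n: int) -> List[List[T]]:
    """Split items into n contiguous, nearly equal, non-empty chunks (needs n <= len(items))."""
    k, m = divmod(len(items), n)
    return [items[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)]


def ddmin(records: Sequence[T], fails: Callable[[List[T]], bool]) -> List[T]:
    """Simplified ddmin: shrink `records` to a locally minimal subset on which `fails` is True."""
    current = list(records)
    if not fails(current):
        raise ValueError("the full input must reproduce the failure")
    n = 2  # current granularity (number of chunks)
    while len(current) >= 2:
        chunks = split(current, n)
        reduced = False
        # 1) Does a single chunk on its own still fail?
        for chunk in chunks:
            if fails(chunk):
                current, n, reduced = chunk, 2, True
                break
        # 2) Otherwise, does the input with one chunk removed (its complement) still fail?
        if not reduced:
            for i in range(len(chunks)):
                complement = [x for j, c in enumerate(chunks) if j != i for x in c]
                if fails(complement):
                    current, n, reduced = complement, max(n - 1, 2), True
                    break
        # 3) Otherwise, use finer chunks, or stop once every chunk is a single record.
        if not reduced:
            if n >= len(current):
                break
            n = min(n * 2, len(current))
    return current
```

For example, with the hypothetical predicate `fails = lambda rs: 3 in rs and 7 in rs`, calling `ddmin(list(range(10)), fails)` narrows ten records down to `[3, 7]`, the two records that jointly trigger the failure.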

RICON: A ML framework for real-time and proactive intervention to prevent customer churn [article]

Arnab Chakraborty, Vikas Raturi, Shrutendra Harsola
2022 arXiv   pre-print
Moreover, we execute an extensive comparative study to justify our modeling choices for RICON.  ...  In this paper we present RICON, a flexible, cost-effective and robust machine learning system to predict customer churn propensities in real-time using clickstream data.  ...  The advantage of this featurization is three-fold: (a) this featurization can easily be performed at scale by using PySpark ML CountVectorizer operation, (b) the length of the feature vector is uniform  ... 
arXiv:2203.16155v2 fatcat:6sy45rjytjg57ovtwmzo2kiaia
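The RICON snippet credits PySpark's CountVectorizer with producing uniform-length feature vectors from clickstream data. The pure-Python sketch below shows what that transformation does conceptually, without Spark. It builds a vocabulary of event names, then maps each session to a fixed-length vector of event counts. The event names are hypothetical. This sketch orders the vocabulary alphabetically for readability, whereas Spark's `pyspark.ml.feature.CountVectorizer` (configured with `inputCol`/`outputCol`) orders it by term frequency across the corpus. This is an illustration, not RICON's pipeline.

```python
from collections import Counter
from typing import Dict, List, Sequence


def build_vocabulary(sessions: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Map every distinct event name to a column index (alphabetical order in this sketch)."""
    events = sorted({event for session in sessions for event in session})
    return {event: index for index, event in enumerate(events)}


def count_vectorize(session: Sequence[str], vocabulary: Dict[str, int]) -> List[int]:
    """Turn one clickstream session into a fixed-length vector of event counts."""
    vector = [0] * len(vocabulary)
    for event, count in Counter(session).items():
        if event in vocabulary:  # events not seen when building the vocabulary are ignored
            vector[vocabulary[event]] = count
    return vector


# Hypothetical sessions of different lengths.
sessions = [
    ["login", "view_invoice", "view_invoice", "logout"],
    ["login", "cancel_page"],
]
vocab = build_vocabulary(sessions)
vectors = [count_vectorize(s, vocab) for s in sessions]
print(vectors)  # [[0, 1, 1, 2], [1, 1, 0, 0]]
```

Sessions of different lengths end up as vectors of the same length (the vocabulary size), which is the uniformity property the snippet points to.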

Driving Behaviour Analysis Using Machine and Deep Learning Methods for Continuous Streams of Vehicular Data

Nikolaos Peppes, Theodoros Alexakis, Evgenia Adamopoulou, Konstantinos Demestichas
2021 Sensors  
In the last few decades, vehicles have been equipped with a plethora of sensors which can provide useful measurements and diagnostics for both the vehicle's condition and the driver's behaviour.  ...  The reduction of CO2 emissions and the minimization of the environmental footprint is, undeniably, of utmost importance for the protection of the environment.  ...  Deep learning algorithms are used for more complex datasets and architectures such as object, signal and/or image identification.  ... 
doi:10.3390/s21144704 fatcat:qdmjsnpbfnhldndfh6n2oyzumi

ExtremeEarth Meets Satellite Data From Space

Desta Haileselassie Hagos, Theofilos Kakantousis, Vladimir Vlassov, Sina Sheikholeslami, Tianze Wang, Jim Dowling, Claudia Paris, Daniele Marinelli, Giulio Weikmann, Lorenzo Bruzzone, Salman Khaleghian, Thomas Kræmer (+14 others)
2021 IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing  
Furthermore, we present the integration of Hopsworks with the Polar and Food Security use cases and the flow of events for the products offered through the TEPs.  ...  These techniques and corresponding software presented in this paper are to be integrated with and used in two ESA TEPs, namely Polar and Food Security TEPs.  ...  His current research interests are on algorithms for automated analysis of SAR images for sea ice applications.  ... 
doi:10.1109/jstars.2021.3107982 fatcat:fxmpayska5bvlj7ibw3peqhuzu

OntoTouTra: Tourist Traceability Ontology Based on Big Data Analytics

Juan Francisco Mendoza-Moreno, Luz Santamaria-Granados, Anabel Fraga Vázquez, Gustavo Ramirez-Gonzalez
2021 Applied Sciences  
For the above, we propose OntoTouTra, an ontology that uses formal specifications to represent knowledge of tourist traceability systems.  ...  A knowledge base provides us with information on the preparation, planning, and implementation or operation stages.  ...  (Table) Software: Spark/PySpark; Use: data mining; Function: PySpark DataFrame for Big Data entities (reviews, hotel services, and scores).  ... 
doi:10.3390/app112211061 fatcat:gzabwc344zfolhoxnoaf7ygxae

Toward the application of artificial intelligence in academic content: An autonomous recommendation system [chapter]

Edwin Montoya-Jaramillo, Universidad EAFIT, Jose Aguilar, Julián Alberto Monsalve-Pulido, Marvin Jiménez-Narváez, Daniela Varela-Tabares
2020 Education 4.0: A view from different digital proposals  
In this project, an automatic feature engineering methodology was proposed for the audio data, which can automatically extract, analyze, and select the best features for such data (Jimenez et al., 2020  ...  For the audio modality, 6,373 features (the INTERSPEECH 2013 ComParE feature set (Schuller et al., 2013)) were extracted from each audio sample.  ... 
doi:10.17230/9789587207002lch1 fatcat:ajqvxwh6zvfrnm7upzvz3kanhi

Massively Digitized Power Grid: Opportunities and Challenges of Use-inspired AI [article]

Le Xie, Xiangtian Zheng, Yannan Sun, Tong Huang, Tony Bruton
2022 arXiv   pre-print
This article presents a use-inspired perspective of the opportunities and challenges in a massively digitized power grid.  ...  Open challenges and research opportunities for data, computing, and AI algorithms are articulated within the context of the power industry's tremendous decarbonization efforts.  ...  ACKNOWLEDGEMENTS The authors sincerely thank Jimmy Liu, Steven Dennis, and Thomas Wilson for their help on the Oncor use cases presented in this paper.  ... 
arXiv:2205.05180v1 fatcat:ecmq2wqy2nhk7e2zcabwdkhltq

BEATS: Blocks of Eigenvalues Algorithm for Time series Segmentation

Aurora Gonzalez-Vidal, Payam Barnaghi, Antonio F. Skarmeta
2018 IEEE Transactions on Knowledge and Data Engineering  
BEATS is an effective mechanism to work with dynamic and multi-variate data, making it suitable for IoT data sources.  ...  When the split is not provided, which is the case in one of the datasets (the one we generated randomly), we use 75% of the samples for the training set and 25% of the samples for testing.  ...  PySpark allows us to use the Spark Streaming functionalities that are needed in order to implement BEATS online.  ... 
doi:10.1109/tkde.2018.2817229 fatcat:qt6qe5j5gnhkdg4jn3ie5bxyoy
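The BEATS snippet describes a 75%/25% train/test split for datasets without a predefined split. The snippet does not say whether that split is random or chronological, so the Python sketch below supports both through a hypothetical `shuffle` flag. The seeded shuffle is an assumption; for time series a chronological cut (`shuffle=False`) may be the safer choice. In PySpark, `DataFrame.randomSplit([0.75, 0.25], seed=...)` performs a random split, but with only approximately those proportions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(
    samples: Sequence[T],
    train_fraction: float = 0.75,
    shuffle: bool = True,
    seed: int = 0,
) -> Tuple[List[T], List[T]]:
    """Split samples into (train, test) so that train holds `train_fraction` of them."""
    items = list(samples)
    if shuffle:
        # Seeded shuffle so the split is reproducible across runs.
        random.Random(seed).shuffle(items)
    cut = round(len(items) * train_fraction)  # e.g. 75 of 100 samples
    return items[:cut], items[cut:]


train, test = train_test_split(range(100))
print(len(train), len(test))  # 75 25
```

With `shuffle=False` the first 75% of the samples (in their original order) become the training set and the remainder the test set.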

The CAMH Neuroinformatics Platform: A Hospital-Focused Brain-CODE Implementation

David J Rotenberg, Qing Chang, Natalia Potapova, Andy Wang, Marcia Hon, Marcos Sanches, Nikola Bogetic, Nathan Frias, Tommy Liu, Brendan Behan, Rachad El-Badrawi, Stephen C Strother (+11 others)
2018 Frontiers in Neuroinformatics  
Aggregation of high-dimensional datasets across brain disorders can increase sample sizes and may help identify underlying causes of brain dysfunction; however, additional barriers exist for effective data harmonization and integration for their combined use in research.  ...  The study was supported by a grant from the Canadian Foundation for Innovation. Funding for the Neuroinformatics Platform was provided by the Government of Ontario.  ...
doi:10.3389/fninf.2018.00077 pmid:30459587 pmcid:PMC6232622 fatcat:ecbxaeuqtjhjlcqylhlkjb5qae

Scalable Analysis of Multi-Modal Biomedical Data [article]

Jaclyn M Smith, Yao Shi, Michael Benedikt, Milos Nikolic
2020 bioRxiv   pre-print
We outline research and clinical applications for the platform, including data integration support for building feature sets for classification.  ...  The integration of large-scale biomedical data commonly involves several complex data transformation steps, such as combining datasets to build feature vectors for learning analysis.  ...  Acknowledgements The authors would like to thank Omics Data Automation, Inc. for supplying hardware, compute time, and contributing to use case discussions.  ... 
doi:10.1101/2020.12.14.422781 fatcat:wscxoume7zeutbhbpxlwo5npgm

Scalable analysis of multi-modal biomedical data

Jaclyn Smith, Yao Shi, Michael Benedikt, Milos Nikolic
2021 GigaScience  
We outline research and clinical applications for the platform, including data integration support for building feature sets for classification.  ...  The integration of large-scale biomedical data commonly involves several complex data transformation steps, such as combining datasets to build feature vectors for learning analysis.  ...  ., for supplying hardware and compute time and contributing to use case discussions.  ... 
doi:10.1093/gigascience/giab058 pmid:34508579 pmcid:PMC8434767 fatcat:fsrmqqfgcvag5dgmiqbu2t6fsy

Breast Cancer–Detection System Using PCA, Multilayer Perceptron, Transfer Learning, and Support Vector Machine

Huan-Jung Chiu, Tzuu-Hseng S. Li, Ping-Huan Kuo
2020 IEEE Access  
CLASSIFIER USING SUPPORT VECTOR MACHINE. The SVM classifier is chosen for the final identification.  ...  In the first six runs, the sample number was 11; in the remaining runs, it was 12. For the results, Appendix Fig.  ... 
doi:10.1109/access.2020.3036912 fatcat:7pbw54qhazfybjh5j6n3teqtem

Data Analysis Methods for Software Systems

Jolita Bernatavičienė
2021 Vilnius University Proceedings  
This means that the topics of the conference are relevant to business, too.  ...  This makes the conference the main annual meeting point for Lithuanian computer scientists.  ...  Large historical data from automatic identification systems (AIS) are analysed to solve the problem of ship trajectory prediction.  ... 
doi:10.15388/damss.12.2021 fatcat:iefv6bz3drcrfpcwxoaqmu3gra

The Power of Big Data and Data Analytics for AMI Data: A Case Study

Jenniffer Sidney Guerrero-Prado, Wilfredo Alfonso-Morales, Eduardo Caicedo-Bravo, Benjamín Zayas-Pérez, Alfredo Espinosa-Reza
2020 Sensors  
In this context, the terms big data and data analytics become relevant, which are tools that allow using large volumes of information and the generation of valuable knowledge from raw data that can support data-driven decisions for operating on the grid.  ...  Their method is based on spectral analysis of periodic patterns, using features in the frequency domain. They highlight the model's ability to perform online analysis [51].  ... 
doi:10.3390/s20113289 pmid:32526976 fatcat:qbjl5u6cabcshk4z67ndbordtm
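The AMI snippet cites a method [51] based on spectral analysis of periodic patterns with frequency-domain features. The Python sketch below is a generic illustration of that kind of feature, not the method of [51]. It takes a naive discrete Fourier transform of a consumption series and reports the period of its strongest non-constant component. The hourly load curve is synthetic and hypothetical. A real pipeline would use an FFT library rather than this O(n²) loop.

```python
import cmath
import math
from typing import Sequence


def dominant_period(series: Sequence[float]) -> float:
    """Return the period (in samples) of the strongest non-DC frequency component."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the constant (DC) component
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):  # positive frequencies only
        coefficient = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered)
        )
        power = abs(coefficient) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k


# Synthetic hourly load for one week: a strong daily cycle plus a weaker 12-hour cycle.
hours = 24 * 7
load = [
    1.0 + 0.5 * math.sin(2 * math.pi * t / 24) + 0.2 * math.sin(2 * math.pi * t / 12)
    for t in range(hours)
]
print(dominant_period(load))  # 24.0
```

The spectral power values themselves (for example, the power at the daily and half-daily frequencies) could serve as per-meter frequency-domain features.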

Understanding And Mapping Big Data In Transport Sector

Kim Hee, Naveed Mushtaq, Hevin Özmen, Marten Rosselli, Roberto V. Zicari, Minsung Hong, Rajendra Akerkar, Sophie Roizard, Rémy Russotto, Tharsis Teoh
2018 Zenodo  
It also indicates that the combination of different means and approaches will enhance the opportunities for successful big data services in the transport sector.  ...  Chapter 3 identifies several opportunities and challenges of big data in transportation, using several subject matter expert interviews, nineteen applied cases, and a literature review.  ...  Identification System; ANPR: Automatic Number Plate Recognition; APC: Automatic Passenger Counting; API: Application Programming Interface; ARPS: Average Revenue Per Session; AVL: Automatic Vehicle Location  ... 
doi:10.5281/zenodo.1465516 fatcat:tqw6cz3uabd75pyuc5wbpxer6u