FEDEX: An Explainability Framework for Data Exploration Steps [article]

Daniel Deutch, Amir Gilad, Tova Milo, Amit Mualem, Amit Somech
2022 arXiv   pre-print
When exploring a new dataset, data scientists often apply analysis queries, look for insights in the resulting dataframe, and repeat to apply further queries. We propose in this paper a novel solution that assists data scientists in this laborious process. In a nutshell, our solution pinpoints the most interesting (sets of) rows in each obtained dataframe. Uniquely, our definition of interest is based on the contribution of each row to the interestingness of different columns of the entire
... rame, which, in turn, is defined using standard measures such as diversity and exceptionality. Intuitively, interesting rows are ones that explain why (some column of) the analysis query result is interesting as a whole. Rows are correlated in their contribution, and so the interestingness score for a set of rows may not be directly computed based on that of individual rows. We address the resulting computational challenge by restricting attention to semantically-related sets, based on multiple notions of semantic relatedness; these sets serve as more informative explanations. Our experimental study across multiple real-world datasets shows the usefulness of our system in various scenarios.
arXiv:2209.06260v1 fatcat:xm3ko34zfjdgzllha7vyvpoowe

SubStrat: A Subset-Based Strategy for Faster AutoML [article]

Teddy Lazebnik, Amit Somech, Abraham Itzhak Weinberg
2022 arXiv   pre-print
Automated machine learning (AutoML) frameworks have become important tools in the data scientists' arsenal, as they dramatically reduce the manual work devoted to the construction of ML pipelines. Such frameworks intelligently search among millions of possible ML pipelines - typically containing feature engineering, model selection and hyperparameter tuning steps - and finally output an optimal pipeline in terms of predictive accuracy. However, when the dataset is large, each individual
... uration takes longer to execute, and therefore the overall AutoML running times become increasingly high. To this end, we present SubStrat, an AutoML optimization strategy that tackles the data size, rather than the configuration space. It wraps existing AutoML tools, and instead of executing them directly on the entire dataset, SubStrat uses a genetic-based algorithm to find a small yet representative data subset which preserves a particular characteristic of the full data. It then employs the AutoML tool on the small subset, and finally, it refines the resulting pipeline by executing a restricted, much shorter, AutoML process on the large dataset. Our experimental results, performed on two popular AutoML frameworks, Auto-Sklearn and TPOT, show that SubStrat reduces their running times by 79% (on average), with less than 2% average loss in the accuracy of the resulting ML pipeline.
arXiv:2206.03070v1 fatcat:p62tcvxbj5g4nmix5fm3simnfu

Selecting Sub-tables for Data Exploration [article]

Kathy Razmadze, Yael Amsterdamer, Amit Somech, Susan B. Davidson, Tova Milo
2022 arXiv   pre-print
We present a framework for creating small, informative sub-tables of large data tables to facilitate the first step of data science: data exploration. Given a large data table T, the goal is to create a sub-table of small, fixed dimensions, by selecting a subset of T's rows and projecting them over a subset of T's columns. The question is: which rows and columns should be selected to yield an informative sub-table? We formalize the notion of "informativeness" based on two complementary
... rics: cell coverage, which measures how well the sub-table captures prominent association rules in T, and diversity. Since computing optimal sub-tables using these metrics is shown to be infeasible, we give an efficient algorithm which indirectly accounts for association rules using table embedding. The resulting framework can be used for visualizing the complete sub-table, as well as for displaying the results of queries over the sub-table, enabling the user to quickly understand the results and determine subsequent queries. Experimental results show that we can efficiently compute high-quality sub-tables as measured by our metrics, as well as by feedback from user studies.
arXiv:2203.02754v1 fatcat:abvjq27wzrgbjcnrkzgoh7ud4q

REACT

Tova Milo, Amit Somech
2016 Proceedings of the 2016 International Conference on Management of Data - SIGMOD '16  
Data analysis may be a difficult task, especially for nonexpert users, as it requires deep understanding of the investigated domain and the particular context. In this demo we present REACT, a system that hooks to the analysis UI and provides the users with personalized recommendations of analysis actions. By matching the current user session to previous sessions of analysts working with the same or other data sets, REACT is able to identify the potentially best next analysis actions in the
... n user context. Unlike previous work that mainly focused on individual components of the analysis work, REACT provides a holistic approach that captures a wider range of analysis action types by utilizing novel notions of similarity in terms of the individual actions, the analyzed data and the entire analysis workflow. We demonstrate the functionality of REACT, as well as its effectiveness, through a digital forensics scenario where users are challenged to detect cyber attacks in real-life data obtained from honeypot servers.
doi:10.1145/2882903.2899392 dblp:conf/sigmod/MiloS16 fatcat:ggse3ctmtbbhjpvjaj2fajeszq

December

Yael Amsterdamer, Tova Milo, Amit Somech, Brit Youngmann
2016 Proceedings of the VLDB Endowment  
Adequate crowd selection is an important factor in the success of crowdsourcing platforms, increasing the quality and relevance of crowd answers and their performance in different tasks. The optimal crowd selection can greatly vary depending on properties of the crowd and of the task. To this end, we present December, a declarative platform with novel capabilities for flexible crowd selection. December supports the personalized selection of crowd members via a dedicated query language
... This language enables specifying and combining common crowd selection criteria such as properties of a crowd member's profile and history, similarity between profiles in specific aspects and relevance of the member to a given task. This holistic, customizable approach differs from previous work that has mostly focused on dedicated algorithms for crowd selection in specific settings. To allow efficient query execution, we implement novel algorithms in December based on our generic, semantically-aware definitions of crowd member similarity and expertise. We demonstrate the effectiveness of December and Member-QL by using the VLDB community as crowd members and allowing conference participants to choose from among these members for different purposes and in different contexts.
doi:10.14778/3007263.3007290 fatcat:5qufv7grozgpnf7quaezskfvyq

Towards Autonomous, Hands-Free Data Exploration

Ori Bar El, Tova Milo, Amit Somech
2020 Conference on Innovative Data Systems Research  
Exploratory Data Analysis (EDA) is an important yet difficult task, currently performed by expert users, as it requires deep understanding of the data domain as well as profound analytical skills. In this work we make the case for the Hands-Free EDA (HFE) paradigm, in which the exploratory process is automatically conducted, requiring little or no human input, as in watching a "video" presenting selected highlights of the dataset. To that end, we suggest an end-to-end visionary system
... re, coupled with a prototype implementation. Our preliminary experimental results demonstrate that HFE is achievable, and lead the way for improvement and optimization research.
dblp:conf/cidr/ElMS20 fatcat:n5i65rd4ivevffev36n52clisa

OASSIS

Yael Amsterdamer, Susan B. Davidson, Tova Milo, Slava Novgorodov, Amit Somech
2014 Proceedings of the 2014 ACM SIGMOD international conference on Management of data - SIGMOD '14  
Crowd data sourcing is increasingly used to gather information from the crowd and to obtain recommendations. In this paper, we explore a novel approach that broadens crowd data sourcing by enabling users to pose general questions, to mine the crowd for potentially relevant data, and to receive concise, relevant answers that represent frequent, significant data patterns. Our approach is based on (1) a simple generic model that captures both ontological knowledge as well as the individual history
... or habits of crowd members from which frequent patterns are mined; (2) a query language in which users can declaratively specify their information needs and the data patterns of interest; (3) an efficient query evaluation algorithm, which enables mining semantically concise answers while minimizing the number of questions posed to the crowd; and (4) an implementation of these ideas that mines the crowd through an interactive user interface. Experimental results with both real-life crowd and synthetic data demonstrate the feasibility and effectiveness of the approach.
doi:10.1145/2588555.2610514 dblp:conf/sigmod/AmsterdamerDMNS14 fatcat:36jmfq3fdfdlndr5vkznneza5q

ExplainED: Explanations for EDA Notebooks

Daniel Deutch, Amir Gilad, Tova Milo, Amit Somech
2020 Proceedings of the VLDB Endowment  
Exploratory Data Analysis (EDA) is an essential yet highly demanding task. To get a head start before exploring a new dataset, data scientists often prefer to view existing EDA notebooks: illustrative exploratory sessions that were created by fellow data scientists who examined the same dataset and shared their notebooks via online platforms. Unfortunately, creating an illustrative, well-documented notebook is cumbersome and time-consuming, therefore users sometimes share their notebook without
... explaining their exploratory steps and their results. Such notebooks are difficult to follow and to understand. To address this, we present ExplainED, a system that automatically attaches explanations to views in EDA notebooks. ExplainED analyzes each view in order to detect which of its elements are particularly interesting, and produces a corresponding textual explanation. The explanations are generated by first evaluating the interestingness of the given view using several measures capturing different interestingness facets, then computing the Shapley values of the elements in the view, w.r.t. the interestingness measure yielding the highest score. These Shapley values are then used to guide the generation of the textual explanation. We demonstrate the usefulness of the explanations generated by ExplainED on real-life, undocumented EDA notebooks.
dblp:journals/pvldb/DeutchGMS20 fatcat:ejtz442mlnbxxhrkz4xybokxey

Ontology assisted crowd mining

Yael Amsterdamer, Susan B. Davidson, Tova Milo, Slava Novgorodov, Amit Somech
2014 Proceedings of the VLDB Endowment  
We present OASSIS (for Ontology ASSISted crowd mining), a prototype system which allows users to declaratively specify their information needs, and mines the crowd for answers. The answers that the system computes are concise and relevant, and represent frequent, significant data patterns. The system is based on (1) a generic model that captures both ontological knowledge, as well as the individual knowledge of crowd members from which frequent patterns are mined; (2) a query language in which
... sers can specify their information needs and types of data patterns they seek; and (3) an efficient query evaluation algorithm, for mining semantically concise answers while minimizing the number of questions posed to the crowd. We will demonstrate OASSIS using a couple of real-life scenarios, showing how users can formulate and execute queries through the OASSIS UI and how the relevant data is mined from the crowd.
doi:10.14778/2733004.2733039 fatcat:fk3votc25nclnpip3hw3ieruwy

Managing General and Individual Knowledge in Crowd Mining Applications

Yael Amsterdamer, Susan B. Davidson, Anna Kukliansky, Tova Milo, Slava Novgorodov, Amit Somech
2015 Conference on Innovative Data Systems Research  
Crowd mining frameworks combine general knowledge, which can refer to an ontology or information in a database, with individual knowledge obtained from the crowd, which captures habits and preferences. To account for such mixed knowledge, along with user interaction and optimization issues, such frameworks must employ a complex process of reasoning, automatic crowd task generation and result analysis. In this paper, we describe a generic architecture for crowd mining applications. This
... ure allows us to examine and compare the components of existing crowdsourcing systems and point out extensions required by crowd mining. It also highlights new research challenges and potential reuse of existing techniques/components. We exemplify this for the OASSIS project and for other prominent crowdsourcing frameworks.
dblp:conf/cidr/AmsterdamerDKMN15 fatcat:e4f2cmx7zvhdjcrh6pec4uylya

SubTab: Data Exploration with Informative Sub-Tables

Kathy Razmadze, Yael Amsterdamer, Amit Somech, Susan B. Davidson, Tova Milo
2022 Proceedings of the 2022 International Conference on Management of Data  
We demonstrate SubTab, a framework for creating small, informative sub-tables of large data tables to speed up data exploration. Given a table with 𝑛 rows and 𝑚 columns where 𝑛 and 𝑚 are large, SubTab creates a sub-table 𝑇_𝑠𝑢𝑏 with 𝑘 << 𝑛 rows and 𝑙 << 𝑚 columns, i.e., a subset of 𝑘 rows of the table projected over a subset of 𝑙 columns. The rows and columns are chosen as representatives of prominent data patterns within and across columns in the input table. SubTab can also be used for query
... ults, enabling the user to quickly understand the results and determine subsequent queries.
doi:10.1145/3514221.3520154 fatcat:eeik4ruqmbdkhi3lub6jc6gkny

Trough Concentrations of Specific Antibodies in Primary Immunodeficiency Patients Receiving Intravenous Immunoglobulin Replacement Therapy

Ori Hassin, Yahya Abu Freih, Ran Hazan, Atar Lev, Keren S. Zrihen, Raz Somech, Arnon Broides, Amit Nahum
2021 Journal of Clinical Medicine  
Immunoglobulin replacement therapy is a mainstay therapy for patients with primary immunodeficiency (PID). The content of these preparations has been studied extensively. Nevertheless, data regarding the effective specific antibody content (especially in the nadir period) and in different groups of PID patients are limited. We studied trough IgG concentrations as well as anti-Pneumococcus, anti-Haemophilus influenzae b, anti-Tetanus, and anti-Measles antibody concentrations in 17 PID patients
... iving intravenous immunoglobulin (IVIg) compared with healthy controls matched for age and ethnicity. We also analyzed these results according to the specific PID diagnosis: X-linked agammaglobulinemia (XLA), combined immunodeficiency (CID), and ataxia telangiectasia (AT). We recorded a higher concentration of anti-pneumococcal polysaccharide antibodies in healthy controls compared to the entire group of PID patients. We also found significantly higher anti-tetanus toxoid antibody concentrations in the XLA patients, compared to CID patients. Anti-Haemophilus influenzae b antibody titers were overall similar between all the groups. Interestingly, there were overall low titers of anti-Measles antibodies below protective cutoff antibody concentrations in most patients as well as in healthy controls. We conclude that total IgG trough levels do not necessarily reflect the effective specific antibodies in the patient's serum. This is especially relevant to CID patients who may have production of nonspecific antibodies. In such patients, a higher target trough IgG concentration should be considered. Another aspect worth considering is that the use of plasma from adult donors with a waning immunity for certain pathogens probably affects the concentrations of specific antibodies in IVIg preparations.
doi:10.3390/jcm10040592 pmid:33557365 pmcid:PMC7915625 fatcat:7sa4yr4j2bfwvniquigxxy552y

First Year of Israeli Newborn Screening for Severe Combined Immunodeficiency—Clinical Achievements and Insights

Erez Rechavi, Atar Lev, Amos J. Simon, Tali Stauber, Suha Daas, Talia Saraf-Levy, Arnon Broides, Amit Nahum, Nufar Marcus, Suhair Hanna, Polina Stepensky, Ori Toker (+4 others)
2017 Frontiers in Immunology  
Severe combined immunodeficiency (SCID), the most severe form of T cell immunodeficiency, is detectable through quantification of T cell receptor excision circles (TRECs) in dried blood spots obtained at birth. Herein, we describe the results of the first year of the Israeli SCID newborn screening (NBS) program. This important, life-saving screening test is available at no cost for every newborn in Israel. Eight SCID patients were diagnosed through the NBS program in its first year, revealing
... incidence of 1:22,500 births in the Israeli population. Consanguineous marriages and Muslim ethnic origin were found to be risk factors in affected newborns, and a founder effect was detected for both IL7Rα and DCLRE1C deficiency SCID. Lymphocyte subset analysis and TREC quantification in the peripheral blood appear to be sufficient for confirmation of typical and leaky SCID and ruling out false positive (FP) results. Detection of secondary targets (infants with non-SCID lymphopenia) did not significantly affect the management or outcomes of these infants in our cohort. In the general, non-immunodeficient population, TREC rises along with gestational age and birth weight, and is significantly higher in females and the firstborn of twin pairs. Low TREC correlates with both gestational age and birth weight in extremely premature newborns. Additionally, the rate of TREC increase per week consistently accelerates with gestational age. Together, these findings mandate a lower cutoff or a more lenient screening algorithm for extremely premature infants, in order to reduce the high rate of FPs within this group. A significant surge in TREC values was observed between 28 and 30 weeks of gestation, where median TREC copy numbers rise by 50% over 2 weeks. These findings suggest a maturational step in T cell development around week 29 of gestation, and imply moderate to late preterms should be screened with the same cutoff as term infants. The SCID NBS program is still in its infancy, but is already bearing fruit in the early detection and improved outcomes of children with SCID in Israel and other countries.
doi:10.3389/fimmu.2017.01448 pmid:29167666 pmcid:PMC5682633 fatcat:kz5j25ptnzazpe7hjigaqjfyym

Immune function in newborns with in-utero exposure to anti-TNFα therapy

Batia Weiss, Shomron Ben-Horin, Atar Lev, Efrat Broide, Miri Yavzori, Adi Lahat, Uri Kopylov, Orit Picard, Rami Eliakim, Yulia Ron, Irit Avni-Biron, Anat Yerushalmy-Feler (+3 others)
2022 Frontiers in Pediatrics  
Background and aim: Anti-TNFα is measurable in infants exposed in utero up to 12 months of age. Data about the exposure effect on the infant's adaptive immunity are limited. We aimed to prospectively evaluate the distribution and function of T and B cells in infants of females with inflammatory bowel disease, exposed in utero to anti-TNFα or azathioprine. Methods: A prospective multi-center study conducted 2014–2017. Anti-TNFα levels were measured in cord blood, and at 3 and 12 months. T-cell repertoire and
... nction were analyzed at 3 and 12 months by flow-cytometry, expression of diverse T cell receptors (TCR) and T-cell receptor excision circles (TREC) quantification assay. Serum immunoglobulins and antibodies for inactivated vaccines were measured at 12 months. Baseline clinical data were retrieved, and 2-monthly telephonic interviews were performed regarding child infections and growth. Results: 24 pregnant females, age 30.6 (IQR 26.5–34.5) years, were recruited, 20 with anti-TNFα (infliximab 8, adalimumab 12), and 4 with azathioprine treatment. Cord blood anti-TNFα was higher than maternal blood levels [4.3 (IQR 2.3–9.2) vs. 2.5 (IQR 1.3–9.7) mcg/ml], declining at 3 and 12 months. All infants had normal number of B-cells (n = 17), adequate levels of immunoglobulins (n = 14), and protecting antibody levels to Tetanus, Diphtheria, Haemophilus influenzae b and hepatitis B (n = 17). All had normal CD4+, CD8+ T-cells, and TREC numbers. TCR repertoire was polyclonal in 18/20 and slightly skewed in 2/20 infants. No serious infections requiring hospitalization were recorded. Conclusion: We found that T-cell and B-cell immunity is fully mature and immune function is normal in infants exposed in utero to anti-TNFα, as in those exposed to azathioprine. Untreated controls and large-scale studies are needed to confirm these results.
doi:10.3389/fped.2022.935034 pmid:36120653 pmcid:PMC9470929 fatcat:nrjfla3ltzfevbomoulzbte5ua

Reduced Function and Diversity of T Cell Repertoire and Distinct Clinical Course in Patients With IL7RA Mutation

Atar Lev, Amos J. Simon, Ortal Barel, Eran Eyal, Efrat Glick-Saar, Omri Nayshool, Ohad Birk, Tali Stauber, Amit Hochberg, Arnon Broides, Shlomo Almashanu, Ayal Hendel (+2 others)
2019 Frontiers in Immunology  
Copyright © 2019 Lev, Simon, Barel, Eyal, Glick-Saar, Nayshool, Birk, Stauber, Hochberg, Broides, Almashanu, Hendel, Lee and Somech.  ... 
doi:10.3389/fimmu.2019.01672 pmid:31379863 pmcid:PMC6650764 fatcat:mzvknh6vdngdhnxih3m4c4cvya