[{"content":"Welcome to my blog! Here I share insights from my research in artificial intelligence, machine learning applications in healthcare, and computational biology. I write about interesting developments in the field, lessons learned from my research, and thoughts on the intersection of technology and medicine.\n","date":null,"permalink":"/posts/","section":"Blog","summary":"Welcome to my blog!","title":"Blog"},{"content":"","date":null,"permalink":"/","section":"Home","summary":"","title":"Home"},{"content":"Here is a list of my academic publications and presentations. For a complete list of my work, please see my Google Scholar profile.\nIn all publications \u0026quot;*\u0026quot; denotes equal contribution.\n","date":null,"permalink":"/publications/","section":"Publications","summary":"Here is a list of my academic publications and presentations.","title":"Publications"},{"content":" Oran Lang, Doron Yaya-Stupp, Ilana Traynis, Heather Cole-Lewis, Chloe R. Bennett, Courtney Lyles, Charles Lau, Michal Irani, Christopher Semturs, Dale R. Webster, Greg S. Corrado, Avinatan Hassidim, Yossi Matias, Yun Liu, Naama Hammel, Boris Babenko. EBioMedicine, January 2024 DOI: 10.1016/j.ebiom.2024.105075 Abstract\nBackground: AI models have shown promise in performing many medical imaging tasks. However, our ability to explain what signals these models have learned is severely lacking. Explanations are needed in order to increase the trust of doctors in AI-based models, especially in domains where AI prediction capabilities surpass those of humans. Moreover, such explanations could enable novel scientific discovery by uncovering signals in the data that aren\u0026rsquo;t yet known to experts.\nMethods: In this paper, we present a workflow for generating hypotheses to understand which visual signals in images are correlated with a classification model\u0026rsquo;s predictions for a given task. 
This approach leverages an automatic visual explanation algorithm followed by interdisciplinary expert review. We propose the following 4 steps: (i) Train a classifier to perform a given task to assess whether the imagery indeed contains signals relevant to the task; (ii) Train a StyleGAN-based image generator with an architecture that enables guidance by the classifier (\u0026ldquo;StylEx\u0026rdquo;); (iii) Automatically detect, extract, and visualize the top visual attributes that the classifier is sensitive towards. For visualization, we independently modify each of these attributes to generate counterfactual visualizations for a set of images (i.e., what the image would look like with the attribute increased or decreased); (iv) Formulate hypotheses for the underlying mechanisms, to stimulate future research. Specifically, present the discovered attributes and corresponding counterfactual visualizations to an interdisciplinary panel of experts so that hypotheses can account for social and structural determinants of health (e.g., whether the attributes correspond to known patho-physiological or socio-cultural phenomena, or could be novel discoveries).\nFindings: To demonstrate the broad applicability of our approach, we present results on eight prediction tasks across three medical imaging modalities: retinal fundus photographs, external eye photographs, and chest radiographs. We showcase examples where many of the automatically-learned attributes clearly capture clinically known features (e.g., types of cataract, enlarged heart), and demonstrate automatically-learned confounders that arise from factors beyond physiological mechanisms (e.g., chest X-ray underexposure is correlated with the classifier predicting abnormality, and eye makeup is correlated with the classifier predicting low hemoglobin levels). 
We further show that our method reveals a number of physiologically plausible, previously unknown attributes based on the literature (e.g., differences in the fundus associated with self-reported sex, which were previously unknown).\nInterpretation: Our approach enables hypothesis generation via attribute visualizations and has the potential to enable researchers to better understand, improve their assessment, and extract new knowledge from AI-based models, as well as debug and design better datasets. Though not designed to infer causality, importantly, we highlight that attributes generated by our framework can capture phenomena beyond physiology or pathophysiology, reflecting the real-world nature of healthcare delivery and socio-cultural factors, and hence interdisciplinary perspectives are critical in these investigations. Finally, we will release code to help researchers train their own StylEx models and analyze their predictive tasks of interest, and use the methodology presented in this paper for responsible interpretation of the revealed attributes.\n","date":"2024-01-01","permalink":"/publications/lang-2024/","section":"Publications","summary":"Oran Lang, Doron Yaya-Stupp, Ilana Traynis, Heather Cole-Lewis, Chloe R.","title":"Using generative AI to investigate medical imagery models and datasets"},{"content":" Doron Stupp, Ronnie Barequet, I-Ching Lee, Eyal Oren, Amir Feder, Ayelet Benjamini, Avinatan Hassidim, Yossi Matias, Eran Ofek, Alvin Rajkomar. medRxiv (Pre-print), April 2022 DOI: 10.1101/2022.04.13.22273438 Abstract\nPhysicians record their detailed thought processes about diagnoses and treatments as unstructured text in a section of a clinical note called the assessment and plan. This information is more clinically rich than structured billing codes assigned for an encounter but harder to reliably extract given the complexity of clinical language and documentation habits. 
We describe and release a dataset containing annotations of 579 admission and progress notes from the publicly available and de-identified MIMIC-III ICU dataset with over 30,000 labels identifying active problems, their assessment, and the category of associated action items (e.g. medication, lab test). We also propose deep-learning-based models that approach human performance, with an F1 score of 0.88. We found that by employing weak supervision and domain-specific data augmentation, we could improve generalization across departments and reduce the number of human-labeled notes without sacrificing performance.\n","date":"2022-04-13","permalink":"/publications/stupp-2022-medrxiv/","section":"Publications","summary":"Doron Stupp, Ronnie Barequet, I-Ching Lee, Eyal Oren, Amir Feder, Ayelet Benjamini, Avinatan Hassidim, Yossi Matias, Eran Ofek, Alvin Rajkomar.","title":"Structured Understanding of Assessment and Plans in Clinical Documentation"},{"content":" Sapir Labes, Doron Stupp, Naama Wagner, Idit Bloch, Michal Lotem, Ephrat L. Lahad, Paz Polak, Tal Pupko, Yuval Tabach. NAR Genomics and Bioinformatics, January 2022 DOI: 10.1093/nargab/lqac025 Abstract\nConservation is a strong predictor for the pathogenicity of single-nucleotide variants (SNVs). However, some positions that present complex conservation patterns across vertebrates stray from this paradigm. Here, we analyzed the association between complex conservation patterns and the pathogenicity of SNVs in the 115 disease genes that had sufficient variant data. We show that conservation is not a one-rule-fits-all solution since its accuracy highly depends on the analyzed set of species and genes. For example, pairwise comparisons between the human and 99 vertebrate species showed that species differ in their ability to predict the clinical outcomes of variants among different genes using conservation. 
Furthermore, certain genes were less amenable to conservation-based variant prediction, while others demonstrated species that optimize prediction. These insights led to developing EvoDiagnostics, which uses the conservation against each species as a feature within a random-forest machine-learning classification algorithm. EvoDiagnostics outperformed traditional conservation algorithms, deep-learning-based methods and most ensemble tools in every prediction task, highlighting the strength of optimizing conservation analysis per species and per gene. Overall, we suggest a new, more biologically relevant approach for analyzing conservation, which improves prediction of variant pathogenicity.\n","date":"2022-01-01","permalink":"/publications/labes-2022/","section":"Publications","summary":"Sapir Labes, Doron Stupp, Naama Wagner, Idit Bloch, Michal Lotem, Ephrat L.","title":"Machine-learning of complex evolutionary signals improves classification of SNVs"},{"content":" Doron Stupp, Ronnie Barequet, I-Ching Lee, Eyal Oren, Amir Feder, Ayelet Benjamini, Avinatan Hassidim, Yossi Matias, Eran Ofek, Alvin Rajkomar. January 2022 Presented as poster and lightning talk at MLHC 2022, NC, US.\n","date":"2022-01-01","permalink":"/publications/poster-stupp-2022-mlhc/","section":"Publications","summary":"Doron Stupp, Ronnie Barequet, I-Ching Lee, Eyal Oren, Amir Feder, Ayelet Benjamini, Avinatan Hassidim, Yossi Matias, Eran Ofek, Alvin Rajkomar.","title":"Structured Understanding of Assessment and Plans in Clinical Documentation"},{"content":" Tomer Tsaban*, Doron Stupp*, Dana Sherill-Rofe, Idit Bloch, Ora Schueler-Furman, Reuven Wiener, Yuval Tabach. NAR Genomics and Bioinformatics, January 2021 DOI: 10.1093/nargab/lqab024 Abstract\nMapping co-evolved genes via phylogenetic profiling (PP) is a powerful approach to uncover functional interactions between genes and to associate them with pathways. 
Despite many successful endeavors, the understanding of co-evolutionary signals in eukaryotes remains partial. Our hypothesis is that ‘Clades’, branches of the tree of life (e.g. primates and mammals), encompass signals that cannot be detected by PP using all eukaryotes. As such, integrating information from different clades should reveal local co-evolution signals and improve function prediction. Accordingly, we analyzed 1028 genomes in 66 clades and demonstrated that the co-evolutionary signal was scattered across clades. We showed that functionally related genes are frequently co-evolved in only parts of the eukaryotic tree and that clades are complementary in detecting functional interactions within pathways. We examined the non-homologous end joining pathway and the UFM1 ubiquitin-like protein pathway and showed that both demonstrated distinct co-evolution patterns in specific clades. Our research offers a different way to look at co-evolution across eukaryotes and points to the importance of modular co-evolution analysis. We developed the ‘CladeOScope’ PP method, which integrates information from 16 clades across over 1000 eukaryotic genomes and is accessible via an easy-to-use web server at http://cladeoscope.cs.huji.ac.il.\n","date":"2021-01-01","permalink":"/publications/tsaban-2021/","section":"Publications","summary":"Tomer Tsaban*, Doron Stupp*, Dana Sherill-Rofe, Idit Bloch, Ora Schueler-Furman, Reuven Wiener, Yuval Tabach.","title":"CladeOScope: elucidating functional interactions via a clade co-evolution prism"},{"content":" Doron Stupp, Elad Sharon, Idit Bloch, Marinka Zitnik, Or Zuk, Yuval Tabach. Nature Communications, January 2021 DOI: 10.1038/s41467-021-26792-w Abstract\nOver the next decade, more than a million eukaryotic species are expected to be fully sequenced. This has the potential to improve our understanding of genotype and phenotype crosstalk, gene function and interactions, and to answer evolutionary questions. 
Here, we develop a machine-learning approach for utilizing phylogenetic profiles across 1154 eukaryotic species. This method integrates co-evolution across eukaryotic clades to predict functional interactions between human genes and the context for these interactions. We benchmark our approach, showing a 14% performance increase (auROC) compared to previous methods. Using this approach, we predict functional annotations for less-studied genes. We focus on DNA repair and verify that 9 of the top 50 predicted genes have been identified elsewhere, with others previously prioritized by high-throughput screens. Overall, our approach enables better annotation of function and functional interactions and facilitates the understanding of evolutionary processes underlying co-evolution. The manuscript is accompanied by a web server available at https://mlpp.cs.huji.ac.il.\n","date":"2021-01-01","permalink":"/publications/stupp-2021-natcomm/","section":"Publications","summary":"Doron Stupp, Elad Sharon, Idit Bloch, Marinka Zitnik, Or Zuk, Yuval Tabach.","title":"Co-evolution based machine-learning for predicting the human functional interactome and unraveling evolutionary insights"},{"content":" Hodaya Beer, Dana Sherill-Rofe, Irene Unterman, Idit Bloch, Mendel Isseroff, Doron Stupp, Elad Sharon, Elad Zisman, Yuval Tabach. bioRxiv (Pre-print), January 2020 DOI: 10.1101/2020.01.12.903138 Abstract\nCross-species protein conservation patterns, as directed by natural selection, are indicative of the interplay between protein function, protein-protein interaction and evolution. Since the beginning of the genomic era, proteins were characterized as either conserved or not conserved. This simple classification became archaic and cursory once data on protein orthologs became available for thousands of species. 
To enrich the language used to describe protein conservation patterns, and to understand their biological significance, we classified 20,294 human proteins against 1096 species. Analyses of the conservation patterns of human proteins in different eukaryotic clades yielded extremely variable and rich patterns that had never been characterized or studied before. Using mathematical classifications, we defined seven conservation motifs: Steps, Critical, Lately Developed, Plateau, Clade Loss, Trait Loss and Gain, which describe the evolution of human proteins. Overall, our work offers novel terms for conservation patterns and defines a new language intended to comprehensively describe protein evolution. This novel terminology enables the classification of proteins based on evolution, reveals aspects of protein evolution, and improves the understanding of protein functions.\n","date":"2020-01-12","permalink":"/publications/beer-2020-biorxiv/","section":"Publications","summary":"Hodaya Beer, Dana Sherill-Rofe, Irene Unterman, Idit Bloch, Mendel Isseroff, Doron Stupp, Elad Sharon, Elad Zisman, Yuval Tabach.","title":"Conservation Motifs – a novel evolutionary-based classification of proteins"},{"content":" Idit Bloch, Dana Sherill-Rofe, Doron Stupp, Irene Unterman, Hodaya Beer, Dolev Rahat, Elad Sharon, Yuval Tabach. Bioinformatics, January 2020 DOI: 10.1093/bioinformatics/btaa281 Abstract\nThe exponential growth in available genomic data is expected to reach full sequencing of a million genomes in the coming decade. Improving and developing methods to analyze these genomes and to reveal their utility is of major interest in a wide variety of fields, such as comparative and functional genomics, evolution and bioinformatics. Phylogenetic profiling is an established method for predicting functional interactions between proteins based on similarities in their evolutionary patterns across species. Proteins that function together (i.e. 
generate complexes, interact in the same pathways or improve adaptation to environmental niches) tend to show coordinated evolution across the tree of life. The normalized phylogenetic profiling (NPP) method takes into account minute changes in proteins across species to identify protein co-evolution. Despite the success of this method, it is still not clear what set of parameters is required for optimal use of co-evolution in predicting functional interactions. Moreover, it is not clear if pathway evolution or function should direct parameter choice. Here, we create a reliable and usable NPP construction pipeline. We explore the effect of parameter selection on functional interaction prediction using NPP from 1028 genomes, both separately and in various value combinations. We identify several parameter sets that optimize performance for pathways with certain biological annotation. This work reveals the importance of choosing the right parameters for optimized function prediction based on a biological context.\n","date":"2020-01-01","permalink":"/publications/bloch-2020/","section":"Publications","summary":"Idit Bloch, Dana Sherill-Rofe, Doron Stupp, Irene Unterman, Hodaya Beer, Dolev Rahat, Elad Sharon, Yuval Tabach.","title":"Optimization of Co-evolution Analysis Through Phylogenetic Profiling Reveals Pathway-Specific Signals"},{"content":" Ming-Ru Wu*, Lior Nissim*, Doron Stupp*, Erez Pery, Adina Binder-Nissim, Karen Weisinger, Sebastian Ricardo Palacios, Casper Enghuus, Melissa Humphrey, Zhizhuo Zhang, Eva Maria Novoa Pardo, Manolis Kellis, Ron Weiss, Samuel David Rabkin, Yuval Tabach, Timothy K. Lu. Nature Communications, January 2019 DOI: 10.1038/s41467-019-10912-8 Abstract\nCell state-specific promoters constitute essential tools for basic research and biotechnology because they activate gene expression only under certain biological conditions. 
Synthetic Promoters with Enhanced Cell-State Specificity (SPECS) can be superior to native ones, but the design of such promoters is challenging and frequently requires gene regulation or transcriptome knowledge that is not readily available. Here, to overcome this challenge, we use a next-generation sequencing approach combined with machine learning to screen a synthetic promoter library with 6107 designs for high-performance SPECS for potentially any cell state. We demonstrate the identification of multiple SPECS that exhibit distinct spatiotemporal activity during the programmed differentiation of induced pluripotent stem cells (iPSCs), as well as SPECS for breast cancer and glioblastoma stem-like cells. We anticipate that this approach could be used to create SPECS for gene therapies that are activated in specific cell states, as well as to study natural transcriptional regulatory networks.\n","date":"2019-01-01","permalink":"/publications/wu-2019/","section":"Publications","summary":"Ming-Ru Wu*, Lior Nissim*, Doron Stupp*, Erez Pery, Adina Binder-Nissim, Karen Weisinger, Sebastian Ricardo Palacios, Casper Enghuus, Melissa Humphrey, Zhizhuo Zhang, Eva Maria Novoa Pardo, Manolis Kellis, Ron Weiss, Samuel David Rabkin, Yuval Tabach, Timothy K.","title":"A High-throughput Screening and Computation Platform for Identifying Synthetic Promoters with Enhanced Cell-State Specificity (SPECS)"},{"content":" Hodaya Beer, Dana Sherill-Rofe, Doron Stupp, Yuval Tabach. 
January 2019 Presented as poster at ISMB/ECCB 2019, Switzerland.\n","date":"2019-01-01","permalink":"/publications/poster-beer-2019-ismb/","section":"Publications","summary":"Hodaya Beer, Dana Sherill-Rofe, Doron Stupp, Yuval Tabach.","title":"Evolutionary Motifs – A Novel Way To Define Evolution Across A Thousand Of Species"},{"content":" Lena Qawasmi*, Maya Braun*, Irene Guberman, Emiliano Cohen, Lamis Naddaf, Anna Mellul, Olli Matilainen, Danielle Share, Doron Stupp, Haya Chahine, Ehud Cohen, Susana Garcia, Yuval Tabach. Journal of Molecular Biology, January 2019 DOI: 10.1016/j.jmb.2019.03.003 Abstract\nMyotonic dystrophy type 1 is an autosomal-dominant inherited disorder caused by the expansion of CTG repeats in the 3′ untranslated region of the DMPK gene. The RNAs bearing these expanded repeats have a range of toxic effects. Here we provide evidence from a Caenorhabditis elegans myotonic dystrophy type 1 model that the RNA interference (RNAi) machinery plays a key role in causing RNA toxicity and disease phenotypes. We show that the expanded repeats systematically affect a range of endogenous genes bearing short non-pathogenic repeats and that this mechanism is dependent on the small RNA pathway. Conversely, by perturbing the RNA interference machinery, we reversed the RNA toxicity effect and reduced the disease pathogenesis. Our results unveil a role for RNA repeats as templates (based on sequence homology) for moderate but constant gene silencing. Such a silencing effect affects the cell steady state over time, with diverse impacts depending on tissue, developmental stage, and the type of repeat. Importantly, such a mechanism may be common among repeats and similar in human cells with different expanded repeat diseases.\n","date":"2019-01-01","permalink":"/publications/qawasmi-2019/","section":"Publications","summary":"Lena Qawasmi*, Maya Braun*, Irene Guberman, Emiliano Cohen, Lamis Naddaf, Anna Mellul, Olli Matilainen, Danielle Share, Doron Stupp, Haya Chahine, Ehud Cohen, Susana Garcia, Yuval Tabach.","title":"Expanded CUG repeats trigger disease phenotype and expression changes through the RNAi machinery in C. elegans"},{"content":" Doron Stupp, Yuval Tabach. January 2019 Presented as poster at ISMB/ECCB 2019, Switzerland; presented as poster and oral presentation at GGE 8 2019, Israel.\n","date":"2019-01-01","permalink":"/publications/poster-stupp-2019-ismb-gge/","section":"Publications","summary":"Doron Stupp, Yuval Tabach.","title":"Machine Learning Based Phylogenetic Profiling (MLPP) - Using Local Co-Evolution for Functional Interaction Prediction and Uncovering Evolutionary Insights"},{"content":" Sapir Labes, Doron Stupp, Dolev Rahat, Idit Bloch, Michal Lotem, Yuval Tabach. January 2019 Presented as poster at 7th Broad-ISF Symposium (2019), Israel.\n","date":"2019-01-01","permalink":"/publications/poster-labes-2019-broadisf/","section":"Publications","summary":"Sapir Labes, Doron Stupp, Dolev Rahat, Idit Bloch, Michal Lotem, Yuval Tabach.","title":"Predicting the Pathogenicity of Variants by Cross-Species Evolutionary Patterns Using Machine Learning"},{"content":" Elad Zisman, Doron Stupp, Yuval Tabach, David Arkadir. January 2019 Presented as poster at GGE 8 2019, Israel.\n","date":"2019-01-01","permalink":"/publications/poster-zisman-2019-gge/","section":"Publications","summary":"Elad Zisman, Doron Stupp, Yuval Tabach, David Arkadir.","title":"Solving Missing Heredity Syndromes via Genetic Algorithm Based Phylogenetic Profiling"},{"content":" Doron Stupp, Idit Bloch, Yuval Tabach. 
January 2018 Presented as poster at 20th Israeli Bioinformatics Symposium 2018, Israel.\n","date":"2018-01-01","permalink":"/publications/poster-stupp-2018-israelibio/","section":"Publications","summary":"Doron Stupp, Idit Bloch, Yuval Tabach.","title":"Phylogenetic profiling for predicting PPI and interaction context using local co-evolution"},{"content":" Lior Nissim*, Ming-Ru Wu*, Erez Pery, Adina Binder-Nissim, Hiroshi I Suzuki, Doron Stupp, Claudia Wehrspaun, Yuval Tabach, Phillip A Sharp, Timothy K Lu. Cell, January 2017 DOI: 10.1016/j.cell.2017.09.049 Abstract\nDespite its success in several clinical trials, cancer immunotherapy remains limited by the rarity of targetable tumor-specific antigens, tumor-mediated immune suppression, and toxicity triggered by systemic delivery of potent immunomodulators. Here, we present a proof-of-concept immunomodulatory gene circuit platform that enables tumor-specific expression of immunostimulators, which could potentially overcome these limitations. Our design comprised de novo synthetic cancer-specific promoters and, to enhance specificity, an RNA-based AND gate that generates combinatorial immunomodulatory outputs only when both promoters are mutually active. These outputs included an immunogenic cell-surface protein, a cytokine, a chemokine, and a checkpoint inhibitor antibody. The circuits triggered selective T cell-mediated killing of cancer cells, but not of normal cells, in vitro. In in vivo efficacy assays, lentiviral circuit delivery mediated significant tumor reduction and prolonged mouse survival. Our design could be adapted to drive additional immunomodulators, sense other cancers, and potentially treat other diseases that require precise immunological programming.\n","date":"2017-01-01","permalink":"/publications/nissim-2017/","section":"Publications","summary":"Lior Nissim*, Ming-Ru Wu*, Erez Pery, Adina Binder-Nissim, Hiroshi I Suzuki, Doron Stupp, Claudia Wehrspaun, Yuval Tabach, Phillip A Sharp, Timothy K Lu.","title":"Synthetic RNA-Based Immunomodulatory Gene Circuits for Cancer Immunotherapy"},{"content":" Claes D. Enk, Abed Nasereddin, Mary Dan-Goor, Doron Stupp, Hans Chr. Wulf, Charles L. Jaffe. January 2013 Presented as poster at Israel Society for Parasitology, Protozoology and Tropical Diseases conference 2013, Israel.\n","date":"2013-01-01","permalink":"/publications/poster-enk-2013-ispptd/","section":"Publications","summary":"Claes D.","title":"Treatment of cutaneous leishmaniasis with daylight activated photodynamic therapy"},{"content":"","date":null,"permalink":"/categories/","section":"Categories","summary":"","title":"Categories"},{"content":"","date":null,"permalink":"/tags/","section":"Tags","summary":"","title":"Tags"}]