Fall 2021 – 2022 Schedule

September 10th

Title: Histograms and How to Make Them Better

Systems Group -- University of Chicago Computer Science

Abstract: Summarizing a large dataset with a reduced-size “synopsis” has applications from visualization to approximate computing. Data dimensionality is an acute obstacle: techniques that work well in lower dimensions, such as histograms, fail to scale to higher-dimensional data. My talk surveys a few years of research in this area by my research group and discusses the theory and practice of high-dimensional data summarization. The survey will start by examining how histograms fail at high-dimensional estimation, then cover simple but powerful extensions that have a much wider operating regime (and why these extensions work!). Then, I will discuss the relationship between data summarization and generative modeling in machine learning. I will conclude by describing the practical computer systems that we are building with these algorithmic building blocks.

Bio: Sanjay Krishnan is an Assistant Professor of Computer Science at the University of Chicago. His research studies the intersection of machine learning and database systems. Sanjay completed his PhD and Master’s Degree at UC Berkeley in Computer Science in 2018. Sanjay’s work has received a number of awards including the 2016 SIGMOD Best Demonstration award, 2015 IEEE GHTC Best Paper award, and Sage Scholar award.

Dr. Krishnan Lab

September 17th

Title: Basepairs to petabytes: Computing the Genomics Revolution

Department of Computer Science | Michael Schatz - Department of Computer Science

Abstract: The last 20 years have been a remarkable era for biology and medicine. One of the most significant achievements has been the sequencing of the first human genomes, which has laid the foundation for profound insights into human genetics, the intricacies of regulation and development, and the forces of evolution. Incredibly, as we look into the future over the next 20 years, we see the very real potential for sequencing more than 1 billion genomes, bringing even deeper insight into human genetics as well as the genetics of millions of other species on the planet. Realizing this great potential for medicine and biology, though, will only be achieved through the integration and development of highly scalable computational and quantitative approaches that can keep pace with the rapid improvements to biotechnology. During this presentation, I aim to chart out these future technologies, anticipate the major themes of research, and call out the challenges ahead.

Bio: Michael Schatz, Bloomberg Distinguished Professor of Computer Science and Biology at Johns Hopkins University, is among the world’s foremost experts in solving computational problems in genomics research. His innovative biotechnologies and computational tools to study the sequence and function of genomes are advancing the understanding of the structure, evolution, and function of genomes for medicine – particularly autism spectrum disorders, cancer, and other human diseases – and agriculture.


September 24th

Title: Automatic Phenotyping in the Age of Deep Learning


Abstract: It is often estimated that 80% of clinical data today is stored in an unstructured form, mostly as electronic health records (EHR). Within this corpus of text lies a vast amount of valuable information that can be leveraged for phenotyping, pharmacogenomic studies, and clinical decision support, ultimately improving patient care and reducing healthcare costs. Until fairly recently, automatic phenotyping (patient cohort identification) had been conducted using feature-based approaches in combination with linear classifiers. Deep learning revolutionized clinical informatics, but obtaining large datasets to take advantage of highly expressive neural network models is difficult and expensive. In this talk, I will argue that amenability to pretraining is a key benefit of deep learning for healthcare. I will then outline my contributions related to pretraining phenotyping classifiers using various sources of freely available supervision. If time permits, I will briefly review several other projects involving substance misuse classification and information extraction from medical records.

Bio: The overarching goal of Dr. Dligach’s research is developing methods for automatic semantic analysis of texts. His work spans such areas of computer science as natural language processing, machine learning, and data mining. Most recently his research has focused on semantic analysis of clinical texts. He works both on method development and applications. Prior to joining Loyola, Dr. Dligach was a researcher at Boston Children’s Hospital and Harvard Medical School. Dr. Dligach received his PhD in computer science from the University of Colorado Boulder, his MS in computer science from the State University of New York at Buffalo, and his BS in computer science from Loyola University Chicago.

Dr. Dligach Lab

October 1st

Title: The Doctor (or Chatbot?) Is In: Towards Automated Support for Healthcare Tasks using Natural Language Processing