- Week 1: Jan 07th: Pratool Bharti, Assistant Professor, Department of Computer Science, Northern Illinois University, DeKalb, IL, USA
- Week 2: Jan 14th: David Koop, Assistant Professor, Department of Computer Science, Northern Illinois University, DeKalb, IL, USA
- Week 3: Jan 21st: Robert Tell, Vice President, Bioinformatics, Tempus Labs, Inc., Chicago, IL, USA
- Week 4: Jan 28th: Mark Potosnak, Professor and Department Chair, Department of Environmental Science and Studies, DePaul University, Chicago, IL, USA
- Week 5: Feb 04th: Benjamin Langmead, Associate Professor, Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Week 6: Feb 11th: Dan Knights, Associate Professor, Department of Computer Science and Engineering, University of Minnesota, Twin Cities, MN, USA
- Week 7: Feb 18th: Todd Smith, CTO, Digital World Biology
- Week 8: Feb 25th: Taylor Reiter, Post Doctoral Researcher, University of California, Davis
- Week 9: Mar 04th: Debzani Deb, Associate Professor, Department of Computer Science, Winston-Salem State University, Winston-Salem, NC, USA
- Week 10: Mar 11th: Andy Wing Chun Pang, Staff Scientist, Bionano Genomics, Inc.
- Week 11: Mar 18th: FINALS WEEK – NO SEMINAR – GOOD LUCK WITH FINALS!
TITLE: Context-aware Machine Learning Models for Personalized and Public Health
ABSTRACT: Innovations in designing Machine Learning models (and creating novel applications) have been growing at a rapid pace in the last decade. More recently, there is an earnest interest in context-aware learning, where the goal is to carefully extract limited contexts within the domain of interest during model development and execution. Such contexts can be broad and also domain specific – for example, the physiology of the human body in classifying physical activities; the evolution of anatomies in classifying biological species; dynamically changing outdoor environments during modeling and optimization of a flood detection system; and much more. Integrating contexts during learning not only improves accuracy, but also reduces implementation and execution costs. However, this process is challenging and often requires multi-disciplinary expertise.
In this talk, I will highlight my recent technical contributions in this space. First, I will present my research on designing physiology-aware learning models to accurately classify complex human activities using wearable devices, which is significant for personalized elder care. The innovation here is the careful integration of multi-modal inertial sensory data from multiple wearable devices placed at multiple positions on the human body, and finally the integration of human physiology into decision making. Second, I will present my results on neural network models to classify genus and species types of mosquitoes from smartphone-generated images taken by experts or by ordinary citizens. The innovation here lies in extracting contextually relevant anatomies (e.g., head, thorax, wings, legs) from mosquito images, and assigning appropriate weights to only the most critical anatomical component(s) for accurate classification. This work is expected to have significant impact in automating mosquito surveillance and related public health efforts in the US and across the globe. Towards the end, I will briefly explain my work on a vision-guided flood warning system that detects the water level in a stream under dynamic environmental conditions.
TITLE: Reproducible Computational Notebooks
ABSTRACT: Computational notebooks provide a setting where users can rapidly examine and evaluate intermediate outputs as solutions are explored through blocks of code named cells. However, these explorations can be difficult to reuse or reproduce in the future, especially by someone other than the original author. We have investigated techniques both to discover provenance from saved notebooks themselves and to enforce greater structure in new notebook explorations. While notebooks encode clues about execution patterns and often share similar structures, we have found there is usually not enough information to infer the provenance of past notebooks. To address this issue, we have built dataflow notebooks as a way to clearly structure dependencies between cells. This makes computations more traceable and reproducible, but brings some interesting usability challenges. We propose new methods to display and refer to cell outputs in order to minimize rewriting while allowing evolution of the code.
TITLE: Liquid biopsy testing at Tempus and modalities of therapeutic intervention
ABSTRACT: This talk will review the variety of technical challenges associated with liquid biopsy testing in the field, and then review a recent publication from Tempus Labs describing analytical solutions to these problems. Content will largely be from the publication "Validation of a liquid biopsy assay with molecular and clinical profiling of circulating tumor DNA."
TITLE: Enhancing undergraduate data-science skills and participation through community partnerships
ABSTRACT: The Metropolitan Chicago Data-science Corps (MCDC) is funded through the National Science Foundation to promote practical data-science abilities and increase participation from underrepresented groups. From a student perspective, there are three phases: a year-long sequence of introductory data-science pathway courses, a data-science practicum class and a summer internship with salary support. The latter two phases incorporate real-world data-science projects that are solicited from community partners. There are five Chicago-area universities involved: Northwestern (the lead), DePaul, Northeastern Illinois, Chicago State and University of Illinois’s iSchool. Each university has its own pathway sequences and individual data-science practicum courses. Students from different participating universities will come together in teams led by faculty mentors to work with non-profit community partners. These partnerships will focus on social justice, environmental issues and community health. The vision is that students will be drawn towards understanding data science through real-world learning and working on problems that are inclusive and support the common good.
TITLE: Pan-genomic advances for fighting reference bias
ABSTRACT: Sequencing data analysis often begins with aligning sequencing reads to a reference genome, where the reference takes the form of a linear string of bases. But linearity leads to reference bias, a tendency to miss or misreport alignments containing non-reference alleles, which can confound downstream statistical and biological results. This is a major concern in human genomics; we don’t want to live in a world where diagnostics and therapeutics are differentially effective depending on how closely our genome matches the reference.
Fortunately, computer science and bioinformatics are meeting this moment. In particular, recent advances allow us to index and align sequencing reads to references that include many population variants. Here I will describe this journey from the early days of efficient genome indexing — especially the FM index approach behind Bowtie and BWA — continuing through more modern methods for graph-shaped references and references that include many genomes. I will emphasize recent results that show how to optimize simple and complex pan-genome representations for effective avoidance of reference bias. Finally, I will outline major problems that remain as our field strives to make the transition to more inclusive pan-genomic representations.
Much of this work is collaborative with Travis Gagie, Christina Boucher, Alan Kuhnle and others.
TITLE: Scalable methods for microbiome and dietary intake analysis
ABSTRACT: The human microbiome has been associated with dozens of major human diseases, and is responsible for mediating the effects of certain foods and therapeutic drugs. This has resulted in intense interest in mapping human microbiome traits to health outcomes, and in understanding how exposures like diet and lifestyle influence the microbiome. Unfortunately, many published microbiome studies are under-powered or use low-resolution methods, while actionable microbiome discovery requires careful study design, large sample sizes, high-resolution data, and integration of diet and other exposure data. This talk describes new methods that incorporate dietary intake data and enable scalable high-resolution microbiome studies, including applications of those studies to derive insights into how diet and immigration affect the human gut microbiome.
TITLE: Biotech Careers in Software and Computational Biology
ABSTRACT: After you finish school, what’s next? From algorithms that enable analyses, to data science and deep learning, to visualization and presentation, career opportunities are abundant. Career journeys are, however, nonlinear. What you do in your first position may not be what you do five, ten, or twenty years down the line. In this presentation, I’ll share my journey: the story of a bench scientist who liked computers and experienced the power of automating data processing in Excel. That experience led to discovering the power of the command line and bioinformatics on VAX computers. The next stop was UNIX, relational databases, data mining, and visualization, which led to co-founding a bioinformatics company (Geospiza). More recently, I’ve built on previous skills to explore sequence, structure, and function relationships, to work with companies that blend the bench with informatics to create powerful products and services in genomics and synthetic biology, and to produce websites. I will use short stories to illustrate my experience with computation and software in biotechnology and lessons in career development.
TITLE: Metagenome assembly graphs and when to use them: a case study in identifying strain-level differences in gut microbiomes
ABSTRACT: Short read metagenomic sequencing has expanded our knowledge of microbial communities and their diversity. In particular, metagenome assembly and genome binning or annotation have produced catalogs of metagenome-assembled genomes and genes, revealing new species and functional potential previously unobserved in cultured organisms. These methods are often biased against complexity in sequencing data and can leave anywhere from 10% to 90% of a metagenome sample unanalyzed depending on the complexity (number of genomes, relatedness of organisms) of the underlying community. Assembly graphs reduce these biases by including all of the sequences in a metagenome sample within the graph. In this talk, I will introduce methods for graph-based analysis of metagenomic sequencing data. Using gut microbiome metagenomes from individuals with and without inflammatory bowel disease, I will demonstrate the recovery of strain-level differences directly from short read metagenome sequences.
TITLE: Use of Machine Learning in Touchstroke Authentication and in Exploring Spatial Justice
ABSTRACT: The Center for Applied Data Science (CADS) at WSSU is an institution-wide initiative to foster research and education in data-driven knowledge discovery. CADS aims to bring together computer scientists and domain scientists with complex Data Science research problems to promote and accelerate data-intensive discovery. This talk will focus on two such research endeavors. The first project explores the application of behavioral biometrics in smartphone authentication and presents a touchstroke authentication model based on an Auxiliary Classifier Generative Adversarial Network (AC-GAN). Given a small subset of a legitimate user’s touchstroke data during training, the AC-GAN model learns to generate a vast amount of synthetic touchstrokes that closely approximate the real touchstrokes, simulating imposter behavior, and then uses both generated and real touchstrokes to discriminate the legitimate user from imposters. The second project seeks to explore the power of spatial justice to help develop more equitable and just communities. More specifically, this research investigates an array of spatial variables from 2157 census tracts within the state of NC in an effort to understand their impact on economic mobility and potentially spatial justice.
TITLE: Structural variation in the human genome
ABSTRACT: The human genome is comprised of some six billion nucleotides of information packaged in 23 sets of inherited chromosomes. A striking observation from studying the human genome is the extent of similarity among individuals across populations. Therefore, we can gain insights into evolution, human diversity and disease susceptibility by studying the small fraction of the genome that is variable between people. I would like to share with you my journey in exploring genome variation in humans, first in an academic setting during my graduate study, and then in an industry setting working as a bioinformatics scientist. I progressed from studying a single human genome to working with thousands of samples – controls or disease cohorts. And I learned that while no single genomics platform can detect all variants (from single nucleotide changes to genome duplications), Bionano Genomics’ optical genome mapping has the potential to replace multiple current standard-of-care tests in detecting structural changes in disease diagnosis.