Bioinformatics for Computer Scientists

Title Bioinformatics for Computer Scientists (56420)
Quarter Autumn 2015
Instructor Thomas Binkowski (abinkowski@cs.uchicago.edu)
Website http://uchicago.bio

Syllabus

Course Description
This course aims to introduce computer scientists to the field of bioinformatics.  The vast amounts of data produced in genomics-related research have significantly transformed biological research.  High-throughput automated biological experiments require advanced algorithms, implemented on high-performance computing systems, to interpret their results.

This course will focus on analyzing complex data sets in the context of biological problems.  Students will design and implement systems that are reliable, capable of handling huge amounts of data, and built on best practices in interface and usability design to solve common bioinformatics-related problems.  While this course should appeal to students interested in the biological sciences and biotechnology, the techniques and approaches taught will be applicable to other fields.

Course Content
This course will present a practical, hands-on approach to the field of bioinformatics.  The topics covered will include software, data mining, high-performance computing, mathematical models and other areas of computer science that play an important role in bioinformatics.  Existing methods for analyzing genomes, sequences and protein structures will be explored, as well as the computing infrastructure that supports their efficient use.  Students will be introduced to all of the biology necessary to understand the applications of the bioinformatics algorithms and software taught in this course.  No previous biology coursework is required to be successful in the course.

Coursework
Students will complete weekly assignments during the first six weeks of class.  The assignments will consist of programming and basic biological research using online resources.  Assignments will reinforce the topics covered in lecture and will contain self-directed opportunities that allow students to pursue their personal interests in the subject.  The programming portions of the assignments will build key pieces of software that will be used in the final project.  The final project will consist of building a robust online service for bioinformatics analysis using the Google Cloud Platform.

There will be a midterm and final exam covering basic competency in the biological themes presented throughout the course.

In addition, each student will be required to present a recent development in computer science, biology, bioinformatics or genomics to the class.  The presentations are designed to broaden students' exposure to the field, with topics selected from recent publications or news sources.

Week 1: Genomics, Bioinformatics and Molecular Biology

A high-level view of the increasingly important role of computing in the biological sciences will be presented.

Week 2: Genomes, Sequences and Databases

This lecture will survey the current state of the art in storing, organizing and analyzing large data sets.  The advantages and disadvantages of these methods will be explored in the context of academic and commercial research initiatives.

Week 3: Sequence Alignment

Fast, reliable alignment of text strings started the bioinformatics revolution.  This lecture will show how these seemingly simple strings form the basis of almost all bioinformatics research.
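
As an illustration of what aligning two text strings involves, the following is a minimal sketch of global alignment scoring by dynamic programming (the Needleman-Wunsch approach).  It is not taken from the course materials; the function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen only for the example.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the best global alignment score of strings a and b.

    Scoring values are illustrative assumptions, not course-specified.
    Only two rows of the dynamic programming table are kept in memory.
    """
    # Row 0: aligning an empty prefix of a against prefixes of b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a prefix of a against an empty prefix of b.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Align a[i-1] with b[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Align a[i-1] with a gap.
            up = prev[j] + gap
            # Align b[j-1] with a gap.
            left = curr[j - 1] + gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "ACT"))
```

The running time grows with the product of the two string lengths, which is one reason faster heuristic methods matter for genome-scale data.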

Week 4: Protein Structure and Function

Proteins are central building blocks of all organisms.  This lecture will take bioinformatics into the third dimension, showcasing how the spatial assembly and interactions of proteins support life and cause disease.

Week 5: Protein Motifs and Modeling

Understanding protein function holds the promise of developing therapeutics and curing diseases, but the computational complexity of analyzing three-dimensional models presents obstacles that have been difficult to overcome.  This lecture will discuss approaches to shape analysis and comparison that can be scaled to large data sets.

Week 6: High-Performance Computing for Bioinformatics

We will discuss how some of the most powerful computing resources in the world are unable to answer the simplest questions in bioinformatics.  Strategies for conducting large-scale analysis of genes and proteins will be presented.

Week 7: Student Presentations

Students will present a research topic in bioinformatics of their own choosing.

Week 8: Microarray Data Analysis

Personalized genomic analysis is being used by consumers to better understand their health and their ancestry.  The technologies used to power these services will be introduced as well as the different approaches used to provide web services to analyze the data.

Week 9: SNPs and Disease

The cause of diseases can be as simple as a single misplaced letter in a DNA sequence.  From gene to disease, we will trace the genetic origin of disease.  We will explore different approaches to cataloging and analyzing these changes.
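
To make the idea of a single changed letter concrete, here is a minimal sketch that lists the positions where a sample sequence differs from a reference of the same length.  The function name and the short sequences are hypothetical and are not real genomic data; real SNP analysis works on aligned reads and curated variant catalogs rather than a direct string comparison.

```python
def find_single_letter_changes(reference, sample):
    """Return (position, reference_base, sample_base) for each mismatch.

    Positions are 0-based.  Both sequences must already be aligned and
    have the same length; this is an assumption of the sketch.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")
    return [
        (i, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


if __name__ == "__main__":
    # Hypothetical sequences differing at one position.
    print(find_single_letter_changes("ACGTGAG", "ACGTGTG"))
```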

Week 10: In Silico Drug Discovery

Approaches to using computer models to develop new drugs will be presented.  We will discuss how years of playing Tetris might be more useful than you thought in combating antibiotic-resistant pathogens.

Week 11: Final Exam and Final Project Presentations

Students will present their final projects.

Textbook
This course will use a variety of resources that are freely available online.  In addition, several titles available digitally through the UChicago library will be referenced.

The following books are recommended, but not required:

  • Bioinformatics and Functional Genomics by Jonathan Pevsner. Second Edition, Wiley-Blackwell. ISBN 0470085851.

  • Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids by Richard Durbin, Sean Eddy, Anders Krogh, and Graeme Mitchison. Cambridge University Press, 1998.

  • Introduction to Protein Structure by Carl Branden and John Tooze.

Prerequisites (Courses)

MPCS 33001 Algorithms and Core Programming requirements.  

Prerequisites (Other)

Lectures and demonstrations will be conducted mostly in Python.  Python programming experience will be useful, but is not required as long as students are willing to dedicate sufficient time to acquiring basic development and debugging skills in the language.  The course is focused on developing solutions to biological problems, not on mastery of any particular language.  Final projects will be implemented on Google Cloud Platform, which supports Python, Java, PHP and Go.

Satisfies

Specialization - Data Analytics
Specialization - High Performance Computing

Time

Mondays, 5:30-8:30

Location

Young 306