GEN220 – Computational Analysis of High Throughput Biological Data

GENETICS 220, offered Fall 2012 – see course page

Instructors: Renyi Liu and Jason Stajich

Time and location: Wednesday 2:10-4:00 PM, Friday 2:10-3:00 PM in ULB 104 (HHMI Bioinformatics Lab)

Prerequisites



  • Graduate students in a life sciences program or permission of the instructors.
  • Previous coursework in genetics/genomics, molecular biology, or cell biology.

Course Description

High throughput data generation methods have made the analysis of large amounts of biological data a major challenge for graduate students in the life sciences. The objective of this course is to give graduate students with no computer science background the basic skills needed to handle high throughput biological data. It covers the Linux/Unix environment and the importance of the command line interface; the Perl programming language; program design, implementation, and testing; relational databases and how to access them from Perl; basic data structures and algorithms; and BioPerl. Students build hands-on skills by analyzing real high throughput biological data in homework assignments and team projects.

Lecture Topics

  1. Introduction to high throughput data and Perl; Linux and programming environment
  2. Scalar data and variables; DNA sequences and random mutations
  3. Control structures; programming strategies
  4. Subroutines, arrays; next-generation sequences
  5. Hashes; regular expressions; team project updates
  6. Object-oriented programming; Perl modules; gene expression analysis through sequencing
  7. BioPerl
  8. Relational databases (SQL): schemas; queries; queries via Perl; how to use public biological databases
  9. Using Perl and BioPerl to analyze real biological data (handout)