GEN220 – Computational Analysis of High Throughput Biological Data


GENETICS 220 is offered in Fall quarters. See the 2018 course site for more information.

Instructors: Jason Stajich


Prerequisites

  • Graduate standing in a life sciences program, or permission of the instructors.
  • Previous coursework in genetics/genomics, molecular biology, or cell biology.

Course Description

High throughput data generation methods have advanced rapidly, and a major challenge facing graduate students in the life sciences today is analyzing large amounts of biological data. This course gives graduate students with no computer science background an opportunity to learn the basic skills for handling high throughput biological data. It covers the Linux/Unix environment and the importance of the command line interface; the Python programming language; program design, implementation, and testing; and Biopython, BEDTools, and phylogenetic methods. Students build hands-on skills by analyzing real high throughput biological data in homework assignments and team projects.

Lecture Topics

  1. Introduction to high throughput data and Python; Linux and programming environment
  2. Scalar data and variables; DNA sequences and random mutations
  3. Control structures; programming strategies
  4. Subroutines and arrays; next-generation sequences
  5. Dictionaries; regular expressions
  6. Using high performance computing to automate tasks and run pipelines
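
As a rough illustration of the kind of exercise topics 2 and 5 point toward, the Python sketch below introduces random substitutions into a DNA sequence, counts bases with a dictionary, and locates a motif with a regular expression. The function names, the example sequence, the mutation rate, and the random seed are all hypothetical choices made for this sketch; they are not taken from the course materials.

```python
import random
import re

BASES = "ACGT"


def mutate(seq, rate, rng):
    """Return a copy of seq in which each base is substituted with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            # Choose among the other three bases so every mutation changes the base.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def base_counts(seq):
    """Count how often each base occurs, using a plain dictionary."""
    counts = {}
    for base in seq:
        counts[base] = counts.get(base, 0) + 1
    return counts


def find_motif(seq, pattern):
    """Return the 0-based start positions of non-overlapping regex matches."""
    return [m.start() for m in re.finditer(pattern, seq)]


# Hypothetical example data: a made-up sequence, rate, and seed.
rng = random.Random(220)
original = "ATGGCGTACGTTAGCATGA"
mutant = mutate(original, 0.1, rng)

print(original)
print(mutant)
print(base_counts(original))
print(find_motif(original, r"ATG"))
```

Passing a seeded `random.Random` instance into `mutate` keeps the mutations reproducible from run to run, which makes an exercise like this easier to test and debug.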
