Introduction to Biocomputing 2015 (BIOS/BIOI/HG 606)

For details of the 2016 version of this class please see our new web site.

Overview: This hands-on boot camp introduces new graduate students to computational tools, techniques and best practices that foster reproducible research in bioinformatics, genome informatics and biostatistics.

Description: Concepts and tools covered include the Unix system, version control, data management, software compilation, task automation and cluster computing. Participants will be encouraged to help one another and to apply what they have learned to their own research problems. Our tools of choice will be Python (for programming), Git (for version control) and PBS (for cluster resource management). However, lessons learned should be widely applicable for those looking to incorporate more productive computational approaches into their daily research work.

Audience: Students with little to no UNIX experience and no formal programming training.

Requirements: Participants must bring a laptop with specific software packages installed.

When: August 24-28, 9:00 AM - 4:00 PM

Where: 3755 SPH1 (School of Public Health building 1)

Class Questionnaire: Please help us improve this course by completing this questionnaire.


Day 1. Introduction to UNIX

Session Time Topics
I 9:00-10:15 AM Setup and Motivation
  10:15-10:30 AM Coffee Break
II 10:30 AM-12:00 PM Beginning Unix
  12:00-1:00 PM Lunch
III 1:00-2:15 PM Working with Unix
  2:15-2:30 PM Coffee Break
IV 2:30-4:00 PM How to Get Working


Day 2. Introduction to Programming

Session Time Topics
I 9:00-10:15 AM Intro to Python and Programming Concepts
  10:15-10:30 AM Coffee Break
II 10:30 AM-12:00 PM Variables and Data Structures
  12:00-1:00 PM Lunch
III 1:00-2:15 PM Control Structures and Functions
  2:15-2:30 PM Coffee Break
IV 2:30-4:00 PM System Calls, Plotting, and IPython Notebooks


Day 3. Data Formats and Visualization

Session Time Topics
I 9:00-10:15 AM Mini-Practice: FASTQ File Manipulation
  10:15-10:30 AM Coffee Break
II 10:30-11:15 AM Lecture: Data Formats and Conversions
III 11:15 AM-12:00 PM Mini-Practice: Select a Subset of Variant/Genotype Calls
  12:00-1:00 PM Lunch
IV 1:00-2:15 PM Practice: Analysis with Genomic Data Formats
  2:15-2:30 PM Coffee Break
V 2:30-4:00 PM Visualization: Overview and Practice


Day 4. Version Control and Cluster Computing

Session Time Topics
I 9:00-10:15 AM Version Control with Git
  10:15-10:30 AM Coffee Break
II 10:30 AM-12:00 PM Collaborating with GitHub & Bitbucket
  12:00-1:00 PM Lunch
III 1:00-2:15 PM Concepts in Cluster Computing
  2:15-2:30 PM Coffee Break
IV 2:30-4:00 PM Parallelization Strategies and Workflow Management


Day 5. Unified Analytical Group Projects

Session Time Topics
I 9:00-10:15 AM Introduction to eQTLs and Overview of Project
  10:15-10:30 AM Coffee Break
II 10:30 AM-12:00 PM Obtaining, Parsing and Formatting Data
  12:00-1:00 PM Lunch
III 1:00-2:15 PM Parallel Association Testing and Visualization
  2:15-2:30 PM Coffee Break
IV 2:30-4:00 PM Group Presentations and Discussion



Other courses

Mini course: Introduction to Python
(Sept. 14 – 21, 8:30-10 AM) http://tinyurl.com/bioboot-1

Workshops: ARC On-campus HPC training
(Sept. 14 – Oct. 8, 1-5 PM) http://tinyurl.com/bioboot-3

Full course: BIOINF-575 Programming Lab in Bioinformatics
(Winter term) http://tinyurl.com/bioboot-4

Symposium: Computational Discovery in Complex System Biology
(Sept. 22, 9 AM-5 PM) http://tinyurl.com/bioboot-2