Day 5. Unified Analytical Group Projects

On the final day of this bootcamp, we will group up with the other members of our table and take what we have learned this past week to explore a large-scale biological data set collaboratively. We will use data from the Geuvadis Project, which combines data on genetic variation from the 1000 Genomes Project with gene expression measurements derived from RNA sequencing (Lappalainen et al., Nature 2013), to detect and visualize potential expression quantitative trait loci (eQTL).

Typically, such research projects can take a very long time to generate the data and analyze the results. For the purposes of this bootcamp, we will be using a small subset of these data and will attempt to recreate the published results over these regions. Our goal is to give you a taste of what types of data exploration are now available to you with the simple yet powerful biocomputing tools you have learned and to serve as a foundation for your future research endeavors.
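To make the idea of an eQTL concrete before the sessions begin, here is a minimal sketch of a single association test: a least-squares fit of one gene's expression against the number of alternate alleles each individual carries at one variant. The genotype and expression values are made up for illustration and are not Geuvadis data, and `linear_fit` is a hypothetical helper written for this page, not part of the project notebooks.

```python
# Toy eQTL test: does expression of one gene change with genotype at one SNP?
# All values below are invented for illustration; they are not Geuvadis data.
from math import sqrt


def linear_fit(x, y):
    """Least-squares fit y = intercept + slope * x; also returns Pearson r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Genotype coded as the number of alternate alleles (0, 1 or 2) per individual.
genotypes = [0, 0, 1, 1, 1, 2, 2, 0, 1, 2]
# Hypothetical normalized expression values for the same ten individuals.
expression = [1.0, 1.2, 2.1, 1.9, 2.0, 3.1, 2.9, 0.8, 2.2, 3.0]

slope, intercept, r = linear_fit(genotypes, expression)
print("effect per allele: %.2f, r^2: %.2f" % (slope, r * r))
```

A slope far from zero together with a large r^2 suggests that the variant is associated with expression of the gene. A real analysis would repeat this test across many variant-gene pairs and would need to account for multiple testing and covariates, which this sketch ignores.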


Session Time Topics
I 9:00-10:15 AM Introduction to eQTLs and Overview of Project
  10:15-10:30 AM Coffee Break
II 10:30 AM-12:00 PM Obtaining, Parsing and Formatting Data
  12:00-1:00 PM Lunch
III 1:00-2:15 PM Parallel Association Testing and Visualization
  2:15-2:30 PM Coffee Break
IV 2:30-4:00 PM Group Presentations and Discussion


Ryan Mills (RM)
Jacob Kitzman (JK)
Barry Grant (BG)
Hui Jiang (HJ)

Class Questionnaire

Please help us improve this course by completing this questionnaire.

Data Sets:

  • Genotype data for 465 individuals
    • Remote site
    • Local FLUX directory: /scratch/biobootcamp_fluxod/remills/bioboot/geuvadis/genotypes
  • Expression data for 465 individuals
    • Remote site
    • Local FLUX directory: /scratch/biobootcamp_fluxod/remills/bioboot/geuvadis/analysis_results

Project Resources

Analysis notebooks

  • For this exercise, we will need to run IPython notebook on flux. As on Day 4, start a notebook server with the following commands:

    • Start a notebook server
    • Notes:
      1. You need to know which host you are on. The command-line prompt will show this (e.g., flux-login3).
      2. You need to know which port your instance of ipython listens on. The first few lines of output from ipython notebook will list this for you.
    remills@flux-login3:/scratch/biobootcamp_fluxod/remills/biobootcamp$ ipython notebook --ip=flux-login3 --no-browser
    [I 13:26:18.660 NotebookApp] Using MathJax from CDN:
    [I 13:26:19.220 NotebookApp] The port 8888 is already in use, trying another random port.
    [I 13:26:19.473 NotebookApp] Serving notebooks from local directory: /scratch/biobootcamp_fluxod/kitzmanj
    [I 13:26:19.473 NotebookApp] 0 active kernels
    [I 13:26:19.473 NotebookApp] The IPython Notebook is running at: http://flux-login3:8889/
    [I 13:26:19.473 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
    • A word of warning: in general, running on the head node of the flux cluster is not good practice, because the head node controls the job queue for the entire cluster. Instead, you would want to enter the queue, log in to a compute node, and do your work there.

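    • Now, your own ipython server is running on flux, but you will need to open a tunnel to reach it. Run the command below on your own computer: it securely forwards port 9000 on your computer to port 8889 on flux-login3. Use the host and port from your own server's output, and replace USER@LOGIN_HOST, which is only a placeholder, with the login you normally use to reach flux.

     ssh -L localhost:9000:flux-login3:8889 USER@LOGIN_HOST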
    • Navigate to http://localhost:9000 and make a new notebook
  • IPython notebook reminders:

    • A green box marks the active cell and means you are in EDIT mode.
    • Hit escape to exit edit mode and go to COMMAND mode.
    • In command mode:
      • A : insert new cell above
      • B : insert new cell below
      • Enter : edit the currently selected cell
      • up/down arrow : navigate up or down
    • In either:
      • Shift-Enter : run the current cell
    • Remember, you can run commands out of order in the notebook.
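Returning to the tunnel step above: the host and port that go into the ssh command come from the "running at" line of the notebook server output. The sketch below shows one way to pull them out and assemble the forwarding argument. The function names `parse_notebook_url` and `forwarding_spec` are hypothetical helpers written for this page, and the log text is copied from the example output shown earlier; your own host and port will likely differ.

```python
import re


def parse_notebook_url(log_text):
    """Return (host, port) from the 'running at' line of the notebook log."""
    match = re.search(r"running at: https?://([^:/\s]+):(\d+)", log_text)
    if match is None:
        raise ValueError("no notebook URL found in log output")
    return match.group(1), int(match.group(2))


def forwarding_spec(host, port, local_port=9000):
    """Build the argument given to ssh -L for this remote host and port."""
    return "localhost:%d:%s:%d" % (local_port, host, port)


# Two lines taken from the example server output on this page.
log = """[I 13:26:19.220 NotebookApp] The port 8888 is already in use, trying another random port.
[I 13:26:19.473 NotebookApp] The IPython Notebook is running at: http://flux-login3:8889/"""

host, port = parse_notebook_url(log)
spec = forwarding_spec(host, port)
print("ssh -L " + spec)
```

The printed text is only the forwarding part of the command. You still need to append the login you use to reach flux before running it, and then browse to http://localhost:9000 (or whichever local port you chose).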