Day 5. Unified Analytical Group Projects
On the final day of this bootcamp, we will group up with the other members of our table and work together to apply what we have learned this past week to a large-scale biological data set. We will use data from the Geuvadis Project, which combines genetic variation data from the 1000 Genomes Project with gene expression measurements derived from RNA sequencing (Lappalainen et al., Nature 2013), to detect and visualize potential expression quantitative trait loci (eQTLs).
Typically, such research projects can take a very long time to generate the data and analyze the results. For the purposes of this bootcamp, we will be using a small subset of these data and will attempt to recreate the published results over these regions. Our goal is to give you a taste of what types of data exploration are now available to you with the simple yet powerful biocomputing tools you have learned and to serve as a foundation for your future research endeavors.
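To make the idea of an eQTL concrete, the sketch below regresses expression on genotype dosage (0, 1, or 2 copies of the alternate allele) for a single variant-gene pair. It is only an illustration under stated assumptions: the function name `eqtl_slope` is hypothetical, the numbers are made-up toy values rather than Geuvadis data, and simple least-squares regression is one common choice, not necessarily the method used in the published analysis.

```python
from statistics import mean


def eqtl_slope(genotypes, expression):
    """Least-squares slope and Pearson r of expression on genotype dosage."""
    g_mean = mean(genotypes)
    e_mean = mean(expression)
    sxy = sum((g - g_mean) * (e - e_mean) for g, e in zip(genotypes, expression))
    sxx = sum((g - g_mean) ** 2 for g in genotypes)
    syy = sum((e - e_mean) ** 2 for e in expression)
    if sxx == 0 or syy == 0:
        # A variant with no variation (or a flat gene) cannot be tested.
        raise ValueError("genotypes and expression must both vary")
    slope = sxy / sxx
    r = sxy / (sxx * syy) ** 0.5
    return slope, r


# Made-up toy values for illustration only, NOT Geuvadis data.
toy_genotypes = [0, 0, 1, 1, 2, 2]
toy_expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]

slope, r = eqtl_slope(toy_genotypes, toy_expression)
print(f"slope = {slope:.2f}, r = {r:.3f}")
```

A slope far from zero suggests that expression changes with each additional copy of the allele; a real analysis would also compute a p-value and correct for the many variant-gene pairs tested.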
|I||9:00-10:15 AM||Introduction to eQTLs and Overview of Project|
|10:15-10:30 AM||Coffee Break|
|II||10:30 AM-12:00 PM||Obtaining, Parsing and Formatting Data|
|III||1:00-2:15 PM||Parallel Association Testing and Visualization|
|2:15-2:30 PM||Coffee Break|
|IV||2:30-4:00 PM||Group Presentations and Discussion|
Ryan Mills (RM)
Jacob Kitzman (JK)
Barry Grant (BG)
Hui Jiang (HJ)
Please help us improve this course by completing this questionnaire.
- Genotype data for 465 individuals
- Remote site
- Local FLUX directory: /scratch/biobootcamp_fluxod/remills/bioboot/geuvadis/genotypes
- Expression data for 465 individuals
- Remote site
- Local FLUX directory: /scratch/biobootcamp_fluxod/remills/bioboot/geuvadis/analysis_results
For this exercise, we will need to run IPython notebook on Flux. As on Day 4, start a notebook server as follows:
- Start a notebook server
- you need to know which host you’re on. The command line prompt will show this (e.g., flux-login3)
- you need to know which port your instance of ipython listens to. The first few lines of output from ipython notebook will list this for you.
remills@flux-login3:/scratch/biobootcamp_fluxod/remills/biobootcamp$ ipython notebook --ip=flux-login3 --no-browser
[I 13:26:18.660 NotebookApp] Using MathJax from CDN: https://cdn.mathjax.org/mathjax/latest/MathJax.js
[I 13:26:19.220 NotebookApp] The port 8888 is already in use, trying another random port.
[I 13:26:19.473 NotebookApp] Serving notebooks from local directory: /scratch/biobootcamp_fluxod/kitzmanj
[I 13:26:19.473 NotebookApp] 0 active kernels
[I 13:26:19.473 NotebookApp] The IPython Notebook is running at: http://flux-login3:8889/
[I 13:26:19.473 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
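If you would rather not read the host and port off the screen, the small sketch below pulls them out of the startup output. The function name `find_notebook_address` is hypothetical, and the pattern assumes the "running at" line looks like the one in the output above; other IPython versions may word it differently.

```python
import re


def find_notebook_address(log_text):
    """Return (host, port) from the 'running at' line, or None if absent."""
    match = re.search(r"running at: https?://([^:/\s]+):(\d+)", log_text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


# Two lines copied from the startup output shown above.
startup_log = """\
[I 13:26:19.220 NotebookApp] The port 8888 is already in use, trying another random port.
[I 13:26:19.473 NotebookApp] The IPython Notebook is running at: http://flux-login3:8889/
"""

print(find_notebook_address(startup_log))  # ('flux-login3', 8889)
```

Note that the port in the "running at" line (8889 here) is the one to use, not the default 8888 that was already taken.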
A word of warning: in general, running on the head node of the Flux cluster is not good practice, because the head node controls the job queue for the entire cluster. Instead, you would normally enter the queue, log in to a compute node, and do your work there.
Your own IPython server is now running on Flux, but you will need to open a tunnel to reach it. The command below securely connects port 8889 on the computer flux-login3 to port 9000 on your own computer. Replace flux-login3 and 8889 with the host and port reported by your own notebook server.
ssh -L localhost:9000:flux-login3:8889 YOURNAME@flux-login.engin.umich.edu
- Navigate to http://localhost:9000 and make a new notebook
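Since the host and port differ from one person to the next, it can help to build the tunnel command from those values rather than editing it by hand. The sketch below only assembles the command string and does not run it; the function name `tunnel_command` and its parameters are hypothetical, and the defaults are simply the values used in the example above.

```python
import shlex


def tunnel_command(user, notebook_host, notebook_port, local_port=9000,
                   gateway="flux-login.engin.umich.edu"):
    """Build the ssh port-forwarding command as a printable string."""
    args = [
        "ssh",
        "-L",
        # local bind address : local port : remote host : remote port
        f"localhost:{local_port}:{notebook_host}:{notebook_port}",
        f"{user}@{gateway}",
    ]
    return " ".join(shlex.quote(a) for a in args)


print(tunnel_command("YOURNAME", "flux-login3", 8889))
# ssh -L localhost:9000:flux-login3:8889 YOURNAME@flux-login.engin.umich.edu
```

If port 9000 is already in use on your computer, pass a different `local_port` and browse to that port on localhost instead.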
IPython notebook reminders:
- Green box - marks active cell. You are in EDIT mode.
- Hit escape to exit EDIT mode and go to COMMAND mode.
- In command mode:
- A : insert new cell above
- B : insert new cell below
- Enter : edit the currently selected cell
- up/down arrow : navigate up or down
- In either:
- Shift-Enter : run the current cell
- Remember, you can run commands out of order in the notebook.
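Running cells out of order is convenient, but all cells share one set of variables, so a result can silently depend on the order in which you ran them. The sketch below imitates this with plain Python: each "cell" is a string executed in a shared dictionary. The helper `run_cell` and the cell contents are hypothetical, made up only to illustrate the effect.

```python
# One shared namespace stands in for the notebook kernel's variables.
namespace = {}


def run_cell(source):
    """Execute a cell's source code in the shared namespace."""
    exec(source, namespace)


cell_1 = "threshold = 0.05"
cell_2 = "significant = [p for p in [0.01, 0.2, 0.03] if p < threshold]"
cell_3 = "threshold = 0.02"

# Running cells 1 and 2 in order uses threshold = 0.05.
run_cell(cell_1)
run_cell(cell_2)
in_order = list(namespace["significant"])

# Running cell 3 and then re-running cell 2 changes the answer.
run_cell(cell_3)
run_cell(cell_2)
out_of_order = list(namespace["significant"])

print(in_order)      # [0.01, 0.03]
print(out_of_order)  # [0.01]
```

When a result looks surprising, restarting the kernel and running all cells from top to bottom is a simple way to check that the notebook still works in order.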