Barry Grant < http://thegrantlab.org/teaching/ >
2021-10-18 (14:01:15 on Mon, Oct 18)

1. Background

The goal of this hands-on session is to practice a version control workflow with GitHub and RStudio that streamlines working with our most important collaborator: Future You.

In this session we will interface with GitHub from our local computers using RStudio.

Side-note: There are many other ways to interact with GitHub, including GitHub’s Desktop App and the command line. You have the largest suite of options if you interface through the command line, but the most common things you’ll do can be done through one of these other applications (i.e. RStudio).

Here’s what we’ll do, since we’ve already set up git on your computers in the last section of class. N.B. If you have not set up git to work with RStudio yet, please ask Barry and/or the IAs now! In this session we will:

  1. Create a repository on GitHub.com (remote)

  2. Clone locally using RStudio, then sync local to remote: pull, stage, commit, push

  3. Explore github.com: files, commit history, README

  4. Get exposure to project-oriented workflows in action

2. Why should we use GitHub?

We want to use GitHub because it helps make coding collaborative and social while also providing huge benefits to organization, archiving, and being able to find your files easily when you need them.

One of the most compelling reasons for me is that it ends (or nearly ends) the horror of keeping track of versions.

Basically, we get away from this:

[Figure: Which version do we want? Ahhhhh…]


This is a nightmare not only because I have NO idea which is truly the version we used in that paper or analysis we need to update, but because it is going to take a lot of detective work to see what actually changed between each file.

Also, it is very sad to think about the amount of time everyone involved is spending on bookkeeping: is everyone downloading an attachment, dragging it to wherever they organize this on their own computers, and then renaming everything? Hours and hours of all of our lives.

But then there is GitHub.

In GitHub, in this example you will likely only see a single file, which is the most recent version. GitHub’s job is to track who made any changes and when (so no need to save a copy with your name or date at the end), and it also requires that you write something human-readable that will be a breadcrumb for you in the future. It is also designed to be easy to compare versions, and you can easily revert to previous versions.

GitHub also supercharges you as a collaborator: first and foremost with Future You, but it also sets you up to collaborate with Future Us!

GitHub, especially in combination with RStudio, is also game-changing for publishing and distributing. You can — and we will — publish and share files openly on the internet.

3. What is GitHub? And Git?

OK so what is GitHub? And Git?

  • Git is a program that you install on your computer: it is version control software that tracks changes to your files over time.

  • GitHub is a website that is essentially a social media platform for your git-versioned files. GitHub stores all your versioned files as an archive, but also allows you to interact with other people’s files and has management tools for the social side of software projects. It has many nice features, such as visualizing differences between images, rendering and diffing map data files, rendering text data files, and tracking changes in text.

GitHub was developed for software development, so much of the functionality and terminology that is exciting for professional programmers (e.g., branches and pull requests) isn’t necessarily the right place for us as new R users to get started.

So we will be learning and practicing GitHub’s features and terminology on a “need to know basis” as we start managing our projects with GitHub.

4. Connecting RStudio to GitHub

We’ve just created a GitHub account in class together. We also made sure git worked on our own computers and that it worked with RStudio. This allowed us to work locally with git and separately access the GitHub website. But what if we want to connect to GitHub from RStudio? How do we do that?

The first step is to sign in to (or sign up for) a free GitHub account.

Create a Personal Access Token (PAT) on GitHub

Once you’ve signed up, you’ll need to enable RStudio to talk to GitHub. The process for doing so has recently changed. The best way to connect RStudio and GitHub is using your username and a Personal Access Token (PAT).

To generate a personal access token, use the create_github_token() function from usethis. This will take you to the appropriate page on the GitHub website, where you’ll give your token a name and copy it (don’t lose it because it will never appear again!). To do this, go to RStudio and type

install.packages("usethis")   # only needed the first time
library(usethis)
create_github_token()         # opens the GitHub token page in your browser

Store Personal Access Token to Connect RStudio and GitHub

Now that you’ve created a Personal Access Token, we need to store it so that RStudio can access it and know to connect to your GitHub account. The gitcreds_set() function from the gitcreds package will help you here.

install.packages("gitcreds")  # only needed the first time
library(gitcreds)
gitcreds_set()                # prompts you to paste the token

You’ll enter your GitHub username and the Personal Access Token as your password (NOT your GitHub password). Just paste the PAT (token) you copied from the GitHub website above. Once you’ve done all of this, you have connected RStudio to GitHub!

5. Create a GitHub repository

Let’s get started syncing our work to GitHub by going back to the main GitHub website <https://github.com> and going to our user profile. You can do this by typing your username in the URL (github.com/username), or after signing in, by clicking on the top-right button and going to your profile.

This will have an overview of you and your work, and then you can click on the Repositories tab.

Repositories are the main “unit” of GitHub: they are what GitHub tracks. They are essentially project-level folders that will contain everything associated with a project. It’s where we’ll start too.

We create a new repository (called a “repo”) by clicking “New repository.”

We will choose a name that matches our course code i.e. bimm143 or bggn213.

Also, add a brief description, make it public, create a README file, and create your repo!

The Add .gitignore option adds a file where you can list files or file types you want Git to ignore. These files will stay in the local folder (the one on your computer), but will not be uploaded to GitHub.

The Add a license option adds a license that describes how other people can use your GitHub files (e.g., open source, but no one can profit from them, etc.). We won’t worry about this today.
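As a rough command-line illustration of what an ignore rule does, the sketch below builds a throwaway repository in a temporary folder (not your class repo) and ignores one file. It assumes a shell with git available, for example the Terminal tab in RStudio; the rule `*.log` and the file names `run.log` and `analysis.R` are made up for the example.

```shell
set -eu
repo=$(mktemp -d)                 # throwaway folder, not your class repo
cd "$repo"
git init -q

# Hypothetical rule: ignore every file ending in .log
echo "*.log" > .gitignore
echo "scratch output" > run.log
echo "x <- 1" > analysis.R

# Short status lists untracked files; run.log is absent because it is ignored
status=$(git status --short)
echo "$status"

# check-ignore prints the path when a file matches an ignore rule
ignored=$(git check-ignore run.log)
echo "ignored: $ignored"
```

The ignored file is still sitting in the folder on disk; Git simply stops offering to track it, so it never reaches GitHub.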

Check out our new repository!

Great! So now we have our new repository that exists in the Cloud. Let’s get it established locally on our computers: that is called “cloning.”

6. Clone your repository using RStudio

Let’s clone this repo to our local computer using RStudio. Unlike downloading, cloning keeps all the version control and user information bundled with the files.
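For the curious, here is a minimal command-line sketch of what cloning gives you. To keep it self-contained it clones from a small local repository that stands in for GitHub (an assumption of this demo); with a real repo you would pass the HTTPS address instead. The identity "Demo User" and the folder names are placeholders.

```shell
set -eu
work=$(mktemp -d)                       # throwaway folder for the demo
cd "$work"

# Stand-in for GitHub: a local repository with one commit
git init -q remote-repo
cd remote-repo
git config user.name "Demo User"        # placeholder identity
git config user.email "demo@example.com"
echo "# bimm143" > README.md
git add README.md
git commit -q -m "Initial commit"
cd ..

# Clone it; with GitHub you would paste the HTTPS address here instead
git clone -q remote-repo local-copy
cd local-copy

# The clone carries the commit history and remembers where it came from
commits=$(git rev-list --count HEAD)
origin=$(git remote get-url origin)
echo "commits: $commits"
echo "origin:  $origin"
```

A plain download would give you README.md only; the clone also knows about the commit and about its origin, which is what later makes pull and push possible.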

Copy the repo address

First, copy the web address of the repository you want to clone. We will use HTTPS.

Aside: HTTPS is default, but you could alternatively set up with SSH. This is more advanced than we will get into here, but allows 2-factor authentication. See Happy Git with R for more information.

7. RStudio: New Project

Now go back to RStudio, and click on New Project. There are a few different ways to do this: you could also go to File > New Project…, or click the little green + with the R box in the top left.

Select Version Control

Select Git

We select Git because that is the version control system we are using.

Paste the repo address

Paste the repo address (which is still in your clipboard) into the “Repository URL” field. The “Project directory name” should autofill; if it does not, press Tab or type it in. It is best practice to keep the “Project directory name” THE SAME as the repository name.

When cloned, this repository is going to become a folder on your computer.

At this point you can save this repo anywhere. There are different schools of thought, but we think it is useful to create a high-level folder where you will keep your GitHub repos to keep them organized. We call ours github and keep it in our home folder (~/github), and so that is what we will demonstrate here; you are welcome to do the same. Press “Browse…” to navigate to a folder, where you also have the option of creating a new folder.

Finally, click Create Project.

Admire your local repo

If everything went well, the repository will show up in RStudio!

The repository is also saved to the location you specified, and you can navigate to it as you normally would in Finder or Windows Explorer:

Hooray!

8. Inspect your local repo

Let’s notice a few things:

We have a Git tab in the top right pane! Let’s click on it.

Our Git tab has 2 items:

  • .gitignore file

  • .Rproj file

These have been added to our repo by RStudio; we can also see them in the Files pane in the bottom right of RStudio. These are helper files that RStudio has added to streamline our workflow with GitHub and R. We will talk about these a bit more soon. One thing to note is that .gitignore begins with a period (.), which means it is a hidden file: it shows up in the Files pane of RStudio but may not show up in your Finder or Windows Explorer. (The .Rproj file is actually named after your project, e.g. bimm143.Rproj, so it is not hidden.)

Going back to the Git tab, both these files have little yellow icons with question marks ?. This is GitHub’s way of saying: “I am responsible for tracking everything that happens in this repo, but I’m not sure what is going on with these files yet. Do you want me to track them too?”

We will handle this in a moment; first let’s look at the README.md file.

Edit your README file

Let’s also open up the README.md. This is a Markdown file, which is the same language we just learned with R Markdown. It’s like an R Markdown file without the abilities to run R code.

We will edit the file and illustrate how GitHub tracks files that have been modified (to complement seeing how it tracks files that have been added).

README files are common in programming; they are the first place that someone will look to see why code exists and how to run it.

In my README, I’ll write:

This repo is for my UCSD bioinformatics class. 

When I save this, notice how it shows up in my Git tab. It has a blue “M”: GitHub is already tracking this file, and tracking it line-by-line, so it knows that something is different: it’s Modified with an M.
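The same line-by-line tracking can be seen from the command line. This is only a sketch in a throwaway repository, assuming a shell with git available (for example the Terminal tab in RStudio); the identity "Demo User" is a placeholder.

```shell
set -eu
repo=$(mktemp -d)                       # throwaway folder, not your class repo
cd "$repo"
git init -q
git config user.name "Demo User"        # placeholder identity for the demo
git config user.email "demo@example.com"

# Commit a README so Git is already tracking it
echo "# bimm143" > README.md
git add README.md
git commit -q -m "Initial commit"

# Edit the tracked file, as we did in RStudio
echo "This repo is for my UCSD bioinformatics class." >> README.md

# Short status: " M" marks a tracked file that has been modified
status=$(git status --short)
echo "$status"

# The diff shows exactly which line was added (prefixed with "+")
added=$(git diff -- README.md | grep '^+[^+]')
echo "$added"
```

The M here corresponds to the blue “M” in the Git tab, and the diff is the same information RStudio previews when you highlight the file.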

Great. Now let’s sync back to GitHub in 4 steps.

9. Sync from RStudio (local) to GitHub (remote)

Syncing to GitHub.com means 4 steps:

  1. Pull

  2. Stage

  3. Commit

  4. Push

We start off this whole process by clicking on the Commit section.

Pull

We start off by “Pulling” from the remote repository (GitHub.com) to make sure that our local copy has the most up-to-date information that is available online. Right now, since we just created the repo and are the only ones that have permission to work on it, we can be pretty confident that there isn’t new information available. But we pull anyway because this is a very safe habit to get into for when you start collaborating with others, or with yourself across computers. Best practice is to pull often: it costs nothing (other than an internet connection).

Pull by clicking the teal Down Arrow. (Notice also how when you highlight a filename, a preview of the differences displays below).

Stage

Let’s click the boxes next to each file. This is called “staging a file”: you are indicating that you want GitHub to track this file, and that you will be syncing it shortly. Notice:

  • .Rproj and .gitignore files: the question marks turn into an A because these are new files that have been added to your repo (automatically by RStudio, not by you).

  • README.md file: the M indicates that this was modified (by you)
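Ticking the boxes corresponds to `git add` on the command line. The sketch below, run in a throwaway repository with a placeholder identity, prints the short status before and after staging so the change of codes is visible; the contents of the example `.gitignore` are made up.

```shell
set -eu
repo=$(mktemp -d)                       # throwaway folder, not your class repo
cd "$repo"
git init -q
git config user.name "Demo User"        # placeholder identity for the demo
git config user.email "demo@example.com"

# Start from a committed README, as GitHub gave us
echo "# bimm143" > README.md
git add README.md
git commit -q -m "Initial commit"

# One modified file and one brand-new file
echo "This repo is for my UCSD bioinformatics class." >> README.md
echo ".Rhistory" > .gitignore

before=$(git status --short)            # " M" modified, "??" untracked
git add .gitignore README.md            # stage both files
after=$(git status --short)             # "A " added, "M " modified and staged

printf 'before staging:\n%s\n\nafter staging:\n%s\n' "$before" "$after"
```

The question marks become an A for the new file, and the M moves from the second column to the first once the modification is staged.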

These are the codes used to describe how the files are changed (from the RStudio cheatsheet):

[Figure: file status codes from the RStudio cheatsheet]



Commit

Committing is different from saving our files (which we still have to do! RStudio will indicate a file is unsaved with red text and an asterisk). We commit a single file or a group of files when we are ready to save a snapshot in time of the progress we’ve made. Maybe this is after a big part of the analysis was done, or when you’re done working for the day.

Committing our files is a 2-step process.

First, you write a “commit message,” which is a human-readable note about what has changed that will accompany GitHub’s non-human-readable alphanumeric code to track our files. I think of commit messages like breadcrumbs to my Future Self: how can I use this space to be useful for me if I’m trying to retrace my steps (and perhaps in a panic?).

Second, you press Commit.

When we have committed successfully, we get a rather unsuccessful-looking pop-up message. You can read this message as “Congratulations! You’ve successfully committed 3 files, 2 of which are new!” It is also providing you with that alphanumeric SHA code that GitHub is using to track these files.

If our attempt was not successful, we will see an Error. Otherwise, interpret this message as a joyous one.

Does your pop-up message say “Aborting commit due to empty commit message.”? Git is really serious about writing human-readable commit messages.
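The refusal can be reproduced from the command line. This is a small sketch in a throwaway repository (placeholder identity, made-up commit message) that first tries an empty message and then a readable one.

```shell
set -eu
repo=$(mktemp -d)                       # throwaway folder, not your class repo
cd "$repo"
git init -q
git config user.name "Demo User"        # placeholder identity for the demo
git config user.email "demo@example.com"
echo "# bimm143" > README.md
git add README.md

# An empty message is refused; capture the complaint instead of stopping
if msg=$(git commit -m "" 2>&1); then
  result="committed"
else
  result="refused"
fi
echo "$result: $msg"

# A human-readable message is accepted
git commit -q -m "Add README describing the class repo"
subject=$(git log -1 --format=%s)
echo "last commit: $subject"
```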

When we close this window there is going to be (in my opinion) a very subtle indication that we are not done with the syncing process.

We have successfully committed our work as a breadcrumb-message-approved snapshot in time, but it still only exists locally on our computer. We can commit without an internet connection; we have not done anything yet to tell GitHub that we want this pushed to the remote repo at GitHub.com. So as the last step, we push.
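To make the "committed but not yet pushed" state concrete, the sketch below counts how many local commits the remote is missing, before and after a push. Nothing here touches the internet: a bare repository in a temporary folder plays the role of GitHub.com, which is an assumption of this demo, as are the placeholder identity and commit messages.

```shell
set -eu
work=$(mktemp -d)                       # throwaway folder for the demo
cd "$work"

# Build a one-commit repo, then copy it to a bare repo that stands in for GitHub.com
git init -q seed
cd seed
git config user.name "Demo User"        # placeholder identity
git config user.email "demo@example.com"
echo "# bimm143" > README.md
git add README.md
git commit -q -m "Initial commit"
cd ..
git clone -q --bare seed remote.git

# Clone the stand-in remote and commit a change locally
git clone -q remote.git local
cd local
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "This repo is for my UCSD bioinformatics class." >> README.md
git add README.md
git commit -q -m "Describe the purpose of this repo"

# How many local commits does the remote not have yet?
ahead_before=$(git rev-list --count '@{u}..HEAD')
git push -q
ahead_after=$(git rev-list --count '@{u}..HEAD')
echo "ahead of remote before push: $ahead_before"
echo "ahead of remote after push:  $ahead_after"
```

The count drops from 1 to 0 once the push has gone through, which is the command-line counterpart of the subtle "ahead" hint RStudio shows after a commit.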

Push

The last step in the syncing process is to Push!

Awesome! We’re done here in RStudio for the moment, let’s check out the remote on GitHub.com.

Commit history

The files you added should be on github.com.

Notice how the README.md file we created is automatically displayed at the bottom. Since it is good practice to have a README file that identifies what code does (i.e. why it exists), GitHub will display a Markdown file called README nicely formatted.

Let’s also explore the commit history. The 2 commits we’ve made (the first was when we originally initiated the repo from GitHub.com) are there!
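The same history is available locally through `git log`. The sketch below recreates a two-commit history in a throwaway repository (placeholder identity and messages) and lists it, newest first.

```shell
set -eu
repo=$(mktemp -d)                       # throwaway folder, not your class repo
cd "$repo"
git init -q
git config user.name "Demo User"        # placeholder identity for the demo
git config user.email "demo@example.com"

echo "# bimm143" > README.md
git add README.md
git commit -q -m "Initial commit"

echo "This repo is for my UCSD bioinformatics class." >> README.md
git add README.md
git commit -q -m "Describe the purpose of this repo"

# One line per commit, newest first: abbreviated SHA, then the message
git --no-pager log --oneline

count=$(git rev-list --count HEAD)
newest=$(git log -1 --format=%s)
echo "commits: $count, newest: $newest"
```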

10. Add previous class projects to your GitHub class repo.

Now go to your File Explorer (PC) or Finder window (Mac) and locate the folder where your class repo lives on your computer.

Once you have found this, let’s copy our previous class folders into this new location. Your objective here is to have a copy of all your R work to date within your repo. By doing this you will be getting valuable practice syncing files to GitHub: pull, stage, commit, push, etc.

Troubleshooting: What if a file doesn’t show up in the Git tab and you expect that it should? Check to make sure you’ve saved the file. If the filename is red with an asterisk, there have been changes since it was saved. Remember to save before syncing to GitHub!

Explore your Commit History, and discuss with your neighbor.

11. Committing - how often? Tracking changes in your files

Whenever you make changes to the files in your repo, you will walk through the Pull -> Stage -> Commit -> Push steps.
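For reference, the four steps map onto four git commands. The sketch below runs the whole cycle once against a bare repository in a temporary folder that stands in for GitHub.com (an assumption, so it works offline); the file name `class05.R`, the identity and the commit messages are hypothetical.

```shell
set -eu
work=$(mktemp -d)                       # throwaway folder for the demo
cd "$work"

# Stand-in for GitHub.com: a bare repo that already holds one commit
git init -q seed
cd seed
git config user.name "Demo User"        # placeholder identity
git config user.email "demo@example.com"
echo "# bimm143" > README.md
git add README.md
git commit -q -m "Initial commit"
cd ..
git clone -q --bare seed remote.git

# Your local copy of the class repo
git clone -q remote.git class-repo
cd class-repo
git config user.name "Demo User"
git config user.email "demo@example.com"

git pull -q                             # 1. Pull: fetch anything new from the remote
echo "x <- 1" > class05.R               #    ...do some work (hypothetical file)
git add class05.R                       # 2. Stage
git commit -q -m "Add class 5 lab"      # 3. Commit
git push -q                             # 4. Push

# Local and remote now point at the same commit
local_head=$(git rev-parse HEAD)
remote_head=$(git ls-remote origin HEAD | cut -f1)
echo "local:  $local_head"
echo "remote: $remote_head"
```

The buttons in RStudio's Git tab run this same sequence for you; the commands are shown only to make clear what each click does.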

I tend to do this every time I finish a task (basically when I start getting nervous that I will lose my work). Once something is committed, it is very difficult to lose it.

About this document

Here we use the sessionInfo() function to report on our R systems setup at the time of document execution.

sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Big Sur 11.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] labsheet_0.1.2
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.28   R6_2.5.1        jsonlite_1.7.2  magrittr_2.0.1 
##  [5] evaluate_0.14   stringi_1.7.4   rlang_0.4.11    jquerylib_0.1.4
##  [9] bslib_0.3.0     rmarkdown_2.10  tools_4.1.1     stringr_1.4.0  
## [13] xfun_0.25       yaml_2.2.1      fastmap_1.1.0   compiler_4.1.1 
## [17] htmltools_0.5.2 knitr_1.34      sass_0.4.0
 
