Class 15: Launching AWS instances

Starting your own computer in the cloud

Author

Barry Grant

Published

November 14, 2022

Background

The goal of this hands-on session is to show you how to configure and launch your very own new computer in the cloud. This will allow you to do computational work on remote hardware with capabilities beyond those you may have at hand. For example, in bioinformatics we often need to analyze datasets that are too large (or would take too long) to analyze on our local lab computers.

What is cloud computing?

Cloud computing gives on-demand access to compute resources, in almost any quantity, that are physically located elsewhere. Typically you pay for what you use rather than having to pay upfront for new hardware and all its associated maintenance costs. Cloud computing resources exist on servers managed by cloud providers. The most popular cloud providers include Amazon (Amazon Web Services, a.k.a. AWS), Google (Google Compute Engine) and Microsoft (Azure). For academic work in the US we also have NSF/XSEDE (who manage access to Jetstream and other supercomputers around the country).

[Figure: Major cloud compute providers]

At the time of writing, Amazon’s AWS is the market leader by a rather wide margin. We will focus here on learning the basics of AWS, but the concepts apply to other cloud computing services as well. The main differences are brand-specific acronyms and the terminology used to describe various services and actions.

Important Cloud concepts

Virtual Machines (or VMs for short) emulate the architecture and functionality of physical computers. However, they do not sit on our desk but rather live “in the cloud”: they are actually portions of large servers housed in remote data centers rather than individual machines in the conventional sense, hence the name virtual machines. In Amazon’s AWS parlance VMs are called EC2 instances.

Side-note: EC2 stands for Elastic Compute Cloud and by now you should appreciate that this is an area with lots of acronyms made worse by the fact that different vendors use different terminology for similar things.

EC2 instances can be created with different operating systems (e.g. Linux, Windows and macOS) and with different amounts of CPU, memory, storage and GPU resources.

Being able to access and use VMs, as we will learn here, can eliminate the need to invest in expensive new hardware and avoids the hassle of configuration and maintenance downtime. Many expect cloud computing to become more prevalent in biomedical research in the near future, so being able to use it effectively is an important and in-demand skill for a growing number of employers.

Once you have these skills you can launch as many virtual servers as you need, configure their security and networking, and manage extra storage options (more on this later). Amazon EC2 can also optionally scale up or down automatically to handle changes in requirements (such as spikes in usage) without up-front costs. This is one reason why companies such as Netflix, Uber and of course Amazon itself run on cloud computing resources.

Accessing the AWS console

The AWS console is a password-protected website where you can configure, launch and control your EC2 instances. We will not cover all its functionality here but rather focus on how to launch new instances.

If you are enrolled in this course you can access your own AWS console at https://awsed.ucsd.edu/ using your regular UCSD single sign-on details. From there, select our course to be redirected to the AWS console.

To launch your first instance, click the large orange “Launch instances” button on the upper right. This will take you to the “quick start” page. Browse through the list of different machine types there.

Acronym alert: These machine type options are known as Amazon Machine Images (or AMIs for short). Basically, they are preconfigured templates for launching a virtual machine instance. They package the software you may need on your server (including the operating system and possibly additional applications).

Select “Ubuntu Server 20.04 LTS (HVM), SSD Volume Type” with the default 64-bit (x86) processor type.
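
If you ever want to do the same lookup from the command line instead of the console, the sketch below finds the newest Ubuntu 20.04 (focal) x86_64 AMI ID. It assumes you have the AWS CLI installed and configured with credentials for your own account (the class console route may not give you these), and that Canonical's account ID 099720109477 and its current image naming pattern still apply. The function name find_ubuntu_2004_ami is hypothetical.

```bash
# Hypothetical helper: print the newest Ubuntu 20.04 (focal) x86_64 AMI ID
# in the AWS CLI's configured region. Assumes Canonical's owner ID
# (099720109477) and its current image naming pattern.
find_ubuntu_2004_ami() {
  aws ec2 describe-images \
    --owners 099720109477 \
    --filters "Name=name,Values=ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*" \
              "Name=state,Values=available" \
    --query 'sort_by(Images, &CreationDate)[-1].ImageId' \
    --output text
}

# Usage (AMI IDs differ between regions, so the result depends on your region):
#   ami_id=$(find_ubuntu_2004_ami)
```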

We will select “m5.2xlarge”, which has 8 vCPUs and 32 GiB of memory. This is a very respectable choice for most small to medium size bioinformatics work, although typical human genome work (sequence read mapping etc.) often needs more memory than this. Feel free to explore other options here later, but please stick with this one at least initially.
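
To double-check an instance type's specs from the command line, a helper along these lines should work, assuming the AWS CLI is installed and configured for your own account; show_instance_type is a hypothetical name. For m5.2xlarge you should see 8 vCPUs and 32768 MiB (32 GiB) of memory.

```bash
# Hypothetical helper: print vCPU count and memory (MiB) for an instance type,
# defaulting to the m5.2xlarge used in this class.
show_instance_type() {
  aws ec2 describe-instance-types \
    --instance-types "${1:-m5.2xlarge}" \
    --query 'InstanceTypes[].[InstanceType, VCpuInfo.DefaultVCpus, MemoryInfo.SizeInMiB]' \
    --output table
}

# Usage:
#   show_instance_type              # m5.2xlarge
#   show_instance_type r5.4xlarge   # compare a memory-optimized type
```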

Getting your private key file

A very important step is creating and downloading a special key file that will allow you to access your instance from the command line.

  • Select “Create a new key pair” (or use a previous one if you already have one and know what you are doing here).
  • Name your new key something you will remember, with no spaces or special characters. I strongly suggest “bioinf_yourname” (e.g. bioinf_barry) so you can find and use it later.
  • Click the orange “Create key pair” button and the private key portion should download to your computer.
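
For reference, the same key pair can be created with the AWS CLI. This is a sketch assuming a configured CLI on your own account; create_key_pair is a hypothetical helper that saves only the private key material to NAME.pem and then restricts its permissions.

```bash
# Hypothetical helper: create an EC2 key pair and save the private key
# as NAME.pem in the current directory.
create_key_pair() {
  local name="$1"
  # --query 'KeyMaterial' keeps only the private key text from the response.
  aws ec2 create-key-pair \
    --key-name "$name" \
    --query 'KeyMaterial' \
    --output text > "${name}.pem" || return 1
  # Owner read-only: ssh refuses private keys that other users can read.
  chmod 400 "${name}.pem"
}

# Usage:
#   create_key_pair bioinf_barry    # writes bioinf_barry.pem
```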

Configuring security settings

With “m5.2xlarge” selected, jump to step number 6 “Configure Security Group”. This is where you can control how you and others can access your computer in the cloud.

Click “Select an existing security group” and choose the BIMM143/BGGN213 option. If, and only if, you are doing this on your own AWS account (i.e. not as part of the official class), you will want to create a new security group that enables SSH access on port 22, plus new rules for HTTP access on port 80 and TCP access on port 53XX.
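
If you are on your own account, the SSH and HTTP rules described above can be sketched with the AWS CLI as follows. The helper name create_bioinf_sg, its description text, and the idea of passing in your own CIDR range (e.g. your IP address followed by /32) are assumptions. The group is created in your default VPC, and the 53XX rule is left out because its exact port is not given here; it could be added with one more authorize call once you know it.

```bash
# Hypothetical helper: create a security group in the default VPC that allows
# SSH (22) and HTTP (80) from the given CIDR range, then print its ID.
create_bioinf_sg() {
  local name="$1" my_cidr="$2" sg_id
  sg_id=$(aws ec2 create-security-group \
    --group-name "$name" \
    --description "SSH and HTTP access for bioinformatics class" \
    --query 'GroupId' --output text) || return 1
  # SSH on port 22
  aws ec2 authorize-security-group-ingress \
    --group-id "$sg_id" --protocol tcp --port 22 --cidr "$my_cidr" >/dev/null || return 1
  # HTTP on port 80
  aws ec2 authorize-security-group-ingress \
    --group-id "$sg_id" --protocol tcp --port 80 --cidr "$my_cidr" >/dev/null || return 1
  echo "$sg_id"
}

# Usage (203.0.113.5 is a documentation placeholder; use your own address.
# A CIDR of 0.0.0.0/0 would open SSH to the whole internet):
#   sg_id=$(create_bioinf_sg bioinf_sg 203.0.113.5/32)
```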

Finally, increase the storage under “Configure storage” to 30 GiB (gp2) and click the large orange “Launch instance” button.
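
The console choices made so far (AMI, instance type, key pair, security group and a 30 GiB gp2 root volume) map onto a single run-instances call. This sketch assumes a configured AWS CLI and that the AMI's root device is /dev/sda1, which is typical for Canonical's Ubuntu images (check the AMI's root device name if in doubt); launch_instance is a hypothetical helper that prints the new instance ID.

```bash
# Hypothetical helper: launch one m5.2xlarge instance with a 30 GiB gp2 root
# volume and print its instance ID.
# Arguments: AMI ID, key pair name, security group ID.
launch_instance() {
  local ami_id="$1" key_name="$2" sg_id="$3"
  aws ec2 run-instances \
    --image-id "$ami_id" \
    --instance-type m5.2xlarge \
    --key-name "$key_name" \
    --security-group-ids "$sg_id" \
    --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":30,"VolumeType":"gp2"}}]' \
    --count 1 \
    --query 'Instances[0].InstanceId' \
    --output text
}

# Usage:
#   instance_id=$(launch_instance "$ami_id" bioinf_barry "$sg_id")
```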

Connecting to your new instance

After you launch your instance it will take a short time to set up and become available for use. Note down the instance ID somewhere and then click on that ID to return to the main AWS console.

Once you see a green “Running” displayed under Instance state, we are ready to connect.
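
From the command line, you can wait for the Running state and fetch the public DNS name you will need for ssh. This sketch assumes a configured AWS CLI; wait_for_instance is a hypothetical helper that blocks until the instance is running (the CLI waiter eventually times out if it never gets there) and then prints the instance's PublicDnsName.

```bash
# Hypothetical helper: wait until an instance is running, then print its
# public DNS name (the host part of the ssh command).
wait_for_instance() {
  local id="$1"
  aws ec2 wait instance-running --instance-ids "$id" || return 1
  aws ec2 describe-instances \
    --instance-ids "$id" \
    --query 'Reservations[0].Instances[0].PublicDnsName' \
    --output text
}

# Usage:
#   host=$(wait_for_instance "$instance_id")
```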

Select your instance's tick box and then click the top “Connect” button. This will show example “SSH client” Unix commands for connecting to your instance.

We will use a slight variant of this command in your favorite terminal application to access your EC2 instance via ssh (secure shell). This will be covered in detail in the next section (a different lab page), but briefly here for completeness:

In your Terminal, change to the directory where you downloaded the private key file:

cd ~/Downloads

Change the permissions of the key file so that only you can read it; ssh will refuse to use a private key that other users can read. Make sure to use the name of YOUR key file here and not mine. It is unlikely that your name is also bjgrant ;-)

chmod 400 bioinf_bjgrant.pem
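
If ssh later complains about an “UNPROTECTED PRIVATE KEY FILE”, the permissions did not stick. A small check like the hypothetical check_key_perms below prints the key's octal permission bits; it tries GNU stat syntax (Linux) first and falls back to BSD stat syntax (macOS).

```bash
# Hypothetical helper: succeed only if the key file has mode 400.
check_key_perms() {
  local perms
  [ -e "$1" ] || { echo "$1: no such file" >&2; return 1; }
  # GNU coreutils stat uses -c '%a'; BSD/macOS stat uses -f '%Lp'.
  perms=$(stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1") || return 1
  if [ "$perms" = "400" ]; then
    echo "$1: permissions OK (400)"
  else
    echo "$1: permissions are $perms; run: chmod 400 $1" >&2
    return 1
  fi
}

# Usage:
#   check_key_perms bioinf_bjgrant.pem
```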

Now it is time to ssh into your EC2 instance with this key file. Here, use the command you copied from the website previously, e.g.:

# Use YOUR copied ssh command from above, e.g.
ssh -i "bioinf_bjgrant.pem" ubuntu@ec2-54-200-207-12.us-west-2.compute.amazonaws.com
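
To avoid retyping the long ssh command, you can add a host alias to your ssh config file. add_ssh_host below is a hypothetical helper that appends a Host entry (using the ubuntu login user from the command above) to ~/.ssh/config by default; the alias name bioinf in the usage line is just an example. Keep in mind that an instance's public DNS name changes after a stop and start (unless you attach an Elastic IP), so the HostName line may need updating.

```bash
# Hypothetical helper: append an ssh Host alias for an EC2 instance.
# Arguments: alias, public DNS name, key file, optional config path.
add_ssh_host() {
  local host_alias="$1" host="$2" key="$3" config="${4:-$HOME/.ssh/config}"
  mkdir -p "$(dirname "$config")"
  cat >> "$config" <<EOF

Host $host_alias
    HostName $host
    User ubuntu
    IdentityFile $key
EOF
  # Keep the config private; ssh may reject group/world-writable configs.
  chmod 600 "$config"
}

# Usage:
#   add_ssh_host bioinf ec2-54-200-207-12.us-west-2.compute.amazonaws.com ~/Downloads/bioinf_bjgrant.pem
#   ssh bioinf
```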

Again, this will be discussed in detail in the next section, but here is what success looks like for me…

Next we will get to work running some typical bioinformatics analysis on your shiny new VM.

Important note: Once you are done with your work, please Stop or Terminate your instance so that we are no longer charged for it. To do this, select your instance and then click “Instance State” > “Stop Instance”.
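
The same stop or terminate actions can be scripted with the AWS CLI. This sketch assumes a configured CLI; stop_instance and terminate_instance are hypothetical helpers that print the instance's new state. A stopped instance no longer accrues compute charges, but its attached EBS storage is still billed, whereas terminating deletes the instance for good.

```bash
# Hypothetical helpers: stop or terminate an instance and print its new state.
stop_instance() {
  aws ec2 stop-instances --instance-ids "$1" \
    --query 'StoppingInstances[0].CurrentState.Name' --output text
}

terminate_instance() {
  aws ec2 terminate-instances --instance-ids "$1" \
    --query 'TerminatingInstances[0].CurrentState.Name' --output text
}

# Usage:
#   stop_instance "$instance_id"        # can be started again later
#   terminate_instance "$instance_id"   # permanent
```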
