Introduction
Welcome to the Linux command line! If you're used to working in RStudio, think of the bash shell (the most common Linux command-line interpreter) as a text-based interface to your computer, very much like RStudio's console, but for your operating system rather than just R. Instead of clicking through menus and folder windows, you type commands to navigate, manage files, and run programs.
This tutorial is designed for R users who are new to Linux and HPC (high-performance computing) environments. By the end, you'll be comfortable enough to navigate the filesystem, manage files, run jobs, and launch a full RStudio session directly in your browser on UF's HiPerGator cluster.
R Analogies Throughout: Because you already think in R, we'll use R analogies to connect new concepts to things you already know.
Why Learn the Command Line?
- HPC access is terminal-based. HiPerGator doesn't have a graphical desktop; you connect via SSH and type commands.
- Automation. Shell scripts automate repetitive tasks the same way R scripts automate analyses.
- Bioinformatics tools. Nearly all genomics and bioinformatics software is designed to run from the command line.
- Speed and power. Many file operations (moving, searching, compressing thousands of files) are far faster on the command line than through a GUI.
Getting Started
Opening a Terminal
- On a Mac: Open Terminal (Applications → Utilities → Terminal) or use iTerm2.
- On Windows: Use Windows Terminal with WSL (Windows Subsystem for Linux), or install MobaXterm for HPC work.
- On Linux: Open your desktop's built-in terminal application.
- On HiPerGator: After logging in via SSH, you're already at a terminal.
The Command Prompt
When you open a terminal, you'll see something like:
username@hostname:~$
| Part | Meaning |
|---|---|
| username | Your login name |
| hostname | The name of the machine you're on |
| ~ | Your current location (the ~ is shorthand for your home directory) |
| $ | Indicates you are a regular (non-root) user |
R Analogy: The prompt is like the > you see in R's console; it means the shell is ready for your next command.
Key Terminology
| Term | Linux | R / RStudio Equivalent |
|---|---|---|
| Directory | A folder in the filesystem | A folder in the Files pane |
| File | Any stored object (script, data, log) | An .R, .csv, .rds file |
| Command | An instruction you type and run | A function call |
| Shell | The program (bash) that runs your commands | R's interpreter |
| Working directory | Where you currently "are" in the filesystem | getwd() result |
| Path | The address of a file or directory | A file path in read.csv("path/to/file.csv") |
| Argument / Flag | An option that modifies a command's behavior | A function argument |
Navigating the Filesystem
pwd: Where Am I?
pwd (print working directory) shows your current location.
pwd
/home/username
R Analogy: getwd()
pwd is the shell equivalent of getwd() in R.
ls: What's Here?
ls (list) shows the contents of your current directory.
ls
data results.csv scripts
Useful ls Options
| Command | What it does |
|---|---|
| ls -l | Long format: shows permissions, owner, size, and date |
| ls -a | Show all files, including hidden ones (starting with .) |
| ls -lh | Long format with human-readable file sizes (KB, MB, GB) |
| ls -lt | Long format sorted by modification time, newest first |
| ls -lhS | Sort by file size, largest first |
ls -lh
total 24K
drwxr-xr-x 2 username group 4.0K Jun 13 09:00 data
-rw-r--r-- 1 username group 12K Jun 13 08:55 results.csv
drwxr-xr-x 2 username group 4.0K Jun 13 09:01 scripts
R Analogy: Files Pane
ls is like looking at RStudio's Files pane, and ls -lh is like switching to "Details" view.
cd: Move Around
cd (change directory) navigates between directories.
cd data
pwd
/home/username/data
Essential cd shortcuts
| Command | What it does |
|---|---|
| cd .. | Go up one level (parent directory) |
| cd ~ or just cd | Go to your home directory |
| cd / | Go to the root of the entire filesystem |
| cd - | Go back to your previous directory |
| cd /blue/cancercenter/username | Use an absolute path to go anywhere directly |
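The shortcuts above mix relative paths (cd .., cd data) with absolute ones (cd /blue/...). As a minimal sketch of how the two reach the same place, the following works inside a throwaway directory. The sandbox variable and the directory names are made up for illustration; mktemp -d just creates an empty temporary folder so none of your real files are touched.

```shell
# Build a small sandbox with two sibling directories (names are illustrative).
sandbox=$(mktemp -d)
mkdir -p "$sandbox/data" "$sandbox/scripts"

# Relative path: resolved from wherever you currently are.
cd "$sandbox/data"
cd ../scripts
relative_result=$(pwd -P)   # -P prints the physical path with symlinks resolved

# Absolute path: starts at / and works from any starting location.
cd /
cd "$sandbox/scripts"
absolute_result=$(pwd -P)

echo "via relative path: $relative_result"
echo "via absolute path: $absolute_result"
```

Both echo lines should print the same directory, because ../scripts (from inside data) and the full path lead to the same location.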
Absolute vs. Relative Paths
- An absolute path starts from root (/) and always works regardless of where you are: /home/username/data
- A relative path is relative to your current location: data or ../scripts
This is exactly the same distinction as in R: "/home/username/data/file.csv" vs "data/file.csv".
Tab Completion
This is one of the most important habits to build. Press Tab after partially typing a file or directory name and bash will complete it for you. Press Tab twice to see all possible completions if there are multiple matches.
cd dat<TAB> # completes to: cd data
This prevents typos and saves significant time.
Managing Files and Directories
mkdir: Create Directories
mkdir (make directory) creates a new folder.
mkdir analysis
ls
analysis data results.csv scripts
Use -p to create nested directories all at once:
mkdir -p project/data/raw
This creates project/, project/data/, and project/data/raw/ in one step, which is very useful for setting up project structures.
touch: Create an Empty File
touch creates an empty file (or updates the timestamp of an existing one).
touch scripts/analysis.R
cp: Copy Files and Directories
cp (copy) duplicates files or folders.
# Copy a file
cp results.csv results_backup.csv
# Copy into a directory
cp results.csv data/
# Copy a directory and all its contents (requires -r for recursive)
cp -r data data_backup
mv: Move or Rename
mv (move) moves or renames files and directories. Unlike cp, it leaves no copy at the original location.
# Rename a file
mv analysis.R analysis_v1.R
# Move a file into a directory
mv analysis_v1.R scripts/
# Move and rename in one step
mv scripts/analysis_v1.R archive/analysis_final.R
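The difference between cp and mv is easy to confirm in a scratch directory. The sketch below is only an illustration: the sandbox variable and the file contents are invented, and it creates archive/ up front because mv will not create a missing destination directory for you.

```shell
# Work in a temporary sandbox so no real files are affected.
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir scripts archive
echo 'print("hello")' > analysis.R

# cp leaves the original in place and adds a second file.
cp analysis.R analysis_backup.R

# mv relocates the file: the old name no longer exists afterwards.
mv analysis.R scripts/analysis_v1.R

# Move and rename in one step (archive/ must already exist).
mv scripts/analysis_v1.R archive/analysis_final.R

# Show the resulting layout.
ls -R
```

After it runs, analysis_backup.R is still in the sandbox, scripts/ is empty, and the script lives at archive/analysis_final.R.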
rm: Delete Files and Directories
Warning: No Recycle Bin on Linux
rm permanently deletes files. There is no Trash or Undo. On HiPerGator especially, double-check before running rm.
# Delete a file
rm results_backup.csv
# Delete a directory and everything inside it
rm -r data_backup
# Interactive mode: asks for confirmation before each deletion (recommended)
rm -i important_file.csv
ln: Create Symbolic Links
Symbolic links (symlinks) are like shortcuts or aliases: they point to a file or directory without duplicating it. This is very useful on HiPerGator for organizing data across filesystems without copying large files.
# Create a symlink called "raw_data" pointing to the actual data location
ln -s /blue/cancercenter/shared/project_data raw_data
ls -la
lrwxrwxrwx 1 username group 38 Jun 13 09:10 raw_data -> /blue/cancercenter/shared/project_data
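A symlink can be tried out safely before pointing one at real shared data. In this sketch the sandbox variable, the shared/project_data directory, and cohort.csv are all stand-ins created just for the demonstration; readlink prints where a link points.

```shell
# Create a stand-in for a shared data directory inside a temporary sandbox.
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir -p shared/project_data
echo "sample_id,value" > shared/project_data/cohort.csv

# Create the link: the first argument is the target, the second is the link name.
ln -s "$sandbox/shared/project_data" raw_data

# Inspect the link and look through it.
link_target=$(readlink raw_data)
echo "raw_data -> $link_target"
ls raw_data

# Removing the link (no trailing slash) deletes only the pointer, not the data.
rm raw_data
ls shared/project_data
```

The final ls still lists cohort.csv, which shows that deleting a symlink leaves its target untouched.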
Viewing and Working with Files
cat: Print File Contents
cat prints the entire contents of a file to the terminal. Best for small files.
cat scripts/analysis.R
less: Scroll Through Files
less is for viewing large files interactively; it doesn't load everything at once.
less results.csv
| Key | Action in less |
|---|---|
| Space or f | Page down |
| b | Page back |
| g | Go to beginning |
| G | Go to end |
| /<term> | Search forward for <term> |
| q | Quit |
head and tail β View File Beginnings/Ends
# First 10 lines (default)
head results.csv
# First 20 lines
head -n 20 results.csv
# Last 10 lines
tail results.csv
# Watch a file update in real time (great for monitoring log files)
tail -f rserver_12345.log
R Analogy: head() and tail()
These work much like head() and tail() in R (the shell versions show 10 lines by default, where R shows 6), and are especially useful for peeking at large data files without loading them.
wc: Count Lines, Words, Characters
# Count lines in a file (-l for lines only)
wc -l results.csv
1001 results.csv
This tells you results.csv has 1001 lines, a quick way to check whether a file has the expected number of rows.
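When the file has a header row, the line count is one more than the number of data rows. The sketch below shows one way to count data rows only; the tiny results.csv it writes is invented for the example, and it assumes exactly one header line.

```shell
# Create a small CSV with one header line and three data rows.
sandbox=$(mktemp -d)
cd "$sandbox"
printf 'sample_id,value\nS1,0.5\nS2,0.7\nS3,0.9\n' > results.csv

# Reading from stdin makes wc print only the number (no file name).
# The $(( ... )) wrapper strips any padding some systems add around the count.
total_lines=$(( $(wc -l < results.csv) ))

# tail -n +2 starts output at line 2, which skips the header.
data_rows=$(( $(tail -n +2 results.csv | wc -l) ))

echo "lines: $total_lines, data rows: $data_rows"
```

This is roughly what nrow() would report after read.csv() in R, without loading the file.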
grep: Search Inside Files
grep searches files for lines that match a pattern (like grep() or stringr::str_detect() in R).
# Find all lines containing "BRCA1"
grep "BRCA1" gene_list.txt
# Case-insensitive search
grep -i "brca1" gene_list.txt
# Show line numbers
grep -n "BRCA1" gene_list.txt
# Count matching lines
grep -c "BRCA1" gene_list.txt
# Search recursively in all files under a directory
grep -r "BRCA1" scripts/
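To see how the flags change the result, here is a small sketch on a made-up gene list. The file contents and variable names are assumptions for illustration only.

```shell
# Build a tiny gene list with mixed case and a near-match (BRCA2).
sandbox=$(mktemp -d)
cd "$sandbox"
printf 'BRCA1\nTP53\nbrca1\nBRCA2\nBRCA1\n' > gene_list.txt

# -c counts matching lines; matching is case-sensitive by default.
exact_count=$(grep -c "BRCA1" gene_list.txt)

# Adding -i ignores case, so the lowercase entry is counted too.
any_case_count=$(grep -ci "brca1" gene_list.txt)

# -n prefixes each match with its line number; keep only the first hit.
first_hit=$(grep -n "BRCA1" gene_list.txt | head -n 1)

echo "exact: $exact_count, any case: $any_case_count, first hit: $first_hit"
```

The case-sensitive count is 2 and the case-insensitive count is 3, since only the second search picks up the lowercase brca1 line.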
Pipes, Redirection, and Combining Commands
One of the most powerful aspects of the Linux command line is combining simple commands to accomplish complex tasks.
Pipes: |
The pipe | sends the output of one command as the input to the next, much like R's %>% (magrittr) or |> (native pipe).
# Count how many entries are in the current directory
# (plain ls is used because ls -l adds a "total" line that inflates the count by one)
ls | wc -l
# Find lines containing "error" in a log, then show only the first 20
grep "error" pipeline.log | head -20
# List unique sample IDs from column 1 of a file, sorted alphabetically
cut -f1 samples.txt | sort | uniq
R Analogy: |> or %>%
ls | wc -l is like list.files() |> length() in R.
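The cut | sort | uniq pipeline can be tried on a small tab-separated file. The samples.txt below is invented for the sketch. Note that uniq only collapses adjacent duplicate lines, which is why sort comes first.

```shell
# Create a tab-separated file with repeated sample IDs in column 1.
sandbox=$(mktemp -d)
cd "$sandbox"
printf 'S2\ttumor\nS1\tnormal\nS2\tnormal\nS3\ttumor\nS1\ttumor\n' > samples.txt

# Column 1 -> sorted -> duplicates collapsed.
unique_ids=$(cut -f1 samples.txt | sort | uniq)

# Add one more stage to count the unique IDs.
n_unique=$(( $(cut -f1 samples.txt | sort | uniq | wc -l) ))

echo "$unique_ids"
echo "number of unique samples: $n_unique"

# uniq -c reports how many times each ID occurred.
cut -f1 samples.txt | sort | uniq -c
```

In R terms, this is close to sort(unique(samples[[1]])) followed by length(), and the last line resembles table(samples[[1]]).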
Redirection: > and >>
Redirect output to a file instead of the terminal.
# Write output to a file (overwrites existing content)
ls -lh > file_list.txt
# Append output to a file (adds to the end)
echo "Analysis complete" >> run_notes.txt
Warning: Using > will silently overwrite an existing file. Use >> when you want to add to an existing file.
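The overwrite behavior is worth seeing once in a scratch directory. This sketch uses an invented run_notes.txt and also shows the shell's noclobber option, an optional safeguard (not something this tutorial requires) that makes > refuse to overwrite an existing file.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

# > creates the file; >> adds a second line to it.
echo "first run"  > run_notes.txt
echo "second run" >> run_notes.txt
lines_after_append=$(( $(wc -l < run_notes.txt) ))

# A second > replaces everything that was there.
echo "third run" > run_notes.txt
lines_after_overwrite=$(( $(wc -l < run_notes.txt) ))

echo "after append: $lines_after_append line(s)"
echo "after overwrite: $lines_after_overwrite line(s)"

# Optional safeguard: with noclobber set, > fails on an existing file.
set -o noclobber
if { echo "fourth run" > run_notes.txt; } 2>/dev/null; then
  clobber_blocked=no
else
  clobber_blocked=yes
fi
set +o noclobber   # restore the default behavior
echo "overwrite blocked by noclobber: $clobber_blocked"
```

The file ends up containing only "third run": the append added a line, the overwrite discarded both earlier lines, and the noclobber attempt was refused.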
echo: Print Text
echo prints text to the terminal or into a file.
echo "Hello, HiPerGator"
# Write a simple header into a new file
echo "sample_id,condition,batch" > metadata.csv
Getting Help
man: Manual Pages
man ls
man opens the full manual for a command. Navigate with the same keys as less, and press q to quit.
--help
Most commands also accept a --help flag for a shorter summary:
ls --help
cp --help
R Analogy: ?
man ls is the shell equivalent of ?ls in R; it's the built-in help system.
which: Find Where a Command Lives
which R
which python
/apps/compilers/gcc/12.2.0/R/4.4.1/bin/R
This tells you which version of a program is currently active, which is very useful on HPC systems where multiple versions may be installed.
Working on HiPerGator
HiPerGator uses the SLURM workload manager to schedule and run computational jobs. Rather than running analyses directly (which would use shared login-node resources), you submit jobs to a queue, and SLURM allocates compute nodes for you.
Connecting via SSH
From your local terminal:
ssh username@hpg.rc.ufl.edu
You'll be prompted for your password and Duo two-factor authentication.
The Module System
HiPerGator uses Lmod (Environment Modules) to manage software. Rather than having all software installed globally, you load only what you need.
# See what modules are currently loaded
module list
# Search for available versions of a package
module spider R
# Load a specific module
module load R
# Unload a module
module unload R
# Remove all loaded modules (start fresh)
module purge
Filesystems on HiPerGator
| Filesystem | Location | Best For | Notes |
|---|---|---|---|
| Home | /home/username | Scripts, config files | Small quota (~40 GB), backed up |
| Blue | /blue/cancercenter/username | Primary project data and results | Large quota, not backed up |
| Orange | /orange/cancercenter/username | Long-term storage, archiving | Slower I/O than blue |
Tip: Run large pipelines (like nf-core/methylseq) with data on /blue. Use /orange for archiving completed projects.
SLURM: Submitting Jobs
Key SLURM Commands
| Command | What it does |
|---|---|
| sbatch script.sh | Submit a job script to the queue |
| squeue -u $USER | Check the status of your jobs |
| scancel <jobid> | Cancel a running or pending job |
| sinfo | View available partitions and node status |
Anatomy of a SLURM Script
A SLURM batch script is a bash script with special #SBATCH header lines that tell the scheduler what resources you need.
#!/bin/bash
#SBATCH --job-name=my_analysis # Name shown in the queue
#SBATCH --nodes=1 # Number of nodes (usually 1)
#SBATCH --ntasks=1 # Number of parallel tasks
#SBATCH --cpus-per-task=4 # CPU cores per task
#SBATCH --mem=32gb # Memory to reserve
#SBATCH --time=04:00:00 # Max wall time (HH:MM:SS)
#SBATCH --output=%x_%j.log # Log file (%x=jobname, %j=jobid)
#SBATCH --account=cancercenter-dept # Billing account
#SBATCH --qos=cancercenter-dept # Quality of service
#SBATCH --mail-type=END,FAIL # Email on job end or failure
#SBATCH --mail-user=username@ufl.edu # Your email, written out in full (sbatch does not expand $USER in #SBATCH lines)
module purge
module load R/4.4.1
Rscript scripts/analysis.R
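Before submitting, a cheap sanity check is to let bash parse the script without running it. The sketch below writes a shortened version of the script above to a scratch directory and checks it with bash -n. The file name my_analysis.sbatch is an assumption, and this check only catches shell syntax errors: bash treats #SBATCH lines as comments, so it cannot tell you whether the resource requests are valid.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

# Write a trimmed-down job script (quoted 'EOF' keeps the text exactly as typed).
cat > my_analysis.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=my_analysis
#SBATCH --cpus-per-task=4
#SBATCH --mem=32gb
#SBATCH --time=04:00:00
#SBATCH --output=%x_%j.log

module purge
module load R/4.4.1
Rscript scripts/analysis.R
EOF

# bash -n reads and parses the script but executes nothing.
if bash -n my_analysis.sbatch; then
  syntax_ok=yes
else
  syntax_ok=no
fi

# Count the scheduler directives as a quick visual check.
n_directives=$(grep -c '^#SBATCH' my_analysis.sbatch)
echo "syntax ok: $syntax_ok, #SBATCH lines: $n_directives"

# On the cluster, the next step would be:
#   sbatch my_analysis.sbatch
```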
Running RStudio on HiPerGator
Rather than working in a plain terminal, you can run a full RStudio session on a HiPerGator compute node and access it through your web browser, giving you a familiar environment while using the cluster's compute resources.
Overview
The process works in three stages, which the steps below walk through in detail:
- Submit a SLURM job that starts an RStudio Server (rserver) on a compute node
- Set up an SSH tunnel from your laptop to that compute node
- Open your browser to http://localhost:8080
Step 1: Create the SLURM Script
Create a file called rserver.sbatch in your home directory with the following content:
#!/bin/bash
#SBATCH --job-name=rserver
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=8gb
#SBATCH --time=02:00:00
#SBATCH --output=rserver_%j.log
#SBATCH --account=cancercenter-dept
#SBATCH --qos=cancercenter-dept
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=username@ufl.edu # Write your address out; $USER is not expanded here
module purge
module load R
rserver
Tip: You can adjust --mem and --time based on your session needs. For interactive data exploration, 8 to 16 GB and 2 to 4 hours is usually sufficient. For memory-intensive work, you can go up to --mem=64gb.
You can create this file directly from the terminal:
# Open nano text editor to create the file
nano rserver.sbatch
Paste the content above, then press Ctrl+O to save and Ctrl+X to exit.
Step 2: Submit the Job
sbatch rserver.sbatch
Submitted batch job 12345678
SLURM will print a job ID (e.g., 12345678). The job may take a moment to start depending on queue wait times. Check its status with:
squeue -u $USER
JOBID PARTITION NAME USER ST TIME NODES NODELIST
12345678 hpg2-comp rserver username R 0:23 1 c12345a-s42
Once ST shows R (Running), move on.
Step 3: Read the Log File
Once the job is running, read its log file to get the SSH tunnel command:
cat rserver_12345678.log
The log will contain a line like:
ssh -N -L 8080:c12345a-s42.ufhpc:37546 username@hpg.rc.ufl.edu
Copy this entire line; you'll run it on your local machine (not on HiPerGator).
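If the log is long, grep can pull out just the tunnel command. This sketch runs against a stand-in log created in a scratch directory: the ssh line is the example shown above, while the other log line is a placeholder invented for the demonstration, so your real log will look different.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

# Stand-in log file (the first line is a made-up placeholder).
cat > rserver_12345678.log <<'EOF'
placeholder: other output written by the job
ssh -N -L 8080:c12345a-s42.ufhpc:37546 username@hpg.rc.ufl.edu
EOF

# -m 1 stops after the first match; ^ anchors the pattern to the line start.
tunnel_cmd=$(grep -m 1 '^ssh -N -L' rserver_12345678.log)
echo "$tunnel_cmd"
```

The pattern assumes the tunnel command starts at the beginning of a line, as in the example; adjust it if your log indents or prefixes that line.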
Step 4: Open the SSH Tunnel
Open a new terminal window on your local computer (don't close the HiPerGator session). Paste and run the ssh command from the log:
ssh -N -L 8080:c12345a-s42.ufhpc:37546 username@hpg.rc.ufl.edu
| Part | Meaning |
|---|---|
| -N | Don't execute a remote command; just forward the port |
| -L 8080:... | Forward your local port 8080 to the compute node |
| username@hpg.rc.ufl.edu | Your HiPerGator login |
Enter your password and complete Duo authentication. The terminal will appear to hang β this is normal. The tunnel is active as long as this window is open. Do not close it.
Step 5: Open RStudio in Your Browser
Open any web browser on your local computer and navigate to:
http://localhost:8080
You should see an RStudio login page. Log in with your HiPerGator credentials.
Your RStudio Session Is Running on the Cluster: Everything you do in this RStudio session (loading data, running models, installing packages) is executing on the HiPerGator compute node, not your laptop. You can load files from /blue/cancercenter/username/ just as you would from a local path.
Ending Your Session
When you're finished:
- Save your work and close RStudio in the browser
- In the tunnel terminal on your laptop, press Ctrl+C to close the SSH tunnel
- Cancel the SLURM job if time remains:
scancel 12345678
Warning: If you simply close the tunnel window without canceling the job, the RStudio server continues running on the cluster (consuming your allocation) until the --time limit is reached.
Putting It All Together: A Typical Workflow
Here's what a typical session on HiPerGator might look like for a biostatistician starting a new project:
# 1. Log in to HiPerGator
ssh username@hpg.rc.ufl.edu
# 2. Navigate to your project space on /blue
cd /blue/cancercenter/username
# 3. Create a structured project directory
mkdir -p my_project/{data/raw,data/processed,scripts,results,logs}
# 4. Check what you've created
ls -R my_project
# 5. Copy or link to shared data
ln -s /blue/cancercenter/shared/cohort_data my_project/data/raw/cohort_data
# 6. Transfer a local script (done from your local machine)
# scp local_analysis.R username@hpg.rc.ufl.edu:/blue/cancercenter/username/my_project/scripts/
# 7. Submit an RStudio session to work interactively
sbatch rserver.sbatch
# 8. Check the log once it starts
tail -f rserver_*.log
# 9. Copy the SSH tunnel line, open it on your laptop, and go to http://localhost:8080
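Step 3 relies on brace expansion, a bash feature that has not appeared earlier in this tutorial: bash rewrites prefix/{a,b} into prefix/a prefix/b before mkdir ever runs. The sketch below repeats that step in a scratch directory so you can see what gets created. It assumes you are running bash, since other shells may not expand braces.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

# echo shows what bash turns the braces into before the command runs.
echo my_project/{data/raw,scripts}

# One mkdir -p call creates the whole project skeleton.
mkdir -p my_project/{data/raw,data/processed,scripts,results,logs}

# List every directory that now exists, including my_project itself.
created_dirs=$(find my_project -type d | sort)
n_dirs=$(( $(printf '%s\n' "$created_dirs" | wc -l) ))
echo "$created_dirs"
echo "directories created: $n_dirs"
```

The count includes my_project and the intermediate data/ directory, which mkdir -p creates automatically as parents of data/raw and data/processed.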
Tips and Best Practices
- Tab complete everything. It prevents typos in file paths and saves time.
- Use the up arrow to scroll through command history; you rarely need to retype a long command.
- history prints your recent command history; history | grep sbatch finds all sbatch commands you've run.
- Ctrl+C cancels a running command if something goes wrong or hangs.
- Ctrl+L or clear clears the terminal screen.
- Never run jobs on the login node. Use sbatch for anything computationally intensive.
- Be careful with rm -r. On a shared filesystem, deleted files are gone forever. When in doubt, move things to an archive/ folder first.
- Check your quota periodically with the squota utility available in our group's shared tools.
Quick Reference
Filesystem Navigation
| Command | Action |
|---|---|
| pwd | Print current directory |
| ls -lh | List directory contents with sizes |
| cd <dir> | Change directory |
| cd .. | Go up one level |
| cd ~ | Go to home directory |
File Management
| Command | Action |
|---|---|
| mkdir -p <dir> | Create directory (and parents) |
| cp <src> <dst> | Copy file |
| cp -r <src> <dst> | Copy directory recursively |
| mv <src> <dst> | Move or rename |
| rm <file> | Delete file (no undo!) |
| rm -r <dir> | Delete directory recursively |
| ln -s <target> <link> | Create symbolic link |
Viewing Files
| Command | Action |
|---|---|
| cat <file> | Print file contents |
| less <file> | Scroll through a file |
| head -n 20 <file> | First 20 lines |
| tail -f <file> | Watch file update in real time |
| wc -l <file> | Count lines |
| grep "pattern" <file> | Search inside a file |
SLURM
| Command | Action |
|---|---|
| sbatch script.sbatch | Submit a job |
| squeue -u $USER | Check your job status |
| scancel <jobid> | Cancel a job |
| sacct -j <jobid> | View completed job accounting info |
Getting Help
| Command | Action |
|---|---|
| man <command> | Full manual page |
| <command> --help | Quick help summary |
| which <program> | Find where a program is installed |
| module spider <name> | Search for available software modules |
Next Steps
Once you're comfortable with the basics covered here, explore:
- nano or vim: Terminal text editors for editing scripts directly on the cluster
- screen or tmux: Keep sessions running after you disconnect from SSH
- Shell scripting: Writing .sh scripts to automate pipelines (like writing R functions to wrap repeated code)
- awk and sed: Powerful text-processing tools for manipulating tabular data
- scp and rsync: Transfer files to and from HiPerGator
- SLURM arrays: Submit hundreds of parallel jobs with a single sbatch command
Resources
- UF Research Computing HiPerGator Documentation
- SLURM Documentation
- The Linux Command Line (free book) by William Shotts
- Software Carpentry: The Unix Shell
- Explain Shell: paste any command to get a plain-English explanation