<- kefj_datetime[kefj_site == ???]
???_datetime <- ???_datetime[???:???] - ???_datetime[???:???]
???_interval e(???) t???
PROG101: Intro to R and RStudio
This is the first module of the Programming track. By the end of this module, you’ll learn how to:
Write and run R code in the RStudio IDE
Store and access data in vectors
Call functions to manipulate data
Pre-class preparation
Set up the PROG101 module on your computer (see Module Setup and Submission). There you’ll find the guided notes and exercises to accompany these recorded lectures.
Lectures
Introduction to R and RStudio
Expressions and variables
Using functions
Basic vectors
Working with multiple vectors
In-class activity
Extreme temperatures in an Alaskan intertidal zone
The intertidal zone is where the marine and terrestrial biomes meet. Organisms in this habitat must be able to withstand saltwater immersion, air exposure, and the impact of crashing waves. Some intertidal species are particularly important to both human and ecological communities, including bivalve shellfish like oysters, clams, and mussels. Bivalves may experience direct negative impacts from rising ocean temperatures, but they may receive indirect positive impacts from declining predator populations. From 2014-2016, an extreme marine heatwave gripped the North Pacific, popularly called “The Blob”. Around the same time, in 2013-2014, an outbreak of sea star wasting disease decimated sea stars along the North American Pacific coast, dramatically reducing the populations of these important bivalve predators. These two events could potentially push Alaskan mussel populations in opposing directions, which raises the question: have bivalves been winners or losers in a changing climate? Traiger et al. investigated this question using long-term monitoring data collected around the Gulf of Alaska (Traiger et al. 2022). Key to their study was an analysis of extreme air temperature. In this activity, you will replicate that part of their methods. Follow the assignment submission guidelines when you’ve completed this activity.
The template repository for this module is available at MarineCS-100B/prog101.
Guiding questions
Read the abstract and sections 2-2.1 of the methods in Traiger et al., then answer the following questions.
What does KEFJ stand for?
How was temperature monitored?
What’s the difference between absolute temperature and temperature anomaly?
Begin exploring
The temperature and associated data from KEFJ is available in the marinecs100b
R package. Use ?kefj
to open the help page for these data.
How many
kefj_*
vectors are there?How long are they?
What do they represent?
Now it’s time to begin coding! I want you to figure out the most common sampling interval for the temperature readings. First, sketch a plan on a piece of paper. Which vectors will you need to use? How are they related to each other? What operators or functions will you need?
The function table()
will help you out here.
Now, fill in the blanks below to get the most common sampling interval. Add comments describing what each line of code does.
This is your first time dealing with dates and times in R. It’s going to take you a while to get used to them and you’ll definitely pull some hair out along the way. “Why isn’t this simpler??” you’ll ask. I wish it could be! But unfortunately we humans treat time in super weird ways and it’s next to impossible to represent it simply. Read “UTC is enough for everyone …right?” by Zach Holman for a humorous (if crass) breakdown of how we got into such a tangled situation.
Problem decomposition
Look how big the kefj_*
vectors are. That’s way too big to tackle all at once! It’s always a good idea to break big, intractable problems down into smaller, digestible components. Let’s begin by finding the hottest and coldest air temperature readings in the dataset. When and where did these extremes happen? Use plot_kefj()
to visualize the temperature readings for the days containing those readings. Before you start filling in the blanks in the code below, first sketch your plan for answering these questions. Make sure to consider which vectors you’ll need and how they relate to each other! When you start filling in the code below, add comments to each line of code describing what it does.
Check out which.min()
and which.max()
This code chunk introduces two new things in R.
as.POSIXct()
is a function for creating vectors that R recognizes as dates and times. For example, to create a variable representing noon on January 1, 2025 you could write jan1_2025 <- as.POSIXct("2025-01-1 12:00")
. The tz
argument specifies the timezone. "Etc/GMT+8"
is the timezone the kefj
data were collected in.
&
represents “and”. It’s how you combine multiple conditions that have to be true. For example, let’s say I have four people of different ages represented by these vectors:
<- c("Alice", "Bob", "Carla", "Danielle")
people <- c(19, 16, 22, 29) ages
Then I can find which people are older than 20 and younger than 25 like this:
people[ages > 20 & ages < 25]
# Plot the hottest day
<- ???(kefj_temperature)
hottest_idx <- ???[hottest_idx]
hottest_time <- kefj_site[???]
??? <- as.POSIXct("???", tz = "Etc/GMT+8")
hotday_start <- as.POSIXct("???", tz = "Etc/GMT+8")
hotday_end <- ??? == hottest_site &
hotday_idx >= hotday_start &
??? <= hotday_end
??? <- ???[hotday_idx]
hotday_datetime <- ???
hotday_temperature <- ???
hotday_exposure plot_kefj(???, ???, ???)
# Repeat for the coldest day
Examine your visualizations. What patterns do you notice in time, temperature, and exposure? Do those patterns match your intuition, or do they differ?
Now that you’ve visualized the extreme temperatures you’re ready to begin calculating extreme temperature exposure. Re-read section 2.1 of Traiger et al. How did they define extreme heat exposure? Translate their written description to code and calculate the extreme heat exposure for the hottest day. Compare your answer to the visualization you made. Does it look right to you? Repeat this process for extreme cold exposure on the coldest day.
Putting it back together (optional)
You’ll see more about problem decomposition in PROG103. If you’d like a preview, try the following.
Let’s go from analyzing a single day to a whole season. Re-read how Traiger et al. quantified extreme hot and cold air temperature exposures at the seasonal scale. Make a sketch of how you would do that for a single site and season.
Pick one site and one season. What were the extreme heat and cold exposure at that site in that season? Repeat for a different site and a different season.
Traiger et al. also calculated water temperature anomalies. Consider how you could do that. Make a sketch showing which vectors you would need and how you would use them. Write code to get the temperature anomalies for one site in one season in one year.
Recap and next steps
In this module you started using R to represent and manipulate data. That’s a critical skill for a marine data scientist!
Future modules in this track will build on what you’ve learned here.
In this module, you called built-in R functions like
which.min()
andtable()
. In PROG102 you’ll learn how to write your own functions to encapsulate tasks like calculating extreme temperature exposure.PROG103 will cover branches and loops. Recall how you had to copy-paste-edit code in this module to apply the same task to different inputs (e.g., extreme hot vs extreme cold, one site vs another). That’s a time-consuming and error-prone way of doing things. Branches and loops are the solution for making choices and repeating operations.
Fill out the PROG101 reflection to complete this module. Well done!