Lecture 3
Duke University
STA 199 - Fall 2025
September 2, 2025
Office hours are posted on the course website!
If you can follow along with today’s application exercise steps, great! If something doesn’t work as expected, ask me or a TA during the exercise. We’ll either:
What is this a visualization of?


Scan the QR code or go to app.wooclap.com/sta199. Log in with your Duke NetID.
Last time:
We introduced you to the course toolkit.
You cloned your ae repositories and started making some updates in your Quarto documents.
You committed and pushed your changes back – at least most of you did!
Today:
You will wrap up that application exercise and commit and push your final changes.
We will introduce data visualization.
You will pull to get today’s application exercise file.
You will work on the new application exercise on data visualization, commit your changes, and push them.
Option 2:
Go to RStudio and open the document ae-01-income-inequality.qmd.

Once we made changes to our Quarto document, we
went to the Git pane in RStudio
staged our changes by clicking the checkboxes next to the relevant files
committed our changes with an informative commit message
pulled from GitHub to make sure we had the latest version of our repo
pushed our changes to our application exercise repos
confirmed on GitHub that we could see our changes pushed from RStudio
Grab one before you leave!

Remember this visualization from the code along video – what was it about?


How the sausage is made!
library(tidyverse) # dplyr verbs and lubridate's year()
library(unvotes)   # un_votes, un_roll_calls, un_roll_call_issues

us_uk_tr_votes <- un_votes |>
  inner_join(un_roll_calls, by = "rcid") |>
  inner_join(un_roll_call_issues, by = "rcid", relationship = "many-to-many") |>
  filter(country %in% c("United Kingdom", "United States", "Turkey")) |>
  mutate(year = year(date)) |>
  group_by(country, year, issue) |>
  summarize(percent_yes = mean(vote == "yes"), .groups = "drop")
Note
Let’s leave these details aside for a bit; we’ll revisit this code later in the semester. For now, let’s agree that we need to do some “data wrangling” to get the data into the right format for the plot we want to create. Just note that we named the data frame we’ll visualize us_uk_tr_votes.
# A tibble: 1,212 × 4
   country  year issue                        percent_yes
   <chr>   <dbl> <fct>                              <dbl>
 1 Turkey   1946 Colonialism                        0.8  
 2 Turkey   1946 Economic development               0.6  
 3 Turkey   1946 Human rights                       0    
 4 Turkey   1947 Colonialism                        0.222
 5 Turkey   1947 Economic development               0.5  
 6 Turkey   1947 Palestinian conflict               0.143
 7 Turkey   1948 Colonialism                        0.417
 8 Turkey   1948 Arms control and disarmament       0    
 9 Turkey   1948 Economic development               0.375
10 Turkey   1948 Human rights                       0.167
# ℹ 1,202 more rows
Map year to the x aesthetic
Map percent_yes to the y aesthetic
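A minimal sketch of these two mapping steps, assuming ggplot2 is installed. So that it runs on its own, it uses a small stand-in data frame, votes_sample (a hypothetical name), built from the ten Turkey rows printed above rather than the full us_uk_tr_votes.

```r
library(ggplot2)

# Hypothetical stand-in for us_uk_tr_votes: the ten Turkey rows printed above
# (the issue column is dropped because these plots do not use it)
votes_sample <- data.frame(
  country = "Turkey",
  year = rep(c(1946, 1947, 1948), times = c(3, 3, 4)),
  percent_yes = c(0.8, 0.6, 0, 0.222, 0.5, 0.143, 0.417, 0, 0.375, 0.167)
)

# Step 1: map year to the x aesthetic
p_x <- ggplot(votes_sample, aes(x = year))

# Step 2: additionally map percent_yes to the y aesthetic
p_xy <- ggplot(votes_sample, aes(x = year, y = percent_yes))

# Neither object has a geom yet, so printing them shows axes but no points
```

Printing p_x or p_xy gives a panel with the mapped axes only; the data appear once a geom is added.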
Aesthetics are visual properties of a plot
In the grammar of graphics, variables from the data frame are mapped to aesthetics
It’s common practice in R to omit the names of the first two arguments of a function; for ggplot(), these are data and mapping.
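A sketch of the two equivalent spellings, using the same hypothetical votes_sample excerpt so it runs without the unvotes package. Because data and mapping are the first two arguments of ggplot(), R can match them by position.

```r
library(ggplot2)

# Hypothetical excerpt: the ten Turkey rows printed above (issue column dropped)
votes_sample <- data.frame(
  country = "Turkey",
  year = rep(c(1946, 1947, 1948), times = c(3, 3, 4)),
  percent_yes = c(0.8, 0.6, 0, 0.222, 0.5, 0.143, 0.417, 0, 0.375, 0.167)
)

# Every argument named explicitly
p_named <- ggplot(data = votes_sample, mapping = aes(x = year, y = percent_yes))

# Names of the first two arguments omitted; R matches them by position
p_positional <- ggplot(votes_sample, aes(x = year, y = percent_yes))
```

Both calls describe the same plot; the shorter form is the one used in the rest of these slides.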
with a geom
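Adding a geom is what actually draws the mapped data. A minimal sketch with geom_point(), again on the hypothetical votes_sample excerpt of the rows printed above:

```r
library(ggplot2)

# Hypothetical excerpt: the ten Turkey rows printed above (issue column dropped)
votes_sample <- data.frame(
  country = "Turkey",
  year = rep(c(1946, 1947, 1948), times = c(3, 3, 4)),
  percent_yes = c(0.8, 0.6, 0, 0.222, 0.5, 0.143, 0.417, 0, 0.375, 0.167)
)

# The + operator adds a layer; geom_point() draws one point per row
p <- ggplot(votes_sample, aes(x = year, y = percent_yes)) +
  geom_point()
```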
Map country to the color aesthetic
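Mapping a third variable works the same way inside aes(). In this sketch on the hypothetical votes_sample excerpt every row is Turkey, so all points share one color; with the full us_uk_tr_votes you would get one color per country.

```r
library(ggplot2)

# Hypothetical excerpt: the ten Turkey rows printed above (issue column dropped)
votes_sample <- data.frame(
  country = "Turkey",
  year = rep(c(1946, 1947, 1948), times = c(3, 3, 4)),
  percent_yes = c(0.8, 0.6, 0, 0.222, 0.5, 0.143, 0.417, 0, 0.375, 0.167)
)

# country is mapped to color, so ggplot2 picks a color per country and adds a legend
p <- ggplot(votes_sample, aes(x = year, y = percent_yes, color = country)) +
  geom_point()

# aes() stores the color mapping under ggplot2's internal spelling "colour"
```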
with another geom
ggplot(us_uk_tr_votes, aes(x = year, y = percent_yes, color = country)) +
  geom_point() +
  geom_smooth()
geom_smooth() printed the following message:
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Which of the following modifications will change the transparency of the points in the plot?


with alpha
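Transparency is controlled by alpha, which ranges from 0 (fully transparent) to 1 (opaque). Here it is set to one constant for all points, so it goes outside aes(); the value 0.5 is an arbitrary illustrative choice. A sketch on the hypothetical votes_sample excerpt:

```r
library(ggplot2)

# Hypothetical excerpt: the ten Turkey rows printed above (issue column dropped)
votes_sample <- data.frame(
  country = "Turkey",
  year = rep(c(1946, 1947, 1948), times = c(3, 3, 4)),
  percent_yes = c(0.8, 0.6, 0, 0.222, 0.5, 0.143, 0.417, 0, 0.375, 0.167)
)

p <- ggplot(votes_sample, aes(x = year, y = percent_yes, color = country)) +
  geom_point(alpha = 0.5) +  # same fixed transparency for every point
  geom_smooth()
```

With only three distinct years in this excerpt, the smoothed curve is not meaningful and printing may produce warnings; the sketch is only about where alpha goes.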
with se = FALSE
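Setting se = FALSE removes the confidence band (the shaded ribbon) that geom_smooth() draws around each curve by default. A sketch on the same hypothetical excerpt:

```r
library(ggplot2)

# Hypothetical excerpt: the ten Turkey rows printed above (issue column dropped)
votes_sample <- data.frame(
  country = "Turkey",
  year = rep(c(1946, 1947, 1948), times = c(3, 3, 4)),
  percent_yes = c(0.8, 0.6, 0, 0.222, 0.5, 0.143, 0.417, 0, 0.375, 0.167)
)

p <- ggplot(votes_sample, aes(x = year, y = percent_yes, color = country)) +
  geom_point(alpha = 0.5) +
  geom_smooth(se = FALSE)  # draw the smoothed curve without its confidence band
```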
We built a plot layer by layer
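Because a ggplot is an ordinary R object, the same plot can also be built in explicit steps, with each + adding one layer. A sketch on the hypothetical votes_sample excerpt:

```r
library(ggplot2)

# Hypothetical excerpt: the ten Turkey rows printed above (issue column dropped)
votes_sample <- data.frame(
  country = "Turkey",
  year = rep(c(1946, 1947, 1948), times = c(3, 3, 4)),
  percent_yes = c(0.8, 0.6, 0, 0.222, 0.5, 0.143, 0.417, 0, 0.375, 0.167)
)

p <- ggplot(votes_sample, aes(x = year, y = percent_yes, color = country))
n_layers_start <- length(p$layers)  # only data and mappings so far, no layers
p <- p + geom_point(alpha = 0.5)    # layer 1: semi-transparent points
p <- p + geom_smooth(se = FALSE)    # layer 2: smoothed curves without a band
```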

