Lecture 4
Duke University
STA 199 - Fall 2025
September 4, 2025
Which of the following is true about the code below?
mtcars is the name of the variable being plotted on the x-axis
map()
Scan the QR code or go to app.wooclap.com/sta199. Log in with your Duke NetID.
Continue to respond to Wooclap questions during class and remember that attendance and participation are part of your grade – if we haven’t gotten any responses from you yet, and you didn’t just add the class, you’ve already heard from us!
Following along with code-along videos: See https://sta199-f25.github.io/computing-code-alongs.html. Clone the code-alongs repository and work through the relevant .qmd files as you watch the videos. Use the project navigator in RStudio to switch between your code-alongs project and your other projects (ae, labs, HWs, etc.).
Lab 1 is on Monday, and HW 1 is due next Sunday.
The Bechdel test (or the Bechdel-Wallace test), named after cartoonist Alison Bechdel, is a measure of the representation of women in film and other fiction. The test asks whether a work features at least two women who have a conversation about something other than a man. Some versions of the test also require that those two women have names.
Dykes to Watch Out For - 1985
Film passes if…
We did a statistical analysis of films to test two claims: first, that films that pass the Bechdel test — featuring women in stronger roles — see a lower return on investment, and second, that they see lower gross profits. We found no evidence to support either claim.
ae-02-bechdel-dataviz
Go to your ae project in RStudio.
Plots are constructed with ggplot(), and layers are added with +.
Cell labels are helpful for describing what the code is doing, for jumping between code cells in the editor, and for troubleshooting.
message: false hides any messages emitted by the code in your rendered document.
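As a minimal sketch of these conventions (assuming ggplot2 and tibble are installed), the cell below combines a label, message: false, and a ggplot built layer by layer with +. The label budget-vs-gross is a hypothetical name, and bechdel_excerpt is a hypothetical five-row stand-in for bechdel whose values are copied from the printed bechdel rows in these notes, not the full data set.

```r
#| label: budget-vs-gross
#| message: false

# In a Quarto .qmd file, these "#|" lines go at the top of an {r} cell:
# label names the cell, and message: false keeps messages (such as those
# printed when packages load) out of the rendered document.
# Run as a plain R script, they are ordinary comments.
library(ggplot2)
library(tibble)

# Hypothetical excerpt: five rows copied from the printed bechdel output
bechdel_excerpt <- tibble(
  budget_2013 = c(13000000, 45658735, 12000000, 225000000, 130000000),
  gross_2013  = c(67878146, 55078343, 102648667, 184166317, 304895295),
  binary      = c("FAIL", "PASS", "PASS", "FAIL", "FAIL")
)

# ggplot() sets up the plot; each additional layer is added with +
p <- ggplot(bechdel_excerpt, aes(x = budget_2013, y = gross_2013, color = binary)) +
  geom_point() +
  labs(color = "Bechdel test")

p
```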
Start with the bechdel data frame, keep the movies with roi greater than 400 (gross is more than 400 times budget), and select the columns title, roi, budget_2013, gross_2013, year, and clean_test:
# A tibble: 3 × 6
title roi budget_2013 gross_2013 year clean_test
<chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 Paranormal Activity 671. 505595 339424558 2007 dubious
2 The Blair Witch Proje… 648. 839077 543776715 1999 ok
3 El Mariachi 583. 11622 6778946 1992 nowomen
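A sketch of a pipeline that produces output of this shape, assuming dplyr and tibble are installed. Instead of the full bechdel data frame, it uses a hypothetical five-row excerpt (bechdel_excerpt) with values copied from printed output in these notes; roi is recomputed as gross_2013 / budget_2013, which matches the printed ROIs (for example, 339424558 / 505595 is about 671).

```r
library(dplyr)
library(tibble)

# Hypothetical excerpt: rows copied from printed bechdel output, not the full data
bechdel_excerpt <- tibble(
  title = c("21 & Over", "About Time", "Paranormal Activity",
            "The Blair Witch Project", "El Mariachi"),
  year = c(2013, 2013, 2007, 1999, 1992),
  gross_2013 = c(67878146, 102648667, 339424558, 543776715, 6778946),
  budget_2013 = c(13000000, 12000000, 505595, 839077, 11622),
  clean_test = c("notalk", "ok", "dubious", "ok", "nowomen")
) |>
  # roi = gross / budget, consistent with the printed values
  mutate(roi = gross_2013 / budget_2013)

high_roi <- bechdel_excerpt |>
  # keep movies whose gross is more than 400 times their budget
  filter(roi > 400) |>
  # keep (and reorder) only the columns of interest
  select(title, roi, budget_2013, gross_2013, year, clean_test)

high_roi
```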
In data transformation with the pipe operator |>, what does the operator do?
|>
The pipe operator passes what comes before it into the function that comes after it as the first argument in that function.
|> vs. +
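To see what the pipe does, compare the same call written with and without |>. This sketch assumes dplyr and tibble are installed and uses a hypothetical two-row data frame (movies) built from printed bechdel rows.

```r
library(dplyr)
library(tibble)

# Hypothetical two-row data frame built from printed bechdel rows
movies <- tibble(
  title = c("Dredd 3D", "21 & Over"),
  binary = c("PASS", "FAIL")
)

# Without the pipe: the data frame is written as filter()'s first argument
without_pipe <- filter(movies, binary == "PASS")

# With the pipe: |> passes `movies` in as the first argument of filter()
with_pipe <- movies |> filter(binary == "PASS")

# Chaining: nested calls read inside-out, piped calls read top to bottom
nested <- select(filter(movies, binary == "PASS"), title)
piped <- movies |> filter(binary == "PASS") |> select(title)

identical(without_pipe, with_pipe)
identical(nested, piped)
```

Both comparisons return TRUE because each pair runs the same function calls; the pipe only changes how the code reads. By contrast, + does not pass data along: it adds layers to a ggplot.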
Find movies that pass the Bechdel test and display their titles and ROIs in descending order of ROI.
Start with the bechdel data frame:
# A tibble: 1,615 × 7
title year gross_2013 budget_2013 roi binary clean_test
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 21 & Over 2013 67878146 13000000 5.22 FAIL notalk
2 Dredd 3D 2012 55078343 45658735 1.21 PASS ok
3 12 Years a S… 2013 211714070 20000000 10.6 FAIL notalk
4 2 Guns 2013 208105475 61000000 3.41 FAIL notalk
5 42 2013 190040426 40000000 4.75 FAIL men
6 47 Ronin 2013 184166317 225000000 0.819 FAIL men
7 A Good Day t… 2013 371598396 92000000 4.04 FAIL notalk
8 About Time 2013 102648667 12000000 8.55 PASS ok
9 Admission 2013 36014634 13000000 2.77 PASS ok
10 After Earth 2013 304895295 130000000 2.35 FAIL notalk
# ℹ 1,605 more rows
Filter for rows where binary is equal to "PASS":
# A tibble: 753 × 7
title year gross_2013 budget_2013 roi binary clean_test
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 Dredd 3D 2012 55078343 45658735 1.21 PASS ok
2 About Time 2013 102648667 12000000 8.55 PASS ok
3 Admission 2013 36014634 13000000 2.77 PASS ok
4 American Hust… 2013 397915817 40000000 9.95 PASS ok
5 August: Osage… 2013 87609748 25000000 3.50 PASS ok
6 Beautiful Cre… 2013 75392809 50000000 1.51 PASS ok
7 Blue Jasmine 2013 101793664 18000000 5.66 PASS ok
8 Carrie 2013 120268278 30000000 4.01 PASS ok
9 Despicable Me… 2013 1338831390 76000000 17.6 PASS ok
10 Elysium 2013 379242208 120000000 3.16 PASS ok
# ℹ 743 more rows
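A minimal sketch of this filtering step, assuming dplyr and tibble are installed. bechdel_excerpt is a hypothetical six-row stand-in for bechdel, with values copied from rows printed in these notes and roi recomputed as gross_2013 / budget_2013.

```r
library(dplyr)
library(tibble)

# Hypothetical excerpt of bechdel (values copied from printed rows)
bechdel_excerpt <- tibble(
  title = c("21 & Over", "Dredd 3D", "About Time", "47 Ronin",
            "The Blair Witch Project", "Chasing Amy"),
  year = c(2013, 2012, 2013, 2013, 1999, 1997),
  gross_2013 = c(67878146, 55078343, 102648667, 184166317, 543776715, 39417963),
  budget_2013 = c(13000000, 45658735, 12000000, 225000000, 839077, 362810),
  binary = c("FAIL", "PASS", "PASS", "FAIL", "PASS", "PASS")
) |>
  mutate(roi = gross_2013 / budget_2013)

# == tests equality; a single = would be read as a named argument and error
passed <- bechdel_excerpt |>
  filter(binary == "PASS")

passed
```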
Arrange the rows in descending order of roi:
# A tibble: 753 × 7
title year gross_2013 budget_2013 roi binary clean_test
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 The Blair Wit… 1999 543776715 839077 648. PASS ok
2 The Devil Ins… 2012 157289709 1014639 155. PASS ok
3 My Big Fat Gr… 2002 768922942 6475896 119. PASS ok
4 Chasing Amy 1997 39417963 362810 109. PASS ok
5 Slacker 1991 4200140 39349 107. PASS ok
6 Insidious 2010 164379554 1602348 103. PASS ok
7 Paranormal Ac… 2010 280159759 3204696 87.4 PASS ok
8 Paranormal Ac… 2011 322170936 5178454 62.2 PASS ok
9 The Last Exor… 2010 118787648 1922817 61.8 PASS ok
10 Cinderella 1997 246710482 4208591 58.6 PASS ok
# ℹ 743 more rows
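The same hypothetical six-row excerpt (assuming dplyr and tibble are installed, values copied from printed rows) can be used to sketch the filter-then-arrange steps. arrange() sorts in ascending order by default; wrapping the column in desc() reverses that.

```r
library(dplyr)
library(tibble)

# Hypothetical excerpt of bechdel (values copied from printed rows)
bechdel_excerpt <- tibble(
  title = c("21 & Over", "Dredd 3D", "About Time", "47 Ronin",
            "The Blair Witch Project", "Chasing Amy"),
  gross_2013 = c(67878146, 55078343, 102648667, 184166317, 543776715, 39417963),
  budget_2013 = c(13000000, 45658735, 12000000, 225000000, 839077, 362810),
  binary = c("FAIL", "PASS", "PASS", "FAIL", "PASS", "PASS")
) |>
  mutate(roi = gross_2013 / budget_2013)

arranged <- bechdel_excerpt |>
  filter(binary == "PASS") |>
  # desc() puts the largest roi first
  arrange(desc(roi))

arranged
```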
Select columns title and roi:
# A tibble: 753 × 2
title roi
<chr> <dbl>
1 The Blair Witch Project 648.
2 The Devil Inside 155.
3 My Big Fat Greek Wedding 119.
4 Chasing Amy 109.
5 Slacker 107.
6 Insidious 103.
7 Paranormal Activity 2 87.4
8 Paranormal Activity 3 62.2
9 The Last Exorcism 61.8
10 Cinderella 58.6
# ℹ 743 more rows
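Putting the three steps together, a sketch of the full pipeline on the same hypothetical six-row excerpt (assuming dplyr and tibble are installed; values copied from printed rows, not the full bechdel data). Each |> hands the result of one step to the next, so the code reads in the same order as the task: filter, then arrange, then select.

```r
library(dplyr)
library(tibble)

# Hypothetical excerpt of bechdel (values copied from printed rows)
bechdel_excerpt <- tibble(
  title = c("21 & Over", "Dredd 3D", "About Time", "47 Ronin",
            "The Blair Witch Project", "Chasing Amy"),
  gross_2013 = c(67878146, 55078343, 102648667, 184166317, 543776715, 39417963),
  budget_2013 = c(13000000, 45658735, 12000000, 225000000, 839077, 362810),
  binary = c("FAIL", "PASS", "PASS", "FAIL", "PASS", "PASS")
) |>
  mutate(roi = gross_2013 / budget_2013)

result <- bechdel_excerpt |>
  filter(binary == "PASS") |>  # movies that pass the Bechdel test
  arrange(desc(roi)) |>        # highest ROI first
  select(title, roi)           # keep only titles and ROIs

result
```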
Build cakes (ggplot)
Stack dolls (pipe |>)
Master these constructs, and everything will be A-Ok!
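The two constructs often meet in one block: data wrangling steps are chained with |>, the result is piped into ggplot() as its data argument, and from that point on layers are added with +. A sketch, assuming dplyr, ggplot2, and tibble are installed; bechdel_excerpt is hypothetical, using rounded roi values as printed in these notes, and binwidth = 2 is an arbitrary choice.

```r
library(dplyr)
library(ggplot2)
library(tibble)

# Hypothetical excerpt: rounded roi values copied from printed bechdel rows
bechdel_excerpt <- tibble(
  title = c("Dredd 3D", "About Time", "Admission", "Blue Jasmine", "Carrie", "21 & Over"),
  roi = c(1.21, 8.55, 2.77, 5.66, 4.01, 5.22),
  binary = c("PASS", "PASS", "PASS", "PASS", "PASS", "FAIL")
)

# Stack dolls: |> passes the data frame from one step to the next ...
p <- bechdel_excerpt |>
  filter(binary == "PASS") |>
  # ... until it becomes ggplot()'s first (data) argument
  ggplot(aes(x = roi)) +
  # Build cakes: from here on, layers are added with +, not |>
  geom_histogram(binwidth = 2)

p
```

Using |> after ggplot() instead of + would try to pass the plot object into geom_histogram() as its first argument, which is not how layers are added.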
Which type of visualization is useful for comparing distributions across groups?