Lecture 2
Duke University
STA 199 - Fall 2025
August 28, 2025
library(ellmer)
chat <- chat_openai()
prompts <- survey |>
filter(!is.na(programming_languages)) |>
pull(programming_languages) |>
list()
type_language <- type_object(
language = type_string(),
experience = type_string()
)
language_experience <- parallel_chat_structured(chat, prompts, type = type_language)
language_experience
language
1 Python, Java, R, C/C++, JavaScript, HTML/CSS, SQL, C#, VBA, MATLAB, Rust, Assembly, Swift, Stata, Bash, Typescript, Web3 Suite (HTML, CSS, JS), Nextflow, SAS, PsycoPy, Mathematica, React, Vue, Yaml, CourseKata, Flutter, Scratch
experience
1 Most students report a range of programming experience, from never having programmed before to being very comfortable in languages such as Python, Java, and R. Many took introductory courses (e.g., CS101, AP CS A) and are comfortable with basic programming constructs. Proficiency is generally highest in Python and Java, though some only have a little experience with languages like R, C/C++, and JavaScript. Quite a few have only used programming languages for specific classes or research projects and may not feel confident after a long break. Some students have experience with more niche or specialized tools such as MATLAB, SAS, and web technologies (HTML, CSS, JavaScript/JS frameworks). Many express a need for a refresher if not used recently. In summary, while a core group is comfortable with at least one language, especially Python or Java, many others have only basic to moderate experience and express the desire to gain more proficiency.
Nextflow – didn’t show up before due to case mismatch (Netxflow vs. nextflow)
survey |>
mutate(programming_languages = str_to_lower(programming_languages)) |>
filter(str_detect(programming_languages, "nextflow")) |>
select(programming_languages)
# A tibble: 1 × 1
programming_languages
<chr>
1 r (quite comfortable), bash (quite comfortable), python (less exper…
PsychoPy – misspelled by AI (PsycoPy vs. PsychoPy) + case mismatch
survey |>
mutate(programming_languages = str_to_lower(programming_languages)) |>
filter(str_detect(programming_languages, "psychopy")) |>
select(programming_languages)
# A tibble: 2 × 1
programming_languages
<chr>
1 psychopy (haven't used in while), matlab (minimal for fmri processi…
2 i have programmed with python before through psychopy. it was a ver…
CourseKata – not a programming language but a learning platform
It is my intent that students from all diverse backgrounds and perspectives be well-served by this course, that students’ learning needs be addressed both in and out of class, and that the diversity that the students bring to this class be viewed as a resource, strength and benefit.
The Duke Access and Accommodation Services is available to ensure that students are able to engage with their courses and related assignments.
Accoomodations for exams must be arranged through Access and Accommodation Services and exams with accommodations must be taken in the testing center.
I am committed to making all course materials accessible and I’m always learning how to do this better. If any course component is not accessible to you in any way, please don’t hesitate to let me know.
Scan the QR code or go to app.wooclap.com/sta199.
Log in with your Duke NetID.
AI tools for code:
!=
correct/good code.AI tools for narrative: Absolutely not!
AI tools for learning: Sure, but be careful/critical!
To uphold the Duke Community Standard:
I will not lie, cheat, or steal in my academic endeavors;
I will conduct myself honorably in all my endeavors; and
I will act if the Standard is compromised.
This is a large “lecture” but with lots of opportunities and expectations for active learning, participation, and collaboration.
You are expected to show up and participate in all lectures and labs, though there are a good number of each you can miss without penalty.
It’s possible to read the books and watch the videos and get an exposure to the material we cover in the course without showing up, but the experience will be much less rich and you will likely miss out on a lot of learning. That’s why the assessments and workflows are designed to reward and reinforce active participation.
Course operation
Doing data science
By the end of the course, you will be able to…
What does it mean for a data analysis to be “reproducible”?
Short-term goals:
Long-term goals:
Packages: Fundamental units of reproducible R code, including reusable R functions, the documentation that describes how to use them, and sample data1
As of 27 August 2025, there are 22,578 R packages available on CRAN (the Comprehensive R Archive Network)2
We’re going to work with a small (but important) subset of these!
Option 1:
Sit back and enjoy the show!
install.packages()
, once per system:Note
We already pre-installed many of the package you’ll need for this course, so you might go the whole semester without needing to run install.packages()
!
library()
, once per session:If data analysis was cooking…
Installing a package would be like buying ingredients from the store
Loading a package would be like getting the ingredients out of your pantry and setting them on your counter top to be used
aka the package you’ll hear about the most…
penguins
data frame# A tibble: 344 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm
<fct> <fct> <dbl> <dbl> <int>
1 Adelie Torgersen 39.1 18.7 181
2 Adelie Torgersen 39.5 17.4 186
3 Adelie Torgersen 40.3 18 195
4 Adelie Torgersen NA NA NA
5 Adelie Torgersen 36.7 19.3 193
6 Adelie Torgersen 39.3 20.6 190
7 Adelie Torgersen 38.9 17.8 181
8 Adelie Torgersen 39.2 19.6 195
9 Adelie Torgersen 34.1 18.1 193
10 Adelie Torgersen 42 20.2 190
# ℹ 334 more rows
# ℹ 3 more variables: body_mass_g <int>, sex <fct>, year <int>
bill_length_mm
[1] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42.0 37.8 37.8 41.1
[14] 38.6 34.6 36.6 38.7 42.5 34.4 46.0 37.8 37.7 35.9 38.2 38.8 35.3
[27] 40.6 40.5 37.9 40.5 39.5 37.2 39.5 40.9 36.4 39.2 38.8 42.2 37.6
[40] 39.8 36.5 40.8 36.0 44.1 37.0 39.6 41.1 37.5 36.0 42.3 39.6 40.1
[53] 35.0 42.0 34.5 41.4 39.0 40.6 36.5 37.6 35.7 41.3 37.6 41.1 36.4
[66] 41.6 35.5 41.1 35.9 41.8 33.5 39.7 39.6 45.8 35.5 42.8 40.9 37.2
[79] 36.2 42.1 34.6 42.9 36.7 35.1 37.3 41.3 36.3 36.9 38.3 38.9 35.7
[92] 41.1 34.0 39.6 36.2 40.8 38.1 40.3 33.1 43.2 35.0 41.0 37.7 37.8
[105] 37.9 39.7 38.6 38.2 38.1 43.2 38.1 45.6 39.7 42.2 39.6 42.7 38.6
[118] 37.3 35.7 41.1 36.2 37.7 40.2 41.4 35.2 40.6 38.8 41.5 39.0 44.1
[131] 38.5 43.1 36.8 37.5 38.1 41.1 35.6 40.2 37.0 39.7 40.2 40.6 32.1
[144] 40.7 37.3 39.0 39.2 36.6 36.0 37.8 36.0 41.5 46.1 50.0 48.7 50.0
[157] 47.6 46.5 45.4 46.7 43.3 46.8 40.9 49.0 45.5 48.4 45.8 49.3 42.0
[170] 49.2 46.2 48.7 50.2 45.1 46.5 46.3 42.9 46.1 44.5 47.8 48.2 50.0
[183] 47.3 42.8 45.1 59.6 49.1 48.4 42.6 44.4 44.0 48.7 42.7 49.6 45.3
[196] 49.6 50.5 43.6 45.5 50.5 44.9 45.2 46.6 48.5 45.1 50.1 46.5 45.0
[209] 43.8 45.5 43.2 50.4 45.3 46.2 45.7 54.3 45.8 49.8 46.2 49.5 43.5
[222] 50.7 47.7 46.4 48.2 46.5 46.4 48.6 47.5 51.1 45.2 45.2 49.1 52.5
[235] 47.4 50.0 44.9 50.8 43.4 51.3 47.5 52.1 47.5 52.2 45.5 49.5 44.5
[248] 50.8 49.4 46.9 48.4 51.1 48.5 55.9 47.2 49.1 47.3 46.8 41.7 53.4
[261] 43.3 48.1 50.5 49.8 43.5 51.5 46.2 55.1 44.5 48.8 47.2 NA 46.8
[274] 50.4 45.2 49.9 46.5 50.0 51.3 45.4 52.7 45.2 46.1 51.3 46.0 51.3
[287] 46.6 51.7 47.0 52.0 45.9 50.5 50.3 58.0 46.4 49.2 42.4 48.5 43.2
[300] 50.6 46.7 52.0 50.5 49.5 46.4 52.8 40.9 54.2 42.5 51.0 49.7 47.5
[313] 47.6 52.0 46.9 53.5 49.0 46.2 50.9 45.5 50.9 50.8 50.1 49.0 51.5
[326] 49.8 48.1 51.4 45.7 50.7 42.5 52.2 45.2 49.3 50.2 45.6 51.9 46.8
[339] 45.7 55.8 43.5 49.6 50.8 50.2
Scan the QR code or go to app.wooclap.com/sta199. Log in with your Duke NetID.
flipper_length_mm
This can be fixed by using penguins$flipper_length_mm
.
[1] 181 186 195 NA 193 190 181 195 193 190 186 180 182 191 198 185
[17] 195 197 184 194 174 180 189 185 180 187 183 187 172 180 178 178
[33] 188 184 195 196 190 180 181 184 182 195 186 196 185 190 182 179
[49] 190 191 186 188 190 200 187 191 186 193 181 194 185 195 185 192
[65] 184 192 195 188 190 198 190 190 196 197 190 195 191 184 187 195
[81] 189 196 187 193 191 194 190 189 189 190 202 205 185 186 187 208
[97] 190 196 178 192 192 203 183 190 193 184 199 190 181 197 198 191
[113] 193 197 191 196 188 199 189 189 187 198 176 202 186 199 191 195
[129] 191 210 190 197 193 199 187 190 191 200 185 193 193 187 188 190
[145] 192 185 190 184 195 193 187 201 211 230 210 218 215 210 211 219
[161] 209 215 214 216 214 213 210 217 210 221 209 222 218 215 213 215
[177] 215 215 216 215 210 220 222 209 207 230 220 220 213 219 208 208
[193] 208 225 210 216 222 217 210 225 213 215 210 220 210 225 217 220
[209] 208 220 208 224 208 221 214 231 219 230 214 229 220 223 216 221
[225] 221 217 216 230 209 220 215 223 212 221 212 224 212 228 218 218
[241] 212 230 218 228 212 224 214 226 216 222 203 225 219 228 215 228
[257] 216 215 210 219 208 209 216 229 213 230 217 230 217 222 214 NA
[273] 215 222 212 213 192 196 193 188 197 198 178 197 195 198 193 194
[289] 185 201 190 201 197 181 190 195 181 191 187 193 195 197 200 200
[305] 191 205 187 201 187 203 195 199 195 210 192 205 210 187 196 196
[321] 196 201 190 212 187 198 199 201 193 203 187 197 191 203 202 194
[337] 206 189 195 207 202 193 210 198
function(argument)
Functions are (most often) verbs, followed by what they will be applied to in parentheses:
trim
med mean()
Object documentation can be accessed with ?
GitHub is the home for your Git-based projects on the internet – like DropBox but much, much better
We will use GitHub as a platform for web hosting and collaboration (and as our course management system!)
with human readable messages
Option 1:
Sit back and enjoy the show!
Note
You’ll need to stick to this option if you haven’t yet accepted your GitHub invite and don’t have a repo created for you.
Option 2:
Go to the course GitHub organization and clone ae-YOUR-GITHUB-NAME
repo to your container.
Find your application repo, that will always be named using the naming convention assignment_title-YOUR-GITHUB-NAME
, e.g., ae-mine-cetinkaya-rundel
or lab-1-mine-cetinkaya-rundel
.
Click on the green “Code” button, make sure SSH is selected, copy the repo URL
In RStudio, File > New Project > From Version Control > Git
Paste repo URL copied in previous step, then click tab to auto-fill the project name, then click Create Project
If you haven’t done Lab 0, for one time only, type yes
in the pop-up dialogue
Never received GitHub invite \(\rightarrow\) Fill out “Getting to know you survey
Never accepted GitHub invite \(\rightarrow\) Look for it in your email and accept it
Cloning repo fails \(\rightarrow\) Review/redo Lab 0 steps for setting up SSH key
Still no luck? Stay after class today or come by my office hours tomorrow or post on Ed for help
Option 1:
Sit back and enjoy the show!
Note
If you chose (or had to choose) this option for the previous tour, or if you couldn’t clone your repo for any reason, you’ll need to stick to this option.
Option 2:
Go to RStudio and open the document ae-01-income-inequality.qmd
.
Once we made changes to our Quarto document, we
went to the Git pane in RStudio
staged our changes by clicking the checkboxes next to the relevant files
committed our changes with an informative commit message
pushed our changes to our application exercise repos
confirmed on GitHub that we could see our changes pushed from RStudio
Feel free to grab one before you leave!