Lecture 22
Duke University
STA 199 - Fall 2025
November 13, 2025
What is sensitivity also known as?

Scan the QR code or go to app.wooclap.com/sta199. Log in with your Duke NetID.
Projects due tonight, peer evals due tomorrow night
Practice Exam 2 is posted on the course website
Reply to post on Ed about requests for topics / concepts for exam review [thread]
Which of the following best describes the area annotated on the ROC curve?


Scan the QR code or go to app.wooclap.com/sta199. Log in with your Duke NetID.
Which corner of the plot indicates the best model performance?
Go to your ae project in RStudio.
If you haven’t yet done so, make sure all of your changes up to this point are committed and pushed, i.e., there’s nothing left in your Git pane.
If you haven’t yet done so, click Pull to get today’s application exercise file: ae-14-chicago-taxi-classification.qmd.
Work through the application exercise in class, and render, commit, and push your edits.
Split data into training and testing sets (generally 75/25)
Fit models on training data and reduce to a few candidate models
Make predictions on testing data
Evaluate predictions on testing data using appropriate predictive performance metrics
Don’t forget to also consider explainability and domain knowledge when selecting a final model
In a future machine learning course: Cross-validation (partitioning training data into training and validation sets and repeating this many times to evaluate model predictive performance before using the testing data), feature engineering, hyperparameter tuning, more complex models (random forests, gradient boosting machines, neural networks, etc.)
Note
We will only learn about a subset of these in this course, but you can go further into these ideas in STA 210 or STA 221 as well as in various machine learning courses.