Evaluating models

Lecture 22

Dr. Mine Çetinkaya-Rundel

Duke University
STA 199 - Fall 2025

November 13, 2025

Warm-up

While you wait: Participate 📱💻

What is sensitivity also known as?

  • True positive rate
  • True negative rate
  • False positive rate
  • False negative rate
  • Recall

Scan the QR code or go to app.wooclap.com/sta199. Log in with your Duke NetID.
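
Sensitivity is the true positive rate: the proportion of actual positives that the model classifies as positive. In other fields the same quantity is called recall. A minimal sketch in R, using hypothetical confusion-matrix counts (the numbers are made up for illustration):

    # Hypothetical counts from a confusion matrix
    TP <- 40  # true positives: actual positives classified as positive
    FN <- 10  # false negatives: actual positives classified as negative

    # Sensitivity = true positive rate = recall
    sensitivity <- TP / (TP + FN)
    sensitivity
    #> [1] 0.8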

Announcements

  • Projects due tonight, peer evals due tomorrow night

  • Practice Exam 2 is posted on the course website

  • Reply to the Ed post with requests for topics/concepts for exam review [thread]

From last class: Participate 📱💻

Which of the following best describes the area annotated on the ROC curve?

  • Where all positives are classified as positive and all negatives as negative
  • Where true positive rate = false positive rate
  • Where all positives are classified as negative and all negatives as positive

Scan the QR code or go to app.wooclap.com/sta199. Log in with your Duke NetID.

ROC curve

Which corner of the plot indicates the best model performance?
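
The best performance is in the upper-left corner, where the true positive rate is 1 and the false positive rate is 0. Below is a minimal sketch of building a ROC curve with yardstick, using a small set of hypothetical test-set predictions (the outcome and .pred_yes columns are made up for illustration):

    library(tidymodels)

    # Hypothetical test-set predictions: the true class and the
    # predicted probability of the "yes" class
    preds <- tibble(
      outcome   = factor(c("yes", "yes", "no", "yes", "no", "no"),
                         levels = c("yes", "no")),
      .pred_yes = c(0.91, 0.65, 0.40, 0.78, 0.15, 0.55)
    )

    # One (false positive rate, true positive rate) point per threshold;
    # a good model's curve hugs the upper-left corner
    preds |>
      roc_curve(truth = outcome, .pred_yes) |>
      autoplot()

    # Area under the ROC curve: 1 = perfect, 0.5 = no better than chance
    preds |>
      roc_auc(truth = outcome, .pred_yes)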

Next steps

ae-14-chicago-taxi-classification

  • Go to your ae project in RStudio.

  • If you haven’t yet done so, make sure all of your changes up to this point are committed and pushed, i.e., there’s nothing left in your Git pane.

  • If you haven’t yet done so, click Pull to get today’s application exercise file: ae-14-chicago-taxi-classification.qmd.

  • Work through the application exercise in class, and render, commit, and push your edits.

Recap

  • Split data into training and testing sets (generally 75/25; see the sketch after this list)

  • Fit models on the training data and narrow them down to a few candidate models

  • Make predictions on testing data

  • Evaluate predictions on testing data using appropriate predictive performance metrics

    • Linear models: Adjusted R-squared, AIC, etc.
    • Logistic models: False negative and false positive rates, AUC (area under the curve), etc.
  • Don’t forget to also consider explainability and domain knowledge when selecting a final model

  • In a future machine learning course: cross-validation (repeatedly partitioning the training data into training and validation sets to evaluate predictive performance before using the testing data), feature engineering, hyperparameter tuning, and more complex models (random forests, gradient boosting machines, neural networks, etc.)
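
Putting the recap together, here is a minimal sketch of the full workflow in tidymodels. It assumes a hypothetical data frame df with a two-level factor outcome (levels "yes" and "no") and predictors x1 and x2; all of these names are placeholders, not objects from the application exercise.

    library(tidymodels)

    set.seed(199)

    # Split data into training and testing sets (75/25)
    df_split <- initial_split(df, prop = 0.75)
    df_train <- training(df_split)
    df_test  <- testing(df_split)

    # Fit a candidate model (here, logistic regression) on the training data
    candidate_fit <- logistic_reg() |>
      fit(outcome ~ x1 + x2, data = df_train)

    # Make predictions on the testing data
    preds <- augment(candidate_fit, new_data = df_test)

    # Evaluate predictions with appropriate metrics
    preds |> roc_auc(truth = outcome, .pred_yes)                # AUC
    preds |> conf_mat(truth = outcome, estimate = .pred_class)  # FP/FN counts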

Note

We will only learn about a subset of these in this course, but you can go further into these ideas in STA 210 or STA 221 as well as in various machine learning courses.