STA 199: Introduction to Data Science and Statistical Thinking

This page contains an outline of the topics, content, and assignments for the semester. Note that this schedule will be updated as the semester progresses, so the timeline of topics and assignments may change.

WEEK DATE TOPIC PREPARE MATERIALS DUE
1 Mon, Aug 25 Lab 0: Mise en place
💻 lab 0 Lab 0 due at the end of lab session (not graded)

Tue, Aug 26 Hello World and Hello STA 199! 📝 Syllabus 🖥️ slides 01
🗒️ notes 01


Thu, Aug 28 Meet the toolkit 📗 r4ds - intro
📘 ims - chp 1
🎥 Meet the toolkit :: R and RStudio
🎥 Meet the toolkit :: Quarto
🎥 Code along :: First data viz with UN Votes
🖥️ slides 02
🗒️ notes 02
⌨️ ae 01
✅ ae 01

2 Mon, Sep 1 No lab - Labor Day



Tue, Sep 2 ARC Presentation
Grammar of data visualization
📗 r4ds - chp 1
📘 ims - chp 4
🎥 Visualizing data
🎥 Building a plot step-by-step with ggplot2
🎥 Grammar of graphics
🎥 Code along :: First look at Palmer Penguins
🖥️ slides 03
🗒️ notes 03


Thu, Sep 4 Grammar of data transformation 📗 r4ds - chp 2
📗 r4ds - chp 3.1-3.5
🎥 Grammar of data transformation
🎥 Code along :: Flights and pipes
🖥️ slides 04
🗒️ notes 04
⌨️ ae 02
✅ ae 02

3 Mon, Sep 8 Lab 1: Exploring NC Counties
💻 lab 1
📝 hw 1
Lab 1 due at the end of the lab session

Tue, Sep 9 Exploratory data analysis I 📗 r4ds - chp 3.6-3.7
🎥 Visualizing and summarizing categorical data
🎥 Visualizing and summarizing numerical data
🎥 Visualizing and summarizing relationships
🎥 Code along :: Star Wars characters
🖥️ slides 05
🗒️ notes 05
⌨️ ae 03
✅ ae 03


Thu, Sep 11 Exploratory data analysis II 📘 ims - chp 5
📘 ims - chp 6
🎥 Code along :: Diving deeper with Palmer Penguins
🖥️ slides 06
🗒️ notes 06
⌨️ ae 04
✅ ae 04


Sun, Sep 14


HW 1 due at 11:59 pm
4 Mon, Sep 15 Lab 2: Get in teams then group_by() 📗 r4ds - chp 4 💻 lab 2
📝 hw 2
Lab 2 due at the end of the lab session

Tue, Sep 16 Tidying data 🎥 Tidy data
🎥 Tidying data
🎥 Code along :: Country populations over time
📗 r4ds - chp 5
🖥️ slides 07
🗒️ notes 07
⌨️ ae 05
✅ ae 05


Thu, Sep 18 Joining data 🎥 Joining data
🎥 Code along :: Continent populations
📗 r4ds - chp 19.1-19.3
🖥️ slides 08
🗒️ notes 08
⌨️ ae 06
✅ ae 06


Sun, Sep 21


HW 2 due at 11:59 pm
5 Mon, Sep 22 Lab 3: Inflation everywhere
💻 lab 3
📝 hw 3
Lab 3 due at the end of the lab session

Tue, Sep 23 Data types and classes 🎥 Data types
🎥 Data classes
🎥 Code along :: That's my type
📗 r4ds - chp 16
🖥️ slides 09
🗒️ notes 09
⌨️ ae 07
✅ ae 07


Thu, Sep 25 Importing and recoding data 🎥 Importing data
🎥 Code along :: Halving CO2 emissions
🎥 Code along :: Student survey
📗 r4ds - chp 7
📗 r4ds - chp 17.1 - 17.3
🖥️ slides 10
🗒️ notes 10
⌨️ ae 08
✅ ae 08


Sun, Sep 28


HW 3 due at 11:59 pm
6 Mon, Sep 29 Lab 4: Changes in college athletics
💻 lab 4 Lab 4 due at the end of lab session

Tue, Sep 30 Exam 1 review
🖥️ slides 11
🗒️ notes 11
📝 exam 1 review
✅ exam 1 review


Thu, Oct 2 Exam 1 - In-class + take-home released



Sat, Oct 4


Exam 1 take-home due at 12 pm
7 Mon, Oct 6 Project milestone 1 - Working collaboratively [45 mins]
Project milestone 2 - Project proposals [30 mins]
πŸ“ Pre-read: Merge conflicts
πŸ“ Project description
πŸ““ project milestone 1
πŸ““ project milestone 2
Project milestone 1 due at the end of lab session

Tue, Oct 7 Web scraping a single page 🎥 Web scraping basics
🎥 Code along :: Scraping an eCommerce page
📗 r4ds - chp 24.1 - 24.6
🖥️ slides 12
🗒️ notes 12
⌨️ ae 09
✅ ae 09


Thu, Oct 9 Web scraping many pages 🎥 Code along :: Scraping many eCommerce pages
🎥 Web scraping considerations
📗 r4ds - chp 25.1 - 25.2
🖥️ slides 13
🗒️ notes 13
⌨️ ae 09
✅ ae 09
Midterm course evaluation due at 11:59 pm (optional)
8 Mon, Oct 13 No lab - Fall Break



Tue, Oct 14 No lecture - Fall Break



Thu, Oct 16 Data science ethics 🎥 Misrepresentation
🎥 Data privacy
🎥 Algorithmic bias
🎥 Code along :: Sectors and services
🖥️ slides 14
🗒️ notes 14
Project milestone 2 due at 11:59 pm
Peer evaluation 1 due at 11:59 pm
9 Mon, Oct 20 Project milestone 3 - Improve and progress 📝 Tidyverse style guide - Chp 1-5 📓 project milestone 3 Project milestone 3 due at the end of lab session

Tue, Oct 21 The language of models 🎥 The language of models
🎥 Linear regression with a numerical predictor
📘 ims - chp 7.1
🖥️ slides 15
🗒️ notes 15
⌨️ ae 10
✅ ae 10


Thu, Oct 23 Linear regression with a single predictor 🎥 Linear regression with a categorical predictor
🎥 Outliers in linear regression
🎥 Code along :: Modeling fish
📘 ims - chp 7.2

Peer evaluation 2 due at 11:59 pm
10 Mon, Oct 27 Project milestone 4 - Peer review [30 minutes]
Lab 5: Visualize, model, interpret [45 minutes]

📓 project milestone 4 Project milestone 4 due at the end of lab session
Lab 5 due at the end of the lab session

Tue, Oct 28 Linear regression with multiple predictors 🎥 Linear regression with multiple predictors
🎥 Main and interaction effects
📘 ims - chp 8.1-8.2
📘 ims - chp 8.3-8.5



Thu, Oct 30 Model selection and overfitting 🎥 Code along :: Modeling interest rates
Peer evaluation 3 due at 11:59 pm

Sun, Nov 2


HW 4 due at 11:59 pm
11 Mon, Nov 3 Project milestone 5 - Work on writeup and presentations
📓 project milestone 5 Project milestone 5 due at the end of lab session

Tue, Nov 4 Essential data science skills potpourri:
  • Working with generative AI tools
  • Communicating data science results effectively
  • Customizing Quarto reports and presentations
📘 ims - chp 6
📗 r4ds - chp 10



Thu, Nov 6 Logistic regression I 🎥 Logistic regression
🎥 Code along :: Building a spam filter
📘 ims - chp 9


12 Mon, Nov 10 Project milestone 6 - Present and turn in write-up!
📓 project milestone 6

Tue, Nov 11 Logistic regression II 🎥 Classification and decision errors
🎥 Overfitting and spending your data



Thu, Nov 13 Evaluating models 🎥 Code along :: Forest classification
Peer evaluation 4 due at 11:59 pm
13 Mon, Nov 17 Lab 6: Everything so far II

Lab 6 due at the end of lab session

Tue, Nov 18 Exam 2 review
πŸ“ exam 2 review
βœ… exam 2 review


Thu, Nov 20 Exam 2 - In-class + take-home released



Sat, Nov 22


Exam 2 take-home due at 12 pm
14 Mon, Nov 24 Lab 7: Explore and classify

Lab 7 due at the end of lab session

Tue, Nov 25 Quantifying uncertainty with bootstrap intervals 🎥 Quantifying uncertainty
🎥 Bootstrapping
🎥 Code along :: Bootstrapping Duke Forest houses
📘 ims - chp 11
📘 ims - chp 12



Thu, Nov 27 No lecture - Thanksgiving



Sun, Nov 30


HW 5 due at 11:59 pm
15 Mon, Dec 1 Lab 8: Inference To be posted
Lab 8 due at the end of lab session

Tue, Dec 2 Making decisions with randomization tests To be posted


Thu, Dec 4 Looking further To be posted


Fri, Dec 5


HW 6 due at 11:59 pm (will be accepted until Sun, Dec 7 at 11:59 pm without penalty)

Final review (time TBD, location TBD)
📝 final review
✅ final review

16 Fri, Dec 12 Final (2 pm - 5 pm) To be posted