STA 199: Introduction to Data Science and Statistical Thinking

This page outlines the topics, content, and assignments for the semester. The schedule is subject to change: the timeline of topics and assignments may be updated as the semester progresses.

WEEK DATE TOPIC PREPARE MATERIALS DUE
1 Mon, Aug 25 Lab 0: Mise en place
💻 lab 0 Lab 0 due at the end of lab session (not graded)

Tue, Aug 26 Hello World and Hello STA 199! 📝 Syllabus 🖥️ slides 01
🗒️ notes 01


Thu, Aug 28 Meet the toolkit 📗 r4ds - intro
📘 ims - chp 1
🎥 Meet the toolkit :: R and RStudio
🎥 Meet the toolkit :: Quarto
🎥 Code along :: First data viz with UN Votes
🖥️ slides 02
🗒️ notes 02
⌨️ ae 01
✅ ae 01

2 Mon, Sep 1 No lab - Labor Day



Tue, Sep 2 ARC Presentation
Grammar of data visualization
📗 r4ds - chp 1
📘 ims - chp 4
🎥 Visualizing data
🎥 Building a plot step-by-step with ggplot2
🎥 Grammar of graphics
🎥 Code along :: First look at Palmer Penguins
🖥️ slides 03
🗒️ notes 03


Thu, Sep 4 Grammar of data transformation 📗 r4ds - chp 2
📗 r4ds - chp 3.1-3.5
🎥 Grammar of data transformation
🎥 Code along :: Flights and pipes
🖥️ slides 04
🗒️ notes 04
⌨️ ae 02
✅ ae 02

3 Mon, Sep 8 Lab 1: Exploring NC Counties
💻 lab 1
📝 hw 1
Lab 1 due at the end of the lab session

Tue, Sep 9 Exploratory data analysis I 📗 r4ds - chp 3.6-3.7
🎥 Visualizing and summarizing categorical data
🎥 Visualizing and summarizing numerical data
🎥 Visualizing and summarizing relationships
🎥 Code along :: Star Wars characters
🖥️ slides 05
🗒️ notes 05
⌨️ ae 03
✅ ae 03


Thu, Sep 11 Exploratory data analysis II 📘 ims - chp 5
📘 ims - chp 6
🎥 Code along :: Diving deeper with Palmer Penguins
🖥️ slides 06
🗒️ notes 06
⌨️ ae 04
✅ ae 04


Sun, Sep 14


HW 1 due at 11:59 pm
4 Mon, Sep 15 Lab 2: Get in teams then group_by() 📗 r4ds - chp 4 💻 lab 2
📝 hw 2
Lab 2 due at the end of the lab session

Tue, Sep 16 Tidying data 🎥 Tidy data
🎥 Tidying data
🎥 Code along :: Country populations over time
📗 r4ds - chp 5
🖥️ slides 07
🗒️ notes 07
⌨️ ae 05
✅ ae 05


Thu, Sep 18 Joining data 🎥 Joining data
🎥 Code along :: Continent populations
📗 r4ds - chp 19.1-19.3
🖥️ slides 08
🗒️ notes 08
⌨️ ae 06
✅ ae 06


Sun, Sep 21


HW 2 due at 11:59 pm
5 Mon, Sep 22 Lab 3: Inflation everywhere
💻 lab 3
📝 hw 3
Lab 3 due at the end of the lab session

Tue, Sep 23 Data types and classes 🎥 Data types
🎥 Data classes
🎥 Code along :: That’s my type
📗 r4ds - chp 16
🖥️ slides 09
🗒️ notes 09
⌨️ ae 07
✅ ae 07


Thu, Sep 25 Importing and recoding data 🎥 Importing data
🎥 Code along :: Halving CO2 emissions
🎥 Code along :: Student survey
📗 r4ds - chp 7
📗 r4ds - chp 17.1 - 17.3
🖥️ slides 10
🗒️ notes 10
⌨️ ae 08
✅ ae 08


Sun, Sep 28


HW 3 due at 11:59 pm
6 Mon, Sep 29 Lab 4: Changes in college athletics
💻 lab 4 Lab 4 due at the end of lab session

Tue, Sep 30 Exam 1 review
🖥️ slides 11
🗒️ notes 11
📝 exam 1 review
✅ exam 1 review


Thu, Oct 2 Exam 1 - In-class + take-home released



Sat, Oct 4


Exam 1 take-home due at 12 pm (noon)
7 Mon, Oct 6 Project milestone 1 - Working collaboratively [45 mins]
Project milestone 2 - Project proposals [30 mins]
📝 Pre-read: Merge conflicts
📝 Project description
📓 project milestone 1
📓 project milestone 2
Project milestone 1 due at the end of lab session

Tue, Oct 7 Web scraping a single page 🎥 Web scraping basics
🎥 Code along :: Scraping an eCommerce page
📗 r4ds - chp 24.1 - 24.6
🖥️ slides 12
🗒️ notes 12
⌨️ ae 09
✅ ae 09


Thu, Oct 9 Web scraping many pages 🎥 Code along :: Scraping many eCommerce pages
🎥 Web scraping considerations
📗 r4ds - chp 25.1 - 25.2
🖥️ slides 13
🗒️ notes 13
⌨️ ae 09
✅ ae 09
Midterm course evaluation due at 11:59 pm (optional)
8 Mon, Oct 13 No lab - Fall Break



Tue, Oct 14 No lecture - Fall Break



Thu, Oct 16 Data science ethics 🎥 Misrepresentation
🎥 Data privacy
🎥 Algorithmic bias
🎥 Code along :: Sectors and services
🖥️ slides 14
🗒️ notes 14
Project milestone 2 due at 11:59 pm
Peer evaluation 1 due at 11:59 pm
9 Mon, Oct 20 Project milestone 3 - Improve and progress 📝 Tidyverse style guide - Chp 1-5 📓 project milestone 3 Project milestone 3 due at the end of lab session

Tue, Oct 21 The language of models 🎥 The language of models
🎥 Linear regression with a numerical predictor
📘 ims - chp 7.1
🖥️ slides 15
🗒️ notes 15
⌨️ ae 10
✅ ae 10


Thu, Oct 23 Linear regression with a single predictor 🎥 Linear regression with a categorical predictor
🎥 Outliers in linear regression
🎥 Code along :: Modeling fish
📘 ims - chp 7.2
🖥️ slides 16
🗒️ notes 16
⌨️ ae 11
✅ ae 11
Peer evaluation 2 due at 11:59 pm
10 Mon, Oct 27 Project milestone 4 - Peer review [30 minutes]
Lab 5: Make up your other half [45 minutes]

📓 project milestone 4
💻 lab 5
📝 hw 4
Project milestone 4 due at the end of lab session
Lab 5 due at the end of the lab session

Tue, Oct 28 Linear regression with multiple predictors 🎥 Linear regression with multiple predictors
🎥 Main and interaction effects
📘 ims - chp 8.1-8.2
📘 ims - chp 8.3-8.5
🖥️ slides 17
🗒️ notes 17
⌨️ ae 12
✅ ae 12


Thu, Oct 30 Model selection and overfitting 🎥 Code along :: Modeling interest rates 🖥️ slides 18
🗒️ notes 18
Peer evaluation 3 due at 11:59 pm

Sun, Nov 2


HW 4 due at 11:59 pm
11 Mon, Nov 3 Project milestone 5 - Work on writeup and presentations
📓 project milestone 5 Project milestone 5 due at the end of lab session

Tue, Nov 4 Developing and communicating data science results 📘 ims - chp 6
📗 r4ds - chp 10
🖥️ slides 19
🗒️ notes 19


Thu, Nov 6 Logistic regression 🎥 Logistic regression
🎥 Code along :: Building a spam filter
📘 ims - chp 9
🖥️ slides 20
🗒️ notes 20
⌨️ ae 13
✅ ae 13

12 Mon, Nov 10 Project milestone 6 - Presentation
📓 project milestone 6 Project milestone 6 presentation due at the start of lab session

Tue, Nov 11 Spending your data 🎥 Classification and decision errors
🎥 Overfitting and spending your data
🖥️ slides 21
🗒️ notes 21


Thu, Nov 13 Evaluating models 🎥 Code along :: Forest classification 🖥️ slides 22
🗒️ notes 22
⌨️ ae 14
✅ ae 14
Project milestone 6 write-up due at 11:59 pm

Fri, Nov 14


Peer evaluation 4 due at 11:59 pm
13 Mon, Nov 17 Lab 6: Everything so far II
💻 lab 6 Lab 6 due at the end of lab session

Tue, Nov 18 Exam 2 review
🖥️ slides 23
🗒️ notes 23
📝 exam 2 review
✅ exam 2 review


Thu, Nov 20 Exam 2 - In-class + take-home released



Sat, Nov 22


Exam 2 take-home due at 12 pm (noon)
14 Mon, Nov 24 Lab 7: Leavin’ on a jet plane
💻 lab 7
📝 hw 5
Lab 7 due at the end of lab session

Tue, Nov 25 Quantifying uncertainty with bootstrap intervals 🎥 Quantifying uncertainty
🎥 Bootstrapping
🎥 Code along :: Bootstrapping Duke Forest houses
📘 ims - chp 11
📘 ims - chp 12
🖥️ slides 24
🗒️ notes 24
⌨️ ae 15
✅ ae 15


Thu, Nov 27 No lecture - Thanksgiving



Sun, Nov 30


HW 5 due at 11:59 pm (will be accepted until Wed, Dec 3 at 11:59 pm without penalty)
15 Mon, Dec 1 Lab 8: Inference
💻 lab 8
📝 hw 6
Lab 8 due at the end of lab session

Tue, Dec 2 Making decisions with randomization tests 🎥 Hypothesis testing
📘 ims - chp 11
🖥️ slides 25
🗒️ notes 25


Thu, Dec 4 Looking further [To be posted] 🖥️ slides 26
🗒️ notes 26


Fri, Dec 5


HW 6 due at 11:59 pm (will be accepted until Sun, Dec 7 at 11:59 pm without penalty)

NA Final review (time TBD, location TBD)
📝 final review
✅ final review

16 Fri, Dec 12 Final (2 pm - 5 pm)