Lecture 25
Duke University
STA 199 - Fall 2025
December 2, 2025
Take 2 minutes to fill out the TA evaluation form – link in your email! Due Monday, December 8th.
Nominate a TA for the StatSci TA of the Year award by sending an email to dus@stat.duke.edu with a brief narrative for your nomination.



Which of the following is true about confidence intervals?

Scan the QR code or go to app.wooclap.com/sta199. Log in with your Duke NetID.
To estimate plausible values of a parameter of interest, e.g., a slope (\(\beta_1\)), a mean (\(\mu\)), a proportion (\(p\)).
Bootstrapping is a statistical procedure that resamples (with replacement) a single data set to create many simulated samples.
We then use these simulated samples to quantify the uncertainty around the sample statistic we’re interested in, e.g., a slope (\(b_1\)), a mean (\(\bar{x}\)), a proportion (\(\hat{p}\)).
Calculate the observed slope:
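A minimal sketch of this step using the infer pipeline (duke_forest is the data set from the openintro package; package loading shown for completeness):

library(tidyverse)
library(infer)
library(openintro)

# Fit the observed model and pull out the intercept and slope estimates
observed_fit <- duke_forest |>
  specify(price ~ area) |>
  fit()
observed_fit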
Take 1000 bootstrap samples and fit models to each one:
set.seed(1120)
boot_fits <- duke_forest |>
  specify(price ~ area) |>
  generate(reps = 1000, type = "bootstrap") |>
  fit()
boot_fits
# A tibble: 2,000 × 3
# Groups: replicate [1,000]
replicate term estimate
<int> <chr> <dbl>
1 1 intercept 47819.
2 1 area 191.
3 2 intercept 144645.
4 2 area 134.
5 3 intercept 114008.
6 3 area 161.
7 4 intercept 100639.
8 4 area 166.
9 5 intercept 215264.
10 5 area 125.
# ℹ 1,990 more rows
The resampling and model fitting are repeated reps = 1000 times, so boot_fits contains 2,000 rows: an intercept and a slope (area) estimate for each of the 1,000 replicates.
Percentile method: Compute the 95% CI as the middle 95% of the bootstrap distribution:
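For example, with infer's get_confidence_interval() (a sketch, reusing boot_fits from above):

# Middle 95% of the bootstrap estimates, computed separately for each term
get_confidence_interval(
  boot_fits,
  level = 0.95,
  type = "percentile"
)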
If we want to be very certain (i.e., more confident) that we capture the population parameter, should we use a wider or a narrower interval?

Scan the QR code or go to app.wooclap.com/sta199. Log in with your Duke NetID.
What drawbacks are associated with using a wider interval?

How can we get the best of both worlds – high precision and high accuracy?
Population: Complete set of observations of whatever we are studying, e.g., people, tweets, photographs, etc. – population size = \(N\)
Sample: Subset of the population, ideally random and representative – sample size = \(n\)
Sample statistic \(\ne\) population parameter, but if the sample is good, it can be a good estimate
Statistical inference: Discipline that concerns itself with the development of procedures, methods, and theorems that allow us to extract meaning and information from data that has been generated by a stochastic (random) process
We report the estimate with a confidence interval, and the width of this interval depends on the variability of sample statistics from different samples from the population
Since we can’t continue sampling from the population, we bootstrap from the one sample we have to estimate sampling variability
Standard error method: Compute the 95% CI as the observed slope plus/minus approximately 2 standard errors, where the standard error is the standard deviation of the bootstrap distribution:
\[b_1 \pm 2 \times SE_{b_1}\]
That quantity (\(\approx 2 \times SE_{b_1}\)) is called the margin of error.
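With infer, this could look like the following sketch (reusing observed_fit and boot_fits from above):

# CI as point estimate +/- ~2 SEs; point_estimate anchors the interval
get_confidence_interval(
  boot_fits,
  level = 0.95,
  type = "se",
  point_estimate = observed_fit
)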

In this class you learned how to construct a confidence interval (i.e., calculate the margin of error) using a computational method called bootstrapping.
The bootstrap distributions you constructed (given enough reps – repeated samples) were unimodal and symmetric around the observed statistic.
This is not happenstance! And there is theory behind it… It’s called the Central Limit Theorem!
You can learn about the Central Limit Theorem and theory-based methods for constructing confidence intervals (and other inference procedures) in future stats courses.
Bootstrapping for categorical data
specify(response = x, success = "success level")
calculate(stat = "prop")
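Putting these pieces together, a hypothetical end-to-end pipeline might look like this (survey_data and its variable x are made-up names for illustration):

# Bootstrap CI for a proportion; `survey_data` and `x` are hypothetical
boot_props <- survey_data |>
  specify(response = x, success = "success level") |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "prop")

get_confidence_interval(boot_props, level = 0.95, type = "percentile")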
Bootstrapping for other stats
calculate() documentation: infer.tidymodels.org/reference/calculate.html
infer pipelines: infer.tidymodels.org/articles/observed_stat_examples.html
A hypothesis test is a statistical technique used to evaluate competing claims using data
Null hypothesis, \(H_0\): An assumption about the population. “There is nothing going on.”
Alternative hypothesis, \(H_A\): A research question about the population. “There is something going on.”
Note
Hypotheses are always at the population level!
Null hypothesis, \(H_0\): “There is nothing going on.” The slope of the model for predicting the prices of houses in Duke Forest from their areas is 0, \(\beta_1 = 0\).
Alternative hypothesis, \(H_A\): “There is something going on.” The slope of the model for predicting the prices of houses in Duke Forest from their areas is different from 0, \(\beta_1 \ne 0\).
Assume you live in a world where the null hypothesis is true: \(\beta_1 = 0\).
Ask yourself how likely you are to observe the sample statistic, or something even more extreme, in this world:
\[P\big(b_1 \leq -159 \text{ or } b_1 \geq 159 \mid \beta_1 = 0\big)\]
Null hypothesis, \(H_0\): Defendant is innocent
Alternative hypothesis, \(H_A\): Defendant is guilty
Start with a null hypothesis, \(H_0\), that represents the status quo
Set an alternative hypothesis, \(H_A\), that represents the research question, i.e., what we’re testing for
Conduct a hypothesis test under the assumption that the null hypothesis is true and calculate a p-value (probability of observed or more extreme outcome given that the null hypothesis is true)
… which we have already done:
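The output below is consistent with a permutation-based null distribution; here is a sketch of the pipeline that would produce it (the seed value is an assumption):

set.seed(1120)
# Permute price relative to area to simulate a world where beta_1 = 0
null_fits <- duke_forest |>
  specify(price ~ area) |>
  hypothesize(null = "independence") |>
  generate(reps = 1000, type = "permute") |>
  fit()
null_fits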
# A tibble: 2,000 × 3
# Groups: replicate [1,000]
replicate term estimate
<int> <chr> <dbl>
1 1 intercept 594889.
2 1 area -12.6
3 2 intercept 477930.
4 2 area 29.5
5 3 intercept 581950.
6 3 area -7.93
7 4 intercept 487542.
8 4 area 26.0
9 5 intercept 643406.
10 5 area -30.0
# ℹ 1,990 more rows
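The warning and p-values below would come from a call along these lines (a sketch, reusing null_fits and observed_fit from above):

# Two-sided p-value: proportion of null slopes at least as extreme as observed
null_fits |>
  get_p_value(obs_stat = observed_fit, direction = "two-sided")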
Warning: Please be cautious in reporting a p-value of 0. This result is an
approximation based on the number of `reps` chosen in the
`generate()` step.
ℹ See `get_p_value()` (`?infer::get_p_value()`) for more information.
# A tibble: 2 × 2
term p_value
<chr> <dbl>
1 area 0
2 intercept 0
Based on the p-value calculated, what is the conclusion of the hypothesis test?