Lecture 14
Duke University
STA 199 - Fall 2025
October 9, 2025
The following code in chronicle-scrape.R extracts the titles of opinion articles from The Chronicle website:
Which of the following needs to change to extract column titles instead?
read_html()
html_elements() to html_element()
.space-y-4 .font-extrabold to .space-y-4 .text-brand
html_text() to html_attr()
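To make the options concrete, here is a minimal sketch of this kind of rvest pipeline run on a tiny made-up HTML snippet. The class names are the ones mentioned in the options above, but the markup, link paths, and text are assumptions for illustration and are not The Chronicle's actual page source.

```r
library(rvest)

# Made-up HTML that loosely mimics the structure implied by the selectors.
# This is NOT the real page source; it is only here so the code can run.
page <- minimal_html('
<div class="space-y-4">
  <a class="text-brand" href="/section/column-a">Column A</a>
  <a class="font-extrabold" href="/article/first">First article title</a>
</div>
<div class="space-y-4">
  <a class="text-brand" href="/section/column-b">Column B</a>
  <a class="font-extrabold" href="/article/second">Second article title</a>
</div>')

# Text of every element matching the CSS selector
titles <- page |>
  html_elements(".space-y-4 .font-extrabold") |>
  html_text()

# Same pipeline with a different selector picks out different elements
columns <- page |>
  html_elements(".space-y-4 .text-brand") |>
  html_text()

# html_element() (singular) keeps only the first match
first_title <- page |>
  html_element(".space-y-4 .font-extrabold") |>
  html_text()

# html_attr() returns an attribute value instead of the visible text
urls <- page |>
  html_elements(".space-y-4 .font-extrabold") |>
  html_attr("href")
```

In this sketch the selector decides which elements are picked, html_elements() versus html_element() decides how many are kept, and html_text() versus html_attr() decides what is read from each one.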

Scan the QR code or go to app.wooclap.com/sta199. Log in with your Duke NetID.
HW 2, Question 7: Reproduce the colorful box plot – We caught an error in grading (any theme with a white background would have worked). If you originally missed points due to not using theme_bw(), but you used another theme with a white background, we’ve updated your grade.
Midsemester course survey due tonight at 11:59pm
Project proposals (Milestone 2) + first peer evaluation due next Thursday at 11:59pm – any questions?
Go to https://www2.stat.duke.edu/~cr173/data/dukechronicle-opinion/www.dukechronicle.com/section/opinionabc4.html (copy of The Chronicle opinion section as of October 7, 2025).

Go to your ae project in RStudio.
If you haven’t yet done so, make sure all of your changes up to this point are committed and pushed, i.e., there’s nothing left in your Git pane.
If you haven’t yet done so, click Pull to get today’s application exercise files: ae-09-chronicle-scrape.qmd and chronicle-scrape.R.
Put the following tasks in order to scrape data from a website:
read_html() to read the page’s source code into R
When working in a Quarto document, your analysis is re-run each time you render.
If web scraping in a Quarto document, you’d be re-scraping the data each time you render, which is undesirable (and not nice)!
An alternative workflow:
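As a rough sketch of what such a workflow could look like: do the scraping once in an R script, save the result to a file, and have the Quarto document only read that saved file. The helper scrape_opinion(), the column names, and the output path below are hypothetical stand-ins so the example runs offline; they are not taken from chronicle-scrape.R.

```r
library(tibble)
library(readr)

# Hypothetical stand-in for the real scraping code, so nothing is downloaded
scrape_opinion <- function() {
  tibble(
    title = c("First article title", "Second article title"),
    url   = c("/article/first", "/article/second")
  )
}

# Hypothetical output path; in a project this might be a file under data/
out_path <- file.path(tempdir(), "chronicle.csv")

# Step 1 (in the R script, run once by hand): scrape and save to disk
chronicle <- scrape_opinion()
write_csv(chronicle, out_path)

# Step 2 (in the Quarto document): read the saved file, no scraping here
chronicle_saved <- read_csv(out_path, show_col_types = FALSE)
```

With this split, rendering the document repeatedly only re-reads the saved file, and the website is contacted only when the script is run deliberately.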