Lab 4

Changes in college athletics

Due: End of lab on Mon, Sep 29

Introduction

In this lab you’ll investigate the relationship between age and opinion on the impact of the many changes (transfer portal, athlete name, image and likeness (NIL) compensation, conference realignments) taking place in Division I college athletics.

Make sure to upload your completed lab to Gradescope by the end of your lab session and commit and push your final version to GitHub.

Getting started

By now you should be familiar with how to get started with a lab assignment by cloning the GitHub repo for the assignment.

Click to expand if you need a refresher on how to get started with a lab assignment.
  • Go to https://cmgr.oit.duke.edu/containers and log in with your Duke NetID and password.
  • Click STA199 under My reservations to log into your container. You should now see the RStudio environment.
  • Go to the sta199-f25 course organization on GitHub at github.com/sta199-f25. Click on the repo with the prefix lab-4. It contains the starter documents you need to complete the lab.
  • Click on the green CODE button, select Use SSH. Click on the clipboard icon to copy the repo URL.
  • In RStudio, go to File → New Project → Version Control → Git.
  • Paste the URL of your assignment repo into the Repository URL field of the dialog box. Again, please make sure SSH is highlighted under Clone when you copy the address.
  • Click Create Project, and the files from your GitHub repo will be displayed in the Files pane in RStudio.

Open the lab-4.qmd template Quarto file and update the authors field to add your name first (first and last) and then your teammates’ names (first and last). Render the document. Examine the rendered document and make sure your and your teammates’ names are updated in the document. Commit your changes with a meaningful commit message and push to GitHub.

Click to expand if you need a refresher on assignment guidelines.

Code

Code should follow the tidyverse style. Particularly,

  • there should be spaces before and line breaks after each + when building a ggplot,
  • there should also be spaces before and line breaks after each |> in a data transformation pipeline,
  • code should be properly indented,
  • there should be spaces around = signs and spaces after commas.

Additionally, all code should be visible in the PDF output, i.e., should not run off the page on the PDF. Long lines that run off the page should be split across multiple lines with line breaks.[1]
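The style points above can be sketched with a short pipeline and plot. This is only an illustration, not part of the lab: it uses the mpg data set that ships with ggplot2, and the object names mpg_summary and style_plot are made up for the example.

```r
library(tidyverse)

# mpg ships with ggplot2 and stands in for whatever data you are working with
mpg_summary <- mpg |>
  filter(year == 2008) |>
  group_by(class) |>
  summarize(
    mean_hwy = mean(hwy),
    n_cars = n()
  )

# Space before and line break after each +, one argument per line in labs()
style_plot <- ggplot(mpg_summary, aes(x = mean_hwy, y = class)) +
  geom_col() +
  labs(
    title = "Average highway fuel economy by vehicle class",
    subtitle = "2008 models in the mpg data set",
    x = "Mean highway fuel economy (miles per gallon)",
    y = "Vehicle class"
  )

style_plot
```

Each step of the pipeline and each plot layer sits on its own line, so no line is long enough to run off the page in the PDF.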

Plots

  • Plots should have an informative title and, if needed, also a subtitle.
  • Axes and legends should be labeled with both the variable name and its units (if applicable).
  • Careful consideration should be given to aesthetic choices.

Workflow

Continuing to develop a sound workflow for reproducible data analysis is important as you complete the lab and other assignments in this course.

  • You should have at least 3 commits with meaningful commit messages by the end of the assignment.
  • Final versions of both your .qmd file and the rendered PDF should be pushed to GitHub.

Packages

In this lab we will work with the tidyverse package.
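A minimal sketch of loading the package near the top of your document, assuming tidyverse is already installed in your container:

```r
# Attaches the core tidyverse packages, including
# tibble, dplyr, tidyr, forcats, and ggplot2
library(tidyverse)
```

These core packages provide the functions that appear later in this lab: tribble() (tibble), mutate() (dplyr), fct_relevel() (forcats), pivot_wider() (tidyr), and geom_col() (ggplot2).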

Data

YouGov, in collaboration with Elon University Poll and the Knight Commission on Intercollegiate Athletics, polled 1,500 US adults (aged 18 and older) from July 7 to July 11, 2025.[2] The following question was asked of these 1,500 adults:

Overall, how would you describe the impact of the many changes (transfer portal, athlete name, image and likeness (NIL) compensation, conference realignments[3]) taking place in Division I college athletics?

Responses were broken down into the following categories:

Variable | Levels
Age      | 18-44; 45+
Opinion  | Very positive; Somewhat positive; Neutral; Somewhat negative; Very negative; Unsure

Of the 1,500 respondents, 699 were aged 18-44.

Of the individuals who are 18-44,

  • 78 individuals said they thought the changes were Very positive,
  • 176 individuals said they thought the changes were Somewhat positive,
  • 162 individuals said they thought the changes were Neutral,
  • 50 individuals said they thought the changes were Somewhat negative,
  • 36 individuals said they thought the changes were Very negative.

Of the individuals who are 45+,

  • 41 individuals said they thought the changes were Very positive,
  • 121 individuals said they thought the changes were Somewhat positive,
  • 186 individuals said they thought the changes were Neutral,
  • 146 individuals said they thought the changes were Somewhat negative,
  • 97 individuals said they thought the changes were Very negative.

Questions

Question 1

  a. Create a two-way table that summarizes these data by filling in the blanks in the code below.
survey_counts <- tribble(
  ~age,    ~opinion,            ~n,
  "18-44", "Very positive",     ___,
  "18-44", "Somewhat positive", ___,
  "18-44", "Neutral",           ___,
  "18-44", "Somewhat negative", ___,
  "18-44", "Very negative",     ___,
  "18-44", "Unsure",            ___,
  "45+",   "Very positive",     ___,
  "45+",   "Somewhat positive", ___,
  "45+",   "Neutral",           ___,
  "45+",   "Somewhat negative", ___,
  "45+",   "Very negative",     ___,
  "45+",   "Unsure",            ___
) |>
  mutate(
    age     = fct_relevel(age, ___),
    opinion = fct_relevel(opinion, ___)
  )

survey_counts |>
  pivot_wider(
    names_from = ___,
    values_from = ___
  )
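If the pattern in the template is unfamiliar, here is a minimal sketch of the same three steps (tribble(), fct_relevel(), pivot_wider()) on a toy data set. The groups, ratings, and counts are hypothetical and unrelated to the survey; the names toy_counts and toy_wide are made up for the example.

```r
library(tidyverse)

# Hypothetical counts: one row per combination of group and rating
toy_counts <- tribble(
  ~group, ~rating,  ~n,
  "A",    "High",   10,
  "A",    "Medium", 25,
  "A",    "Low",    15,
  "B",    "High",   30,
  "B",    "Medium", 10,
  "B",    "Low",    10
) |>
  mutate(
    # fct_relevel() converts to a factor and moves the named levels
    # to the front, in the order given
    group  = fct_relevel(group, "A", "B"),
    rating = fct_relevel(rating, "Low", "Medium", "High")
  )

# One row per group, one column per rating, cells filled with the counts
toy_wide <- toy_counts |>
  pivot_wider(
    names_from = rating,
    values_from = n
  )

toy_wide
```

Setting the level order explicitly matters for ordinal variables: without it, the levels of a factor created from character data default to alphabetical order, which is rarely the natural order of the categories.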

For parts b-d below, use a single pipeline starting with survey_counts, calculate the desired proportions, and make sure the result is an ungrouped data frame with a column for relevant counts, a column for relevant proportions, and a column for the groups you’re interested in.

  b. Marginal proportions of age: Calculate the proportions of individuals in this sample who are 18-44 years old and who are 45+ years old.

  c. Marginal proportions of opinion: Calculate the proportions of individuals who are Very positive, Somewhat positive, Neutral, Somewhat negative, Very negative, and Unsure.

  d. Conditional proportions of opinion based on age: Calculate the proportions of individuals who are Very positive, Somewhat positive, Neutral, Somewhat negative, Very negative, and Unsure

    • among those who are 18-44 years old and
    • among those who are 45+ years old.
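The difference between marginal and conditional proportions can be sketched on a toy data set. This is one possible approach, not necessarily the intended one, and the counts below are hypothetical and unrelated to the survey; the names toy_counts, toy_marginal, and toy_conditional are made up for the example.

```r
library(tidyverse)

# Hypothetical counts: one row per combination of group and rating
toy_counts <- tribble(
  ~group, ~rating,  ~n,
  "A",    "High",   10,
  "A",    "Medium", 25,
  "A",    "Low",    15,
  "B",    "High",   30,
  "B",    "Medium", 10,
  "B",    "Low",    10
)

# Marginal proportions of group: add up the counts within each group,
# then divide by the overall total
toy_marginal <- toy_counts |>
  group_by(group) |>
  summarize(n = sum(n)) |>
  mutate(prop = n / sum(n))

# Conditional proportions of rating given group: divide each count
# by the total of its own group, then drop the grouping
toy_conditional <- toy_counts |>
  group_by(group) |>
  mutate(prop = n / sum(n)) |>
  ungroup()

toy_marginal
toy_conditional
```

In the marginal version the denominator is the whole sample, so the proportions sum to 1 across all rows. In the conditional version the denominator is the group total, so the proportions sum to 1 within each group.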

Question 2

  a. What type of plot would be appropriate to visualize the relationship between age and opinion on the impact of the many changes taking place in Division I college athletics?

  b. Using geom_col(), create the plot you identified in part (a) so that it can be used to evaluate the relationship between age and opinion on the impact of the many changes taking place in Division I college athletics. Use the discrete viridis color scale for the fill aesthetic, scale_fill_viridis_d(). You can review the documentation for this function and choose a viridis color scale other than the default, but you must use one of these: the data are ordinal, so an ordinal color scale is most appropriate. Make sure to include appropriate labels and a title (and also a subtitle if you wish).

Tip

Your visualization should be displaying the proportions you calculated in Question 1(d).

  c. Based on your calculations so far, as well as your visualization, write 1-2 sentences that describe the relationship, in this sample, between age and opinion on the impact of the many changes taking place in Division I college athletics.
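As a hedged illustration of how geom_col() and scale_fill_viridis_d() fit together, here is a sketch on hypothetical counts that are unrelated to the survey. The names toy_props and toy_plot, the choice of option = "plasma", and the label text are assumptions for the example, not requirements of the lab.

```r
library(tidyverse)

# Hypothetical counts, converted to conditional proportions within each group
toy_props <- tribble(
  ~group, ~rating,  ~n,
  "A",    "High",   10,
  "A",    "Medium", 25,
  "A",    "Low",    15,
  "B",    "High",   30,
  "B",    "Medium", 10,
  "B",    "Low",    10
) |>
  mutate(rating = fct_relevel(rating, "Low", "Medium", "High")) |>
  group_by(group) |>
  mutate(prop = n / sum(n)) |>
  ungroup()

# geom_col() draws bars whose heights are the values in the data;
# bars that share an x position are stacked by default
toy_plot <- ggplot(toy_props, aes(x = group, y = prop, fill = rating)) +
  geom_col() +
  scale_fill_viridis_d(option = "plasma") +
  labs(
    title = "Ratings by group",
    subtitle = "Hypothetical data",
    x = "Group",
    y = "Proportion within group",
    fill = "Rating"
  )

toy_plot
```

Because the proportions sum to 1 within each group, the stacked bars all reach the same height, which makes the composition of each group directly comparable.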

Render, commit, and push one last time. Make sure that you commit and push all changed documents and that your Git pane is completely empty before proceeding.

Wrap-up

Warning

Before you wrap up the assignment, make sure that you render, commit, and push one final time so that the final versions of both your .qmd file and the rendered PDF are pushed to GitHub and your Git pane is empty. We will be checking these to make sure you have been practicing how to commit and push changes.

Submission

By now you should also be familiar with how to submit your assignment in Gradescope.

Click to expand if you need a refresher on how to submit your assignment in Gradescope.

Submit your PDF document to Gradescope by the end of the lab to be considered “on time”:

  • Go to http://www.gradescope.com and click Log in in the top right corner.
  • Click School Credentials → Duke NetID and log in using your NetID credentials.
  • Click on your STA 199 course.
  • Click on the assignment, and you’ll be prompted to submit it.
  • Mark all the pages associated with each question. All the pages of your lab should be associated with at least one question (i.e., should be “checked”).

Checklist

Make sure you have:

  • attempted all questions
  • rendered your Quarto document
  • committed and pushed everything to your GitHub repository such that the Git pane in RStudio is empty
  • uploaded your PDF to Gradescope

Grading and feedback

  • This lab is worth 30 points:
    • 10 points for being in lab and turning in something – no partial credit for this part.
    • 20 points for:
      • answering the questions correctly – there is partial credit for this part.
      • following the workflow – there is partial credit for this part.
  • The workflow points are for:
    • committing at least three times as you work through your lab,
    • having your final version of .qmd and .pdf files in your GitHub repository, and
    • overall organization.
  • You’ll receive feedback on your lab on Gradescope within a week.

Good luck, and have fun with it!

Footnotes

  1. Remember, haikus not novellas when writing code!

  2. Full survey results can be found at https://eloncdn.blob.core.windows.net/eu3/sites/819/2025/07/Elon-Knight-Commission-survey-TOPLINE.pdf.

  3. The transfer portal is an online database for college student-athletes who wish to transfer to a different school. Name, image, and likeness (NIL) compensation allows college athletes to earn money from third-party companies for using their “name, image, and likeness” through activities like endorsements, social media promotions, and public appearances. Conference realignments refer to the shifting of colleges and universities between athletic conferences, which can affect competition levels, revenue distribution, and media exposure.