Data types and classes

Lecture 9

Author: Dr. Mine Çetinkaya-Rundel

Affiliation: Duke University, STA 199 - Fall 2025

Published: September 23, 2025

Warm-up

While you wait: Participate 📱💻

Fill in the blanks:

I’m a _____ (first-year, sophomore, junior, senior)

and on Tuesdays I have _____ class(es).

Scan the QR code or go to app.wooclap.com/sta199. Log in with your Duke NetID.

Announcements

  • Survey: Confidence in STEM courses at Duke

  • Exam 1:

    • In class on Thu, Oct 2
    • Take home Thu, Oct 2 after class until Sat, Oct 4 at noon
    • Covers lectures 1-10, labs 1-4, and homeworks 1-3
    • Practice exam to be posted on Friday, exam review on Tue, Sep 30

Data types

How many classes do you have on Tuesdays?

survey
# A tibble: 4 × 2
  tue_classes year      
  <chr>       <chr>     
1 <NA>        <NA>      
2 2           Sophomore 
3 three       First-year
4 1           Senior    

Variable types

What type of variable is tue_classes?

survey
# A tibble: 4 × 2
  tue_classes year      
  <chr>       <chr>     
1 <NA>        <NA>      
2 2           Sophomore 
3 three       First-year
4 1           Senior    

Let’s (attempt to) clean it up…

survey <- survey |>
  mutate(
    tue_classes = case_when(
      tue_classes == "three" ~ "3",
      # add more conditions as needed
      .default = tue_classes
    ),
    tue_classes = as.numeric(tue_classes)
  )

survey
# A tibble: 4 × 2
  tue_classes year      
        <dbl> <chr>     
1          NA <NA>      
2           2 Sophomore 
3           3 First-year
4           1 Senior    
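
To see why the recoding step comes before as.numeric(), here is a minimal sketch using base R only. The vector raw is a hypothetical stand-in for the tue_classes column, and ifelse() is swapped in for case_when() purely to keep the example free of package dependencies.

```r
# Hypothetical stand-in for the tue_classes column before cleaning
raw <- c(NA, "2", "three", "1")

# Without recoding, "three" cannot be parsed as a number, so it becomes NA.
# R would normally warn "NAs introduced by coercion"; we silence that here.
unrecoded <- suppressWarnings(as.numeric(raw))
unrecoded

# Recoding the text value first preserves the information
recoded <- as.numeric(ifelse(raw == "three", "3", raw))
recoded
```

The first result loses the response "three" entirely, while the second keeps it as 3. The original missing value stays NA in both.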

Data types

Data types in R

  • logical
  • double
  • integer
  • character
  • and some more, but we won’t be focusing on those

Logical & character

logical - Boolean values TRUE and FALSE


typeof(TRUE)
[1] "logical"

character - character strings



typeof("First-year")
[1] "character"

Double & integer

double - floating point numerical values (default numerical type)


typeof(2.5)
[1] "double"
typeof(3)
[1] "double"

integer - integer numerical values (indicated with an L)


typeof(3L)
[1] "integer"
typeof(1:3)
[1] "integer"
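
A short sketch of how doubles and integers interact, assuming nothing beyond base R. The name int_plus_dbl is made up for illustration.

```r
# The two types compare as equal in value...
3 == 3L            # TRUE

# ...but identical() also compares type, so it reports a difference
identical(3, 3L)   # FALSE

# Both count as "numeric"
is.numeric(3)      # TRUE
is.numeric(3L)     # TRUE

# Mixing an integer and a double in arithmetic gives a double
int_plus_dbl <- 3L + 1.5
typeof(int_plus_dbl)   # "double"
```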

Concatenation

Vectors can be constructed using the c() function.

  • Numeric vector:
c(1, 2, 3)
[1] 1 2 3

  • Character vector:
c("Hello", "World!")
[1] "Hello"  "World!"

  • Vector made of vectors:
c(c("hi", "hello"), c("bye", "jello"))
[1] "hi"    "hello" "bye"   "jello"
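
The "vector made of vectors" case can be checked directly. This sketch, with the illustrative names nested and flat, suggests that c() flattens its arguments into one vector instead of nesting them.

```r
nested <- c(c("hi", "hello"), c("bye", "jello"))
flat <- c("hi", "hello", "bye", "jello")

# Both ways of writing it produce the same four-element vector
identical(nested, flat)   # TRUE
length(nested)            # 4

# A single value is itself a vector of length 1
length("hi")              # 1
```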

Converting between types

with intention…

x <- 1:3
x
[1] 1 2 3
typeof(x)
[1] "integer"
y <- as.character(x)
y
[1] "1" "2" "3"
typeof(y)
[1] "character"

Converting between types

with intention…

x <- c(TRUE, FALSE)
x
[1]  TRUE FALSE
typeof(x)
[1] "logical"
y <- as.numeric(x)
y
[1] 1 0
typeof(y)
[1] "double"

Converting between types

without intention…

c(2, "Just this one!")
[1] "2"              "Just this one!"

R will happily convert between various types without complaint when different types of data are concatenated in a vector, and that’s not always a great thing!

Converting between types

without intention…

c(FALSE, 3L)
[1] 0 3

c(FALSE, 1.2)
[1] 0.0 1.2

c(2L, "two")
[1] "2"   "two"

c(TRUE, "two")
[1] "TRUE" "two" 

Participate 📱💻

What is the output of typeof(c(1.2, 3L))?

  • "character"
  • "double"
  • "integer"
  • "logical"

Scan the QR code or go to app.wooclap.com/sta199. Log in with your Duke NetID.

Explicit vs. implicit coercion

Explicit coercion:

When you call a function like as.logical(), as.numeric(), as.integer(), as.double(), or as.character().

Implicit coercion:

Happens when you use a vector in a specific context that expects a certain type of vector.
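
The two kinds of coercion can be put side by side. This is a minimal base R sketch; the vector answers and the names n_true and prop_true are hypothetical.

```r
# Explicit coercion: we ask for the conversion ourselves
as.integer("7")                       # 7
suppressWarnings(as.numeric("three")) # NA, since "three" is not a number

# Implicit coercion: the context expects numbers, so R converts
# TRUE to 1 and FALSE to 0 without being asked
answers <- c(TRUE, FALSE, TRUE, TRUE)
n_true <- sum(answers)      # 3
prop_true <- mean(answers)  # 0.75

# Here the context expects text, so the number 9 becomes "9"
paste("Lecture", 9)         # "Lecture 9"
```

Summing or averaging a logical vector is a common, intentional use of implicit coercion: it counts, or takes the proportion of, TRUE values.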

Data classes

  • Vectors are like Lego building blocks
  • We stick them together to build more complicated constructs, e.g. representations of data
  • The class attribute relates to the S3 class of an object, which determines its behaviour
    • You don’t need to worry about what S3 classes really mean, but you can read more about them if you’re curious
  • Examples: factors, dates, and data frames

Factors

R uses factors to handle categorical variables, i.e., variables that have a fixed and known set of possible values

class_years <- factor(
  c(
    "First-year",
    "Sophomore",
    "Sophomore",
    "Senior",
    "Junior"
  )
)
class_years
[1] First-year Sophomore  Sophomore  Senior     Junior    
Levels: First-year Junior Senior Sophomore
typeof(class_years)
[1] "integer"
class(class_years)
[1] "factor"

More on factors

We can think of factors like a character vector (level labels) and an integer vector (level numbers) glued together

glimpse(class_years)
 Factor w/ 4 levels "First-year","Junior",..: 1 4 4 3 2
as.integer(class_years)
[1] 1 4 4 3 2
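
Because a factor stores level numbers underneath, converting one to a number can surprise you when the labels themselves look like numbers. A minimal sketch, with a hypothetical factor f:

```r
# Hypothetical factor whose labels happen to look like numbers
f <- factor(c("10", "20", "10", "5"))

# Levels are sorted as text, not as numbers
levels(f)                     # "10" "20" "5"

# as.integer() returns the level numbers, not the values in the labels
as.integer(f)                 # 1 2 1 3

# Going through character first recovers the values themselves
as.numeric(as.character(f))   # 10 20 10 5
```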

Dates

today <- as.Date("2025-09-23")
today
[1] "2025-09-23"
typeof(today)
[1] "double"
class(today)
[1] "Date"

More on dates

We can think of dates like a number (the number of days since the origin, 1 Jan 1970) and a reference point (the origin) glued together

as.integer(today)
[1] 20354
as.integer(today) / 365 # roughly 55 yrs
[1] 55.76438
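
Since a date is a day count underneath, arithmetic on dates works in units of days. The sketch below uses the Exam 1 date from the announcements; the names exam and days_left are made up for illustration.

```r
today <- as.Date("2025-09-23")
exam <- as.Date("2025-10-02")

# Subtracting two dates gives a difftime object measured in days
days_left <- exam - today
as.numeric(days_left)   # 9

# Adding a number moves the date forward by that many days
today + 7               # "2025-09-30"

# Stripping the class reveals the underlying day count
unclass(today)          # 20354
```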

Data frames

We can think of data frames like vectors of equal length glued together

df <- data.frame(x = 1:2, y = 3:4)
df
  x y
1 1 3
2 2 4
typeof(df)
[1] "list"
class(df)
[1] "data.frame"
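
A few base R checks make the "vectors glued together" picture concrete. This is a sketch; the name bad is hypothetical and holds the result of a deliberately failing call.

```r
df <- data.frame(x = 1:2, y = 3:4)

# Underneath, a data frame is a list with one element per column
is.list(df)          # TRUE
length(df)           # 2, the number of columns
sapply(df, typeof)   # x and y are both "integer"

# Columns whose lengths cannot be reconciled are rejected with an error
bad <- try(data.frame(x = 1:2, y = 1:3), silent = TRUE)
inherits(bad, "try-error")   # TRUE
```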

Lists

Lists are a generic vector container; vectors of any type can go in them

l <- list(
  x = 1:4,
  y = c("hi", "hello", "jello"),
  z = c(TRUE, FALSE)
)
l
$x
[1] 1 2 3 4

$y
[1] "hi"    "hello" "jello"

$z
[1]  TRUE FALSE
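
Unlike the columns of a data frame, the elements of a list can differ in both type and length. A minimal sketch of inspecting the list and extracting elements from it, using base R only:

```r
l <- list(
  x = 1:4,
  y = c("hi", "hello", "jello"),
  z = c(TRUE, FALSE)
)

# Each element keeps its own length and type
lengths(l)          # x 4, y 3, z 2
sapply(l, typeof)   # "integer", "character", "logical"

# $ and [[ ]] extract the element itself
l$y                 # a character vector
typeof(l[["x"]])    # "integer"

# Single brackets return a smaller list instead
typeof(l["x"])      # "list"
```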

Lists and data frames

  • A data frame is a special list containing vectors of equal length
df
  x y
1 1 3
2 2 4
  • When we use the pull() function, we extract a vector from the data frame
df |>
  pull(y)
[1] 3 4

Working with factors

Read data in as character strings

survey
# A tibble: 4 × 2
  tue_classes year      
        <dbl> <chr>     
1          NA <NA>      
2           2 Sophomore 
3           3 First-year
4           1 Senior    

But coerce when plotting

ggplot(survey, mapping = aes(x = year)) +
  geom_bar()

Use forcats to reorder levels

survey |>
  mutate(
    year = fct_relevel(year, "First-year", "Sophomore", "Junior", "Senior")
  ) |>
  ggplot(mapping = aes(x = year)) +
  geom_bar()
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `year = fct_relevel(year, "First-year", "Sophomore",
  "Junior", "Senior")`.
Caused by warning:
! 1 unknown level in `f`: Junior
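
The warning appears because no one in the survey answered "Junior", so that level does not exist in the data when fct_relevel() looks for it. One possible way around this, shown here as a base R sketch and not as the lecture's method, is to declare the full set of levels when creating the factor. The names year, year_levels, and year_fct are assumptions for illustration.

```r
# Hypothetical stand-in for the year column of the survey
year <- c(NA, "Sophomore", "First-year", "Senior")

# Declare every possible level, in the desired order
year_levels <- c("First-year", "Sophomore", "Junior", "Senior")
year_fct <- factor(year, levels = year_levels)

# All four levels exist, even though Junior has no observations
levels(year_fct)
table(year_fct)   # Junior appears with a count of 0
```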

A peek into forcats

Reordering levels by:

  • fct_relevel(): hand

  • fct_infreq(): frequency

  • fct_reorder(): sorting along another variable

  • fct_rev(): reversing

Changing level values by:

  • fct_lump(): lumping uncommon levels together into “other”

  • fct_other(): manually replacing some levels with “other”
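
A small sketch of a few of these functions on a toy factor, assuming the forcats package is installed. The factor f is made up for illustration, with "c" the most frequent value and "a" the least.

```r
library(forcats)

f <- factor(c("a", "b", "b", "c", "c", "c"))
levels(f)                          # "a" "b" "c"

# By hand: move "c" to the front
levels(fct_relevel(f, "c"))        # "c" "a" "b"

# By frequency: most common level first
levels(fct_infreq(f))              # "c" "b" "a"

# Reversed order
levels(fct_rev(f))                 # "c" "b" "a"

# Keep "c" and replace every other level with "Other"
levels(fct_other(f, keep = "c"))   # "c" "Other"
```

None of these calls changes the underlying values; they only change which levels exist and the order in which they appear, for example along a plot axis.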

Application exercise

ae-07-durham-climate-factors

  • Go to your ae project in RStudio.

  • If you haven’t yet done so, make sure all of your changes up to this point are committed and pushed, i.e., there’s nothing left in your Git pane.

  • If you haven’t yet done so, click Pull to get today’s application exercise file: ae-07-durham-climate-factors.qmd.

  • Work through the application exercise in class, and render, commit, and push your edits by the end of class.