AE 01: Income inequality

Packages

For this application exercise, we’ll use the tidyverse package.

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4          ✔ readr     2.1.5     
✔ forcats   1.0.0          ✔ stringr   1.5.1     
✔ ggplot2   3.5.2          ✔ tibble    3.3.0.9004
✔ lubridate 1.9.4          ✔ tidyr     1.3.1     
✔ purrr     1.1.0          
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Demo: Run the code cell above and inspect the message printed in the Console.

Data

Let’s load the data.

income_inequality <- read_csv("data/income-inequality.csv")

Rows: 549 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): country, country_code
dbl (3): year, gini_pre_tax, gini_post_tax

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

And take a look at the first 10 rows.

income_inequality

# A tibble: 549 × 5
   country   country_code  year gini_pre_tax gini_post_tax
   <chr>     <chr>        <dbl>        <dbl>         <dbl>
 1 Australia AUS           1989        0.431         0.304
 2 Australia AUS           1995        0.47          0.311
 3 Australia AUS           2001        0.481         0.32 
 4 Australia AUS           2003        0.469         0.316
 5 Australia AUS           2004        0.467         0.316
 6 Australia AUS           2008        0.469         0.335
 7 Australia AUS           2010        0.47          0.333
 8 Australia AUS           2014        0.481         0.329
 9 Australia AUS           2016        0.467         0.326
10 Australia AUS           2018        0.469         0.329
# ℹ 539 more rows

The variables in the dataset are:

country: Country name
country_code: Country code
year: Year
gini_pre_tax: Gini coefficient before taxes
gini_post_tax: Gini coefficient after taxes

Demo: And take a look at the structure of the data as well with the glimpse() function.

glimpse(income_inequality)

Rows: 549
Columns: 5
$ country       <chr> "Australia", "Australia", "Australia", "Australia", "Aus…
$ country_code  <chr> "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", …
$ year          <dbl> 1989, 1995, 2001, 2003, 2004, 2008, 2010, 2014, 2016, 20…
$ gini_pre_tax  <dbl> 0.431, 0.470, 0.481, 0.469, 0.467, 0.469, 0.470, 0.481, …
$ gini_post_tax <dbl> 0.304, 0.311, 0.320, 0.316, 0.316, 0.335, 0.333, 0.329, …

Demo: Render, commit (with an appropriate commit message), and push your changes to GitHub.

Your turn: replace the blank below with the number of rows in the income_inequality data frame based on the output of the chunk below. Then, replace it with “inline code” and render again. Then, commit (with an appropriate commit message) and push your changes to GitHub.

The income_inequality data frame has 549 observations. Each observation represents a country in a given year.

Income inequality over time in a country

Let’s explore income inequality over time in a country.

income_inequality |>
  filter(country == "Denmark") |>
  pivot_longer(
    cols = contains("gini"),
    names_to = "gini_type",
    values_to = "gini_coefficient"
  ) |>
  mutate(
    gini_type = if_else(gini_type == "gini_pre_tax", "Pre-tax", "Post-tax"),
    gini_type = fct_relevel(gini_type, "Pre-tax", "Post-tax")
  ) |>
  ggplot(aes(x = year, y = gini_coefficient, color = gini_type)) +
  geom_line(linewidth = 1)

Your turn: Create a similar plot for another country of your choice by by modifying the code in the code cell labeled plot-single above. Then, render, commit (with an appropriate commit message), and push your changes to GitHub.

Hint: Open the data frame in the data viewer by clicking on its name in the Environment pane to see the list of countries.

Income inequality over time in many countries

Now let’s explore income inequality over time in many countries.

income_inequality |>
  filter(country %in% c("United States", "Sweden", "South Africa")) |>
  pivot_longer(
    cols = contains("gini"),
    names_to = "gini_type",
    values_to = "gini_coefficient"
  ) |>
  mutate(
    gini_type = if_else(gini_type == "gini_pre_tax", "Pre-tax", "Post-tax"),
    gini_type = fct_relevel(gini_type, "Pre-tax", "Post-tax")
  ) |>
  ggplot(aes(x = year, y = gini_coefficient, color = gini_type)) +
  geom_line(linewidth = 1) +
  facet_wrap(~country, ncol = 1)

Your turn: Change as many countries as you like except for the United States by modifying the code in the cell labeled plot-many above. You can replace country names like we did before or add or remove them. Then, render to see your changes. Discuss your findings with your neighbor – Why did you choose those countries? How do the income inequalities of these countries compare? Finally, commit your changes (with an appropriate commit message), and push your changes to GitHub.