── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.3.0.9004
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.1.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
AE 01: Income inequality
Suggested answers
These are suggested answers. This document should be used as a reference only; it’s not designed to be an exhaustive key.
Demo: Replace “Your Name” above with your actual name. Then, render the document to inspect the changes. Once you confirm you’re happy with the changes, commit your changes (with an appropriate commit message) by staging ALL of the changed files, and push them to GitHub. In your browser, navigate to your GitHub repository and confirm that the changes are reflected there.
The dataset we will explore is in the data folder and it’s called income-inequality.csv
. It contains data on income inequality, measured by the GINI coefficient for different countries and years.
From Wikipedia:
A Gini coefficient of 0 reflects perfect equality, where all income or wealth values are the same. In contrast, a Gini coefficient of 1 (or 100%) reflects maximal inequality among values, where a single individual has all the income while all others have none.
In this dataset we have two Gini coefficients: one for income inequality before taxes and another for income inequality after taxes. These two values allow us to explore how much countries redistribute income. In addition, since we have data for multiple years, we can also explore how income inequality has changed over time.
Packages
For this application exercise, we’ll use the tidyverse package.
Demo: Run the code cell above and inspect the message printed in the Console.
Data
Let’s load the data.
income_inequality <- read_csv("data/income-inequality.csv")
Rows: 549 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): country, country_code
dbl (3): year, gini_pre_tax, gini_post_tax
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
And take a look at the first 10 rows.
income_inequality
# A tibble: 549 × 5
country country_code year gini_pre_tax gini_post_tax
<chr> <chr> <dbl> <dbl> <dbl>
1 Australia AUS 1989 0.431 0.304
2 Australia AUS 1995 0.47 0.311
3 Australia AUS 2001 0.481 0.32
4 Australia AUS 2003 0.469 0.316
5 Australia AUS 2004 0.467 0.316
6 Australia AUS 2008 0.469 0.335
7 Australia AUS 2010 0.47 0.333
8 Australia AUS 2014 0.481 0.329
9 Australia AUS 2016 0.467 0.326
10 Australia AUS 2018 0.469 0.329
# ℹ 539 more rows
The variables in the dataset are:
-
country
: Country name -
country_code
: Country code -
year
: Year -
gini_pre_tax
: Gini coefficient before taxes -
gini_post_tax
: Gini coefficient after taxes
Demo: And take a look at the structure of the data as well with the glimpse()
function.
glimpse(income_inequality)
Rows: 549
Columns: 5
$ country <chr> "Australia", "Australia", "Australia", "Australia", "Aus…
$ country_code <chr> "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", …
$ year <dbl> 1989, 1995, 2001, 2003, 2004, 2008, 2010, 2014, 2016, 20…
$ gini_pre_tax <dbl> 0.431, 0.470, 0.481, 0.469, 0.467, 0.469, 0.470, 0.481, …
$ gini_post_tax <dbl> 0.304, 0.311, 0.320, 0.316, 0.316, 0.335, 0.333, 0.329, …
Demo: Render, commit (with an appropriate commit message), and push your changes to GitHub.
Your turn: replace the blank below with the number of rows in the income_inequality
data frame based on the output of the chunk below. Then, replace it with “inline code” and render again. Then, commit (with an appropriate commit message) and push your changes to GitHub.
The income_inequality
data frame has 549 observations. Each observation represents a country in a given year.
Income inequality over time in a country
Let’s explore income inequality over time in a country.
income_inequality |>
filter(country == "Denmark") |>
pivot_longer(
cols = contains("gini"),
names_to = "gini_type",
values_to = "gini_coefficient"
) |>
mutate(
gini_type = if_else(gini_type == "gini_pre_tax", "Pre-tax", "Post-tax"),
gini_type = fct_relevel(gini_type, "Pre-tax", "Post-tax")
) |>
ggplot(aes(x = year, y = gini_coefficient, color = gini_type)) +
geom_line(linewidth = 1)
Your turn: Create a similar plot for another country of your choice by by modifying the code in the code cell labeled plot-single
above. Then, render, commit (with an appropriate commit message), and push your changes to GitHub.
Hint: Open the data frame in the data viewer by clicking on its name in the Environment pane to see the list of countries.
Income inequality over time in many countries
Now let’s explore income inequality over time in many countries.
income_inequality |>
filter(country %in% c("United States", "Sweden", "South Africa")) |>
pivot_longer(
cols = contains("gini"),
names_to = "gini_type",
values_to = "gini_coefficient"
) |>
mutate(
gini_type = if_else(gini_type == "gini_pre_tax", "Pre-tax", "Post-tax"),
gini_type = fct_relevel(gini_type, "Pre-tax", "Post-tax")
) |>
ggplot(aes(x = year, y = gini_coefficient, color = gini_type)) +
geom_line(linewidth = 1) +
facet_wrap(~country, ncol = 1)
Your turn: Change as many countries as you like except for the United States by modifying the code in the cell labeled plot-many
above. You can replace country names like we did before or add or remove them. Then, render to see your changes. Discuss your findings with your neighbor – Why did you choose those countries? How do the income inequalities of these countries compare? Finally, commit your changes (with an appropriate commit message), and push your changes to GitHub.