Now suppose we want to add a second column containing the continent of each country. One option is to use nested ifelse() statements, which makes the code quite hard to read:
  country continent
1     Deu    Europe
2     Deu    Europe
3     Deu    Europe
4     Deu    Europe
5  Mexico   America
6    Peru   America
7   Ghana    Africa
8   China      Asia
9 Spanien    Europe
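The nested ifelse() call behind the output above is not shown, so here is a minimal sketch of what it might look like. It assumes that df is a data frame with a single character column country holding the nine values printed above, and that every country not handled explicitly ends up as "Asia". The name df_1 is made up for this illustration.

```r
library(dplyr)

# Assumed input: one character column with the nine values from the output above
df <- data.frame(
  country = c("Deu", "Deu", "Deu", "Deu", "Mexico", "Peru", "Ghana", "China", "Spanien"),
  stringsAsFactors = FALSE
)

# Each ifelse() sits in the "else" branch of the previous one
df_1 <- df %>%
  mutate(continent = ifelse(
    country %in% c("Deu", "Spanien"), "Europe",
    ifelse(
      country %in% c("Mexico", "Peru"), "America",
      ifelse(country == "Ghana", "Africa", "Asia")
    )
  ))
df_1
```

Every additional category adds another level of nesting and another closing parenthesis to keep track of, which is where the readability problem comes from.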
case_when() has a slightly different syntax, but is not nested, which makes it easier to read. Condition and output are separated by ~. So if the condition on the left side is met in a row, the function returns the value on the right side of ~:
df_2 <- df %>%
  mutate(continent = case_when(
    country %in% c("Deu", "Spanien") ~ "Europe",
    country %in% c("Mexico", "Peru") ~ "America",
    country == "Ghana" ~ "Africa",
    TRUE ~ "Another continent"
  ))
df_2
  country         continent
1     Deu            Europe
2     Deu            Europe
3     Deu            Europe
4     Deu            Europe
5  Mexico           America
6    Peru           America
7   Ghana            Africa
8   China Another continent
9 Spanien            Europe
We wrap this statement in a mutate() call to create the new column continent directly from the output of case_when(). The TRUE in the last line catches every case not handled by the conditions above it, so all rows that have met none of those conditions get the label “Another continent”.
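To see what the TRUE line does, it can help to call case_when() on a plain vector, once without and once with the catch-all. This is only a small sketch: the vector country and the names without_default and with_default are made up for the illustration.

```r
library(dplyr)

# Hypothetical input vector; "China" matches none of the explicit conditions
country <- c("Deu", "Ghana", "China")

# Without a catch-all, elements that match no condition become NA
without_default <- case_when(
  country %in% c("Deu", "Spanien") ~ "Europe",
  country == "Ghana" ~ "Africa"
)

# With TRUE as the last condition, every remaining element gets the fallback label
with_default <- case_when(
  country %in% c("Deu", "Spanien") ~ "Europe",
  country == "Ghana" ~ "Africa",
  TRUE ~ "Another continent"
)

without_default  # "Europe" "Africa" NA
with_default     # "Europe" "Africa" "Another continent"
```

Leaving out the TRUE line is therefore not an error; unmatched rows simply end up as NA.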
Evaluation order
case_when() evaluates its conditions from top to bottom. Once a row has met a condition, it is not considered by the conditions further down. That’s why it makes sense to order the conditions from the most specific to the least specific. Otherwise a general condition placed first claims every row before the more specific ones are reached:
df_3 <- df %>%
  mutate(continent = case_when(
    country %in% c(df$country) ~ "Other country",
    country %in% c("Mexico", "Peru") ~ "America",
    country == "Ghana" ~ "Africa",
    TRUE ~ "Another continent"
  ))
df_3
  country     continent
1     Deu Other country
2     Deu Other country
3     Deu Other country
4     Deu Other country
5  Mexico Other country
6    Peru Other country
7   Ghana Other country
8   China Other country
9 Spanien Other country
Because our first condition already covers all rows, the remaining conditions are never reached. This top-down evaluation is also what makes the TRUE condition in the last line work: only rows that have not matched any earlier condition get this far, and all of them are caught, because TRUE is always true.
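For comparison, here is a sketch of the same df_3 example with the broad condition moved below the specific ones. It again assumes that df holds the nine country values shown in the outputs; the name df_4 is made up for this illustration.

```r
library(dplyr)

# Assumed input: the nine country values shown in the outputs above
df <- data.frame(
  country = c("Deu", "Deu", "Deu", "Deu", "Mexico", "Peru", "Ghana", "China", "Spanien"),
  stringsAsFactors = FALSE
)

df_4 <- df %>%
  mutate(continent = case_when(
    # Specific conditions first
    country %in% c("Mexico", "Peru") ~ "America",
    country == "Ghana" ~ "Africa",
    # Broad condition afterwards: only rows not matched above reach it
    country %in% df$country ~ "Other country",
    TRUE ~ "Another continent"
  ))
df_4
```

With this ordering, Mexico, Peru and Ghana keep their specific labels, and only the remaining rows are labelled "Other country". The TRUE line is never reached here, because the broad condition already matches every row that is left.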