I can highly recommend the corresponding chapter in R for Data Science if you want to dive deeper.
For-loops
In the last SIG we talked about for-loops.
While for is definitely the most flexible of the looping options, we suggest you avoid it wherever you can, for the following two reasons:
It is not very expressive, i.e., it takes a lot of code to do what you want.
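A minimal sketch of such a loop, summing every element of a list, could look like this. The contents of example_list are assumed values chosen for illustration (the third vector contains an NA); only the names example_list and vec_sum come from the text.

```r
## Hypothetical input data: a named list of integer vectors (assumed values).
example_list <- list(
  vec_1 = 1:10,
  vec_2 = 100:400,
  vec_3 = c(80:97, NA)
)

## Define an empty vector to store one sum per list element.
vec_sum <- c()

for (i in 1:length(example_list)) {
  ## Manually select the i-th element and save its sum at position i.
  vec_sum[i] <- sum(example_list[[i]])
}

vec_sum
```

Because vec_3 contains an NA and sum() is called without na.rm = TRUE, the third sum is NA.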
Okay, that doesn’t look that complicated. But still, we need to define an empty vector at the beginning so we can save our sums, we need to iterate over 1:length(example_list), and we need to manually select the \(i^{th}\) element from the input list. That is not very expressive, and there is a much simpler way to do it. Enter the apply-family:
The apply-family
The apply-functions apply a function to every element of a vector, list, matrix … and always return a vector, list, matrix …, depending on the specific function. Let’s rewrite our for-loop with sapply():
vec_sum <- sapply(example_list, sum)
vec_sum
vec_1 vec_2 vec_3
55 75250 NA
A lot less code and easier to understand! We just go over every list element and calculate its sum.
If we want to add another function argument, we can do that as well:
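As a short sketch: extra arguments are written after the function name, and sapply() forwards them to every call. The example_list below uses assumed values for illustration, and vec_sum_no_na is a hypothetical name.

```r
## Hypothetical input data (assumed values); vec_3 contains an NA.
example_list <- list(vec_1 = 1:10, vec_2 = 100:400, vec_3 = c(80:97, NA))

## na.rm = TRUE is passed on to sum() for every list element.
vec_sum_no_na <- sapply(example_list, sum, na.rm = TRUE)
vec_sum_no_na
```

With the NA removed, the third element now has a proper sum instead of NA.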
Depending on the output we want, we can choose different apply-functions:
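The examples in the following subsections call a helper named print_sum(). A minimal sketch of what such a helper could look like, assuming it prints the sum (ignoring NAs) and then returns it:

```r
## Hypothetical helper (assumed definition): print the sum, then return it.
print_sum <- function(vec) {
  vec_sum <- sum(vec, na.rm = TRUE)
  print(vec_sum)
  return(vec_sum)
}

## Quick check on a small vector with a missing value:
print_sum(c(1:10, NA))
```

Printing inside the function is what produces the extra [1] lines before the final result in the outputs of the apply-functions.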
sapply()
sapply() simplifies the result, so, e.g., it will return a vector if possible:
sapply(example_list, print_sum)
[1] 55
[1] 75250
[1] 1593
vec_1 vec_2 vec_3
55 75250 1593
vapply()
Similar to sapply(), but we can pre-specify the type and length of each return value, which makes it safer to use:
vapply(example_list, print_sum, integer(1))
[1] 55
[1] 75250
[1] 1593
vec_1 vec_2 vec_3
55 75250 1593
Because the result is an integer vector, we don’t get an error, but if we write this:
vapply(example_list, print_sum, character(1))
[1] 55
Error in vapply(example_list, print_sum, character(1)): values must be type 'character',
but FUN(X[[1]]) result is type 'integer'
vapply() throws an error, because the output of print_sum() is an integer, and not a character vector.
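Another case where the pre-specified return value helps is empty input. The sketch below (the names empty_input, res_sapply and res_vapply are hypothetical) compares both functions on a list without elements:

```r
## Hypothetical edge case: a list with no elements.
empty_input <- list()

## sapply() has nothing to simplify, so it falls back to an empty list.
res_sapply <- sapply(empty_input, sum)
class(res_sapply)

## vapply() still returns the type we asked for: an empty integer vector.
res_vapply <- vapply(empty_input, sum, integer(1))
class(res_vapply)
```

Code that expects an integer vector keeps working with the vapply() result, whereas the sapply() result silently changes type.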
lapply()
Returns a list:
lapply(example_list, print_sum)
[1] 55
[1] 75250
[1] 1593
$vec_1
[1] 55
$vec_2
[1] 75250
$vec_3
[1] 1593
Exercises
Work with the iris data.frame (it is already included in Base R):
Exercise 1
Write a for-loop to determine the median of each column, if it is numeric. If not, return the column class with class(). Save the results in a character vector, so every element should be converted to character before saving it in the vector.
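One possible solution sketch for this for-loop, with the hypothetical name col_summary for the result vector:

```r
## Pre-allocate a character vector with one slot per column.
col_summary <- character(ncol(iris))

for (i in seq_along(iris)) {
  column <- iris[[i]]
  if (is.numeric(column)) {
    ## Numeric column: store the median, converted to character.
    col_summary[i] <- as.character(median(column))
  } else {
    ## Any other column: store its class instead.
    col_summary[i] <- as.character(class(column))
  }
}

## Label the results with the column names.
names(col_summary) <- names(iris)
col_summary
```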
Define the body of the for-loop as its own function. This function should take a vector and, if the vector is numeric, return the median as a character; otherwise, it should return the class of the vector.
Caution
check_median <- function(vec) {
  if (is.numeric(vec)) {
    result <- median(vec, na.rm = TRUE)
  } else {
    result <- class(vec)
  }
  ## Convert to character, so our function always returns the correct type
  result <- as.character(result)
  return(result)
}

## Check it:
check_median(c(100, 1000))
Rewrite the for-loop from Exercise 1 with functions from the apply-family, so it returns the following objects. Define the function that gets applied on every input element externally, so we have cleaner code.
A vector.
Caution
sapply(iris, check_median)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
"5.8" "3" "4.35" "1.3" "factor"
Or, even better:
vapply(iris, check_median, character(1))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
"5.8" "3" "4.35" "1.3" "factor"
Wow, that’s pretty nice: by defining the function somewhere else and not using a for-loop, we condensed our loop to half a line!