The apply family

Published: July 31, 2023


I highly recommend the corresponding chapter in R for Data Science if you want to dive deeper.

For-loops

In the last SIG we talked about for-loops.
While for is definitely the most flexible of the looping options, we suggest you avoid it wherever you can, for the following two reasons:

    1. It is not very expressive, i.e. takes a lot of code to do what you want.
    2. It permits you to write horrible code.
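
As a small illustration of the second point, the sketch below shows a pitfall that a for-loop happily lets through: iterating over 1:length(x) when x is empty. The names empty_list, visited and visited_safe are made up for this example; seq_along() is the base R alternative.

```r
# Hypothetical example: an empty input list
empty_list <- list()

# 1:length(empty_list) is 1:0, i.e. the vector c(1, 0),
# so the loop body runs twice although there is nothing to loop over
visited <- c()
for (i in 1:length(empty_list)) {
  visited <- c(visited, i)
}
visited

# seq_along() returns an empty index vector, so the loop body never runs
visited_safe <- c()
for (i in seq_along(empty_list)) {
  visited_safe <- c(visited_safe, i)
}
visited_safe
```

The first loop silently visits the indices 1 and 0, while the second one does nothing, which is usually what we want for empty input.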

Let’s consider this example:

example_list <- list(
  "vec_1" = c(1:10),
  "vec_2" = c(100:400),
  "vec_3" = c(80:97, NA)
)
str(example_list)
List of 3
 $ vec_1: int [1:10] 1 2 3 4 5 6 7 8 9 10
 $ vec_2: int [1:301] 100 101 102 103 104 105 106 107 108 109 ...
 $ vec_3: int [1:19] 80 81 82 83 84 85 86 87 88 89 ...

Here we have a list consisting of three vectors. Our goal is to sum each of them and store the results in a new vector. We could use a for-loop to do that:

vec_sum <- c()
for(i in 1:length(example_list)){
  vec_sum[i] <- sum(example_list[[i]], na.rm = TRUE)
}
vec_sum
[1]    55 75250  1593

Okay, that doesn’t look too complicated. Still, we need to define an empty vector at the beginning to store our sums, iterate over 1:length(example_list), and manually select the \(i^{th}\) element of the input list. That is not very expressive, and it can be done much more easily. Enter the apply-family:

The apply-family

The apply-functions apply a function to every element of a vector, list, matrix, … and return a vector, list, matrix, …, depending on the specific function. Let’s rewrite our for-loop with sapply():

vec_sum <- sapply(example_list, sum)
vec_sum
vec_1 vec_2 vec_3 
   55 75250    NA 

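A lot less code and easier to understand! We just go over every list element and calculate its sum. Note that the sum of vec_3 is NA, because the vector contains a missing value and we have not set na.rm = TRUE.

If we want to pass additional arguments to the function, we can do that as well: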

vec_sum <- sapply(example_list, sum, na.rm = TRUE)
vec_sum
vec_1 vec_2 vec_3 
   55 75250  1593 

We can also define our own function:

vec_sum <- sapply(example_list, function(x){
  res_sum <- sum(x, na.rm = TRUE)
  print(res_sum)
  return(res_sum)
})
[1] 55
[1] 75250
[1] 1593

Here we calculate the sum of object x, and then print it.
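
As a side note, extra arguments such as na.rm = TRUE are simply passed on to the applied function, so the short call and an explicit anonymous function should give the same result. The sketch below assumes R 4.1.0 or newer for the backslash shorthand for anonymous functions; the object names sum_dots, sum_anon and sum_short are only illustrative.

```r
example_list <- list(
  "vec_1" = c(1:10),
  "vec_2" = c(100:400),
  "vec_3" = c(80:97, NA)
)

# Extra arguments after the function are passed on to sum()
sum_dots <- sapply(example_list, sum, na.rm = TRUE)

# The same computation with an explicit anonymous function
sum_anon <- sapply(example_list, function(x) sum(x, na.rm = TRUE))

# Shorthand for anonymous functions (assumes R >= 4.1.0)
sum_short <- sapply(example_list, \(x) sum(x, na.rm = TRUE))

# Both comparisons check that the three variants agree
identical(sum_dots, sum_anon)
identical(sum_anon, sum_short)
```

Which variant to use is mostly a matter of taste; the anonymous function becomes useful as soon as more than one call has to happen per element.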

Finally, we can define the function externally and give it a concise name, which makes for even nicer code:

print_sum <- function(vec){
  res_sum <- sum(vec, na.rm = TRUE)
  print(res_sum)
  return(res_sum)
}

vec_sum <- sapply(example_list, print_sum)
[1] 55
[1] 75250
[1] 1593
vec_sum
vec_1 vec_2 vec_3 
   55 75250  1593 

Depending on the output we want, we can choose different apply-functions:

sapply()

sapply() simplifies the result where possible, e.g. it returns a vector if every call returns a single value:

sapply(example_list, print_sum)
[1] 55
[1] 75250
[1] 1593
vec_1 vec_2 vec_3 
   55 75250  1593 
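
What "simplifying" means depends on what the applied function returns. The following sketch, which reuses example_list and introduces the illustrative names res_sum, res_range and res_clean, shows three cases one is likely to run into:

```r
example_list <- list(
  "vec_1" = c(1:10),
  "vec_2" = c(100:400),
  "vec_3" = c(80:97, NA)
)

# One value per element: sapply() returns a named vector
res_sum <- sapply(example_list, sum, na.rm = TRUE)

# Two values per element (minimum and maximum): sapply() returns a matrix
res_range <- sapply(example_list, range, na.rm = TRUE)

# Results of different lengths: no simplification is possible,
# so sapply() falls back to a list
res_clean <- sapply(example_list, function(x) x[!is.na(x)])

class(res_sum)
dim(res_range)
class(res_clean)
```

Because the shape of the output depends on the data, sapply() is convenient for interactive work but can surprise us inside functions and scripts.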

vapply()

Similar to sapply(), but we pre-specify the type and length of the value each call has to return, which makes it safer to use:

vapply(example_list, print_sum, integer(1))
[1] 55
[1] 75250
[1] 1593
vec_1 vec_2 vec_3 
   55 75250  1593 

Because every call returns a single integer, we don’t get an error. But if we write this:

vapply(example_list, print_sum, character(1))
[1] 55
Error in vapply(example_list, print_sum, character(1)): values must be type 'character',
 but FUN(X[[1]]) result is type 'integer'

vapply() throws an error, because the output of print_sum is an integer, not a character vector.
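
One situation where this safety matters is empty input. A minimal sketch, with empty_list, res_sapply and res_vapply as made-up names:

```r
# Hypothetical example: nothing to iterate over
empty_list <- list()

# sapply() has nothing to simplify and returns an empty list
res_sapply <- sapply(empty_list, sum)

# vapply() still returns the type we asked for, just with length zero
res_vapply <- vapply(empty_list, sum, integer(1))

class(res_sapply)
class(res_vapply)
```

With vapply(), code further down the line can rely on getting an integer vector, no matter how many elements the input has.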

lapply()

lapply() always returns a list with one element per input element:

lapply(example_list, print_sum)
[1] 55
[1] 75250
[1] 1593
$vec_1
[1] 55

$vec_2
[1] 75250

$vec_3
[1] 1593
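
To work with such a list afterwards, the usual list tools apply. The sketch below uses sum() directly instead of print_sum to keep the output short; res_list is an illustrative name.

```r
example_list <- list(
  "vec_1" = c(1:10),
  "vec_2" = c(100:400),
  "vec_3" = c(80:97, NA)
)

res_list <- lapply(example_list, sum, na.rm = TRUE)

# The result has one element per input element, with the same names
names(res_list)

# Single elements are extracted with [[ ]] or $
res_list[["vec_2"]]
res_list$vec_3

# unlist() flattens the list into a named vector
unlist(res_list)
```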

Exercises

Work with the iris data.frame (it is already included in base R):

Exercise 1

Write a for-loop that determines the median of each column if the column is numeric, and otherwise returns the column class via class(). Save the results in a character vector, i.e. convert every element to character before storing it.

vec_median <- c()
for(i in 1:ncol(iris)){
  if(is.numeric(iris[, i])){
    vec_median[i] <- as.character(median(iris[, i], na.rm = TRUE))
  } else{
    vec_median[i] <- class(iris[, i])
  }
}

vec_median
[1] "5.8"    "3"      "4.35"   "1.3"    "factor"

Exercise 2

  1. Define the body of the for-loop as its own function. This function should take a vector and, if this vector is numeric, output the median as a character, otherwise the class of the vector.

check_median <- function(vec){
  if(is.numeric(vec)){
    result <- median(vec, na.rm = TRUE)
  } else{
    result <- class(vec)
  }
  ## Convert to character, so our function always returns the correct type
  result <- as.character(result)
  return(result)
}

## Check it:
check_median(c(100, 1000))
[1] "550"
check_median(c("a", "b"))
[1] "character"

  2. Use it in the for-loop.

vec_median <- c()
for(i in 1:ncol(iris)){
  vec_median[i] <- check_median(iris[, i])
}

vec_median
[1] "5.8"    "3"      "4.35"   "1.3"    "factor"

Exercise 3

Rewrite the for-loop from Exercise 1 with functions from the apply-family, so that it returns the following objects. Define the function that gets applied to every input element externally, so we have cleaner code.

  1. A vector.

sapply(iris, check_median)
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
       "5.8"          "3"       "4.35"        "1.3"     "factor" 

Or, even better:

vapply(iris, check_median, character(1))
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
       "5.8"          "3"       "4.35"        "1.3"     "factor" 

Wow, that’s pretty nice: we condensed our loop to half a line by defining the function somewhere else and replacing the for-loop with an apply-function!

  2. A list.

lapply(iris, check_median)
$Sepal.Length
[1] "5.8"

$Sepal.Width
[1] "3"

$Petal.Length
[1] "4.35"

$Petal.Width
[1] "1.3"

$Species
[1] "factor"
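
Why vapply() is the safer choice in the first part of Exercise 3 shows up as soon as a column has more than one class, as ordered factors do. The sketch below repeats check_median so it runs on its own; toy_data and its columns are hypothetical and not part of the exercise.

```r
check_median <- function(vec) {
  if (is.numeric(vec)) {
    result <- median(vec, na.rm = TRUE)
  } else {
    result <- class(vec)
  }
  as.character(result)
}

# Hypothetical data: an ordered factor has the two classes "ordered" and "factor"
toy_data <- data.frame(
  size = c(1.5, 2.5, 4),
  grade = factor(c("low", "mid", "high"),
                 levels = c("low", "mid", "high"), ordered = TRUE)
)

# sapply() cannot simplify results of length 1 and 2, so it returns a list
res_sapply <- sapply(toy_data, check_median)
class(res_sapply)

# vapply() insists on exactly one character value per column and stops;
# tryCatch() stores the error message instead of aborting the script
res_vapply <- tryCatch(
  vapply(toy_data, check_median, character(1)),
  error = function(e) conditionMessage(e)
)
res_vapply
```

sapply() quietly changes its return type here, whereas vapply() makes the problem visible right where it occurs.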

Footnotes

  1. Image by Kier in Sight Archives on Unsplash.