Basics of the by() function in R

The by() function in R splits a data set into groups and applies a function to each group.

It takes an R object as input, usually a data frame or a matrix (a plain vector also works, as in the examples below), and applies the given function to each subset of the data.


Let’s start with the syntax

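The by() function takes the data, one or more grouping factors, and a function to apply to each group.

by(data, INDICES, FUN, ..., simplify = TRUE)

Where,

  • data = The input data, usually a data frame or a matrix.
  • INDICES = A factor, or a list of factors, to group the data by.
  • FUN = The function to apply to each group.
  • ... = Further arguments passed on to FUN.
  • simplify = A logical value that is passed on to tapply().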
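To make the arguments above concrete, here is a minimal sketch that passes a whole data frame to by(). It assumes the built-in iris dataset, and the anonymous function and the name res are only for illustration. When the input is a data frame, FUN receives the rows of each group as a smaller data frame.

```r
# FUN gets the rows belonging to one Species as a data frame (d)
res <- by(iris, iris$Species, function(d) mean(d$Petal.Width))

# the result can be indexed by group name
res[["setosa"]]
```

Passing the full data frame is handy when the function needs more than one column at a time.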

A Simple Example of by() function in R

In this section, we are going to try out a simple example using the built-in ‘iris’ dataset. It suits the purpose because it has a categorical column (Species) alongside numerical measurements such as Petal.Width.

The dataset ships with R, so we only need to assign it to a variable.

#assigning the built-in iris data to the variable df
df <- iris

#computes the mean of Petal.Width for each Species category
by(df$Petal.Width, list(df$Species), mean)
Output = 
---------------------------------------------------------
: setosa
[1] 0.246
---------------------------------------------------------- 
: versicolor
[1] 1.326
---------------------------------------------------------- 
: virginica
[1] 2.026
----------------------------------------------------------

In the above output, you can see that the by() function returns the mean of Petal.Width for each Species category. Similarly, you can pass any other function to by(), and it will be applied to each group.

df<-iris

by(df$Petal.Width,list(df$Species),median)

Output =
---------------------------------------------------------
: setosa
[1] 0.2
---------------------------------------------------------- 
: versicolor
[1] 1.3
---------------------------------------------------------- 
: virginica
[1] 2
----------------------------------------------------------

In the above code, I have passed the median function to by(), and it returns the median of Petal.Width for each Species category.
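The function does not have to be a built-in one. As a rough sketch, assuming the same iris data, the hypothetical anonymous function below returns both the smallest and the largest Petal.Width of each species. The name rng is just an illustrative choice.

```r
# an anonymous function that returns two named values per group
rng <- by(iris$Petal.Width, list(iris$Species),
          function(x) c(min = min(x), max = max(x)))

# each group now holds a small named vector
rng[["setosa"]]
```

Because the function returns more than one value, each group in the result holds a vector instead of a single number.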


by() function with multiple grouping factors

In this section, we are going to see how we can pass more than one grouping factor to the by() function.

For this purpose, we are going to use the ‘ToothGrowth’ dataset, as it has two columns we can group by: supp (the supplement type) and dose.

Let’s see how it works.

#assigning the built-in ToothGrowth data to the variable df
df <- ToothGrowth

#grouping by two columns: supp and dose
by(df$len, list(df$supp, df$dose), mean)
Output = 

: OJ
: 0.5
[1] 13.23
---------------------------------------------------------- 
: VC
: 0.5
[1] 7.98
---------------------------------------------------------- 
: OJ
: 1
[1] 22.7
---------------------------------------------------------- 
: VC
: 1
[1] 16.77
---------------------------------------------------------- 
: OJ
: 2
[1] 26.06
---------------------------------------------------------- 
: VC
: 2
[1] 26.14

As you can observe in the above output, we have passed a list of 2 factors in our code, and the by() function has returned one value for each combination of them: the mean tooth length (len) for every pair of supp and dose.
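The printed blocks are easy to read, but sometimes we want to pick out a single value. Here is a small sketch of one way to do that, assuming the same ToothGrowth data. The list names supp and dose and the variables res and m are my own choices for illustration.

```r
df <- ToothGrowth

# naming the list elements labels the groups in the printed output
res <- by(df$len, list(supp = df$supp, dose = df$dose), mean)

# the result has one dimension per grouping factor
dim(res)

# drop the "by" class to index it like an ordinary matrix
m <- unclass(res)
m["OJ", "2"]
```

The last line should give the same mean as the OJ and 2 block in the output above.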

The by() function is useful for seeing how a numerical variable differs across the groups of a dataset.

You can easily summarize one variable with respect to several grouping variables at once and thereby understand the data behavior.


Wrapping up

The by() function in R is highly useful during data analysis, as explained above. You can summarize one variable within the groups defined by another, and you can choose the function used for the computation.

In this article, we have passed the mean and median functions. Likewise, you can pass any function you need to get the computed values as output.
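If the function you pass needs extra arguments, by() forwards them to it. As a quick sketch, assuming the iris data again, the call below asks mean() for a 10% trimmed mean. The value 0.1 and the name trimmed are only illustrative.

```r
# trim = 0.1 is forwarded to mean(), giving a 10% trimmed mean per group
trimmed <- by(iris$Petal.Width, list(iris$Species), mean, trim = 0.1)
trimmed
```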

That’s all for now. Happy analyzing!!!

More study: R documentation

Ninad Pathak