Apply Functions in R and Python

In this post, I will go over some simple tools that can make R and Python scripts more concise and readable. First, I will explain the apply function in R and Python. Then, I will briefly go over anonymous functions in Python.

The Apply Function in R

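The apply function runs a function over the rows or columns of a matrix (or, more generally, over the margins of an array). It also accepts a data frame, but it first converts the data frame to a matrix, so all columns should share one type, typically numeric. For lists, the related lapply and sapply functions are used instead. In essence, apply is a compact alternative to writing a “for” loop over rows or columns.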

The apply function has three main inputs: a data object (X), a margin (MARGIN), and a function (FUN). The margin specifies whether the function is applied to each row (MARGIN = 1) or to each column (MARGIN = 2). The function can be a built-in R function (e.g., sum or max) or one the user defines, either beforehand as a named function or inline within the apply call.
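To make the MARGIN argument concrete, here is a rough Python sketch of what apply does for functions that return a single value per row or column. The helper name `apply_margin` is hypothetical (it is not part of R, NumPy, or pandas); it borrows R's convention of 1 for rows and 2 for columns and simply wraps the explicit for loop that apply saves you from writing.

```python
import numpy as np

def apply_margin(matrix, margin, fun):
    """Hypothetical helper mimicking R's apply() for functions that return
    one value per slice: margin=1 means rows, margin=2 means columns."""
    m = np.asarray(matrix)
    if margin == 1:
        slices = [m[i, :] for i in range(m.shape[0])]
    elif margin == 2:
        slices = [m[:, j] for j in range(m.shape[1])]
    else:
        raise ValueError("margin must be 1 (rows) or 2 (columns)")

    # This explicit loop is what apply() lets you avoid writing yourself
    results = []
    for s in slices:
        results.append(fun(s))
    return np.array(results)


m = [[1, 2, 3],
     [4, 5, 6]]
row_sums = apply_margin(m, 1, sum)  # like apply(m, MARGIN = 1, FUN = sum) in R
col_maxs = apply_margin(m, 2, max)  # like apply(m, MARGIN = 2, FUN = max) in R
print(row_sums, col_maxs)
```

The sketch only covers scalar-returning functions; R's apply also handles functions that return a vector per slice, which changes the shape of its output.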

Example Problem

Here I will define a simple problem as our test case: find the maximum of each column and divide every element of that column by that maximum. We will use the iris data set, because it ships with R and is also available in Python through the seaborn package.
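Before turning to the real data, here is the arithmetic of the task on a toy table, sketched in plain Python. The two columns and their values are made up for illustration and are not the iris data.

```python
# Made-up toy table: two columns of positive values (not the iris data)
table = {
    "a": [2.0, 4.0, 5.0],
    "b": [1.0, 3.0, 6.0],
}

# For each column, find its maximum and divide every element by it
scaled = {}
for name, column in table.items():
    column_max = max(column)
    scaled[name] = [value / column_max for value in column]

print(scaled["a"])  # [0.4, 0.8, 1.0]
```

With positive data such as the iris measurements, each column's largest value maps to exactly 1.0 and the others fall between 0 and 1.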

# Load the iris data set (it ships with base R)
data(iris)

# Keep the four numeric measurement columns (column 5 is the Species factor)
iris_df <- as.data.frame(iris[, 1:4])

# Divide every column by its own maximum; MARGIN = 2 means "apply over columns"
output_max <- as.data.frame(apply(iris_df, MARGIN = 2, FUN = function(x) x / max(x)))

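For a task this simple there are more direct ways to do the calculation, but apply comes in handy when the per-column (or per-row) operation is more involved. Note that when the function returns a vector, as it does here, apply stacks the results as columns: with MARGIN = 2 this preserves the original layout, but with MARGIN = 1 the output would be transposed. The apply function also has several variants, such as lapply, sapply, and mapply; refer to this post (here) for more information about them.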

The Apply Function in Python

The pandas package for Python has a DataFrame.apply method that plays the same role as its R counterpart; the following code illustrates how to use it. In pandas, axis=0 (the default) applies the function to each column and axis=1 applies it to each row. Note that in this example I defined a named function outside of the apply call and passed it in to calculate the maximum and the ratio-to-maximum. In the next section, I will present an alternative way of defining in-line functions in Python.

# The iris data set is available in the seaborn package in Python
import seaborn as sns

# Load the iris data set into a pandas data frame
iris = sns.load_dataset('iris')

# Define a named function that divides a column by its maximum
def ratio_to_max(data):
    maximum = data.max()
    ratio = data / maximum
    return ratio

# Use pandas' DataFrame.apply to compute the ratio-to-maximum for each numeric column
output_df = iris.iloc[:, 0:4].apply(ratio_to_max, axis=0)
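To see what the axis argument changes, and how apply compares with a vectorized alternative, here is a small sketch on a made-up data frame (the column names echo iris, but the values are invented so the example runs without downloading anything). It redefines ratio_to_max without the debugging print.

```python
import pandas as pd

# Made-up numeric data standing in for iris.iloc[:, 0:4]
df = pd.DataFrame({"sepal_length": [5.0, 6.0, 7.5],
                   "petal_width": [0.2, 1.0, 2.5]})

def ratio_to_max(data):
    # With axis=0, `data` is one column of df (a pandas Series)
    return data / data.max()

# axis=0 (the default): the function is called once per column
col_max = df.apply(pd.Series.max, axis=0)   # one value per column

# axis=1: the function is called once per row
row_max = df.apply(pd.Series.max, axis=1)   # one value per row

# Column-by-column with apply, as in the example above
via_apply = df.apply(ratio_to_max, axis=0)

# Vectorized: df.max() is a Series indexed by column name, and the division
# aligns it with df's columns, so each column is divided by its own maximum
via_broadcast = df / df.max()

pd.testing.assert_frame_equal(via_apply, via_broadcast)
print(via_apply)
```

For this particular task the vectorized form is shorter; apply earns its place when the per-column logic cannot be written as a single array expression.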


Anonymous Functions in Python

Python provides an easy alternative to separately defined functions like the one used above, called an anonymous or “lambda” function. A lambda performs a specific task on a data object, like a regular function; however, it is a single expression that can be written inline, for example as an argument to another function, and it doesn't need to be assigned a name. Therefore, in many cases, lambdas offer a more concise alternative to regular functions (though they run no faster than a function defined with def). A history of the lambda function can be found in this post (here), which also provides a comprehensive list of lambda's functionalities. Here is an example of a lambda used instead of the regular function defined before:

# The iris data set is available in the seaborn package in Python
import seaborn as sns

# Load the iris data set into a pandas data frame
iris = sns.load_dataset('iris')

# Use a lambda to define an anonymous function inline, inside pandas' apply method
output_df = iris.iloc[:, 0:4].apply(lambda x: x / x.max(), axis=0)
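A lambda is interchangeable with a named function that computes the same expression. Here is a minimal sketch outside pandas, using made-up pairs and a hypothetical helper name, second_item: both versions sort identically, and the only difference is that the lambda is written inline and has no name of its own (Python reports it as `<lambda>`).

```python
# A regular, named function
def second_item(pair):
    return pair[1]

# Made-up (label, value) pairs
pairs = [("a", 3), ("b", 1), ("c", 2)]

# The named function passed as the sort key
by_def = sorted(pairs, key=second_item)

# The same logic as a lambda, written inline where it is used
by_lambda = sorted(pairs, key=lambda pair: pair[1])

print(by_lambda)                        # [('b', 1), ('c', 2), ('a', 3)]
print((lambda pair: pair[1]).__name__)  # <lambda>
```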

Note that, although R has no lambda keyword, it does support anonymous functions: the function(x) x / max(x) written inline in the apply call above is one, and since R 4.1.0 it can also be written with the shorthand \(x) x / max(x). Also, several other widely used Python functions work nicely with lambdas. For example, the built-in map and filter functions, and reduce from the functools module, can take advantage of lambda's simplicity in data-processing tasks. You can refer to here and here for more information about these functions.
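A minimal sketch of map, filter, and reduce each taking a lambda, on a made-up list of numbers. Note that in Python 3 reduce is no longer a built-in and must be imported from functools.

```python
from functools import reduce  # in Python 3, reduce lives in functools

values = [3, 8, 1, 6]  # made-up data

# map: apply a lambda to every element (here, divide by the maximum)
largest = max(values)
scaled = list(map(lambda v: v / largest, values))

# filter: keep only the elements for which the lambda returns True
large = list(filter(lambda v: v > 4, values))

# reduce: combine the elements pairwise into a single value (here, a sum)
total = reduce(lambda acc, v: acc + v, values)

print(scaled, large, total)  # [0.375, 1.0, 0.125, 0.75] [8, 6] 18
```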

4 thoughts on “Apply Functions in R and Python”

  1. Note that, although R does not have a tool like lambda, it does provide a way of defining anonymous functions such as the one defined within the apply function.

    Hey. That’s just wrong. Your example contradicts the statement.

    • Thanks for the comment. What I meant by that statement was that R can also handle anonymous functions [lambda calculus], as you mentioned, but it is not called “lambda” in R.

  2. Hello, Keyvan Malek. Great post! Have you looked at the purrr package? It provides a myriad of apply-like functions but in a framework that belongs to the Tidyverse universe in R. I myself used apply-family functions for a long time but once I made the transition to “map and its sisters”, I haven’t gone back. In case you don’t know it, you might want to check it out sometime.
