Multivariate Adaptive Regression Splines (MARS) in Python

Multivariate Adaptive Regression Splines, or MARS, is an algorithm for complex non-linear regression problems.

The algorithm involves finding a set of simple linear functions that in aggregate result in the best predictive performance. In this way, MARS is a type of ensemble of simple linear functions and can achieve good performance on challenging regression problems with many input variables and complex non-linear relationships.

In this tutorial, you will discover how to develop Multivariate Adaptive Regression Spline models in Python.

After completing this tutorial, you will know:

  • The MARS algorithm for multivariate non-linear regression predictive modeling problems.
  • How to use the py-earth API to develop MARS models compatible with scikit-learn.
  • How to evaluate and make predictions with MARS models on regression predictive modeling problems.

Kick-start your project with my new book Ensemble Learning Algorithms With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

Multivariate Adaptive Regression Splines (MARS) in Python

Multivariate Adaptive Regression Splines (MARS) in Python
Photo by Sei F, some rights reserved.

Tutorial Overview

This tutorial is divided into three parts; they are:

  1. Multivariate Adaptive Regression Splines
  2. MARS Python API
  3. MARS Worked Example for Regression

Multivariate Adaptive Regression Splines

Multivariate Adaptive Regression Splines, or MARS for short, is an algorithm designed for multivariate non-linear regression problems.

Regression problems are those where a model must predict a numerical value. Multivariate means that there are more than one (often tens) of input variables, and nonlinear means that the relationship between the input variables and the target variable is not linear, meaning cannot be described using a straight line (e.g. it is curved or bent).

MARS is an adaptive procedure for regression, and is well suited for high-dimensional problems (i.e., a large number of inputs). It can be viewed as a generalization of stepwise linear regression …

— Page 321, The Elements of Statistical Learning, 2016.

The MARS algorithm involves discovering a set of simple piecewise linear functions that characterize the data and using them in aggregate to make a prediction. In a sense, the model is an ensemble of linear functions.

A piecewise linear function is a function composed of smaller functions. In this case, it is a function that either outputs 0 or the input value directly.

A “right function” of one input variable involves selecting a specific value for the variable and outputting a 0 for all values below the value and outputting the value as-is for all values above the chosen value.

  • f(x) = x if x > value, else 0

Or the reverse, a “left function” can be used where values less than the chosen value are output directly and values larger than the chosen value output a zero.

  • f(x) = x if x < value, else 0

This is called a hinge function, where the chosen value or split point is the “knot” of the function. It is also called a rectified linear function in neural networks.

The functions are also referred to as “splines,” hence the name of the algorithm.

Each function is piecewise linear, with a knot at the value t. In the terminology of […], these are linear splines.

— Page 322, The Elements of Statistical Learning, 2016.

The MARS algorithm generates many of these functions, called basis functions for one or more input variables.

A linear regression model is then learned from the output of each of these basis functions with the target variable. This means that the output of each basis function is weighted by a coefficient. A prediction is made by summing the weighted output of all of the basis functions in the model.

Key to the MARS algorithm is how the basis functions are chosen. This involves two steps: the growing or generation phase called the forward-stage and the pruning or refining stage called the backward-stage.

  • Forward Stage: Generate candidate basis functions for the model.
  • Backward Stage: Delete basis functions from the model.

The forward stage involves generating basis functions and adding to the model. Like a decision tree, each value for each input variable in the training dataset is considered as a candidate for a basis function.

How was the cut point determined? Each data point for each predictor is evaluated as a candidate cut point by creating a linear regression model with the candidate features, and the corresponding model error is calculated.

— Page 146, Applied Predictive Modeling, 2013.

Functions are always added in pairs, for the left and right version of the piecewise linear function of the same split point. A generated pair of functions is only added to the model if it reduces the error made by the overall model.

The backward stage involves selecting functions to delete from the model, one at a time. A function is only removed from the model if it results in no impact in performance (neutral) or a lift in predictive performance.

Once the full set of features has been created, the algorithm sequentially removes individual features that do not contribute significantly to the model equation. This “pruning” procedure assesses each predictor variable and estimates how much the error rate was decreased by including it in the model.

— Page 148, Applied Predictive Modeling, 2013.

The change in model performance in the backward stage is evaluated using cross-validation of the training dataset, referred to as generalized cross-validation or GCV for short. As such, the effect of each piecewise linear model on the model’s performance can be estimated.

The number of functions used by the model is determined automatically, as the pruning process will halt when no further improvements can be made.

The only two key hyperparameters to consider are the total number of candidate functions to generate, often set to a very large number, and the degree of the functions to generate.

… there are two tuning parameters associated with the MARS model: the degree of the features that are added to the model and the number of retained terms. The latter parameter can be automatically determined us- ing the default pruning procedure (using GCV), set by the user or determined using an external resampling technique.

— Page 149, Applied Predictive Modeling, 2013.

The degree is the number of input variables considered by each piecewise linear function. By default, this is set to one, but can be set to larger values to allow complex interactions between input variables to be captured by the model. The degree is often kept small to limit the computational complexity of the model (memory and execution time).

A benefit of the MARS algorithm is that it only uses input variables that lift the performance of the model. Much like the bagging and random forest ensemble algorithms, MARS achieves an automatic type of feature selection.

… the model automatically conducts feature selection; the model equation is independent of predictor variables that are not involved with any of the final model features. This point cannot be underrated.

— Page 149, Applied Predictive Modeling, 2013.

Now that we are familiar with the MARS algorithm, let’s look at how we might develop MARS models in Python.

Want to Get Started With Ensemble Learning?

Take my free 7-day email crash course now (with sample code).

Click to sign-up and also get a free PDF Ebook version of the course.

MARS Python API

The MARS algorithm is not provided in the scikit-learn library; instead, a third-party library must be used.

MARS is provided by the py-earth Python library.

Earth” is a play on “Mars” (the planet) and is also the name of the package in R that provides the MARS algorithm.

The py-earth Python package is a Python implementation of MARS named for the R version and provides full comparability with the scikit-learn machine learning library.

The first step is to install the py-earth library. I recommend using the pip package manager, using the following command from the command line:

Once installed, we can load the library and print the version in a Python script to confirm it was installed correctly.

Running the script will load the py-earth library and print the library version number.

Your version number should be the same or higher.

A MARS model can be created with default model hyperparameters by creating an instance of the Earth class.

Once created, the model can be fit on training data directly.

By default, you probably do not need to set any of the algorithm hyperparameters.

The algorithm automatically discovers the number and type of basis functions to use.

The maximum number of basis functions is configured by the “max_terms” argument and is set to a large number proportional to the number of input variables and capped at a maximum of 400.

The degree of the piecewise linear functions, i.e. the number of input variables considered in each basis function, is controlled by the “max_degree” argument and defaults to 1.

Once fit, the model can be used to make a prediction on new data.

A summary of the fit model can be created by calling the summary() function.

The summary returns a list of the basis functions used in the model and the estimated performance of the model estimated by generalized cross-validation (GCV) on the training dataset.

An example of a summary output is provided below where we can see that the model has 19 basis functions and an estimated MSE of about 25.

Now that we are familiar with developing MARS models with the py-earth API, let’s look at a worked example.

MARS Worked Example for Regression

In this section, we will look at a worked example of evaluating and using a MARS model for a regression predictive modeling problem.

First, we must define a regression dataset.

We will use the make_regression() function to create a synthetic regression problem with 20 features (columns) and 10,000 examples (rows). The example below creates and summarizes the shape of the synthetic dataset.

Running the example creates the dataset and summarizes the number of rows and columns, matching our expectations.

Next, we can evaluate a MARS model on the dataset.

We will define the model using the default hyperparameters.

We will evaluate the model using repeated k-fold cross-validation, which is a good practice when evaluating regression models in general.

In this case, we will use three repeats and 10 folds.

We will evaluate model performance using mean absolute error, or MAE for short.

The scikit-learn API will make the MAE score negative so that it can be maximized, meaning scores will range from negative infinity (worst) to 0 (best).

Finally, we will report the performance of the model as the mean MAE score across all repeats and cross-validation folds.

Tying this together, the complete example of evaluating a MARS model on a regression dataset is listed below.

Running the example evaluates the performance of the MARS model and reports the mean and standard deviation of the MAE score.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we can see that the MARS algorithm achieved a mean MAE of about 4.0 (ignoring the sign) on the synthetic regression dataset.

We may want to use MARS as our final model and use it to make predictions on new data.

This requires first defining and fitting the model on all available data.

We can then call the predict() function and pass in new input data in order to make predictions.

The complete example of fitting a MARS final model and making a prediction on a single row of new data is listed below.

Running the example fits the MARS model on all available data, then makes a single regression prediction.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Papers

Books

APIs

Articles

Summary

In this tutorial, you discovered how to develop Multivariate Adaptive Regression Spline models in Python.

Specifically, you learned:

  • The MARS algorithm for multivariate non-linear regression predictive modeling problems.
  • How to use the py-earth API to develop MARS models compatible with scikit-learn.
  • How to evaluate and make predictions with MARS models on regression predictive modeling problems.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

Get a Handle on Modern Ensemble Learning!

Ensemble Learning Algorithms With Python

Improve Your Predictions in Minutes

...with just a few lines of python code

Discover how in my new Ebook:
Ensemble Learning Algorithms With Python

It provides self-study tutorials with full working code on:
Stacking, Voting, Boosting, Bagging, Blending, Super Learner, and much more...

Bring Modern Ensemble Learning Techniques to
Your Machine Learning Projects


See What's Inside

38 Responses to Multivariate Adaptive Regression Splines (MARS) in Python

  1. Avatar
    Anthony The Koala November 13, 2020 at 8:16 pm #

    Dear Dr Jason,
    Pipping as in

    WIll only work for Python version 3.6, https://pypi.org/project/sklearn-contrib-py-earth/#files

    However if your Python version is > 3.6, you can download a whl version from this site, https://www.lfd.uci.edu/~gohlke/pythonlibs/ .

    Search the page by ctrl+F sklearn_contrib_py_earth – note underscore _ NOT -.
    Choose either 32-bit or 64-bit python for Python versions 3.7, 3.8 and 3.9.
    Download the appropriate whl file to a folder in your computer

    Install – example for Python v3.8 64-bit

    Test:

    In sum, if your Python version is > 3.6, then go to https://www.lfd.uci.edu/~gohlke/pythonlibs/. then search within browser page = CTRL+F sklearn_contrib_py_earth and select particular version of python, 32-bit or 64-bit version for the particular python version. Eg for python v3.8 there must be a 38 in the whl.
    Download the appropriate whl, then pip install particular wheel.

    Thank you,
    Anthony of Sydney

    • Avatar
      Jason Brownlee November 14, 2020 at 6:32 am #

      Thanks for sharing.

      I recommend Python 3.6 for machine learning and deep learning at the time of writing.

      • Avatar
        Anthony The Koala November 14, 2020 at 11:42 am #

        Dear Dr Jason,
        I have a question please on the Earth() model.

        I did ‘light’ reading on the topic and it talks about knots and splines. That is where there is a disjunction along the x axis, you construct a knot/spline.

        Aren’t splines and knots something that actuarial students/graduates use when modelling disjointed x data?

        Thank you,
        Anthony of Sydney

        • Avatar
          Jason Brownlee November 15, 2020 at 6:24 am #

          Sorry, I don’t know about what actuarial students study.

          • Avatar
            Anthony The Koala November 15, 2020 at 12:32 pm #

            Dear Dr Jason,
            I wasn’t expecting to you to know what actuarial students study. I am aware from my undergraduate studies, studying about splines even though I never studied actuarial studies.

            This site https://www.acted.co.uk/forums/index.php?threads/splines-in-emblem.8885/ answers the question. that a spline curve “…is a set of curves which join on to each other to produce a single, more complex curve….” such that one can “…model complex curves using fairly simple functions and model them to an arbitrary level of complexity.

            This document shows graphs of non-linear functions and the interpolation methods joining the ‘disjointed’ segments http://www.ae.metu.edu.tr/~ae464/splines.pdf.

            As a result, I may have a better understanding of the Earth() algorithm used in this page.

            Thank you,
            Anthony of Sydney

          • Avatar
            Jason Brownlee November 15, 2020 at 12:57 pm #

            Thanks for sharing the links.

          • Avatar
            Anthony The Koala November 17, 2020 at 3:42 pm #

            Dear Dr Jason,
            I printed the summary for the model:

            From the summary:
            R^2 RSQ = 0.9997,
            That is near-perfect. Isn’t the R^2 a measure of the accuracy?
            Thank you,
            Anthony of Sydney

          • Avatar
            Jason Brownlee November 18, 2020 at 6:37 am #

            R^2 is a goodness of fit metric:
            https://en.wikipedia.org/wiki/Coefficient_of_determination

          • Avatar
            Anthony The Koala November 17, 2020 at 3:45 pm #

            Dear Dr Jason,
            When I printed the summary(), I expected it to be in tabular form. You have to move the horizontal slide to see R^2 at the extreme right.
            Thank you,
            Anthony of Sydney

          • Avatar
            Jason Brownlee November 18, 2020 at 6:38 am #

            Try printing the summary to the console so the new line characters (\n) can be interpreted correctly.

          • Avatar
            Anthony The Koala November 18, 2020 at 2:05 pm #

            Dear Dr Jason,
            i understand that R^2 explains the variance and 1-R^2 is the unexplained variance. The sqrt(R^2) = |R| the magnitude of the correlation without the + or – direction.

            The R^2 in the above model was 0.99997. The goodness of fit is very close to 1.

            My question is can you have a very high goodness of fit and low accuracy. For example is the Mean Absolute Error (MAE) is the average of the difference between the original values and the predicted problems.

            Put it in other ways, can R^2 be high and MAE be high or low?

            Put it it another way, if R^2 is high like 0.9997, why worry about MAE?

            Thank you

            Anthony of Sydney

          • Avatar
            Jason Brownlee November 19, 2020 at 7:40 am #

            Great question. My intuition says that R^2 and error metrics would be highly correlated. E.g. no, you cannot have one score showing good performance and another showing poor performance. Nevertheless, it would be good to confirm this intuition with some contrived cases (a great blog post idea!)

  2. Avatar
    Matthias November 14, 2020 at 12:05 am #

    Hello Jason
    Thanks for the interesting instructions!
    Unfortunately the installation of ‘pip install sklearn-contrib-py-earth’ failed with incomprehensible error messages. I couldn’t really see the reason. But maybe this is only the case with me?

    Best Regards

    Matthias

    • Avatar
      Jason Brownlee November 14, 2020 at 6:33 am #

      Try this:

  3. Avatar
    Ravichandran Srinivasan November 16, 2020 at 7:53 pm #

    Dear Jason

    My python version is 3.6.10

    I have this code:

    # check pyearth version
    import pyearth
    # display version
    print(pyearth.__version__)

    # define the model
    model = Earth()

    when I run the same I obtain:

    import pyearth
    # display version
    print(pyearth.__version__)

    # define the model
    model = Earth()
    0.1.0
    Traceback (most recent call last):

    File “”, line 6, in
    model = Earth()

    NameError: name ‘Earth’ is not defined

    My test of pyearth shows

    dir(pyearth)
    Out[12]:
    [‘Earth’,
    ‘__builtins__’,
    ‘__cached__’,
    ‘__doc__’,
    ‘__file__’,
    ‘__loader__’,
    ‘__name__’,
    ‘__package__’,
    ‘__path__’,
    ‘__spec__’,
    ‘__version__’,
    ‘_basis’,
    ‘_forward’,
    ‘_knot_search’,
    ‘_pruning’,
    ‘_qr’,
    ‘_record’,
    ‘_types’,
    ‘_util’,
    ‘_version’,
    ‘earth’]

    Where am I going wrong?

    • Avatar
      Jason Brownlee November 17, 2020 at 6:27 am #

      Looks like you did not import the “Earth” class.

      Perhaps confirm you copied the code exactly from the tutorial.

  4. Avatar
    Ravichandran Srinivasan November 16, 2020 at 8:03 pm #

    Please ignore the above. The problem got resolved when I ran your final code

    Thanks

  5. Avatar
    KV Subbaiah Setty November 17, 2020 at 1:36 pm #

    Hello Jason,
    Very clear and nice tutorial on MARS/EARTH (Multivariate Adaptive Regression Splines/ Enhanced Adaptive Regression Through Hinges), Thanks for the tutorial.

    But a few things that bother me are:

    1. When do we prefer MARS compared to other non-linear ensembles like Random forest, GBM, XGBoost, etc.

    2.What are the pros and cons of MARS.

    3. Why modern Books, Blogs, Articles, and Tutorials on ML don’t discuss much compared to other algorithms like, say XGboost.?

    • Avatar
      Jason Brownlee November 18, 2020 at 6:37 am #

      Thanks.

      Use MARS when it performs better than other algorithms that you have evaluated. There’s no better rule of thumb.

      MARS is more complex than some algorithms and in turn may be slower to train.

      Not many people know about MARS, perhaps that is why they don’t write about it.

  6. Avatar
    Rui N/A Liu November 23, 2020 at 3:24 pm #

    Hi, Jason. Thanks for sharing this. That is helpful. I want to try MultioutputRegressor in this MARS model. But I think it is not callable, Do you know how to do multi outputs using MARS?

    Thanks

    • Avatar
      Jason Brownlee November 24, 2020 at 6:17 am #

      You’re welcome.

      I’m not sure off the cuff, perhaps try it and see?

  7. Avatar
    bryan December 21, 2020 at 4:08 am #

    Hi Jason, just ask, can it be used to predict multi-steps or multi days ahead just like in ARIMA or Prophet?

    • Avatar
      Jason Brownlee December 21, 2020 at 6:41 am #

      It could be used for time series forecasting, but it was designed for regression more generally.

  8. Avatar
    Amir February 25, 2021 at 4:09 am #

    Hi Jason, thanks for your interesting tutorial.
    Is it possible to perform regression via MARS and see the regression scheme (model)?
    Regards,

    • Avatar
      Jason Brownlee February 25, 2021 at 5:37 am #

      Sure. Perhaps you can interrogate the fit MARS object and retrieve the model details.

  9. Avatar
    Amir February 25, 2021 at 6:15 am #

    Thanks for your reply!

  10. Avatar
    Fabio April 9, 2021 at 9:33 pm #

    Hi Jason, I like your blog.
    I would just mention that interpetability is a major advantange of regression splines.
    On the other hand, as also Kuhn mentions in his great book, they can be unstable (especially with higher degree values). This is also my experience, compared to other non-linear frameworks.
    Just thought it would help put the cherry on the cake of your excellent post.

  11. Avatar
    Steven May 4, 2021 at 7:32 am #

    Hi Jason, thank you for the blog. I have a question about the estimates. Since they are linear regressions separated by knots, wouldn’t that suggests that the estimate for the variable (age – 50) being 20 means that the average change in the outcome is 20 for every unit increase in age after 50?On the other hand, since this is a non-parametric method, shouldn’t the estimates be medians, or does “non-parametric” in this case just suggest that we need to us GCV to determine model fit?

    • Avatar
      Jason Brownlee May 5, 2021 at 6:03 am #

      You’re welcome.

      I would guess the kinks in the response function make the response non-linear. E.g. hard to analyze like a linear regression.

  12. Avatar
    Steven May 4, 2021 at 9:23 am #

    Also, the output of my hinge functions from the MARS model don’t align with the output of the partial dependence plots.

    For example I get (30 – variable); estimate: – 0.7, when the variable before 30 is actually increasing with the outcome in the partial dependence plot. I am not sure why the estimate is negative but the visualization is positive.

  13. Avatar
    Robert Granat October 27, 2021 at 10:18 pm #

    Dear Dr Brownlee, is it possible to use pyearth for modelling with Y being a discrete variable, i.e., for solving classification problems?

    • Adrian Tam
      Adrian Tam October 28, 2021 at 3:30 am #

      It should give you something but the model is not designed for truly discrete variables. Hence the result may not be good.

  14. Avatar
    Dev November 10, 2021 at 8:33 pm #

    Hi.
    Thank you for the valuable lesson. I have a small question. If I used logged variable (log of COVID-19 case data) as the dependent variable and some other non-logged variable as an independent variable, can I use MARS on these? Are there any restrictions on using MARS on log-transformed variables?
    Thanks

    • Adrian Tam
      Adrian Tam November 14, 2021 at 12:45 pm #

      MARS is giving you piecewise linear functions. Hence for a mix of log and non-log variables, you are building something like “a log(x1) + b x2”. It makes sense sometimes, for example, considering radioactive decay, it is the logged variable in linear.

  15. Avatar
    Toby April 24, 2022 at 8:28 pm #

    Hi there sir, How do you use the hyperparameters determined by the cross validation procedure on a training dataset to then evaluate model performance on a test set? thanks!

Leave a Reply