## Curvilinear Regression in R

In this post, we will learn about some basics of curvilinear regression in R.

Curvilinear regression analysis is used to determine whether a non-linear trend exists between $X$ and $Y$.

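Adding more parameters to an equation never worsens its fit to the data. A quadratic or cubic equation contains the linear model as a special case (with the higher-order coefficients set to zero), so its $R^2$ will always be at least as high as that of the linear regression model. For the same reason, a cubic equation will always have an $R^2$ at least as high as a quadratic one.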
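To see this on concrete numbers, the sketch below fits linear, quadratic, and cubic models to the Growth/Nutrient data used later in this post and extracts each $R^2$ with `summary(fit)$r.squared`. It uses base R only; the `I()` wrapper makes `^` behave as arithmetic inside a formula.

```r
# Growth/Nutrient data from the example later in this post
Growth   <- c(2, 9, 11, 12, 13, 14, 17, 19, 17, 18, 20)
Nutrient <- c(2, 4, 6, 8, 10, 16, 22, 28, 30, 36, 48)

# Nested polynomial models: each one contains the previous one as a special case
fit_lin  <- lm(Growth ~ Nutrient)
fit_quad <- lm(Growth ~ Nutrient + I(Nutrient^2))
fit_cub  <- lm(Growth ~ Nutrient + I(Nutrient^2) + I(Nutrient^3))

# R^2 can only stay the same or grow as terms are added
r2 <- sapply(list(linear = fit_lin, quadratic = fit_quad, cubic = fit_cub),
             function(fit) summary(fit)$r.squared)
print(round(r2, 3))
```

A higher $R^2$ on its own does not justify the extra terms; adjusted $R^2$ (`summary(fit)$adj.r.squared`) penalises additional parameters and is a fairer basis for comparison.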

The logarithmic relationship can be described as follows:

$$Y = m\log(x) + c$$
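Substituting $z = \log(x)$ shows why this model can be fitted with ordinary linear regression: it is a straight line in $z$. It also gives a convenient reading of $m$: multiplying $x$ by any factor $k$ changes $Y$ by the same amount, $m\log(k)$, regardless of the starting value of $x$.

$$Y = m z + c, \qquad z = \log(x)$$

$$Y(kx) - Y(x) = m\log(kx) - m\log(x) = m\log(k)$$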

The polynomial relationship can be described as follows:

$$Y = m_1x + m_2x^2 + m_3x^3 + \cdots + m_nx^n + c$$

The logarithmic model is akin to a simple regression, with a single predictor, $\log(x)$, whereas the polynomial model is a multiple regression, in which each power of $x$ acts as a separate predictor. Logarithmic relationships are common in the natural world, so you may encounter them in many circumstances. Plotting the response against the predictor variable as a scatter plot is generally a good starting point.
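One way to see the simple-versus-multiple distinction is to inspect the design matrix R builds for each formula with `model.matrix()`. The sketch below uses a few illustrative `x` values (assumed, not taken from the post) and shows that the logarithmic model has one predictor column next to the intercept, while a cubic polynomial has three.

```r
x <- c(2, 4, 8, 16)  # illustrative predictor values

# Logarithmic model: intercept plus one transformed predictor, log(x)
X_log <- model.matrix(~ log(x))

# Cubic polynomial: intercept plus three predictors, x, x^2 and x^3
X_poly <- model.matrix(~ x + I(x^2) + I(x^3))

print(colnames(X_log))
print(colnames(X_poly))
```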

Consider the following data, in which Growth and Nutrient are related in a curvilinear form:

| Growth | Nutrient |
|---|---|
| 2 | 2 |
| 9 | 4 |
| 11 | 6 |
| 12 | 8 |
| 13 | 10 |
| 14 | 16 |
| 17 | 22 |
| 19 | 28 |
| 17 | 30 |
| 18 | 36 |
| 20 | 48 |

Let us perform a curvilinear regression in R.

```r
library(ggplot2)

# Observed data
Growth   <- c(2, 9, 11, 12, 13, 14, 17, 19, 17, 18, 20)
Nutrient <- c(2, 4, 6, 8, 10, 16, 22, 28, 30, 36, 48)
data <- data.frame(Growth, Nutrient)

# Scatter plot with a smoothed trend (loess by default for small data sets)
ggplot(data, aes(Nutrient, Growth)) +
  geom_point() +
  stat_smooth()
```

The scatter plot suggests that the relationship is logarithmic: Growth rises steeply at low Nutrient levels and then levels off.
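A quick numerical check of this reading is to compare how strongly Growth correlates with Nutrient and with log(Nutrient). Pearson correlation measures only linear association, so if the relationship is logarithmic, the log-transformed predictor should give the higher correlation. The sketch uses base R `cor()`; the same idea can be checked visually by adding `scale_x_log10()` to the ggplot call, which should make the points fall roughly along a straight line.

```r
Growth   <- c(2, 9, 11, 12, 13, 14, 17, 19, 17, 18, 20)
Nutrient <- c(2, 4, 6, 8, 10, 16, 22, 28, 30, 36, 48)

# Linear association on the raw scale vs. on the log scale of the predictor
r_raw <- cor(Growth, Nutrient)
r_log <- cor(Growth, log(Nutrient))

print(round(c(raw = r_raw, log = r_log), 3))
```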

Let us carry out a linear regression using the `lm()` function, taking the $\log$ of the predictor variable rather than the variable itself.

```r
# Regress Growth on the log of Nutrient
data <- data.frame(Growth, Nutrient)
mod <- lm(Growth ~ log(Nutrient), data = data)
summary(mod)

## 
## Call:
## lm(formula = Growth ~ log(Nutrient), data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.2274 -0.9039  0.5400  0.9344  1.3097 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     0.6914     1.0596   0.652     0.53    
## log(Nutrient)   5.1014     0.3858  13.223 3.36e-07 ***
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## 
## Residual standard error: 1.229 on 9 degrees of freedom
## Multiple R-squared:  0.951,  Adjusted R-squared:  0.9456 
## F-statistic: 174.8 on 1 and 9 DF,  p-value: 3.356e-07
```
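With the fitted model in hand, `predict()` returns the expected Growth at new Nutrient levels. Because the model is linear in log(Nutrient), doubling Nutrient always adds the slope times $\log(2)$, roughly $5.1014 \times 0.693 \approx 3.54$ units of Growth. The sketch below refits the model so it runs on its own; the Nutrient levels 12 and 24 are arbitrary illustrative choices.

```r
Growth   <- c(2, 9, 11, 12, 13, 14, 17, 19, 17, 18, 20)
Nutrient <- c(2, 4, 6, 8, 10, 16, 22, 28, 30, 36, 48)
data <- data.frame(Growth, Nutrient)

mod <- lm(Growth ~ log(Nutrient), data = data)

# Expected Growth at two illustrative Nutrient levels (24 is double 12)
new_levels <- data.frame(Nutrient = c(12, 24))
pred <- predict(mod, newdata = new_levels)
print(round(pred, 2))

# The gap between the two predictions equals slope * log(2)
slope <- unname(coef(mod)["log(Nutrient)"])
print(round(slope * log(2), 2))
```

To draw this fitted curve instead of the default loess smoother, the earlier plot can use `stat_smooth(method = "lm", formula = y ~ log(x))`.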