When two variables x and y have a linear relationship, that is, one described by y = ax + b with two constants a and b, you can estimate it with linear regression. To begin, enter the following data and plot them. The chart suggests that x and y have a linear relationship.
x <- c(2,2,3,4,5,6,6,6,6,7,8,9)
y <- c(8,7,7,5,8,5,4,3,4,3,2,1)
xy <- data.frame(X=x,Y=y)
plot(xy)
You can run the regression analysis with the following command.
xy.lm <- lm(Y~X,data=xy)
The summary function shows a summary of the regression analysis.
summary(xy.lm)
This function prints output like the following.
Call:
lm(formula = Y ~ X, data = xy)

Residuals:
     Min       1Q   Median       3Q      Max
-1.12805 -0.46189 -0.16159  0.08994  2.93902

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   9.7256     0.8746  11.120 5.96e-07 ***
X            -0.9329     0.1522  -6.128 0.000112 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.126 on 10 degrees of freedom
Multiple R-squared:  0.7897, Adjusted R-squared:  0.7687
F-statistic: 37.55 on 1 and 10 DF,  p-value: 0.0001115
The Coefficients table tells us that the regression equation is y = -0.9329 x + 9.7256. Each estimate is followed by its standard error (Std. Error), t value, and p-value. The *** symbols mark coefficients whose p-value is below 0.001, as listed in the Signif. codes line; they mean that the coefficient differs significantly from zero, not that the regression as a whole succeeded. Of the values printed below the table, the important ones are Multiple R-squared and Adjusted R-squared. R-squared is the proportion of the variation in the dependent variable (y) that is explained by the regression equation, here about 79%. Because there is only one explanatory variable this time, you may ignore the Adjusted R-squared. It matters mainly when you use several explanatory variables, as in multiple regression analysis.
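As a check on the coefficient table, the estimates can be reproduced from the closed-form least-squares formulas. The following is a minimal sketch, assuming the same x and y data as above; the names sxx, sxy, syy, slope, intercept, r2 and fit are introduced here only for illustration.

```r
# Data from the example above
x <- c(2,2,3,4,5,6,6,6,6,7,8,9)
y <- c(8,7,7,5,8,5,4,3,4,3,2,1)

# Sums of squares and cross-products around the means
sxx <- sum((x - mean(x))^2)
sxy <- sum((x - mean(x)) * (y - mean(y)))
syy <- sum((y - mean(y))^2)

# Least-squares estimates for y = a*x + b
slope     <- sxy / sxx
intercept <- mean(y) - slope * mean(x)

# With a single explanatory variable, R-squared equals the squared correlation
r2 <- sxy^2 / (sxx * syy)

# Compare the hand-computed values with those reported by lm()
fit <- lm(y ~ x)
print(c(slope = slope, intercept = intercept, r2 = r2))
print(coef(fit))
print(summary(fit)$r.squared)
```

The hand-computed slope, intercept, and R-squared should agree with the Estimate column and the Multiple R-squared value in the summary output.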
You can calculate the confidence intervals and prediction intervals of the regression line with the predict function. If you specify "confidence" for the interval argument, it calculates the confidence interval. If you specify "prediction" for the interval argument, it calculates the prediction interval. The following example plots the 95% confidence interval and prediction interval, because the value 0.95 is specified for the level argument.
You can draw all of the lines in one graph with the matplot function. The regression line is drawn in red, the confidence interval as blue solid lines, and the prediction interval as blue dashed lines.
Note that the 95% confidence interval is the range which, if the measurement and analysis were repeated many times in the same way, would contain the true regression line (the mean of y at a given x) in about 95% of the repetitions. The 95% prediction interval is the range that is expected to contain a newly obtained data point at a given x with 95% probability.
newdata <- data.frame(X=seq(2,9,0.1))
xy.con <- predict(xy.lm,newdata,interval="confidence",level=0.95) #Confidence intervals
xy.pre <- predict(xy.lm,newdata,interval="prediction",level=0.95) #Prediction intervals
xy.con <- as.data.frame(xy.con) #Convert to data frame format
xy.pre <- as.data.frame(xy.pre)
plot(xy,xlim=c(2,9),ylim=c(0,10))
par(new=TRUE) #Draw the next plot over the existing one
matplot( #Draw several lines at the same time
  x=newdata$X,
  y=cbind(xy.con$fit, xy.con$upr, xy.con$lwr, xy.pre$upr, xy.pre$lwr),
  xlim=c(2,9), ylim=c(0,10),
  axes=FALSE, ylab="", xlab="",
  type="l",
  lwd=c(1.5, 1, 1, 1, 1),
  lty=c(1, 1, 1, 2, 2),
  col=c("red", "blue", "blue", "blue", "blue")
)
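To see what predict is computing, the two intervals can be reproduced by hand at a single point using the standard textbook formulas for simple linear regression. This is only a sketch: the value x0 = 5 and the names x0, tcrit, se.mean, se.new, manual.con, manual.pre, p.con and p.pre are assumptions made for illustration and are not part of the example above.

```r
# Rebuild the data and model so that this block runs on its own
x <- c(2,2,3,4,5,6,6,6,6,7,8,9)
y <- c(8,7,7,5,8,5,4,3,4,3,2,1)
xy <- data.frame(X = x, Y = y)
xy.lm <- lm(Y ~ X, data = xy)

x0    <- 5                              # hypothetical X value chosen for illustration
n     <- nrow(xy)
s     <- summary(xy.lm)$sigma           # residual standard error
sxx   <- sum((xy$X - mean(xy$X))^2)
tcrit <- qt(0.975, df = n - 2)          # two-sided 95% critical value of t

fit0 <- sum(coef(xy.lm) * c(1, x0))     # fitted value at x0

# Standard error of the fitted mean, and of a single new observation
se.mean <- s * sqrt(1/n + (x0 - mean(xy$X))^2 / sxx)
se.new  <- s * sqrt(1 + 1/n + (x0 - mean(xy$X))^2 / sxx)

manual.con <- c(fit = fit0, lwr = fit0 - tcrit * se.mean, upr = fit0 + tcrit * se.mean)
manual.pre <- c(fit = fit0, lwr = fit0 - tcrit * se.new,  upr = fit0 + tcrit * se.new)

# The same quantities from predict()
p.con <- predict(xy.lm, data.frame(X = x0), interval = "confidence", level = 0.95)
p.pre <- predict(xy.lm, data.frame(X = x0), interval = "prediction", level = 0.95)
print(rbind(manual.con, p.con))
print(rbind(manual.pre, p.pre))
```

The extra 1 under the square root in se.new accounts for the scatter of an individual observation around the line, which is why the prediction interval is always wider than the confidence interval.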