On this page, we create artificial data using random numbers. Random numbers that follow a normal distribution can be generated with the rnorm function. The first argument is the number of values to generate, the second argument is the mean, and the third argument is the standard deviation (not the variance).
x1 <- rnorm(100, 10, 1)
x2 <- rnorm(100, 10, 2)
x3 <- rnorm(100, 10, 3)
y <- x1 + 2 * x2 + 3 * x3 + rnorm(100, 0, 1)
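As a quick sanity check, we can verify that the third argument of rnorm really is the standard deviation. This sketch uses a larger sample than the text and a fixed seed (both our own additions, purely for reproducibility of the check):

```r
# rnorm(n, mean, sd) takes the standard deviation, not the variance,
# as its third argument.  With a large sample the estimates come out
# close to the specified values.
set.seed(1)                 # fixed seed so the check is reproducible
z <- rnorm(100000, 10, 3)   # 100000 draws from N(mean = 10, sd = 3)
mean(z)                     # close to 10
sd(z)                       # close to 3 (so the variance is close to 9)
```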
We used the lm function for regression analysis with a single explanatory variable; the same function also handles multiple regression analysis.
result <- lm(y ~ x1 + x2 + x3)
Check the result.
summary(result)
The summary function displays a summary of the regression analysis. It prints something like the following.
Call:
lm(formula = y ~ x1 + x2 + x3)

Residuals:
     Min       1Q   Median       3Q      Max
-1.96202 -0.62604 -0.01741  0.73180  2.64501

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.80572    1.21700   0.662     0.51
x1           0.85806    0.09992   8.588  1.6e-13 ***
x2           2.03232    0.04711  43.140  < 2e-16 ***
x3           3.04166    0.03769  80.698  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9662 on 96 degrees of freedom
Multiple R-squared:  0.9887,    Adjusted R-squared:  0.9883
F-statistic:  2798 on 3 and 96 DF,  p-value: < 2.2e-16
The summary shows that the estimated regression equation is y = 0.85806 * x1 + 2.03232 * x2 + 3.04166 * x3 + 0.80572. Because the data are generated from random numbers, you are not guaranteed to get exactly the same result. The p-value of each coefficient is very small, which means each is significant at the 0.1% level. The intercept is not significant, but this is usually not a concern: it only means that we cannot rule out the possibility that the intercept is 0.
This time, because there is more than one explanatory variable, you should look at the Adjusted R-squared. In multiple regression analysis, adding explanatory variables, even meaningless ones, can raise the Multiple R-squared. The Adjusted R-squared compensates for this drawback, so it is the value to use.
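Both statistics can also be read off programmatically: summary() of an lm fit returns a list whose r.squared and adj.r.squared components hold the two values. A minimal self-contained sketch, regenerating the data from above (the set.seed call is our own addition for reproducibility; the text used no seed):

```r
set.seed(1)                 # hypothetical seed, added for reproducibility
x1 <- rnorm(100, 10, 1)
x2 <- rnorm(100, 10, 2)
x3 <- rnorm(100, 10, 3)
y  <- x1 + 2 * x2 + 3 * x3 + rnorm(100, 0, 1)

s <- summary(lm(y ~ x1 + x2 + x3))
s$r.squared                 # Multiple R-squared
s$adj.r.squared             # Adjusted R-squared, never larger than r.squared
```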
If a meaningless explanatory variable is included, you can spot it in the output of the summary function. First, generate a random variable x4 that is independent of y. Then run the multiple regression analysis again with the explanatory variables x1 through x4.
x4 <- rnorm(100, 10, 4)
result2 <- lm(y ~ x1 + x2 + x3 + x4)
summary(result2)
Again, the summary function displays a summary of the regression analysis. It prints something like the following.
Call:
lm(formula = y ~ x1 + x2 + x3 + x4)

Residuals:
     Min       1Q   Median       3Q      Max
-1.95356 -0.61666 -0.03673  0.74286  2.65140

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.780315   1.227547   0.636    0.527
x1          0.856324   0.100668   8.506 2.55e-13 ***
x2          2.031220   0.047563  42.706  < 2e-16 ***
x3          3.041296   0.037908  80.228  < 2e-16 ***
x4          0.005898   0.024471   0.241    0.810
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.971 on 95 degrees of freedom
Multiple R-squared:  0.9887,    Adjusted R-squared:  0.9882
F-statistic:  2078 on 4 and 95 DF,  p-value: < 2.2e-16
The coefficient of x4 is shown to be not significant. In other words, we cannot rule out the possibility that the coefficient of x4 is zero.
If the explanatory variables may include irrelevant ones, you can use the stepwise method, which selects the variables deemed appropriate one at a time. R can do this automatically.
result3 <- lm(y ~ 1)
step(result3, direction = "both", scope = list(upper = ~ x1 + x2 + x3 + x4))
It prints something like the following.
Start:  AIC=417.69
y ~ 1

       Df Sum of Sq     RSS    AIC
+ x3    1    9548.9  1946.6 300.87
+ x2    1    1748.3  9747.2 461.96
<none>              11495.5 476.45
+ x4    1       2.8 11492.7 478.43
+ x1    1       0.8 11494.7 478.45

Step:  AIC=300.87
y ~ x3

       Df Sum of Sq     RSS    AIC
+ x2    1    1755.6   191.0  70.73
+ x1    1     113.9  1832.7 296.84
<none>               1946.6 300.87
+ x4    1      10.2  1936.4 302.34
- x3    1    9548.9 11495.5 476.45

Step:  AIC=70.73
y ~ x3 + x2

       Df Sum of Sq    RSS    AIC
+ x1    1     125.6   65.4 -34.43
+ x4    1       3.9  187.1  70.66
<none>               191.0  70.73
- x2    1    1755.6 1946.6 300.87
- x3    1    9556.2 9747.2 461.96

Step:  AIC=-34.43
y ~ x3 + x2 + x1

       Df Sum of Sq    RSS    AIC
<none>                65.4 -34.43
+ x4    1       0.5   65.0 -33.15
- x1    1     125.6  191.0  70.73
- x2    1    1767.2 1832.7 296.84
- x3    1    9679.7 9745.1 463.94

Call:
lm(formula = y ~ x3 + x2 + x1)

Coefficients:
(Intercept)           x3           x2           x1
     0.8057       3.0416       2.0323       0.8580
You will see that x4 is finally dropped.
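Note that step() returns the selected model, so you can store it and use it like any other lm fit. A self-contained sketch (the seed and the trace = 0 argument, which suppresses the step-by-step log, are our own additions; because x4 is pure noise, it is usually, though not always, dropped when the data are regenerated):

```r
set.seed(1)                 # hypothetical seed, added for reproducibility
x1 <- rnorm(100, 10, 1)
x2 <- rnorm(100, 10, 2)
x3 <- rnorm(100, 10, 3)
x4 <- rnorm(100, 10, 4)
y  <- x1 + 2 * x2 + 3 * x3 + rnorm(100, 0, 1)

result3 <- lm(y ~ 1)
result4 <- step(result3, direction = "both",
                scope = list(upper = ~ x1 + x2 + x3 + x4),
                trace = 0)  # trace = 0 suppresses the step-by-step log
formula(result4)            # the formula of the selected model
summary(result4)            # the selected model is an ordinary lm fit
```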