Linear Model in R

by EMCS LABS — on  , 

LM in R: understanding levels, factors, & coefficients

  • N차원의 linear 모델은 N개의 factor에 의해서만 가능하다.
  • level이 3개 이상이 되면 regression space의 수가 늘어날 뿐 차원이 증가하지는 않는다.
  • lm은 anova와 달리 lm의 모든 coefficient들에 대한 one-sample t-test를 통해서 p를 제공한다.
  • 그래서 anova보다 훨씬 풍부하고 정교한 정보를 주고 post-hoc조차 필요없게 된다.

sample data

grade Male Female all
1st 73.6 73.5 73.55
2nd 74.2 72.5 73.35
3rd 83.2 83.3 83.25

1 way ANOVA as single regression

lm(formula = Score ~ Grade, data = data)

Coefficients    
(Intercept) Grade2nd Grade3rd
73.55 -0.2 9.7

single regression
y = 73.55 - 0.2 x (1st:0, 2nd:1)
y = 73.55 + 9.7 x (1st:0, 2nd:1)

Coefficients Estimate Std. Error t value Pr(>|t|)
(Intercept) 73.5500 0.7214 101.957 < 2e-16 ***
Grade2nd -0.2000 1.0202 -0.196 0.845
Grade3rd 9.7000 1.0202 9.508 2.31e-13 ***
Analysis of Variance Table Df Sum Sq Mean Sq F value Pr(>F)
Grade 2 1280.93 640.47 61.537 5.783e-15 ***
Residuals 57 593.25 10.41    

2 way ANOVA as multiple regression

lm(formula = Score ~ Grade*Gender, data = data)

Coefficients          
(Intercept) Grade2nd Grade3rd GenderM Grade2nd:GenderM Grade3rd:GenderM
73.5 0.7 9.7 0.1 -1.8 0

multiple regression
y = 73.5 + 0.7 x1 (1st:0, 2nd:1) + 0.1 x2 (F:0, M:1) - 1.8 x1 x2
y = 73.5 + 9.7 x1 (1st:0, 3rd:1) + 0.1 x2 (F:0, M:1) - 0 x1 x2

Coefficients Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.350e+01 1.035e+00 71.000 < 2e-16 ***
GenderM 1.000e-01 1.464e+00 0.068 0.946
Grade2nd 7.000e-01 1.464e+00 0.478 0.634
Grade3rd 9.700e+00 1.464e+00 6.626 1.67e-08 ***
GenderM:Grade2nd -1.800e+00 2.070e+00 -0.869 0.388
GenderM:Grade3rd 4.353e-14 2.070e+00 0.000 1.000
Analysis of Variance Table Df Sum Sq Mean Sq F value Pr(>F)
Gender 1 3.75 3.75 0.3499 0.5566
Grade 2 1280.93 640.47 59.7636 2.05e-14 ***
Gender:Grade 2 10.80 5.40 0.5039 0.6070
Residuals 54 578.70 10.72    

Hosung Nam