Week 02: Predicting with Regression

JAYNUX 2015. 11. 16. 22:06

Caret package
Data slicing
Training options
Plotting predictions
Basic preprocessing
Covariate creation
Preprocessing with principal components analysis
Predicting with Regression
Predicting with Regression Multiple Covariates

Predicting with Regression

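When the relationship between predictor and outcome is linear, regression is easy to implement and easy to interpret.

When the relationship is nonlinear, however, accuracy drops sharply.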
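
To illustrate the point about nonlinearity, here is a minimal sketch on simulated data (not from the original post; the variable names and the simulated relationships are assumptions chosen for illustration). It fits a straight line to one linear and one curved relationship and compares the R-squared values.

```r
# Simulated data: one truly linear relationship, one curved (quadratic) one.
set.seed(1)
x <- seq(-3, 3, length.out = 200)
y_linear <- 2 * x + rnorm(200, sd = 0.5)
y_curved <- x^2 + rnorm(200, sd = 0.5)

# Fit a straight line to both outcomes.
fit_linear <- lm(y_linear ~ x)
fit_curved <- lm(y_curved ~ x)

# Compare how much variance the straight line explains in each case.
r2_linear <- summary(fit_linear)$r.squared
r2_curved <- summary(fit_curved)$r.squared
c(linear = r2_linear, curved = r2_curved)
```

The straight line explains most of the variance in the linear case and almost none in the curved case, even though both outcomes depend strongly on x.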

Example: old faithful eruptions

# predicting with regression
library(caret);data(faithful); set.seed(333)
inTrain <- createDataPartition(y=faithful$waiting,
                               p=0.5, list=FALSE)
trainFaith <- faithful[inTrain,]; testFaith <- faithful[-inTrain,]
head(trainFaith)

  eruptions waiting
1     3.600      79
3     3.333      74
5     4.533      85
6     2.883      55
7     4.700      88
8     3.600      85

# showing graph
plot(trainFaith$waiting,trainFaith$eruptions,pch=19,col="blue",xlab="Waiting",ylab="Duration")

The scatter plot looks roughly linear, so linear regression is a reasonable choice.

# making model
lm1 <- lm(eruptions ~ waiting,data=trainFaith)
summary(lm1)
Call:
lm(formula = eruptions ~ waiting, data = trainFaith)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.26990 -0.34789  0.03979  0.36589  1.05020 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.792739   0.227869  -7.867 1.04e-12 ***
waiting      0.073901   0.003148  23.474  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.495 on 135 degrees of freedom
Multiple R-squared:  0.8032,	Adjusted R-squared:  0.8018 
F-statistic:   551 on 1 and 135 DF,  p-value: < 2.2e-16

# model fit
plot(trainFaith$waiting,trainFaith$eruptions,pch=19,col="blue",xlab="Waiting",ylab="Duration")
lines(trainFaith$waiting,lm1$fitted,lwd=3)

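Overlaying the fitted line on the training data is a quick check of the model.

Predicting a new value (here, a waiting time of 80) gives the following, both by hand from the coefficients and with predict().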

# predict a new value
coef(lm1)[1] + coef(lm1)[2]*80
(Intercept) 
   4.119307 
newdata <- data.frame(waiting=80)
predict(lm1,newdata)
       1 
4.119307
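
A point prediction alone says nothing about its uncertainty. As a hedged sketch (not part of the original post), predict() on an lm object accepts interval = "prediction" to return lower and upper bounds as well. For self-containment the model here is fitted on the full faithful data set rather than on trainFaith, so the numbers will differ slightly from the output above; the names fit, new_obs, and pred are illustrative.

```r
# Fit on the full data set so the example runs without caret.
data(faithful)
fit <- lm(eruptions ~ waiting, data = faithful)

# Prediction interval for a single new waiting time.
new_obs <- data.frame(waiting = 80)
pred <- predict(fit, new_obs, interval = "prediction")

# pred is a one-row matrix with columns "fit", "lwr", and "upr".
pred
```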

Plotting the predictions against the training set and the test set side by side gives the following.

# plot predictions - training and test
par(mfrow=c(1,2))
plot(trainFaith$waiting,trainFaith$eruptions,pch=19,col="blue",xlab="Waiting",ylab="Duration")
lines(trainFaith$waiting,predict(lm1),lwd=3)
plot(testFaith$waiting,testFaith$eruptions,pch=19,col="blue",xlab="Waiting",ylab="Duration")
lines(testFaith$waiting,predict(lm1,newdata=testFaith),lwd=3)

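The error is then computed separately on the training set and the test set, using RMSE as the measure. Note that the code below takes the square root of the summed squared errors without dividing by the number of observations, so the printed values are sqrt(SSE); dividing each by the square root of the number of rows in that set gives the RMSE proper.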

> # Calculate RMSE on training
> sqrt(sum((lm1$fitted-trainFaith$eruptions)^2))
[1] 5.75186
> # Calculate RMSE on test
> sqrt(sum((predict(lm1,newdata=testFaith)-testFaith$eruptions)^2))
[1] 5.838559
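
For comparison, here is a minimal sketch of RMSE with the mean inside the square root, next to the root-of-sum form used above. The helper names rmse and root_sse are hypothetical, and the model is fitted on the full faithful data set for self-containment, so the values are not directly comparable to the train/test numbers above.

```r
# RMSE: square root of the mean squared error.
rmse <- function(predicted, observed) {
  sqrt(mean((predicted - observed)^2))
}

# Root of the summed squared errors (not divided by n).
root_sse <- function(predicted, observed) {
  sqrt(sum((predicted - observed)^2))
}

data(faithful)
fit <- lm(eruptions ~ waiting, data = faithful)
n <- nrow(faithful)

rmse_all <- rmse(fitted(fit), faithful$eruptions)
root_sse_all <- root_sse(fitted(fit), faithful$eruptions)

# The two quantities differ only by a factor of sqrt(n).
all.equal(rmse_all, root_sse_all / sqrt(n))
```

Because the root-of-sum form grows with the number of rows, it is only comparable between sets of the same size; RMSE removes that dependence.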

The same procedure is much simpler with caret.

> #same process with caret
> modFit <- train(eruptions ~ waiting,data=trainFaith,method="lm")
> summary(modFit$finalModel)

Call:
lm(formula = .outcome ~ ., data = dat)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.26990 -0.34789  0.03979  0.36589  1.05020 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.792739   0.227869  -7.867 1.04e-12 ***
waiting      0.073901   0.003148  23.474  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.495 on 135 degrees of freedom
Multiple R-squared:  0.8032,	Adjusted R-squared:  0.8018 
F-statistic:   551 on 1 and 135 DF,  p-value: < 2.2e-16
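
The caret model can also be used directly for test-set prediction and error measurement. The sketch below repeats the split and fit from above so that it runs on its own, then uses caret's postResample() to summarize test-set error. The names testPred and testPerf are illustrative and not from the original post, and the exact set of metrics returned may depend on the installed caret version.

```r
library(caret)

# Recreate the split and the caret fit from above.
data(faithful); set.seed(333)
inTrain <- createDataPartition(y = faithful$waiting, p = 0.5, list = FALSE)
trainFaith <- faithful[inTrain, ]; testFaith <- faithful[-inTrain, ]
modFit <- train(eruptions ~ waiting, data = trainFaith, method = "lm")

# Predict on the held-out half and summarize the error.
testPred <- predict(modFit, newdata = testFaith)
testPerf <- postResample(pred = testPred, obs = testFaith$eruptions)
testPerf
```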
