Week 02: Predicting with Regression Multiple Covariates

• Caret package
• Data slicing
• Training options
• Plotting predictions
• Basic preprocessing
• Covariate creation
• Preprocessing with principal components analysis
• Predicting with Regression
• Predicting with Regression Multiple Covariates

Predicting with Regression Multiple Covariates

This lecture covers two topics.

The first is how to build a regression model using multiple covariates.

The second is how to work out which of those covariates to include in the model.

Example: Predicting Wage

# wage data
library(ISLR); library(ggplot2); library(caret);
data(Wage); Wage <- subset(Wage,select=-c(logwage))
summary(Wage)

# get training / test set
inTrain <- createDataPartition(y=Wage$wage, p=0.7, list=FALSE)
training <- Wage[inTrain,]; testing <- Wage[-inTrain,]
dim(training); dim(testing)

# feature plot to see how the variables relate to each other
featurePlot(x=training[,c("age","education","jobclass")],
            y = training$wage,
            plot="pairs")

# plot age versus wage
qplot(age,wage,data=training)

# to find out what explains the outliers,
# color the points by jobclass; most of the outlying points are blue.
qplot(age,wage,colour=jobclass,data=training)

# color by education
# this also helps explain the outliers: they are mostly advanced degrees
qplot(age,wage,colour=education,data=training)

# fit a linear model
modFit <- train(wage ~ age + jobclass + education,
                method = "lm", data=training)
finMod <- modFit$finalModel
print(modFit)

# diagnostics: residuals versus fitted values
plot(finMod,1,pch=19,cex=0.5,col="#00000010")

# color the residuals by a variable not used in the model
qplot(finMod$fitted,finMod$residuals,colour=race,data=training)

# residuals by index (row order)
plot(finMod$residuals,pch=19)
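
The model above mixes a numeric covariate (age) with factor covariates (jobclass, education). As a minimal sketch of how such a formula is turned into regression terms, the base R snippet below uses a small made-up data frame (`toy` is hypothetical and is not the Wage data): each factor is expanded into indicator columns, one per level except the baseline level.

```r
# Hypothetical toy data, NOT the Wage data: one numeric and two factor covariates
toy <- data.frame(
  wage      = c(80, 95, 120, 150, 110, 70),
  age       = c(25, 32, 41, 50, 38, 22),
  jobclass  = factor(c("Industrial", "Information", "Information",
                       "Information", "Industrial", "Industrial")),
  education = factor(c("HS", "College", "Advanced",
                       "Advanced", "College", "HS"),
                     levels = c("HS", "College", "Advanced"))
)

# Design matrix that lm() builds from the formula:
# the first level of each factor is the baseline and gets no column
X <- model.matrix(wage ~ age + jobclass + education, data = toy)
colnames(X)
# "(Intercept)" "age" "jobclassInformation" "educationCollege" "educationAdvanced"

# The fitted model has one coefficient per column of the design matrix
fit <- lm(wage ~ age + jobclass + education, data = toy)
names(coef(fit))
```

With the Wage data the same expansion happens, so `finMod` holds one coefficient per non-baseline level of jobclass and education, in addition to the intercept and age.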

# predicted versus truth in test set
pred <- predict(modFit, testing)
qplot(wage,pred,colour=year,data=testing)

# all covariates
modFitAll<- train(wage ~ .,data=training,method="lm")
pred <- predict(modFitAll, testing)
qplot(wage,pred,data=testing)
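
One way to approach the second topic, choosing among covariates, is to compare the two models on the held-out test set. The sketch below assumes the same data, split and model formulas as above; the seed value and the names `rmseSmall` and `rmseAll` are made up for illustration, and `RMSE` is caret's helper for root mean squared error. The all-covariates fit may emit rank-deficiency warnings.

```r
library(ISLR); library(caret)
data(Wage); Wage <- subset(Wage, select = -c(logwage))

# hypothetical seed, used only to make the split reproducible
set.seed(1234)
inTrain <- createDataPartition(y = Wage$wage, p = 0.7, list = FALSE)
training <- Wage[inTrain, ]; testing <- Wage[-inTrain, ]

# three chosen covariates versus all covariates
modFit    <- train(wage ~ age + jobclass + education, method = "lm", data = training)
modFitAll <- train(wage ~ ., method = "lm", data = training)

# root mean squared error on the test set (lower is better)
rmseSmall <- RMSE(predict(modFit, testing), testing$wage)
rmseAll   <- RMSE(predict(modFitAll, testing), testing$wage)
c(threeCovariates = rmseSmall, allCovariates = rmseAll)
```

A lower test-set RMSE for one model suggests its covariates predict wage better on new data, but a single split is only a rough guide.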
