Week 02: Predicting with Regression Multiple Covariates
- Caret package
- Data slicing
- Training options
- Plotting predictions
- Basic preprocessing
- Covariate creation
- Preprocessing with principal components analysis
- Predicting with Regression
- Predicting with Regression Multiple Covariates
Predicting with Regression Multiple Covariates
This post covers two things:
first, how to build a regression model using multiple covariates,
and second, how to work out which of those covariates should be included.
Example: Predicting Wage
# load the Wage data; drop logwage, since it is just the log of the outcome (wage)
library(ISLR); library(ggplot2); library(caret)
data(Wage); Wage <- subset(Wage, select=-c(logwage))
summary(Wage)
# split into training (70%) and test (30%) sets
inTrain <- createDataPartition(y=Wage$wage,
                               p=0.7, list=FALSE)
training <- Wage[inTrain,]; testing <- Wage[-inTrain,]
dim(training); dim(testing)
# feature plot to see how the covariates relate to wage and to each other
featurePlot(x=training[,c("age","education","jobclass")],
            y=training$wage,
            plot="pairs")
# plot age versus wage
qplot(age,wage,data=training)
# look for what explains the outliers
# color points by jobclass: most of the outlying points are blue
qplot(age,wage,colour=jobclass,data=training)
# color points by education
# this also helps explain them: the advanced-degree group accounts for many outliers
qplot(age,wage,colour=education,data=training)
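# fit a linear model with several covariates
modFit <- train(wage ~ age + jobclass + education,
                method="lm", data=training)
finMod <- modFit$finalModel
print(modFit)
# diagnostic plot: residuals versus fitted values
plot(finMod,1,pch=19,cex=0.5,col="#00000010")
# color the residuals by a variable that was left out of the model (race)
qplot(finMod$fitted.values,finMod$residuals,colour=race,data=training)
# residuals by row index: a visible trend suggests a missing variable
plot(finMod$residuals,pch=19)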
# predicted versus truth in test set
pred <- predict(modFit, testing)
qplot(wage,pred,colour=year,data=testing)
# fit a model using all covariates
modFitAll <- train(wage ~ ., data=training, method="lm")
pred <- predict(modFitAll, testing)
qplot(wage,pred,data=testing)
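The plots above compare the two models only by eye. One way to put a number on "which covariates should be included" is to compare test-set error. The following is a minimal sketch under some assumptions, not part of the original lecture code: it uses the built-in `mtcars` data as a stand-in for `Wage` so that it runs with base R alone, the `rmse` helper is a hypothetical name defined here, and the choice of `wt + hp` as the hand-picked subset is arbitrary.

```r
# Stand-in data: mtcars ships with base R (the post itself uses ISLR::Wage).
data(mtcars)

# Reproducible 70/30 split using base R only.
set.seed(1234)
n <- nrow(mtcars)
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
training <- mtcars[train_idx, ]
testing  <- mtcars[-train_idx, ]

# Hypothetical helper: root mean squared error of predictions.
rmse <- function(observed, predicted) {
  sqrt(mean((observed - predicted)^2))
}

# Model 1: a hand-picked subset of covariates.
fit_subset <- lm(mpg ~ wt + hp, data = training)
# Model 2: every other column as a covariate.
fit_all <- lm(mpg ~ ., data = training)

# Evaluate both models on the held-out test set only.
rmse_subset <- rmse(testing$mpg, predict(fit_subset, newdata = testing))
rmse_all    <- rmse(testing$mpg, predict(fit_all, newdata = testing))

print(c(subset = rmse_subset, all = rmse_all))
```

With the objects from this post, the same helper could be applied as `rmse(testing$wage, predict(modFit, testing))` and `rmse(testing$wage, predict(modFitAll, testing))`. A lower test-set RMSE favors that set of covariates, although a single random split is a noisy basis for the decision.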