Week 02: Predicting with Regression Multiple Covariates



  • Caret package
  • Data slicing
  • Training options
  • Plotting predictions
  • Basic preprocessing
  • Covariate creation
  • Preprocessing with principal components analysis
  • Predicting with Regression
  • Predicting with Regression Multiple Covariates



Predicting with Regression Multiple Covariates


This post covers two topics.

The first is how to build a regression model using multiple covariates.

The second is how to work out which of those covariates to include in the model.



Example: Predicting Wage



# wage data
library(ISLR); library(ggplot2); library(caret);
data(Wage); Wage <- subset(Wage,select=-c(logwage))
summary(Wage)

# get training / test set
inTrain <- createDataPartition(y=Wage$wage,
                               p=0.7, list=FALSE)
training <- Wage[inTrain,]; testing <- Wage[-inTrain,]
dim(training); dim(testing)

# feature plot to see how the covariates relate to each other and to wage
featurePlot(x=training[,c("age","education","jobclass")],
            y = training$wage,
            plot="pairs")

# plot age versus wage
qplot(age,wage,data=training)

# look for what explains the group of outlying points
# colour points by jobclass: most of the outlying points are blue
qplot(age,wage,colour=jobclass,data=training)

# colour points by education
# education also helps explain it: the outlying points are mostly advanced degrees
qplot(age,wage,colour=education,data=training)

# fit a linear model
modFit<- train(wage ~ age + jobclass + education,
               method = "lm",data=training)
finMod <- modFit$finalModel
print(modFit)

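# diagnostics: residuals versus fitted values
plot(finMod,1,pch=19,cex=0.5,col="#00000010")
# colour residuals by race, a variable not used in the model
qplot(finMod$fitted,finMod$residuals,colour=race,data=training)
# residuals by row index; a trend here suggests a missing variable
plot(finMod$residuals,pch=19)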

# predicted versus truth in test set
pred <- predict(modFit, testing)
qplot(wage,pred,colour=year,data=testing)

# fit a model that uses all covariates as predictors
modFitAll<- train(wage ~ .,data=training,method="lm")
pred <- predict(modFitAll, testing)
qplot(wage,pred,data=testing)
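
One way to approach the second topic (deciding which covariates to keep) is to compare the held-out error of the two fits. The following is a minimal sketch and not part of the original notes: it refits both models on a fresh split and compares test-set RMSE with caret's RMSE(). The seed value and the names fitSmall, fitAll, rmseSmall and rmseAll are assumptions introduced for illustration.

```r
# sketch: compare test-set RMSE of the three-covariate and all-covariate models
library(ISLR); library(caret)
data(Wage); Wage <- subset(Wage, select = -c(logwage))

set.seed(1234)  # arbitrary seed (assumption) so the split is reproducible
inTrain <- createDataPartition(y = Wage$wage, p = 0.7, list = FALSE)
training <- Wage[inTrain, ]; testing <- Wage[-inTrain, ]

# hypothetical names: fitSmall uses three covariates, fitAll uses every covariate
fitSmall <- train(wage ~ age + jobclass + education, method = "lm", data = training)
fitAll   <- train(wage ~ ., method = "lm", data = training)

# root mean squared error on the held-out test set (lower is better)
rmseSmall <- RMSE(predict(fitSmall, testing), testing$wage)
rmseAll   <- RMSE(predict(fitAll, testing), testing$wage)
print(c(small = rmseSmall, all = rmseAll))
```

The split is random, so the exact numbers depend on the seed. The all-covariate fit may also warn about a rank-deficient model if some factor levels do not occur in the data.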







