Boosting
jemin lee
November 23, 2015
Boosting is one of the best out-of-the-box classifiers.
Basic Idea
Take lots of (possibly) weak predictors
Weight them and add them up
Get a stronger predictor
Start with a set of classifiers h1, …, hk and combine them, as in the formula below
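In the standard formulation (e.g., AdaBoost), the combined predictor is a weighted vote of the individual classifiers, with the weights αt chosen to minimize error on the training set:

f(x) = sgn( α1·h1(x) + α2·h2(x) + … + αk·hk(x) )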
Boosting in R
Boosting can be used with any subset of classifiers. One large subclass is gradient boosting. R has multiple boosting libraries; differences include the choice of basic classification functions and combination rules:
gbm - boosting with trees
mboost - model based boosting
ada - statistical boosting based on additive logistic regression
gamBoost - boosting generalized additive models
Most of these are available in the caret package.
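As a sketch of how these back ends plug into caret: the same train() interface drives different boosters through the method string ("gbm", "glmboost", "gamboost", "ada"), provided the backing package is installed. The toy data below is purely illustrative:

library(caret)
set.seed(1)
# Hypothetical toy data, just to show the call pattern
toy <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))
fit <- train(y ~ ., data = toy, method = "glmboost")  # model-based boosting via mboost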
Wage example
library(ISLR); data(Wage); library(ggplot2); library(caret);
## Loading required package: lattice
Wage <- subset(Wage,select=-c(logwage))
inTrain <- createDataPartition(y=Wage$wage,
                               p=0.7, list=FALSE)
training <- Wage[inTrain,]; testing <- Wage[-inTrain,]
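Note that createDataPartition draws a random sample, so this split (and all of the numbers below) will vary between runs. Setting a seed first makes the analysis reproducible; the value is arbitrary:

set.seed(32343)  # fix the random split before calling createDataPartition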
# Fit the model
modFit <- train(wage ~ ., method="gbm", data=training, verbose=FALSE)
## Loading required package: gbm
## Loading required package: survival
##
## Attaching package: 'survival'
##
## The following object is masked from 'package:caret':
##
## cluster
##
## Loading required package: splines
## Loading required package: parallel
## Loaded gbm 2.1.1
## Loading required package: plyr
print(modFit)
## Stochastic Gradient Boosting
##
## 2102 samples
## 10 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 2102, 2102, 2102, 2102, 2102, 2102, ...
## Resampling results across tuning parameters:
##
## interaction.depth n.trees RMSE Rsquared RMSE SD Rsquared SD
## 1 50 35.15759 0.3038776 1.530055 0.02487676
## 1 100 34.62096 0.3137957 1.439743 0.02232634
## 1 150 34.55009 0.3156775 1.394560 0.02166566
## 2 50 34.60094 0.3156275 1.455682 0.02485196
## 2 100 34.47184 0.3182034 1.374485 0.02300061
## 2 150 34.54167 0.3162264 1.387161 0.02253147
## 3 50 34.47797 0.3187507 1.440898 0.02512358
## 3 100 34.59459 0.3142979 1.374172 0.02372285
## 3 150 34.82958 0.3071383 1.363458 0.02276088
##
## Tuning parameter 'shrinkage' was held constant at a value of 0.1
##
## Tuning parameter 'n.minobsinnode' was held constant at a value of 10
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 100,
## interaction.depth = 2, shrinkage = 0.1 and n.minobsinnode = 10.
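Above, caret tuned over its default grid of n.trees and interaction.depth with 25 bootstrap resamples. To control the search explicitly, a tuning grid and resampling scheme can be passed to train(); the values below are illustrative, not recommendations:

# Explicit grid over the same parameters caret tuned above
gbmGrid <- expand.grid(interaction.depth = c(1, 2, 3),
                       n.trees = c(50, 100, 150),
                       shrinkage = 0.1,
                       n.minobsinnode = 10)
fitControl <- trainControl(method = "cv", number = 5)  # 5-fold CV instead of the default bootstrap
modFit2 <- train(wage ~ ., method = "gbm", data = training,
                 trControl = fitControl, tuneGrid = gbmGrid,
                 verbose = FALSE)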
qplot(predict(modFit,testing),wage,data=testing)
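The plot compares predicted and observed wages on the test set. For a numeric summary, caret's postResample() computes RMSE and R-squared on held-out data:

pred <- predict(modFit, testing)
postResample(pred, testing$wage)  # test-set RMSE and R-squared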