Model based prediction
jemin lee
November 23, 2015
Basic Idea
- Assume the data follow a probabilistic model
- Use Bayes' theorem to identify optimal classifiers
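Concretely, a model-based classifier posits class priors $\pi_k = P(Y = k)$ and class-conditional densities $f_k(x) = P(X = x \mid Y = k)$, then assigns an observation to the class with the largest posterior probability:

$$P(Y = k \mid X = x) = \frac{f_k(x)\,\pi_k}{\sum_{\ell=1}^{K} f_\ell(x)\,\pi_\ell}$$

The two classifiers used below differ only in how they model $f_k(x)$: linear discriminant analysis assumes multivariate Gaussians with a common covariance matrix across classes, while naive Bayes assumes the features are independent within each class.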
Pros:
- Can take advantage of structure of the data
- May be computationally convenient
- Are reasonably accurate on real problems
Cons:
- Make additional assumptions about the data
- When the model is incorrect, you may get reduced accuracy
Example: Iris Data
data(iris); library(ggplot2); library(caret)
## Loading required package: lattice
names(iris)
## [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
## [5] "Species"
table(iris$Species)
##
##     setosa versicolor  virginica
##         50         50         50
Create training and test sets
# Stratified 70/30 split on Species
inTrain <- createDataPartition(y = iris$Species,
                               p = 0.7, list = FALSE)
training <- iris[inTrain, ]
testing  <- iris[-inTrain, ]
dim(training); dim(testing)
## [1] 105 5
## [1] 45 5
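One caveat: createDataPartition draws a random stratified sample, so the rows selected change from run to run. For a reproducible split, fix the RNG seed first; a minimal sketch (the seed value 1234 is arbitrary):

set.seed(1234)  # arbitrary seed, makes the 70/30 split reproducible
inTrain <- createDataPartition(y = iris$Species, p = 0.7, list = FALSE)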
Build predictions with linear discriminant analysis (lda) and naive Bayes (nb)
modlda <- train(Species ~ ., data = training, method = "lda")
## Loading required package: MASS
modnb <- train(Species ~ ., data = training, method = "nb")
plda <- predict(modlda, testing); pnb <- predict(modnb, testing)
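Both models are probabilistic, so caret can also return the posterior class probabilities behind these hard class predictions; a minimal sketch, not part of the original post:

# Posterior P(Species = k | x) for the first few test rows, per model
head(predict(modlda, testing, type = "prob"))
head(predict(modnb,  testing, type = "prob"))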
table(plda,pnb)
##             pnb
## plda         setosa versicolor virginica
##   setosa         15          0         0
##   versicolor      0         17         0
##   virginica       0          1        12
Comparison of results
Just one test sample is classified differently by the two algorithms (LDA predicts virginica where naive Bayes predicts versicolor); overall they perform very similarly.
equalPredictions <- (plda == pnb)
qplot(Petal.Width, Sepal.Width, colour = equalPredictions, data = testing)
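The plot and table above only show where the two models agree with each other; to measure each model's accuracy against the true test labels, caret's confusionMatrix can be used. A minimal sketch, not part of the original post:

# Overall accuracy of each model on the 45 held-out rows
confusionMatrix(plda, testing$Species)$overall["Accuracy"]
confusionMatrix(pnb,  testing$Species)$overall["Accuracy"]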