Basic Idea

- Assume the data follow a probabilistic model
- Use Bayes' theorem to identify optimal classifiers

Pros:
- Can take advantage of structure of the data
- May be computationally convenient
- Are reasonably accurate on real problems

Cons:
- Make additional assumptions about the data
- When the model is incorrect, you may get reduced accuracy
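
As a toy illustration (not in the original notes): Bayes' theorem says the posterior probability of class k given feature value x is proportional to the class-conditional likelihood times the prior, P(Y = k | X = x) proportional to P(X = x | Y = k) P(Y = k). A minimal sketch in R, assuming a single Gaussian feature per class, with hypothetical means and standard deviations loosely based on iris Petal.Width:

# Toy sketch of Bayes' theorem for classification:
# posterior is proportional to likelihood * prior,
# assuming one Gaussian feature per class (illustrative values only)
priors <- c(setosa = 1/3, versicolor = 1/3, virginica = 1/3)
means  <- c(setosa = 0.25, versicolor = 1.33, virginica = 2.03)  # hypothetical class means
sds    <- c(setosa = 0.11, versicolor = 0.20, virginica = 0.27)  # hypothetical class sds
x <- 1.5                                    # a new observation of the feature
likelihood <- dnorm(x, mean = means, sd = sds)
posterior  <- likelihood * priors / sum(likelihood * priors)
round(posterior, 3)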

Example: Iris Data

data(iris); library(ggplot2); library(caret)
## Loading required package: lattice
names(iris)
## [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width" 
## [5] "Species"
table(iris$Species)
## 
##     setosa versicolor  virginica 
##         50         50         50
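
A quick look at how well the classes separate (a sketch, not in the original notes), using the same two features as the comparison plot at the end:

qplot(Petal.Width, Sepal.Width, colour = Species, data = iris)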

Create training and test sets

inTrain <- createDataPartition(y=iris$Species,
                              p=0.7, list=FALSE)
training <- iris[inTrain,]
testing <- iris[-inTrain,]
dim(training); dim(testing)
## [1] 105   5
## [1] 45  5
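
Note that createDataPartition samples at random, so the exact split (and the dimensions above) will vary between runs. Calling set.seed first makes the partition reproducible; a sketch with an arbitrary seed:

set.seed(125)  # arbitrary seed; any fixed value makes the split reproducible
inTrain <- createDataPartition(y=iris$Species, p=0.7, list=FALSE)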

Build predictions with linear discriminant analysis (lda) and naive Bayes (nb)

modlda = train(Species ~ .,data=training,method="lda")
## Loading required package: MASS
modnb = train(Species ~ ., data=training,method="nb")
plda = predict(modlda,testing); pnb = predict(modnb,testing)
table(plda,pnb)
##             pnb
## plda         setosa versicolor virginica
##   setosa         15          0         0
##   versicolor      0         17         0
##   virginica       0          1        12
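
The table above only compares the two models to each other. To score each model against the true test labels, caret's confusionMatrix can be used (a sketch, not from the original notes):

confusionMatrix(plda, testing$Species)$overall["Accuracy"]
confusionMatrix(pnb, testing$Species)$overall["Accuracy"]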

Comparison of results
We see that just one test case is classified differently by the two algorithms; overall they perform very similarly.

equalPredictions = (plda==pnb)
qplot(Petal.Width,Sepal.Width,colour=equalPredictions,data=testing)
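
To inspect the single disagreeing case directly, subset the test set where the two prediction vectors differ (a sketch):

testing[plda != pnb, ]  # the one test row the two models classify differently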
