MOOC/Practical Machine Learning (R programming)

Week 03: Model-based prediction

JAYNUX 2015. 11. 23. 22:44

Basic Idea

- Assume the data follow a probabilistic model
- Use Bayes' theorem to identify optimal classifiers

Pros:
- Can take advantage of structure of the data
- May be computationally convenient
- Are reasonably accurate on real problems

Cons:
- Make additional assumptions about the data
- When the model is incorrect you may get reduced accuracy
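Concretely, model the prior probability of each class, pi_k = P(Y = k), and the class-conditional density f_k(x) = P(X = x | Y = k); Bayes' theorem then gives the posterior probability used for classification:

```latex
P(Y = k \mid X = x) = \frac{\pi_k \, f_k(x)}{\sum_{\ell=1}^{K} \pi_\ell \, f_\ell(x)}
```

The two methods used below differ in how they model f_k: linear discriminant analysis assumes each class is Gaussian with a shared covariance matrix, while naive Bayes assumes the features are independent within each class.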

Example: Iris Data

data(iris); library(ggplot2); library(caret)
## Loading required package: lattice
names(iris)
## [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width" 
## [5] "Species"
table(iris$Species)
## 
##     setosa versicolor  virginica 
##         50         50         50

Create training and test sets

inTrain <- createDataPartition(y=iris$Species,
                              p=0.7, list=FALSE)
training <- iris[inTrain,]
testing <- iris[-inTrain,]
dim(training); dim(testing)
## [1] 105   5
## [1] 45  5
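One property of createDataPartition worth noting: it samples within each level of the outcome, so the 70/30 split preserves the class balance exactly. A quick sketch that rebuilds the split and checks the per-class counts:

```r
library(caret)   # provides createDataPartition
data(iris)

inTrain  <- createDataPartition(y = iris$Species, p = 0.7, list = FALSE)
training <- iris[inTrain, ]
testing  <- iris[-inTrain, ]

# Stratified split: each species contributes exactly 35 rows to training
# and 15 rows to testing (0.7 * 50 = 35), regardless of the random draw
table(training$Species)
table(testing$Species)
```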

Build predictions with linear discriminant analysis (LDA) and naive Bayes

modlda <- train(Species ~ ., data=training, method="lda")
## Loading required package: MASS
modnb <- train(Species ~ ., data=training, method="nb")
plda <- predict(modlda, testing); pnb <- predict(modnb, testing)
table(plda, pnb)
##             pnb
## plda         setosa versicolor virginica
##   setosa         15          0         0
##   versicolor      0         17         0
##   virginica       0          1        12

Comparison of results
Just one test case is classified differently by the two algorithms (virginica under LDA, versicolor under naive Bayes); overall the two models agree almost perfectly.

equalPredictions = (plda==pnb)
qplot(Petal.Width,Sepal.Width,colour=equalPredictions,data=testing)
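Agreement between the two models says nothing about accuracy against the true labels. Using the plda, pnb, and testing objects created above, caret's confusionMatrix compares each model's predictions to the actual species (the exact counts will vary with the random partition):

```r
# Accuracy, per-class sensitivity/specificity, etc. for each model
# against the true test-set labels
confusionMatrix(plda, testing$Species)
confusionMatrix(pnb, testing$Species)
```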