Week 03: Model-Based Prediction

JAYNUX 2015. 11. 23. 22:44

Basic Idea

- Assume the data follow a probabilistic model
- Use Bayes’ theorem to identify optimal classifiers
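To make the idea concrete, here is a minimal sketch in base R, assuming a Gaussian model for a single feature (Petal.Width) within each species. The value x_new is a hypothetical measurement chosen for illustration, and the Gaussian assumption is mine, not something the course notes prescribe. The posterior for each class is the prior times the class-conditional density, normalized over all classes.

```r
# Sketch: Bayes' theorem with an assumed Gaussian model per class.
# x_new is a hypothetical Petal.Width value, not taken from the notes.
x_new <- 1.5

classes <- levels(iris$Species)

# Prior P(Y = k): class frequencies in the data
prior <- as.numeric(table(iris$Species)) / nrow(iris)

# Per-class mean and standard deviation of the feature
mu    <- as.numeric(tapply(iris$Petal.Width, iris$Species, mean))
sigma <- as.numeric(tapply(iris$Petal.Width, iris$Species, sd))

# Class-conditional density f_k(x) under the Gaussian assumption
likelihood <- dnorm(x_new, mean = mu, sd = sigma)

# Posterior P(Y = k | X = x) = prior * likelihood / normalizing constant
posterior <- prior * likelihood / sum(prior * likelihood)
names(posterior) <- classes

# Classify to the class with the largest posterior probability
predicted <- classes[which.max(posterior)]
print(round(posterior, 3))
print(predicted)
```

Both LDA and naive Bayes follow this pattern; they differ in the assumptions they place on the class-conditional densities.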

Pros:
- Can take advantage of structure of the data
- May be computationally convenient
- Are reasonably accurate on real problems

Cons:
- Make additional assumptions about the data
- When the model is incorrect you may get reduced accuracy

Example: Iris Data

data(iris); library(ggplot2); library(caret)

## Loading required package: lattice

names(iris)

## [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width" 
## [5] "Species"

table(iris$Species)

## 
##     setosa versicolor  virginica 
##         50         50         50

Create training and test sets

inTrain <- createDataPartition(y=iris$Species,
                              p=0.7, list=FALSE)
training <- iris[inTrain,]
testing <- iris[-inTrain,]
dim(training); dim(testing)

## [1] 105   5

## [1] 45  5

Build predictions with linear discriminant analysis (lda) and naive Bayes (nb)

modlda = train(Species ~ .,data=training,method="lda")

## Loading required package: MASS

modnb = train(Species ~ ., data=training,method="nb")

plda = predict(modlda,testing); pnb = predict(modnb,testing)
table(plda,pnb)

##             pnb
## plda         setosa versicolor virginica
##   setosa         15          0         0
##   versicolor      0         17         0
##   virginica       0          1        12

Comparison of results
Only one test observation, on the boundary between versicolor and virginica, is classified differently by the two algorithms (virginica by lda, versicolor by nb); overall they perform very similarly.

equalPredictions = (plda==pnb)
qplot(Petal.Width,Sepal.Width,colour=equalPredictions,data=testing)
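The agreement plot shows where the two models differ, not which one is right. As a rough sketch, the test predictions can also be scored against the true species. The block below is self-contained, so it fits LDA with MASS::lda directly rather than through caret, and it uses a base-R stratified split as a stand-in for createDataPartition. The seed value is an arbitrary assumption, not from the original notes.

```r
library(MASS)  # provides lda()

set.seed(123)  # assumed seed, only to make the split repeatable

# Stratified 70/30 split: sample 70% of the row indices within each species
rows_by_class <- split(seq_len(nrow(iris)), iris$Species)
idx <- unlist(lapply(rows_by_class, function(rows) {
  sample(rows, size = round(0.7 * length(rows)))
}), use.names = FALSE)

training <- iris[idx, ]
testing  <- iris[-idx, ]

# Fit LDA on the training set and predict the held-out species
fit  <- lda(Species ~ ., data = training)
pred <- predict(fit, testing)$class

# Confusion table and overall accuracy against the true labels
conf_mat <- table(predicted = pred, actual = testing$Species)
accuracy <- mean(pred == testing$Species)
print(conf_mat)
print(accuracy)
```

With the objects from the session above, the same check would be mean(plda == testing$Species) and mean(pnb == testing$Species).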
