Certificate & Comment

The certificate is issued about two weeks after the course ends.

I felt that I learned more from the homework and quizzes than from the lectures themselves.

Many Coursera courses are now offered on demand, so there are fewer opportunities to earn a certificate. Still, unless you pay for a course, it seems hard to see it through to the end.

A certificate is issued if the total score is at least 70/100, and a score of at least 90/100 adds the special phrase "with distinction".

#### Other posts in the 'MOOC > R Programming' category

[4 Week] Str & Simulation & R Profiler  (0) 2015.07.30 2015.07.24 2015.07.16 2015.07.08

[4 Week] Str & Simulation & R Profiler

str: Compactly display the internal structure of an R object

A diagnostic function and an alternative to 'summary'

It is especially well suited to compactly display the (abbreviated) contents of (possibly nested) lists.

Roughly one line per basic object
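A quick illustration of str() on a nested list. The object is made up for illustration; the point is the one-line-per-object layout, with nested elements indented under their parent.

```r
# A small nested list, invented for illustration
x <- list(id = 1:5,
          scores = c(0.5, 1.2, 3.4),
          meta = list(name = "sample", ok = TRUE))

# One compact line per basic object; the nested list is shown indented
str(x)

# For a data frame, str() lists each column's class and first few values
str(data.frame(a = 1:3, b = c("x", "y", "z")))
```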

Simulation - Generating Random Numbers

Functions for probability distributions in R

rnorm: generate random Normal variates with a given mean and standard deviation

dnorm: evaluate the Normal probability density (with a given mean/SD) at a point (or vector of points)

pnorm: evaluate the cumulative distribution function for a Normal distribution

rpois: generate random Poisson variates with a given rate

Probability distribution functions usually have four functions associated with them. The functions are prefixed with a

• d for density
• r for random number generation
• p for cumulative distribution
• q for quantile function

Common arguments:

• n: the number of random variates to generate
• lambda: vector of (non-negative) means (used by the Poisson functions)

dnorm(x, mean = 0, sd = 1, log = FALSE)
pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) # cumulative distribution function
qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) # quantile function, the inverse of pnorm
rnorm(n, mean = 0, sd = 1)

dpois(x, lambda, log = FALSE)
ppois(q, lambda, lower.tail = TRUE, log.p = FALSE)
qpois(p, lambda, lower.tail = TRUE, log.p = FALSE)
rpois(n, lambda) # lambda is the mean: this generates Poisson data whose mean is roughly lambda. Setting a seed regenerates the same values.
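A short sketch of how the d/p/q/r prefixes relate, using the functions listed above. The specific arguments (1.96, 1.5, lambda = 2) are arbitrary choices for illustration.

```r
# Standard normal (mean = 0, sd = 1 are the defaults)
dnorm(0)              # density at 0, equal to 1 / sqrt(2 * pi)
pnorm(1.96)           # P(X <= 1.96), about 0.975
qnorm(0.975)          # quantile function (inverse of pnorm), about 1.96
qnorm(pnorm(1.5))     # q* undoes p*: returns 1.5 up to rounding

# The same prefixes for the Poisson distribution with rate lambda = 2
dpois(0, lambda = 2)  # P(X = 0) = exp(-2)
ppois(2, lambda = 2)  # P(X <= 2) = dpois(0) + dpois(1) + dpois(2)
rnorm(3, mean = 10, sd = 2)  # three random draws from N(10, 2^2)
```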

For the Poisson distribution, the mean is equal to the rate (the second argument to rpois is the rate).

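Computing the sample means, the rate-1 sample averages 0.8 and the rate-2 sample averages 2.2. These do not match the rates exactly, since each sample is random, but they come close.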
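A minimal sketch of the experiment described here, with set.seed(1) as an arbitrary seed choice. The printed means will differ from the 0.8 and 2.2 above because they depend on the seed; the last lines show that resetting the seed reproduces the draws.

```r
set.seed(1)                # fix the seed so the draws are reproducible
x1 <- rpois(10, 1)         # 10 draws with rate 1
x2 <- rpois(10, 2)         # rate 2
x3 <- rpois(10, 20)        # rate 20

# Sample means scatter around each rate; exact values depend on the seed
sapply(list(x1, x2, x3), mean)

# Resetting the same seed regenerates exactly the same numbers
set.seed(1)
identical(x1, rpois(10, 1))   # TRUE
```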

So in each of these three cases, the sample mean is roughly equal to the rate I specified (1, 2, 20).

Simulation - Simulating a Linear Model

Suppose we want to simulate from the following linear model

$$y = \beta_0 + \beta_1 x + \epsilon$$

where $\epsilon \sim N(0, 2^2)$. Assume $x \sim N(0, 1^2)$, $\beta_0 = 0.5$ and $\beta_1 = 2$. What if $x$ is binary instead?

Suppose we want to simulate from a Poisson model where

$$Y \sim \text{Poisson}(\mu)$$

$$\log (\mu) = \beta_0+\beta_{1}x$$

and $\beta_0 = 0.5$ and $\beta_1 = 0.3$. We need to use the rpois function for this.

Simulation - Random Sampling

The sample function draws randomly from a specified set of (scalar) objects, allowing you to sample from arbitrary distributions.

Usage

sample(x, size, replace = FALSE, prob = NULL)

sample.int(n, size = n, replace = FALSE, prob = NULL)

Arguments

x

Either a vector of one or more elements from which to choose, or a positive integer. See ‘Details.’

n

a positive number, the number of items to choose from. See ‘Details.’

size

a non-negative integer giving the number of items to choose.

replace

Should sampling be with replacement?

prob

A vector of probability weights for obtaining the elements of the vector being sampled.

Summary

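Random numbers from probability distributions can be generated with the r* family of functions.

Standard distributions such as Normal, Poisson, Binomial, Exponential, and Gamma can be generated this way.

The sample function can be used to draw random samples from an arbitrary vector.

Setting a seed lets you regenerate the same numbers.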
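The sample() behavior summarized above, as a small runnable sketch. The seed value and the 70/30 coin are arbitrary choices for illustration.

```r
set.seed(1)
sample(1:10, 4)                    # 4 distinct values (no replacement)
sample(letters, 5)                 # sample from an arbitrary vector
sample(1:10)                       # a random permutation of 1:10
sample(1:10, replace = TRUE)       # with replacement: values may repeat

# Unequal probabilities via prob: "H" should appear about 70% of the time
flips <- sample(c("H", "T"), 1000, replace = TRUE, prob = c(0.7, 0.3))
mean(flips == "H")

# The same seed regenerates the same sample
set.seed(1); a <- sample(1:10, 4)
set.seed(1); b <- sample(1:10, 4)
identical(a, b)                    # TRUE
```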

R Profiler

Assignment 03

This data set records whether hospitals provided proper care to their patients. It contains information on over 4,000 U.S. hospitals.

The three files are:

outcome-of-care-measures.csv: 30 day mortality and readmission rates for heart attacks, heart failure, and pneumonia for over 4,000 hospitals.

hospital-data.csv: information about each hospital.

Hospital Revised Flatfiles.pdf: a description of each file.

1) Plot the 30-day mortality rates for heart attack

outcome-of-care-measures.csv contains many columns.

ncol tells you how many columns there are.

nrow tells you how many rows there are.

To see the name of each column, run names(outcome).

2) Finding the best hospital in a state

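The function is named best.

It takes two arguments: the two-letter abbreviation of a U.S. state and an outcome name.

It finds the hospital in the given state with the lowest 30-day mortality for the specified outcome and returns a character vector containing that hospital's name.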

best <- function(state, outcome) {
    ## Check that state and outcome are valid
    ## Return hospital name in that state with lowest 30-day death rate
}

First, check that the given state is valid.
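A minimal sketch of best() under stated assumptions, not the assignment's required solution. To keep it runnable without the CSV, best_from_df takes a data frame argument, and the column names (State, Hospital.Name, heart_attack, heart_failure, pneumonia) and the outcome_cols mapping are hypothetical simplifications; the real file uses much longer column names. Breaking ties alphabetically by hospital name is also an assumption of this sketch.

```r
# Hypothetical simplified column names; the real outcome-of-care-measures.csv
# uses much longer names for the 30-day mortality columns
outcome_cols <- c("heart attack"  = "heart_attack",
                  "heart failure" = "heart_failure",
                  "pneumonia"     = "pneumonia")

best_from_df <- function(df, state, outcome) {
  ## Check that state and outcome are valid
  if (!state %in% df$State) stop("invalid state")
  if (!outcome %in% names(outcome_cols)) stop("invalid outcome")

  col   <- outcome_cols[[outcome]]
  sub   <- df[df$State == state, ]
  # Rates are read as text (e.g. "Not Available"), so convert and drop NAs
  rates <- suppressWarnings(as.numeric(sub[[col]]))
  keep  <- !is.na(rates)

  ## Lowest 30-day mortality first; ties broken alphabetically by name
  ord <- order(rates[keep], sub$Hospital.Name[keep])
  sub$Hospital.Name[keep][ord[1]]
}

# Tiny made-up data set for illustration
hospitals <- data.frame(
  State         = c("MD", "MD", "MD", "TX"),
  Hospital.Name = c("B Hosp", "A Hosp", "C Hosp", "D Hosp"),
  heart_attack  = c("14.1", "14.1", "Not Available", "12.0"),
  heart_failure = c("10", "9", "11", "8"),
  pneumonia     = c("12", "13", "11", "10"),
  stringsAsFactors = FALSE
)
best_from_df(hospitals, "MD", "heart attack")   # "A Hosp" (tie with "B Hosp")
best_from_df(hospitals, "MD", "pneumonia")      # "C Hosp"
```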

3) Ranking hospitals by outcome in a state

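rankhospital takes three arguments:

First: the state name

Second: the outcome

Third: the ranking of a hospital in that state for that outcome (num).

Example:

> rankhospital("MD", "heart failure", 5)

This returns a character vector containing the name of the hospital with the 5th lowest 30-day mortality rate for heart failure.

Note: the third argument, num, can also take the values "best" and "worst".

If num is larger than the number of hospitals in that state, the function returns NA.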
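A sketch of the ranking logic described above, under the same kind of assumptions as a data-frame version of best(): rankhospital_df, the simplified column names, and alphabetical tie-breaking are hypothetical choices for illustration, not the assignment's exact specification.

```r
# Hypothetical simplified column names (the real CSV uses longer ones)
outcome_cols <- c("heart attack"  = "heart_attack",
                  "heart failure" = "heart_failure",
                  "pneumonia"     = "pneumonia")

rankhospital_df <- function(df, state, outcome, num = "best") {
  if (!state %in% df$State) stop("invalid state")
  if (!outcome %in% names(outcome_cols)) stop("invalid outcome")

  col    <- outcome_cols[[outcome]]
  sub    <- df[df$State == state, ]
  rates  <- suppressWarnings(as.numeric(sub[[col]]))   # "Not Available" -> NA
  keep   <- !is.na(rates)
  # Hospital names ordered by rate, then alphabetically for ties
  ranked <- sub$Hospital.Name[keep][order(rates[keep], sub$Hospital.Name[keep])]

  # "best" and "worst" are shorthands for the first and last rank
  if (identical(num, "best"))  num <- 1
  if (identical(num, "worst")) num <- length(ranked)
  if (num < 1 || num > length(ranked)) return(NA)      # rank out of range
  ranked[num]
}

# Made-up data for illustration
hospitals <- data.frame(
  State         = c("MD", "MD", "MD", "MD"),
  Hospital.Name = c("B Hosp", "A Hosp", "C Hosp", "E Hosp"),
  heart_failure = c("10", "9", "11", "Not Available"),
  stringsAsFactors = FALSE
)
rankhospital_df(hospitals, "MD", "heart failure", 2)        # "B Hosp"
rankhospital_df(hospitals, "MD", "heart failure", "worst")  # "C Hosp"
rankhospital_df(hospitals, "MD", "heart failure", 5)        # NA
```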

4) Ranking hospitals in all states

Write a function called rankall. It takes two arguments:

the outcome and the hospital ranking (num).
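One possible shape for rankall, sketched on a data frame instead of the CSV: split() produces one data frame per state, and a helper picks the requested rank in each. The name rankall_df, the column names, and the output columns (hospital, state) are assumptions for illustration.

```r
# Hypothetical simplified column names (the real CSV uses longer ones)
outcome_cols <- c("heart attack"  = "heart_attack",
                  "heart failure" = "heart_failure",
                  "pneumonia"     = "pneumonia")

rankall_df <- function(df, outcome, num = "best") {
  if (!outcome %in% names(outcome_cols)) stop("invalid outcome")
  df$rate <- suppressWarnings(as.numeric(df[[outcome_cols[[outcome]]]]))

  # Pick the hospital with the requested rank within one state's rows
  pick <- function(s) {
    s <- s[!is.na(s$rate), ]
    s <- s[order(s$rate, s$Hospital.Name), ]
    n <- if (identical(num, "best")) 1 else if (identical(num, "worst")) nrow(s) else num
    if (n < 1 || n > nrow(s)) NA_character_ else s$Hospital.Name[n]
  }

  by_state <- split(df, df$State)   # one data frame per state
  data.frame(hospital = vapply(by_state, pick, character(1)),
             state    = names(by_state),
             stringsAsFactors = FALSE)
}

# Made-up data for illustration
hospitals <- data.frame(
  State         = c("MD", "MD", "MD", "TX"),
  Hospital.Name = c("B Hosp", "A Hosp", "C Hosp", "D Hosp"),
  heart_failure = c("10", "9", "11", "8"),
  stringsAsFactors = FALSE
)
rankall_df(hospitals, "heart failure", 2)   # MD: "B Hosp", TX: NA
```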

#### Other posts in the 'MOOC > R Programming' category

Certificate & Comment  (0) 2015.08.26 2015.07.24 2015.07.16 2015.07.08

[3 Week] Loop Functions & Debugging Tools

Loop Functions

lapply: Loop over a list and evaluate a function on each element

sapply: Same as lapply but try to simplify the result

apply: Apply a function over the margins of an array

tapply: Apply a function over subsets of a vector

mapply: Multivariate version of lapply
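A small sketch contrasting the loop functions listed above (apply gets its own example further on). The inputs are invented for illustration.

```r
x <- list(a = 1:4, b = c(2.5, 3.5), c = 10)

lapply(x, mean)     # always a list: $a 2.5, $b 3, $c 10
sapply(x, mean)     # simplified to a named numeric vector
sapply(x, range)    # each result has length 2, so sapply returns a 2 x 3 matrix

# tapply: apply a function over subsets of a vector defined by a grouping
tapply(c(1, 2, 3, 4), c("a", "a", "b", "b"), mean)   # a = 1.5, b = 3.5

# mapply: a multivariate lapply; computes rep(1, 3), rep(2, 2), rep(3, 1)
mapply(rep, 1:3, 3:1)
```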

rnorm function (The normal Distribution)

Density, distribution function, quantile function and random generation for the normal distribution.

Loop Functions - lapply

lapply takes three arguments:

(1) a list x;

(2) a function (or the name of a function) FUN;

(3) other arguments via its ... argument.

If x is not a list, it will be coerced to a list using as.list.

lapply always returns a list, regardless of the class of the input.

> x <- list(a=1:5, b = rnorm(10))
> x
$a
[1] 1 2 3 4 5

$b
[1]  0.34766773  1.88039654 -0.29986269  1.88896873  0.07806339 -1.63535799
[7]  1.12373391  0.66304757  0.64747795 -0.38855335

> lapply(x,mean)
$a
[1] 3

$b
[1] 0.4305582


> x <- list(a=1:4, b=rnorm(10), c=rnorm(20,1), d=rnorm(100,5))
> lapply(x,mean)
$a
[1] 2.5

$b
[1] 0.0315751

$c
[1] 1.193494

$d
[1] 4.999784


> x <- 1:4
> lappy(x,runif)
Error: could not find function "lappy"
> lapply(x, runif)
[[1]]
[1] 0.1516973

[[2]]
[1] 0.5303134 0.7188454

[[3]]
[1] 0.61570965 0.03625812 0.79371658

[[4]]
[1] 0.06210734 0.59349463 0.83711023 0.38416463


The runif function generates uniform random numbers (between 0 and 1 by default).

To generate uniform random numbers between zero and ten instead, we pass runif's min and max arguments through lapply's ... argument.

So here, we are calling lapply with several arguments.

> x<-1:4
> lapply(x,runif,min=0,max=10)
[[1]]
[1] 1.384929

[[2]]
[1] 4.474732 3.952107

[[3]]
[1] 2.406658 5.489504 7.572002

[[4]]
[1] 5.534824 0.325385 4.289476 4.976774


An anonymous function for extracting the first column of each matrix.

> x<- list(a=matrix(1:4,2,2), b=matrix(1:6,3,2))
> x
$a
     [,1] [,2]
[1,]    1    3
[2,]    2    4

$b
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

> lapply(x, function(elt) elt[,1])
$a
[1] 1 2

$b
[1] 1 2 3


Loop Functions - apply

apply is used to evaluate a function (often an anonymous one) over the margins of an array.

It is most often used to apply a function to the rows or columns of a matrix.

It can be used with general arrays, e.g. taking the average of an array of matrices.

It is not really faster than writing a loop, but it works in one line !

> str(apply)

function (X, MARGIN, FUN, ...)

• X is an array
• MARGIN is an integer vector indicating which margins should be “retained”.
• FUN is a function to be applied
• ... is for other arguments to be passed to FUN

For sums and means of matrix dimensions, we have some shortcuts.

• rowSums = apply(x, 1, sum)
• rowMeans = apply(x, 1, mean)
• colSums = apply(x, 2, sum)
• colMeans = apply(x, 2, mean)

These shortcut functions are much faster, but you won't notice unless you're using a large matrix.
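A small sketch of apply() with both margins, the equivalent shortcuts, and extra arguments passed through '...'. The last call computes quantiles of the rows of a matrix; the 2 x 3 matrix is invented for illustration.

```r
m <- matrix(1:6, nrow = 2)   # filled by column: rows are (1, 3, 5) and (2, 4, 6)

apply(m, 1, sum)             # MARGIN = 1 keeps the rows: 9 12
apply(m, 2, mean)            # MARGIN = 2 keeps the columns: 1.5 3.5 5.5
rowSums(m)                   # same values as apply(m, 1, sum), via the shortcut
colMeans(m)                  # same values as apply(m, 2, mean)

# Arguments after FUN go through '...': 25% and 75% quantiles of each row.
# The result is 2 x 2: one column per row of m, one row per probability.
apply(m, 1, quantile, probs = c(0.25, 0.75))
```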

Quantiles of the rows of a matrix.

Loop Functions - mapply

Loop Functions - tapply

Loop Functions - split

Split takes a vector or other objects and splits it into groups determined by a factor or list of factors.
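A minimal split-then-apply sketch on a vector; the values and grouping factor are invented for illustration.

```r
x <- c(1, 2, 3, 10, 20, 30)
f <- factor(c("a", "a", "a", "b", "b", "b"))

s <- split(x, f)     # list(a = c(1, 2, 3), b = c(10, 20, 30))
lapply(s, mean)      # a list: $a 2, $b 20
sapply(s, mean)      # simplified: a = 2, b = 20
```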

Splitting a Data Frame

Here, lapply is used to process each piece.

The data cover five months (May through September). Instead of lapply, we can use sapply to simplify the result.

sapply puts all these numbers into a matrix with three rows (one per variable) and, in this case, five columns (one per month).

For each of the three variables, the result is in a much more compact format: a matrix instead of a list.

Of course, we still get NAs for many of them because of missing values in the original data.

So we pass the na.rm argument through to the call.
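A runnable sketch of the steps above using the airquality data set that ships with R. The choice of the Ozone, Solar.R, and Wind columns follows the three variables mentioned in the text.

```r
s <- split(airquality, airquality$Month)   # one data frame per month (5 to 9)
cols <- c("Ozone", "Solar.R", "Wind")

# lapply: a list of 5 named numeric vectors
lapply(s, function(d) colMeans(d[, cols]))

# sapply: simplified to a 3 x 5 matrix (variables x months); NAs appear
sapply(s, function(d) colMeans(d[, cols]))

# na.rm = TRUE drops missing values before averaging
sapply(s, function(d) colMeans(d[, cols], na.rm = TRUE))
```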

Here you can see the monthly means.

Splitting on More than One Level

Debugging

Types of conditions

• message: A generic notification/diagnostic message produced by the message function; execution of the function continues
• warning: An indication that something is wrong but not necessarily fatal; execution of the function continues; generated by the warning function.
• error: An indication that a fatal problem has occurred; execution stops; produced by the stop function
• condition: A generic concept for indicating that something unexpected can occur; programmers can create their own conditions
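A small sketch showing the three built-in condition types and how tryCatch() can tell them apart. check_value and classify are hypothetical names invented for this example.

```r
# A hypothetical function that signals each kind of condition
check_value <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")   # error: execution stops
  if (x < 0) warning("x is negative")             # warning: execution continues
  message("checking ", x)                         # message: execution continues
  x * 2
}

# tryCatch() reports which condition was signaled first
classify <- function(expr) {
  tryCatch(expr,
           error   = function(e) "error",
           warning = function(w) "warning",
           message = function(m) "message")
}

classify(check_value("a"))   # "error"
classify(check_value(-1))    # "warning"
classify(check_value(3))     # "message"

# Without a handler, a warning does not stop the function: this returns -2
suppressWarnings(suppressMessages(check_value(-1)))
```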

Tools for debugging

• traceback: prints out the function call stack after an error occurs; does nothing if there's no error.
• debug: flags a function for "debug" mode, which allows you to step through execution of a function one line at a time.
• browser: suspends the execution of a function wherever it is called and puts the function in debug mode.
• trace: allows you to insert debugging code into a function at specific places.
• recover: allows you to modify the error behavior so that you can browse the function call stack.

Programming Assignment 2

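What is unusual is that the assignment uses peer assessment: students who submitted it grade each other's work.

This seems intended to discourage cheating.

You must grade at least one other student's assignment; otherwise you receive a 20% penalty.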

Introduction

This exercise practices reusing a result that has already been computed (caching).

Example: Caching the Mean of a Vector

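The <<- operator assigns a value to a variable in an enclosing environment rather than the current one. Here it updates x and m in the environment of makeVector (not the global environment), which is how the cached value persists between calls.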

makeVector <- function(x = numeric()) {
    m <- NULL
    set <- function(y) {
        x <<- y      # update x in makeVector's environment
        m <<- NULL   # new data invalidates the cached mean
    }
    get <- function() x
    setmean <- function(mean) m <<- mean
    getmean <- function() m
    list(set = set, get = get,
         setmean = setmean,
         getmean = getmean)
}
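makeVector is normally used together with a function that consults the cache before computing. A minimal cachemean in the same style is sketched below; it is my reconstruction for illustration, and makeVector is repeated so the block runs on its own.

```r
makeVector <- function(x = numeric()) {
  m <- NULL
  set <- function(y) {
    x <<- y
    m <<- NULL
  }
  get <- function() x
  setmean <- function(mean) m <<- mean
  getmean <- function() m
  list(set = set, get = get, setmean = setmean, getmean = getmean)
}

# Return the mean of the vector stored in a makeVector() object,
# computing it only if it has not been cached yet
cachemean <- function(x, ...) {
  m <- x$getmean()
  if (!is.null(m)) {
    message("getting cached data")
    return(m)
  }
  data <- x$get()
  m <- mean(data, ...)
  x$setmean(m)      # store the result for the next call
  m
}

v <- makeVector(c(1, 2, 3, 4))
cachemean(v)        # computes and caches 2.5
cachemean(v)        # prints "getting cached data", returns 2.5
v$set(c(10, 20))    # new data clears the cache
cachemean(v)        # recomputes: 15
```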


Assignment: Caching the inverse of a Matrix

This is the final code I wrote and submitted.

# Overall, makeCacheMatrix() keeps cached data so that it can be reused.
# cacheSolve() calculates the inverse of a matrix created by makeCacheMatrix(),
# returning the cached inverse when one is available.
# To validate the code, you can use the following sequence:
# > m <- makeCacheMatrix()
# > m$set(matrix(c(4,2,2,4),2,2))
# > m$get()
#        [,1] [,2]
# [1,]    4    2
# [2,]    2    4
#
# > cacheSolve(m)
#             [,1]       [,2]
# [1,]  0.3333333 -0.1666667
# [2,] -0.1666667  0.3333333
#
# > cacheSolve(m)
# getting cached data
#             [,1]       [,2]
# [1,]  0.3333333 -0.1666667
# [2,] -0.1666667  0.3333333

# makeCacheMatrix: return a list of functions to:
# 1. Set the value of the matrix
# 2. Get the value of the matrix
# 3. Set the value of the inverse
# 4. Get the value of the inverse
makeCacheMatrix <- function(x = matrix()) {
    ## Cached inverse; NULL until it has been computed
    m <- NULL

    ## Store a new matrix and clear the cached inverse.
    ## <<- updates x and m in makeCacheMatrix's environment, so each
    ## object keeps its own matrix and cache (nothing is written to the
    ## global environment, despite the accessor names below).
    set <- function(y) {
        x <<- y
        m <<- NULL
    }

    # Return the stored matrix.
    get <- function() return(x)
    # Cache a computed inverse.
    set_global_m <- function(inverse) m <<- inverse
    # Return the cached inverse (NULL if not yet computed).
    get_global_m <- function() return(m)
    list(set = set, get = get,
         set_global_m = set_global_m,
         get_global_m = get_global_m)
}

# This function computes the inverse of a matrix.
# By checking the cache first, it avoids redundant computation.
cacheSolve <- function(x) {
    # Try to get the cached inverse.
    m <- x$get_global_m()
    # If m is not NULL, the inverse was already computed:
    # print a message and return the cached value.
    if (!is.null(m)) {
        message("getting cached data")
        return(m)
    }
    # If m is NULL, compute the inverse with solve()
    # and cache the result for reuse.
    data <- x$get()
    inverseMatrix <- solve(data)
    x$set_global_m(inverseMatrix)
    return(inverseMatrix)
}


#### Other posts in the 'MOOC > R Programming' category

Certificate & Comment  (0) 2015.08.26 2015.07.30 2015.07.16 2015.07.08

[Week 2] Programming with R

If

if is the most basic control structure.

if(<condition>) {
    ## do something
} else {
    ## do something else
}

if(<condition1>) {
    ## do something
} else if(<condition2>) {
    ## do something different
} else {
    ## do something different
}

The else clause is optional.

for / while

for(i in 1:10) {
    print(i)
}

This loop takes the i variable and in each iteration of the loop gives it values 1,2,3, ... , 10, and then exits.

These loops all have the same behavior.

x <- c("a", "b", "c", "d")

for(i in 1:4) {
    print(x[i])
}

for(i in seq_along(x)) {
    print(x[i])
}

for(letter in x) {
    print(letter)
}

for(i in 1:4) print(x[i])

z <- 5

while(z >= 3 && z <= 10) {
    print(z)
    coin <- rbinom(1, 1, 0.5)

    if(coin == 1) { ## random walk
        z <- z + 1
    } else {
        z <- z - 1
    }
}

Repeat

repeat creates an infinite loop; the only way to exit it is to call break.

A repeat loop like this is a bit dangerous because there's no guarantee it will stop.

Better to set a hard limit on the number of iterations (e.g. using a for loop) and then report whether convergence was achieved or not.
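A sketch of the "hard limit" pattern described above. The lecture's own repeat example is not reproduced in these notes, so this swaps in a fixed-point iteration for x = cos(x) as a stand-in; fixed_point and its arguments are hypothetical names.

```r
# Fixed-point iteration with a cap on the number of steps
fixed_point <- function(f, x0, tol = 1e-8, max_iter = 1000) {
  x <- x0
  for (i in seq_len(max_iter)) {      # hard limit on the number of iterations
    x_new <- f(x)
    if (abs(x_new - x) < tol) {
      return(list(value = x_new, iterations = i, converged = TRUE))
    }
    x <- x_new
  }
  # Report failure instead of looping forever
  list(value = x, iterations = max_iter, converged = FALSE)
}

fixed_point(cos, 1)                 # converges to about 0.739
fixed_point(cos, 1, max_iter = 5)   # stops early with converged = FALSE
```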

Next

next skips the rest of the current iteration.

Return

return exits the function immediately and returns the given value.
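A small sketch using next and return together; sum_odd is a hypothetical example function.

```r
# Sum the odd numbers in x, skipping even ones with next;
# return() exits early as soon as a negative value is found
sum_odd <- function(x) {
  total <- 0
  for (v in x) {
    if (v < 0) return(NA)       # leave the function immediately
    if (v %% 2 == 0) next       # skip the rest of this iteration
    total <- total + v
  }
  total
}

sum_odd(1:10)            # 1 + 3 + 5 + 7 + 9 = 25
sum_odd(c(1, 3, -2, 5))  # NA: stopped at -2
```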

Functions

Functions are created using the function() directive and are stored as R objects just like anything else.

In particular, they are R objects of class "function".

f <- function(<arguments>) {
    ## Do something interesting
}

Functions in R are "first class objects", which means that they can be treated much like any other R object.

Importantly,

• functions can be passed as arguments to other functions
• functions can be nested, so that you can define a function inside of another function

The return value of a function is the last expression in the function body to be evaluated.
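A short sketch of the first-class behavior described above; apply_twice and outer_fn are hypothetical names.

```r
# Functions are objects of class "function"
apply_twice <- function(f, x) f(f(x))   # f is passed in as an argument
class(apply_twice)                      # "function"
apply_twice(sqrt, 16)                   # sqrt(sqrt(16)) = 2

# A function defined inside another function
outer_fn <- function(x) {
  square <- function(y) y^2             # nested helper, local to outer_fn
  square(x) + 1                         # last expression: the return value
}
outer_fn(3)                             # 10
```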

Argument Matching

You can mix positional matching with matching by name. When an argument is matched by name, it is "taken out" of the argument list and the remaining unnamed arguments are matched in the order that they are listed in the function definition.

Most of the time, named arguments are useful on the command line when you have a long argument list and you want to use the defaults for everything except an argument near the end of the list.
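A sketch of the matching rules above; scaled_sum is a hypothetical function with one required argument and two defaulted ones.

```r
# A hypothetical function: one required argument, two with defaults
scaled_sum <- function(data, scale = 1, na.rm = FALSE) {
  sum(data, na.rm = na.rm) * scale
}
x <- c(1, 2, NA)

scaled_sum(x, 2, TRUE)              # positional: data = x, scale = 2, na.rm = TRUE -> 6
scaled_sum(na.rm = TRUE, data = x)  # by name, any order; scale keeps its default -> 3
scaled_sum(x, na.rm = TRUE)         # mixed: skip scale, keep its default -> 3
scaled_sum(na.rm = TRUE, x, 10)     # na.rm is taken out first; x -> data, 10 -> scale -> 30
```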

Lazy Evaluation

Arguments to functions are evaluated lazily, so they are evaluated only as needed.

f <- function(a, b) {
    a^2
}
f(2)

No error occurs even though the second argument is not supplied, because b is never actually used.

f <- function(a, b) {
    print(a)
    print(b)
}
f(45)
[1] 45
Error: argument "b" is missing, with no default

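As the output shows, lazy evaluation postpones errors until an argument is actually needed: 45 is printed first, and the error is triggered only when print(b) tries to evaluate b.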

The "..." Argument

The ... argument indicates a variable number of arguments that are usually passed on to other functions.

... is often used when extending another function and you don't want to copy the entire argument list of the original function.

myplot <- function(x, y, type = "l", ...) {
    plot(x, y, type = type, ...)  ## remaining arguments are passed on to plot()
}

Generic functions use ... so that extra arguments can be passed to methods (more on this later).

> mean
function (x, ...)
UseMethod("mean")
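A small sketch of the ... argument in action; my_mean, count_args, and paste_with are hypothetical names.

```r
# Forward extra arguments to mean() through '...'
my_mean <- function(x, ...) {
  mean(x, ...)
}
my_mean(c(1, NA, 3))                 # NA
my_mean(c(1, NA, 3), na.rm = TRUE)   # 2: na.rm is passed through to mean()

# '...' can also collect any number of arguments
count_args <- function(...) length(list(...))
count_args(1, "a", TRUE)             # 3

# An argument listed after '...' can only be matched by its full name
paste_with <- function(..., sep = "-") paste(..., sep = sep)
paste_with("a", "b")                 # "a-b"
paste_with("a", "b", sep = "+")      # "a+b"
```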

Scoping Rules

The scoping rules for R are the main feature that make it different from the original S language.

The scoping rules determine how a value is associated with a free variable in a function

R uses lexical scoping or static scoping. A common alternative is dynamic scoping.

Related to the scoping rules is how R uses the search list to bind a value to a symbol

Lexical scoping turns out to be particularly useful for simplifying statistical computations

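Definition of a free variable:

It is not a formal argument.

It is not a local variable (it is not assigned inside the function body).

Lexical Scoping

Exploring a Function Closure

Lexical vs. Dynamic Scoping

In function f, y is a local variable and the function g is a free variable.

In function g, y is a free variable.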

With lexical scoping the value of y in the function g is looked up in the environment in which the function was defined,

in this case the global environment, so the value of y is 10.

With dynamic scoping, the value of y is looked up in the environment from which the function was called

(sometimes referred to as the calling environment).

- In R the calling environment is known as the parent frame

So the value of y would be 2.
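The code this discussion refers to is not shown in these notes. The sketch below is a reconstruction consistent with the values stated above (a global y of 10 and a local y of 2 inside f), followed by a small closure; treat the exact definitions as illustrative.

```r
y <- 10

f <- function(x) {
  y <- 2          # local to f
  y^2 + g(x)      # g is a free variable in f
}

g <- function(x) {
  x * y           # y is a free variable in g
}

# Lexical scoping: g looks up y where g was defined (the global
# environment), so y = 10 and f(3) = 2^2 + 3 * 10 = 34.
# Under dynamic scoping, y would be 2 and the result would be 4 + 6 = 10.
f(3)

# A closure: the returned function remembers n from its defining environment
make.power <- function(n) {
  pow <- function(x) x^n
  pow
}
cube <- make.power(3)
cube(3)           # 27
```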

When a function is defined in the global environment and is subsequently called from the global environment, then the defining environment and the calling environment are the same. This can sometimes give the appearance of dynamic scoping.

Coding Standards

1. Always use text files / text editor

2. Indent your code

3. Limit the width of your code (80 columns?)

4. Limit the length of individual functions

I set the indentation to 4 spaces and changed the editor background to a dark theme in RStudio, under [Tools] -> [Global Options].

Dates and Times in R

R has developed a special representation of dates and times

Dates are represented by the Date class

Times are represented by the POSIXct or the POSIXlt class

Dates are stored internally as the number of days since 1970-01-01

Times are stored internally as the number of seconds since 1970-01-01
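A quick check of the internal representations described above. The dates are chosen so the underlying numbers are easy to verify; "UTC" is used to avoid time-zone effects.

```r
d <- as.Date("1970-01-02")
unclass(d)                  # 1: days since 1970-01-01

t <- as.POSIXct("1970-01-01 00:01:00", tz = "UTC")
as.numeric(t)               # 60: seconds since 1970-01-01

p <- as.POSIXlt(t)          # POSIXlt stores the date-time parts as a list
p$min                       # 1
```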

There are a number of generic functions that work on dates and times

weekdays: give the day of the week

months: give the month name

quarters: give the quarter number ("Q1", "Q2", "Q3", or "Q4")

Quiz 02

The quiz has 10 questions; my answers are in the attached file: Quize 2 week (answer).pdf

Programming Assignment 1: Air Pollution

This was the first programming assignment.

Submissions are made by uploading an R script.

When the upload succeeds, the submission is marked as complete.

#### Other posts in the 'MOOC > R Programming' category

Certificate & Comment  (0) 2015.08.26 2015.07.30 2015.07.24 2015.07.08

[1 Week] Getting started and R Nuts and Bolts

Overview and History of R

History

R is a dialect of the S language.

S is a language that was developed by John Chambers and others at Bell Labs.

S was initiated in 1976 as an internal statistical analysis environment - originally implemented as Fortran libraries.

R

1991: Created in New Zealand by Ross Ihaka and Robert Gentleman. Their experience developing R is documented in a 1996 JCGS paper.

1993: First announcement of R to the public.

1995: Martin Mächler convinces Ross and Robert to use the GNU General Public License to make R free software.

1996: A public mailing list is created (R-help and R-devel)

1997: The R Core Group is formed (containing some people associated with S-PLUS). The core group controls the source code for R.

2000: R version 1.0.0 is released.

2013: R version 3.0.2 is released (September 2013).

Features of R

Syntax is very similar to S, making it easy for S-PLUS users to switch over.

Semantics are superficially similar to S, but in reality are quite different (more on that later).

Runs on almost any standard computing platform/OS (even on the PlayStation 3)

Frequent releases (annual + bugfix releases); active development.


Quite lean, as far as software goes; functionality is divided into modular packages

Graphics capabilities very sophisticated and better than most stat packages.

Useful for interactive work, but contains a powerful programming language for developing new tools (user -> programmer)

Very active and vibrant user community; R-help and R-devel mailing lists and Stack Overflow

It's free !

R Data Types: Objects and Attributes

Everything is an object.

There are five basic (atomic) classes of objects:

character

numeric (real numbers)

integer

complex

logical (TRUE / FALSE)

Vector

A vector can only contain objects of the same class

List

A list can contain objects of different classes (a vector-like object whose elements may have different types)
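A small sketch of the difference: c() coerces mixed inputs to a single class, while list() keeps each element's own class.

```r
v <- c(1, "a", TRUE)     # mixed inputs are coerced to one class
class(v)                 # "character"
v                        # "1" "a" "TRUE"

l <- list(1, "a", TRUE)  # a list keeps each element's own class
sapply(l, class)         # "numeric" "character" "logical"
```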

Numbers

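Real numbers

Numbers in R are generally treated as numeric objects (i.e. double-precision real numbers). If you explicitly want an integer, you need to specify the L suffix: entering 1 gives a numeric object; entering 1L explicitly gives you an integer.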

The value NaN represents an undefined value ("not a number"), e.g. 0 / 0; NaN can also be thought of as a missing value (more on that later).
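A quick check of the numeric and integer classes and of NaN, as described above.

```r
class(1)        # "numeric" (double precision)
class(1L)       # "integer"
is.integer(1)   # FALSE
0 / 0           # NaN
is.nan(0 / 0)   # TRUE
```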

Attributes

R objects can have attributes

names, dimnames

dimensions (e.g. matrices, arrays)

class

length

Data Types - Vectors and Lists

Creating Vectors

The c() function can be used to create vectors of objects.

x <- c(0.5, 0.6) ## numeric

Vectors can also be created with the vector() function:

x <- vector("numeric", length = 10)

x

[1] 0 0 0 0 0 0 0 0 0 0

Data Types - Matrices

Matrices are vectors with a dimension attribute. The dimension attribute is itself an integer vector of length 2 (nrow, ncol).

Matrices can also be created directly from vectors by adding a dimension attribute.

Matrices can be created by column-binding or row-binding with cbind() and rbind().

Data Types - Factors

Factors are used to represent categorical data. Factors can be un-ordered or ordered. One can think of a factor as an integer vector where each integer has a label.
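A small sketch of a factor as labeled integers, and of setting the level order explicitly. The yes/no values are invented for illustration.

```r
x <- factor(c("yes", "yes", "no", "yes", "no"))
levels(x)       # "no" "yes" (alphabetical by default)
table(x)        # counts per level: no 2, yes 3
unclass(x)      # the underlying integers 2 2 1 2 1, plus a "levels" attribute

# The levels argument sets the order; the first level becomes the baseline
y <- factor(c("yes", "yes", "no"), levels = c("yes", "no"))
levels(y)       # "yes" "no"
```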

• Factors are treated specially by modelling functions like lm() and glm()
• Using factors with labels is better than using integers because factors are self-describing; having a variable that has values "Male" and "Female" is better than a variable that has values 1 and 2. The order of the levels can be set using the levels argument to factor().

This can be important in linear modeling because the first level is used as the baseline level.

Data Types - Missing Values

A NaN value is also NA, but an NA value is not NaN.

is.na() is used to test whether objects are NA, and is.nan() is used to test for NaN.
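A one-line check of the asymmetry between NA and NaN.

```r
x <- c(1, NA, NaN, 4)
is.na(x)    # FALSE  TRUE  TRUE FALSE: NaN also counts as NA
is.nan(x)   # FALSE FALSE  TRUE FALSE: NA is not NaN
```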

Sometimes when you read in an Excel file, you may see many NaN values.

Data Types - Data Frames

This is one of the most important data types.

Data frames are used to store tabular data

• They are represented as a special type of list where every element of the list has to have the same length
• Each element of the list can be thought of as a column and the length of each element of the list is the number of rows
• Unlike matrices, data frames can store different classes of objects in each column (just like lists); matrices must have every element be the same class
• Data frames also have a special attribute called row.names
• Can be converted to a matrix by calling data.matrix()

Removing NA Values

A common task is to remove missing values (NAs).

> x <- c(1, 2, NA, 4, NA, 5)
> x[!is.na(x)]
[1] 1 2 4 5

What if there are multiple vectors and you want the subset with no missing values in any of them?

complete.cases(x, y)

Description

Return a logical vector indicating which cases are complete, i.e., have no missing values.

With good <- complete.cases(airquality), airquality[good, ] prints all of the complete rows.

na.omit()

Removes every row that contains an NA value.
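A sketch of complete.cases() across two vectors and on the built-in airquality data frame, compared with na.omit(). The vectors x and y are invented so that their missing values fall in different positions.

```r
x <- c(1, 2, NA, 4, NA, 5)
y <- c("a", NA, "c", "d", NA, "f")
good <- complete.cases(x, y)   # TRUE only where neither x nor y is missing
x[good]                        # 1 4 5
y[good]                        # "a" "d" "f"

# On a data frame: keep only rows with no NA in any column
good_rows <- complete.cases(airquality)
head(airquality[good_rows, ])
nrow(na.omit(airquality))      # same number of rows as airquality[good_rows, ]
```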

Quiz 01

The quiz has 20 questions; my answers are in the attached file: Quize 1 week (result2).pdf

#### Other posts in the 'MOOC > R Programming' category

Certificate & Comment  (0) 2015.08.26 2015.07.30 2015.07.24 2015.07.16