**[1 Week] Getting Started and R Nuts and Bolts**

**Overview and History of R**

History

R is a dialect of the S language.

S is a language that was developed by John Chambers and others at Bell Labs.

S was initiated in 1976 as an internal statistical analysis environment - originally implemented as Fortran libraries.

**R**

1991: Created in New Zealand by Ross Ihaka and Robert Gentleman. Their experience developing R is documented in a 1996 JCGS paper.

1993: First announcement of R to the public.

1995: Martin Mächler convinces Ross and Robert to use the GNU General Public License to make R free software.

1996: Public mailing lists are created (R-help and R-devel).

1997: The R Core Group is formed (containing some people associated with S-PLUS). The core group controls the source code for R.

2000: R version 1.0.0 is released.

2013: R version 3.0.2 is released in September 2013.

**Features of R**

Syntax is very similar to S, making it easy for S-PLUS users to switch over.

Semantics are superficially similar to S, but in reality are quite different (more on that later).

Runs on almost any standard computing platform/OS (even on the PlayStation 3)

Frequent releases (annual + bugfix releases); active development.

Quite lean, as far as software goes; functionality is divided into modular packages

Graphics capabilities very sophisticated and better than most stat packages.

Useful for interactive work, but contains a powerful programming language for developing new tools (user -> programmer)

Very active and vibrant user community; R-help and R-devel mailing lists and Stack Overflow

It's free!

**R Data Types: Objects and Attributes**

**Everything is an object**

R has five basic classes of objects:

character

numeric (real numbers)

integer

complex

logical (TRUE / FALSE)

Vector

A vector can only contain objects of the same class.

List

A list can contain objects of different classes (it is a vector whose elements may have different types).
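
The difference between a vector and a list can be sketched as follows. This is a minimal illustration, assuming a standard R session; the variable names `v` and `l` are made up for the example and do not come from the course material.

```r
# A vector holds a single class, so mixing classes triggers implicit coercion.
v <- c(1.7, "a")      # the number is coerced to a character string
class(v)              # "character"

# A list keeps each element's own class.
l <- list(1, "a", TRUE, 1 + 4i)
sapply(l, class)      # "numeric" "character" "logical" "complex"
```

Coercion happens silently, so checking `class()` after combining values is a cheap way to catch surprises.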

**Numbers**

Numbers are generally treated as real numbers (numeric objects).

To get an integer explicitly, you need to specify the L suffix. Ex: entering 1 gives a numeric object; entering 1L explicitly gives you an integer.

The value **NaN** represents an undefined value ("not a number"), e.g. 0 / 0; **NaN** can also be thought of as a missing value (more on that later).
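
A short sketch of the points above about numbers, assuming a standard R session (the names `x`, `y` and `z` are illustrative only):

```r
x <- 1        # numeric (double precision) by default
y <- 1L       # the L suffix requests an integer
class(x)      # "numeric"
class(y)      # "integer"

z <- 0 / 0    # an undefined result
z             # NaN
is.nan(z)     # TRUE
```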

**Attributes**

R objects can have attributes

names, dimnames

dimensions (e.g. matrices, arrays)

class

length

other user-defined attributes/metadata
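
The attributes listed above can be inspected and set from the console. The following is a minimal sketch; the objects `x` and `m` and the `"units"` attribute are hypothetical examples, not part of the original notes.

```r
x <- c(a = 1, b = 2, c = 3)
names(x)                   # "a" "b" "c"
length(x)                  # 3

m <- 1:6
dim(m) <- c(2, 3)          # adding a dim attribute turns the vector into a matrix
attributes(m)              # shows the dim attribute: 2 3
is.matrix(m)               # TRUE

attr(x, "units") <- "cm"   # a user-defined attribute (hypothetical name)
attr(x, "units")           # "cm"
```

`attributes()` returns all attributes of an object as a list, while `attr()` reads or sets a single one by name.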

**Data Types - Vectors and Lists**

**Creating Vectors**

The **c()** function can be used to create vectors of objects.

x <- c(0.5, 0.6) ## numeric

Vectors can also be created using the **vector()** function.

x <- vector("numeric", length = 10)

x

[1] 0 0 0 0 0 0 0 0 0 0

**Data Types - Matrices**

Matrices are vectors with a dimension attribute. The dimension attribute is itself an integer vector of length 2 (nrow, ncol).

Matrices can also be created directly from vectors by adding a dimension attribute.

Matrices can be created by column-binding or row-binding with **cbind()** and **rbind()**.
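
Both ways of building a matrix described above can be sketched like this, assuming a standard R session (the variable names are illustrative):

```r
# From a vector, by adding a dimension attribute (filled column by column).
m <- 1:10
dim(m) <- c(2, 5)
m[2, 1]        # 2, because the first column holds 1 and 2

# By binding vectors together as columns or as rows.
x <- 1:3
y <- 10:12
cb <- cbind(x, y)
rb <- rbind(x, y)
dim(cb)        # 3 2
dim(rb)        # 2 3
```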

**Data Types - Factors**

Factors are used to represent categorical data. Factors can be un-ordered or ordered. One can think of a factor as an integer vector where each integer has a *label*.

- Factors are treated specially by modelling functions like lm() and glm()
- Using factors with labels is better than using integers because factors are self-describing; having a variable that has values "Male" and "Female" is better than a variable that has values 1 and 2.

The order of the levels can be set using the levels argument to factor().

This can be important in linear modeling because the first level is used as the baseline level.
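
A minimal sketch of how the levels argument changes the baseline level; the "yes"/"no" data are made up for illustration.

```r
x <- factor(c("yes", "yes", "no", "yes", "no"))
levels(x)       # "no" "yes": alphabetical order by default
table(x)        # counts per level
unclass(x)      # the underlying integer codes: 2 2 1 2 1

# Setting the level order explicitly makes "yes" the first (baseline) level.
x2 <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))
levels(x2)      # "yes" "no"
```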

**Data Types - Missing Values**

A NaN value is also treated as NA, but an NA value is not NaN.

is.na() and is.nan() are used to test objects for NA and NaN, respectively.

When reading in an Excel file, you may sometimes see many NaN values.
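
The relationship between NA and NaN can be checked directly with the two test functions. A small sketch, with made-up vectors `x` and `y`:

```r
x <- c(1, 2, NA, 10, 3)
is.na(x)     # FALSE FALSE  TRUE FALSE FALSE
is.nan(x)    # FALSE FALSE FALSE FALSE FALSE

y <- c(1, 2, NaN, NA, 4)
is.na(y)     # FALSE FALSE  TRUE  TRUE FALSE  (NaN counts as NA)
is.nan(y)    # FALSE FALSE  TRUE FALSE FALSE  (NA is not NaN)
```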

**Data Types - Data Frames**

The data frame is one of the most important key data types.

Data frames are used to store tabular data

- They are represented as a special type of list where every element of the list has to have the same length
- Each element of the list can be thought of as a column and the length of each element of the list is the number of rows
- Unlike matrices, data frames can store different classes of objects in each column (just like lists); matrices must have every element be the same class
- Data frames also have a special attribute called row.names
- Data frames are usually created by calling read.table() or read.csv()
- Can be converted to a matrix by calling data.matrix()
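
The properties in the list above can be seen on a small hand-built data frame. This sketch uses data.frame() directly rather than read.table(), since no input file is assumed; the column names `foo` and `bar` are illustrative.

```r
df <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))
nrow(df)             # 4
ncol(df)             # 2
sapply(df, class)    # each column keeps its own class: "integer", "logical"
row.names(df)        # "1" "2" "3" "4"

# Converting to a matrix forces every column to a common numeric type.
m <- data.matrix(df)
m[1, "bar"]          # 1, since TRUE is stored as 1
```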

**Removing NA Values**

A common task is to remove missing values (NAs).

> x <- c(1, 2, NA, 4, NA, 5)

> bad <- **is.na(x)**

> x[!bad]

**[1] 1 2 4 5**

What if there are multiple vectors and you want the subset of them that has no missing values?

**complete.cases (x, y)**

**Description**

Return a logical vector indicating which cases are complete, i.e., have no missing values.

Using **airquality[good, ]** on its own prints every selected **row**.

**na.omit()**

Removes every row that contains an NA value.
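
The two approaches can be compared side by side. This is a sketch under the assumption that `good` in the notes holds the result of complete.cases(); the names `good_xy`, `good_rows` and `clean` are introduced here for illustration. The airquality data set ships with R.

```r
# Several vectors: keep only the positions where none of them is missing.
x <- c(1, 2, NA, 4, NA, 5)
y <- c("a", "b", NA, "d", NA, "f")
good_xy <- complete.cases(x, y)
x[good_xy]                        # 1 2 4 5
y[good_xy]                        # "a" "b" "d" "f"

# A data frame: keep only the rows without any NA.
good_rows <- complete.cases(airquality)
head(airquality[good_rows, ])     # head() limits the printout to six rows

# na.omit() drops the same rows in a single call.
clean <- na.omit(airquality)
nrow(clean) == sum(good_rows)     # TRUE
```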

**Quiz 01**

There are 20 questions; the solutions are in the attached file.
