Overview and History of R
R is a dialect of the S language.
S is a language that was developed by John Chambers and others at Bell Labs.
S was initiated in 1976 as an internal statistical analysis environment - originally implemented as Fortran libraries.
1991: R is created in New Zealand by Ross Ihaka and Robert Gentleman. Their experience
developing R is documented in a 1996 JCGS paper.
1993: First announcement of R to the public.
1995: Martin Mächler convinces Ross and Robert to use the GNU General Public License to
make R free software.
1996: Public mailing lists are created (R-help and R-devel).
1997: The R Core Group is formed (containing some people associated with S-PLUS). The core
group controls the source code for R.
2000: R version 1.0.0 is released.
2013: R version 3.0.2 is released in September 2013.
Features of R
Syntax is very similar to S, making it easy for S-PLUS users to switch over.
Semantics are superficially similar to S, but in reality are quite different (more on that later).
Runs on almost any standard computing platform/OS (even on the PlayStation 3)
Frequent releases (annual + bugfix releases); active development.
Quite lean, as far as software goes; functionality is divided into modular packages
Graphics capabilities are very sophisticated and better than those of most stat packages.
Useful for interactive work, but contains a powerful programming language for developing new
tools (user -> programmer)
Very active and vibrant user community; R-help and R-devel mailing lists and Stack Overflow
It's free!
R Data Types: Objects and Attributes
R has five basic ("atomic") classes of objects:
character
numeric (real numbers)
integer
complex
logical (TRUE / FALSE)
A vector can only contain objects of the same class.
The one exception is a list, which is represented as a vector but can contain objects of different classes.
Numbers in R are generally treated as numeric objects; to get an integer explicitly, you need to specify the L suffix. Ex: entering 1 gives a numeric object; entering 1L explicitly gives you an integer.
The value NaN represents an undefined value ("not a number"), e.g. 0 / 0; NaN can also be thought of as a missing value (more on that later).
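A minimal sketch of the points above, using only base R. The variable names are arbitrary and chosen for illustration.

```r
# Numbers typed without a suffix are stored as numeric (double precision).
a <- 1
class(a)          # "numeric"

# The L suffix requests an integer explicitly.
b <- 1L
class(b)          # "integer"

# A vector holds a single class; a list may mix classes.
v <- c(1.5, 2, 3)
l <- list(1, "a", TRUE)
class(v)          # "numeric"
sapply(l, class)  # "numeric" "character" "logical"

# 0 / 0 is undefined and yields NaN.
u <- 0 / 0
is.nan(u)         # TRUE
```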
R objects can have attributes
dimensions (e.g. matrices, arrays)
other user-defined attributes/metadata
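As a rough illustration, attributes can be inspected with attributes() and set with dim() or attr(). The attribute name "units" below is a made-up example of user-defined metadata, not something R defines.

```r
# A plain vector has no attributes until some are set.
x <- 1:6
attributes(x)        # NULL

# Setting the dim attribute turns the vector into a 2 x 3 matrix.
dim(x) <- c(2, 3)
attributes(x)$dim    # 2 3

# A user-defined attribute ("units" is a hypothetical name).
attr(x, "units") <- "cm"
attr(x, "units")     # "cm"
```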
Data Types - Vectors and Lists
The c() function can be used to create vectors of objects.
x <- c(0.5, 0.6) ## numeric
Vectors can also be created using the vector() function:
x <- vector("numeric", length = 10)
x
[1] 0 0 0 0 0 0 0 0 0 0
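A few more hedged examples of creating vectors and lists with base R functions; the values are arbitrary.

```r
# c() builds a vector whose class follows from its elements.
lg <- c(TRUE, FALSE)        # logical
ch <- c("a", "b", "c")      # character
it <- 9:12                  # integer sequence

# vector() creates an empty vector of a given class and length.
w <- vector("numeric", length = 10)
length(w)                   # 10

# list() keeps each element's own class.
l <- list(1, "a", TRUE)
l[[2]]                      # "a"
```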
Data Types - Matrices
Matrices are vectors with a dimension attribute. The dimension attribute is itself an integer vector of length 2 (nrow, ncol)
Matrices can also be created directly from vectors by adding a dimension attribute.
Matrices can be created by column-binding or row-binding with cbind() and rbind().
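The two approaches above might look like the following sketch. Note that assigning dim fills the matrix column-wise.

```r
# Turn a vector into a matrix by adding a dimension attribute.
m <- 1:6
dim(m) <- c(2, 3)   # 2 rows, 3 columns, filled column by column
m[1, 2]             # 3

# Build matrices by binding vectors as columns or as rows.
x <- 1:3
y <- 10:12
cb <- cbind(x, y)   # 3 rows, 2 columns
rb <- rbind(x, y)   # 2 rows, 3 columns
dim(cb)             # 3 2
dim(rb)             # 2 3
```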
Data Types - Factors
Factors are used to represent categorical data. Factors can be un-ordered or ordered. One can think of a factor as an integer vector where each integer has a label.
- Factors are treated specially by modelling functions like lm() and glm()
- Using factors with labels is better than using integers because factors are self-describing; having a variable that has values "Male" and "Female" is better than a variable that has values 1 and 2.
The order of the levels can be set using the levels argument to factor().
This can be important in linear modelling because the first level is used as the baseline level.
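A small sketch of how level order behaves, assuming a made-up yes/no variable. By default the levels are sorted alphabetically, so "no" becomes the baseline unless levels is given.

```r
answers <- c("yes", "yes", "no", "yes", "no")

# Default: levels are sorted, so "no" comes first.
f1 <- factor(answers)
levels(f1)        # "no" "yes"
as.integer(f1)    # 2 2 1 2 1  (the underlying integer codes)

# Explicit level order: "yes" becomes the first (baseline) level.
f2 <- factor(answers, levels = c("yes", "no"))
levels(f2)        # "yes" "no"
table(f2)         # counts per level
```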
Data Types - Missing Values
A NaN value is also NA,
but an NA value is not NaN.
is.na() and is.nan() are used to test objects for NA and NaN values, respectively.
When reading in an Excel file, you may sometimes see many NaN values.
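The relationship between NA and NaN can be checked with the base R functions is.na() and is.nan(); the vectors here are arbitrary examples.

```r
x <- c(1, 2, NA, 10, 3)
is.na(x)     # FALSE FALSE  TRUE FALSE FALSE
is.nan(x)    # FALSE FALSE FALSE FALSE FALSE

# NaN counts as missing for is.na(), but NA is not NaN.
y <- c(1, 2, NaN, NA, 4)
is.na(y)     # FALSE FALSE  TRUE  TRUE FALSE
is.nan(y)    # FALSE FALSE  TRUE FALSE FALSE
```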
Data Types - Data Frames
One of the most important key data types.
Data frames are used to store tabular data
- They are represented as a special type of list where every element of the list has to have the same length
- Each element of the list can be thought of as a column and the length of each element of the list is the number of rows
- Unlike matrices, data frames can store different classes of objects in each column (just like lists); matrices must have every element be the same class
- Data frames also have a special attribute called row.names
- Data frames are usually created by calling read.table() or read.csv()
- Can be converted to a matrix by calling data.matrix()
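A minimal sketch of these properties, building a data frame by hand with data.frame() rather than reading a file. The column names foo and bar are placeholders.

```r
# Two columns of different classes, each of length 4.
df <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))
nrow(df)             # 4
ncol(df)             # 2
sapply(df, class)    # foo: "integer", bar: "logical"

# Row names default to "1", "2", ...
row.names(df)

# data.matrix() converts every column to numbers and returns a matrix.
m <- data.matrix(df)
dim(m)               # 4 2
```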
Removing NA Values
A common task is to remove missing values (NAs).
What if there are multiple vectors and you want the subset of elements with no missing values in any of them?
complete.cases(x, y)
Return a logical vector indicating which cases are complete, i.e., have no missing values.
With good <- complete.cases(airquality), airquality[good, ] prints all of the complete rows.
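A short sketch of complete.cases() on two made-up vectors and on the airquality data set that ships with R. The name keep is arbitrary.

```r
x <- c(1, 2, NA, 4, NA, 5)
y <- c("a", "b", NA, "d", NA, "f")

# TRUE where both x and y are non-missing at that position.
keep <- complete.cases(x, y)
keep       # TRUE TRUE FALSE TRUE FALSE TRUE
x[keep]    # 1 2 4 5
y[keep]    # "a" "b" "d" "f"

# The same idea on a data frame: keep only rows with no NA in any column.
good <- complete.cases(airquality)
head(airquality[good, ])   # first six complete rows
```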
There are 20 questions; the solutions are in the attached file.