Converting R objects with rpy2 and Pandas


This post describes how to read an Rdata file and work with the R objects it contains.
The required packages are shown below.

from rpy2.robjects import r
from rpy2.robjects import pandas2ri
import pandas as pd
pandas2ri.activate()

As shown above, two pieces of rpy2 are needed: r, which gives access to R objects, and pandas2ri, which bridges pandas and R.

Loading the file

# Load the Rdata file to create the object.

r.load("./Rdata/num10FclassDf10PassTrain.Rdata")

Output

R object with classes: ('character',) mapped to:
<StrVector - Python:0x7fc114f787c8 / R:0x2cdfbd8>
['num10FclassDf10PassTrain']

Printing the data

r['num10FclassDf10PassTrain']

Output

R object with classes: ('list',) mapped to:
<ListVector - Python:0x7fc115095bc8 / R:0x265d9f0>
[Matrix, Matrix, Matrix, ..., Matrix, Matrix, Matrix]
  acquaintanceSeula: <class 'rpy2.robjects.vectors.Matrix'>
  R object with classes: ('matrix',) mapped to:
<Matrix - Python:0x7fc1150877c8 / R:0x4072ca0>
[       1,        1,        1, ...,        1,        1,        1]
  ikhee: <class 'rpy2.robjects.vectors.Matrix'>
  R object with classes: ('matrix',) mapped to:
<Matrix - Python:0x7fc1150b7308 / R:0x407a9d0>
[       1,        2,        1, ...,        1,        1,        2]
  Jemin: <class 'rpy2.robjects.vectors.Matrix'>
  R object with classes: ('matrix',) mapped to:
<Matrix - Python:0x7fc1153085c8 / R:0x40843b0>
[       2,        1,        1, ...,        1,        2,        2]
  ...
  acquaintanceSeula: <class 'rpy2.robjects.vectors.Matrix'>
  R object with classes: ('matrix',) mapped to:
<Matrix - Python:0x7fc11530c048 / R:0x432ead0>
[       1,        2,        2, ...,        2,        2,        2]
  ikhee: <class 'rpy2.robjects.vectors.Matrix'>
  R object with classes: ('matrix',) mapped to:
<Matrix - Python:0x7fc114efff08 / R:0x434e400>
[       1,        1,        1, ...,        2,        2,        2]
  Jemin: <class 'rpy2.robjects.vectors.Matrix'>
  R object with classes: ('matrix',) mapped to:
<Matrix - Python:0x7fc114eff888 / R:0x4354790>
[       1,        1,        3, ...,        1,        1,        2]

Converting the data type with ri2py

pandas2ri.ri2py(r['num10FclassDf10PassTrain'][0])
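
The ri2py name used here comes from older rpy2 releases; in rpy2 3.x the same conversion is exposed as rpy2py. The following is a minimal sketch, assuming you want code that runs on either version. The helper pick_converter and the SimpleNamespace stand-in are hypothetical names introduced for illustration, not part of rpy2.

```python
from types import SimpleNamespace


def pick_converter(module, names=("rpy2py", "ri2py")):
    """Return the first callable attribute of `module` found among `names`."""
    for name in names:
        fn = getattr(module, name, None)
        if callable(fn):
            return fn
    raise AttributeError("none of %s found on %r" % (", ".join(names), module))


# Stand-in for an old pandas2ri module that only has ri2py (made up for this demo).
legacy = SimpleNamespace(ri2py=lambda obj: ("converted", obj))
convert = pick_converter(legacy)

# With rpy2 installed, the intended use would be:
#   convert = pick_converter(pandas2ri)
#   arr = convert(r['num10FclassDf10PassTrain'][0])
```

Preferring rpy2py first means newer installations use the current name, while older ones fall back to ri2py.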

Converting to a DataFrame

df1 = pd.DataFrame(pandas2ri.ri2py(r['num10FclassDf10PassTrain'][0]), columns=['AppName', "Title", "Hours", "Days", "RecentPhoneUsage", "Proximity", "Priority", "Activity", "PhoneStatus", "SeenTime", "class"])
df1.head()
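
The DataFrame call above fails if the matrix does not have exactly eleven columns. As a rough sketch of the same step with a column-count check, the hypothetical helper matrix_to_frame below wraps any 2-D array-like. The sample rows are made-up values standing in for the converted matrix, not data from the Rdata file.

```python
import pandas as pd

# Column names taken from the post.
COLUMNS = ["AppName", "Title", "Hours", "Days", "RecentPhoneUsage", "Proximity",
           "Priority", "Activity", "PhoneStatus", "SeenTime", "class"]


def matrix_to_frame(matrix, columns=COLUMNS):
    """Build a DataFrame from a 2-D array-like after checking its width."""
    rows = [list(row) for row in matrix]
    if rows and len(rows[0]) != len(columns):
        raise ValueError(
            "expected %d columns, got %d" % (len(columns), len(rows[0]))
        )
    return pd.DataFrame(rows, columns=columns)


# Made-up stand-in for the result of the ri2py conversion.
sample = [
    [1, 2, 1, 1, 2, 1, 1, 3, 1, 2, 1],
    [2, 1, 1, 2, 1, 1, 2, 1, 1, 1, 2],
]
df_sample = matrix_to_frame(sample)
print(df_sample.head())
```

With rpy2 available, the converted matrix from the previous step could be passed in place of sample.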

References

https://pandas.pydata.org/pandas-docs/stable/r_interface.html

