Implementing a Convolutional Neural Network (CNN)

JAYNUX 2016. 12. 31. 17:15

Introduction

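The idea traces back to an experiment on cats.
When a cat was shown an image, its neurons did not all fire at once.
Instead, different neurons responded to different parts of the image.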

A CNN builds on this idea: it creates filters, and each filter extracts features from local regions of the input.

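To apply a CNN to input data other than images, you need an intuitive understanding of why CNNs work well in the first place.
If the data still satisfies the same assumptions, a CNN can be expected to perform well even when the input is not an image.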

Data characteristics required for a convnet

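The key idea of a CNN is weight sharing. The same small filter is slid across the input, usually with stride 1 so that it moves one position at a time, and the same weights are reused at every position. This only makes sense if neighboring values are strongly related to each other, as adjacent pixels in a photograph are.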

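In other words, a convnet operates under the assumption that the same weights are useful in every local neighborhood.
If the data is not an image, nearby values must at least be highly correlated.
For example, audio and other time series are a good fit for a convnet, because earlier events are strongly related to later ones.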

If the columns of a data.frame are unrelated to each other and are not even on the same scale, applying a convnet is pointless.
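The weight sharing described in this section can be illustrated with a minimal sketch in plain Python. The function name conv1d_valid and the sample data are made up for illustration; like TensorFlow's convolution ops, the sketch does not flip the kernel.

```python
def conv1d_valid(signal, kernel):
    """Slide one shared kernel over a 1-D signal with stride 1 (no padding)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


# A short "time series" in which neighboring samples are related.
signal = [1, 2, 4, 7, 11, 16]
# One difference filter: the same two weights are reused at every position.
kernel = [-1, 1]

print(conv1d_valid(signal, kernel))  # [1, 2, 3, 4, 5]
```

Only two weights are learned no matter how long the signal is. If the columns were unrelated features, reusing the same weights at every position would have no meaning.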

Implementation

Let's implement a simple CNN with TensorFlow.
We reuse the MNIST dataset from the earlier MLP post.

The model to be implemented stacks three convolution + max-pooling layers, followed by two fully connected layers.

To apply a convolution, use the following function.

tf.nn.conv2d(X, w, strides=[1, 1, 1, 1], padding='SAME')

Here, SAME means that the resulting activation map has the same size as the original image (with stride 1).

The parameters are, in order:

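input: shape [batch, in_height, in_width, in_channels]. Here, 28x28x1 handwritten digit images.
filter: shape [filter_height, filter_width, in_channels, out_channels]. Here, w with shape 3, 3, 1, 32.
strides: a 1-D list of length 4. Elements [0] and [3] must be 1; elements [1] and [2] usually take the same value.
padding: 'SAME' or 'VALID'. The two differ in how padding is computed. With stride 1, SAME keeps the output the same size as the input.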
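The difference between the two padding modes can be sketched as a small calculation of the output size along one spatial dimension. The helper name conv_output_size is hypothetical; the formulas follow the rules TensorFlow documents for SAME and VALID padding.

```python
def conv_output_size(in_size, filter_size, stride, padding):
    """Output length along one spatial dimension (height or width)."""
    if padding == "SAME":
        # ceil(in_size / stride), independent of the filter size
        return (in_size + stride - 1) // stride
    if padding == "VALID":
        # ceil((in_size - filter_size + 1) / stride), no padding added
        return (in_size - filter_size + stride) // stride
    raise ValueError("padding must be 'SAME' or 'VALID'")


print(conv_output_size(28, 3, 1, "SAME"))   # 28
print(conv_output_size(28, 3, 1, "VALID"))  # 26
print(conv_output_size(28, 2, 2, "SAME"))   # 14
```

With stride 1, SAME preserves the 28x28 size, while VALID shrinks it to 26x26 for a 3x3 filter.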

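The input is a 28x28x1 image.
There are 32 filters, each of size 3x3x1. The filter size is adjustable.

Creating a filter means creating weights.
The initial values are drawn at random from a normal distribution,
and since the filters must be learned, they are declared as a Variable.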

w = tf.Variable(tf.random_normal([3, 3, 1, 32], stddev=0.01))

Each filter now has to be slid across the image and applied at every position. Doing this by hand is tedious, so TensorFlow provides it.

The tf.nn.conv2d(X, w) function does this.
X is the image and w is the filter.
It is combined with strides=[1, 1, 1, 1] and padding='SAME'.
In general, strides = [1, stride, stride, 1].

As a result, the activation map has 32 channels, and because the padding is SAME, each channel is 28x28.
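What the sliding-filter step computes for a single filter and a single input channel can be sketched in plain Python as follows. This is only an illustration, assuming an odd-sized square kernel, stride 1, and zero padding; conv2d_same is a hypothetical helper, not the TensorFlow implementation.

```python
def conv2d_same(image, kernel):
    """Apply one odd-sized square kernel with stride 1 and zero padding."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = 0.0
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - pad, c + j - pad
                    # Positions outside the image count as zero.
                    if 0 <= rr < h and 0 <= cc < w:
                        total += image[rr][cc] * kernel[i][j]
            out[r][c] = total
    return out


image = [[1.0] * 4 for _ in range(4)]
kernel = [[1.0] * 3 for _ in range(3)]
result = conv2d_same(image, kernel)

print(len(result), len(result[0]))  # 4 4
print(result[0][0], result[1][1])   # 4.0 9.0
```

The output has the same height and width as the input. Applying 32 different kernels in this way would give 32 such maps, one per output channel.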

To apply the convolution and ReLU together, the following single line is enough.

l1a = tf.nn.relu(tf.nn.conv2d(X, w, strides=[1, 1, 1, 1], padding='SAME'))

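Pooling is just as simple to implement, using tf.nn.max_pool as in the full code later in this post.
c1 is the tensor that holds the convolution result.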

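Pooling uses a 2x2 window with a stride of 2 in both directions, so the spatial size is halved
and the result is 14, 14, 32.
Even with SAME padding, a stride of 2 gives 14.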
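The halving effect of 2x2 max pooling with stride 2 can be sketched on a single channel in plain Python. The helper max_pool_2x2 is hypothetical; for odd sizes it lets the last window run past the edge, which is how SAME padding behaves for this window and stride.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2; edge windows may be smaller (SAME-style)."""
    h, w = len(feature_map), len(feature_map[0])
    out_h, out_w = (h + 1) // 2, (w + 1) // 2
    return [
        [
            max(
                feature_map[r][c]
                for r in range(2 * i, min(2 * i + 2, h))
                for c in range(2 * j, min(2 * j + 2, w))
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


fm = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(max_pool_2x2(fm))  # [[6, 8], [14, 16]]

# Spatial sizes only: 28 -> 14, and the odd size 7 -> 4.
print(len(max_pool_2x2([[0] * 28 for _ in range(28)])))  # 14
print(len(max_pool_2x2([[0] * 7 for _ in range(7)])))    # 4
```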

Running print(l1a) shows the shape. Printing shapes directly is a good way to avoid errors.

Tensor reshape

tf.reshape(tensor, shape, name=None)

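This function returns the given tensor rearranged into a different shape.
Passing [-1] as the shape flattens the tensor to 1-D; a single -1 inside a longer shape is inferred from the remaining dimensions.
The total number of elements must stay the same.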

Examples of its behavior are shown below.

# tensor 't' is [1, 2, 3, 4, 5, 6, 7, 8, 9]
# tensor 't' has shape [9]
reshape(t, [3, 3]) ==> [[1, 2, 3],
                        [4, 5, 6],
                        [7, 8, 9]]

# tensor 't' is [[[1, 1], [2, 2]],
#                [[3, 3], [4, 4]]]
# tensor 't' has shape [2, 2, 2]
reshape(t, [2, 4]) ==> [[1, 1, 2, 2],
                        [3, 3, 4, 4]]

# tensor 't' is [[[1, 1, 1],
#                 [2, 2, 2]],
#                [[3, 3, 3],
#                 [4, 4, 4]],
#                [[5, 5, 5],
#                 [6, 6, 6]]]
# tensor 't' has shape [3, 2, 3]
# pass '[-1]' to flatten 't'
reshape(t, [-1]) ==> [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6]

# -1 can also be used to infer the shape

# -1 is inferred to be 9:
reshape(t, [2, -1]) ==> [[1, 1, 1, 2, 2, 2, 3, 3, 3],
                         [4, 4, 4, 5, 5, 5, 6, 6, 6]]
# -1 is inferred to be 2:
reshape(t, [-1, 9]) ==> [[1, 1, 1, 2, 2, 2, 3, 3, 3],
                         [4, 4, 4, 5, 5, 5, 6, 6, 6]]
# -1 is inferred to be 3:
reshape(t, [ 2, -1, 3]) ==> [[[1, 1, 1],
                              [2, 2, 2],
                              [3, 3, 3]],
                             [[4, 4, 4],
                              [5, 5, 5],
                              [6, 6, 6]]]

# tensor 't' is [7]
# shape `[]` reshapes to a scalar
reshape(t, []) ==> 7
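How a -1 entry is resolved can be sketched with a small helper that only does the arithmetic on shapes. The name infer_shape is made up for illustration and is not part of TensorFlow.

```python
def infer_shape(num_elements, shape):
    """Resolve a single -1 in `shape` so the element count is preserved."""
    if shape.count(-1) > 1:
        raise ValueError("at most one dimension can be -1")
    known = 1
    for dim in shape:
        if dim != -1:
            known *= dim
    if -1 not in shape:
        if known != num_elements:
            raise ValueError("element count mismatch")
        return list(shape)
    if known == 0 or num_elements % known != 0:
        raise ValueError("cannot infer the -1 dimension")
    return [num_elements // known if dim == -1 else dim for dim in shape]


print(infer_shape(18, [2, -1, 3]))               # [2, 3, 3]
print(infer_shape(18, [-1]))                     # [18]
# A batch of 5 feature maps of shape 4x4x128, flattened per example:
print(infer_shape(5 * 4 * 4 * 128, [-1, 2048]))  # [5, 2048]
```

The last call mirrors the reshape to (?, 2048) used before the fully connected layer in the model code, where the batch dimension is the one left to be inferred.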

Code

import tensorflow as tf
import numpy as np
import input_data
import time

batch_size = 128
test_size = 256

def init_weights(shape):
    return tf.Variable(tf.random_normal(shape, stddev=0.01))

# Weight tensors: conv filters w, w2, w3; fully connected w4, w_o
def model(X, w, w2, w3, w4, w_o, p_keep_conv, p_keep_hidden):
    l1a = tf.nn.relu(tf.nn.conv2d(X, w,                       # l1a shape=(?, 28, 28, 32)
                        strides=[1, 1, 1, 1], padding='SAME'))
    l1 = tf.nn.max_pool(l1a, ksize=[1, 2, 2, 1],              # l1 shape=(?, 14, 14, 32)
                        strides=[1, 2, 2, 1], padding='SAME')
    l1 = tf.nn.dropout(l1, p_keep_conv)

    l2a = tf.nn.relu(tf.nn.conv2d(l1, w2,                     # l2a shape=(?, 14, 14, 64)
                        strides=[1, 1, 1, 1], padding='SAME'))
    l2 = tf.nn.max_pool(l2a, ksize=[1, 2, 2, 1],              # l2 shape=(?, 7, 7, 64)
                        strides=[1, 2, 2, 1], padding='SAME')
    l2 = tf.nn.dropout(l2, p_keep_conv)

    l3a = tf.nn.relu(tf.nn.conv2d(l2, w3,                     # l3a shape=(?, 7, 7, 128)
                        strides=[1, 1, 1, 1], padding='SAME'))
    l3 = tf.nn.max_pool(l3a, ksize=[1, 2, 2, 1],              # l3 shape=(?, 4, 4, 128)
                        strides=[1, 2, 2, 1], padding='SAME')
    l3 = tf.reshape(l3, [-1, w4.get_shape().as_list()[0]])    # reshape to (?, 2048)
    l3 = tf.nn.dropout(l3, p_keep_conv)

    l4 = tf.nn.relu(tf.matmul(l3, w4))
    l4 = tf.nn.dropout(l4, p_keep_hidden)

    pyx = tf.matmul(l4, w_o)
    return pyx

# Read data
mnist = input_data.read_data_sets("MNIST_DATA/", one_hot=True)
trX, trY, teX, teY = mnist.train.images, mnist.train.labels, mnist.test.images, mnist.test.labels

# trX.reshape(n_inputs, image_height, image_width, depth)
# The reshaped arrays are fed to model() as X
trX = trX.reshape(-1, 28, 28, 1)  # 28x28x1 input img
teX = teX.reshape(-1, 28, 28, 1)  # 28x28x1 input img

X = tf.placeholder("float", [None, 28, 28, 1])
Y = tf.placeholder("float", [None, 10])

w = init_weights([3, 3, 1, 32])       # 3x3x1 conv, 32 outputs
w2 = init_weights([3, 3, 32, 64])     # 3x3x32 conv, 64 outputs
w3 = init_weights([3, 3, 64, 128])    # 3x3x64 conv, 128 outputs
w4 = init_weights([128 * 4 * 4, 625]) # FC 128 * 4 * 4 inputs, 625 outputs
w_o = init_weights([625, 10])         # FC 625 inputs, 10 outputs (labels)

p_keep_conv = tf.placeholder("float")
p_keep_hidden = tf.placeholder("float")
py_x = model(X, w, w2, w3, w4, w_o, p_keep_conv, p_keep_hidden)

# Keyword arguments: TensorFlow 1.x rejects positional logits/labels here
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=py_x, labels=Y))
train_op = tf.train.RMSPropOptimizer(0.001, 0.9).minimize(cost)
predict_op = tf.argmax(py_x, 1)

# Launch the graph in a session
with tf.Session() as sess:
    # you need to initialize all variables
    start_time = time.time()
    tf.initialize_all_variables().run()

    for i in range(100):
        training_batch = zip(range(0, len(trX), batch_size),
                             range(batch_size, len(trX)+1, batch_size))
        for start, end in training_batch:
            sess.run(train_op, feed_dict={X: trX[start:end], Y: trY[start:end],
                                          p_keep_conv: 0.8, p_keep_hidden: 0.5})

        test_indices = np.arange(len(teX)) # Get A Test Batch
        np.random.shuffle(test_indices)
        test_indices = test_indices[0:test_size]

        print(i, np.mean(np.argmax(teY[test_indices], axis=1) ==
                         sess.run(predict_op, feed_dict={X: teX[test_indices],
                                                         Y: teY[test_indices],
                                                         p_keep_conv: 1.0,
                                                         p_keep_hidden: 1.0})))

    print("time elapsed: {:.2f}s".format(time.time() - start_time))
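The shape comments and weight shapes in the code above can be checked with a short calculation. This is a sketch in plain Python: the helper names are hypothetical, while the layer sizes and weight shapes are copied from the code.

```python
def same_output_size(in_size, stride):
    """Spatial output size of a SAME-padded op: ceil(in_size / stride)."""
    return (in_size + stride - 1) // stride


def num_params(shape):
    """Number of scalar weights in a tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count


size = 28
sizes = []
for _ in range(3):
    size = same_output_size(size, 1)  # conv, stride 1: size unchanged
    size = same_output_size(size, 2)  # max pool, stride 2: size halved
    sizes.append(size)
print(sizes)  # [14, 7, 4]

flat = sizes[-1] * sizes[-1] * 128
print(flat)  # 2048, the first dimension of w4

weight_shapes = {
    "w": [3, 3, 1, 32],
    "w2": [3, 3, 32, 64],
    "w3": [3, 3, 64, 128],
    "w4": [128 * 4 * 4, 625],
    "w_o": [625, 10],
}
counts = {name: num_params(shape) for name, shape in weight_shapes.items()}
print(counts["w4"], sum(counts.values()))  # 1280000 1378698
```

Most of the weights sit in the first fully connected layer (w4), while the three convolution filters together hold fewer than 100,000 thanks to weight sharing.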

Output

/root/tensorflow/bin/python /root/DataScience/TensorFlowLecture/5.CNN/CNNforMNIST.py
Extracting MNIST_DATA/train-images-idx3-ubyte.gz
/usr/lib/python2.7/gzip.py:268: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
  chunk = self.extrabuf[offset: offset + size]
/root/DataScience/TensorFlowLecture/5.CNN/input_data.py:47: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
  data = data.reshape(num_images, rows, cols, 1)
Extracting MNIST_DATA/train-labels-idx1-ubyte.gz
Extracting MNIST_DATA/t10k-images-idx3-ubyte.gz
Extracting MNIST_DATA/t10k-labels-idx1-ubyte.gz
(0, 0.95703125)
(1, 0.96875)
(2, 0.98828125)
(3, 0.984375)
(4, 0.984375)
(5, 0.9765625)
(6, 0.99609375)
(7, 0.98046875)
(8, 1.0)
(9, 0.9921875)
(10, 0.99609375)
(11, 0.99609375)

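The loop was supposed to run 100 times, but it was too slow on a CPU, so I stopped it partway through.
Even after only a few epochs, the accuracy on the sampled test batch gets close to 1, which is impressive.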
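Each accuracy in the log is measured on a random sample of test_size = 256 test images, so every value is a multiple of 1/256 and a score of 1.0 only means that all 256 sampled images were classified correctly. The sketch below reproduces that calculation in plain Python; the helper names and the tiny label set are made up for illustration.

```python
def argmax(values):
    """Index of the largest entry, like np.argmax on a 1-D array."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(one_hot_labels, predictions):
    """Fraction of predictions that match the one-hot labels."""
    correct = sum(
        1 for label, pred in zip(one_hot_labels, predictions)
        if argmax(label) == pred
    )
    return correct / float(len(predictions))


labels = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
preds = [1, 0, 2, 2]
print(accuracy(labels, preds))  # 0.75

test_size = 256
print(245 / float(test_size))  # 0.95703125, the first value in the log
print(1 / float(test_size))    # 0.00390625, the effect of one wrong image
```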

The complete code is available here:
https://github.com/leejaymin/TensorFlowLecture/blob/master/5.CNN/CNNforMNIST.ipynb

References

Deep Learning for Everyone (모두를 위한 딥러닝)
Applying CNNs to Text Classification (CNN을 Text Classification에 적용하기)
