Introduction to OpenCL

JAYNUX 2017. 2. 21. 21:18

What is OpenCL? (personal definition)

Low-level language for high-performance heterogeneous data-parallel computation

Low-level: manual memory management and parallelization. You choose the global dimensions and allocate the data yourself.
Language: really a framework with C-like computation kernels, not a language in the strict sense.
High-performance: only if your algorithm is a good fit for the hardware, e.g., data-parallel.
Heterogeneous: code is portable, but performance is not. Different vendors and versions require different optimizations.
Data-parallel: the hardware and software mainly support data parallelism. Task-parallel support is limited on GPUs today.

Why Limited Synchronization?

Limited synchronization scales well in hardware:
- Only work-items within a work-group need to communicate
- GPUs run 32-128 work-groups in parallel (a work-group corresponds to a thread block in CUDA terms)

Global Synchronization

OpenCL only supports global synchronization at the end of a kernel execution.
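
One practical consequence: an algorithm in which every work-item needs results from every other work-item has to be split across kernel launches. The sketch below emulates that idea in plain C rather than OpenCL, so it runs without a device. The function names (partial_sums_pass, final_sum_pass, two_pass_sum) and the group size are hypothetical, chosen only for illustration. Each "pass" function stands in for one kernel launch, and the return from the first pass plays the role of the global synchronization point.

```c
#include <assert.h>
#include <stddef.h>

#define GROUP_SIZE 4   /* hypothetical work-group size */
#define MAX_GROUPS 64  /* upper bound for this sketch only */

/* Stand-in for kernel launch 1: each "work-group" sums its own slice.
 * Work-items only need to cooperate inside their own group here. */
void partial_sums_pass(const float *data, size_t n,
                       float *partial, size_t num_groups)
{
    for (size_t group = 0; group < num_groups; ++group) {
        float sum = 0.0f;
        for (size_t local = 0; local < GROUP_SIZE; ++local) {
            size_t global = group * GROUP_SIZE + local;
            if (global < n)          /* guard the padded tail */
                sum += data[global];
        }
        partial[group] = sum;
    }
}

/* Stand-in for kernel launch 2: combine the per-group results. It is
 * only safe to start once launch 1 has finished completely. */
float final_sum_pass(const float *partial, size_t num_groups)
{
    float total = 0.0f;
    for (size_t group = 0; group < num_groups; ++group)
        total += partial[group];
    return total;
}

/* Host-side driver: the boundary between the two calls plays the role
 * of the global synchronization at the end of a kernel. */
float two_pass_sum(const float *data, size_t n)
{
    float partial[MAX_GROUPS];
    size_t num_groups = (n + GROUP_SIZE - 1) / GROUP_SIZE;
    assert(num_groups <= MAX_GROUPS);

    partial_sums_pass(data, n, partial, num_groups);
    /* end of "kernel 1": every group has written its partial sum */
    return final_sum_pass(partial, num_groups);
}
```

Calling two_pass_sum on the ten values 1 through 10 produces three partial sums (10, 26, 19) in the first pass and combines them into 55 in the second.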

Utility Functions

Information about each work-item:
- get_global_id(dim)
  - current work-item's ID in a particular dimension

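__kernel void apply_sin(__global float *data) {  /* kernel name is illustrative */
    int id = get_global_id(0);   /* this work-item's index in dimension 0 */
    data[id] = sin(data[id]);    /* each work-item updates one element */
}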

- get_work_dim()
  - number of global dimensions in use
- get_global_size(dim)
  - number of global work-items in a particular dimension
- get_local_size(dim), get_local_id(dim), get_num_groups(dim), get_group_id(dim)
  - information about the local dimensions
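
To see how these values relate, here is a small plain-C emulation (not OpenCL itself) of what a single work-item would observe in dimension 0. It assumes a zero global offset and a global size that is a multiple of the local size. The struct WorkItemIds and the function ids_for are hypothetical names made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of what one work-item would see in dimension 0. */
typedef struct {
    size_t global_id;   /* get_global_id(0)  */
    size_t local_id;    /* get_local_id(0)   */
    size_t group_id;    /* get_group_id(0)   */
    size_t local_size;  /* get_local_size(0) */
    size_t num_groups;  /* get_num_groups(0) */
} WorkItemIds;

/* Emulates the IDs for a 1D range with zero global offset, assuming
 * global_size is a multiple of local_size. */
WorkItemIds ids_for(size_t global_id, size_t global_size, size_t local_size)
{
    assert(local_size > 0);
    assert(global_size % local_size == 0);
    assert(global_id < global_size);

    WorkItemIds ids;
    ids.global_id  = global_id;
    ids.local_size = local_size;
    ids.group_id   = global_id / local_size;   /* which work-group */
    ids.local_id   = global_id % local_size;   /* position inside it */
    ids.num_groups = global_size / local_size;
    return ids;
}
```

For example, with 16 work-items in groups of 4, ids_for(10, 16, 4) describes local item 2 of group 2, and group_id * local_size + local_id gives back the global ID.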

What do we need to change to use the GPU?

All we have to do is change the device type we request when getting the device, e.g. from CL_DEVICE_TYPE_CPU to CL_DEVICE_TYPE_GPU. The rest of the code will then use the GPU.

However, if we have a complicated kernel, it will run on both the CPU and the GPU but will not get optimal performance on both.

To get optimal performance, we should tune the kernel code for the target device, taking into account the number of available cores in the machine.

OpenCL programming model

Setup
- Get the devices (and platform)
  Choose which device to use.
- Create a context (for sharing between devices)
  Choose where the data computation will be processed.
- Create command queues (for submitting work)
  Choose how work is assigned to the selected device. The queue is what submits the work.
Compilation
- Create a program
- Build the program (compile)
- Create kernels
  The created kernels are later assigned to a command queue.
Create memory objects
Enqueue writes to copy data to the GPU
  This copies the data from the CPU over to the GPU.
Set the kernel arguments
Enqueue kernel executions
  The kernel is passed to the command queue, which processes the work.
Enqueue reads to copy data back from the GPU
Wait for your commands to finish

But why do we have to wait?
Because OpenCL is asynchronous.
When we enqueue a command we have no idea when it will finish (it's just in the queue).
By explicitly waiting we make sure it is finished before continuing.
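
The effect can be imitated with a toy queue in plain C. This is not the OpenCL API: ToyQueue, toy_enqueue, toy_finish and double_value are hypothetical names invented for this sketch. The toy also defers all work until the wait, whereas a real runtime may start commands earlier. The point is only that the host cannot rely on a result until it has waited.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_COMMANDS 16  /* capacity of this toy queue */

typedef void (*CommandFn)(void *arg);

/* Toy in-order queue: enqueue only records work, finish runs it. */
typedef struct {
    CommandFn fn[MAX_COMMANDS];
    void     *arg[MAX_COMMANDS];
    size_t    count;
} ToyQueue;

void toy_queue_init(ToyQueue *q) { q->count = 0; }

/* Returns immediately; the command has NOT run yet. */
int toy_enqueue(ToyQueue *q, CommandFn fn, void *arg)
{
    assert(q != NULL && fn != NULL);
    if (q->count == MAX_COMMANDS)
        return -1;                  /* queue full */
    q->fn[q->count]  = fn;
    q->arg[q->count] = arg;
    q->count++;
    return 0;
}

/* Returns only after every queued command has run, in submission order. */
void toy_finish(ToyQueue *q)
{
    for (size_t i = 0; i < q->count; ++i)
        q->fn[i](q->arg[i]);
    q->count = 0;
}

/* Example command: doubles the integer it points to. */
void double_value(void *arg)
{
    int *value = arg;
    *value *= 2;
}
```

If an int holding 21 is passed to toy_enqueue with double_value, it still holds 21 right after the call and only becomes 42 once toy_finish returns. In real OpenCL host code the corresponding wait is typically clFinish on the command queue, or a blocking read.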
