NVDLA: NVIDIA Deep Learning Accelerator (DLA) 개론

JAYNUX 2019. 2. 8. 11:16

2019. 2. 8. 11:16

NVDLA: NVIDIA Deep Learning Accelerator (DLA) 개론

공식(Deep Dive)
http://nvdla.org/primer.html

무료 공개 아키텍쳐이다.
이것을 통해 딥러닝 추론 가속기의 디자인 표준을 제공하는 목적을 가진다.

Accelerating Deep Learning Inference using NVDLA

NVDLA 하드웨어는 다음의 컴포넌트들을 포함하고 있다.

Convolution Core
Single Data Processor
Planar Data Processor
Channel Data Processor
Dedicated Memory and Data Reshape Engines

정보

Deep Learning Accelerator
The other accelerators on-die is the deep learning accelerator (DLA) which is actually a physical implementation of the open source Nvidia NVDLA architecture. Xavier has two instances of NVDLA which can offer a peak theoretical performance of 5.7 teraFLOPS (half precision FP) or twice the throughput at 11.4 TOPS for int8.

Throughput
11.4 TOPS (int8)
5.7 TFLOPS (FP16)

https://en.wikichip.org/wiki/nvidia/tegra/xavier

Sample Platforms

Simulation

FPGA

Amazon EC2 F1 environment

https://aws.amazon.com/ko/ec2/instance-types/f1/

Xavier

실제 Product로 검증되어서 출시된것이 Jetson Xavier이다.
DLA (FP16 at this time)을 지원하며 이것을 TensorRT에서 사용할 수 있도록 지원 한다.

공식 문서

공식문서 컨텐츠
http://nvdla.org/contents.

software와 하드웨어 메뉴얼 등등이 수록되어 있음

공식 사이트: http://nvdla.org/

full understanding guide doc: http://nvdla.org/primer.html

포럼 DLA 사용

https://devtalk.nvidia.com/default/topic/1045030/tensorrt/sample_mnist-segmentation-faults-on-dla-on-xavier/

./sample_mnist --useDLACore=1

 jsteward@jetson-0423718016929:~/Work/tensorrt/bin$ ./sample_mnist --useDLACore=1
 Building and running a GPU inference engine for MNIST
 WARNING: Default DLA is enabled but layer prob is not running on DLA, falling back to GPU.
 WARNING: Default DLA is enabled but layer (Unnamed Layer* 9) [Constant] is not running on DLA, falling back to GPU.
 WARNING: (Unnamed Layer* 10) [ElementWise]: DLA cores do not support SUB ElementWise operation.
 WARNING: Default DLA is enabled but layer (Unnamed Layer* 10) [ElementWise] is not running on DLA, falling back to GPU.
 Segmentation fault (core dumped)

Working with DLA 보면 코드가 있다.
즉 CUDA와 다르게 막 쓸 수는 없고 DLA는 TensorRT를 통해서만 사용할 수 있다.
https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#dla_topic

Jetpack에 있는 Doc.을 보고 TensorRT에서 어떻게 NVDLA를 사용할 수 있는지 알 수 있다.
https://devtalk.nvidia.com/default/topic/1041768/jetson-agx-xavier/tensorrt-5-docs-and-examples/post/5284437/#5284437

하드웨어 구성요소

Convolution Core: 서로 다른 크기로 지원
Single Data Point Processor: activation function을 위한 방법이다. linear와 non-linear모두를 지원한다.
Planar Data Processor: pooling을 위한것
Cross-Channel Data Processor: local normalization을 위한것 같다.
Data Reshape Engines: 텐서 복사나 리쉐입을 위한 memory to meory transformation acceleration. 즉, splitting, slicing, merging, contraction, reshape transpose.
Bridge DMA: 시스템 DRAM사이에서의 데이터 전송을 관장한다.

스크린샷 2019-01-15 오후 4.39.14

출처: 공식문서

스크린샷 2019-02-08 오전 11.07.33

출처: SysML 18: Bill Dally, Hardware for Deep Learning

소프트웨어

Compilation tools: (model conversion).
Runtime environment: (run-time software to load and execute entworks on NVDLA).

Xavier DLA는 코딩이 가능한가?
https://devtalk.nvidia.com/default/topic/1041868/jetson-agx-xavier/nvdla-programming-interface/

NVDLA의 전력 소모

스크린샷 2019-02-08 오전 11.09.54

출처: SysML 18: Bill Dally, Hardware for Deep Learning

참고자료

[1] 공식 깃허브: https://github.com/nvdla/

[2] DEVIEW2018: https://tv.naver.com/v/4580068

[3] NVidia TensorRT: High-performance deep learning inference accelerator (TensorFlow Meets): https://www.youtube.com/watch?v=G_KhUFCUSsY

저작자표시

'AI > NVIDIA' 카테고리의 다른 글

TensorRT 개론 및 Docker기반 실행 (0)	2021.02.04
TensorRT이용한 Xavier DLA (NVDLA) 실행 (4)	2019.02.08
NVIDIA AI Tech Workshop at NIPS 2018 -- Session3: Inference and Quantization (0)	2019.02.06
DeepStream을 통한 low precision YOLOv3 실행 (0)	2019.01.24
YOLOv3 on Jetson AGX Xavier 성능 평가 (2)	2019.01.10

내 블로그 - 관리자 홈 전환	`Q` `Q`
새 글 쓰기	`W` `W`

글 수정 (권한 있는 경우)	`E` `E`
댓글 영역으로 이동	`C` `C`

이 페이지의 URL 복사	`S` `S`
맨 위로 이동	`T` `T`
티스토리 홈 이동	`H` `H`
단축키 안내	`Shift` + `/` `⇧` + `/`

GOOD to GREAT