Sunday, February 22, 2009

Accelerating Software Applications Using OpenCL

Over the past decade, processor performance improved mainly by packing more transistors into each unit area of silicon. For a while, it appeared that Moore's Law was unbeatable. However, transistors have become so small that they leak more current than they consume in the active state. This resulted in increased power consumption and extreme heat dissipation. This is when the multi-core idea came about. Instead of packing in more transistors, hardware manufacturers decided to pack several similar processors onto one chip. The idea is to have multiple instructions of a program execute in parallel on these multi-processor chips; this is now known as multi-core execution.

The idea works well until instructions depend on the result of a previously executed instruction or share memory. In such a case, one processor has to stall until another has done its job, making the execution of instructions sequential. For such code, the extra processors on the chip are basically useless. This is where the vector processor comes in.
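To make the stall concrete, here is a minimal sketch in plain C contrasting a loop whose iterations depend on each other with one whose iterations do not. The function names `running_sum` and `scale_all` are made up for illustration and do not come from any library.

```c
#include <assert.h>
#include <stddef.h>

/* Loop-carried dependency: out[i] needs out[i - 1], so as written the
   iterations must run one after another, however many cores exist. */
void running_sum(const float *in, float *out, size_t n)
{
    assert(in != NULL && out != NULL);
    if (n == 0) {
        return;
    }
    out[0] = in[0];
    for (size_t i = 1; i < n; ++i) {
        out[i] = out[i - 1] + in[i];
    }
}

/* No dependency between iterations: each out[i] uses only in[i], so the
   iterations could be divided among cores and run in any order. */
void scale_all(const float *in, float *out, size_t n, float factor)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        out[i] = in[i] * factor;
    }
}
```

Both loops do about the same amount of arithmetic, but only the second one gives extra processors something to do without waiting on each other.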

"A vector processor, or array processor, is a CPU design where the instruction set includes operations that can perform mathematical operations on multiple data elements simultaneously... The math operations thus completed far faster overall [1]."

Using vector processors has become a standard way of speeding up certain types of software applications. One wouldn't speed up applications such as a word processor or a PowerPoint presentation using vector processors. One wouldn't use vector processors to run an operating system either. However, certain applications, such as 3D games, have seen immense performance gains.

The kinds of applications that can get tremendous performance gains are those that conform to a SIMD (Single Instruction, Multiple Data) architecture, in which the same operation is applied to many data elements at once. Basically, any data-hungry application can benefit from vector processing. For example, KJAYA Medical of Stamford, Conn., accelerates its advanced visualization application for medical diagnostics using vector processors and observes a 30-fold performance gain over standard CPU performance. Other applications that can benefit from vector processors include, but are not limited to, video transcoding, financial modeling and facial recognition.
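As a rough illustration of what "single instruction, multiple data" looks like in practice, the sketch below brightens every pixel of an 8-bit grayscale image by the same amount. It is ordinary scalar C, not vector code; the point is only that the identical operation is applied to every element, which is the pattern vector hardware is built for. The function name `brighten` and the limit on `delta` are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add `delta` to every 8-bit pixel, clamping the result to 0..255.
   Assumes delta lies in -255..255 (an arbitrary limit for this sketch).
   Every pixel gets the same operation and none depends on another. */
void brighten(uint8_t *pixels, size_t n, int delta)
{
    assert(pixels != NULL || n == 0);
    assert(delta >= -255 && delta <= 255);
    for (size_t i = 0; i < n; ++i) {
        int v = (int)pixels[i] + delta;
        if (v < 0) {
            v = 0;
        }
        if (v > 255) {
            v = 255;
        }
        pixels[i] = (uint8_t)v;
    }
}
```

A vector processor would, in effect, run many iterations of this loop body at once on neighboring pixels.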

The cheapest off-the-shelf vector processors come in the form of video graphics cards meant to run games. For example, one of the latest graphics cards from ATI/AMD, the Radeon HD 4870 X2, costs about US$500 and provides 2.4 teraflops of processing power. NVIDIA has an offering with similar performance, the GTX 295. Intel is working on a product code-named Larrabee. The processor on a graphics card is called a GPU, short for Graphics Processing Unit. GPU architecture is well presented by Justin Hensley at SIGGRAPH 2008 [2].

The question remains how one would accelerate software applications using vector processors. Until the end of 2008, the answer was to use a graphics programming model such as OpenGL or DirectX. The graphics hardware vendors NVIDIA and ATI/AMD also provided their own programming languages, CUDA and Brook+ respectively. PeakStream (acquired by Google) and RapidMind offered software development kits as an abstraction over multi-core and vector processors. However, OpenGL and DirectX required that one know 3D graphics programming, while the vendor-specific languages and tools ran only on proprietary systems. Apple Inc., frustrated by this, conceived an open standard language and proposed it to the Khronos Group; OpenCL was born.

"OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors. OpenCL includes a language (based on C99) for writing kernels (functions that execute on OpenCL devices), plus APIs that are used to define and then control the heterogeneous platform. OpenCL provides parallel programming using both task-based and data-based parallelism [3]."
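The kernel idea in that definition can be sketched in plain C without any OpenCL headers. This is an emulation under stated assumptions, not the OpenCL API: `vec_add_kernel` and `launch_vec_add` are hypothetical names, and the launcher simply loops where a real runtime would hand work-items to the device. In actual OpenCL C, the kernel would be marked `__kernel` and would read its index from `get_global_id(0)` rather than taking it as a parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a kernel: computes one output element for the work-item
   with index `gid`. Each work-item touches only its own element. */
static void vec_add_kernel(size_t gid, const float *a, const float *b,
                           float *out)
{
    out[gid] = a[gid] + b[gid];
}

/* Hypothetical host-side launcher: runs the kernel once per work-item.
   A real runtime would spread these calls across the device's processing
   elements; this sketch just loops sequentially on the CPU. */
void launch_vec_add(size_t global_size, const float *a, const float *b,
                    float *out)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t gid = 0; gid < global_size; ++gid) {
        vec_add_kernel(gid, a, b, out);
    }
}
```

The split is the part worth noticing: the kernel describes the work for a single element, and the decision about how many elements run at once is left to whoever launches it.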

Various references cover OpenCL in intricate technical detail. Jason Yang presented an introduction at SIGGRAPH 2008 [4]. The Khronos Group has the version 1.0 specification available online [5]. Dr. Tim Mattson is interviewed about OpenCL in a web audio broadcast [6].

NVIDIA and ATI/AMD have promised OpenCL implementations as early as the first quarter of 2009 [7].

This is the birth of a new programming paradigm that is going to shape the next decade.

References

[1] Vector Processor Definition
http://en.wikipedia.org/wiki/Vector_processor

[2] Presentation by Justin Hensley at SIGGRAPH 2008 - Throughput Computing: Hardware Basics
http://xcellerated.com/community/presentations/hensley-gpu-architecture.pdf

[3] OpenCL Definition
http://en.wikipedia.org/wiki/OpenCL 

[4] Presentation by Jason Yang at SIGGRAPH 2008 - an introduction to OpenCL
http://xcellerated.com/community/presentations/yang-opencl-intro.pdf

[5] Khronos Group on OpenCL specification
http://www.khronos.org/opencl/

[6] Parallel Programming Talk - OpenCL with Dr. Tim Mattson
http://software.intel.com/en-us/blogs/2009/01/21/parallel-programming-talk-opencl-with-tim-mattson/

[7] Press Release from Khronos Group, NVIDIA & ATI/AMD regarding OpenCL
http://www.khronos.org/news/press/releases/the_khronos_group_releases_opencl_1.0_specification/
http://www.nvidia.com/object/io_1228825271885.html
http://hothardware.com/News/AMD-Adopts-OpenCL-10-Specification

