Session 1I: Recovering Loop Structure from First-Order Functional Programs

Session Number

Session 1I: 2nd Presentation

Advisor(s)

Professor John Reppy, University of Chicago

Location

Room A119

Start Date

April 26, 2018, 9:40 AM

End Date

April 26, 2018, 10:25 AM

Abstract

GPUs provide supercomputer-level performance at vastly lower prices and, as a result, have become increasingly popular for general-purpose computing, such as machine learning and cryptography. However, GPUs have historically been hard to program. NESL is a first-order functional programming language that supports Nested Data Parallelism (NDP): the ability to apply any function, even a parallel one, to a set of values. NDP raises the level of abstraction for GPU programming; however, NESL programs are not as heavily optimized as hand-written CUDA, NVIDIA's parallel programming platform. CuNESL is a compiler that generates CUDA code from NESL source through an Intermediate Representation (IR) called λCU. At the top level of λCU, the CPU level, we explore how to intelligently determine when it is efficient to convert a tail-recursive call in NESL into a loop in imperative CUDA. We evaluate this by benchmarking a hand-written CUDA implementation and a CuNESL-compiled implementation of the same algorithm, k-means clustering.
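As a rough, hypothetical sketch of the loop-recovery idea (this is not CuNESL's actual output; the kernel names assign_clusters and update_centers, the driver kmeans_loop, and the simplified k-means structure are assumptions for illustration only), a first-order, tail-recursive iteration driver can be rewritten as an imperative host-side loop that repeatedly launches CUDA kernels:

    #include <cuda_runtime.h>

    // Placeholder kernels: bodies are stubbed; a real implementation (or real
    // CuNESL output) would compute assignments and new centers here.
    __global__ void assign_clusters(const float* points, const float* centers,
                                    int* labels, int n_points, int k, int dim) {
        // (stub) assign each point to its nearest center
    }

    __global__ void update_centers(const float* points, const int* labels,
                                   float* centers, int n_points, int k, int dim) {
        // (stub) recompute each center as the mean of its assigned points
    }

    // NESL-style tail-recursive driver (hypothetical pseudocode):
    //   function kmeans(points, centers, n) =
    //     if n == 0 then centers
    //     else kmeans(points, step(points, centers), n - 1)
    //
    // Because the recursive call is in tail position, the CPU level can express
    // it as an imperative loop: the tail call becomes the loop back-edge, and
    // the accumulating `centers` argument becomes mutable device state.
    void kmeans_loop(const float* d_points, float* d_centers, int* d_labels,
                     int n_points, int k, int dim, int iterations) {
        const int threads = 256;
        const int blocks = (n_points + threads - 1) / threads;
        for (int iter = 0; iter < iterations; ++iter) {
            assign_clusters<<<blocks, threads>>>(d_points, d_centers, d_labels,
                                                 n_points, k, dim);
            update_centers<<<blocks, threads>>>(d_points, d_labels, d_centers,
                                                n_points, k, dim);
            cudaDeviceSynchronize();
        }
    }

The key observation is that a tail call leaves no pending work after it returns, so the recursion can be replaced by a jump back to the top of the function body, which is exactly the structure of the for loop above.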
