Session 1I: Recovering Loop Structure from First-Order Functional Programs
Session Number
Session 1I: 2nd Presentation
Advisor(s)
Professor John Reppy, University of Chicago
Location
Room A119
Start Date
April 26, 2018, 9:40 AM
End Date
April 26, 2018, 10:25 AM
Abstract
GPUs provide supercomputer-level performance at vastly lower prices and, as a result, have become increasingly popular for general-purpose computing tasks such as machine learning and cryptography. However, GPUs have historically been hard to program. NESL is a first-order functional programming language built around Nested Data Parallelism (NDP): the ability to apply any function, even a parallel one, to each element of a sequence of values. NDP raises the level of abstraction for GPU programming; however, NESL is not as heavily optimized as CUDA, the parallel programming platform developed by NVIDIA. CuNESL is a compiler that generates CUDA code from NESL source, via an Intermediate Representation (IR) called λCU. At the top level of λCU, the CPU level, we explore how to intelligently determine when it is efficient to convert a tail-recursive call in NESL into a loop in imperative CUDA. We evaluate this by benchmarking a pure CUDA implementation and a CuNESL-compiled implementation of the same algorithm, k-means clustering.