Researchers from the School of Informatics at The University of Edinburgh, UK, have been investigating the use of machine learning for OpenCL task partitioning in the presence of GPU contention. A growing number of applications make use of the integrated GPU on PCs and mobile devices. On these systems it is typical for multiple programs to run at the same time, competing for shared resources, including the GPU. In such settings, the decisions about which computing device (the CPU or the GPU) should run a program, and how its work should be partitioned across the devices, have a significant impact on the application's performance. While multi-task scheduling on general-purpose CPUs is a well-studied area, how to partition and map tasks onto the underlying platform in the presence of GPU contention remains an outstanding problem.
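To make "partitioning" concrete, here is a minimal sketch of how a 1-D OpenCL index space might be split between the GPU and the CPU. It is an illustration under stated assumptions, not the researchers' implementation. The function name `split_ndrange` and the integer `gpu_percent` parameter are hypothetical. The GPU takes the first chunk, rounded down to whole work-groups, and the CPU takes the remainder.

```python
def split_ndrange(global_size: int, gpu_percent: int, local_size: int = 1):
    """Split the index space [0, global_size) into a GPU part and a CPU part.

    Hypothetical helper for illustration only. gpu_percent (0..100) is the
    share of work-groups sent to the GPU; integer arithmetic avoids float
    rounding surprises. Returns ((gpu_offset, gpu_size), (cpu_offset, cpu_size)).
    """
    if not 0 <= gpu_percent <= 100:
        raise ValueError("gpu_percent must be in [0, 100]")
    if local_size <= 0 or global_size % local_size != 0:
        raise ValueError("global_size must be a positive multiple of local_size")
    groups = global_size // local_size
    # Round down so that each device still launches only whole work-groups.
    gpu_groups = groups * gpu_percent // 100
    gpu_size = gpu_groups * local_size
    # The GPU covers [0, gpu_size); the CPU covers [gpu_size, global_size).
    return (0, gpu_size), (gpu_size, global_size - gpu_size)


if __name__ == "__main__":
    # 1024 work-items in groups of 64, with roughly 70% of the work on the GPU.
    print(split_ndrange(1024, 70, 64))  # ((0, 704), (704, 320))
```

In a real OpenCL host program, each (offset, size) pair would be passed as the `global_work_offset` and `global_work_size` arguments of a `clEnqueueNDRangeKernel` call on that device's command queue. The hard part, which the research addresses, is choosing the split when the GPU is shared with other programs.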
The paper was presented at the 26th International Workshop on Languages and Compilers for Parallel Computing (LCPC), held September 25-27, 2013.
Abstract. Heterogeneous multi- and many-core systems are increasingly prevalent in the desktop and mobile domains. On these systems it is common for programs to compete with co-running programs for resources. While multi-task scheduling for CPUs is a well-studied area, how to partition and map computing tasks onto the heterogeneous system in the presence of GPU contention (i.e. multiple programs competing for the GPU) remains an outstanding problem. In this paper we consider the problem of partitioning OpenCL kernels on a CPU-GPU based system in the presence of contention on the GPU. We propose a machine learning-based approach that predicts the optimal partitioning of OpenCL kernels, explicitly taking GPU contention into account. Our predictive model achieves a speed-up of 1.92 over a scheme that always uses the GPU. When compared to two state-of-the-art dynamic approaches, our model achieves speed-ups of 1.54 and 2.56, respectively.
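The abstract does not say which learning algorithm or which features the predictive model uses. As a purely illustrative stand-in, the sketch below uses a k-nearest-neighbour classifier, which is not necessarily the paper's model. It maps a kernel's feature vector, extended with a GPU contention measure, to a predicted GPU share of the work. Both features (a compute-to-memory ratio and a contention level between 0 and 1) and all training data are hypothetical and invented for illustration.

```python
import math
from typing import List, Sequence, Tuple

# A training example pairs a feature vector with the best GPU share (percent).
# Hypothetical features: [compute-to-memory ratio, GPU contention level 0..1].
Example = Tuple[Sequence[float], int]

# Made-up training data for illustration only (not results from the paper).
TRAINING: List[Example] = [
    ((8.0, 0.0), 100),  # compute-heavy kernel, idle GPU -> all work on GPU
    ((7.5, 0.1), 90),
    ((8.2, 0.9), 30),   # compute-heavy kernel, busy GPU -> shift work to CPU
    ((7.8, 0.8), 30),
    ((1.0, 0.0), 20),   # memory-bound kernel -> mostly CPU
    ((1.2, 0.9), 0),
]


def predict_gpu_percent(training: List[Example],
                        features: Sequence[float], k: int = 3) -> int:
    """Predict a GPU work share by majority vote of the k nearest examples."""
    if not training:
        raise ValueError("need at least one training example")
    # Rank training examples by Euclidean distance to the query kernel.
    ranked = sorted(training, key=lambda ex: math.dist(ex[0], features))
    nearest = ranked[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    best = max(votes.values())
    # Break ties in favour of the label of the closest neighbour.
    for _, label in nearest:
        if votes[label] == best:
            return label
    raise AssertionError("unreachable")


if __name__ == "__main__":
    # The same kind of kernel gets a different split depending on contention.
    print(predict_gpu_percent(TRAINING, (8.0, 0.05)))  # 100
    print(predict_gpu_percent(TRAINING, (8.0, 0.85)))  # 30
```

The point of the sketch is the design idea the abstract describes: contention enters the model as an input feature. As a result, the same kernel can receive a different CPU/GPU split depending on how busy the GPU is.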
The experiments were carried out on an Intel Ivy Bridge platform with a quad-core CPU and an integrated GPU, running Windows 7 and the Intel SDK for OpenCL Applications 2013.
The researchers used 22 benchmarks from the Intel SDK and the AMD SDK to evaluate their approach. These benchmarks were chosen because they are not tuned specifically for GPUs but for use on both CPUs and GPUs, e.g. by using vector data types.