



## SYCL State of the Union Keynote SYCLCon 2021 Specification Release



Michael Wong
SYCL WG Chair
Codeplay Distinguished Engineer
ISOCPP Director & VP
ISO C++ Directions Group

michael@codeplay.com | wongmichael.com/about

#### SYCL 2020 is here!

#### Open Standard for Single Source C++ Parallel Heterogeneous Programming

- SYCL 2020 is released after 3 years of intense work
- Significant adoption in Embedded, Desktop and HPC markets
- Improved programmability, smaller code size, faster performance
- Based on C++17, backwards compatible with SYCL 1.2.1
- Simplify porting of standard C++ applications to SYCL
- Closer alignment and integration with ISO C++
- Multiple Backend acceleration and API independent

SYCL 2020 increases expressiveness and simplicity for modern C++ heterogeneous programming



https://research-portal.uws.ac.uk/en/publications/trisycl-for-xilinx-fpga

### SYCL 2020 Industry Momentum



Intel® oneAPI DPC++: Kernel and API interoperability with OpenCL® and SYCL®

# KHRONOS

### SYCL 2020 Major Features



- Unified Shared Memory (USM)
  - Code with pointers can work naturally without buffers or accessors
  - Simplifies porting of most existing code (e.g. CUDA, C++)
- Parallel Reductions
  - Adds a built-in reduction operation that avoids boilerplate code and achieves maximum performance on hardware with built-in reduction acceleration
- Work group and subgroup algorithms
  - Efficient parallel operations between work items
- Class template argument deduction (CTAD) and template deduction guides
  - Simplified class template instantiation
- Simplified use of Accessors with a built-in reduction operation
  - Reduces boilerplate code and streamlines the use of C++ software design patterns
- Expanded interoperability
  - Efficient acceleration by diverse backend acceleration APIs
- SYCL atomic operations are now more closely aligned to standard C++ atomics
  - Enhances parallel programming freedom
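To make the CTAD item above concrete, here is a minimal sketch in plain ISO C++17. It does not use the SYCL API: `Buffer` and `ctad_demo` are hypothetical names invented for illustration, and the class is only a stand-in for a buffer-like template. The sketch shows the two mechanisms the slide names: implicit deduction from a constructor, and a user-written deduction guide for a constructor whose signature does not mention `T`.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a buffer-like class template (NOT the SYCL API):
// a non-owning view over `count` elements of type T.
template <typename T>
class Buffer {
public:
    // T is deducible directly from this constructor's first argument.
    Buffer(T* data, std::size_t count) : data_(data), count_(count) {}

    // T does not appear in this signature, so CTAD needs the guide below.
    template <typename Container>
    explicit Buffer(Container& c) : data_(c.data()), count_(c.size()) {}

    std::size_t size() const { return count_; }
    T* data() const { return data_; }

private:
    T* data_;
    std::size_t count_;
};

// User-defined deduction guide: take T from the container's value_type.
template <typename Container>
Buffer(Container&) -> Buffer<typename Container::value_type>;

// Builds two views without spelling out any template arguments.
inline std::size_t ctad_demo() {
    std::vector<double> v(16, 1.0);
    int raw[4] = {1, 2, 3, 4};

    Buffer from_container(v);     // deduced as Buffer<double> via the guide
    Buffer from_pointer(raw, 4);  // deduced as Buffer<int> from the constructor

    static_assert(std::is_same_v<decltype(from_container), Buffer<double>>);
    static_assert(std::is_same_v<decltype(from_pointer), Buffer<int>>);

    return from_container.size() + from_pointer.size();  // 16 + 4
}
```

Before C++17 both declarations would have had to be written as `Buffer<double>` and `Buffer<int>`. Removing that repetition is the kind of simplified class template instantiation the slide refers to.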

## Parallel Industry Initiatives



















Timeline (figure): 2011, 2015, 2017, 2020, 202X

- **SYCL 1.2**: C++11 single source programming
- SYCL 1.2.1: C++11 single source programming
- **SYCL 2020**: C++17 single source programming, many backend options
- SYCL 202X: C++20 single source programming, many backend options
- OpenCL 2.1: SPIR-V in Core
- OpenCL 2.2
- OpenCL 3.0

## **SYCL Evolution**

#### SYCL 2020 compared with SYCL 1.2.1

- Easier to integrate with C++17 (CTAD, deduction guides...)
- Less verbose, smaller code size, simplified patterns
- Backend independent
- Multiple object archives, aka modules, simplify interoperability
- Eases porting of C++ applications to SYCL
- Enables capabilities that improve programmability
- Backwards compatible, but with minor API breaks based on user feedback

#### SYCLCon 2021 Talks and Events

- SYCL, DPC++, XPUs, oneAPI: a View from Intel by James Reinders
- oneAPI Developer Summit, Monday Apr 26, Biagio Cosenza, Peter Zuzek, Steffen Larsen
- Hands-on SYCL Tutorial, Tuesday Apr 27, by Rod Burns and SYCL team
- Sylkan: Towards a Vulkan Compute Target Platform for SYCL by Peter Thoman
- Performance-Portable Distributed K-Nearest Neighbours using Locality-Sensitive Hashing and SYCL by Marcel Breyer
- Toward Performance Portability of Highly Parametrizable TRSM Algorithm Using SYCL by Thales Sabino
- On Measuring the Maturity of SYCL Implementations by Tracking Historical Performance Improvements by Wei-Chen Lin
- Experiences Supporting DPC++ in AMReX by Sravani Konda
- Developing Medical Imaging Applications Across GPU, FPGA, and CPU using oneAPI
- hipSYCL in 2021: Peculiarities, Unique Features and SYCL 2020 by Aksel Alpay
- Experiences with Adding SYCL Support to GROMACS by Andrey Alekseenko
- Extending DPC++ with Support for Huawei Ascend AI Chipset
- Toward a Better SYCL Memory Consistency Model by Ben Ashbaugh
- Bringing SYCL to A100 Ampere Architecture on Perlmutter by Steffen Larsen and LBNL
- SYCL and OpenCL Meet Challenges of Functional Safety by Illya Rudkin
- Enabling OpenCL and SYCL for RISC-V processors by Colin Davidson, Aidan Dodds





Over 40 selected features for SYCL 2020:

- Unified Shared Memory (USM)
- Parallel Reductions: adds a built-in reduction operation
- Work-group and sub-group algorithms
- Improvements to atomic operations
- Class template argument deduction (CTAD) and deduction guides
- Simplification of accessors
- Expanded interoperability with different backends
- Extension mechanism
- Address spaces
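A built-in reduction replaces a hand-written accumulation loop with a library operation that names the combining function. ISO C++17 has a host-side analogue in `std::reduce`, sketched below. This is plain standard C++ rather than SYCL code; the helper names `sum_with_loop` and `sum_with_reduce` are made up for the example, and the sequential overload of `std::reduce` is used so that no parallel backend is assumed.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Hand-written accumulation: the kind of boilerplate a built-in
// reduction is meant to replace.
inline long sum_with_loop(const std::vector<int>& v) {
    long total = 0;
    for (int x : v) {
        total += x;
    }
    return total;
}

// The same sum expressed as a reduction: the caller states the initial
// value and the combining operation, and the library owns the traversal.
inline long sum_with_reduce(const std::vector<int>& v) {
    return std::reduce(v.begin(), v.end(), 0L, std::plus<>{});
}
```

Because `std::reduce` may regroup and reorder the operands, the combining operation should be associative and commutative, as addition is here. That freedom to regroup is also what lets an implementation map a reduction onto hardware support.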

#### SYCL Future Roadmap (MAY CHANGE)

#### Improving Software Ecosystem

Books, tutorials, tools, libraries, GitHub

#### **Expanding Implementation**

DPC++, ComputeCpp, triSYCL, hipSYCL, neoSYCL

#### Regular Maintenance Updates

Spec clarifications, formatting and bug fixes https://www.khronos.org/registry/SYCL

#### Repeat The Cycle every 1.5-3 years



- **Conformance Tests**
- Working on Implementations
- Future SYCL NEXT Proposals
- Integration of successful Extensions plus new Core functionality
- Converge SYCL with ISO C++ and continue to support OpenCL to deploy on more devices: CPU, GPU, FPGA, AI processors, custom processors
- Vector rework, Specialization Constants, ...

### **SYCL Implementations in Development**



#### SYCL user and developer Growth







#### SYCL Ecosystem, Research and Benchmarks

























**Working Group Members**

Creative Commons Attribution 4.0 International License


### SYCL in Embedded Systems, Automotive, and AI

Networks trained on high-end desktop and cloud systems

Applications link to compiled inferencing code or call vision/inferencing API

Diverse Embedded Hardware

Multi-core CPUs, GPUs

DSPs, FPGAs, Tensor Cores

\* Vulkan only runs on GPUs



#### **Safety Critical API Evolution**



















Industry need for GPU acceleration APIs designed to ease system safety certification is increasing

ISO 26262 / ASIL-D





**UL 4600** 





ISO/IEC JTC 1/SC 42


#### Embedded/Automotive/AI/Safety





"Xilinx is excited about the progress achieved with SYCL 2020," said Ralph Wittig, fellow, Xilinx.

"For Renesas, SYCL is a key enabler for automotive ADAS/AD software developers ....," said Cyril Cordoba, Director of ADAS Segment Marketing Department, Renesas.



"NSITEXE supports the SYCL 2020 technology, which is gaining attention in embedded applications," said Hideki Sugimoto, CTO, NSITEXE, Inc.



"Imagination recognises the benefit of SYCL across multiple markets. Our software stacks have been designed to improve SYCL performance, enabling a straightforward path to exploit the teraflops of compute performance in our latest IP," said Mark Butler, Vice President of Software Engineering, Imagination Technologies.

SYCL support from embedded systems, through desktops to supercomputers

### **SYCL in HPC/Supercomputers**

#### Simulation

HPC Languages
Solver Libraries, Parallel RT

#### Data

Productivity Languages
Big Data Stack, Stats Lib, Databases

#### Learning

Productivity Languages
Deep Learning, Linear Alg, ML

Three Pillars of Science Problem



Need Languages that allow control of these Data Issues

Set Data affinity, Data Layout, Data movement, Data Locality, highly Parameterized Code and dynamically compose the algorithms (C++ templates, parallel STL, inlining and fusion, abstractions)

Libraries augment compiler optimizations for Performance Portable programs

Use open standards to run Performance Portable code on new generation, or different vendor's, hardware with compiler optimization, explicit parametrization and dynamically composed algorithm

Today's supercomputing **development workflow** needs knowledge of system architecture and tools that control data:

- Choose algorithm for target
- **Implement** and test algorithm
- **Optimize** algorithm

Based on the IWOCL/SYCLCon 2020 keynote by Hal Finkel: https://www.iwocl.org/wp-content/uploads/iwocl-syclcon-2020-finkel-keynote-slides.pdf


### **Exascale computing**





"Our users will benefit from features in the SYCL 2020 specification. New features, such as support for unified memory (USM) and reductions, are important capabilities for programming high-performance-computing hardware. ..." said Nevin Liber, computer scientist, Argonne National Laboratory's Leadership Computing Facility

"At Cineca, based on our experience, we confirm the value that SYCL is bringing to the development of high-performance computing in a hybrid environment. ..." said Sanzio Bassini, director of supercomputing, Application Innovation Dept, Cineca.


## **HPC Computing**







**Group at the University of Innsbruck** 



"... we see modern C++ language-based approaches to accelerator programming, such as SYCL, as an important component of our programming environment offering for users of Perlmutter," said Brandon Cook, application performance specialist at NERSC.




"The SYCL 2020 final specification brings significant features to the industry that enable C++ developers to more productively build high-performance heterogeneous applications with unified programming across XPU architectures," said Jeff McVeigh, Intel vice president, Datacenter XPU Products and Solutions.



#### What now?

Deep Dive into HPC future

#### When I was OpenMP CEO, I learned







SYCL is great for modern C++, AI, Automotive

Here are some opportunities for HPC growth across Europe, Asia

### What about Europe? EPI, ARM and RISC-V RVV



## SYCL as a universal programming model for HPC

Starting with US National Labs

Across Europe and Asia there are many petascale and pre-exascale systems

- With a wide variety of CPUs, GPUs, FPGAs, and custom devices
- Often with interconnected usage agreements



#### **HPCAsia 2021: neoSYCL thanks to Hiroyuki Takizawa**

Open standard for offload programming = SYCL

BFS using the Rodinia benchmark at HPC Asia 2021

No loss in performance between using SYCL and VEO

Programming with SYCL Leads to lower Code Complexity

[Figure: three plots of execution time against graph size (1k to 16M), with series for kernel time, kernel time (include data copy) and execution time]

| Application | Version    | NLOC | AvgCCN | Avg.token |
|-------------|------------|------|--------|-----------|
| STREAM      | SYCL       | 148  | 1.2    | 96        |
| STREAM      | VE-Offload | 296  | 3.8    | 159.7     |
| N-body      | SYCL       | 86   | 2.7    | 233.3     |
| N-body      | VE-Offload | 166  | 5      | 240.2     |
| N-body      | Origin     | 66   | 3.3    | 173.7     |
| BFS         | SYCL       | 133  | 4.5    | 248       |
| BFS         | VE-Offload | 225  | 7.4    | 302       |
| BFS         | Origin     | 116  | 4.5    | 196.2     |
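As a rough worked check of the code-complexity claim, the sketch below computes how many lines of code (NLOC) the SYCL version saves relative to the VE-Offload version for each application. The NLOC values are copied from the table on this slide. Reading the 133 and 225 rows as belonging to BFS is an assumption, because the row labels in the extracted table are damaged. The names `LocEntry`, `table_nloc` and `nloc_saving_percent` are hypothetical helpers, not part of neoSYCL.

```cpp
#include <string>
#include <vector>

// One application with its NLOC for the SYCL and VE-Offload versions.
struct LocEntry {
    std::string app;
    int sycl_nloc;
    int ve_offload_nloc;
};

// NLOC values as listed on the slide (BFS row attribution is assumed).
inline std::vector<LocEntry> table_nloc() {
    return {{"STREAM", 148, 296}, {"N-body", 86, 166}, {"BFS", 133, 225}};
}

// Percentage of lines saved by the SYCL version relative to VE-Offload.
inline double nloc_saving_percent(const LocEntry& e) {
    return 100.0 * (e.ve_offload_nloc - e.sycl_nloc) / e.ve_offload_nloc;
}
```

With these numbers the SYCL versions are roughly 41% to 50% shorter than the VE-Offload versions. They are still longer than the Origin versions where the table lists one.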



#### Final words

- SYCL can be a part of a standard programming model for all HPC including Europe/Asia/NA
  - HPC is now used in Embedded and Automotive
- SYCL is home grown in Europe: a UK company has led its development since 2012, and it is now an open standard with contributions from multiple companies and many European and Asian projects
  - Celerity from the University of Innsbruck and Salerno, CINECA Bologna, neoSYCL
- Moves with ISO C++, updates every 1.5-3 years
- Part of oneAPI
- Adapts to HPC hardware changes, moving towards safety critical
- Adopted by ECP for the first exascale computer, Aurora, now also on Perlmutter, and we hope in European and Asian HPC



### **Enabling Industry Engagement**

- SYCL working group values industry feedback
  - https://community.khronos.org/c/sycl
  - https://sycl.tech
- SYCL FAQ
  - https://www.khronos.org/blog/sycl-2020-what-do-you-need-to-know
- What features would you like in future SYCL versions?

- https://community.khronos.org/
- www.khr.io/slack
- https://app.slack.com/client/TDMDFS87M/CE9UX4CHG
- https://community.khronos.org/c/sycl/
- https://stackoverflow.com/questions/tagged/sycl
- https://www.reddit.com/r/sycl
- https://github.com/codeplaysoftware/syclacademy
- https://sycl.tech/

Open to all!

**Advisory Panel** Chaired by Tom Deakin of U of Bristol

- SYCL Advisory Panel meeting here at IWOCL/SYCLCon
- Regular meetings to give feedback on roadmap and draft specifications



- Khronos SYCL Forums, Slack Channels, Stack Overflow, Reddit, and SYCL.tech
- Khronos GitHub: contribute to SYCL open source specs, CTS, tools and ecosystem
- SYCL Advisory Panels
- SYCL Working Group

https://www.khronos.org/members/ https://www.khronos.org/registry/SYCL/