Performance Engineering on CPUs and GPUs
Skill Level: Intermediate
Language: English
Workload: 2 hours total
Topic: Performance Engineering on CPUs and GPUs
Overview: This lecture delves into performance engineering on CPUs and GPUs, focusing on optimizing computation and memory access patterns for high-performance computing applications.
Course Description: The lecture covers simple but key architectural concepts, including pipelining, memory hierarchies, and caches. On GPUs, the emphasis is placed on coalesced memory access, shared memory usage, and avoiding bank conflicts. By examining real-world examples such as matrix multiplication and dot product computations, the lecture provides practical insights into maximizing computational throughput and minimizing bottlenecks. This session equips participants with the foundational knowledge to design and implement efficient algorithms for modern multicore and manycore systems.
Course Contents:
Part 1: Introduction to CPUs
Part 2: Optimizing Memory Access Patterns for Performance
Part 3: Utilizing the Cache and Spatial-Temporal Locality
Part 4: Prefetching and Latency Mitigation
Part 5: Fundamentals of GPU Architecture and Programming
Part 6: Efficient Memory Accesses on GPUs
Part 7: Bank Conflicts and Shared Memory Performance on GPUs
Part 8: GPU Occupancy and Parallel Matrix Multiplication
Who Should Enroll: Anyone who thinks performance matters and wants more from their code.
Prerequisite: Experience with C++ and CUDA
Tools, libraries, frameworks used: g++, nvcc, perf
Learning Objectives: By participating in this course, you will learn:
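simple techniques for improving performance on a CPU, such as cache-friendly memory access patterns and prefetching
simple techniques for improving performance on a GPU, such as coalesced memory access, avoiding shared memory bank conflicts, and managing occupancy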
About the instructor(s): Kamer Kaya is an Associate Professor at the Faculty of Engineering and Natural Sciences at Sabancı University. His research areas include high-performance computing, machine learning on sparse data, and graph algorithms.