Dates: 05 February 2012 - 08 February 2012
Venue: University College Dublin, Belfield, Dublin 4
Applied Parallel Computing Education and Research Center
NVIDIA Advanced CUDA Programming Course
Course Plan
- From GPU to GPGPU
  - Performance and parallelism
  - GPU evolution
  - Parallel systems: multicore and clustering
- CUDA programming model
  - Key principles
  - Threads and blocks
  - Language extensions
    - Attributes
    - Built-in types and variables
    - Kernel invocation operator
- CUDA runtime API
  - Asynchronous execution
  - Handling runtime errors in CUDA
  - Querying GPU capabilities
- Memory hierarchy
  - Global memory
    - Example: matrix multiplication
    - Optimizing global memory usage
  - Block-shared memory
    - Example: matrix multiplication
    - Shared memory access patterns
  - Constant memory
  - Texture memory
  - Unified virtual address space (UVA)
  - Global memory
- Implementing basic data processing
  - Parallel reduction
  - Prefix sum (scan)
    - CUDA implementation
    - CUDPP implementation
- CUDA Libraries
  - CUBLAS
  - CUSPARSE
  - CUFFT
  - CURAND
- CUDA Fortran Overview
- Using multiple GPUs
  - CUDA context
  - fork
  - MPI
  - POSIX threads
  - OpenMP
  - Boost.Threads
- CUDA Streams
  - Example: concurrent kernel execution
  - Example: matrix multiplication
  - Example: multi-GPU asynchronous copy
- Debugging
  - Principles and terminology
  - gdb
  - cuda-gdb
  - Nsight
  - CUDA (Visual) Profiler
  - cuda-memcheck
- OpenCL Overview
  - Simple example
  - OpenCL host API
  - Developing and deploying OpenCL kernels
  - Comparison with CUDA
- Optimization Techniques
Hands-on exercises
- Parallel computation of the sine function
- Matrix-matrix multiplication with shared memory
For more information: http://cuda-course-eorg.eventbrite.com/


