Events

Dates: 05 February 2012 - 08 February 2012
Venue: University College Dublin, Belfield, Dublin 4

Education and Research Center APPLIED PARALLEL COMPUTING

NVIDIA Advanced CUDA Programming Course Plan

  1. From GPU to GPGPU
    • Performance and parallelism
    • GPU evolution
    • Parallel systems: multicore and clustering
  2. CUDA programming model
    • Key principles
    • Threads and blocks
    • Language extensions
      • Attributes
      • Built-in types and variables
      • Kernel invocation operator
    • CUDA runtime API
      • Asynchronous execution
      • Handling runtime errors in CUDA
      • Querying GPU capabilities 
  3. Memory hierarchy
    • Global memory
      • Example: matrix multiplication
      • Optimizing global memory usage
    • Block-shared memory
      • Example: matrix multiplication
      • Shared memory access patterns
    • Constant memory
    • Texture memory
    • Unified virtual address space (UVA)
  4. Implementing basic data processing
    • Parallel reduction
    • Prefix sum (scan)
      • CUDA implementation
      • CUDPP implementation
  5. CUDA Libraries
    • CUBLAS
    • CUSPARSE
    • CUFFT
    • CURAND
  6. CUDA Fortran Overview
  7. Using multiple GPUs
    • CUDA context
    • fork
    • MPI
    • POSIX threads
    • OpenMP
    • Boost.Threads
  8. CUDA Streams
    • Example: concurrent kernel execution
    • Example: matrix multiplication
    • Example: multi-GPU asynchronous copy
  9. Debugging
    • Principles and terminology
    • gdb
    • cuda-gdb
    • Nsight
    • CUDA (Visual) Profiler
    • cuda-memcheck
  10. OpenCL Overview
    • Simple example
    • OpenCL host API
    • Developing and deploying OpenCL kernels
    • Comparison with CUDA
  11. Optimization Techniques

Hands-on exercises

  1. Parallel sine function computation.
  2. Matrix-matrix multiplication with shared memory.

For more information: http://cuda-course-eorg.eventbrite.com/