Nvidia GPU cluster

Introduction

The Nvidia GPU cluster currently comprises 3 machines, all of which have high-performance Nvidia video cards installed. These machines all run the Torque batch processing daemon. The queue for these machines is managed on acl-primary and can be accessed via the "cuda" queue. These machines get all NIS, sudo, and hosts information from acl-primary. They also mount their scratch space, home, and projects directories from acl-storage. All of them run the Nvidia CUDA SDK and video drivers.
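As a rough illustration of using the "cuda" queue, the sketch below writes a minimal Torque job script and submits it with qsub. The file name cuda_job.sh, the job name cuda_test, and the single-node resource request are assumptions made for the example, and hostname is only a placeholder for a real CUDA program.

```shell
#!/bin/sh
# Write a minimal Torque job script.
# The file name and job name are hypothetical; adjust to taste.
cat > cuda_job.sh <<'EOF'
#!/bin/sh
#PBS -N cuda_test
#PBS -q cuda
#PBS -l nodes=1
# Torque starts jobs in the home directory; return to the submit directory.
cd "$PBS_O_WORKDIR"
# Placeholder workload: replace with your own CUDA binary.
hostname
EOF

# Submit only if the Torque client tools are available on this host.
if command -v qsub >/dev/null 2>&1; then
    qsub cuda_job.sh
else
    echo "qsub not found; run this on a host with the Torque client installed"
fi
```

Once submitted, qstat -q cuda should list the queue and its jobs, assuming the Torque client tools are configured to talk to acl-primary.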

NVIDIA CUDA™ technology is a C language environment that uses the processing power of GPUs to solve computation-intensive problems. NVIDIA’s CUDA development tools consist of three key components:

  1. The latest CUDA driver
  2. A complete CUDA toolkit
  3. CUDA SDK code samples

CUDA Installation

  1. Download the driver, toolkit, and SDK from http://www.nvidia.com/object/cuda_get.html.
  2. Ensure the following RPMs are installed on the machine. Any that are missing can be installed with sudo yum install -y followed by the package name.
    1. kernel-devel
    2. gcc
    3. gcc-c++
    4. libgcc
    5. glibc
    6. glibc-utils
    7. freeglut
    8. freeglut-devel
    9. glibc-devel
  3. Run the downloaded CUDA driver install program.
    1. Accept the license agreement.
    2. If no precompiled kernel interface is found, choose to compile your own. This had to be done for Fedora Core 8.
    3. Install the 32-bit compatibility OpenGL libraries.
    4. Run the nvidia-xconfig utility to automatically update your X configuration file.
  4. Run the downloaded CUDA toolkit install program.
    1. Keep the default install path of /usr/local/cuda.
  5. Run the downloaded CUDA SDK install program. Note that this only has to be done on new installs. On the CADI cluster, the SDK is already built in /shared-space/CUDA/NVIDIA_CUDA_SDK.
    1. Enter the install path.
    2. Build the SDK project examples by typing make emu=1 dbg=1 at the root of the SDK install path.
  6. Copy the startup script attached to this page into /etc/init.d/ (the links in the next step expect it at /etc/init.d/cuda). This ensures that a machine booting only to run level 3 (non-graphical) loads the nvidia module and sets up the appropriate nodes in /dev as needed.
  7. Symlink the script into the run level 3 directory so that the OS starts the "cuda" driver script at boot by running:
    1. sudo ln -s /etc/init.d/cuda /etc/rc.d/rc3.d/K15cuda
    2. sudo ln -s /etc/init.d/cuda /etc/rc.d/rc3.d/S99cuda
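The non-interactive parts of the procedure above (the package install in step 2 and the symlinks in step 7) could be scripted roughly as follows. This is only a sketch: the run helper and the DRY_RUN variable are hypothetical names introduced here, and by default the script only prints the commands it would run. The driver, toolkit, and SDK installers are interactive and are not covered.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 to actually run them (requires sudo rights).
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 2: prerequisite packages, as listed above.
PKGS="kernel-devel gcc gcc-c++ libgcc glibc glibc-utils freeglut freeglut-devel glibc-devel"
# PKGS is intentionally unquoted so each package becomes its own argument.
run sudo yum install -y $PKGS

# Step 7: run level 3 links for the "cuda" init script.
run sudo ln -s /etc/init.d/cuda /etc/rc.d/rc3.d/K15cuda
run sudo ln -s /etc/init.d/cuda /etc/rc.d/rc3.d/S99cuda
```

Running it once in dry-run mode and reading the printed commands before setting DRY_RUN=0 is a cheap way to catch a wrong path or package name.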

References

  1. http://developer.download.nvidia.com/compute/cuda/1_1/CUDA_SDK_release_n...
  2. http://www.nvidia.com/object/cuda_develop.html
  3. http://forums.nvidia.com/lofiversion/index.php?t52629.html

Attachment: cuda.txt (1.67 KB)