Introduction
The Nvidia GPU cluster currently consists of three machines, all of which have high-performance Nvidia video cards installed. These machines all run the Torque batch processing daemon. The queue for these machines is managed on acl-primary and can be accessed via the "cuda" queue. These machines get all NIS, sudo, and hosts information from acl-primary, and they mount their scratch space, home, and projects directories from acl-storage. All of them run the Nvidia CUDA SDK and video drivers.
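As a rough illustration of using the queue described above, the sketch below writes a small Torque job script that targets the "cuda" queue and submits it with qsub when that command is available. The job name, the resource request, and the program name ./my_cuda_program are hypothetical placeholders, not part of the cluster's documented setup.

```shell
#!/bin/sh
# Sketch: create a Torque job script for the "cuda" queue.
# The job name, resource request, and ./my_cuda_program are hypothetical placeholders.
cat > cuda_job.sh <<'EOF'
#!/bin/sh
#PBS -q cuda
#PBS -N cuda_test
#PBS -l nodes=1
# Torque starts jobs in the home directory; change to the submission directory.
cd "$PBS_O_WORKDIR"
./my_cuda_program
EOF

# Submit only when qsub exists on this machine; otherwise show the command.
if command -v qsub >/dev/null 2>&1; then
    qsub cuda_job.sh
else
    echo "qsub not found; on a submit host run: qsub cuda_job.sh"
fi
```

Once a job is submitted, qstat -u followed by your user name lists your jobs and their states.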
NVIDIA describes CUDA™ technology as a C language environment that unlocks the processing power of GPUs to solve complex, computation-intensive problems. NVIDIA's CUDA development tools consist of three key components to help you get started:
- The latest CUDA driver
- A complete CUDA toolkit
- CUDA SDK code samples
CUDA Installation
- Download the driver, toolkit, and SDK from http://www.nvidia.com/object/cuda_get.html.
- Ensure the following RPMs or components are installed on the machine. Any that are missing can be installed with
sudo yum install -y <package>
- kernel-devel
- gcc
- gcc-c++
- libgcc
- glibc
- glibc-utils
- freeglut
- freeglut-devel
- glibc-devel
- Run the downloaded CUDA driver install program.
- Accept the license agreement.
- If no precompiled kernel interface is found, choose to compile your own. This was necessary on Fedora Core 8.
- Install the 32-bit compatibility OpenGL libraries.
- Run the nvidia-xconfig utility to automatically update your X configuration file.
- Run the downloaded CUDA toolkit install program.
- Keep the default install path of
/usr/local/cuda
- Run the downloaded CUDA SDK install program. Note that this only has to be done on new installs. On the CADI cluster the SDK is already built in
/shared-space/CUDA/NVIDIA_CUDA_SDK
- Enter the install path.
- Build the SDK project examples by typing
make emu=1 dbg=1
at the root of the SDK install path.
- Copy the startup script attached to this page into /etc/init.d/. This ensures that a machine booting only to runlevel 3 (non-graphical) loads the nvidia module and creates the appropriate nodes in /dev as needed.
- Symlink the script into the runlevel directory so that the OS starts the "cuda" driver at boot by running
sudo ln -s /etc/init.d/cuda /etc/rc.d/rc3.d/K15cuda
sudo ln -s /etc/init.d/cuda /etc/rc.d/rc3.d/S99cuda
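The attached startup script is not reproduced on this page. As a rough idea of what such a script typically does, the sketch below follows the approach NVIDIA's CUDA installation notes describe for machines that do not start X: load the nvidia kernel module, then create one /dev/nvidiaN node per GPU plus /dev/nvidiactl (character major 195, control minor 255). The function names and the DRY_RUN switch are illustrative assumptions, and the sketch defaults to printing the commands instead of running them.

```shell
#!/bin/sh
# Sketch of a runlevel-3 startup helper; NOT the script attached to this page.
# DRY_RUN, run, count_gpus and create_nodes are hypothetical names.
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them (needs root)

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Count NVIDIA display devices reported by lspci (prints 0 if lspci is missing).
count_gpus() {
    if command -v lspci >/dev/null 2>&1; then
        lspci | grep -i nvidia | grep -c -E 'VGA compatible controller|3D controller' || true
    else
        echo 0
    fi
}

# Create /dev/nvidia0 .. /dev/nvidia(N-1) and /dev/nvidiactl.
# 195 is the NVIDIA driver's character major number; 255 is the control device minor.
# No existence check is done here, so mknod reports an error for nodes that already exist.
create_nodes() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        run mknod -m 666 "/dev/nvidia$i" c 195 "$i"
        i=$((i + 1))
    done
    run mknod -m 666 /dev/nvidiactl c 195 255
}

case "${1:-start}" in
    start)
        run /sbin/modprobe nvidia || exit 1
        create_nodes "$(count_gpus)"
        ;;
    stop)
        :   # nothing to undo in this sketch
        ;;
    *)
        echo "Usage: $0 {start|stop}" >&2
        ;;
esac
```

Run as-is, the sketch only prints the modprobe and mknod commands it would issue. With DRY_RUN=0 and root privileges it performs them, which is the behavior a script installed as /etc/init.d/cuda would need when invoked with start through the S99cuda link.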
References
- http://developer.download.nvidia.com/compute/cuda/1_1/CUDA_SDK_release_n...
- http://www.nvidia.com/object/cuda_develop.html
- http://forums.nvidia.com/lofiversion/index.php?t52629.html