Multi-GPU Training Lab

Overview

The Multi-GPU Training Lab is our most powerful training environment: two NVIDIA A100 GPUs with 40 GB of VRAM each, connected via NVLink for high-bandwidth inter-GPU communication. This is the same class of hardware that AI research labs use to train large models. You'll use this lab for distributed training exercises: scaling a training job across both GPUs with PyTorch DDP, FSDP, or DeepSpeed, pre-training small transformer language models, training large diffusion models, and running compute-intensive research experiments. The lab is reserved for advanced course exercises where single-GPU training would be impractical.

What You'll Do in This Lab

Pre-train transformer language models (GPT-2 scale, 124M+ params) from scratch
Scale training across 2 GPUs using PyTorch DDP and FSDP
Train diffusion models (DDPM) for image generation
Run mixed-precision training with BF16 on A100 hardware
Implement DeepSpeed ZeRO stages for memory-efficient training
Benchmark single-GPU vs multi-GPU throughput and scaling efficiency
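
To give a feel for what the ZeRO stages buy you, the sketch below estimates per-GPU memory for model states only (weights, gradients, Adam optimizer states). It assumes the commonly cited accounting for mixed-precision Adam: 2 bytes per parameter for half-precision weights, 2 for gradients, and 12 for FP32 optimizer states. Activations, temporary buffers, and fragmentation are ignored, so real usage will be higher. The function name is hypothetical and the 124M figure simply mirrors the GPT-2 scale mentioned above.

```python
def zero_model_state_bytes(num_params: int, num_gpus: int, stage: int) -> float:
    """Estimate per-GPU bytes for model states under a given ZeRO stage.

    Assumes mixed-precision Adam: 2 B/param weights, 2 B/param gradients,
    12 B/param optimizer states (FP32 master weights, momentum, variance).
    Activations and communication buffers are NOT included.
    """
    weights = 2 * num_params
    grads = 2 * num_params
    optim = 12 * num_params
    if stage == 0:   # plain data parallelism: everything replicated
        return weights + grads + optim
    if stage == 1:   # optimizer states sharded across GPUs
        return weights + grads + optim / num_gpus
    if stage == 2:   # optimizer states and gradients sharded
        return weights + (grads + optim) / num_gpus
    if stage == 3:   # weights sharded as well
        return (weights + grads + optim) / num_gpus
    raise ValueError("stage must be 0, 1, 2, or 3")


if __name__ == "__main__":
    params = 124_000_000  # GPT-2 small scale
    for stage in range(4):
        gb = zero_model_state_bytes(params, num_gpus=2, stage=stage) / 1e9
        print(f"ZeRO stage {stage}: ~{gb:.2f} GB of model states per GPU")
```

With only two GPUs the savings are modest at this model size; the same arithmetic becomes more interesting as the parameter count grows toward what a 40 GB card can hold.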

Lab Workflow

1. Request

Multi-GPU labs are scheduled sessions. Book a time slot from your course dashboard. Slots are 2-4 hours depending on the exercise.

2. Launch

At your scheduled time, the A100 instance launches automatically. You'll receive a JupyterLab link with the multi-GPU environment ready.

3. Configure

Set up your distributed training configuration — number of GPUs, batch size per GPU, gradient accumulation steps, and communication backend (NCCL).
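
The settings in this step interact: the number of samples behind each optimizer step is the product of GPU count, per-GPU batch size, and gradient accumulation steps. The sketch below captures that relationship. `DistConfig` and `per_gpu_batch_for` are hypothetical names, and the default batch values are illustrative rather than lab recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DistConfig:
    """Hypothetical container for the settings listed in the Configure step."""
    num_gpus: int = 2            # this lab provides 2 A100s
    batch_per_gpu: int = 16      # illustrative value
    grad_accum_steps: int = 4    # illustrative value
    backend: str = "nccl"        # communication backend named for this lab

    @property
    def global_batch_size(self) -> int:
        # Samples contributing to each optimizer step across all GPUs.
        return self.num_gpus * self.batch_per_gpu * self.grad_accum_steps


def per_gpu_batch_for(target_global: int, num_gpus: int, grad_accum_steps: int) -> int:
    """Solve for the per-GPU micro-batch that hits a target global batch size."""
    denom = num_gpus * grad_accum_steps
    if target_global % denom != 0:
        raise ValueError(f"{target_global} is not divisible by {denom}")
    return target_global // denom


if __name__ == "__main__":
    cfg = DistConfig()
    print(cfg.global_batch_size)
```

Keeping the global batch size fixed when moving from 1 GPU to 2 (by halving the accumulation steps, for example) makes the later throughput comparison easier to interpret.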

4. Train

Launch distributed training with torchrun or the DeepSpeed launcher. Monitor both GPUs in real time with TensorBoard and nvidia-smi.
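
When a script is started with something like `torchrun --standalone --nproc_per_node=2 train.py` (where `train.py` is a placeholder for your own script), torchrun starts one process per GPU and exports `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` to each. A minimal sketch of reading those variables, with hypothetical helper names and a fallback for plain single-process runs:

```python
import os
from typing import Mapping, NamedTuple


class RankInfo(NamedTuple):
    rank: int         # global process index
    local_rank: int   # process index on this machine, typically the GPU index
    world_size: int   # total number of processes


def read_rank_info(env: Mapping[str, str] = os.environ) -> RankInfo:
    """Read the variables torchrun exports; fall back to a single-process setup."""
    return RankInfo(
        rank=int(env.get("RANK", "0")),
        local_rank=int(env.get("LOCAL_RANK", "0")),
        world_size=int(env.get("WORLD_SIZE", "1")),
    )


def is_main_process(info: RankInfo) -> bool:
    # Common convention: only rank 0 writes logs and checkpoints.
    return info.rank == 0


if __name__ == "__main__":
    info = read_rank_info()
    print(f"rank {info.rank} of {info.world_size}, local rank {info.local_rank}")
```

Guarding TensorBoard writes with `is_main_process` is one way to avoid both processes logging the same scalars.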

5. Analyze

Compare scaling efficiency: measure throughput (samples/sec) on 1 GPU vs 2 GPUs. Identify communication bottlenecks.
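
Scaling efficiency is usually expressed as measured multi-GPU throughput divided by the ideal (single-GPU throughput times the number of GPUs). The helper names below are hypothetical, and the numbers in the usage example are made up for illustration, not measurements from this lab.

```python
def speedup(single_gpu_tput: float, multi_gpu_tput: float) -> float:
    """Ratio of multi-GPU to single-GPU throughput (samples/sec)."""
    if single_gpu_tput <= 0:
        raise ValueError("single-GPU throughput must be positive")
    return multi_gpu_tput / single_gpu_tput


def scaling_efficiency(single_gpu_tput: float, multi_gpu_tput: float, num_gpus: int) -> float:
    """Fraction of ideal linear scaling achieved; 1.0 means perfect scaling."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return speedup(single_gpu_tput, multi_gpu_tput) / num_gpus


if __name__ == "__main__":
    # Illustrative numbers only.
    one_gpu, two_gpu = 1000.0, 1800.0
    print(f"speedup: {speedup(one_gpu, two_gpu):.2f}x")
    print(f"efficiency: {scaling_efficiency(one_gpu, two_gpu, 2):.0%}")
```

An efficiency well below 1.0 suggests time lost to gradient synchronization or data loading, which is where to look for the communication bottlenecks mentioned above.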

6. Checkpoint

Save distributed model checkpoints to Cloud Storage. The lab shuts down automatically at the end of your time slot.

Hardware & Environment

Machine Type	a2-highgpu-2g (24 vCPU, 170 GB RAM)
GPU	2x NVIDIA A100 40 GB (Ampere architecture)
GPU Interconnect	NVLink 600 GB/s bidirectional
GPU Capabilities	FP64, FP32, TF32, BF16, FP16, INT8, Tensor Cores 3rd gen
Storage	500 GB SSD persistent disk
Session Length	2-4 hour scheduled slots

Frequently asked questions about this lab

What is the Multi-GPU Training Lab?

It is a high-performance multi-GPU environment for distributed training, large model experiments, and advanced deep learning research. It supports data parallelism, model parallelism, and mixed-precision training.

Which courses use this lab?

This lab is included in: Computer Vision & Visual AI.

What hardware does this lab run on?

The lab runs on GCP Compute Engine / Vertex AI Custom Training. Machine Type: a2-highgpu-2g (24 vCPU, 170 GB RAM); GPU: 2x NVIDIA A100 40 GB (Ampere architecture); GPU Interconnect: NVLink 600 GB/s bidirectional; GPU Capabilities: FP64, FP32, TF32, BF16, FP16, INT8, Tensor Cores 3rd gen.

What software comes pre-installed?

The lab comes pre-loaded with PyTorch Distributed (FSDP / DDP), DeepSpeed, NVIDIA Apex, NCCL, and TensorBoard. No local installs or dependency setup are required: open your browser and start working.

Can I bring my own datasets and code into this lab?

Yes. Datasets can be uploaded directly or synced from Google Cloud Storage. Notebooks and source files have built-in Git integration so you can push work to your own GitHub or GitLab repos.

Do I need to enroll in a course to use this lab?

Yes. Lab environments are provisioned per-student as part of an AI Labs course enrollment. Browse the courses linked above to find programs that include this lab.

Ready to Try This Lab?

Enroll in a course that uses this lab, or visit our Houston center for a hands-on demo.