Train with Terabyte-Scale Datasets on a Single NVIDIA Grace Hopper Superchip Using XGBoost 3.0

Gradient-boosted decision trees (GBDTs) power everything from real-time fraud filters to petabyte-scale demand forecasts. The XGBoost open source library has long been a tool of choice thanks to its state-of-the-art accuracy, SHAP-ready explainability, and flexibility to run on laptops, multi-GPU nodes, or Spark clusters. XGBoost version 3.0 was developed with scalability as its north star. A single NVIDIA GH200 Grace Hopper Superchip can now process datasets from gigabyte scale all the way to 1 terabyte (TB) scale.

The superchip's coherent memory architecture allows the new external-memory engine to stream data over the 900 GB/s NVIDIA NVLink-C2C interconnect, so a model can be trained on a 1 TB dataset in minutes, up to 8x faster than on a 112-core (dual-socket) CPU box. This reduces the need for complex multinode GPU clusters and makes scalability simpler to achieve.
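To make the workflow concrete, here is a minimal sketch of GPU external-memory training using XGBoost's public `DataIter` and `ExtMemQuantileDMatrix` API. The shard files, the `load_batch` helper, and the parameter values are illustrative placeholders, not part of the release notes; the right batch size and cache settings depend on your data.

```python
import os
import numpy as np
import xgboost as xgb


def load_batch(path):
    # Hypothetical shard format: an .npz file with "X" and "y" arrays.
    with np.load(path) as d:
        return d["X"], d["y"]


class BatchIterator(xgb.DataIter):
    """Yields the dataset one shard at a time, so the full dataset
    never has to fit in GPU memory at once."""

    def __init__(self, file_paths):
        self._file_paths = file_paths
        self._it = 0
        # on_host=True keeps the cache in host RAM; on a coherent
        # Grace Hopper system it is streamed to the GPU over NVLink-C2C.
        super().__init__(cache_prefix=os.path.join(".", "cache"), on_host=True)

    def next(self, input_data) -> bool:
        if self._it == len(self._file_paths):
            return False  # no more batches
        X, y = load_batch(self._file_paths[self._it])
        input_data(data=X, label=y)
        self._it += 1
        return True

    def reset(self) -> None:
        self._it = 0  # rewind for the next pass over the data


it = BatchIterator([f"shard_{i}.npz" for i in range(32)])

# Build the quantized external-memory DMatrix; only the compact
# histogram representation is kept on the device during training.
Xy = xgb.ExtMemQuantileDMatrix(it, max_bin=256)

booster = xgb.train(
    {"tree_method": "hist", "device": "cuda"},
    Xy,
    num_boost_round=100,
)
```

On hardware without coherent CPU-GPU memory the same code runs, but batches are fetched over PCIe, so the NVLink-C2C bandwidth is what makes the 1 TB-scale case practical.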

This post explains new features and enhancements in the milestone XGBoost 3.0 release, including a deep dive into external memory and how it leverages the Grace Hopper Superchip to reach 1 TB scale.
