ProTrain: Efficient LLM Training via Memory-Aware Techniques

Yang, Hanmei; Zhou, Jin; Fu, Yao; Wang, Xiaoqun; Roane, Ramine; Guan, Hui; Liu, Tongping

arXiv:2406.08334 (cs)

[Submitted on 12 Jun 2024]

Abstract: Training Large Language Models (LLMs) is extremely memory-hungry. To address this, existing work such as ZeRO-Offload combines CPU and GPU resources during training. Such techniques have largely democratized billion-scale model training, making it possible to train with only a few consumer graphics cards. However, based on our observations, existing frameworks often provide only coarse-grained memory management and require expert configuration tuning, leading to suboptimal hardware utilization and performance. This paper proposes ProTrain, a novel training system that intelligently balances memory usage and performance by coordinating memory, computation, and IO. ProTrain achieves adaptive memory management through Chunk-Based Model State Management and Block-Wise Activation Management, guided by a Memory-Aware Runtime Profiler, without user intervention. ProTrain does not change the training algorithm and therefore does not compromise accuracy. Experiments show that ProTrain improves training throughput by 1.43× to 2.71× compared to state-of-the-art (SOTA) training systems.
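To make the memory pressure described in the abstract concrete, here is a back-of-the-envelope estimate. This is a minimal sketch and not ProTrain's profiler. It assumes the widely used ZeRO accounting of 16 bytes of model state per parameter for mixed-precision Adam training: 2 bytes of fp16 weights, 2 bytes of fp16 gradients, and 12 bytes of fp32 master weights, momentum, and variance. Activations are ignored. The helper names `model_state_bytes` and `offload_needed_bytes`, along with the 7B-parameter model and 24 GiB GPU budget, are hypothetical illustrations.

```python
# Hypothetical sketch: model-state memory for mixed-precision Adam training,
# assuming the common ZeRO accounting of 16 bytes per parameter.
# Activations, temporary buffers, and allocator fragmentation are ignored.

BYTES_FP16_PARAM = 2       # fp16 working copy of the weights
BYTES_FP16_GRAD = 2        # fp16 gradients
BYTES_FP32_OPTIMIZER = 12  # fp32 master weights + Adam momentum + Adam variance (4 bytes each)

GiB = 1024 ** 3


def model_state_bytes(num_params: int) -> int:
    """Bytes needed to hold parameters, gradients, and Adam optimizer states."""
    per_param = BYTES_FP16_PARAM + BYTES_FP16_GRAD + BYTES_FP32_OPTIMIZER
    return num_params * per_param


def offload_needed_bytes(num_params: int, gpu_budget_bytes: int) -> int:
    """Model-state bytes that exceed the GPU budget and would have to be held in CPU memory."""
    return max(0, model_state_bytes(num_params) - gpu_budget_bytes)


if __name__ == "__main__":
    params = 7_000_000_000  # hypothetical 7B-parameter model
    budget = 24 * GiB       # hypothetical single consumer GPU
    print(f"model states: {model_state_bytes(params) / GiB:.1f} GiB")
    print(f"must offload: {offload_needed_bytes(params, budget) / GiB:.1f} GiB")
```

Under these assumptions, a 7B-parameter model needs about 112 GB (roughly 104 GiB) of model state alone, which is far more than one consumer GPU holds. This gap is what CPU-GPU offloading systems such as ZeRO-Offload are designed to bridge.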

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Performance (cs.PF)
Cite as:	arXiv:2406.08334 [cs.DC]
	(or arXiv:2406.08334v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2406.08334

Submission history

From: Hanmei Yang
[v1] Wed, 12 Jun 2024 15:40:06 UTC (781 KB)