Efficient LLM Finetuning
STADLE Value Proposition for LLMs
Outcomes:
- 8% - 14% reduction in time required to fine-tune LLMs, irrespective of the model training framework used (NeMo, DeepSpeed)
- 90% retention of learnings from older datasets when models are fine-tuned on newer datasets
How?
- Before STADLE: standard fine-tuning approaches treat the entire dataset as a single learning “task”
- After STADLE: STADLE works in parallel on multiple meaningful subsets of the dataset, each pertaining to a “subtask” of the overall learning task (e.g. data from a specific location)
STADLE + NeMo = Improved Training Efficiency
NeMo simplifies the deployment and management of distributed training tasks at scale, with support for many of the techniques used for efficient LLM pretraining and fine-tuning (3D parallelism, flash attention, PEFT, MoE)
STADLE, on the other hand, modifies the model update algorithm and model synchronization methodology, with a focus on reducing interference and redundant learning across nodes
This allows for:
- Data-efficient incremental learning
- Modified sharding based on reducing single-node training subtask complexity
- Reduction in necessary inter-node communication
Combining the higher-level optimizations from STADLE with the lower-level optimizations and orchestration from NeMo allows for improved training efficiency without significant infrastructure modifications
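STADLE's actual update and synchronization algorithm is not described here, but the communication-reduction idea can be sketched with a generic federated-averaging-style step: instead of synchronizing gradients every step, each subtask node trains independently and periodically contributes its parameters to a weighted merge. The function name `aggregate` and the toy tensors are assumptions for illustration.

```python
import numpy as np

def aggregate(subtask_weights, sample_counts):
    """Weighted average of per-subtask model parameters (FedAvg-style sketch).

    subtask_weights: list of dicts mapping parameter name -> np.ndarray
    sample_counts:   number of training samples behind each node's update
    """
    total = sum(sample_counts)
    merged = {}
    for name in subtask_weights[0]:
        merged[name] = sum(
            w[name] * (n / total)
            for w, n in zip(subtask_weights, sample_counts)
        )
    return merged

# Two subtask nodes, each holding one parameter tensor after local training.
node_a = {"layer.w": np.array([1.0, 3.0])}
node_b = {"layer.w": np.array([3.0, 5.0])}

merged = aggregate([node_a, node_b], sample_counts=[1, 3])
# → {"layer.w": array([2.5, 4.5])}
```

Because nodes exchange full parameter sets only at aggregation points rather than gradients at every step, inter-node traffic scales with the synchronization frequency instead of the step count.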