Llama3 AI News List | Blockchain.News
List of AI News about Llama3

2026-02-03 21:49
Latest Analysis: FP8 Training Enables 4.3% Speedup for GPT-2 Model on H100 GPUs, Cost Drops to $20

According to Andrej Karpathy on Twitter, enabling FP8 precision training for GPT-2 on H100 GPUs has yielded a 4.3% improvement in training time, bringing the run down to 2.91 hours. Karpathy highlights that at 8xH100 spot-instance pricing, the total cost to reproduce the GPT-2 model now stands at approximately $20, a dramatic reduction from the roughly $43,000 OpenAI spent to train the original GPT-2 seven years ago. As reported by Karpathy, further optimization using techniques such as Flash Attention 3 kernels, the Muon optimizer, and advanced attention patterns has contributed to these gains. While FP8 offers theoretical FLOPS advantages, Karpathy notes practical challenges, including overhead from scale conversions and limited support, especially at the GPT-2 model scale. Nonetheless, the industry shift to FP8 hints at broader opportunities for cost-effective LLM training, as evidenced by torchao's reported 25% speedup on larger models such as Llama3-8B. According to Karpathy, continued improvements in FP8 support and model training strategies can reduce both the time and financial barriers to LLM development, opening further business and research opportunities.
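To make the torchao float8 workflow referenced above concrete, here is a minimal sketch of swapping a model's linear layers to FP8 compute. This is an illustration only, assuming torchao's convert_to_float8_training API, a recent PyTorch build with FP8 support, and an H100-class GPU; it is not Karpathy's actual GPT-2 training setup, and the toy model and hyperparameters are placeholders.

```python
# Minimal FP8 training sketch using torchao's float8 conversion (assumed API:
# torchao.float8.convert_to_float8_training). Illustrative only, not Karpathy's setup.
import torch
import torch.nn as nn
from torchao.float8 import convert_to_float8_training

# Toy GPT-2-style MLP block; FP8 gains come from large matmuls on H100 tensor cores,
# so dimensions are kept multiples of 16 as FP8 kernels typically require.
model = nn.Sequential(
    nn.Linear(768, 3072),
    nn.GELU(),
    nn.Linear(3072, 768),
).to("cuda", dtype=torch.bfloat16)

# Swap eligible nn.Linear modules to float8 training: weights and activations are
# dynamically scaled to FP8 for the matmul, with results accumulated in higher precision.
# Computing these per-tensor scales is the "scale conversion" overhead noted in the article.
convert_to_float8_training(model)

# torchao's float8 path is intended to be used together with torch.compile.
model = torch.compile(model)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
x = torch.randn(8, 1024, 768, device="cuda", dtype=torch.bfloat16)

for step in range(10):
    optimizer.zero_grad(set_to_none=True)
    loss = model(x).float().pow(2).mean()  # dummy loss, just to drive backward
    loss.backward()
    optimizer.step()
```

At small model sizes like GPT-2, the per-tensor scaling and conversion overhead can eat into the theoretical FP8 FLOPS advantage, which is consistent with the modest 4.3% speedup reported here versus the larger gains torchao reports on Llama3-8B-scale models.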
