Five years ago, pioneering large language models (LLMs) like GPT and BERT had hundreds of millions of parameters. Today, Megatron-Turing Natural Language Generation (MT-NLG) has 530 billion parameters ...