Abstract: Transformers have become the backbone of encoder and decoder large language models, but their compute- and memory-intensive nature makes them inefficient on traditional von Neumann ...