Copyright 2004-2025, Lars Nerger, Alfred Wegener Institute, Helmholtz Center for Polar and Marine Research, Bremerhaven, Germany. For license information, please see ...
Run any Falcon model at up to 16k context without losing sanity. Current Falcon inference speed on consumer GPU: up to 54+ tokens/sec for 7B and 18-25 tokens/sec for 40B 3-6 bit, roughly 38/sec and ...