Large language models (LLMs) show excellent performance but are compute- and memory-intensive. Quantization can reduce memory and accelerate inference. However, for LLMs beyond 100 billion parameters, ...
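To make the memory claim concrete, here is a minimal sketch of generic per-tensor symmetric INT8 weight quantization, plus the weight-storage arithmetic for a hypothetical 100-billion-parameter model. This is an illustrative assumption, not the method described in the work above. The function names (`quantize_int8_symmetric`, `dequantize`, `weight_memory_gb`) are hypothetical. The estimate covers weights only and ignores activations and any other runtime memory.

```python
import struct

# Bytes per value, taken from the struct format codes:
# 'e' is IEEE 754 half precision (FP16), 'b' is a signed 8-bit integer.
FP16_BYTES = struct.calcsize("e")  # 2
INT8_BYTES = struct.calcsize("b")  # 1


def quantize_int8_symmetric(weights):
    """Hypothetical per-tensor symmetric INT8 quantizer.

    scale = max|w| / 127, q = clamp(round(w / scale), -127, 127).
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map INT8 codes back to approximate real values."""
    return [v * scale for v in q]


def weight_memory_gb(n_params, bytes_per_param):
    """Weight storage only, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


if __name__ == "__main__":
    weights = [0.12, -0.5, 0.33, 0.9, -0.07]  # toy values, assumed for illustration
    q, scale = quantize_int8_symmetric(weights)
    approx = dequantize(q, scale)
    print("codes:", q)
    print("max abs error:", max(abs(a - b) for a, b in zip(weights, approx)))

    n_params = 100e9  # hypothetical model size
    print("FP16 weights (GB):", weight_memory_gb(n_params, FP16_BYTES))
    print("INT8 weights (GB):", weight_memory_gb(n_params, INT8_BYTES))
```

Because each INT8 code takes one byte instead of two, weight storage halves relative to FP16. Rounding bounds the per-value reconstruction error by `scale / 2`. A single per-tensor scale is the simplest choice. Its weakness is that one large outlier inflates `scale` for every other value in the tensor.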