Use the vitals package with ellmer to evaluate and compare the accuracy of LLMs, including writing evals to test local models.
This fork extends the framework to support Hybrid Multi-Agent Systems, where agents can use different model checkpoints (from the same family) while still communicating via latent representations. The ...