VILA is a family of high-performance vision language models developed by NVIDIA Research and MIT. The models range from ~3B parameters for the smallest to ~40B parameters for the largest. VILA is fully open source: the model checkpoints, the training code, and the training data are all available. In this post, we describe how VILA performs against other models to deliver edge AI 2.0.