THUDM/CogVLM: a state-of-the-art-level open visual language model | Multimodal pre-trained model

CogVLM and CogAgent are open visual language models with state-of-the-art-level performance on various cross-modal benchmarks. CogVLM-17B and CogAgent-18B support image understanding and multi-turn dialogue, and CogAgent additionally focuses on GUI operations. The models can be deployed through web demos or a CLI, and finetuning is supported. Hardware requirements and model checkpoints are provided. Recent updates add features intended to improve usability and model performance.
