THUDM/CogVLM: a state-of-the-art-level open visual language model | Multimodal pretrained model
CogVLM and CogAgent are open visual language models that report state-of-the-art-level results on a range of cross-modal benchmarks. CogVLM-17B and CogAgent-18B both support image understanding and multi-turn dialogue, and CogAgent additionally targets GUI operations. The models can be run through a web demo or a CLI, and they can be finetuned. The repository lists hardware requirements and provides model checkpoints. Recent updates add features intended to improve usability and model performance.
![Cover Image for THUDM/CogVLM: a state-of-the-art-level open visual language model | Multimodal pretrained model](https://firebasestorage.googleapis.com/v0/b/aiboom-sa4fnl.appspot.com/o/aiboom-tools-clipper%2Fhttps%253A%252F%252Fgithub.com%252FTHUDM%252FCogVLM.png?alt=media&token=91f71e11-9fb5-488f-b954-763ce5f0be46)