Tsinghua KEG Lab and Zhipu AI jointly launched CogAgent, a large image understanding model

2023-12-28 08:27:29

Bit News: Tsinghua KEG Lab recently partnered with Zhipu AI to launch CogAgent, a new-generation large model for image understanding. Built on the previously released CogVLM, the model acts as a visual GUI agent: it perceives the GUI through the visual modality rather than text, giving it a more comprehensive and direct view of the interface for planning and decision-making. CogAgent reportedly accepts high-resolution image input of 1120×1120 and supports visual question answering, visual grounding, and GUI agent capabilities, among others. It is reported to have ranked first in general capability on 9 classic image understanding benchmarks, including VQAv2, STVQA, DocVQA, TextVQA, MM-VET, and POPE.


