Natural Language Visual Grounding on ScreenSpot
Metric: Accuracy (%) (higher is better)
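The page does not define how accuracy is computed. As a minimal sketch, assuming the click-accuracy convention commonly used for GUI grounding (a prediction counts as correct when the predicted click point lies inside the target element's bounding box), the metric could be computed as below. The names `point_in_box` and `click_accuracy` and the `(left, top, right, bottom)` box layout are illustrative assumptions, not part of any listed model's evaluation code.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
# Assumed box layout: (left, top, right, bottom), same coordinate space as the points.
Box = Tuple[float, float, float, float]


def point_in_box(point: Point, box: Box) -> bool:
    """Return True if the point lies inside the box (edges inclusive)."""
    x, y = point
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


def click_accuracy(points: Sequence[Point], boxes: Sequence[Box]) -> float:
    """Percentage of predicted click points that fall inside their target box."""
    if len(points) != len(boxes):
        raise ValueError("points and boxes must have the same length")
    if not points:
        return 0.0
    hits = sum(point_in_box(p, b) for p, b in zip(points, boxes))
    return 100.0 * hits / len(points)
```

Under this assumption, a reported score is the share of benchmark instructions for which the predicted point hit the intended element.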
Results
# | Model | Accuracy (%) | Extra Data | Paper | Date | Code
--- | --- | --- | --- | --- | --- | ---
1 | UGround-V1-7B | 86.34 | No | Navigating the Digital World as Humans Do: Unive... | 2024-10-07 | Code
2 | Aguvis-7B | 83 | No | Aguvis: Unified Pure Vision Agents for Autonomou... | 2024-12-05 | Code
3 | OS-Atlas-Base-7B | 82.47 | No | OS-ATLAS: A Foundation Action Model for Generali... | 2024-10-30 | Code
4 | Aria-UI | 81.1 | No | Aria-UI: Visual Grounding for GUI Instructions | 2024-12-20 | Code
5 | Aguvis-G-7B | 81 | No | Aguvis: Unified Pure Vision Agents for Autonomou... | 2024-12-05 | Code
6 | UGround-V1-2B | 77.67 | No | Navigating the Digital World as Humans Do: Unive... | 2024-10-07 | Code
7 | ShowUI | 75.1 | No | ShowUI: One Vision-Language-Action Model for GUI... | 2024-11-26 | Code
8 | ShowUI-G | 75 | No | ShowUI: One Vision-Language-Action Model for GUI... | 2024-11-26 | Code
9 | UGround | 73.3 | No | Navigating the Digital World as Humans Do: Unive... | 2024-10-07 | Code
10 | OmniParser | 73 | No | OmniParser for Pure Vision Based GUI Agent | 2024-08-01 | Code
11 | OS-Atlas-Base-4B | 68 | No | OS-ATLAS: A Foundation Action Model for Generali... | 2024-10-30 | Code
12 | SeeClick | 53.4 | No | SeeClick: Harnessing GUI Grounding for Advanced ... | 2024-01-17 | Code
13 | CogAgent | 47.4 | No | CogAgent: A Visual Language Model for GUI Agents | 2023-12-14 | Code
14 | Qwen2-VL-7B | 42.1 | No | Qwen2-VL: Enhancing Vision-Language Model's Perc... | 2024-09-18 | Code
15 | Qwen-GUI | 28.6 | No | GUICourse: From General Vision Language Models t... | 2024-06-17 | Code
16 | MiniGPT-v2 | 5.7 | No | MiniGPT-v2: large language model as a unified in... | 2023-10-14 | Code
17 | Groma | 5.2 | No | Groma: Localized Visual Tokenization for Groundi... | 2024-04-19 | Code
18 | Qwen-VL | 5.2 | No | Qwen-VL: A Versatile Vision-Language Model for U... | 2023-08-24 | Code