🛠️ Perception Accuracy (PA) and Hallucination Resistance (HR) scores are reported.
Models are ranked by the average of their PA and HR scores. 🥇🥈🥉 mark the top three models.
Visual-Audio-LLMs
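Columns after Overall fall into two groups. Spurious Inter-modality Correlations: Visual-Language, Audio-Language, Visual-Audio-Language. Overreliance on Unimodal Priors: Visual Dominance, Audio Dominance, Language Dominance.

# | Model | Date | Overall PA | Overall HR | Visual-Language PA | Visual-Language HR | Audio-Language PA | Audio-Language HR | Visual-Audio-Language PA | Visual-Audio-Language HR | Visual Dominance PA | Visual Dominance HR | Audio Dominance PA | Audio Dominance HR | Language Dominance PA | Language Dominance HR |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|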
1 | Gemini-1.5-flash🥇 | 2024-10-04 | 88.4 | 64.2 | 93.5 | 90.0 | 88.5 | 39.5 | 88.5 | 70.5 | 79.0 | 36.5 | 90.5 | 86.5 | 90.5 | 62.0 |
2 | Gemini-1.5-pro🥈 | 2024-10-04 | 87.1 | 58.3 | 91.0 | 90.5 | 94.0 | 14.5 | 86.0 | 67.0 | 82.5 | 34.0 | 90.5 | 82.0 | 78.5 | 61.5 |
3 | Reka-Core🥉 | 2024-10-04 | 63.7 | 80.9 | 87.0 | 94.5 | 25.0 | 76.0 | 76.7 | 85.1 | 35.6 | 69.4 | 80.8 | 82.7 | 75.0 | 76.0 |
4 | VideoLLaMA2 | 2024-10-04 | 71.7 | 81.1 | 75.0 | 86.0 | 77.5 | 94.0 | 78.0 | 98.0 | 62.0 | 75.5 | 80.0 | 90.0 | 57.5 | 43.0 |
5 | FAVOR | 2024-10-04 | 92.2 | 42.1 | 91.0 | 55.0 | 94.5 | 45.0 | 94.5 | 69.0 | 89.0 | 21.5 | 92.0 | 43.5 | 92.0 | 18.5 |
6 | GroundingGPT | 2024-10-04 | 96.6 | 14.3 | 95.5 | 36.5 | 100 | 0.0 | 97.5 | 18.0 | 99.5 | 1.0 | 98.5 | 23.5 | 88.5 | 7.0 |
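
The averaging rule stated above (the mean of PA and HR) can be sketched in a few lines of Python. This is a minimal illustration, not the leaderboard's own code: the function name `overall_score` and the dictionary `overall` are hypothetical, and the values are the Overall PA and HR figures copied from the table. It only computes the per-model mean and makes no claim about how the leaderboard orders rows or breaks ties.

```python
def overall_score(pa: float, hr: float) -> float:
    """Return the mean of Perception Accuracy (PA) and Hallucination Resistance (HR)."""
    return (pa + hr) / 2.0


# Overall (PA, HR) pairs copied from the table above.
# The dictionary name and layout are illustrative assumptions.
overall = {
    "Gemini-1.5-flash": (88.4, 64.2),
    "Gemini-1.5-pro": (87.1, 58.3),
    "Reka-Core": (63.7, 80.9),
    "VideoLLaMA2": (71.7, 81.1),
    "FAVOR": (92.2, 42.1),
    "GroundingGPT": (96.6, 14.3),
}

# Per-model average of PA and HR, rounded to two decimals for display.
averages = {
    model: round(overall_score(pa, hr), 2)
    for model, (pa, hr) in overall.items()
}

for model, avg in averages.items():
    print(f"{model}: {avg}")
```

Averaging the two scores means a model cannot score well on perception alone: a very high PA paired with a very low HR still yields a modest mean.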