Blog

Insights from the AnalogyAI team

April 14, 2026 · Research

How VLMs Failed on Exhaustive Visual Read-out

We introduce Grid2Matrix (G2M), a diagnostic benchmark that exposes a critical gap in Vision-Language Models: the inability to faithfully read out fine-grained visual structure, a failure we term Digital Agnosia.

March 9, 2026 · Research

Beyond Logical Reasoning: How Far Are Frontier Models from a Professional Artist?

We introduce VULCA-BENCH, a multicultural art-critique benchmark revealing that frontier VLMs suffer a 31–40 percentage-point drop from surface visual perception to deep cultural interpretation.

March 1, 2026 · Research

CFE-Bench: Can AI Pass a Real University Exam?

We introduce CFE-Bench, a multimodal STEM reasoning benchmark built from authentic university exams with instructor-verified solutions. Frontier models top out around 60%.