frontend Importance 4/5 2026/5/12 4:00:00

AI evaluation and reliability research paper published on arXiv: "Where Reliability Lives in Vision-Language Models: A Mechanistic Study…"

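The paper "Where Reliability Lives in Vision-Language Models: A Mechanistic Study of Attention, Hidden States, and Causal Circuits" has been posted to arXiv. It is a research-stage proposal, but it is worth attention as material for revisiting assumptions about implementation, evaluation, and safety.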

arXiv:2605.08200v1 (announce type: new)

Abstract: A pervasive intuition holds that vision-language models (VLMs) are most trustworthy when their attention maps look sharp: concentrated attention on the queried region should imply a confident, calibrated answer. We test this Attention-Confidence Assumption directly. We instrument three open-weight VLM families (LLaVA-1.5, PaliGemma, Qwen2-VL; 3-7B parameters) with a unified mechanistic pipeline -- the VLM Reliability Probe (VRP) -- that compares attention structure, generation dynamics, and hidden-state geometry against a single correctness label. Three results emerge. (i) Attention structure is a near-zero predictor of correctness (R_pb(C_k,y)=0.001, 95% CI [-0.034,0.036]; R_pb(H_s,y)=-0.012, [-0.047,0.024] on a pooled n=3,090 split), even though attention remains causally necessary for feature extraction (top-30% patch masking drops accuracy by 8.2-11.3 pp, p [...] 0.95 on POPE for two of three families, and self-consistency at K=10 is the strongest behavioral predictor we measure at 10x inference cost (R_pb=0.43). (iii) Causal neuron-level ablations expose a sharp architectural split with direct monitor-design implications: late-fusion LLaVA concentrates reliability in a fragile late bottleneck (-8.3 pp object-identification accuracy after top-5 probe-neuron ablation), whereas early-fusion PaliGemma and Qwen2-VL distribute it widely and absorb destruction of ~50% of their peak-layer hidden dimension with <=1 pp degradation. The takeaway is narrow but c…
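The R_pb values in the abstract are point-biserial correlations, which are Pearson correlations between a continuous score and a binary correctness label. The sketch below shows one way such a number and its interval could be computed. The function names `point_biserial` and `fisher_ci` are hypothetical, and the Fisher z interval is an assumption: the abstract does not say how its confidence intervals were obtained.

```python
import math
from statistics import NormalDist


def point_biserial(scores, labels):
    """Pearson correlation between a continuous score and a 0/1 label."""
    n = len(scores)
    if n != len(labels) or n < 4:
        raise ValueError("need equal-length inputs with at least 4 samples")
    mean_s = sum(scores) / n
    mean_y = sum(labels) / n
    cov = sum((s - mean_s) * (y - mean_y) for s, y in zip(scores, labels))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_y = sum((y - mean_y) ** 2 for y in labels)
    if var_s == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(var_s * var_y)


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a correlation via Fisher's z.

    Assumption: this is a textbook interval, not necessarily the method
    used in the paper.
    """
    z = math.atanh(r)                    # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)          # standard error in z-space
    q = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - q * se), math.tanh(z + q * se)


if __name__ == "__main__":
    # Toy data: a made-up attention score and a correctness label.
    toy_scores = [1.0, 2.0, 3.0, 4.0]
    toy_labels = [0, 0, 1, 1]
    print(point_biserial(toy_scores, toy_labels))

    # Interval width for the r and n quoted in the abstract.
    low, high = fisher_ci(0.001, 3090)
    print(round(low, 3), round(high, 3))  # -0.034 0.036
```

With the r = 0.001 and n = 3,090 quoted in the abstract, this textbook interval comes out at roughly [-0.034, 0.036], which agrees with the reported range. That agreement only shows the interval width is what one would expect at this sample size; it does not confirm the authors' procedure.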

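Reporter Ferret's terminology notes

arxiv

For arxiv, knowing the term is not enough; understanding what the technology can improve is what pays off in practice.

Comparison: baseline

research

For research, knowing the term is not enough; understanding what the technology can improve is what pays off in practice.

Comparison: baseline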

Source: arXiv

This article presents a brief summary of the key points. See the source for details.
