frontend Importance 4/5 2026/5/25 4:00:00

AI evaluation and reliability research paper published on arXiv: "BOHM: Zero-Cost Hierarchical Attribution for Compound AI Systems"

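"BOHM: Zero-Cost Hierarchical Attribution for Compound AI Systems" has been published on arXiv. It is a research-stage proposal, but it is worth noting as material for revisiting assumptions about implementation, evaluation, and safety.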

arXiv:2605.22866v1

Abstract: Compound AI systems route tasks through hierarchies of specialised components. Attribution is dominated by Shapley-based methods (SHAP), which decompose a coalition value function into per-component marginal contributions and require evaluation of the system on arbitrary component subsets. That requirement fails for third-party APIs, opaque endpoints, and agentic orchestrators that concentrate routing on a few tools, leaving most coalitions un-evaluable from the deployed orchestrator. We introduce BOHM, which extracts a hierarchical attribution tree directly from the routing weights such systems already maintain: leaf attribution is the path product of root-to-leaf routing weights; level-k attribution is the induced distribution over depth-k nodes. The method has zero marginal cost, requires no access to component internals, and provides multi-resolution attribution at every level simultaneously, which flat methods cannot offer at any evaluation budget. BOHM and SHAP answer different questions and converge when the deployed router routes near-optimally. On 18 LLMs in a 3-level hierarchy over 880 LiveCodeBench problems, BOHM yields Kendall tau=0.928; SHAP reaches tau=0.980 at 9,000x more coalition evaluations per seed. On a 5-driver, 7-benchmark agentic study (35 cells, complete coverage), drivers concentrate routing on a single tool (top-share median 0.65), and cell-level tau(BOHM,SHAP) is predicted by whether the driver's top pick is the empirically best tool (mean +0.22 vs ~+0.01). On a US Census hier…
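The abstract's two definitions (leaf attribution as the path product of routing weights, and level-k attribution as the induced distribution over depth-k nodes) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the tree layout, the names `ROUTING`, `level_attribution`, and `leaf_attribution`, and all weights are hypothetical, and the sketch assumes the routing weights at each node sum to 1.

```python
# Hypothetical routing tree (not from the paper).
# Each node maps child name -> (routing weight, subtree); a leaf has an empty subtree.
# Assumption: the weights of the children of any node sum to 1.
ROUTING = {
    "code": (0.6, {"model_a": (0.5, {}), "model_b": (0.5, {})}),
    "math": (0.4, {"model_c": (0.25, {}), "model_d": (0.75, {})}),
}


def level_attribution(tree, k, prefix=(), mass=1.0):
    """Return {path: attribution} for every node at depth k (root is depth 0).

    The attribution of a node is the product of routing weights on the
    path from the root to that node.
    """
    if k == 0:
        return {prefix: mass}
    out = {}
    for name, (weight, subtree) in tree.items():
        out.update(level_attribution(subtree, k - 1, prefix + (name,), mass * weight))
    return out


def leaf_attribution(tree, prefix=(), mass=1.0):
    """Return {path: attribution} for every leaf, as the root-to-leaf path product."""
    out = {}
    for name, (weight, subtree) in tree.items():
        path = prefix + (name,)
        if subtree:
            out.update(leaf_attribution(subtree, path, mass * weight))
        else:
            out[path] = mass * weight
    return out


if __name__ == "__main__":
    for path, value in leaf_attribution(ROUTING).items():
        print("/".join(path), round(value, 3))
    for path, value in level_attribution(ROUTING, 1).items():
        print("/".join(path), round(value, 3))
```

Because only the routing weights are read, no component is ever evaluated, which is the sense in which the abstract describes the method as having zero marginal cost. In this toy tree the depth-1 attribution of a node also equals the sum of the leaf attributions beneath it.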

Source: arXiv

This is a brief summary of the key points. See the source for details.
