security Priority 4/5 6/16/2026, 11:05:16 AM

SEVRA-BENCH Evaluates LLM Code Reviewer Susceptibility to Social Engineering Attacks

A new research paper published on arXiv introduces SEVRA-BENCH, a security benchmark designed to measure the resilience of large language model code reviewers against adversarial pull requests. As software development pipelines increasingly adopt LLM-based agents to review and approve pull requests, they face the risk of attackers using social engineering alongside malicious code. Standard benchmarks for static vulnerability detection do not capture this threat vector, where an adversary controls both the functional code changes and the persuasive PR description.

Related tools

Comparison

Aspect	Before / Alternative	After / This
Evaluation Focus	Static vulnerability detection and direct code generation benchmarks	Combined code and social engineering text manipulation in pull requests
Source Vulnerabilities	Synthetic or generic code templates	Real-world CVEs from the top 10 categories of the 2025 CWE Top 25
Reviewer Context	Analyzing code files in isolation	Analyzing code changes wrapped in 15 different social-engineering framings
Model Performance Gap	Assumed uniform security improvements across LLMs	Identified sharp security capability gaps between open-source and proprietary models

Action Checklist

Evaluate your current LLM reviewer setup against adversarial contexts Do not rely solely on the model's ability to spot bugs when the PR description is misleading
Avoid granting auto-merge privileges to LLM reviewers Ensure human review remains mandatory for all external and high-impact contributions
Deploy dedicated static analysis tools alongside LLMs Complement language models with traditional deterministic security scanners
Incorporate SEVRA-BENCH principles into internal LLM evaluations Test review pipelines against historical CVE rollbacks to measure detection rates

Source: arXiv

This page summarizes the original source. Check the source for full details.

More English news Open source

SEVRA-BENCH Evaluates LLM Code Reviewer Susceptibility to Social Engineering Attacks

Recommended tools for this topic

Comparison

Action Checklist

Related