Back to news
security Priority 4/5 6/16/2026, 11:05:16 AM

SEVRA-BENCH Evaluates LLM Code Reviewer Susceptibility to Social Engineering Attacks

SEVRA-BENCH Evaluates LLM Code Reviewer Susceptibility to Social Engineering Attacks

A new research paper published on arXiv introduces SEVRA-BENCH, a security benchmark designed to measure the resilience of large language model code reviewers against adversarial pull requests. As software development pipelines increasingly adopt LLM-based agents to review and approve pull requests, they face the risk of attackers using social engineering alongside malicious code. Standard benchmarks for static vulnerability detection do not capture this threat vector, where an adversary controls both the functional code changes and the persuasive PR description.

Related tools

Recommended tools for this topic

These picks prioritize high-intent tools relevant to this topic. Some links may include partner or affiliate tracking.

#arxiv#research#security#agent#data

Comparison

AspectBefore / AlternativeAfter / This
Evaluation FocusStatic vulnerability detection and direct code generation benchmarksCombined code and social engineering text manipulation in pull requests
Source VulnerabilitiesSynthetic or generic code templatesReal-world CVEs from the top 10 categories of the 2025 CWE Top 25
Reviewer ContextAnalyzing code files in isolationAnalyzing code changes wrapped in 15 different social-engineering framings
Model Performance GapAssumed uniform security improvements across LLMsIdentified sharp security capability gaps between open-source and proprietary models

Action Checklist

  1. Evaluate your current LLM reviewer setup against adversarial contexts Do not rely solely on the model's ability to spot bugs when the PR description is misleading
  2. Avoid granting auto-merge privileges to LLM reviewers Ensure human review remains mandatory for all external and high-impact contributions
  3. Deploy dedicated static analysis tools alongside LLMs Complement language models with traditional deterministic security scanners
  4. Incorporate SEVRA-BENCH principles into internal LLM evaluations Test review pipelines against historical CVE rollbacks to measure detection rates

Source: arXiv

This page summarizes the original source. Check the source for full details.

Related