Endor Labs Launches Agentic Code Security Benchmark and Public Leaderboard

April 15, 2026
Endor Labs has introduced the Agentic Code Security Benchmark and the Agent Security League leaderboard to evaluate the security and functionality of AI-generated code, extending the SusVibes framework from Carnegie Mellon University researchers.

In a press release, Endor Labs announced the launch of its Agentic Code Security Benchmark, a new evaluation framework designed to measure how securely AI coding agents generate code under real-world conditions. The benchmark extends the SusVibes framework developed by researchers at Carnegie Mellon University and introduces the Agent Security League, a public leaderboard that tracks agents' performance on both functional correctness and security.

The benchmark assesses 200 tasks drawn from 108 open-source projects and covers 77 vulnerability classes defined in the Common Weakness Enumeration (CWE) system. Endor Labs added new test harnesses for agents such as Cursor and implemented anti-cheating safeguards and automated detection systems to ensure fair evaluation. The results show a significant gap between code functionality and security: the top-performing agent achieved 84.4% functional correctness but only 17.3% security correctness.
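The press release does not detail how these scores are computed, but conceptually each benchmark task yields two pass/fail outcomes: one from the task's functional tests and one from a security check against the targeted CWE classes. The following is a minimal sketch of that two-metric scoring; the TaskResult structure, its field names, and the synthetic results are assumptions for illustration, not the benchmark's published schema.

```python
from dataclasses import dataclass

@dataclass
class TaskResult:
    """Hypothetical per-task outcome; field names are illustrative,
    not the benchmark's published schema."""
    task_id: str
    functional_pass: bool  # generated code passed the task's functional tests
    security_pass: bool    # generated code avoided the task's targeted CWE weaknesses

def score_agent(results: list[TaskResult]) -> tuple[float, float]:
    """Return (functional correctness, security correctness) as fractions of tasks passed."""
    n = len(results)
    functional = sum(r.functional_pass for r in results) / n
    security = sum(r.security_pass for r in results) / n
    return functional, security

# Example: a synthetic agent that solves most tasks but rarely does so
# securely, mirroring the kind of gap the leaderboard reports.
results = [
    TaskResult(f"task-{i}", functional_pass=(i % 10 < 8), security_pass=(i % 10 == 0))
    for i in range(200)
]
functional, security = score_agent(results)
print(f"functional: {functional:.1%}, security: {security:.1%}")
```

Under this framing, an agent can score highly on functionality while failing most security checks, which is exactly the divergence the 84.4% versus 17.3% figures capture.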

The leaderboard compares various agent and model combinations, including OpenAI Codex with GPT 5.4 and Cursor with Claude Opus 4.6, and reports that 87% of AI-generated code samples contained at least one security vulnerability. The leaderboard will be updated as new agents and models are released, giving developers and security teams an ongoing reference for evaluating AI coding tools.
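The article does not specify how the Agent Security League orders its entries. The sketch below shows one plausible way agent-and-model pairings could be represented and ranked; the LeaderboardEntry structure, the placeholder scores, and the security-first sort key are all assumptions for illustration rather than Endor Labs' published format.

```python
from typing import NamedTuple

class LeaderboardEntry(NamedTuple):
    """Hypothetical leaderboard row; not the Agent Security League's actual schema."""
    agent: str
    model: str
    functional: float  # fraction of tasks passing functional tests
    security: float    # fraction of tasks free of the targeted weaknesses

# Placeholder scores for illustration only; the article does not publish
# per-combination figures for these pairings.
entries = [
    LeaderboardEntry("OpenAI Codex", "GPT 5.4", 0.81, 0.16),
    LeaderboardEntry("Cursor", "Claude Opus 4.6", 0.78, 0.14),
]

# One plausible ranking: security correctness first, then functional correctness.
for e in sorted(entries, key=lambda e: (e.security, e.functional), reverse=True):
    print(f"{e.agent} + {e.model}: functional {e.functional:.0%}, security {e.security:.0%}")
```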

This initiative builds on Endor Labs' previous research and products aimed at improving software supply chain security, including its AURI security harness for integrating real-time security context into AI-assisted development workflows.
