Endor Labs Launches Agentic Code Security Benchmark and Public Leaderboard

April 15, 2026
Endor Labs has introduced the Agentic Code Security Benchmark and the Agent Security League leaderboard to evaluate the security and functionality of AI-generated code, extending the SusVibes framework from Carnegie Mellon University researchers.

In a press release, Endor Labs announced the launch of its Agentic Code Security Benchmark, a new evaluation framework designed to measure how securely AI coding agents generate code under real-world conditions. The benchmark extends the SusVibes framework developed by researchers at Carnegie Mellon University and introduces the Agent Security League, a public leaderboard that tracks agents' performance on both functional correctness and security.

The benchmark assesses 200 tasks drawn from 108 open-source projects and covers 77 vulnerability classes defined in the Common Weakness Enumeration (CWE) system. Endor Labs added new test harnesses for agents such as Cursor and implemented anti-cheating safeguards and automated detection systems to ensure fair evaluation. The results show a significant gap between code functionality and security: the top-performing agent achieved 84.4% functional correctness but only 17.3% security correctness.
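The press release does not detail how these scores are computed, but conceptually each benchmark task yields two pass/fail outcomes: one from the task's functional tests and one from a security check against the targeted CWE classes. The following is a minimal sketch of that two-metric scoring; the TaskResult structure, its field names, and the synthetic results are assumptions for illustration, not the benchmark's published schema.

```python
from dataclasses import dataclass

@dataclass
class TaskResult:
    """Hypothetical per-task outcome; field names are illustrative,
    not the benchmark's published schema."""
    task_id: str
    functional_pass: bool  # generated code passed the task's functional tests
    security_pass: bool    # generated code avoided the task's targeted CWE weaknesses

def score_agent(results: list[TaskResult]) -> tuple[float, float]:
    """Return (functional correctness, security correctness) as fractions of tasks passed."""
    n = len(results)
    functional = sum(r.functional_pass for r in results) / n
    security = sum(r.security_pass for r in results) / n
    return functional, security

# Example: a synthetic agent that solves most tasks but rarely does so
# securely, mirroring the kind of gap the leaderboard reports.
results = [
    TaskResult(f"task-{i}", functional_pass=(i % 10 < 8), security_pass=(i % 10 == 0))
    for i in range(200)
]
functional, security = score_agent(results)
print(f"functional: {functional:.1%}, security: {security:.1%}")
```

Under this framing, an agent can score highly on functionality while failing most security checks, which is exactly the divergence the 84.4% versus 17.3% figures capture.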

The leaderboard compares various agent and model combinations, including OpenAI Codex with GPT 5.4 and Cursor with Claude Opus 4.6, and reports that 87% of AI-generated code samples contained at least one security vulnerability. The leaderboard will be updated as new agents and models are released, giving developers and security teams an ongoing reference for evaluating AI coding tools.
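The article does not specify how the Agent Security League orders its entries. The sketch below shows one plausible way agent-and-model pairings could be represented and ranked; the LeaderboardEntry structure, the placeholder scores, and the security-first sort key are all assumptions for illustration rather than Endor Labs' published format.

```python
from typing import NamedTuple

class LeaderboardEntry(NamedTuple):
    """Hypothetical leaderboard row; not the Agent Security League's actual schema."""
    agent: str
    model: str
    functional: float  # fraction of tasks passing functional tests
    security: float    # fraction of tasks free of the targeted weaknesses

# Placeholder scores for illustration only; the article does not publish
# per-combination figures for these pairings.
entries = [
    LeaderboardEntry("OpenAI Codex", "GPT 5.4", 0.81, 0.16),
    LeaderboardEntry("Cursor", "Claude Opus 4.6", 0.78, 0.14),
]

# One plausible ranking: security correctness first, then functional correctness.
for e in sorted(entries, key=lambda e: (e.security, e.functional), reverse=True):
    print(f"{e.agent} + {e.model}: functional {e.functional:.0%}, security {e.security:.0%}")
```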

This initiative builds on Endor Labs' previous research and products aimed at improving software supply chain security, including its AURI security harness for integrating real-time security context into AI-assisted development workflows.
