Skyhawk Security Launches Comprehensive Generative AI Benchmark Ranking LLMs Based on Cyber Threat Scoring Capabilities

Free resource analyzes the performance of ChatGPT, Google Bard, Claude and LLAMA2-based open LLMs

TEL AVIV, Israel, Sept. 20, 2023 (GLOBE NEWSWIRE) — Skyhawk Security, the originator of cloud threat detection and response, today launched the industry’s first benchmark for evaluating large language models’ (LLMs) ability to identify and score cybersecurity threats within various cloud logs and telemetries. The resource also provides a ranking of these LLMs based on their performance. As part of efforts to strengthen the broader cloud security industry, the data will be regularly updated and available to view free of charge on Skyhawk’s website.

The benchmark and LLM leaderboard will be formally presented today during a session led by Skyhawk’s Director of AI and Research, Amir Shachar, at the Cloud Security Alliance’s SECtember conference. The session takes place at 1:30 p.m. Pacific in room 405.

“The importance of swiftly and effectively detecting cloud security threats cannot be overstated. We firmly believe that harnessing generative AI can greatly benefit security teams in that regard, however, not all large language models are created equal,” said Amir Shachar. “In creating this benchmark, we hope to increase confidence in the power of LLMs for cloud security by providing a clear view of how well these tools can classify malicious activities. We’re testing them for you on human-labeled attack flow sequences based on business-driven evaluation metrics. We also integrate human security researchers’ insights with self-improving LLM-based AI agents to enhance the classification process.”

In this benchmark, Skyhawk looks at ChatGPT, Google Bard, Falcon and other LLAMA2-based open LLMs. The goal was to see how accurately each of these LLMs predicted the maliciousness of an attack sequence that was extracted and created by Skyhawk Security’s machine learning models. The output from the LLMs was compared to a sample of hundreds of human-labeled sequences and scored in three ways: Precision, Recall and F1 Score. The closer each score is to one, the more accurately the LLM classifies attack sequences.
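For readers unfamiliar with the three scores, the following is a minimal sketch of how Precision, Recall and F1 Score are conventionally computed when a model's verdicts are compared against human labels. It is not Skyhawk's evaluation code; the function name `score_predictions`, the `Scores` container and the six sample labels are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    precision: float
    recall: float
    f1: float


def score_predictions(human_labels, llm_predictions):
    """Compare LLM verdicts (True = malicious) against human labels.

    Hypothetical helper for illustration only; uses the standard
    textbook definitions of precision, recall and F1.
    """
    if len(human_labels) != len(llm_predictions):
        raise ValueError("label and prediction lists must be the same length")

    pairs = list(zip(human_labels, llm_predictions))
    true_pos = sum(1 for human, llm in pairs if human and llm)
    false_pos = sum(1 for human, llm in pairs if not human and llm)
    false_neg = sum(1 for human, llm in pairs if human and not llm)

    # Precision: of the sequences flagged as malicious, how many really were.
    flagged = true_pos + false_pos
    precision = true_pos / flagged if flagged else 0.0

    # Recall: of the truly malicious sequences, how many were flagged.
    actual = true_pos + false_neg
    recall = true_pos / actual if actual else 0.0

    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0

    return Scores(precision, recall, f1)


# Made-up sample: six sequences, True means "malicious".
human_labels = [True, True, True, False, False, True]
llm_predictions = [True, True, False, True, False, True]

scores = score_predictions(human_labels, llm_predictions)
print(scores)
```

In this made-up sample the model flags four sequences, three of them correctly, and misses one of the four truly malicious ones, so all three scores come out at 0.75. A model that matched every human label would score 1.0 on each.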

The release of Skyhawk’s LLM benchmark reinforces the company’s dedication to innovating with generative AI in the cloud security space. The news comes on the heels of the launch of Skyhawk’s Shift Left CDR solution within its existing Skyhawk Synthesis Security Platform. The novel approach shifts threat detection to the “left,” that is, to the perimeter of the cloud network and to identity and access management (IAM). Skyhawk’s cloud threat detection and response uses contextual analysis of the cloud infrastructure to determine the potential paths hackers could take to a company’s “crown jewels.” This information enables security teams to identify serious threats much earlier in an incident and to prioritize those that pose the highest risk to the crown jewels, before they become a breach.

To learn more about Skyhawk Security’s product offering, visit https://skyhawk.security/. For continuing updates follow Skyhawk Security on LinkedIn and Twitter.

About Skyhawk Security
Skyhawk Security is the originator of Cloud Threat Detection and Response (CDR), helping hundreds of users map and remediate sophisticated threats to cloud infrastructure in minutes. Led by a team of cybersecurity and cloud professionals who built the original CSPM category, Skyhawk Security evolves cloud security posture management far beyond scanning and static configuration analysis. Instead, using advanced generative AI and ML sequencing of context-based behaviors, Skyhawk provides CDR within a ‘Runtime Hub’ to quickly detect and remediate malicious activities across multiple cloud platforms as they happen. Skyhawk Security is a spin-off of Radware® (NASDAQ:RDWR).

Media Contacts: 
Sherlyn Rijos-Altman
Montner Tech PR
srijos@montner.com
