      BLOG
      OpenAI Is Warning Us
      New Models Will Become A Cybersecurity Nightmare
      12th December 2025

The industry’s cybersecurity landscape is evolving rapidly, shaped by advances in AI-driven hacking and defense tools. A recent Stanford University experiment with an AI system named Artemis illustrates just how far those tools have come.

Artemis is an AI-powered vulnerability-discovery and exploitation engine developed and tested by Stanford researchers. Modeled after techniques used by sophisticated threat actors, including China-linked hackers identified by Anthropic, Artemis autonomously scans networks, identifies vulnerabilities, and attempts to exploit them. The research team deployed Artemis against Stanford’s own School of Engineering network and benchmarked its performance against ten professional penetration testers. Contrary to the researchers’ expectation that the system would perform below average, Artemis outperformed nearly all of the human testers, finding bugs quickly and at dramatically lower cost: roughly $60 per hour of operation, versus the $2,000–$2,500 a professional tester typically charges per day.

However, the system was not flawless. Roughly 18% of the bugs Artemis reported were false positives, and it completely missed a simple vulnerability that most of the human testers found. Even so, Artemis demonstrated unique advantages: it uncovered an obscure issue on an outdated webpage by requesting it with curl, a widely used command-line tool that returns the raw server response rather than rendering the page as a browser would. The find highlighted how AI systems can surface classes of bugs that humans overlook because of assumptions about how normal tools behave.
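To make that trick concrete, here is a minimal sketch of the idea; it is not the Artemis tooling, and the URL and User-Agent string are placeholders. Fetching a page with a curl-style client exposes raw response content that a browser would render away or hide:

```python
# Minimal sketch: fetch a page the way curl does, without browser rendering,
# and inspect artifacts a human looking at the rendered page would miss.
# The URL and User-Agent below are placeholders, not the Stanford target.
import urllib.request

URL = "https://example.com/"

req = urllib.request.Request(URL, headers={"User-Agent": "curl/8.5.0"})
with urllib.request.urlopen(req, timeout=10) as resp:
    raw = resp.read().decode("utf-8", errors="replace")

# HTML comments, debug strings, and stale endpoints survive in the raw
# response even when the rendered page looks empty or ordinary.
for line in raw.splitlines():
    if "<!--" in line or "debug" in line.lower():
        print(line.strip())
```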

Stanford’s security leadership viewed the experiment as beneficial, emphasizing both the controlled environment (Artemis had an instant kill switch) and the value of identifying real gaps in a production academic network. Experts noted substantial long-term defensive benefits from using AI to examine vast quantities of untested code, but also warned of a short-term risk: large amounts of existing software have never been audited by advanced LLMs and may contain vulnerabilities that AI systems can now uncover at scale.

The Artemis results reflect a broader industry shift. Many security researchers, approximately 70% according to HackerOne, now use AI tools to accelerate bug discovery. At the same time, developers are encountering both low-quality and exceptionally high-quality AI-generated bug reports, a sign that AI has become deeply embedded in the software-security ecosystem. The episode also connects to Anthropic’s findings that Chinese threat groups used generative models for offensive operations, allegations that China has denied.

OpenAI’s recent and related warning supplements this context. The company has declared that forthcoming model generations may pose “high” cybersecurity risks, potentially capable of discovering zero-day vulnerabilities or aiding complex intrusion operations. OpenAI outlines its investments in defensive tooling, enhanced access controls, stronger infrastructure protections, and the creation of a Frontier Risk Council to advise on cybersecurity and future high-risk AI capabilities. A parallel threat vector is the malicious use of AI chatbots for social-engineering attacks: by generating convincing phishing emails and mimicking individual writing styles, AI systems let adversaries craft far more persuasive messages designed to steal personal or organizational data.
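The defensive side of that vector is equally amenable to automation. As a toy illustration only (this is not OpenAI’s tooling, production detection relies on far richer models, and the signal list and weights below are assumptions for the sketch), a few heuristics can flag classic phishing tells in an email body:

```python
# Illustrative only: a naive heuristic scorer for phishing signals.
# Real defenses use trained models; these patterns and weights are toy
# assumptions chosen for the sketch.
import re

SIGNALS = {
    r"verify your (account|password)": 2,
    r"urgent|immediately|within 24 hours": 1,
    r"click (here|the link below)": 1,
    r"https?://\d{1,3}(\.\d{1,3}){3}": 3,  # raw-IP links are a classic tell
}

def phishing_score(body: str) -> int:
    """Sum the weights of matched signals; higher means more suspicious."""
    return sum(weight for pattern, weight in SIGNALS.items()
               if re.search(pattern, body, re.IGNORECASE))

sample = "URGENT: verify your account within 24 hours at http://203.0.113.7/login"
print(phishing_score(sample))  # -> 6
```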

For a broader view of the impact of generative and agentic AI in cybersecurity, please see my latest report, "The AI Arms Race - Gen and Agentic AI in Cybersecurity".

      Author
Keith Raymond
      Principal Analyst
      Details
      Geographic Focus
      Asia-Pacific, EMEA, LATAM, North America
      Horizontal Topics
      Artificial Intelligence, Artificial Intelligence - Generative AI e.g. ChatGPT, Emerging Technologies, Risk: Cybersecurity, Identity and Trust
      Industry
      Life Insurance, Property & Casualty Insurance
      Mentioned Company
      OpenAI