|

How to Manipulate Large Language Models, Pt. II

By

Nate Nelson, Contributing Writer

By

Access Point Consulting

When experts and laypeople talk about the threat of artificial intelligence (AI), almost always they refer to one of two phenomena:

The broad, overarching fear that general AI could disrupt society in such a way that would cause irreparable harm to our existing way of life.
The more pointed damage people can do with the help of generative AI-based applications, like spreading fake news and deepfakes, planning crimes, etc.

There is, however, an entire third category of threat that receives far less air time: the ways in which AI can be weaponized to accelerate existing cybersecurity threats, sometimes beyond what’s capable with traditional malware.

Large language models (LLMs) in particular are fast becoming the perfect vehicle for internet-based attacks. The more ubiquitous, embedded, and trusted they are, the more malicious damage they can wreak. In recent years, researchers at the cutting edge of cybersecurity have been projecting and modeling how LLMs might be manipulated to conduct unique, advanced cyberattacks. Sometimes, these tactics resemble existing techniques with an added twist. Sometimes, they signal threats stealthier and faster-spreading than anything we’ve seen to date.

Here are just some of the ways LLMs could be weaponized in years to come:

Manipulating the weights inside an LLM (MaleficNet)

In theory, attackers with skill and motive could hack an LLM itself. They could download an open-source model from the web, then tinker with it to cause some sort of undesirable outcome.

In practice, though, pulling this off could be tricky. How would they hide their imprint from whoever then receives and uses the model?

In March, a team of seven European researchers came up with a solution they called “MaleficNet 2.0.”

The key to MaleficNet came out of left field: code division multiple access (CDMA), a radio communications technology common in older 3G mobile phones, which allows multiple transmitters to communicate over a single channel.

Using CDMA, the researchers essentially dissolved a malware payload into its constituent 0s and 1s, and spread those bits evenly across the millions of individual weights that comprise an LLM. This made the malware all but undetectable since, to an observer, no bit or group of bits would resemble anything but noise, if even that. And yet, a simple activation command was all they needed to wake the program, bringing all of its pieces together to execute whatever malicious acts they wished it to carry out.

Poisoning the serialization process (Sleepy Pickle)

MaleficNet is sophisticated—realistic only for hackers with significant skill and motivation. By contrast, in June, a security engineer developed a far easier method to achieve the same end, without sacrificing much by way of stealth or impact. He called it “sleepy pickle.”

With this method, a hacker ignores the model entirely, instead focusing on how it’s stored and distributed. They package an otherwise legitimate and untainted LLM inside of a Pickle serialization file (.pkl), which stores Python objects as bytecode. Alongside that LLM in the Pickle file, they inject a malicious payload.

When a victim executes the file and triggers the deserialization process, the payload poisons the model it came with. It would be difficult to detect this using any kind of static analysis, and no trace of malware is left on the disk.

That malware, meanwhile, can be designed to do any number of things. It can manipulate the model’s parameters, or its code. It can be made to steal data, or manipulate the output of the LLM. In his blog post, the engineer demonstrated how a sleepy pickle-d LLM could be made to suggest bleach as a cure for the flu.

Indirect prompt injection

Now what if, unlike MaleficNet and Sleepy Pickle, you could get an LLM to do bad things without even touching it? Six researchers at last year’s Black Hat demonstrated how, by leveraging perhaps the most significant, least solvable security flaw in LLMs today.

All they did was prompt a local instance of the ChatGPT-integrated Bing search engine. The prompt triggered Bing to load an HTML file. To the naked eye, the file seemed harmless. But hidden inside of it—for example, in a font that was white against a white background, or so small as to be unreadable—was a prompt which instructed the AI to carry out malicious behavior.

“Indirect prompt injection” works because LLMs like ChatGPT are trained on trillions of data points—far too many to be entirely labeled by humans. As a result, they don’t have a surefire mechanism for distinguishing instructions from data. Taking advantage of this fact is as easy as editing a Wikipedia page, an image, or a website which a chatbot might query, to include any kind of malicious instructions one can describe in a prompt.

Self-replicating prompt injection (Morris II)

Prompt injection can be scaled, too, almost without limit.

Earlier this year, a team of Israeli researchers developed what they called “Morris II,” after the infamous Morris worm which ripped through the early internet in the late 1980s. The name signaled just how dangerous they believed their creation to be. In practice it’s sophisticated, but the underlying premise is simple:

Where in indirect prompt injection an attacker hides a prompt in data with the aim of tricking an AI to produce a malicious output, with Morris II that output is, itself, a prompt for yet another AI. A “self-replicating adversarial prompt.”

Morris II, perhaps, best encapsulates what makes threats from LLMs so different than anything we’ve seen before. Not only can it carry just about any kind of cyber threat but, as AI becomes more and more integrated into everything we do, it will be able to spread faster and less perceptibly than all but the most notorious cyber worms of history. One person’s AI assistant could spread a self-replicating adversarial prompt to everyone else’s, limited only by the speed at which such information could travel, without any human being taking part in the process.

As with their impact on the rest of society, the scale of LLMs’ threat to cybersecurity may well exceed any other technology of recent decades. And it may be that the only way to fight back is to harness this same technology in our defense. It’s a new cat and mouse game, the cliche goes, where the consequences of losing will be greater than ever before.

‍

Resources

Latest Resources

April 10, 2025

Employing the Concept of “Continuity of Care” in Cybersecurity

My wife, Kelly, was a pediatric nurse, having worked in healthcare for over 30 years. I'm biased, but she always got high marks in her profession, from both her peers and from patients for whom she provided care. She provided a level of care that was absolutely critical to ensure patients receive consistent, high-quality treatment across all stages of care. The importance of documentation, communication and a continuity of care was imperative – children’s lives depended on it. But what does continuity of care look like outside the world of healthcare? In the realm of cybersecurity consulting, the principle of continuity is just as vital and plays a pivotal role in safeguarding organizations from evolving cyber threats.

Find out more

April 2, 2025

Scott "Monty" Montgomery (Island) | Navigating CMMC compliance for organizations of every size

Scott Montgomery, known as Monty, joined the CyberWatch Expert Series podcast to discuss his extensive background in cybersecurity, particularly in building and designing network security tools for high-assurance environments like the Department of Defense (DoD) and the intelligence community. His experience includes significant tenure at McAfee (now Trellix), which led him to his current role at Island, where he focuses on innovative approaches to cybersecurity compliance.

Find out more

February 24, 2025

Access Point Consulting Announces MSSP Partnership with Fortinet

Access Point Consulting is pleased to announce that it has become a Fortinet Managed Security Services Provider (MSSP) partner. This partnership places Access Point Consulting among a select group of providers in the Mid-Atlantic region that can offer Fortinet security solutions as both a Certified Fortinet Partner and a Fortinet MSSP.

Find out more

Resources

CyberWatch

April 2, 2025

Scott "Monty" Montgomery (Island) | Navigating CMMC compliance for organizations of every size

Scott Montgomery, known as Monty, joined the CyberWatch Expert Series podcast to discuss his extensive background in cybersecurity, particularly in building and designing network security tools for high-assurance environments like the Department of Defense (DoD) and the intelligence community. His experience includes significant tenure at McAfee (now Trellix), which led him to his current role at Island, where he focuses on innovative approaches to cybersecurity compliance.

Find out more

March 19, 2025

Michael Sviben (DomainGuard) | Defending against phishing and building proactive security awareness

Cybersecurity threats evolve rapidly, and one tactic consistently rises above the rest: phishing. In this episode of CyberWatch, Michael Sviben, co-founder of DomainGuard, discusses why phishing remains so effective, how businesses and individuals become targets, and what you can do to stay vigilant.

Find out more

March 5, 2025

David Habib (Brightspot) | Building a culture of cybersecurity awareness

Cybersecurity awareness is often reduced to check-the-box training, but David Habib, CIO at Brightspot, argues that real security awareness isn’t about formal programs—it’s about making security part of a company’s culture. In this episode, he shares practical insights on how organizations can move beyond stale training sessions to create an engaged and security-conscious workforce.

Find out more

Meet with an Expert

How to Manipulate Large Language Models, Pt. II

Manipulating the weights inside an LLM (MaleficNet)

Poisoning the serialization process (Sleepy Pickle)

Indirect prompt injection

Self-replicating prompt injection (Morris II)

Latest Resources

CyberWatch

Scott "Monty" Montgomery (Island) | Navigating CMMC compliance for organizations of every size

Michael Sviben (DomainGuard) | Defending against phishing and building proactive security awareness

David Habib (Brightspot) | Building a culture of cybersecurity awareness

Peace of mind starts here, at Access Point Consulting