Editorial

How to Manipulate Large Language Models, Pt. II

By

Nate Nelson, Contributing Writer

By

Access Point Consulting

When experts and laypeople talk about the threat of artificial intelligence (AI), almost always they refer to one of two phenomena:

  • The broad, overarching fear that general AI could disrupt society in such a way that would cause irreparable harm to our existing way of life.
  • The more pointed damage people can do with the help of generative AI-based applications, like spreading fake news and deepfakes, planning crimes, etc.

There is, however, an entire third category of threat that receives far less air time: the ways in which AI can be weaponized to accelerate existing cybersecurity threats, sometimes beyond what’s capable with traditional malware.

Large language models (LLMs) in particular are fast becoming the perfect vehicle for internet-based attacks. The more ubiquitous, embedded, and trusted they are, the more malicious damage they can wreak. In recent years, researchers at the cutting edge of cybersecurity have been projecting and modeling how LLMs might be manipulated to conduct unique, advanced cyberattacks. Sometimes, these tactics resemble existing techniques with an added twist. Sometimes, they signal threats stealthier and faster-spreading than anything we’ve seen to date.

Here are just some of the ways LLMs could be weaponized in years to come:

Manipulating the weights inside an LLM (MaleficNet)

In theory, attackers with skill and motive could hack an LLM itself. They could download an open-source model from the web, then tinker with it to cause some sort of undesirable outcome.

In practice, though, pulling this off could be tricky. How would they hide their imprint from whoever then receives and uses the model?

In March, a team of seven European researchers came up with a solution they called “MaleficNet 2.0.”

The key to MaleficNet came out of left field: code division multiple access (CDMA), a radio communications technology common in older 3G mobile phones, which allows multiple transmitters to communicate over a single channel.

Using CDMA, the researchers essentially dissolved a malware payload into its constituent 0s and 1s, and spread those bits evenly across the millions of individual weights that comprise an LLM. This made the malware all but undetectable since, to an observer, no bit or group of bits would resemble anything but noise, if even that. And yet, a simple activation command was all they needed to wake the program, bringing all of its pieces together to execute whatever malicious acts they wished it to carry out.

Poisoning the serialization process (Sleepy Pickle)

MaleficNet is sophisticated—realistic only for hackers with significant skill and motivation. By contrast, in June, a security engineer developed a far easier method to achieve the same end, without sacrificing much by way of stealth or impact. He called it “sleepy pickle.”

With this method, a hacker ignores the model entirely, instead focusing on how it’s stored and distributed. They package an otherwise legitimate and untainted LLM inside of a Pickle serialization file (.pkl), which stores Python objects as bytecode. Alongside that LLM in the Pickle file, they inject a malicious payload.

When a victim executes the file and triggers the deserialization process, the payload poisons the model it came with. It would be difficult to detect this using any kind of static analysis, and no trace of malware is left on the disk.

That malware, meanwhile, can be designed to do any number of things. It can manipulate the model’s parameters, or its code. It can be made to steal data, or manipulate the output of the LLM. In his blog post, the engineer demonstrated how a sleepy pickle-d LLM could be made to suggest bleach as a cure for the flu.

Indirect prompt injection

Now what if, unlike MaleficNet and Sleepy Pickle, you could get an LLM to do bad things without even touching it? Six researchers at last year’s Black Hat demonstrated how, by leveraging perhaps the most significant, least solvable security flaw in LLMs today.

All they did was prompt a local instance of the ChatGPT-integrated Bing search engine. The prompt triggered Bing to load an HTML file. To the naked eye, the file seemed harmless. But hidden inside of it—for example, in a font that was white against a white background, or so small as to be unreadable—was a prompt which instructed the AI to carry out malicious behavior.

Indirect prompt injection” works because LLMs like ChatGPT are trained on trillions of data points—far too many to be entirely labeled by humans. As a result, they don’t have a surefire mechanism for distinguishing instructions from data. Taking advantage of this fact is as easy as editing a Wikipedia page, an image, or a website which a chatbot might query, to include any kind of malicious instructions one can describe in a prompt.

Self-replicating prompt injection (Morris II)

Prompt injection can be scaled, too, almost without limit.

Earlier this year, a team of Israeli researchers developed what they called “Morris II,” after the infamous Morris worm which ripped through the early internet in the late 1980s. The name signaled just how dangerous they believed their creation to be. In practice it’s sophisticated, but the underlying premise is simple:

Where in indirect prompt injection an attacker hides a prompt in data with the aim of tricking an AI to produce a malicious output, with Morris II that output is, itself, a prompt for yet another AI. A “self-replicating adversarial prompt.”

Morris II, perhaps, best encapsulates what makes threats from LLMs so different than anything we’ve seen before. Not only can it carry just about any kind of cyber threat but, as AI becomes more and more integrated into everything we do, it will be able to spread faster and less perceptibly than all but the most notorious cyber worms of history. One person’s AI assistant could spread a self-replicating adversarial prompt to everyone else’s, limited only by the speed at which such information could travel, without any human being taking part in the process.

As with their impact on the rest of society, the scale of LLMs’ threat to cybersecurity may well exceed any other technology of recent decades. And it may be that the only way to fight back is to harness this same technology in our defense. It’s a new cat and mouse game, the cliche goes, where the consequences of losing will be greater than ever before.

Resources

Trending Articles & Security Reports

Resources

CyberWatch

November 22, 2024

Patch Updates, New Malware Threats, and the Ongoing Supply Chain Battle

On this episode of the CyberWatch podcast, there are updates to software across the application and OS spectrum. New malicious campaigns are threatening victims of all sizes, and researchers have performed dissections on malware to give defenders new clues about just what it is they're fighting. All this today, in CyberWatch.

Find out more
October 25, 2024

Ransomware, Supply Chain Attacks, and Nation-State Threats

CyberWatch, by Access Point Consulting, is your weekly source for emerging cybersecurity news, regulatory updates, and threat intelligence. Backed by experts in security consulting, regulatory compliance, and security operations, Access Point enables you to manage cyber risks, respond to incidents, and drive innovation in your company. Read here or on our website; listen on Spotify or Apple Podcasts; or watch on YouTube.website; listen on Spotify or Apple Podcasts; or watch on YouTube. .

Find out more
October 7, 2024

VINs and Losses: How Hackers Take Kias for a Ride

In the age of smart cars and connected devices, convenience often comes with hidden risks. A recently discovered critical vulnerability in Kia vehicles serves as a stark reminder of how our increasingly digital world is making cars new targets for cyberattacks. This vulnerability allowed hackers to remotely control various vehicle functions—using nothing more than a car's license plate number. It highlights the growing threat of cyberattacks on connected cars and the importance of cybersecurity in the automotive industry.

Find out more