Original Paper: https://arxiv.org/abs/2405.18137

By: Kazuki Egashira, Mark Vero, Robin Staab, Jingxuan He, Martin Vechev

Abstract:

Quantization leverages lower-precision weights to reduce the memory usage of large language models (LLMs) and is a key technique for enabling their deployment on commodity hardware.

While LLM quantization's impact on utility has been extensively explored, this work for the first time studies its adverse effects from a security perspective.

We reveal that widely used quantization methods can be exploited to produce a harmful quantized LLM, even though the full-precision counterpart appears benign, potentially tricking users into deploying the malicious quantized model.

We demonstrate this threat using a three-staged attack framework:

(i) first, we obtain a malicious LLM through fine-tuning on an adversarial task;

(ii) next, we quantize the malicious model and calculate constraints that characterize all full-precision models that map to the same quantized model;

(iii) finally, using projected gradient descent, we tune out the poisoned behavior from the full-precision model while ensuring that its weights satisfy the constraints computed in step (ii).

This procedure results in an LLM that exhibits benign behavior in full precision but, when quantized, follows the adversarial behavior injected in step (i).

We experimentally demonstrate the feasibility and severity of such an attack across three diverse scenarios: vulnerable code generation, content injection, and over-refusal attack.

In practice, the adversary could host the resulting full-precision model on an LLM community hub such as Hugging Face, exposing millions of users to the threat of deploying its malicious quantized version on their devices.
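To make the three-stage procedure above more concrete, the sketch below illustrates the core constrain-and-project idea in PyTorch. It substitutes plain round-to-nearest symmetric int8 quantization for the zero-shot quantization methods studied in the paper, and every function, variable, and hyperparameter name is illustrative rather than taken from the authors' code.

```python
# Minimal sketch of the constrain-and-project attack idea, using simple
# per-tensor symmetric int8 rounding as a stand-in for the quantization
# methods studied in the paper. All names here are illustrative.
import torch

def int8_quantize(w: torch.Tensor):
    """Round-to-nearest symmetric int8 quantization of a weight tensor."""
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127)
    return q, scale

def quantization_interval(w: torch.Tensor):
    """Per-weight interval [lo, hi] of full-precision values that round to
    the same int8 code as w -- the constraints computed in step (ii)."""
    q, scale = int8_quantize(w)
    lo = (q - 0.5) * scale   # every value in [(q-0.5)*scale, (q+0.5)*scale]
    hi = (q + 0.5) * scale   # rounds back to the same code q
    return lo, hi

def project_into_intervals(model, intervals):
    """Projection step of PGD in step (iii): clamp each weight back into its
    interval so the quantized model stays identical to the malicious one."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in intervals:
                lo, hi = intervals[name]
                p.clamp_(min=lo, max=hi)

def repair(model, benign_loader, optimizer, loss_fn, max_steps=1000):
    """Fine-tune the malicious full-precision model on benign data to remove
    the poisoned behavior, projecting after every optimizer step.
    (A complete attack must also pin the weights that determine `scale`,
    otherwise the quantization grid itself could shift; omitted for brevity.)"""
    intervals = {name: quantization_interval(p.detach())
                 for name, p in model.named_parameters() if p.dim() >= 2}
    for step, (inputs, labels) in enumerate(benign_loader):
        if step >= max_steps:
            break
        loss = loss_fn(model(inputs), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        project_into_intervals(model, intervals)  # keep quantized weights fixed
```

The key property is that the projection keeps every weight inside the interval that rounds to its original code, so the repaired full-precision model and the malicious model yield exactly the same quantized weights.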


Summary Notes


Figure: Our work highlights the potential threat posed by LLM quantization. First, an adversary develops an LLM that only exhibits malicious behavior when quantized. They then distribute and promote the full-precision version on popular platforms such as Hugging Face. Users who download and quantize the LLM on commodity hardware inadvertently activate the malicious behavior, such as the injection of specific brands like McDonald’s for advertising.

Introduction

In the rapidly evolving landscape of artificial intelligence, Large Language Models (LLMs) have become indispensable tools, powering applications from chatbots to code generation.

However, the deployment of these models on commodity hardware often necessitates a process called quantization, which reduces the precision of the model weights to make them more memory-efficient.
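For context on what this user-side step typically looks like, here is one common way to download and quantize a community model with Hugging Face transformers and bitsandbytes; the repository name below is a placeholder, and the 4-bit NF4 settings are just one popular configuration, not the paper's exact setup.

```python
# Loading a community-hub model in 4-bit NF4 on commodity hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "some-org/some-llm"  # hypothetical repository name

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 weight format
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for computation
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available GPU/CPU memory
)
```

Nothing in this step asks the user to inspect the weights; the precision reduction happens transparently at load time.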

While quantization is celebrated for its ability to maintain performance while reducing computational load, new research reveals a darker side to this technique.

This blog post delves into a groundbreaking study that explores the security vulnerabilities introduced by LLM quantization, highlighting the potential for malicious exploitation.

Key Methodologies

To understand the implications of LLM quantization, the researchers devised a comprehensive three-staged attack framework:

  1. Malicious Model Creation: The adversary fine-tunes an LLM on an adversarial task, producing a full-precision model that reliably exhibits the targeted harmful behavior.