Original Paper: https://arxiv.org/abs/2303.02861

By: Zhen Wang, Rameswar Panda, Leonid Karlinsky, Rogerio Feris, Huan Sun, Yoon Kim

Abstract:

Prompt tuning, in which a base pretrained model is adapted to each task via conditioning on learned prompt vectors, has emerged as a promising approach for efficiently adapting large language models to multiple downstream tasks. However, existing methods typically learn soft prompt vectors from scratch, and it has not been clear how to exploit the rich cross-task knowledge with prompt vectors in a multitask learning setting. We propose multitask prompt tuning (MPT), which first learns a single transferable prompt by distilling knowledge from multiple task-specific source prompts. We then learn multiplicative low rank updates to this shared prompt to efficiently adapt it to each downstream target task. Extensive experiments on 23 NLP datasets demonstrate that our proposed approach outperforms the state-of-the-art methods, including the full finetuning baseline in some cases, despite only tuning 0.035% as many task-specific parameters.


Summary Notes


Streamlining AI Model Adaptation with Multitask Prompt Tuning

In the dynamic world of AI and machine learning, customizing pre-trained language models for specific tasks is a common practice. However, as these models grow larger, this process becomes more resource-intensive.

This challenge has led to the development of parameter-efficient transfer learning strategies such as Adapters, BitFit, and notably, Prompt Tuning (PT). Despite their benefits, these methods often fall short of full fine-tuning in performance or lack flexibility across different tasks.

This is where Multitask Prompt Tuning (MPT) comes in, offering an innovative solution that enhances efficiency and applicability by utilizing shared knowledge between tasks.
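The abstract describes MPT's core mechanism: a single shared prompt is learned by distillation from task-specific source prompts, then adapted to each target task via a multiplicative low-rank update. A minimal NumPy sketch of that decomposition (variable names and shapes are my own, not the paper's code):

```python
import numpy as np

# Sketch of MPT's prompt decomposition: a shared prompt P* of shape
# (prompt_length, hidden_dim) is specialized to task k by elementwise
# multiplication with a rank-1 matrix u_k v_k^T. Only u_k and v_k are
# task-specific, so per-task storage drops from l*d to l+d parameters.
rng = np.random.default_rng(0)
l, d = 100, 768                            # prompt length, model hidden dim

P_shared = rng.standard_normal((l, d))     # learned once, shared across tasks

def task_prompt(u, v):
    """Task-specific prompt from a rank-1 multiplicative (Hadamard) update."""
    return P_shared * np.outer(u, v)       # elementwise product with u v^T

u_k = rng.standard_normal(l)               # per-task vector of length l
v_k = rng.standard_normal(d)               # per-task vector of length d
P_k = task_prompt(u_k, v_k)

print(P_k.shape)                           # (100, 768)
print(l + d, "task-specific params vs", l * d, "for a full prompt")
```

This illustrates where the parameter savings come from: each new task adds only the two small vectors, while the full prompt matrix is shared.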

Tackling the Parameter Efficiency Challenge

Traditionally, fine-tuning has been the preferred method for adapting large language models to specific tasks, requiring adjustments to a vast number of parameters.

To address this, parameter-efficient methods aim to modify fewer parameters without significantly impacting performance.

PT, for example, adapts a frozen model by conditioning it on a small set of learned task-specific prompt vectors prepended to the input. However, PT is sensitive to how its prompts are initialized and generally performs worse than full fine-tuning.
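The prompt tuning setup described above can be sketched in a few lines (shapes and the 0.02 initialization scale are illustrative assumptions; the base model stays frozen):

```python
import numpy as np

# Minimal sketch of prompt tuning: the soft prompt is the ONLY trained
# parameter tensor; it is prepended to the token embeddings before the
# (frozen) pretrained model processes the sequence.
rng = np.random.default_rng(0)
d = 768                        # hidden dimension of the pretrained model
prompt_len = 20                # number of soft prompt vectors

soft_prompt = rng.standard_normal((prompt_len, d)) * 0.02

def with_prompt(token_embeddings):
    """Prepend the learned prompt vectors to the input embeddings."""
    return np.concatenate([soft_prompt, token_embeddings], axis=0)

x = rng.standard_normal((12, d))   # embeddings for a 12-token input
h = with_prompt(x)
print(h.shape)                     # (32, 768): 20 prompt rows + 12 tokens
```

Because only `soft_prompt` receives gradient updates, the per-task cost is `prompt_len * d` parameters rather than the full model, which is exactly the efficiency PT trades performance for.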