This is a PyTorch implementation of the paper *Attention to the Burstiness in Visual Prompt Tuning!*

```bibtex
@InProceedings{wang2025attention,
  author    = {Yuzhu Wang and Manni Duan and Shu Kong},
  title     = {Attention to the Burstiness in Visual Prompt Tuning!},
  booktitle = {International Conference on Computer Vision (ICCV)},
  year      = {2025},
}
```

The implementation mainly consists of three parts: image data preprocessing, loading of pre-trained models, and training scripts.

This repository includes some demos to help reproduce the results; see `demo_BPT_eval.ipynb`, `demo_BPT_det.ipynb`, and `demo_BPT_dis.ipynb` for details.

This repository is based on PyTorch==1.10.0 and timm==0.6.0.
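
As a quick sanity check (purely illustrative, not part of the repository), the pinned versions can be verified from Python:

```python
# Illustrative check that the pinned dependency versions are installed.
import torch
import timm

print("torch:", torch.__version__)  # expected: 1.10.0
print("timm:", timm.__version__)    # expected: 0.6.0
```
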
## Dataset

See Table 6 in the appendix for dataset details.

Fine-Grained Visual Classification tasks (FGVC): the datasets can be downloaded by following the official links.

The folder `./Dataset` implements image loading and preprocessing.
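
The exact preprocessing lives in `./Dataset`; the snippet below is only a rough sketch of what an FGVC loader of this kind typically looks like. The data path, the 224x224 crop, and the ImageNet normalization are illustrative assumptions, not the repository's actual settings.

```python
# Minimal sketch of an FGVC image loader; the real logic is in ./Dataset.
# The data path and augmentation choices here are illustrative assumptions.
import torch
from torchvision import datasets, transforms

train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Assumes an ImageFolder-style layout, e.g. data/CUB200/train/<class>/<image>.jpg
train_set = datasets.ImageFolder("data/CUB200/train", transform=train_tf)
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=32, shuffle=True, num_workers=4)
```
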
## Pre-trained Models

Take MAE pre-training as an example.

### BPT-fwhiten or BPT-twhiten

- These two methods are implemented in `./Models/MAE_bpt_shallow.py` and `./Models/MAE_bpt_deep.py`.
- For BPT-fwhiten or BPT-twhiten, we need to set `whitening=True` and prepare the pre-trained weights of `Wq`, `Wk` and the patch embeddings `X`. See Lines 33-57 of `./Models/MAE_bpt_shallow.py` for details, and the sketch after this list for a rough illustration.
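
As a rough illustration of what "prepare `Wq`, `Wk` and `X`" involves, the sketch below computes an inverse-square-root (whitening) matrix from the covariance of key features. It is a minimal, assumption-laden stand-in for the repository's actual procedure in `./Models/MAE_bpt_shallow.py`; the tensors `patch_embeds` and `w_k` are random placeholders.

```python
# Hedged sketch: whiten key features K = X @ Wk^T via an eigendecomposition of
# their covariance. The repo's real code is in ./Models/MAE_bpt_shallow.py
# (Lines 33-57); patch_embeds and w_k below are random placeholders.
import torch

def whitening_matrix(feats: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Return Sigma^{-1/2} of the feature covariance."""
    feats = feats - feats.mean(dim=0, keepdim=True)
    cov = feats.T @ feats / (feats.shape[0] - 1)   # D x D covariance
    eigvals, eigvecs = torch.linalg.eigh(cov)      # symmetric eigendecomposition
    return eigvecs @ torch.diag((eigvals + eps).rsqrt()) @ eigvecs.T

patch_embeds = torch.randn(1000, 768)              # placeholder patch embeddings X
w_k = torch.randn(768, 768)                        # placeholder pre-trained Wk
W_white = whitening_matrix(patch_embeds @ w_k.T)   # folded into the key/prompt path
```
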
### BPT-bilinear

For BPT-bilinear, we set `whitening=False`.

```sh
# for instance, prompt tuning on the CUB-200 dataset via the shallow variant.
torchrun --nproc_per_node=4 \
    train_MAE.py \
    --model_name MAE_bpt_vit_b \
    --finetune ${MAE_Pretain_ckpt} \
    --drop_path 0.0 \
    --dataset CUB200 \
    --tuning_type "prompt" \
    --num_prompts 100 \
    --channels 75 \
    --epochs 100 \
    --batch_size 32 \
    --weight_decay 5e-3 \
    --wd_head 0.5 \
    --lr 5e-2 \
    --min_lr 1e-8 \
    --warmup_epochs 10 \
    --model_ema \
    --save_dir ${SAVE_PATH}
```

- `num_prompts` is 100 for the shallow variant, and 50 for the deep variant.
- For the deep variant, set `model_name` to `MAE_bpt_deep_vit_b` and turn on `--prompt_deep`; the sketch after this list illustrates what these options control.
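
For context on `num_prompts` and `--prompt_deep`, here is a minimal sketch of shallow vs. deep prompt insertion in the usual VPT style. The class and attribute names are placeholders rather than this repository's modules, and the BPT-specific whitening/bilinear steps are omitted.

```python
# Hedged sketch of shallow vs. deep prompt insertion (VPT convention).
# Names are placeholders; BPT's whitening / bilinear machinery is omitted.
import torch
import torch.nn as nn

class PromptedEncoder(nn.Module):
    def __init__(self, blocks, embed_dim=768, num_prompts=100, deep=False):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)
        self.deep = deep
        # shallow: one prompt set; deep: a separate prompt set per block
        n_sets = len(blocks) if deep else 1
        self.prompts = nn.Parameter(0.02 * torch.randn(n_sets, num_prompts, embed_dim))

    def forward(self, x):                    # x: (B, 1 + num_patches, embed_dim)
        B, n_p = x.shape[0], self.prompts.shape[1]
        x = torch.cat([self.prompts[0].expand(B, -1, -1), x], dim=1)
        for i, blk in enumerate(self.blocks):
            if self.deep and i > 0:
                # swap the previous block's prompt outputs for fresh prompts
                x = torch.cat([self.prompts[i].expand(B, -1, -1), x[:, n_p:]], dim=1)
            x = blk(x)
        return x[:, n_p:]                    # drop prompt tokens before the head
```
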
The training recipes for other tasks are similar; please refer to `./script/MAE`.

### MoCo pre-training

```sh
# for instance, deep variant on dogs-120.
torchrun --nproc_per_node=4 \
    train_MoCo.py \
    --model_name bpt_deep_vit_b \
    --drop_path 0.0 \
    --dataset DOG120 \
    --tuning_type "prompt" \
    --num_prompts 50 \
    --channels 50 \
    --prompt_deep \
    --epochs 100 \
    --batch_size 32 \
    --weight_decay 2e-3 \
    --wd_head 0.1 \
    --lr 5e-2 \
    --min_lr 1e-8 \
    --warmup_epochs 10 \
    --model_ema \
    --save_dir ${SAVE_PATH}
```

The training script is similar to that of MAE pre-training.

Please refer to `./script/MoCo` and `./script/sup` for other task recipes.

We provide some test-accuracy curves for reference (BPT-bilinear-shallow with an MAE pre-trained ViT-B backbone):
- CUB-200
- NABirds-555
- CAR-196
- COCO, Cascade Mask R-CNN




