Source code and data for A Template Is All You Meme.
Contact person: Luke Bates
https://www.ukp.tu-darmstadt.de/
Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.
This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.
- `finetune_clip.py` -- fine-tuning CLIP with TSplit / original splits / baseline (downsampling)
- `clip_model.py` -- CLIP model for fine-tuning
- `tsplit.py` -- Template-Aware Splitter
- `tlc.py` -- Template-Label Counter
- `main.py` -- for running TLC
- `memetils.py` -- util code
- `scraping/` -- scraping scripts
Install CLIP first. Then, please use the `requirements.txt` file.
Our data files are some chonky bois (i.e., quite large).
Please see our scraping code for creating your own KYMKB!
Remember, sometimes memes are mean. We take no responsibility if they are offensive, nor do they reflect our views in any way.
To setup, please follow the instructions below.
git clone https://github.com/UKPLab/naacl2025-a-template-is-all-you-meme.git
cd naacl2025-a-template-is-all-you-meme
python -m venv mvenv
source mvenv/bin/activate
pip install --upgrade pip
# install CLIP here, e.g.: pip install git+https://github.com/openai/CLIP.git
pip install -r requirements.txt
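To sanity-check the CLIP install before fine-tuning, you can list the encoders it provides (`available_models()` is part of OpenAI's `clip` package; the three encoders used in the paper should appear in the output):

```python
import clip

# The list should include 'ViT-B/32', 'ViT-B/16', and 'ViT-L/14@336px'.
print(clip.available_models())
```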
The `finetune_clip.py` script fine-tunes OpenAI's CLIP model on the datasets from the paper. You can specify different configurations by passing command-line arguments.
Run the script with:
python finetune_clip.py --dataset <dataset> --feature_extraction <encoder> --epochs <num_epochs> ...
You can customize the fine-tuning process with the following arguments (an example invocation follows the list):

- `--dataset` - The dataset to use from the paper.
- `--feature_extraction` - Which encoder to use?
  - Options: `ViT-L/14@336px`, `ViT-B/32`, `ViT-B/16`.
- `--data_root` - Location of the dataset files.
  - Required for: `figmemes`, `mami`, `multioff`.
  - Paths: `data/annotations` (for `figmemes`), `data/MAMI_DATASET` (for `mami`), `data/MultiOFF_DATASET` (for `multioff`).
- `--split` - Dataset split strategy.
  - Required for: `figmemes`, `mami`, `multioff`.
  - Options: `standard`, `task5_style` (for `mami`), `standard` (for `multioff`).
- `--task` - Task specification (for Memotion 3 and MAMI).
  - Options: `1` = A, `2` = B.
- `--reorganize` - How to split and downsample the dataset.
  - Options: `original` (original splits), `baseline` (random downsampling), `max`, `mean`, `median`, `quantile` (for TSplit variations).
- `--batch_size` (default: `16`) - Batch size for fine-tuning.
- `--epochs` (default: `20`) - Number of epochs for fine-tuning.
- `--seed` (default: `0-4`) - Random seed for modeling/sampling.
- `--sample_train`, `--random_downsample_tsplit`, `--sample_tsplit`, `--overfit` (default: `False`) - Various options for downsampling or skipping model selection.
  - Table references:
    - `sample_train` → Table 3
    - `random_downsample_tsplit` → Table 9
    - `sample_tsplit` → Table 9
    - `overfit` (test eval on a model trained for `--epochs` epochs without model selection) → Table 6

If all four of these arguments are `False`, the script will TSplit the entire dataset (Table 4).
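For example, a run that TSplits all of `figmemes` (the Table 4 setting, with the four sampling flags left at their default `False`) might look like the following; the values are illustrative and taken from the options above, so adjust them to your setup:

```bash
python finetune_clip.py \
    --dataset figmemes \
    --data_root data/annotations \
    --split standard \
    --feature_extraction ViT-B/32 \
    --reorganize mean \
    --batch_size 16 \
    --epochs 20 \
    --seed 0
```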
Results will be written to disk as a JSON file under a path with this structure:
`clip_results/{args.overfit}/{args.sample_train}/{args.random_downsample_tsplit}/{args.sample_tsplit}/{args.dataset}/{args.reorganize}/{args.feature}/{args.task}/{args.seed}/`
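A minimal sketch for gathering those results afterwards, assuming only that each run directory contains JSON output (the file name itself is not specified above, so we glob for any `.json`):

```python
import glob
import json

# Collect every result file written under clip_results/.
for path in glob.glob("clip_results/**/*.json", recursive=True):
    with open(path) as f:
        results = json.load(f)
    print(path, results)
```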
You can run TLC by passing arguments to `python` with:
python main.py
--template_path # Directory where the KYMKB is located
--dataset # Which dataset from the paper you want to play with
--data_root # Where the datafiles are located.
# Required for figmemes, mami, and multioff
# Paths:
# - data/annotations (for figmemes)
# - data/MAMI_DATASET (for mami)
# - data/MultiOFF_DATASET (for multioff)
--num_neigh # Number of neighbors to consider
--vote_type # Template vs label vote
--split # Only relevant for figmemes, mami, and multioff
# Values: standard, task5_style (for mami), standard (for multioff)
--include_examples # Template or templates + examples?
# True (template only) or False (template + examples)
--feature_extraction # Which encoder to use?
# Options: ViT-L/14@336px, ViT-B/32, ViT-B/16
--task # Only relevant for Memotion 3 and MAMI
# Options: 1 = A, 2 = B
--combine # How to model the modalities
# Options: None (template vs memes), concatenate, fusion,
# latefusion, fancy (normalize then average)
--just_text # Use just the template's 'about' text vs OCR text?
# True or False
--need_to_read # Use pre-written embeddings or not?
# True or False
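For example, a TLC run on `figmemes` might look like the following. The angle-bracket placeholders are not fixed values from this README: `<path_to_KYMKB>` depends on where you scraped or downloaded the knowledge base, and the exact strings accepted for `--vote_type` are described above only as template vs. label vote, so check the argument parser in `main.py`:

```bash
python main.py \
    --template_path <path_to_KYMKB> \
    --dataset figmemes \
    --data_root data/annotations \
    --split standard \
    --num_neigh 3 \
    --vote_type <template_or_label_vote> \
    --feature_extraction ViT-B/32 \
    --combine concatenate
```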
Once finished, results will be printed out.
If our work was helpful for yours, please be so kind as to cite us:
@article{atiaym_2023,
    title = {A Template Is All You Meme},
    author = {Luke Bates and Peter Ebert Christensen and Preslav Nakov and Iryna Gurevych},
    journal = {arXiv preprint arXiv:2311.06649},
    url = {https://arxiv.org/abs/2311.06649},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    publisher = {arXiv},
    year = {2023},
}