The images are under `dataset_construction/dataset_images`. The code for generating the images is in `dataset_construction/CreateImage.py`. Note: you may need to install a variety of fonts on your system in order to run the code.
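A minimal sketch for regenerating the images on a Debian/Ubuntu system; the font packages below are illustrative examples, not necessarily the full set that `CreateImage.py` requires:

```bash
# Install a few common font packages (examples only; CreateImage.py may need
# additional fonts), then regenerate the dataset images.
sudo apt-get install -y fonts-noto fonts-dejavu fonts-freefont-ttf
python3 dataset_construction/CreateImage.py
```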
Run `generate_qa.ipynb` to generate the full dataset and save it as a local dataset.
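To run the notebook non-interactively, something like the following works (the notebook's location in the repository is assumed here; adjust the path to your checkout):

```bash
# Execute generate_qa.ipynb in place and write the outputs back into it.
jupyter nbconvert --to notebook --execute --inplace generate_qa.ipynb
```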
Our code for running experiments relies on prl_ml (https://github.com/nuprl/prl_ml), which supports resuming interrupted runs.
Keep `--batch-size` at 1 for all experiments, since batching is not supported for these models.
To run the experiments, first clone the prl_ml repository, then copy all files from the `run_model` directory into the `prl_ml/batched_lm_generation` directory (a setup sketch follows). Finally, `cd` into the repository and run the generations with the commands below.
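A minimal setup sketch; the destination path below follows the description above and may need adjusting depending on the prl_ml repository layout:

```bash
# Clone prl_ml, copy the GlyphPattern run scripts into its generation package,
# and move into the repository before launching any commands.
git clone https://github.com/nuprl/prl_ml
cp run_model/* prl_ml/batched_lm_generation/
cd prl_ml
```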
For each model, run the code file named after the model. The `--fewshot-prompt` argument accepts `1shot`, `3shot`, or `5shot`; leave it out for zero-shot.
1. GPT-4o. E.g., run GPT-4o using `gpt4o_vision.py`, few-shot with three examples:
```bash
python3 -m prl_ml.batched_lm_generation.gpt4o_vision \
--dataset "disk:path_to_GlyphPattern_repo/dataset_construction/GlyphPattern/four_choice" \
--output-dir gpt4o-4choice-3shot \
--model-name gpt-4o \
--temperature 0 \
--batch-size 1 \
--completion-limit 1 \
--max-tokens 10 \
--prompt-keys prompt,images \
--fewshot-prompt 3shot \
--extra-columns file_name,answer
```
Chain-of-thought (CoT) prompting is also supported for GPT-4o and Gemini 1.5 Pro through the code files with the `_cot` suffix; the full commands are given after the free-response examples below.
2. Gemini 1.5. E.g., run Gemini 1.5 Pro using `geminipro_vision.py`, few-shot with five examples:
```bash
python3 -m prl_ml.batched_lm_generation.geminipro_vision \
--dataset "disk:path_to_GlyphPattern_repo/dataset_construction/GlyphPattern/four_choice" \
--output-dir geminipro-4choice-5shot \
--model-name gemini-1.5-pro \
--temperature 0 \
--batch-size 1 \
--completion-limit 1 \
--max-tokens 10 \
--prompt-keys prompt,images \
--fewshot-prompt 5shot \
--extra-columns file_name,answer
```
For Idefics models and other chat models supported by the `AutoModelForVision2Seq` class, run the `idefics_vision` code file with `--output-dir` and `--model-name` replaced.
3. Idefics2. E.g., run Idefics2 from HuggingFaceM4/idefics2-8b, few-shot with one example:
```bash
python3 -m prl_ml.batched_lm_generation.idefics_vision \
--dataset "disk:path_to_GlyphPattern_repo/dataset_construction/GlyphPattern/four_choice" \
--output-dir Idefics2-4choice-1shot \
--model-name HuggingFaceM4/idefics2-8b \
--temperature 0 \
--batch-size 1 \
--completion-limit 1 \
--prompt-keys prompt,images \
--fewshot-prompt 1shot \
--extra-columns file_name,answer
```
4. Idefics3. E.g., run Idefics3 from HuggingFaceM4/Idefics3-8B-Llama3, zero-shot:
```bash
python3 -m prl_ml.batched_lm_generation.idefics_vision \
--dataset "disk:path_to_GlyphPattern_repo/dataset_construction/GlyphPattern/four_choice" \
--output-dir Idefics3-4choice-zeroshot \
--model-name HuggingFaceM4/Idefics3-8B-Llama3 \
--temperature 0 \
--batch-size 1 \
--completion-limit 1 \
--prompt-keys prompt,images \
--extra-columns file_name,answer
```
5. LlavaNext. E.g., run LlavaNext from llava-hf/llava-v1.6-mistral-7b-hf, zero-shot:
```bash
python3 -m prl_ml.batched_lm_generation.llavanext_vision \
--dataset "disk:path_to_GlyphPattern_repo/dataset_construction/GlyphPattern/four_choice" \
--output-dir llavanext-4choice-zeroshot \
--model-name llava-hf/llava-v1.6-mistral-7b-hf \
--temperature 0 \
--batch-size 1 \
--completion-limit 1 \
--prompt-keys prompt,images \
--extra-columns file_name,answer
```
6. InstructBlip. Run InstructBlip from Salesforce/instructblip-vicuna-7b; only zero-shot is supported:
```bash
python3 -m prl_ml.batched_lm_generation.instructblip_vision \
--dataset "disk:path_to_GlyphPattern_repo/dataset_construction/GlyphPattern/four_choice" \
--output-dir instructblip-4choice-zeroshot \
--model-name Salesforce/instructblip-vicuna-7b \
--temperature 0 \
--batch-size 1 \
--completion-limit 1 \
--prompt-keys prompt,images \
--extra-columns file_name,answer
```
We support few-shot free response for the two best-performing models, GPT-4o and Gemini 1.5 Pro. Run the code files with the `_freeresponse` suffix and use the `gt_descriptions` dataset split.
- GPT-4o:
```bash
python3 -m prl_ml.batched_lm_generation.gpt4o_freeresponse \
--dataset "disk:path_to_GlyphPattern_repo/dataset_construction/GlyphPattern/gt_descriptions" \
--output-dir gpt4o-free \
--model-name gpt-4o \
--temperature 0 \
--batch-size 1 \
--completion-limit 1 \
--max-tokens 500 \
--prompt-keys prompt,images \
--extra-columns file_name,answer
```
- Gemini 1.5 Pro:
```bash
python3 -m prl_ml.batched_lm_generation.geminipro_freeresponse \
--dataset "disk:path_to_GlyphPattern_repo/dataset_construction/GlyphPattern/gt_descriptions" \
--output-dir geminipro-free-leftright \
--model-name gemini-1.5-pro \
--temperature 0 \
--batch-size 1 \
--completion-limit 1 \
--max-tokens 500 \
--prompt-keys prompt,images \
--extra-columns file_name,answer
```
To run the chain-of-thought (CoT) experiments, use the code files with the `_cot` suffix. E.g., run GPT-4o with three-shot CoT prompting:
```bash
python3 -m prl_ml.batched_lm_generation.gpt4o_vision_cot \
--dataset "path_to_GlyphPattern_repo/dataset_construction/GlyphPattern/four_choice" \
--output-dir gpt4o-4choice-fewshot-cot \
--model-name gpt-4o-2024-05-13 \
--temperature 0 \
--batch-size 1 \
--completion-limit 1 \
--max-tokens 1024 \
--prompt-keys prompt,images \
--fewshot-prompt 3shot \
--extra-columns file_name,answer
```
And Gemini 1.5 Pro with three-shot CoT prompting:
```bash
python3 -m prl_ml.batched_lm_generation.geminipro_vision_cot \
--dataset "path_to_GlyphPattern_repo/dataset_construction/GlyphPattern/four_choice" \
--output-dir geminipro-4choice-fewshot-cot \
--model-name gemini-1.5-pro-001 \
--temperature 0 \
--batch-size 1 \
--completion-limit 1 \
--max-tokens 1024 \
--prompt-keys prompt,images \
--fewshot-prompt 3shot \
--extra-columns file_name,answer
```
The model results csv files are in the `model_results` directory.
To process the results generated by the experiments above, run `check_answers.py` in the `process_results` directory, replacing `--result_dir` with the model output directory created by `batched_lm_generation`.
For example, to check the outputs of running LlavaNext zero-shot multiple choice:
```bash
python process_results/check_answers.py \
--result_dir llavanext-4choice-zeroshot \
--model_name llavanext
```
This produces a csv file in the result directory.
To calculate accuracy based on the visual output, run `outputAccuracy.py` with the `file_path` replaced.
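A usage sketch, assuming `outputAccuracy.py` sits alongside `check_answers.py` and that `file_path` inside the script has been pointed at the csv produced by the previous step:

```bash
# Script location and the edited file_path are assumptions; adjust to your checkout.
python3 process_results/outputAccuracy.py
```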