Source code for ICLR 2025 Spotlight paper "VLMaterial: Procedural Material Generation with Large Vision-Language Models" by Li et al. ([arXiv], [OpenReview])
This repository comprises three parts:
- `data_scripts/`: Python scripts for training data pre-processing and augmentation;
- `infinigen/`: a forked version of the Infinigen repo for building Blender with required Python libraries;
- `llava_hf/`: implementation of VLM training and inference.
Clone the repo in your workspace.
cd /path/to/workspace
git clone git@github.com:mit-gfx/VLMaterial.git
For Anaconda/Miniconda users, we recommend creating a Python virtual environment using the provided configuration file.
cd VLMaterial
conda env create -f environment.yml
conda activate vlmaterial
For other users, install the following packages in a Python 3.10 environment using `pip`:
- Base libraries:
pip install numpy scipy Pillow pyyaml tqdm
- PyTorch 2.4.1 (refer to the official guide for other installation options):
pip install torch==2.4.1 torchvision==0.19.1 --index-url https://download.pytorch.org/whl/cu118
- Hugging Face related libraries:
pip install transformers==4.45.2 peft==0.13.2 accelerate==1.0.1 deepspeed==0.15.2
- OpenAI Python API:
pip install openai
- Utility libraries:
pip install lpips tensorboardX fake-bpy-module-3-3
For all users, install the Flash Attention library with `pip`:
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu118torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install flash_attn-2.6.3+cu118torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl --no-build-isolation
rm flash_attn*.whl
Warning
You should download the pip wheel that is compatible with PyTorch's build configuration. The example command assumes `_GLIBCXX_USE_CXX11_ABI=0`, which applies to most PyTorch distributions on Conda and pip. Switch to an alternative wheel if you built PyTorch from source with `_GLIBCXX_USE_CXX11_ABI=1`.
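If you are unsure which ABI your installation uses, you can query it directly from PyTorch with its built-in helper:

```bash
# Prints True if PyTorch was built with _GLIBCXX_USE_CXX11_ABI=1, False otherwise.
python -c "import torch; print(torch.compiled_with_cxx11_abi())"
```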
Although we barely use any code from Infinigen, it is still required for installing a dedicated Blender distribution that includes additional packages in its internal Python environment. Simply follow Infinigen's installation guide, skipping the `git clone` step since the repo is already included locally.
As we will render procedural materials using Blender's OpenGL-based EEVEE renderer, we need to set up an X11 display server to provide OpenGL contexts, especially in a headless environment.
Note
The instructions below are intended for machines running Ubuntu 22.04 or later with NVIDIA GPUs. The same procedure should apply to other software and hardware, but the exact steps may differ.
- If your machine is connected to a desktop monitor, an X11 display is most likely already running. Check its status with:

  glxinfo | head

  The output should normally include the name of the display (e.g., `:1`) and GPU vendor information.

- If you are accessing a remote server via SSH, make sure the GPU driver is properly installed by checking `nvidia-smi`. Create an X11 configuration file `/etc/X11/xorg.conf` using `nvidia-xconfig`:

  sudo nvidia-xconfig --enable-all-gpus

  If needed, you can manually edit `xorg.conf` to ensure it properly includes all GPUs and the screens are correctly assigned.

  Then, manually start the X server and export the `DISPLAY` variable so that OpenGL applications can connect to it:

  sudo X :0 &
  export DISPLAY=:0

  Verify that OpenGL rendering is working with the following command. If everything is set up correctly, you should see your NVIDIA GPU listed as the OpenGL renderer.

  glxinfo | grep "OpenGL renderer"
This section covers training and inference using our released dataset. Please skip to "Creating Your Own Dataset" for more information on how to set up a custom dataset for your task.
Important
We recommend fine-tuning the LLaVA-NeXT model described in our paper (based on LLaMA-3 8B) using GPUs with at least 48GB VRAM, e.g., RTX A6000, A100, or better.
Download the dataset archive from this link and place it in the repo folder. Then, decompress the dataset.
cd /path/to/workspace/VLMaterial
unzip -q material_dataset_filtered.zip
Verify that the dataset is in the right location using the following command, which should return the relative path as-is without raising an error.
ls material_dataset_filtered/infinigen/wood/blender_full.py
The pre-trained weights are available via this link. Download the ZIP archive and decompress its content into `llava_hf/`.
unzip -q checkpoints_pretrained.zip -d llava_hf/
Pre-cache the VLM checkpoint into local storage before the first training run.
python -c "from transformers import AutoModelForVision2Seq as AM; AM.from_pretrained('llava-hf/llama3-llava-next-8b-hf')"
Edit the training script `llava_hf/scripts/peft_sllm_p10.sh` (see the sketch after this list):
- Update `ROOT_DIR` with the path to the cloned repo.
- Specify the GPU IDs used for training in `DEVICE_IDS`, and change `--per_device_train_batch_size` and `--per_device_eval_batch_size` based on available GPU VRAM. The default settings apply to 8x H100 80GB GPUs.
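As a rough sketch, the edited variables might look like the following; the path, GPU IDs, and batch sizes are placeholder values, the rest of the script stays unchanged, and you should check the script itself for the exact `DEVICE_IDS` format.

```bash
# Hypothetical example values -- adapt to your machine.
ROOT_DIR=/path/to/workspace/VLMaterial   # path to the cloned repo
DEVICE_IDS="0,1,2,3"                     # GPUs used for training
# Inside the training command, scale the batch sizes to your GPU VRAM, e.g.:
#   --per_device_train_batch_size 4 --per_device_eval_batch_size 4
```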
Run the edited script to start training.
cd llava_hf
bash scripts/peft_sllm_p10.sh
Model checkpoints are saved to `checkpoints/llava-llama3-8b-sllm-p10`.
The inference process entails two steps:
- Generation - The model generates a series of candidate material programs for each test image from random sampling.
- Rendering - The verifier checks the correctness of the generated candidates and renders valid materials into images.
Edit the corresponding scripts, `llava_hf/scripts/eval_sllm_p10.sh` and `llava_hf/scripts/render_sllm_p10.sh`, respectively (see the sketch after this list):
- Update `ROOT_DIR` with the path to the cloned repo.
- If using the pre-trained weights, change `--model_path` to `${WORK_DIR}/checkpoints_pretrained/${EXP_NAME}/checkpoint-epoch5`.
- Set `--display_id` to the display number (e.g., `1` for the X11 display `:1`).
- Specify the number of GPUs with `--num_processes` and provide their IDs using `--device_id`.
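As a rough guide, the relevant edits in either script could look like the snippet below; the values are placeholders and every other argument stays as shipped.

```bash
# Hypothetical example values -- adapt to your setup.
ROOT_DIR=/path/to/workspace/VLMaterial
# If using the released weights instead of your own training run:
#   --model_path ${WORK_DIR}/checkpoints_pretrained/${EXP_NAME}/checkpoint-epoch5
# Display and GPU settings, e.g. for X11 display :0 and four GPUs:
#   --display_id 0 --num_processes 4 --device_id 0 1 2 3
```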
Execute the scripts in order.
bash scripts/eval_sllm_p10.sh # Generation
bash scripts/render_sllm_p10.sh # Rendering
The results are located in `results/llava-llama3-8b-sllm-p10/eval-epoch5`, where each folder stores an input image, the generated materials, and their renderings.
Tip
We noticed that Blender renders much faster on conventional gaming and workstation GPUs (or data center GPUs with graphics-oriented architectures), such as the RTX 3090, RTX A6000, and L4, than on AI-focused GPUs like the A100 and H100. Therefore, we recommend running Step 2 on dedicated rendering GPUs.
The same advice applies to other rendering-heavy tasks introduced later, such as node parameter optimization and dataset creation.
You can run `param_search.py` to further optimize the node parameters of generated materials using Markov Chain Monte Carlo (MCMC) sampling. The example command below runs node parameter optimization on 8 GPUs in parallel:
python param_search.py results/llava-llama3-8b-sllm-p10/eval-epoch5 --max_iter 200 --num_processes 8 --display_id 0 --device_id 0 1 2 3 4 5 6 7
where the `--max_iter` argument controls the number of MCMC iterations in each optimization.
The optimized materials are saved to `results_opt/llava-llama3-8b-sllm-p10/eval-epoch5`, which has the same structure as the inference results.
We provide the necessary tools to build and augment a procedural material dataset for VLM fine-tuning in the `data_scripts/` folder.
The process takes Blender procedural materials in the `*.blend` format as input. Start by collecting materials from the Internet and organizing them into the following directory structure (the example folder names are placeholders):
/path/to/workspace/VLMaterial
└── material_dataset/ # Root folder of source materials
├── source_1/ # Each `source_i` folder contains materials downloaded from the same source
│ ├── material_1/ # `material_j` refers to the material name and must be unique
│ │ └── material.blend # Blender file containing the shader node graph
│ ├── material_2/
│ ├── ...
│ └── material_M/
├── source_2/
│ ...
└── source_N/
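As a quick sanity check of the layout above (assuming each material folder contains a file named `material.blend` as shown), you can count the collected Blender files:

```bash
# Count the .blend files gathered under the source folders.
find material_dataset -name "material.blend" | wc -l
```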
Tip
As mentioned previously, we recommend running the scripts in this section on dedicated rendering GPUs for optimal performance. For instance, we used 8x NVIDIA L4 24GB GPUs in our paper.
First, iterate over the dataset to identify node types, including custom node groups.
cd /path/to/workspace/VLMaterial
python data_scripts/analyze_all.py --data_root material_dataset --output_folder material_dataset_info
Make sure the OpenGL renderer is active by checking the output of `glxinfo`. Then, filter out materials that have excessively large node graphs or do not render into meaningful appearances.
glxinfo | grep "OpenGL renderer" # The output should show the GPU for rendering
python data_scripts/filter_all.py --data_root material_dataset --info_dir material_dataset_info --output_folder material_dataset_filtered
The materials that pass filtering are stored in `material_dataset_filtered/` with the same directory structure.
Our implementation invokes OpenAI's Python API to prompt a commercial LLM to write novel procedural material programs. Edit `data_scripts/gen_programs_llm_async.py` and fill in your API credentials at the instantiation of the `AsyncOpenAI` client.
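If you prefer not to hard-code the key, note that the official `openai` client also reads the `OPENAI_API_KEY` environment variable when no explicit `api_key` argument is passed; whether that applies here depends on how the script instantiates the client, so treat the snippet below as a sketch.

```bash
# Assumption: the AsyncOpenAI client is created without an explicit api_key,
# so the official openai client falls back to this environment variable.
export OPENAI_API_KEY="sk-..."   # replace with your own key
```

An example command for running the LLM augmentation is shown below: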
python data_scripts/gen_programs_llm_async.py -d material_dataset_filtered -i material_dataset_info -o material_dataset_filtered/llm -n 500 --num_workers_gen 8 --num_workers_verify 8 --display_id 0 --device_id 0 1 2 3 4 5 6 7
Explanations for command line options:
- `-n 500`: Randomly sample 500 pairs of programs. The LLM rewrites each program pair into several new candidate programs.
- `--num_workers_gen 8`: Use 8 asynchronous workers to query the LLM.
- `--num_workers_verify 8`: Use 8 parallel workers to validate the correctness of generated programs, including OpenGL rendering.
- `--display_id`, `--device_id`: Specify the X11 display name and the associated GPU IDs.
Other useful options (see the example after this list):
- `--model_id`: Indicate which LLM to use. Defaults to `gpt-4o-mini`.
- `--num_samples`, `--max_samples`: The LLM generates up to `max_samples` programs for each randomly sampled program pair. The verifier picks the first `num_samples` correct programs until exhausting the generated programs.
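For instance, a second augmentation round that switches the LLM and tightens the sampling budget might look like this (the model name, output folder, and sample counts are illustrative):

```bash
# Hypothetical second round: write to a new output folder to avoid overwriting.
python data_scripts/gen_programs_llm_async.py \
    -d material_dataset_filtered -i material_dataset_info \
    -o material_dataset_filtered/llm_round2 -n 500 \
    --model_id gpt-4o --num_samples 4 --max_samples 8 \
    --num_workers_gen 8 --num_workers_verify 8 \
    --display_id 0 --device_id 0 1 2 3 4 5 6 7
```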
Next, add the validated programs to the dataset, following the same directory structure as the existing materials:
python data_scripts/copy_subset_llm.py -s material_dataset_filtered/llm -o material_dataset_filtered/mat_llm
Note
You can repeat the commands in this section to conduct multiple rounds of augmentation. Choose a different output folder in each round to avoid overwriting previous results.
The following command augments the dataset with parameter-space variations:
python data_scripts/sample_params_all.py --data_root material_dataset_filtered --info_dir material_dataset_info --output_folder material_dataset_filtered --num_processes 8 --display_id 0 --device_id 0 1 2 3 4 5 6 7
Useful command line options:
- `--num_processes`: Parallelize the generation and verification of material variations over multiple GPUs (designated by `--device_id`).
- `--num_samples`, `--max_samples`: Randomly sample up to `max_samples` node parameter variations for each material. The first `num_samples` valid variations are added to the dataset.
The hyperparameters of parameter sampling are currently hard-coded in `sample_params.py` and can be tweaked as needed.
In addition, the augmented materials are placed alongside the original materials in their respective folders. The resulting directory structure is:
/path/to/workspace/VLMaterial
└── material_dataset_filtered/ # Material dataset folder
└── source_i/ # Just an example for simplicity
└── material_j/
├── analysis_result.json # Node type information and parameter values
├── blender_full.py # Transpiled Python program
├── transpiled_render.jpg # Material rendering
├── var_00000_full.py # Parameter variation 0 (Python code)
├── var_00000_render.jpg # Parameter variation 0 (rendering)
├── var_00001_full.py # Parameter variation 1 (Python code)
├── var_00001_render.jpg # Parameter variation 1 (rendering)
└── ...
As the last step, we divide the augmented dataset into training, validation, and test splits.
python data_scripts/package_data.py --data_root material_dataset_filtered --info_dir material_dataset_info --output_folder material_dataset_filtered/dataset_splits --add_llm --filter_llm --save_prefix llava_sllm_p10_
This command generates three JSON files, `llava_sllm_p10_{train|val|test}.json`, in the `material_dataset_filtered/dataset_splits` folder.
Useful command line options (see the example after this list):
- `--split_ratios`: The fraction of data in each split. Defaults to `0.9 0.05 0.05` (train/valid/test).
- `--add_llm`: Incorporates LLM-generated materials into the training data.
- `--filter_llm`: Optionally removes LLM-generated materials that are too similar to validation/testing materials.
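For example, a run with a custom 80/10/10 split (the ratios below are illustrative) would be:

```bash
# Hypothetical custom split ratios; all other options mirror the command above.
python data_scripts/package_data.py --data_root material_dataset_filtered \
    --info_dir material_dataset_info \
    --output_folder material_dataset_filtered/dataset_splits \
    --split_ratios 0.8 0.1 0.1 --add_llm --filter_llm --save_prefix llava_sllm_p10_
```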
We would appreciate it if you cite our paper when using this codebase in your project.
@inproceedings{li2025vlmaterial,
title={{VLM}aterial: Procedural Material Generation with Large Vision-Language Models},
author={Beichen Li and Rundi Wu and Armando Solar-Lezama and Liang Shi and Changxi Zheng and Bernd Bickel and Wojciech Matusik},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=wHebuIb6IH}
}
Please contact beichen@mit.edu if you have any questions. This repository is provided as an archival reference for the paper. Contributions will not be actively reviewed or merged.
The code and pre-trained weights are released under the MIT License - see the LICENSE file for details.
The Blender procedural material dataset is released under the CC BY-NC 4.0 License for non-commercial purposes only.
This material is based on work that is partially funded by an unrestricted gift from Google.