- Tsinghua University
- Germany
Lists (27)
3d
agent
avatar
avatar-motion
control
depth
diffusion
distributed dl
image generation
isaac gym (learning isaac gym)
layout
llm
ocr
outpainting
physics
rl (reinforcement learning)
segment
sr
video+3d
video edit
video generation
Video Stabilization
video understand
vlm
voice
vqa
world model
Starred repositories
Official implementation for "RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers"
No fortress, purely open ground. OpenManus is Coming.
HunyuanVideo-I2V: A Customizable Image-to-Video Model based on HunyuanVideo
SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Wan: Open and Advanced Large-Scale Video Generative Models
FantasyID: Face Knowledge Enhanced ID-Preserving Video Generation
[WWW 2025] Official PyTorch Code for "CTR-Driven Advertising Image Generation with Multimodal Large Language Models"
[arXiv'25] AnyCharV: Bootstrap Controllable Character Video Generation with Fine-to-Coarse Guidance
Light-A-Video: Training-free Video Relighting via Progressive Light Fusion
Phantom: Subject-Consistent Video Generation via Cross-Modal Alignment
SkyReels V1: The first and most advanced open-source human-centric video foundation model
Pippo: High-Resolution Multi-View Humans from a Single Image
Benchmarking physical understanding in generative video models
Investigating CoT Reasoning in Autoregressive Image Generation
High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models.
🔥ICLR 2025 (Spotlight) One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Code for [CVPR 2024] VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…
[CVPR 2025] VideoWorld is a simple generative model that learns purely from unlabeled videos—much like how babies learn by observing their environment
Video Generation Foundation Models: https://saiyan-world.github.io/goku/
[CVPR 2025] MatAnyone: Stable Video Matting with Consistent Memory Propagation
Official implementation of "Sonic: Shifting Focus to Global Audio Perception in Portrait Animation"
[CVPR 2025] X-Dyna: Expressive Dynamic Human Image Animation