Stars
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
OCR, layout analysis, reading order, table recognition in 90+ languages
A feature-rich command-line audio/video downloader
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!
Background Matting: The World is Your Green Screen
[CAAI AIR'24] Bilateral Reference for High-Resolution Dichotomous Image Segmentation
CLIP⚡NCNN⚡Natural-language image search (Image Search)⚡Search images by text⚡x86⚡Android
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
A curated list of recent diffusion models for video generation, editing, and various other applications.
Fine-Tuning Dataset Auto-Generation for Graph Query Languages.
TuGraph: A High Performance Graph Database.
🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RN…
An embodied robot that runs on iPhone + MacBook + Arduino + GPT API
A PyTorch implementation of sound classification that supports EcapaTdnn, PANNS, TDNN, Res2Net, ResNetSE, and other models, as well as a variety of preprocessing methods.
Learning audio concepts from natural language supervision
ESC-50: Dataset for Environmental Sound Classification
Sample Repository for the AlibabaCloud Bailian Speech SDK
An infant cry audio corpus that's being built through the Donate-a-cry campaign - see http://donateacry.com
[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
A high-throughput and memory-efficient inference and serving engine for LLMs