Updated on 2025.01.06

Table of Contents

6D Pose
Point Cloud Registration
Point Cloud Segmentation
Zero-shot

6D Pose

Publish Date	Title	Authors	PDF	Code
2025-01-03	Free-Form Motion Control: A Synthetic Video Generation Dataset with Controllable Camera and Object Motions	Xincheng Shuai et.al.	2501.01425v2	null
2025-01-02	On Unifying Video Generation and Camera Pose Estimation	Chun-Hao Paul Huang et.al.	2501.01409v1	null
2025-01-02	L3D-Pose: Lifting Pose for 3D Avatars from a Single Camera in the Wild	Soumyaratna Debnath et.al.	2501.01174v1	null
2024-12-31	Relative Pose Observability Analysis Using Dual Quaternions	Nicholas B. Andrews et.al.	2501.00657v1	null
2024-12-31	VinT-6D: A Large-Scale Object-in-hand Dataset from Vision, Touch and Proprioception	Zhaoliang Wan et.al.	2501.00510v1	null
2024-12-30	Hierarchical Pose Estimation and Mapping with Multi-Scale Neural Feature Fields	Evgenii Kruzhkov et.al.	2412.20976v1	null
2024-12-30	ReFlow6D: Refraction-Guided Transparent Object 6D Pose Estimation via Intermediate Representation Learning	Hrishikesh Gupta et.al.	2412.20830v1	link
2024-12-30	Frequency-aware Event Cloud Network	Hongwei Ren et.al.	2412.20803v1	null
2024-12-30	KeyGS: A Keyframe-Centric Gaussian Splatting Method for Monocular Image Sequences	Keng-Wei Chang et.al.	2412.20767v1	null
2024-12-30	Towards nation-wide analytical healthcare infrastructures: A privacy-preserving augmented knee rehabilitation case study	Boris Bačić et.al.	2412.20733v1	null
2024-12-29	Exploiting Aggregation and Segregation of Representations for Domain Adaptive Human Pose Estimation	Qucheng Peng et.al.	2412.20538v1	link
2024-12-28	MambaVO: Deep Visual Odometry Based on Sequential Matching Refinement and Training Smoothing	Shuo Wang et.al.	2412.20082v1	null
2024-12-28	GSplatLoc: Ultra-Precise Camera Localization via 3D Gaussian Splatting	Atticus J. Zeller et.al.	2412.20056v1	link
2024-12-27	Optimizing Local-Global Dependencies for Accurate 3D Human Pose Estimation	Guangsheng Xu et.al.	2412.19676v1	link
2024-12-27	Dust to Tower: Coarse-to-Fine Photo-Realistic Scene Reconstruction from Sparse Uncalibrated Images	Xudong Cai et.al.	2412.19518v1	null
2024-12-26	Humans as a Calibration Pattern: Dynamic 3D Scene Reconstruction from Unsynchronized and Uncalibrated Videos	Changwoon Choi et.al.	2412.19089v1	null
2024-12-23	Reconstructing People, Places, and Cameras	Lea Müller et.al.	2412.17806v1	null
2024-12-22	Leveraging Consistent Spatio-Temporal Correspondence for Robust Visual Odometry	Zhaoxing Zhang et.al.	2412.16923v1	null
2024-12-21	EasyVis2: A Real Time Multi-view 3D Visualization for Laparoscopic Surgery Training Enhanced by a Deep Neural Network YOLOv8-Pose	Yung-Hong Sun et.al.	2412.16742v1	null
2024-12-21	FACTS: Fine-Grained Action Classification for Tactical Sports	Christopher Lai et.al.	2412.16454v1	null
2024-12-20	Can Generative Video Models Help Pose Estimation?	Ruojin Cai et.al.	2412.16155v1	null
2024-12-20	Monkey Transfer Learning Can Improve Human Pose Estimation	Bradley Scott et.al.	2412.15966v1	null
2024-12-19	Scaling 4D Representations	João Carreira et.al.	2412.15212v1	null
2024-12-13	IMPROVE: Impact of Mobile Phones on Remote Online Virtual Education	Roberto Daza et.al.	2412.14195v1	link
2024-12-18	Level-Set Parameters: Novel Representation for 3D Shape Analysis	Huan Lei et.al.	2412.13502v1	null
2024-12-18	Pre-training a Density-Aware Pose Transformer for Robust LiDAR-based 3D Human Pose Estimation	Xiaoqi An et.al.	2412.13454v1	null
2024-12-17	CondiMen: Conditional Multi-Person Mesh Recovery	Brégier Romain et.al.	2412.13058v1	null
2024-12-17	ShotVL: Human-Centric Highlight Frame Retrieval via Language Queries	Wangyu Xue et.al.	2412.12675v1	null
2024-12-16	Category Level 6D Object Pose Estimation from a Single RGB Image using Diffusion	Adam Bethell et.al.	2412.11420v1	null
2024-12-13	ExeChecker: Where Did I Go Wrong?	Yiwen Gu et.al.	2412.10573v1	null
2024-12-11	CUPS: Improving Human Pose-Shape Estimators with Conformalized Deep Uncertainty	Harry Zhang et.al.	2412.10431v1	null
2024-12-13	RP-SLAM: Real-time Photorealistic SLAM with Efficient 3D Gaussian Splatting	Lizhi Bai et.al.	2412.09868v1	null
2024-12-12	Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos	Linyi Jin et.al.	2412.09621v1	null
2024-12-12	FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction	Jiale Xu et.al.	2412.09573v1	null
2024-12-11	BLADE: Single-view Body Mesh Learning through Accurate Depth Estimation	Shengze Wang et.al.	2412.08640v1	null
2024-12-12	Drift-free Visual SLAM using Digital Twins	Roxane Merat et.al.	2412.08496v2	null
2024-12-11	Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization	Siyan Dong et.al.	2412.08376v1	link
2024-12-10	LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation Models	Ziqi Lu et.al.	2412.07746v1	null
2024-12-09	MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds	Zhenggang Tang et.al.	2412.06974v1	null
2024-12-09	An Efficient Scene Coordinate Encoding and Relocalization Method	Kuan Xu et.al.	2412.06488v1	link
2024-12-09	Attention-Enhanced Lightweight Hourglass Network for Human Pose Estimation	Marsha Mariya Kappan et.al.	2412.06227v1	null
2024-12-06	CCS: Continuous Learning for Customized Incremental Wireless Sensing Services	Qunhang Fu et.al.	2412.04821v1	null
2024-12-05	ProPLIKS: Probablistic 3D human body pose estimation	Karthik Shetty et.al.	2412.04665v1	null
2024-12-05	DualPM: Dual Posed-Canonical Point Maps for 3D Shape and Pose Reconstruction	Ben Kaye et.al.	2412.04464v1	null
2024-12-05	Targeted Hard Sample Synthesis Based on Estimated Pose and Occlusion Error for Improved Object Pose Estimation	Alan Li et.al.	2412.04279v1	null
2024-12-04	Sparse-view Pose Estimation and Reconstruction via Analysis by Generative Synthesis	Qitao Zhao et.al.	2412.03570v1	null
2024-12-06	NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images	Lingen Li et.al.	2412.03517v2	null
2024-12-05	A Bidirectional Siamese Recurrent Neural Network for Accurate Gait Recognition Using Body Landmarks	Proma Hossain Progga et.al.	2412.03498v2	null
2024-12-04	MCVO: A Generic Visual Odometry for Arbitrarily Arranged Multi-Cameras	Huai Yu et.al.	2412.03146v1	link
2024-12-04	An indoor DSO-based ceiling-vision odometry system for indoor industrial environments	Abdelhak Bougouffa et.al.	2412.02950v1	null
2024-12-03	EgoCast: Forecasting Egocentric Human Pose in the Wild	Maria Escobar et.al.	2412.02903v1	null
2024-12-02	emg2pose: A Large and Diverse Benchmark for Surface Electromyographic Hand Pose Estimation	Sasha Salter et.al.	2412.02725v1	link
2024-12-03	ProbPose: A Probabilistic Approach to 2D Human Pose Estimation	Miroslav Purkrabek et.al.	2412.02254v1	null
2024-12-03	Cascaded Multi-Scale Attention for Enhanced Multi-Scale Feature Extraction and Interaction with Low-Resolution Images	Xiangyong Lu et.al.	2412.02197v1	link
2024-12-03	CLERF: Contrastive LEaRning for Full Range Head Pose Estimation	Ting-Ruen Wei et.al.	2412.02066v1	null
2024-12-02	Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle	Miroslav Purkrabek et.al.	2412.01562v1	link
2024-12-02	6DOPE-GS: Online 6D Object Pose Estimation using Gaussian Splatting	Yufeng Jin et.al.	2412.01543v1	null
2024-12-02	HandOS: 3D Hand Reconstruction in One Stage	Xingyu Chen et.al.	2412.01537v1	null
2024-12-02	SF-Loc: A Visual Mapping and Geo-Localization System based on Sparse Visual Structure Frames	Yuxuan Zhou et.al.	2412.01500v1	link
2024-12-02	MamKPD: A Simple Mamba Baseline for Real-Time 2D Keypoint Detection	Yonghao Dang et.al.	2412.01422v1	null
2024-12-02	Cross-Modal Visual Relocalization in Prior LiDAR Maps Utilizing Intensity Textures	Qiyuan Shen et.al.	2412.01299v1	null
2024-12-02	CRISP: Object Pose and Shape Estimation with Test-Time Adaptation	Jingnan Shi et.al.	2412.01052v1	null
2024-11-29	Diorama: Unleashing Zero-shot Single-view 3D Scene Modeling	Qirui Wu et.al.	2411.19492v1	null
2024-11-29	Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning	Yang You et.al.	2411.19458v1	null
2024-11-28	GMS-VINS: Multi-category Dynamic Objects Semantic Segmentation for Enhanced Visual-Inertial Odometry Using a Promptable Foundation Model	Rui Zhou et.al.	2411.19289v1	null
2024-11-28	HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos	Prithviraj Banerjee et.al.	2411.19167v1	null
2024-11-28	Lost & Found: Updating Dynamic 3D Scene Graphs from Egocentric Observations	Tjark Behrens et.al.	2411.19162v1	link
2024-11-28	Distributed Dual Quaternion Extended Kalman Filtering for Spacecraft Pose Estimation	Mathias Hudoba de Badyn et.al.	2411.19033v1	null
2024-11-28	Waterfall Transformer for Multi-person Pose Estimation	Navin Ranjan et.al.	2411.18944v1	null
2024-12-02	AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers	Sherwin Bahmani et.al.	2411.18673v2	null
2024-11-27	XR-MBT: Multi-modal Full Body Tracking for XR through Self-Supervision with Learned Depth Point Cloud Registration	Denys Rozumnyi et.al.	2411.18377v1	null
2024-11-27	Manual-PA: Learning 3D Part Assembly from Instruction Diagrams	Jiahao Zhang et.al.	2411.18011v1	null
2024-11-26	Self-supervised Monocular Depth and Pose Estimation for Endoscopy with Generative Latent Priors	Ziang Xu et.al.	2411.17790v1	null
2024-11-26	Geometric Point Attention Transformer for 3D Shape Reassembly	Jiahan Li et.al.	2411.17788v1	null
2024-11-26	RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training	Raktim Gautam Goswami et.al.	2411.17662v1	null
2024-11-26	Communication-Efficient Cooperative SLAMMOT via Determining the Number of Collaboration Vehicles	Susu Fang et.al.	2411.17432v1	null
2024-11-26	Boost 3D Reconstruction using Diffusion-based Monocular Camera Calibration	Junyuan Deng et.al.	2411.17240v1	link
2024-11-28	SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting	Gyeongjin Kang et.al.	2411.17190v3	null
2024-11-26	GMFlow: Global Motion-Guided Recurrent Flow for 6D Object Pose Estimation	Xin Liu et.al.	2411.17174v1	null
2024-11-25	Diffusion Features for Zero-Shot 6DoF Object Pose Estimation	Bernd Von Gimborn et.al.	2411.16668v1	null
2024-11-25	Edge Weight Prediction For Category-Agnostic Pose Estimation	Or Hirschorn et.al.	2411.16665v1	link
2024-11-25	SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis	Hyojun Go et.al.	2411.16443v1	link
2024-11-25	One Diffusion to Generate Them All	Duong H. Le et.al.	2411.16318v1	link
2024-11-25	UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image	Xingyu Liu et.al.	2411.16106v1	null
2024-11-24	Generalizable Single-view Object Pose Estimation by Two-side Generating and Matching	Yujing Sun et.al.	2411.15860v1	link
2024-11-24	PEnG: Pose-Enhanced Geo-Localisation	Tavis Shore et.al.	2411.15742v1	null
2024-11-22	Personalization of Wearable Sensor-Based Joint Kinematic Estimation Using Computer Vision for Hip Exoskeleton Applications	Changseob Song et.al.	2411.15366v1	null
2024-11-22	Enhancing Exploration with Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile Manipulation	Huy Le et.al.	2411.14913v1	null
2024-11-22	mmWave Radar for Sit-to-Stand Analysis: A Comparative Study with Wearables and Kinect	Shuting Hu et.al.	2411.14656v1	null
2024-11-21	DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding	Tianhe Ren et.al.	2411.14347v1	link
2024-11-21	SEMPose: A Single End-to-end Network for Multi-object Pose Estimation	Xin Liu et.al.	2411.14002v1	null
2024-11-21	Dehazing-aided Multi-Rate Multi-Modal Pose Estimation Framework for Mitigating Visual Disturbances in Extreme Underwater Domain	Vidya Sudevan et.al.	2411.13988v1	null
2024-11-21	Hybrid-Neuromorphic Approach for Underwater Robotics Applications: A Conceptual Framework	Vidya Sudevan et.al.	2411.13962v1	null
2024-11-20	Developing Normative Gait Cycle Parameters for Clinical Analysis Using Human Pose Estimation	Rahm Ranjan et.al.	2411.13716v1	null
2024-11-20	Robust SG-NeRF: Robust Scene Graph Aided Neural Surface Reconstruction	Yi Gu et.al.	2411.13620v1	null
2024-11-19	VioPose: Violin Performance 4D Pose Estimation by Hierarchical Audiovisual Inference	Seong Jong Yoo et.al.	2411.13607v1	link
2024-11-20	DATAP-SfM: Dynamic-Aware Tracking Any Point for Robust Structure from Motion in the Wild	Weicai Ye et.al.	2411.13291v1	null
2024-11-20	X as Supervision: Contending with Depth Ambiguity in Unsupervised Monocular 3D Pose Estimation	Yuchen Yang et.al.	2411.13026v1	link
2024-11-19	IoT-Based 3D Pose Estimation and Motion Optimization for Athletes: Application of C3D and OpenPose	Fei Ren et.al.	2411.12676v1	null
2024-11-15	SPARS3R: Semantic Prior Alignment and Regularization for Sparse 3D Reconstruction	Yutao Tang et.al.	2411.12592v1	link
2024-11-19	GLOVER: Generalizable Open-Vocabulary Affordance Reasoning for Task-Oriented Grasping	Teli Ma et.al.	2411.12286v1	null
2024-11-18	IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos	Yunong Liu et.al.	2411.11409v1	link
2024-11-15	USP-Gaussian: Unifying Spike-based Image Reconstruction, Pose Correction and Gaussian Splatting	Kang Chen et.al.	2411.10504v1	link
2024-11-13	ReMP: Reusable Motion Prior for Multi-domain 3D Human Pose Estimation and Motion Inbetweening	Hojun Jang et.al.	2411.09435v1	null
2024-11-13	Generalized Pose Space Embeddings for Training In-the-Wild using Anaylis-by-Synthesis	Dominik Borer et.al.	2411.08603v1	null
2024-11-13	DG-SLAM: Robust Dynamic Gaussian Splatting SLAM with Hybrid Pose Optimization	Yueming Xu et.al.	2411.08373v1	null
2024-11-16	RINO: Accurate, Robust Radar-Inertial Odometry with Non-Iterative Estimation	Shuocheng Yang et.al.	2411.07699v2	link
2024-11-12	Human Arm Pose Estimation with a Shoulder-worn Force-Myography Device for Human-Robot Interaction	Rotem Atari et.al.	2411.07644v1	null
2024-11-12	Towards Seamless Integration of Magnetic Tracking into Fluoroscopy-guided Interventions	Shuwei Xing et.al.	2411.07495v1	null
2024-11-08	Acoustic-based 3D Human Pose Estimation Robust to Human Position	Yusuke Oumi et.al.	2411.07165v1	null
2024-11-11	CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models	Junho Kim et.al.	2411.06869v1	null
2024-11-11	GenZ-ICP: Generalizable and Degeneracy-Robust LiDAR Odometry Using an Adaptive Weighting	Daehan Lee et.al.	2411.06766v1	link
2024-11-11	GTA-Net: An IoT-Integrated 3D Human Pose Estimation System for Real-Time Adolescent Sports Posture Correction	Shizhe Yuan et.al.	2411.06725v1	null
2024-11-10	Magnetic Field Aided Vehicle Localization with Acceleration Correction	Mrunmayee Deshpande et.al.	2411.06543v1	null
2024-11-10	Visuotactile-Based Learning for Insertion with Compliant Hands	Osher Azulay et.al.	2411.06408v1	link
2024-11-08	Poze: Sports Technique Feedback under Data Constraints	Agamdeep Singh et.al.	2411.05734v1	null
2024-11-08	DeepArUco++: Improved detection of square fiducial markers in challenging lighting conditions	Rafael Berral-Soler et.al.	2411.05552v1	link
2024-11-08	Tightly-Coupled, Speed-aided Monocular Visual-Inertial Localization in Topological Map	Chanuk Yang et.al.	2411.05497v1	null
2024-11-08	Relative Pose Estimation for Nonholonomic Robot Formation with UWB-IO Measurements	Kunrui Ze et.al.	2411.05481v1	null
2024-11-07	Social EgoMesh Estimation	Luca Scofano et.al.	2411.04598v1	link
2024-11-07	Pose2Trajectory: Using Transformers on Body Pose to Predict Tennis Player's Trajectory	Ali K. AlShami et.al.	2411.04501v1	null
2024-11-08	SuperQ-GRASP: Superquadrics-based Grasp Pose Estimation on Larger Objects for Mobile-Manipulation	Xun Tu et.al.	2411.04386v2	null
2024-11-08	GS2Pose: Two-stage 6D Object Pose Estimation Guided by Gaussian Splatting	Jilan Mei et.al.	2411.03807v3	null
2024-11-06	Estimation of Psychosocial Work Environment Exposures Through Video Object Detection. Proof of Concept Using CCTV Footage	Claus D. Hansen et.al.	2411.03724v1	null
2024-11-05	Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data	Seunggeun Chi et.al.	2411.03561v1	null
2024-11-05	HFGaussian: Learning Generalizable Gaussian Human with Integrated Human Features	Arnab Dey et.al.	2411.03086v1	null
2024-11-04	Semantic Masking and Visual Feature Matching for Robust Localization	Luisa Mao et.al.	2411.01804v1	null
2024-11-03	Activating Self-Attention for Multi-Scene Absolute Pose Regression	Miso Lee et.al.	2411.01443v1	link
2024-11-04	3D Equivariant Pose Regression via Direct Wigner-D Harmonics Prediction	Jongmin Lee et.al.	2411.00543v2	null
2024-10-31	Whole-Herd Elephant Pose Estimation from Drone Data for Collective Behavior Analysis	Brody McNutt et.al.	2411.00196v1	null
2024-10-31	No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images	Botao Ye et.al.	2410.24207v1	link
2024-11-06	SceneComplete: Open-World 3D Scene Completion in Complex Real World Environments for Robot Manipulation	Aditya Agarwal et.al.	2410.23643v2	null
2024-10-30	SCRREAM: SCan, Register, REnder And Map: A Framework for Annotating Accurate and Dense 3D Indoor Scenes with a Benchmark	HyunJun Jung et.al.	2410.22715v1	null
2024-10-29	LiVisSfM: Accurate and Robust Structure-from-Motion with LiDAR and Visual Cues	Hanqing Jiang et.al.	2410.22213v1	null
2024-10-29	PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting	Sunghwan Hong et.al.	2410.22128v1	link
2024-10-29	HRPVT: High-Resolution Pyramid Vision Transformer for medium and small-scale human pose estimation	Zhoujie Xu et.al.	2410.22079v1	null
2024-10-29	EI-Nexus: Towards Unmediated and Flexible Inter-Modality Local Feature Extraction and Matching for Event-Image Data	Zhonghua Yi et.al.	2410.21743v1	link
2024-10-28	Synthetica: Large Scale Synthetic Data for Robot Perception	Ritvik Singh et.al.	2410.21153v1	null
2024-10-29	BLAPose: Enhancing 3D Human Pose Estimation with Bone Length Adjustment	Chih-Hsiang Hsu et.al.	2410.20731v2	link
2024-11-01	RopeTP: Global Human Motion Recovery via Integrating Robust Pose Estimation with Diffusion Trajectory Prior	Mingjiang Liang et.al.	2410.20358v2	null
2024-10-27	Harmony4D: A Video Dataset for In-The-Wild Close Human Interactions	Rawal Khirodkar et.al.	2410.20294v1	null
2024-10-26	Neural Fields in Robotics: A Survey	Muhammad Zubair Irshad et.al.	2410.20220v1	link
2024-10-25	DECADE: Towards Designing Efficient-yet-Accurate Distance Estimation Modules for Collision Avoidance in Mobile Advanced Driver Assistance Systems	Muhammad Zaeem Shahzad et.al.	2410.19336v1	null
2024-10-24	Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction	Junyi Chen et.al.	2410.18962v1	null
2024-10-24	VoxelKeypointFusion: Generalizable Multi-View Multi-Person Pose Estimation	Daniel Bermuth et.al.	2410.18723v1	link
2024-10-23	Robust Two-View Geometry Estimation with Implicit Differentiation	Vladislav Pyatov et.al.	2410.17983v1	link
2024-10-23	YOLOv11: An Overview of the Key Architectural Enhancements	Rahima Khanam et.al.	2410.17725v1	link
2024-10-21	Assisted Physical Interaction: Autonomous Aerial Robots with Neural Network Detection, Navigation, and Safety Layers	Andrea Berra et.al.	2410.15802v1	null
2024-10-21	ARTS: Semi-Analytical Regressor using Disentangled Skeletal Representations for Human Mesh Recovery from Videos	Tao Tang et.al.	2410.15582v1	link
2024-10-20	Neural Active Structure-from-Motion in Dark and Textureless Environment	Kazuto Ichimaru et.al.	2410.15378v1	null
2024-10-20	POSE: Pose estimation Of virtual Sync Exhibit system	Hao-Tang Tsui et.al.	2410.15343v1	link
2024-10-18	Graph Optimality-Aware Stochastic LiDAR Bundle Adjustment with Progressive Spatial Smoothing	Jianping Li et.al.	2410.14565v1	null
2024-10-18	Multi-modal Pose Diffuser: A Multimodal Generative Conditional Pose Prior	Calvin-Khang Ta et.al.	2410.14540v1	null
2024-10-18	Sim2real Cattle Joint Estimation in 3D point clouds	Okour Mohammad et.al.	2410.14419v1	null
2024-10-18	Unlabeled Action Quality Assessment Based on Multi-dimensional Adaptive Constrained Dynamic Time Warping	Renguang Chen et.al.	2410.14161v1	null
2024-10-15	From Real Artifacts to Virtual Reference: A Robust Framework for Translating Endoscopic Images	Junyang Wu et.al.	2410.13896v1	null
2024-10-17	DualQuat-LOAM: LiDAR Odometry and Mapping parametrized on Dual Quaternions	Edison P. Velasco-Sánchez et.al.	2410.13541v1	null
2024-10-17	Object Pose Estimation Using Implicit Representation For Transparent Objects	Varun Burde et.al.	2410.13465v1	null
2024-10-16	Optimizing Multi-Task Learning for Accurate Spacecraft Pose Estimation	Francesco Evangelisti et.al.	2410.12679v1	null
2024-10-15	Contrastive Touch-to-Touch Pretraining	Samanta Rodriguez et.al.	2410.11834v1	null
2024-10-18	X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing	Xinyan Chen et.al.	2410.10167v2	null
2024-10-13	Occluded Human Pose Estimation based on Limb Joint Augmentation	Gangtao Han et.al.	2410.09885v1	null
2024-10-12	Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors	Hritam Basak et.al.	2410.09467v1	null
2024-10-12	Towards Multi-Modal Animal Pose Estimation: An In-Depth Analysis	Qianyi Deng et.al.	2410.09312v1	link
2024-10-11	CVAM-Pose: Conditional Variational Autoencoder for Multi-Object Monocular Pose Estimation	Jianyu Zhao et.al.	2410.09010v1	link
2024-10-11	Look Gauss, No Pose: Novel View Synthesis using Gaussian Splatting without Accurate Pose Initialization	Christian Schmidt et.al.	2410.08743v1	link
2024-10-10	Generalizing Stochastic Smoothing for Differentiation and Gradient Estimation	Felix Petersen et.al.	2410.08125v1	null
2024-10-10	Robotic framework for autonomous manipulation of laboratory equipment with different degrees of transparency via 6D pose estimation	Maria Makarova et.al.	2410.07801v1	null
2024-10-10	Optimal-State Dynamics Estimation for Physics-based Human Motion Capture from Videos	Cuong Le et.al.	2410.07795v1	link
2024-10-12	Autonomous Driving in Unstructured Environments: How Far Have We Come?	Chen Min et.al.	2410.07701v2	link
2024-10-10	Invisibility Cloak: Disappearance under Human Pose Estimation via Backdoor Attacks	Minxing Zhang et.al.	2410.07670v1	null
2024-10-09	OmniPose6D: Towards Short-Term Object Pose Tracking in Dynamic Scenes from Monocular RGB	Yunzhi Lin et.al.	2410.06694v1	null
2024-10-08	SpecTrack: Learned Multi-Rotation Tracking via Speckle Imaging	Ziyang Chen et.al.	2410.06028v1	link
2024-10-08	AIVIO: Closed-loop, Object-relative Navigation of UAVs with AI-aided Visual Inertial Odometry	Thomas Jantos et.al.	2410.05996v1	null
2024-10-08	Are Minimal Radial Distortion Solvers Necessary for Relative Pose Estimation?	Charalambos Tzamos et.al.	2410.05984v1	link
2024-10-08	FürElise: Capturing and Physically Synthesizing Hand Motions of Piano Performance	Ruocheng Wang et.al.	2410.05791v1	null
2024-10-07	Comparison of marker-less 2D image-based methods for infant pose estimation	Lennart Jahn et.al.	2410.04980v1	null
2024-10-06	Enhancing 3D Human Pose Estimation Amidst Severe Occlusion with Dual Transformer Fusion	Mehwish Ghafoor et.al.	2410.04574v1	link
2024-10-06	LiteVLoc: Map-Lite Visual Localization for Image Goal Navigation	Jianhao Jiao et.al.	2410.04419v1	null
2024-10-05	Test-Time Adaptation for Keypoint-Based Spacecraft Pose Estimation Based on Predicted-View Synthesis	Juan Ignacio Bravo Pérez-Villar et.al.	2410.04298v1	link
2024-10-05	A Framework for Reproducible Benchmarking and Performance Diagnosis of SLAM Systems	Nikola Radulov et.al.	2410.04242v1	link
2024-10-04	Unsupervised Prior Learning: Discovering Categorical Pose Priors from Videos	Ziyu Wang et.al.	2410.03858v1	null
2024-10-04	Universal Global State Estimation for Inertial Navigation Systems	Sifeddine Benahmed et.al.	2410.03846v1	null
2024-10-04	MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion	Junyi Zhang et.al.	2410.03825v1	null
2024-10-04	Dessie: Disentanglement for Articulated 3D Horse Shape and Pose Estimation from Images	Ci Li et.al.	2410.03438v1	null
2024-10-04	HRVMamba: High-Resolution Visual State Space Model for Dense Prediction	Hao Zhang et.al.	2410.03174v1	null
2024-10-04	CLIP-Clique: Graph-based Correspondence Matching Augmented by Vision Language Models for Object-based Global Localization	Shigemichi Matsuzaki et.al.	2410.03054v1	null
2024-10-03	Why Sample Space Matters: Keyframe Sampling Optimization for LiDAR-based Place Recognition	Nikolaos Stathoulopoulos et.al.	2410.02643v1	null
2024-10-03	Key-Grid: Unsupervised 3D Keypoints Detection using Grid Heatmap Features	Chengkai Hou et.al.	2410.02237v1	null
2024-10-02	SGBA: Semantic Gaussian Mixture Model-Based LiDAR Bundle Adjustment	Xingyu Ji et.al.	2410.01618v1	null
2024-10-02	SurgeoNet: Realtime 3D Pose Estimation of Articulated Surgical Instruments from Stereo Images using a Synthetically-trained Network	Ahmed Tawfik Aboukhadra et.al.	2410.01293v1	null
2024-10-01	Pose Estimation of Buried Deep-Sea Objects using 3D Vision Deep Learning Models	Jerry Yan et.al.	2410.01061v1	null
2024-10-01	RAD: A Dataset and Benchmark for Real-Life Anomaly Detection with Robotic Observations	Kaichen Zhou et.al.	2410.00713v1	link
2024-10-01	GERA: Geometric Embedding for Efficient Point Registration Analysis	Geng Li et.al.	2410.00589v1	null
2024-09-30	Continual Human Pose Estimation for Incremental Integration of Keypoints and Pose Variations	Muhammad Saif Ullah Khan et.al.	2409.20469v1	null
2024-09-30	Classroom-Inspired Multi-Mentor Distillation with Adaptive Learning Strategies	Shalini Sarode et.al.	2409.20237v1	null
2024-09-30	PuzzleBoard: A New Camera Calibration Pattern with Position Encoding	Peer Stelldinger et.al.	2409.20127v1	link
2024-09-30	Robust Gaussian Splatting SLAM by Leveraging Loop Closure	Zunjie Zhu et.al.	2409.20111v1	null
2024-09-30	GearTrack: Automating 6D Pose Estimation	Yu Deng et.al.	2409.19986v1	null
2024-09-29	PPLNs: Parametric Piecewise Linear Networks for Event-Based Temporal Modeling and Beyond	Chen Song et.al.	2409.19772v1	link
2024-09-29	GelSlim 4.0: Focusing on Touch and Reproducibility	Andrea Sipos et.al.	2409.19770v1	null
2024-09-27	Robust Proximity Operations using Probabilistic Markov Models	Deep Parikh et.al.	2409.19062v1	null
2024-09-27	Exploiting Motion Prior for Accurate Pose Estimation of Dashboard Cameras	Yipeng Lu et.al.	2409.18673v1	null
2024-09-27	DynaWeightPnP: Toward global real-time 3D-2D solver in PnP without correspondences	Jingwei Song et.al.	2409.18457v1	null
2024-09-30	Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation	Mengchen Zhang et.al.	2409.18261v2	link
2024-09-26	AI-Powered Augmented Reality for Satellite Assembly, Integration and Test	Alvaro Patricio et.al.	2409.18101v1	null
2024-09-27	Leveraging Anthropometric Measurements to Improve Human Mesh Estimation and Ensure Consistent Body Shapes	Katja Ludwig et.al.	2409.17671v2	null
2024-09-25	Safe Leaf Manipulation for Accurate Shape and Pose Estimation of Occluded Fruits	Shaoxiong Yao et.al.	2409.17389v1	null
2024-09-25	Hierarchical Tri-manual Planning for Vision-assisted Fruit Harvesting with Quadrupedal Robots	Zhichao Liu et.al.	2409.17116v1	null
2024-09-25	Self-Sensing for Proprioception and Contact Detection in Soft Robots Using Shape Memory Alloy Artificial Muscles	Ran Jing et.al.	2409.17111v1	null
2024-09-25	Online 6DoF Pose Estimation in Forests using Cross-View Factor Graph Optimisation and Deep Learned Re-localisation	Lucas Carvalho de Lima et.al.	2409.16680v1	null
2024-09-25	FAFA: Frequency-Aware Flow-Aided Self-Supervision for Underwater Object Pose Estimation	Jingyi Tang et.al.	2409.16600v1	null
2024-09-25	Robo-Platform: A Robotic System for Recording Sensors and Controlling Robots	Masoud Dayani Najafabadi et.al.	2409.16595v1	link
2024-09-24	PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings	Sutharsan Mahendren et.al.	2409.15832v1	null
2024-09-24	LaPose: Laplacian Mixture Shape Modeling for RGB-Based Category-Level Object Pose Estimation	Ruida Zhang et.al.	2409.15727v1	link
2024-09-23	Framework for Robust Localization of UUVs and Mapping of Net Pens	David Botta et.al.	2409.15475v1	null
2024-09-23	FisheyeDepth: A Real Scale Self-Supervised Depth Estimation Model for Fisheye Camera	Guoyang Zhao et.al.	2409.15054v1	link
2024-09-23	BranchPoseNet: Characterizing tree branching with a deep learning-based pose estimation approach	Stefano Puliti et.al.	2409.14755v1	link
2024-09-23	ERPoT: Effective and Reliable Pose Tracking for Mobile Robots Based on Lightweight and Compact Polygon Maps	Haiming Gao et.al.	2409.14723v1	null
2024-09-22	Tactile Functasets: Neural Implicit Representations of Tactile Datasets	Sikai Li et.al.	2409.14592v1	null
2024-09-22	AR Overlay: Training Image Pose Estimation on Curved Surface in a Synthetic Way	Sining Huang et.al.	2409.14577v1	null
2024-09-22	DROP: Dexterous Reorientation via Online Planning	Albert H. Li et.al.	2409.14562v1	null
2024-09-21	Combining Absolute and Semi-Generalized Relative Poses for Visual Localization	Vojtech Panek et.al.	2409.14269v1	null
2024-09-18	SpotLight: Robotic Scene Understanding through Interaction and Affordance Detection	Tim Engelbracht et.al.	2409.11870v1	link
2024-09-18	End-to-End Probabilistic Geometry-Guided Regression for 6DoF Object Pose Estimation	Thomas Pöllabauer et.al.	2409.11819v1	null
2024-09-18	Bridging Domain Gap for Flight-Ready Spaceborne Vision	Tae Ha Park et.al.	2409.11661v1	null
2024-09-17	Good Grasps Only: A data engine for self-supervised fine-tuning of pose estimation using grasp poses for verification	Frederik Hagelskjær et.al.	2409.11512v1	null
2024-09-17	Training Datasets Generation for Machine Learning: Application to Vision Based Navigation	Jérémy Lebreton et.al.	2409.11383v1	null
2024-09-17	OmniGen: Unified Image Generation	Shitao Xiao et.al.	2409.11340v1	link
2024-09-17	ULOC: Learning to Localize in Complex Large-Scale Environments with Ultra-Wideband Ranges	Thien-Minh Nguyen et.al.	2409.11122v1	link
2024-09-17	Depth-based Privileged Information for Boosting 3D Human Pose Estimation on RGB	Alessandro Simoni et.al.	2409.11104v1	null
2024-09-21	HGSLoc: 3DGS-based Heuristic Camera Pose Refinement	Zhongyan Niu et.al.	2409.10925v2	null
2024-09-17	Pose estimation of CubeSats via sensor fusion and Error-State Extended Kalman Filter	Deep Parikh et.al.	2409.10815v1	null
2024-09-16	CtRNet-X: Camera-to-Robot Pose Estimation in Real-world Conditions Using a Single Camera	Jingpei Lu et.al.	2409.10441v1	null
2024-09-16	HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models	Vineet Bhat et.al.	2409.10419v1	null
2024-09-16	2D or not 2D: How Does the Dimensionality of Gesture Representation Affect 3D Co-Speech Gesture Generation?	Téo Guichoux et.al.	2409.10357v1	null
2024-09-16	Human Insights Driven Latent Space for Different Driving Perspectives: A Unified Encoder for Efficient Multi-Task Inference	Huy-Dung Nguyen et.al.	2409.10095v1	null
2024-09-15	Precise Pick-and-Place using Score-Based Diffusion Networks	Shih-Wei Guo et.al.	2409.09725v1	null
2024-09-15	Pre-Training for 3D Hand Pose Estimation with Contrastive Learning on Large-Scale Hand Images in the Wild	Nie Lin et.al.	2409.09714v1	null
2024-09-15	Proximity operations of CubeSats via sensor fusion of ultra-wideband range measurements with rate gyroscopes, accelerometers and monocular vision	Deep Parikh et.al.	2409.09665v1	null
2024-09-15	A Scalable Tabletop Satellite Automation Testbed: Design And Experiments	Deep Parikh et.al.	2409.09633v1	null
2024-09-14	MAC-VO: Metrics-aware Covariance for Learning-based Stereo Visual Odometry	Yuheng Qiu et.al.	2409.09479v1	null
2024-09-14	Distributed Invariant Kalman Filter for Object-level Multi-robot Pose SLAM	Haoying Li et.al.	2409.09410v1	null
2024-09-13	Causal Transformer for Fusion and Pose Estimation in Deep Visual Inertial Odometry	Yunus Bilge Kurt et.al.	2409.08769v1	link
2024-09-13	WheelPoser: Sparse-IMU Based Body Pose Estimation for Wheelchair Users	Yunzhi Li et.al.	2409.08494v1	link
2024-09-12	Bayesian Inverse Graphics for Few-Shot Concept Learning	Octavio Arriaga et.al.	2409.08351v1	link
2024-09-12	Touch2Touch: Cross-Modal Tactile Generation for Object Manipulation	Samanta Rodriguez et.al.	2409.08269v1	null
2024-09-12	Covariance Intersection-based Invariant Kalman Filtering (DInCIKF) for Distributed Pose Estimation	Haoying Li et.al.	2409.07933v1	null
2024-09-12	GateAttentionPose: Enhancing Pose Estimation with Agent Attention and Improved Gated Convolutions	Liang Feng et.al.	2409.07798v1	null
2024-09-12	GatedUniPose: A Novel Approach for Pose Estimation Combining UniRepLKNet and Gated Convolution	Liang Feng et.al.	2409.07752v1	null
2024-09-11	FaVoR: Features via Voxel Rendering for Camera Relocalization	Vincenzo Polizzi et.al.	2409.07571v1	null
2024-09-11	Benchmarking 2D Egocentric Hand Pose Datasets	Olga Taran et.al.	2409.07337v1	null
2024-09-11	iKalibr-RGBD: Partially-Specialized Target-Free Visual-Inertial Spatiotemporal Calibration For RGBDs via Continuous-Time Velocity Estimation	Shuolong Chen et.al.	2409.07116v1	link
2024-09-11	Equivariant Filter for Tightly Coupled LiDAR-Inertial Odometry	Anbo Tao et.al.	2409.06948v1	null
2024-09-13	A Bayesian framework for active object recognition, pose estimation and shape transfer learning through touch	Haodong Zheng et.al.	2409.06912v2	null
2024-09-11	Alignist: CAD-Informed Orientation Distribution Estimation by Fusing Shape and Correspondences	Shishir Reddy Vutukur et.al.	2409.06683v2	link
2024-09-10	PoseEmbroider: Towards a 3D, Visual, Semantic-aware Human Pose Representation	Ginger Delmas et.al.	2409.06535v1	null
2024-09-10	Test-Time Certifiable Self-Supervision to Bridge the Sim2Real Gap in Event-Based Satellite Pose Estimation	Mohsi Jawaid et.al.	2409.06240v1	null
2024-09-09	From Words to Poses: Enhancing Novel Object Pose Estimation with Vision Language Models	Tessa Pulli et.al.	2409.05413v1	null
2024-09-08	HelmetPoser: A Helmet-Mounted IMU Dataset for Data-Driven Estimation of Human Head Motion in Diverse Conditions	Jianping Li et.al.	2409.05006v1	null
2024-09-06	Casper DPM: Cascaded Perceptual Dynamic Projection Mapping onto Hands	Yotam Erel et.al.	2409.04397v1	null
2024-09-06	GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers	Lorenza Prospero et.al.	2409.04196v1	null
2024-09-06	Dense Hand-Object (HO) GraspNet with Full Grasping Taxonomy and Dynamics	Woojin Cho et.al.	2409.04033v1	null
2024-09-06	Matched Filtering based LiDAR Place Recognition for Urban and Natural Environments	Therese Joseph et.al.	2409.03998v1	null
2024-09-09	The Influence of Faulty Labels in Data Sets on Human Pose Estimation	Arnold Schwarz et.al.	2409.03887v2	null
2024-09-05	MaskVal: Simple but Effective Uncertainty Quantification for 6D Pose Estimation	Philipp Quentin et.al.	2409.03556v1	null
2024-09-05	UAV (Unmanned Aerial Vehicles): Diverse Applications of UAV Datasets in Segmentation, Classification, Detection, and Tracking	Md. Mahfuzur Rahman et.al.	2409.03245v1	null
2024-09-01	Recoverable Anonymization for Pose Estimation: A Privacy-Enhancing Approach	Wenjun Huang et.al.	2409.02715v1	null
2024-09-04	Object Gaussian for Monocular 6D Pose Estimation from Sparse Views	Luqing Luo et.al.	2409.02581v1	null
2024-09-03	EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision	Yiming Zhao et.al.	2409.02224v1	null
2024-09-03	Deep learning for objective estimation of Parkinsonian tremor severity	Felipe Duque-Quiceno et.al.	2409.02011v1	null
2024-09-03	SPiKE: 3D Human Pose from Point Cloud Sequences	Irene Ballester et.al.	2409.01879v1	link
2024-09-02	Kalman Filtering for Precise Indoor Position and Orientation Estimation Using IMU and Acoustics on Riemannian Manifolds	Mohammed H. AlSharif et.al.	2409.01002v1	null
2024-09-01	Detection, Recognition and Pose Estimation of Tabletop Objects	Sanjuksha Nirgude et.al.	2409.00869v1	null
2024-09-01	DSLO: Deep Sequence LiDAR Odometry Based on Inconsistent Spatio-temporal Propagation	Huixin Zhang et.al.	2409.00744v1	link
2024-09-01	MoManifold: Learning to Measure 3D Human Motion via Decoupled Joint Acceleration Manifolds	Ziqiang Dang et.al.	2409.00736v1	null
2024-08-31	ActionPose: Pretraining 3D Human Pose Estimation with the Dark Knowledge of Action	Longyun Liao et.al.	2409.00449v1	null
2024-09-04	Augmented Reality without Borders: Achieving Precise Localization Without Maps	Albert Gassol Puigjaner et.al.	2408.17373v3	null
2024-08-30	BOP-D: Revisiting 6D Pose Estimation Benchmark for Better Evaluation under Visual Ambiguities	Boris Meden et.al.	2408.17297v1	null
2024-08-30	EMHI: A Multimodal Egocentric Human Motion Dataset with HMD and Body-Worn IMUs	Zhen Fan et.al.	2408.17168v1	null
2024-09-01	Generic Objects as Pose Probes for Few-Shot View Synthesis	Zhirui Gao et.al.	2408.16690v2	null
2024-08-29	OP-Align: Object-level and Part-level Alignment for Self-supervised Category-level Articulated Object Pose Estimation	Yuchen Che et.al.	2408.16547v1	link
2024-08-29	GRPose: Learning Graph Relations for Human Image Generation with Pose Priors	Xiangchen Yin et.al.	2408.16540v1	link
2024-08-28	Are Pose Estimators Ready for the Open World? STAGE: Synthetic Data Generation Toolkit for Auditing 3D Human Pose Estimators	Nikita Kister et.al.	2408.16536v1	null
2024-08-28	Multi-view Pose Fusion for Occlusion-Aware 3D Human Pose Estimation	Laura Bragagnolo et.al.	2408.15810v1	link
2024-08-30	Addressing the challenges of loop detection in agricultural environments	Nicolás Soncini et.al.	2408.15761v2	link
2024-08-28	Str-L Pose: Integrating Point and Structured Line for Relative Pose Estimation in Dual-Graph	Zherong Zhang et.al.	2408.15750v1	null
2024-08-28	Benchmarking ML Approaches to UWB-Based Range-Only Posture Recognition for Human Robot-Interaction	Salma Salimi et.al.	2408.15717v1	null
2024-08-26	Bengali Sign Language Recognition through Hand Pose Estimation using Multi-Branch Spatial-Temporal Attention Model	Abu Saleh Musa Miah et.al.	2408.14111v1	null
2024-08-25	InterTrack: Tracking Human Object Interaction without Object Templates	Xianghui Xie et.al.	2408.13953v1	null
2024-08-24	Temporally-consistent 3D Reconstruction of Birds	Johannes Hägerlind et.al.	2408.13629v1	null
2024-08-24	Explainable Convolutional Networks for Crater Detection and Lunar Landing Navigation	Jianing Song et.al.	2408.13587v1	null
2024-08-27	Sapiens: Foundation for Human Vision Models	Rawal Khirodkar et.al.	2408.12569v3	null
2024-08-21	GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting	Wanshui Gan et.al.	2408.11447v1	link
2024-08-20	GSLoc: Efficient Camera Pose Refinement via 3D Gaussian Splatting	Changkun Liu et.al.	2408.11085v1	null
2024-08-20	ZebraPose: Zebra Detection and Pose Estimation using only Synthetic Data	Elia Bonetto et.al.	2408.10831v1	null
2024-08-20	MPL: Lifting 3D Human Pose from Multi-view 2D Poses	Seyed Abolfazl Ghasemzadeh et.al.	2408.10805v1	link
2024-08-19	RUMI: Rummaging Using Mutual Information	Sheng Zhong et.al.	2408.10450v1	null
2024-08-19	SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views	Chao Xu et.al.	2408.10195v1	null
2024-08-19	SHARP: Segmentation of Hands and Arms by Range using Pseudo-Depth for Enhanced Egocentric 3D Hand Pose Estimation and Action Recognition	Wiktor Mucha et.al.	2408.10037v1	link
2024-08-19	Pose-GuideNet: Automatic Scanning Guidance for Fetal Head Ultrasound from Pose Estimation	Qianhui Men et.al.	2408.09931v1	null
2024-08-18	OPPH: A Vision-Based Operator for Measuring Body Movements for Personal Healthcare	Chen Long-fei et.al.	2408.09409v1	null
2024-08-17	An Open-Source American Sign Language Fingerspell Recognition and Semantic Pose Retrieval Interface	Kevin Jose Thomas et.al.	2408.09311v1	link
2024-08-16	ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation	Hao Tang et.al.	2408.09042v1	null
2024-08-16	Correspondence-Guided SfM-Free 3D Gaussian Splatting for NVS	Wei Sun et.al.	2408.08723v1	null
2024-08-16	SketchRef: A Benchmark Dataset and Evaluation Metrics for Automated Sketch Synthesis	Xingyue Lin et.al.	2408.08623v1	null
2024-08-15	HyperTaxel: Hyper-Resolution for Taxel-Based Tactile Signals Through Contrastive Learning	Hongyu Li et.al.	2408.08312v1	null
2024-08-15	Comparative Evaluation of 3D Reconstruction Methods for Object Pose Estimation	Varun Burde et.al.	2408.08234v1	link
2024-08-15	Towards Practical Human Motion Prediction with LiDAR Point Clouds	Xiao Han et.al.	2408.08202v1	null
2024-08-15	Your Turn: Real-World Turning Angle Estimation for Parkinson's Disease Severity Assessment	Qiushuo Cheng et.al.	2408.08182v1	null
2024-08-15	Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models	Tianyu Wang et.al.	2408.07975v1	null
2024-08-15	GOReloc: Graph-based Object-Level Relocalization for Visual SLAM	Yutong Wang et.al.	2408.07917v1	link
2024-08-13	Grasping by Hanging: a Learning-Free Grasping Detection Method for Previously Unseen Objects	Wanze Li et.al.	2408.06734v1	null
2024-08-13	A Miniature Vision-Based Localization System for Indoor Blimps	Shicong Ma et.al.	2408.06648v1	null
2024-08-12	UniT: Unified Tactile Representation for Robot Learning	Zhengtong Xu et.al.	2408.06481v1	link
2024-08-12	Moo-ving Beyond Tradition: Revolutionizing Cattle Behavioural Phenotyping with Pose Estimation Techniques	Navid Ghassemi et.al.	2408.06336v1	null
2024-08-12	CAD-Mesher: A Convenient, Accurate, Dense Mesh-based Mapping Module in SLAM for Dynamic Environments	Yanpeng Jia et.al.	2408.05981v1	null
2024-08-12	PAFormer: Part Aware Transformer for Person Re-identification	Hyeono Jung et.al.	2408.05918v1	null
2024-08-11	SABER-6D: Shape Representation Based Implicit Object Pose Estimation	Shishir Reddy Vutukur et.al.	2408.05867v1	null
2024-08-10	Visual SLAM with 3D Gaussian Primitives and Depth Priors Enabling Novel View Synthesis	Zhongche Qu et.al.	2408.05635v1	null
2024-08-10	Anticipation through Head Pose Estimation: a preliminary study	Federico Figari Tomenotti et.al.	2408.05516v1	null
2024-08-09	Mesh-based Object Tracking for Dynamic Semantic 3D Scene Graphs via Ray Tracing	Lennart Niecksch et.al.	2408.04979v1	null
2024-08-07	PoseMamba: Monocular 3D Human Pose Estimation with Bidirectional Global-Local Spatio-Temporal State Space Model	Yunlong Huang et.al.	2408.03540v1	null
2024-08-06	Line-based 6-DoF Object Pose Estimation and Tracking With an Event Camera	Zibin Liu et.al.	2408.03225v1	link
2024-08-06	Training on the Fly: On-device Self-supervised Learning aboard Nano-drones within 20 mW	Elia Cereda et.al.	2408.03168v1	null
2024-08-06	BodySLAM: A Generalized Monocular Visual SLAM Framework for Surgical Applications	G. Manni et.al.	2408.03078v1	link
2024-08-07	Pose Magic: Efficient and Temporally Consistent Human Pose Estimation with a Hybrid Mamba-GCN Network	Xinyi Zhang et.al.	2408.02922v2	null
2024-08-05	Analyzing Data Efficiency and Performance of Machine Learning Algorithms for Assessing Low Back Pain Physical Rehabilitation Exercises	Aleksa Marusic et.al.	2408.02855v1	null
2024-08-05	Joint-Motion Mutual Learning for Pose Estimation in Videos	Sifan Wu et.al.	2408.02285v1	null
2024-08-04	AvatarPose: Avatar-guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos	Feichi Lu et.al.	2408.02110v1	null
2024-08-04	Generalized Maximum Likelihood Estimation for Perspective-n-Point Problem	Tian Zhan et.al.	2408.01945v1	null
2024-08-03	MotionTrace: IMU-based Field of View Prediction for Smartphone AR Interactions	Rahul Islam et.al.	2408.01850v1	null
2024-08-03	BEVPlace++: Fast, Robust, and Lightweight LiDAR Global Localization for Unmanned Ground Vehicles	Lun Luo et.al.	2408.01841v1	link
2024-08-03	E$^3$NeRF: Efficient Event-Enhanced Neural Radiance Fields from Blurry Images	Yunshan Qi et.al.	2408.01840v1	null
2024-08-03	Survey on Emotion Recognition through Posture Detection and the possibility of its application in Virtual Reality	Leina Elansary et.al.	2408.01728v1	null
2024-08-03	Stimulating Imagination: Towards General-purpose Object Rearrangement	Jianyang Wu et.al.	2408.01655v1	null
2024-08-02	Full-range Head Pose Geometric Data Augmentations	Huei-Chung Hu et.al.	2408.01566v1	null
2024-07-31	Adapting Skills to Novel Grasps: A Self-Supervised Approach	Georgios Papagiannis et.al.	2408.00178v1	null
2024-07-31	Certifying Robustness of Learning-Based Keypoint Detection and Pose Estimation Methods	Xusheng Luo et.al.	2408.00117v1	null
2024-07-30	StackFLOW: Monocular Human-Object Reconstruction by Stacked Normalizing Flow with Offset	Chaofan Huo et.al.	2407.20545v1	link
2024-07-30	HandDAGT: A Denoising Adaptive Graph Transformer for 3D Hand Pose Estimation	Wencan Cheng et.al.	2407.20542v1	link
2024-07-30	Markers Identification for Relative Pose Estimation of an Uncooperative Target	Batu Candan et.al.	2407.20515v1	null
2024-07-29	BaseBoostDepth: Exploiting Larger Baselines For Self-supervised Monocular Depth Estimation	Kieran Saunders et.al.	2407.20437v1	null
2024-07-28	Skeleton-based Group Activity Recognition via Spatial-Temporal Panoramic Graph	Zhengcen Li et.al.	2407.19497v1	link
2024-07-26	Flexible graph convolutional network for 3D human pose estimation	Abu Taib Mohammed Shahjahan et.al.	2407.19077v1	link
2024-07-26	From 2D to 3D: AISG-SLA Visual Localization Challenge	Jialin Gao et.al.	2407.18590v1	null
2024-07-28	HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation	Zhenzhi Wang et.al.	2407.17438v2	link
2024-07-24	Active Loop Closure for OSM-guided Robotic Mapping in Large-Scale Urban Environments	Wei Gao et.al.	2407.17078v1	null
2024-07-30	DreamCar: Leveraging Car-specific Prior for in-the-wild 3D Car Reconstruction	Xiaobiao Du et.al.	2407.16988v2	link
2024-07-24	Pose Estimation from Camera Images for Underwater Inspection	Luyuan Peng et.al.	2407.16961v1	null
2024-07-23	COALA: A Practical and Vision-Centric Federated Learning Platform	Weiming Zhuang et.al.	2407.16560v1	link
2024-07-23	Probabilistic Parameter Estimators and Calibration Metrics for Pose Estimation from Image Features	Romeo Valentin et.al.	2407.16223v1	null
2024-07-23	Optimal camera-robot pose estimation in linear time from points and lines	Guangyang Zeng et.al.	2407.16151v1	null
2024-07-23	3D-UGCN: A Unified Graph Convolutional Network for Robust 3D Human Pose Estimation from Monocular RGB Images	Jie Zhao et.al.	2407.16137v1	null
2024-07-21	CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models	Zheng Chong et.al.	2407.15886v1	link
2024-07-22	RADA: Robust and Accurate Feature Learning with Domain Adaptation	Jingtai He et.al.	2407.15791v1	null
2024-07-22	Local Occupancy-Enhanced Object Grasping with Multiple Triplanar Projection	Kangqi Ma et.al.	2407.15771v1	null
2024-07-22	6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model	Matteo Bortolon et.al.	2407.15484v1	null
2024-07-23	Domain-Adaptive 2D Human Pose Estimation via Dual Teachers in Extremely Low-Light Conditions	Yihao Ai et.al.	2407.15451v2	link
2024-07-22	avaTTAR: Table Tennis Stroke Training with On-body and Detached Visualization in Augmented Reality	Dizhi Ma et.al.	2407.15373v1	null
2024-07-20	From Underground Mines to Offices: A Versatile and Robust Framework for Range-Inertial SLAM	Lorenzo Montano-Oliván et.al.	2407.14797v1	null
2024-07-19	ESCAPE: Energy-based Selective Adaptive Correction for Out-of-distribution 3D Human Pose Estimation	Luke Bidulka et.al.	2407.14605v1	null
2024-07-19	6DoF Head Pose Estimation through Explicit Bidirectional Interaction with Face Geometry	Sungho Chun et.al.	2407.14136v1	link
2024-07-18	RT-Pose: A 4D Radar Tensor-based 3D Human Pose Estimation and Localization Benchmark	Yuan-Hao Ho et.al.	2407.13930v1	null
2024-07-19	GlobalPointer: Large-Scale Plane Adjustment with Bi-Convex Relaxation	Bangyan Liao et.al.	2407.13537v2	link
2024-07-18	SCAPE: A Simple and Strong Category-Agnostic Pose Estimator	Yujia Liang et.al.	2407.13483v1	link
2024-07-17	SG-NeRF: Neural Surface Reconstruction with Scene Graph Optimization	Yiyang Chen et.al.	2407.12667v1	link
2024-07-17	Invertible Neural Warp for NeRF	Shin-Fang Chng et.al.	2407.12354v1	null
2024-07-16	NeuSurfEmb: A Complete Pipeline for Dense Correspondence-based 6D Object Pose Estimation without CAD Models	Francesco Milano et.al.	2407.12207v1	link
2024-07-16	Monocular pose estimation of articulated surgical instruments in open surgery	Robert Spektor et.al.	2407.12138v1	null
2024-07-17	GV-Bench: Benchmarking Local Feature Matching for Geometric Verification of Long-term Loop Closure Detection	Jingwen Yu et.al.	2407.11736v2	link
2024-07-16	TCFormer: Visual Recognition via Token Clustering Transformer	Wang Zeng et.al.	2407.11321v1	link
2024-07-15	A BlueROV2-based platform for underwater mapping experiments	Tudor Alinei-Poiana et.al.	2407.10901v1	link
2024-07-15	LVCP: LiDAR-Vision Tightly Coupled Collaborative Real-time Relative Positioning	Zhuozhu Jian et.al.	2407.10782v1	null
2024-07-15	Domain Generalization for 6D Pose Estimation Through NeRF-based Image Synthesis	Antoine Legrand et.al.	2407.10762v1	null
2024-07-16	GTPT: Group-based Token Pruning Transformer for Efficient Human Pose Estimation	Haonan Wang et.al.	2407.10756v2	null
2024-07-15	Learning to Estimate the Pose of a Peer Robot in a Camera Image by Predicting the States of its LEDs	Nicholas Carlotti et.al.	2407.10661v1	null
2024-07-15	Deep-Learning-Based Markerless Pose Estimation Systems in Gait Analysis: DeepLabCut Custom Training and the Refinement Function	Giulia Panconi et.al.	2407.10590v1	null
2024-07-14	3D Foundation Models Enable Simultaneous Geometry and Pose Estimation of Grasped Objects	Weiming Zhi et.al.	2407.10331v1	null
2024-07-16	psifx -- Psychological and Social Interactions Feature Extraction Package	Guillaume Rochette et.al.	2407.10266v2	null
2024-07-14	PAFUSE: Part-based Diffusion for 3D Whole-Body Pose Estimation	Nermin Samet et.al.	2407.10220v1	link
2024-07-14	3DEgo: 3D Editing on the Go!	Umar Khalid et.al.	2407.10102v1	null
2024-07-12	iNeMo: Incremental Neural Mesh Models for Robust Class-Incremental Learning	Tom Fischer et.al.	2407.09271v1	link
2024-07-12	HUP-3D: A 3D multi-view synthetic dataset for assisted-egocentric hand-ultrasound pose estimation	Manuel Birlo et.al.	2407.09215v1	null
2024-07-12	KGpose: Keypoint-Graph Driven End-to-End Multi-Object 6D Pose Estimation via Point-Wise Pose Voting	Andrew Jeong et.al.	2407.08909v1	null
2024-07-11	RTMW: Real-Time Multi-Person 2D and 3D Whole-body Pose Estimation	Tao Jiang et.al.	2407.08634v1	link
2024-07-11	SRPose: Two-view Relative Pose Estimation with Sparse Keypoints	Rui Yin et.al.	2407.08199v1	link
2024-07-11	SGLC: Semantic Graph-Guided Coarse-Fine-Refine Full Loop Closing for LiDAR SLAM	Neng Wang et.al.	2407.08106v1	link
2024-07-10	RoCap: A Robotic Data Collection Pipeline for the Pose Estimation of Appearance-Changing Objects	Jiahao Nick Li et.al.	2407.08081v1	null
2024-07-10	Hybrid Structure-from-Motion and Camera Relocalization for Enhanced Egocentric Localization	Jinjie Mai et.al.	2407.08023v1	link
2024-07-10	Greit-HRNet: Grouped Lightweight High-Resolution Network for Human Pose Estimation	Junjia Han et.al.	2407.07389v1	null
2024-07-09	Category-level Object Detection, Pose Estimation and Reconstruction from Stereo Images	Chuanrui Zhang et.al.	2407.06984v1	null
2024-07-09	Computer vision tasks for intelligent aerospace missions: An overview	Huilin Chen et.al.	2407.06513v1	null
2024-07-08	GeoNLF: Geometry guided Pose-Free Neural LiDAR Fields	Weiyi Xue et.al.	2407.05597v1	null
2024-07-10	On the power of data augmentation for head pose estimation	Michael Welter et.al.	2407.05357v2	link
2024-07-07	SCIPaD: Incorporating Spatial Clues into Unsupervised Pose-Depth Joint Learning	Yi Feng et.al.	2407.05283v1	link
2024-07-05	Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos	Leonhard Sommer et.al.	2407.04384v1	link
2024-07-04	Towards Cross-View-Consistent Self-Supervised Surround Depth Estimation	Laiyan Ding et.al.	2407.04041v1	link
2024-07-04	Markerless Multi-view 3D Human Pose Estimation: a survey	Ana Filipa Rodrigues Nogueira et.al.	2407.03817v1	null
2024-07-04	A Fast Dynamic Point Detection Method for LiDAR-Inertial Odometry in Driving Scenarios	Zikang Yuan et.al.	2407.03590v1	link
2024-07-03	Graph and Skipped Transformer: Exploiting Spatial and Temporal Modeling Capacities for Efficient 3D Human Pose Estimation	Mengmeng Cui et.al.	2407.02990v1	null
2024-07-03	Free-SurGS: SfM-Free 3D Gaussian Splatting for Surgical Scene Reconstruction	Jiaxin Guo et.al.	2407.02918v1	link
2024-07-02	SUPER: Seated Upper Body Pose Estimation using mmWave Radars	Bo Zhang et.al.	2407.02455v1	null
2024-07-02	ReliaAvatar: A Robust Real-Time Avatar Animator with Integrated Motion Prediction	Bo Qian et.al.	2407.02129v1	null
2024-07-02	Joint-Dataset Learning and Cross-Consistent Regularization for Text-to-Motion Retrieval	Nicola Messina et.al.	2407.02104v1	null
2024-07-01	Active Human Pose Estimation via an Autonomous UAV Agent	Jingxi Chen et.al.	2407.01811v1	null
2024-07-01	RoDyn-SLAM: Robust Dynamic Dense RGB-D SLAM with Neural Radiance Fields	Haochen Jiang et.al.	2407.01303v1	link
2024-07-01	Collaborative Graph Exploration with Reduced Pose-SLAM Uncertainty via Submodular Optimization	Ruofei Bai et.al.	2407.01013v1	link
2024-06-30	Ego-to-Exo: Interfacing Third Person Visuals from Egocentric Views in Real-time for Improved ROV Teleoperation	Adnan Abdullah et.al.	2407.00848v1	null
2024-06-29	When Robots Get Chatty: Grounding Multimodal Human-Robot Conversation and Collaboration	Philipp Allgeuer et.al.	2407.00518v1	link
2024-06-28	Assistive Image Annotation Systems with Deep Learning and Natural Language Capabilities: A Review	Moseli Mots'oehli et.al.	2407.00252v1	null
2024-06-28	EPOCH: Jointly Estimating the 3D Pose of Cameras and Humans	Nicola Garau et.al.	2406.19726v1	null
2024-06-28	CLOi-Mapper: Consistent, Lightweight, Robust, and Incremental Mapper With Embedded Systems for Commercial Robot Services	DongKi Noh et.al.	2406.19634v1	null
2024-06-27	Multimodal Visual-haptic pose estimation in the presence of transient occlusion	Michael Zechmair et.al.	2406.19323v1	null
2024-06-27	Human Modelling and Pose Estimation Overview	Pawel Knap et.al.	2406.19290v1	null
2024-06-26	Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference	Yuan Gao et.al.	2406.18453v1	link
2024-06-27	Automatic infant 2D pose estimation from videos: comparing seven deep neural network methods	Filipe Gama et.al.	2406.17382v2	null
2024-06-24	High-resolution open-vocabulary object 6D pose estimation	Jaime Corsetti et.al.	2406.16384v1	null
2024-06-23	Breaking the Frame: Image Retrieval by Visual Overlap Prediction	Tong Wei et.al.	2406.16204v1	link
2024-06-21	Efficient Human Pose Estimation: Leveraging Advanced Techniques with MediaPipe	Sandeep Singh Sengar et.al.	2406.15649v1	link
2024-06-24	Investigating the impact of 2D gesture representation on co-speech gesture generation	Teo Guichoux et.al.	2406.15111v2	null
2024-06-20	Benchmarking Monocular 3D Dog Pose Estimation Using In-The-Wild Motion Capture Data	Moira Shooter et.al.	2406.14412v1	null
2024-06-20	PoseBench: Benchmarking the Robustness of Pose Estimation Models under Corruptions	Sihan Ma et.al.	2406.14367v1	null
2024-06-19	NeRF-Feat: 6D Object Pose Estimation using Feature Rendering	Shishir Reddy Vutukur et.al.	2406.13796v1	null
2024-06-19	CNN Based Flank Predictor for Quadruped Animal Species	Vanessa Suessle et.al.	2406.13588v1	null
2024-06-19	MVSBoost: An Efficient Point Cloud-based 3D Reconstruction	Umair Haroon et.al.	2406.13515v1	null
2024-06-19	An Efficient yet High-Performance Method for Precise Radar-Based Imaging of Human Hand Poses	Johanna Bräunig et.al.	2406.13464v1	null
2024-06-18	Head Pose Estimation and 3D Neural Surface Reconstruction via Monocular Camera in situ for Navigation and Safe Insertion into Natural Openings	Ruijie Tang et.al.	2406.13048v1	null
2024-06-17	Matching Query Image Against Selected NeRF Feature for Efficient and Scalable Localization	Huaiji Zhou et.al.	2406.11766v1	null
2024-06-17	Domain Generalization for In-Orbit 6D Pose Estimation	Antoine Legrand et.al.	2406.11743v1	null
2024-06-17	SeamPose: Repurposing Seams as Capacitive Sensors in a Shirt for Upper-Body Pose Tracking	Tianhong Catherine Yu et.al.	2406.11645v1	null
2024-06-14	Galibr: Targetless LiDAR-Camera Extrinsic Calibration Method via Ground Plane Initialization	Wonho Song et.al.	2406.11599v1	null
2024-06-15	MMVR: Millimeter-wave Multi-View Radar Dataset and Benchmark for Indoor Perception	M. Mahbubur Rahman et.al.	2406.10708v1	link
2024-06-15	Improving Ab-Initio Cryo-EM Reconstruction with Semi-Amortized Pose Inference	Shayan Shekarforoush et.al.	2406.10455v1	null
2024-06-14	The BabyView dataset: High-resolution egocentric videos of infants' and young children's everyday experiences	Bria Long et.al.	2406.10447v1	null
2024-06-14	OpenCapBench: A Benchmark to Bridge Pose Estimation and Biomechanics	Yoni Gozlan et.al.	2406.09788v1	null
2024-06-13	ImageNet3D: Towards General-Purpose Object-Level 3D Understanding	Wufei Ma et.al.	2406.09613v1	link
2024-06-13	Deep Transformer Network for Monocular Pose Estimation of Ship-Based UAV	Maneesha Wickramasuriya et.al.	2406.09260v1	link
2024-06-14	Language-Driven Closed-Loop Grasping with Model-Predictive Trajectory Replanning	Huy Hoang Nguyen et.al.	2406.09039v2	null
2024-06-14	VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks	Jiannan Wu et.al.	2406.08394v2	link
2024-06-12	Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization	Jiaxin Deng et.al.	2406.08001v1	null
2024-06-12	IFTD: Image Feature Triangle Descriptor for Loop Detection in Driving Scenes	Fengtian Lang et.al.	2406.07937v1	link
2024-06-12	From Variance to Veracity: Unbundling and Mitigating Gradient Variance in Differentiable Bundle Adjustment Layers	Swaminathan Gurumurthy et.al.	2406.07785v1	link
2024-06-12	SPIN: Spacecraft Imagery for Navigation	Javier Montalvo et.al.	2406.07500v2	link
2024-06-11	Realistic Data Generation for 6D Pose Estimation of Surgical Instruments	Juan Antonio Barragan et.al.	2406.07328v1	link
2024-06-11	SignMusketeers: An Efficient Multi-Stream Approach for Sign Language Translation at Scale	Shester Gueuwou et.al.	2406.06907v1	null
2024-06-10	Multicam-SLAM: Non-overlapping Multi-camera SLAM for Indirect Visual Localization and Navigation	Shenghao Li et.al.	2406.06374v1	link
2024-06-08	A preprocessing-based planning framework for utilizing contacts in high-precision insertion tasks	Muhammad Suhail Saleem et.al.	2406.05522v1	null
2024-06-06	GLACE: Global Local Accelerated Coordinate Encoding	Fangjinhua Wang et.al.	2406.04340v1	link
2024-06-06	Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking	Jiyao Zhang et.al.	2406.04316v1	null
2024-06-05	Hi5: 2D Hand Pose Estimation with Zero Human Annotation	Masum Hasan et.al.	2406.03599v1	null
2024-06-05	Sparse Color-Code Net: Real-Time RGB-Based 6D Object Pose Estimation on Edge Devices	Xingjian Yang et.al.	2406.02977v1	null
2024-06-04	CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation	Dejia Xu et.al.	2406.02509v1	null
2024-06-04	HPE-CogVLM: New Head Pose Grounding Task Exploration on Vision Language Model	Yu Tian et.al.	2406.01914v1	null
2024-06-03	A Robust Filter for Marker-less Multi-person Tracking in Human-Robot Interaction Scenarios	Enrico Martini et.al.	2406.01832v1	link
2024-06-01	Equivariant amortized inference of poses for cryo-EM	Larissa de Ruijter et.al.	2406.01630v1	null
2024-06-03	3D WholeBody Pose Estimation based on Semantic Graph Attention Network and Distance Information	Sihan Wen et.al.	2406.01196v1	null
2024-06-01	CapeX: Category-Agnostic Pose Estimation from Textual Point Explanation	Matan Rusanovsky et.al.	2406.00384v1	link
2024-05-30	Estimating Human Poses Across Datasets: A Unified Skeleton and Multi-Teacher Distillation Approach	Muhammad Saif Ullah Khan et.al.	2405.20084v1	null
2024-05-30	TAMBRIDGE: Bridging Frame-Centered Tracking and 3D Gaussian Splatting for Enhanced SLAM	Peifeng Jiang et.al.	2405.19614v1	null
2024-05-29	Real-Time Dynamic Robot-Assisted Hand-Object Interaction via Motion Primitives	Mingqi Yuan et.al.	2405.19531v1	null
2024-05-29	Exploring AI-based Anonymization of Industrial Image and Video Data in the Context of Feature Preservation	Sabrina Cynthia Triess et.al.	2405.19173v1	null
2024-05-28	World Models for General Surgical Grasping	Hongbin Lin et.al.	2405.17940v1	null
2024-05-27	MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds	Jiahui Lei et.al.	2405.17421v1	link
2024-05-27	Occlusion Handling in 3D Human Pose Estimation with Perturbed Positional Encoding	Niloofar Azizi et.al.	2405.17397v1	null
2024-05-27	$\text{Di}^2\text{Pose}$: Discrete Diffusion Model for Occluded 3D Human Pose Estimation	Weiquan Wang et.al.	2405.17016v1	null
2024-05-27	Clustering-based Learning for UAV Tracking and Pose Estimation	Jiaping Xiao et.al.	2405.16867v1	null
2024-05-26	Multi-Modal UAV Detection, Classification and Tracking Algorithm -- Technical Report for CVPR 2024 UG2 Challenge	Tianchen Deng et.al.	2405.16464v1	link
2024-05-25	Intensity and Texture Correction of Omnidirectional Image Using Camera Images for Indirect Augmented Reality	Hakim Ikebayashi et.al.	2405.16008v1	null
2024-05-23	CoPeD-Advancing Multi-Robot Collaborative Perception: A Comprehensive Dataset in Real-World Environments	Yang Zhou et.al.	2405.14731v1	link
2024-05-23	Segformer++: Efficient Token-Merging Strategies for High-Resolution Semantic Segmentation	Daniel Kienzle et.al.	2405.14467v1	link
2024-05-21	Geometric Transformation Uncertainty for Improving 3D Fetal Brain Pose Prediction from Freehand 2D Ultrasound Videos	Jayroop Ramesh et.al.	2405.13235v1	link
2024-05-21	Leveraging Neural Radiance Fields for Pose Estimation of an Unknown Space Object during Proximity Operations	Antoine Legrand et.al.	2405.12728v1	null
2024-05-21	PoseGravity: Pose Estimation from Points and Lines with Axis Prior	Akshay Chandrasekhar et.al.	2405.12646v1	link
2024-05-19	Focus on Low-Resolution Information: Multi-Granular Information-Lossless Model for Low-Resolution Human Pose Estimation	Zejun Gu et.al.	2405.12247v1	null
2024-05-20	AutoSoccerPose: Automated 3D posture Analysis of Soccer Shot Movements	Calvin Yeung et.al.	2405.12070v1	link
2024-05-19	Advancing 6-DoF Instrument Pose Estimation in Variable X-Ray Imaging Geometries	Christiaan G. A. Viviers et.al.	2405.11677v1	link
2024-05-19	Cross-Domain Knowledge Distillation for Low-Resolution Human Pose Estimation	Zejun Gu et.al.	2405.11448v1	null
2024-05-18	PS6D: Point Cloud Based Symmetry-Aware 6D Object Pose Estimation in Robot Bin-Picking	Yifan Yang et.al.	2405.11257v1	null
2024-05-18	MotionGS: Compact Gaussian Splatting SLAM by Motion Filter	Xinli Guo et.al.	2405.11129v1	link
2024-05-17	Resolving Symmetry Ambiguity in Correspondence-based Methods for Instance-level Object Pose Estimation	Yongliang Lin et.al.	2405.10557v1	null
2024-05-16	Diversity-Aware Sign Language Production through a Pose Encoding Variational Autoencoder	Mohamed Ilyes Lakhal et.al.	2405.10423v1	null
2024-05-17	Toon3D: Seeing Cartoons from a New Perspective	Ethan Weber et.al.	2405.10320v2	null
2024-05-15	Task-adaptive Q-Face	Haomiao Sun et.al.	2405.09059v1	null
2024-05-14	RDPN6D: Residual-based Dense Point-wise Network for 6Dof Object Pose Estimation Based on RGB-D Images	Zong-Wei Hong et.al.	2405.08483v1	link
2024-05-14	TP3M: Transformer-based Pseudo 3D Image Matching with Reference	Liming Han et.al.	2405.08434v1	null
2024-05-13	Deep Learning-Based Object Pose Estimation: A Comprehensive Survey	Jian Liu et.al.	2405.07801v1	link
2024-05-13	JointLoc: A Real-time Visual Localization Framework for Planetary UAVs Based on Joint Relative and Absolute Pose Estimation	Xubo Luo et.al.	2405.07429v1	link
2024-05-11	TD-NeRF: Novel Truncated Depth Prior for Joint Camera Pose and Neural Radiance Field Optimization	Zhen Tan et.al.	2405.07027v1	link
2024-05-11	AHPPEBot: Autonomous Robot for Tomato Harvesting based on Phenotyping and Pose Estimation	Xingxu Li et.al.	2405.06959v1	null
2024-05-10	CasCalib: Cascaded Calibration for Motion Capture from Sparse Unsynchronized Cameras	James Tang et.al.	2405.06845v1	link
2024-05-10	MGS-SLAM: Monocular Sparse Tracking and Gaussian Mapping with Depth Smooth Regularization	Pengcheng Zhu et.al.	2405.06241v1	null
2024-05-10	Free-Moving Object Reconstruction and Pose Estimation with Virtual Camera	Haixin Shi et.al.	2405.05858v2	null
2024-05-09	Semi-Autonomous Laparoscopic Robot Docking with Learned Hand-Eye Information Fusion	Huanyu Tian et.al.	2405.05817v1	null
2024-05-09	NeuRSS: Enhancing AUV Localization and Bathymetric Mapping with Neural Rendering for Sidescan SLAM	Yiping Xie et.al.	2405.05807v1	null
2024-05-09	Benchmarking Neural Radiance Fields for Autonomous Robots: An Overview	Yuhang Ming et.al.	2405.05526v1	null
2024-05-08	Adversary-Guided Motion Retargeting for Skeleton Anonymization	Thomas Carr et.al.	2405.05428v1	null
2024-05-08	FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models	Jinglin Xu et.al.	2405.05216v1	link
2024-05-08	ProbRadarM3F: mmWave Radar based Human Skeletal Pose Estimation with Probability Map Guided Multi-Format Feature Fusion	Bing Zhu et.al.	2405.05164v1	null
2024-05-08	GISR: Geometric Initialization and Silhouette-based Refinement for Single-View Robot Pose and Configuration Estimation	Ivan Bilić et.al.	2405.04890v1	null
2024-05-07	Learning Distributional Demonstration Spaces for Task-Specific Cross-Pose Estimation	Jenny Wang et.al.	2405.04609v1	null
2024-05-07	Bayesian Simultaneous Localization and Multi-Lane Tracking Using Onboard Sensors and a SD Map	Yuxuan Xia et.al.	2405.04290v1	null
2024-05-07	Speak the Same Language: Global LiDAR Registration on BIM Using Pose Hough Transform	Zhijian Qiao et.al.	2405.03969v1	null
2024-05-07	Joint Estimation of Identity Verification and Relative Pose for Partial Fingerprints	Xiongjun Guan et.al.	2405.03959v1	link
2024-05-06	Pose Priors from Language Models	Sanjay Subramanian et.al.	2405.03689v1	null
2024-05-06	Optimizing Hand Region Detection in MediaPipe Holistic Full-Body Pose Estimation to Improve Accuracy and Avoid Downstream Errors	Amit Moryossef et.al.	2405.03545v1	link
2024-05-05	Multi-hop graph transformer network for 3D human pose estimation	Zaedul Islam et.al.	2405.03055v1	null
2024-05-05	Blending Distributed NeRFs with Tri-stage Robust Pose Optimization	Baijun Ye et.al.	2405.02880v1	null
2024-05-03	WeightedPose: Generalizable Cross-Pose Estimation via Weighted SVD	Xuxin Cheng et.al.	2405.02241v1	link
2024-05-03	Probablistic Restoration with Adaptive Noise Sampling for 3D Human Pose Estimation	Xianzhou Zeng et.al.	2405.02114v1	link
2024-05-03	An Onboard Framework for Staircases Modeling Based on Point Clouds	Chun Qing et.al.	2405.01918v1	null
2024-05-06	ShadowNav: Autonomous Global Localization for Lunar Navigation in Darkness	Deegan Atha et.al.	2405.01673v2	null
2024-05-02	IntervenGen: Interventional Data Generation for Robust and Data-Efficient Robot Imitation Learning	Ryan Hoque et.al.	2405.01472v1	null
2024-05-02	Behavior Imitation for Manipulator Control and Grasping with Deep Reinforcement Learning	Liu Qiyuan et.al.	2405.01284v1	null
2024-05-02	Sports Analysis and VR Viewing System Based on Player Tracking and Pose Estimation with Multimodal and Multiview Sensors	Wenxuan Guo et.al.	2405.01112v1	null
2024-05-02	CoViS-Net: A Cooperative Visual Spatial Foundation Model for Multi-Robot Applications	Jan Blumenkamp et.al.	2405.01107v1	null
2024-05-04	HandSSCA: 3D Hand Mesh Reconstruction with State Space Channel Attention from RGB images	Zixun Jiao et.al.	2405.01066v2	null
2024-05-01	Radar-Based Localization For Autonomous Ground Vehicles In Suburban Neighborhoods	Andrew J. Kramer et.al.	2405.00600v1	null
2024-04-30	Ultra Inertial Poser: Scalable Motion Capture and Tracking from Sparse Inertial Sensors and Ultra-Wideband Ranging	Rayan Armani et.al.	2404.19541v1	link
2024-04-30	UniFS: Universal Few-shot Instance Perception with Point Representations	Sheng Jin et.al.	2404.19401v1	link
2024-04-30	Quater-GCN: Enhancing 3D Human Pose Estimation with Orientation and Semi-supervised Training	Xingyu Song et.al.	2404.19279v1	link
2024-04-30	XFeat: Accelerated Features for Lightweight Image Matching	Guilherme Potje et.al.	2404.19174v1	null
2024-04-29	Self-Avatar Animation in Virtual Reality: Impact of Motion Signals Artifacts on the Full-Body Pose Reconstruction	Antoine Maiorca et.al.	2404.18628v1	null
2024-04-29	Mesh-based Photorealistic and Real-time 3D Mapping for Robust Visual Perception of Autonomous Underwater Vehicle	Jungwoo Lee et.al.	2404.18395v1	null
2024-04-29	Reconstructing Satellites in 3D from Amateur Telescope Images	Zhiming Chang et.al.	2404.18394v1	null
2024-04-27	Hybrid 3D Human Pose Estimation with Monocular Video and Sparse IMUs	Yiming Bao et.al.	2404.17837v1	null
2024-04-26	Localization Through Particle Filter Powered Neural Network Estimated Monocular Camera Poses	Yi Shen et.al.	2404.17685v1	null
2024-04-26	SLAM for Indoor Mapping of Wide Area Construction Environments	Vincent Ress et.al.	2404.17215v1	null
2024-04-25	WheelPose: Data Synthesis Techniques to Improve Pose Estimation Performance on Wheelchair Users	William Huang et.al.	2404.17063v1	link
2024-04-25	Transformer-Based Local Feature Matching for Multimodal Image Registration	Remi Delaunay et.al.	2404.16802v1	null
2024-04-25	DeepKalPose: An Enhanced Deep-Learning Kalman Filter for Temporally Consistent Monocular Vehicle Pose Estimation	Leandro Di Bella et.al.	2404.16558v1	null
2024-04-25	Efficient Solution of Point-Line Absolute Pose	Petr Hruby et.al.	2404.16552v1	link
2024-04-25	COBRA -- COnfidence score Based on shape Regression Analysis for method-independent quality assessment of object pose estimation from single images	Panagiotis Sapoutzoglou et.al.	2404.16471v1	link
2024-04-25	MegaParticles: Range-based 6-DoF Monte Carlo Localization with GPU-Accelerated Stein Particle Filter	Kenji Koide et.al.	2404.16370v1	null
2024-04-24	3D Human Pose Estimation with Occlusions: Introducing BlendMimic3D Dataset and GCN Refinement	Filipa Lino et.al.	2404.16136v1	link
2024-04-23	SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation	Xiangyu Xu et.al.	2404.15276v1	link
2024-04-25	Domain adaptive pose estimation via multi-level alignment	Yugan Chen et.al.	2404.14885v2	link
2024-04-23	Semi-supervised 2D Human Pose Estimation via Adaptive Keypoint Masking	Kexin Meng et.al.	2404.14835v1	null
2024-04-23	UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues	Vandad Davoodnia et.al.	2404.14634v1	null
2024-04-22	DHRNet: A Dual-Path Hierarchical Relation Network for Multi-Person Pose Estimation	Yonghao Dang et.al.	2404.14025v1	link
2024-04-23	CT-NeRF: Incremental Optimizing Neural Radiance Field and Poses with Complex Trajectory	Yunlong Ran et.al.	2404.13896v2	null
2024-04-21	Resampling-free Particle Filters in High-dimensions	Akhilan Boopathy et.al.	2404.13698v1	link
2024-04-20	EC-SLAM: Real-time Dense Neural RGB-D SLAM System with Effectively Constrained Global Bundle Adjustment	Guanghao Li et.al.	2404.13346v1	link
2024-04-18	Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds	Oliver Lemke et.al.	2404.12440v1	null
2024-04-18	Gait Recognition from Highly Compressed Videos	Andrei Niculae et.al.	2404.12183v1	null
2024-04-17	Mushroom Segmentation and 3D Pose Estimation from Point Clouds using Fully Convolutional Geometric Features and Implicit Pose Encoding	George Retsinas et.al.	2404.12144v1	link
2024-04-17	Kathakali Hand Gesture Recognition With Minimal Data	Kavitha Raju et.al.	2404.11205v1	null
2024-04-17	GeoReF: Geometric Alignment Across Shape Variation for Category-level Object Pose Refinement	Linfang Zheng et.al.	2404.11139v1	null
2024-04-17	CorrNet+: Sign Language Recognition and Translation via Spatial-Temporal Correlation	Lianyu Hu et.al.	2404.11111v1	link
2024-04-16	HumMUSS: Human Motion Understanding using State Space Models	Arnab Kumar Mondal et.al.	2404.10880v1	null
2024-04-16	Invariant Kalman Filtering with Noise-Free Pseudo-Measurements	Sven Goffin et.al.	2404.10687v1	null
2024-04-16	The Unreasonable Effectiveness of Pre-Trained Features for Camera Pose Refinement	Gabriele Trivigno et.al.	2404.10438v1	null
2024-04-16	GaitPoint+: A Gait Recognition Network Incorporating Point Cloud Analysis and Recycling	Huantao Ren et.al.	2404.10213v1	null
2024-04-16	LWIRPOSE: A novel LWIR Thermal Image Dataset and Benchmark	Avinash Upadhyay et.al.	2404.10212v1	link
2024-04-15	LetsGo: Large-Scale Garage Modeling and Rendering via LiDAR-Assisted Gaussian Primitives	Jiadi Cui et.al.	2404.09748v1	null
2024-04-14	In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition	Wiktor Mucha et.al.	2404.09308v1	link
2024-04-13	DeDoDe v2: Analyzing and Improving the DeDoDe Keypoint Detector	Johan Edstedt et.al.	2404.08928v1	link
2024-04-16	3D Human Scan With A Moving Event Camera	Kai Kohyama et.al.	2404.08504v2	null
2024-04-11	Separated Attention: An Improved Cycle GAN Based Under Water Image Enhancement Method	Tashmoy Ghosh et.al.	2404.07649v1	null
2024-04-11	GLID: Pre-training a Generalist Encoder-Decoder Vision Model	Jihao Liu et.al.	2404.07603v1	null
2024-04-10	Measuring proximity to standard planes during fetal brain ultrasound scanning	Chiara Di Vece et.al.	2404.07124v1	null
2024-04-10	MoCap-to-Visual Domain Adaptation for Efficient Human Mesh Estimation from 2D Keypoints	Bedirhan Uguz et.al.	2404.07094v1	null
2024-04-10	Gaussian-LIC: Photo-realistic LiDAR-Inertial-Camera SLAM with 3D Gaussian Splatting	Xiaolei Lang et.al.	2404.06926v1	null
2024-04-09	Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences	Axel Barroso-Laguna et.al.	2404.06337v1	link
2024-04-09	Incremental Joint Learning of Depth, Pose and Implicit Scene Representation on Monocular Camera in Large-scale Scenes	Tianchen Deng et.al.	2404.06050v1	null
2024-04-08	Learning 3D-Aware GANs from Unposed Images with Template Feature Field	Xinya Chen et.al.	2404.05705v1	null
2024-04-08	Learning a Category-level Object Pose Estimator without Pose Annotations	Fengrui Tian et.al.	2404.05626v1	null
2024-04-08	DepthMOT: Depth Cues Lead to a Strong Multi-Object Tracker	Jiapeng Wu et.al.	2404.05518v1	link
2024-04-08	Two Hands Are Better Than One: Resolving Hand to Hand Intersections via Occupancy Networks	Maksym Ivashechkin et.al.	2404.05414v1	null
2024-04-08	STITCH: Augmented Dexterity for Suture Throws Including Thread Coordination and Handoffs	Kush Hari et.al.	2404.05151v1	null
2024-04-05	ToolEENet: Tool Affordance 6D Pose Estimation	Yunlong Wang et.al.	2404.04193v1	null
2024-04-04	SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation	Sichen Chen et.al.	2404.03518v1	link
2024-04-04	Multi Positive Contrastive Learning with Pose-Consistent Generated Images	Sho Inayoshi et.al.	2404.03256v1	null
2024-04-04	HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud	Wencan Cheng et.al.	2404.03159v1	link
2024-04-03	Fusing Multi-sensor Input with State Information on TinyML Brains for Autonomous Nano-drones	Luca Crupi et.al.	2404.02567v1	null
2024-04-03	Semi-Supervised Unconstrained Head Pose Estimation in the Wild	Huayi Zhou et.al.	2404.02544v1	link
2024-04-02	3D Congealing: 3D-Aware Image Alignment in the Wild	Yunzhi Zhang et.al.	2404.02125v1	null
2024-04-02	SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation	Vinkle Srivastav et.al.	2404.02041v1	link
2024-04-01	Marrying NeRF with Feature Matching for One-step Pose Estimation	Ronghan Chen et.al.	2404.00891v1	null
2024-03-31	Graph-Based vs. Error State Kalman Filter-Based Fusion Of 5G And Inertial Data For MAV Indoor Pose Estimation	Meisam Kabiri et.al.	2404.00691v1	null
2024-03-31	OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos	Dongyoung Choi et.al.	2404.00676v1	null
2024-04-02	KTPFormer: Kinematics and Trajectory Prior Knowledge-Enhanced Transformer for 3D Human Pose Estimation	Jihua Peng et.al.	2404.00658v2	link
2024-03-29	FetalDiffusion: Pose-Controllable 3D Fetal MRI Synthesis with Conditional Diffusion Model	Molin Zhang et.al.	2404.00132v1	null
2024-03-29	Latent Embedding Clustering for Occlusion Robust Head Pose Estimation	José Celestino et.al.	2403.20251v1	null
2024-03-29	A Unified Framework for Human-centric Point Cloud Video Understanding	Yiteng Xu et.al.	2403.20031v1	null
2024-04-01	Video-Based Human Pose Regression via Decoupled Space-Time Aggregation	Jijie He et.al.	2403.19926v2	link
2024-03-28	Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation	Xiao Lin et.al.	2403.19527v1	link
2024-03-27	Object Pose Estimation via the Aggregation of Diffusion Features	Tianfu Wang et.al.	2403.18791v1	link
2024-03-27	RoboKeyGen: Robot Pose and Joint Angles Estimation via Diffusion-based 3D Keypoint Generation	Yang Tian et.al.	2403.18259v1	null
2024-03-26	Mathematical Foundation and Corrections for Full Range Head Pose Estimation	Huei-Chung Hu et.al.	2403.18104v1	null
2024-03-26	EgoPoseFormer: A Simple Baseline for Egocentric 3D Human Pose Estimation	Chenhongyi Yang et.al.	2403.18080v1	link
2024-03-26	A Survey on 3D Egocentric Human Pose Estimation	Md Mushfiqur Azam et.al.	2403.17893v1	link
2024-03-26	GTA-HDR: A Large-Scale Synthetic Dataset for HDR Image Reconstruction	Hrishav Bakul Barua et.al.	2403.17837v1	link
2024-03-26	DiffH2O: Diffusion-Based Synthesis of Hand-Object Interactions from Textual Descriptions	Sammy Christen et.al.	2403.17827v1	null
2024-03-26	System Calibration of a Field Phenotyping Robot with Multiple High-Precision Profile Laser Scanners	Felix Esser et.al.	2403.17788v1	null
2024-03-25	Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos	Remy Sabathier et.al.	2403.17103v1	link
2024-03-25	Characterisation of the Intel RealSense D415 Stereo Depth Camera for Motion-Corrected CT Perfusion Imaging	Mahdieh Dashtbani Moghari et.al.	2403.16490v1	null
2024-03-25	Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects	Zicong Fan et.al.	2403.16428v1	link
2024-03-25	A Geometric Perspective on Fusing Gaussian Distributions on Lie Groups	Yixiao Ge et.al.	2403.16411v1	null
2024-03-25	ASDF: Assembly State Detection Utilizing Late Fusion by Integrating 6D Pose Estimation	Hannah Schieber et.al.	2403.16400v1	link
2024-03-24	KITchen: A Real-World Benchmark and Dataset for 6D Object Pose Estimation in Kitchen Environments	Abdelrahman Younes et.al.	2403.16238v1	null
2024-03-24	Diffusion Model is a Good Pose Estimator from 3D RF-Vision	Junqiao Fan et.al.	2403.16198v1	null
2024-03-23	UPNeRF: A Unified Framework for Monocular 3D Object Reconstruction and Pose Estimation	Yuliang Guo et.al.	2403.15705v1	link
2024-03-22	InterFusion: Text-Driven Generation of 3D Human-Object Interaction	Sisi Dai et.al.	2403.15612v1	link
2024-03-22	Augmented Reality Warnings in Roadway Work Zones: Evaluating the Effect of Modality on Worker Reaction Times	Sepehr Sabeti et.al.	2403.15571v1	null
2024-03-22	Gesture-Controlled Aerial Robot Formation for Human-Swarm Interaction in Safety Monitoring Applications	Vít Krátký et.al.	2403.15333v1	null
2024-03-22	WSCLoc: Weakly-Supervised Sparse-View Camera Relocalization	Jialu Wang et.al.	2403.15272v1	null
2024-03-22	DITTO: Demonstration Imitation by Trajectory Transformation	Nick Heppert et.al.	2403.15203v1	null
2024-03-22	Cartoon Hallucinations Detection: Pose-aware In Context Visual Learning	Bumsoo Kim et.al.	2403.15048v1	null
2024-03-22	Trajectory Regularization Enhances Self-Supervised Geometric Representation	Jiayun Wang et.al.	2403.14973v1	link
2024-03-21	VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding	Ahmad Mahmood et.al.	2403.14743v1	link
2024-03-21	Visibility-Aware Keypoint Localization for 6DoF Object Pose Estimation	Ruyi Lian et.al.	2403.14559v1	null
2024-03-23	Exploring 3D Human Pose Estimation and Forecasting from the Robot's Perspective: The HARPER Dataset	Andrea Avogaro et.al.	2403.14447v2	null
2024-03-21	Evaluation and Deployment of LiDAR-based Place Recognition in Dense Forests	Haedam Oh et.al.	2403.14326v1	null
2024-03-21	Zero123-6D: Zero-shot Novel View Synthesis for RGB Category-level 6D Pose Estimation	Francesco Di Felice et.al.	2403.14279v1	null
2024-03-20	DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses	Chen Zhao et.al.	2403.13683v1	link
2024-03-20	Meta-Point Learning and Refining for Category-Agnostic Pose Estimation	Junjie Chen et.al.	2403.13647v1	link
2024-03-20	Advancing 6D Pose Estimation in Augmented Reality -- Overcoming Projection Ambiguity with Uncontrolled Imagery	Mayura Manawadu et.al.	2403.13434v1	null
2024-03-20	DOR3D-Net: Dense Ordinal Regression Network for 3D Hand Pose Estimation	Yamin Mao et.al.	2403.13405v1	null
2024-03-20	ManiPose: A Comprehensive Benchmark for Pose-aware Object Manipulation in Robotics	Qiaojun Yu et.al.	2403.13365v1	null
2024-03-20	MULAN-WC: Multi-Robot Localization Uncertainty-aware Active NeRF with Wireless Coordination	Weiying Wang et.al.	2403.13348v1	null
2024-03-19	FaceXFormer: A Unified Transformer for Facial Analysis	Kartik Narayan et.al.	2403.12960v1	link
2024-03-19	WHAC: World-grounded Humans and Cameras	Wanqi Yin et.al.	2403.12959v1	link
2024-03-19	Diffusion-Driven Self-Supervised Learning for Shape Reconstruction and Pose Estimation	Jingtao Sun et.al.	2403.12728v1	link
2024-03-19	IFFNeRF: Initialisation Free and Fast 6DoF pose estimation from a single image and a NeRF model	Matteo Bortolon et.al.	2403.12682v1	null
2024-03-19	In-Hand Following of Deformable Linear Objects Using Dexterous Fingers with Tactile Sensing	Mingrui Yu et.al.	2403.12676v1	null
2024-03-19	Self-learning Canonical Space for Multi-view 3D Human Pose Estimation	Xiaoben Li et.al.	2403.12440v1	null
2024-03-20	Human Mesh Recovery from Arbitrary Multi-view Images	Xiaoben Li et.al.	2403.12434v2	link
2024-03-19	XPose: eXplainable Human Pose Estimation	Luyu Qiu et.al.	2403.12370v1	null
2024-03-18	HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data	Mengqi Zhang et.al.	2403.12011v1	null
2024-03-18	Normalized Validity Scores for DNNs in Regression based Eye Feature Extraction	Wolfgang Fuhl et.al.	2403.11665v1	null
2024-03-18	An Accurate and Real-time Relative Pose Estimation from Triple Point-line Images by Decoupling Rotation and Translation	Zewen Xu et.al.	2403.11639v1	null
2024-03-18	LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models	Yang Yang et.al.	2403.11627v1	link
2024-03-18	GenFlow: Generalizable Recurrent Flow for 6D Pose Refinement of Novel Objects	Sungphill Moon et.al.	2403.11510v1	null
2024-03-17	A Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation	Qucheng Peng et.al.	2403.11310v1	link
2024-03-17	Compact 3D Gaussian Splatting For Dense Visual SLAM	Tianchen Deng et.al.	2403.11247v1	link
2024-03-16	Robotic Task Success Evaluation Under Multi-modal Non-Parametric Object Pose Uncertainty	Lakshadeep Naik et.al.	2403.10874v1	null
2024-03-16	DPPE: Dense Pose Estimation in a Plenoxels Environment using Gradient Approximation	Christopher Kolios et.al.	2403.10773v1	null
2024-03-15	GS-Pose: Cascaded Framework for Generalizable Segmentation-based 6D Object Pose Estimation	Dingding Cai et.al.	2403.10683v1	null
2024-03-15	CLOSURE: Fast Quantification of Pose Uncertainty Sets	Yihuai Gao et.al.	2403.09990v1	null
2024-03-14	ThermoHands: A Benchmark for 3D Hand Pose Estimation from Egocentric Thermal Image	Fangqiang Ding et.al.	2403.09871v1	null
2024-03-14	BOP Challenge 2023 on Detection, Segmentation and Pose Estimation of Seen and Unseen Rigid Objects	Tomas Hodan et.al.	2403.09799v1	null
2024-03-14	Scalable Autonomous Drone Flight in the Forest with Visual-Inertial SLAM and Dense Submaps Built without LiDAR	Sebastián Barbas Laina et.al.	2403.09596v1	null
2024-03-14	Improving Real-Time Omnidirectional 3D Multi-Person Human Pose Estimation with People Matching and Unsupervised 2D-3D Lifting	Pawel Knap et.al.	2403.09437v1	null
2024-03-14	LM2D: Lyrics- and Music-Driven Dance Synthesis	Wenjie Yin et.al.	2403.09407v1	null
2024-03-14	SD-Net: Symmetric-Aware Keypoint Prediction and Domain Adaptation for 6D Pose Estimation In Bin-picking Scenarios	Ding-Tao Huang et.al.	2403.09317v1	link
2024-03-14	MOTPose: Multi-object 6D Pose Estimation for Dynamic Video Sequences using Attention-based Temporal Fusion	Arul Selvam Periyasamy et.al.	2403.09309v1	null
2024-03-13	Data Augmentation in Human-Centric Vision	Wentao Jiang et.al.	2403.08650v1	null
2024-03-15	PRAGO: Differentiable Multi-View Pose Optimization From Objectness Detections	Matteo Taiana et.al.	2403.08586v2	null
2024-03-13	NeRF-Supervised Feature Point Detection and Description	Ali Youssef et.al.	2403.08156v1	link
2024-03-12	Q-SLAM: Quadric Representations for Monocular SLAM	Chensheng Peng et.al.	2403.08125v1	null
2024-03-12	MRC-Net: 6-DoF Pose Estimation with MultiScale Residual Correlation	Yuelong Li et.al.	2403.08019v1	link
2024-03-12	Uncertainty Quantification with Deep Ensembles for 6D Object Pose Estimation	Kira Wursthorn et.al.	2403.07741v1	null
2024-03-12	Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving	JunDa Cheng et.al.	2403.07535v1	link
2024-03-12	Category-Agnostic Pose Estimation for Point Clouds	Bowen Liu et.al.	2403.07437v1	null
2024-03-12	Monocular Microscope to CT Registration using Pose Estimation of the Incus for Augmented Reality Cochlear Implant Surgery	Yike Zhang et.al.	2403.07219v1	null
2024-03-11	Real-Time Simulated Avatar from Head-Mounted Sensors	Zhengyi Luo et.al.	2403.06862v1	null
2024-03-11	Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition	Erkut Akdag et.al.	2403.06577v1	null
2024-03-10	Platypose: Calibrated Zero-Shot Multi-Hypothesis 3D Human Motion Estimation	Paweł A. Pierzchlewicz et.al.	2403.06164v1	link
2024-03-10	Diffusion Models Trained with Large Data Are Transferable Visual Models	Guangkai Xu et.al.	2403.06090v1	link
2024-03-08	Prepared for the Worst: A Learning-Based Adversarial Attack for Resilience Analysis of the ICP Algorithm	Ziyu Zhang et.al.	2403.05666v1	null
2024-03-11	Exploiting polar symmetry in designing equivariant observers for vision-based motion estimation	Tarek Bouazza et.al.	2403.05450v2	null
2024-03-07	Real-Time Planning Under Uncertainty for AUVs Using Virtual Maps	Ivana Collado-Gonzalez et.al.	2403.04936v1	null
2024-03-07	That's My Point: Compact Object-centric LiDAR Pose Estimation for Large-scale Outdoor Localisation	Georgi Pramatarov et.al.	2403.04755v1	null
2024-03-07	Disentangled Diffusion-Based 3D Human Pose Estimation with Hierarchical Spatial and Temporal Denoiser	Qingyuan Cai et.al.	2403.04444v1	link
2024-03-09	Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation	Ruicong Liu et.al.	2403.04381v2	link
2024-03-05	FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation	Chris Rockwell et.al.	2403.03221v1	null
2024-03-05	NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors	Yannan He et.al.	2403.03122v1	null
2024-03-05	Improved LiDAR Odometry and Mapping using Deep Semantic Segmentation and Novel Outliers Detection	Mohamed Afifi et.al.	2403.03111v1	null
2024-03-05	Splat-Nav: Safe Real-Time Robot Navigation in Gaussian Splatting Maps	Timothy Chen et.al.	2403.02751v1	null
2024-03-04	PowerSkel: A Device-Free Framework Using CSI Signal for Human Skeleton Estimation in Power Station	Cunyi Yin et.al.	2403.01913v1	link
2024-03-04	A Simple Baseline for Efficient Hand Mesh Reconstruction	Zhishan Zhou et.al.	2403.01813v1	null
2024-03-03	MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images	Junwen Huang et.al.	2403.01517v1	null
2024-03-02	Single-image camera calibration with model-free distortion correction	Katia Genovese et.al.	2403.01263v1	null
2024-03-02	Grid-based Fast and Structural Visual Odometry	Zhang Zhihe et.al.	2403.01110v1	null
2024-03-01	Optimal Robot Formations: Balancing Range-Based Observability and User-Defined Configurations	Syed Shabbir Ahmed et.al.	2403.00988v1	null
2024-03-04	TEXterity -- Tactile Extrinsic deXterity: Simultaneous Tactile Estimation and Control for Extrinsic Dexterity	Sangwoon Kim et.al.	2403.00049v2	null
2024-03-01	Graph Convolutional Neural Networks for Automated Echocardiography View Recognition: A Holistic Approach	Sarina Thomas et.al.	2402.19062v2	null
2024-02-29	Deep Learning for 3D Human Pose Estimation and Mesh Recovery: A Survey	Yang Liu et.al.	2402.18844v1	link
2024-02-28	Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting	Taeho Kang et.al.	2402.18330v1	link
2024-02-28	Location-guided Head Pose Estimation for Fisheye Image	Bing Li et.al.	2402.18320v1	null
2024-02-28	NToP: NeRF-Powered Large-scale Dataset Generation for 2D and 3D Human Pose Estimation in Top-View Fisheye Images	Jingrui Yu et.al.	2402.18196v1	link
2024-02-28	Six-Point Method for Multi-Camera Systems with Reduced Solution Space	Banglei Guan et.al.	2402.18066v1	link
2024-02-27	Real-Time Estimation of Relative Pose for UAVs Using a Dual-Channel Feature Association	Zhaoying Wang et.al.	2402.17504v1	null
2024-02-26	HOISDF: Constraining 3D Hand-Object Pose Estimation with Global Signed Distance Fields	Haozhe Qi et.al.	2402.17062v1	link
2024-02-26	DRSI-Net: Dual-Residual Spatial Interaction Network for Multi-Person Pose Estimation	Shang Wu et.al.	2402.16640v1	null
2024-02-26	GEA: Reconstructing Expressive 3D Gaussian Avatar from Monocular Video	Xinqi Liu et.al.	2402.16607v1	null
2024-02-26	DreamUp3D: Object-Centric Generative Models for Single-View 3D Scene Understanding and Real-to-Sim Transfer	Yizhe Wu et.al.	2402.16308v1	null
2024-02-25	XAI-based gait analysis of patients walking with Knee-Ankle-Foot orthosis using video cameras	Arnav Mishra et.al.	2402.16175v1	null
2024-02-25	VOLoc: Visual Place Recognition by Querying Compressed Lidar Map	Xudong Cai et.al.	2402.15961v1	link
2024-02-24	CLIPose: Category-Level Object Pose Estimation with Pre-trained Vision-Language Knowledge	Xiao Lin et.al.	2402.15726v1	null
2024-02-23	Optimized Deployment of Deep Neural Networks for Visual Pose Estimation on Nano-drones	Matteo Risso et.al.	2402.15273v1	null
2024-02-22	Cameras as Rays: Pose Estimation via Ray Diffusion	Jason Y. Zhang et.al.	2402.14817v1	null
2024-02-22	S^2Former-OR: Single-Stage Bimodal Transformer for Scene Graph Generation in OR	Jialun Pei et.al.	2402.14461v1	link
2024-02-22	VLPose: Bridging the Domain Gap in Pose Estimation with Language-Vision Tuning	Jingyao Li et.al.	2402.14456v1	null
2024-02-22	Modeling 3D Infant Kinetics Using Adaptive Graph Convolutional Networks	Daniel Holmberg et.al.	2402.14400v1	link
2024-02-22	Secure Navigation using Landmark-based Localization in a GPS-denied Environment	Ganesh Sapkota et.al.	2402.14280v1	null
2024-02-21	SecurePose: Automated Face Blurring and Human Movement Kinematics Extraction from Videos Recorded in Clinical Settings	Rishabh Bajpai et.al.	2402.14143v1	null
2024-02-21	High-throughput Visual Nano-drone to Nano-drone Relative Localization using Onboard Fully Convolutional Networks	Luca Crupi et.al.	2402.13756v1	null
2024-02-21	EffLoc: Lightweight Vision Transformer for Efficient 6-DOF Camera Relocalization	Zhendong Xiao et.al.	2402.13537v1	null
2024-02-20	DiffusionNOCS: Managing Symmetry and Uncertainty in Sim2Real Multi-Modal Category-level Pose Estimation	Takuya Ikeda et.al.	2402.12647v1	link
2024-02-19	Landmark-based Localization using Stereo Vision and Deep Learning in GPS-Denied Battlefield Environment	Ganesh Sapkota et.al.	2402.12551v1	null
2024-02-18	Boosting Semi-Supervised 2D Human Pose Estimation by Revisiting Data Augmentation and Consistency Training	Huayi Zhou et.al.	2402.11566v1	link
2024-02-17	Enhancing Surgical Performance in Cardiothoracic Surgery with Innovations from Computer Vision and Artificial Intelligence: A Narrative Review	Merryn D. Constable et.al.	2402.11288v1	null
2024-02-17	Dense Matchers for Dense Tracking	Tomáš Jelínek et.al.	2402.11287v1	null
2024-02-16	Occlusion Resilient 3D Human Pose Estimation	Soumava Kumar Roy et.al.	2402.11036v1	null
2024-02-16	3D Diffuser Actor: Policy Diffusion with 3D Scene Representations	Tsung-Wei Ke et.al.	2402.10885v1	null
2024-02-15	Lester: rotoscope animation through video object segmentation and tracking	Ruben Tous et.al.	2402.09883v1	link
2024-02-15	Foul prediction with estimated poses from soccer broadcast video	Jiale Fang et.al.	2402.09650v1	null
2024-02-16	IMUOptimize: A Data-Driven Approach to Optimal IMU Placement for Human Pose Estimation with Transformer Architecture	Varun Ramani et.al.	2402.08923v2	null
2024-02-13	Are Semi-Dense Detector-Free Methods Good at Matching Local Features?	Matthieu Vilain et.al.	2402.08671v1	null
2024-02-13	Gaussian-Sum Filter for Range-based 3D Relative Pose Estimation in the Presence of Ambiguities	Syed S. Ahmed et.al.	2402.08566v1	null
2024-02-13	Learning to Produce Semi-dense Correspondences for Visual Localization	Khang Truong Giang et.al.	2402.08359v1	link
2024-02-12	Extending 3D body pose estimation for robotic-assistive therapies of autistic children	Laura Santos et.al.	2402.08006v1	null
2024-02-12	GBOT: Graph-Based 3D Object Tracking for Augmented Reality-Assisted Assembly Guidance	Shiyu Li et.al.	2402.07677v1	link
2024-02-12	UAV-assisted Visual SLAM Generating Reconstructed 3D Scene Graphs in GPS-denied Environments	Ahmed Radwan et.al.	2402.07537v1	null
2024-02-09	Improving 2D-3D Dense Correspondences with Diffusion Models for 6D Object Pose Estimation	Peter Hönig et.al.	2402.06436v1	null
2024-02-08	Real-time Holistic Robot Pose Estimation with Unknown States	Shikun Ban et.al.	2402.05655v1	link
2024-02-08	Extending 6D Object Pose Estimators for Stereo Vision	Thomas Pöllabauer et.al.	2402.05610v1	null
2024-02-09	NCRF: Neural Contact Radiance Fields for Free-Viewpoint Rendering of Hand-Object Interaction	Zhongqun Zhang et.al.	2402.05532v2	null
2024-02-07	Detection and Pose Estimation of flat, Texture-less Industry Objects on HoloLens using synthetic Training	Thomas Pöllabauer et.al.	2402.04979v1	null
2024-02-07	4-Dimensional deformation part model for pose estimation using Kalman filter constraints	Enrique Martinez-Berti et.al.	2402.04953v1	null
2024-02-07	STAR: Shape-focused Texture Agnostic Representations for Improved Object Detection and 6D Pose Estimation	Peter Hönig et.al.	2402.04878v1	link
2024-02-05	A Computer Vision Based Approach for Stalking Detection Using a CNN-LSTM-MLP Hybrid Fusion Model	Murad Hasan et.al.	2402.03417v1	null
2024-02-05	SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM	Mingrui Li et.al.	2402.03246v1	link
2024-02-05	Extreme Two-View Geometry From Object Poses with Diffusion Models	Yujing Sun et.al.	2402.02800v1	link
2024-02-04	Uncertainty-Aware Testing-Time Optimization for 3D Human Pose Estimation	Ti Wang et.al.	2402.02339v1	null
2024-02-01	mmID: High-Resolution mmWave Imaging for Human Identification	Sakila S. Jayaweera et.al.	2402.00996v1	null
2024-02-01	In-Bed Pose Estimation: A Review	Ziya Ata Yazıcı et.al.	2402.00700v1	null
2024-02-01	WayFASTER: a Self-Supervised Traversability Prediction for Increased Navigation Awareness	Mateus Valverde Gasparino et.al.	2402.00683v1	link
2024-02-02	CMRNext: Camera to LiDAR Matching in the Wild for Localization and Extrinsic Calibration	Daniele Cattaneo et.al.	2402.00129v2	null
2024-01-31	Improved Scene Landmark Detection for Camera Localization	Tien Do et.al.	2401.18083v1	link
2024-01-30	Navigating the Unknown: Uncertainty-Aware Compute-in-Memory Autonomy of Edge Robotics	Nastaran Darabi et.al.	2401.17481v1	null
2024-01-30	MESA: Matching Everything by Segmenting Anything	Yesheng Zhang et.al.	2401.16741v1	null
2024-01-30	Towards Precise 3D Human Pose Estimation with Multi-Perspective Spatial-Temporal Relational Transformers	Jianbin Jiao et.al.	2401.16700v1	link
2024-01-29	Leveraging Positional Encoding for Robust Multi-Reference-Based Object 6D Pose Estimation	Jaewoo Park et.al.	2401.16284v1	null
2024-01-29	Reconstructing Close Human Interactions from Multiple Views	Qing Shuai et.al.	2401.16173v1	link
2024-01-28	Multi-Person 3D Pose Estimation from Multi-View Uncalibrated Depth Cameras	Yu-Jhe Li et.al.	2401.15616v1	null
2024-01-30	Multi-Robot Relative Pose Estimation in SE(2) with Observability Analysis: A Comparison of Extended Kalman Filtering and Robust Pose Graph Optimization	Kihoon Shin et.al.	2401.15313v2	null
2024-01-26	Adaptive Deep Learning for Efficient Visual Pose Estimation aboard Ultra-low-power Nano-drones	Beatrice Alessandra Motetti et.al.	2401.15236v1	null
2024-01-26	SimpleEgo: Predicting Probabilistic Body Pose from Egocentric Cameras	Hanz Cuevas-Velasquez et.al.	2401.14785v1	null
2024-01-24	Synthetic data enables faster annotation and robust segmentation for multi-object grasping in clutter	Dongmyoung Lee et.al.	2401.13405v1	null
2024-01-24	Linear Relative Pose Estimation Founded on Pose-only Imaging Geometry	Qi Cai et.al.	2401.13357v1	null
2024-01-23	SemanticSLAM: Learning based Semantic Map Construction and Robust Camera Localization	Mingyang Li et.al.	2401.13076v1	link
2024-01-24	RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos	Hongchi Xia et.al.	2401.12592v2	null
2024-01-26	MobileARLoc: On-device Robust Absolute Localisation for Pervasive Markerless Mobile AR	Changkun Liu et.al.	2401.11511v2	null
2024-01-19	SCENES: Subpixel Correspondence Estimation With Epipolar Supervision	Dominik A. Kloepfer et.al.	2401.10886v1	null
2024-01-19	Source-Free and Image-Only Unsupervised Domain Adaptation for Category Level Object Pose Estimation	Prakhar Kaushik et.al.	2401.10848v1	null
2024-01-22	TEXterity: Tactile Extrinsic deXterity	Antonia Bronars et.al.	2401.10230v2	null
2024-01-18	Exploring Latent Cross-Channel Embedding for Accurate 3D Human Pose Reconstruction in a Diffusion Framework	Junkun Jiang et.al.	2401.09836v1	link
2024-01-17	DK-SLAM: Monocular Visual SLAM with Deep Keypoints Adaptive Learning, Tracking and Loop-Closing	Hao Qu et.al.	2401.09160v1	null
2024-01-17	PIN-SLAM: LiDAR SLAM Using a Point-Based Implicit Neural Representation for Achieving Global Map Consistency	Yue Pan et.al.	2401.09101v1	link
2024-01-16	AdaSem: Adaptive Goal-Oriented Semantic Communications for End-to-End Camera Relocalization	Qi Liao et.al.	2401.08360v1	null
2024-01-16	S3M: Semantic Segmentation Sparse Mapping for UAVs with RGB-D Camera	Thanh Nguyen Canh et.al.	2401.08134v1	null
2024-01-15	Collaboratively Self-supervised Video Representation Learning for Action Recognition	Jie Zhang et.al.	2401.07584v1	null
2024-01-14	3D Landmark Detection on Human Point Clouds: A Benchmark and A Dual Cascade Point Transformer Framework	Fan Zhang et.al.	2401.07251v1	null
2024-01-11	On the representation and methodology for wide and short range head pose estimation	Alejandro Cobo et.al.	2401.05807v1	link
2024-01-10	Structure from Duplicates: Neural Inverse Graphics from a Pile of Objects	Tianhang Cheng et.al.	2401.05236v1	link
2024-01-10	Video-based Automatic Lameness Detection of Dairy Cows using Pose Estimation and Multiple Locomotion Traits	Helena Russello et.al.	2401.05202v1	null
2024-01-10	Diffusion-based Pose Refinement and Muti-hypothesis Generation for 3D Human Pose Estimaiton	Hongbo Kang et.al.	2401.04921v1	link
2024-01-15	Towards Real-World Aerial Vision Guidance with Categorical 6D Pose Tracker	Jingtao Sun et.al.	2401.04377v2	link
2024-01-07	RHOBIN Challenge: Reconstruction of Human Object Interaction	Xianghui Xie et.al.	2401.04143v1	null
2024-01-08	D3PRefiner: A Diffusion-based Denoise Method for 3D Human Pose Refinement	Danqi Yan et.al.	2401.03914v1	null
2024-01-07	Big Data and Deep Learning in Smart Cities: A Comprehensive Dataset for AI-Driven Traffic Accident Detection and Computer Vision Systems	Victor Adewopo et.al.	2401.03587v1	null
2024-01-04	Survey of 3D Human Body Pose and Shape Estimation Methods for Contemporary Dance Applications	Darshan Venkatrayappa et.al.	2401.02383v1	null
2024-01-04	Fit-NGP: Fitting Object Models to Neural Graphics Primitives	Marwan Taher et.al.	2401.02357v1	null
2024-01-04	PEGASUS: Physically Enhanced Gaussian Splatting Simulation System for 6DOF Object Pose Dataset Generation	Lukas Meyer et.al.	2401.02281v1	link
2024-01-03	Real-Time Human Fall Detection using a Lightweight Pose Estimation Technique	Ekram Alam et.al.	2401.01587v1	link
2024-01-05	PLE-SLAM: A Visual-Inertial SLAM Based on Point-Line Features and Efficient IMU Initialization	Jiaming He et.al.	2401.01081v2	link
2023-12-30	3D Human Pose Perception from Egocentric Stereo Videos	Hiroyasu Akada et.al.	2401.00889v1	null
2024-01-01	Geometry Depth Consistency in RGBD Relative Pose Estimation	Sourav Kumar et.al.	2401.00639v1	null
2023-12-30	A comprehensive framework for occluded human pose estimation	Linhao Xu et.al.	2401.00155v1	null
2024-01-02	6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation	Li Xu et.al.	2401.00029v2	null
2023-12-29	MURP: Multi-Agent Ultra-Wideband Relative Pose Estimation with Constrained Communications in 3D Environments	Andrew Fishberg et.al.	2312.17731v1	link
2023-12-28	iFusion: Inverting Diffusion for Pose-Free Reconstruction from Sparse Views	Chin-Hsuan Wu et.al.	2312.17250v1	link
2023-12-28	EvPlug: Learn a Plug-and-Play Module for Event and Image Fusion	Jianping Jiang et.al.	2312.16933v1	null
2023-12-28	SR-LIVO: LiDAR-Inertial-Visual Odometry and Mapping with Sweep Reconstruction	Zikang Yuan et.al.	2312.16800v1	link
2023-12-28	L-LO: Enhancing Pose Estimation Precision via a Landmark-Based LiDAR Odometry	Feiya Li et.al.	2312.16787v1	null
2023-12-27	HMP: Hand Motion Priors for Pose and Shape Estimation from Video	Enes Duran et.al.	2312.16737v1	null
2023-12-27	Camera calibration for the surround-view system: a benchmark and dataset	L Qin et.al.	2312.16499v1	null
2023-12-24	TEMP3D: Temporally Continuous 3D Human Pose Estimation Under Occlusions	Rohit Lal et.al.	2312.16221v1	link
2023-12-26	Graph Context Transformation Learning for Progressive Correspondence Pruning	Junwen Guo et.al.	2312.15971v1	link
2023-12-25	Lifting by Image -- Leveraging Image Cues for Accurate 3D Human Pose Estimation	Feng Zhou et.al.	2312.15636v1	null
2023-12-25	APTv2: Benchmarking Animal Pose Estimation and Tracking with a Large-scale Dataset and Beyond	Yuxiang Yang et.al.	2312.15612v1	link
2023-12-23	PACE: Pose Annotations in Cluttered Environments	Yang You et.al.	2312.15130v1	link
2023-12-22	PoseGen: Learning to Generate 3D Human Pose Dataset with NeRF	Mohsen Gholami et.al.	2312.14915v1	link
2023-12-22	Harnessing Diffusion Models for Visual Perception with Meta Prompts	Qiang Wan et.al.	2312.14733v1	link
2023-12-22	Pola4All: survey of polarimetric applications and an open-source toolkit to analyze polarization	Joaquin Rodriguez et.al.	2312.14697v1	link
2023-12-22	PoseViNet: Distracted Driver Action Recognition Framework Using Multi-View Pose Estimation and Vision Transformer	Neha Sengar et.al.	2312.14577v1	null
2023-12-22	Scalable 3D Reconstruction From Single Particle X-Ray Diffraction Images Based on Online Machine Learning	Jay Shenoy et.al.	2312.14432v1	null
2023-12-21	3D Pose Estimation of Two Interacting Hands from a Monocular Event Camera	Christen Millerdurai et.al.	2312.14157v1	null
2023-12-21	DUSt3R: Geometric 3D Vision Made Easy	Shuzhe Wang et.al.	2312.14132v1	link
2023-12-20	NeRF-VO: Real-Time Sparse Visual Odometry with Neural Radiance Fields	Jens Naumann et.al.	2312.13471v1	null
2023-12-20	Brain-Inspired Visual Odometry: Balancing Speed and Interpretability through a System of Systems Approach	Habib Boloorchi Tabrizi et.al.	2312.13162v1	link
2023-12-18	Unified framework for diffusion generative models in SO(3): applications in computer vision and astrophysics	Yesukhei Jagvaral et.al.	2312.11707v1	null
2023-12-18	Underwater Robot Pose Estimation Using Acoustic Methods and Intermittent Position Measurements at the Surface	Vicu-Mihalis Maer et.al.	2312.11401v1	null
2023-12-17	SHaRPose: Sparse High-Resolution Representation for Human Pose Estimation	Xiaoqi An et.al.	2312.10758v1	link
2023-12-17	PNeRFLoc: Visual Localization with Point-based Neural Radiance Fields	Boming Zhao et.al.	2312.10649v1	null
2023-12-15	SoloPose: One-Shot Kinematic 3D Human Pose Estimation with Video Data Augmentation	David C. Jeong et.al.	2312.10195v1	link
2023-12-14	iComMa: Inverting 3D Gaussians Splatting for Camera Pose Estimation via Comparing and Matching	Yuan Sun et.al.	2312.09031v1	null
2023-12-14	Scene 3-D Reconstruction System in Scattering Medium	Zhuoyifan Zhang et.al.	2312.09005v1	null
2023-12-14	CattleEyeView: A Multi-task Top-down View Cattle Dataset for Smarter Precision Livestock Farming	Kian Eng Ong et.al.	2312.08764v1	link
2023-12-20	PnP for Two-Dimensional Pose Estimation	Joshua Wang et.al.	2312.08488v2	link
2023-12-13	Pose and shear-based tactile servoing	John Lloyd et.al.	2312.08411v1	null
2023-12-13	FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects	Bowen Wen et.al.	2312.08344v1	link
2023-12-13	Efficient Multi-Object Pose Estimation using Multi-Resolution Deformable Attention and Query Aggregation	Arul Selvam Periyasamy et.al.	2312.08268v1	null
2023-12-13	CenterGrasp: Object-Aware Implicit Representation Learning for Simultaneous Shape Reconstruction and 6-DoF Grasp Estimation	Eugenio Chisari et.al.	2312.08240v1	null
2023-12-13	C-BEV: Contrastive Bird's Eye View Training for Cross-View Image Retrieval and 3-DoF Pose Estimation	Florian Fervers et.al.	2312.08060v1	null
2023-12-13	Three-Filters-to-Normal+: Revisiting Discontinuity Discrimination in Depth-to-Normal Translation	Jingwei Yang et.al.	2312.07964v1	null
2023-12-13	Diffusion Models Enable Zero-Shot Pose Estimation for Lower-Limb Prosthetic Users	Tianxun Zhou et.al.	2312.07854v1	null
2023-12-12	RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation	Peng Lu et.al.	2312.07526v1	link
2023-12-12	COLMAP-Free 3D Gaussian Splatting	Yang Fu et.al.	2312.07504v1	null
2023-12-12	RMS: Redundancy-Minimizing Point Cloud Sampling for Real-Time Pose Estimation in Degenerated Environments	Pavel Petracek et.al.	2312.07337v1	link
2023-12-12	Unifying Correspondence, Pose and NeRF for Pose-Free Novel View Synthesis from Stereo Pairs	Sunghwan Hong et.al.	2312.07246v1	link
2023-12-12	Mask as Supervision: Leveraging Unified Mask Information for Unsupervised 3D Pose Estimation	Yuchen Yang et.al.	2312.07051v1	link
2023-12-12	Towards Enhanced Human Activity Recognition through Natural Language Generation and Pose Estimation	Nikhil Kashyap et.al.	2312.06965v1	null
2023-12-12	Exploring Novel Object Recognition and Spontaneous Location Recognition Machine Learning Analysis Techniques in Alzheimer's Mice	Soham Bafana et.al.	2312.06914v1	link
2023-12-11	Keypoint-based Stereophotoclinometry for Characterizing and Navigating Small Bodies: A Factor Graph Approach	Travis Driver et.al.	2312.06865v1	link
2023-12-11	Improving the Robustness of 3D Human Pose Estimation: A Benchmark and Learning from Noisy Input	Trung-Hieu Hoang et.al.	2312.06797v1	null
2023-12-11	3D Hand Pose Estimation in Egocentric Images in the Wild	Aditya Prakash et.al.	2312.06583v1	null
2023-12-11	PointVoxel: A Simple and Effective Pipeline for Multi-View Multi-Modal 3D Human Pose Estimation	Zhiyu Pan et.al.	2312.06409v1	null
2023-12-11	ManiPose: Manifold-Constrained Multi-Hypothesis 3D Human Pose Estimation	Cédric Rommel et.al.	2312.06386v1	link
2023-12-10	From Correspondences to Pose: Non-minimal Certifiably Optimal Relative Pose without Disambiguation	Javier Tirado-Garín et.al.	2312.05995v1	link
2023-12-09	You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception	Sheng Jin et.al.	2312.05525v1	link
2023-12-07	Image and AIS Data Fusion Technique for Maritime Computer Vision Applications	Emre Gülsoylu et.al.	2312.05270v1	link
2023-12-07	Correspondences of the Third Kind: Camera Pose Estimation from Object Reflection	Kohei Yamashita et.al.	2312.04527v1	null
2023-12-07	Detecting and Restoring Non-Standard Hands in Stable Diffusion Generated Images	Yiqun Zhang et.al.	2312.04236v1	null
2023-12-06	Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning	Xinshun Wang et.al.	2312.03703v1	link
2023-12-06	Cooperative Probabilistic Trajectory Forecasting under Occlusion	Anshul Nayak et.al.	2312.03296v1	null
2023-12-05	A Unified Simulation Framework for Visual and Behavioral Fidelity in Crowd Analysis	Niccolò Bisagno et.al.	2312.02613v1	null
2023-12-05	6D Assembly Pose Estimation by Point Cloud Registration for Robot Manipulation	K. Samarawickrama et.al.	2312.02593v1	link
2023-12-05	PolyFit: A Peg-in-hole Assembly Framework for Unseen Polygon Shapes via Sim-to-real Adaptation	Geonhyup Lee et.al.	2312.02531v1	null
2023-12-04	GenEM: Physics-Informed Generative Cryo-Electron Microscopy	Jiakai Zhang et.al.	2312.02235v1	null
2023-12-02	Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors	Yu Zhang et.al.	2312.02196v1	link
2023-12-04	iMatching: Imperative Correspondence Learning	Zitong Zhan et.al.	2312.02141v1	link
2023-12-04	SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM	Nikhil Keetha et.al.	2312.02126v1	link
2023-12-04	Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection	Xubin Zhong et.al.	2312.01713v1	null
2023-12-05	Hulk: A Universal Knowledge Translator for Human-Centric Tasks	Yizhou Wang et.al.	2312.01697v2	link
2023-12-04	Multi-View Person Matching and 3D Pose Estimation with Arbitrary Uncalibrated Camera Networks	Yan Xu et.al.	2312.01561v1	null
2023-12-01	Object 6D pose estimation meets zero-shot learning	Andrea Caraffa et.al.	2312.00947v1	null
2023-12-01	Open-vocabulary object 6D pose estimation	Jaime Corsetti et.al.	2312.00690v1	null
2023-12-01	Global Localization: Utilizing Relative Spatio-Temporal Geometric Constraints from Adjacent and Distant Cameras	Mohammad Altillawi et.al.	2312.00500v1	null
2023-12-01	Learning Unorthogonalized Matrices for Rotation Estimation	Kerui Gu et.al.	2312.00462v1	null
2023-11-30	PoseGPT: Chatting about 3D Human Pose	Yao Feng et.al.	2311.18836v1	null
2023-11-30	FoundPose: Unseen Object Pose Estimation with Foundation Features	Evin Pınar Örnek et.al.	2311.18809v1	null
2023-11-30	Pose Estimation and Tracking for ASIST	Ari Goodman et.al.	2311.18665v1	null
2023-11-29	A Stochastic-Geometrical Framework for Object Pose Estimation based on Mixture Models Avoiding the Correspondence Problem	Wolfgang Hoegele et.al.	2311.18107v1	null
2023-11-29	Pose Anything: A Graph-Based Approach for Category-Agnostic Pose Estimation	Or Hirschorn et.al.	2311.17891v1	link
2023-11-29	Cinematic Behavior Transfer via NeRF-based Differentiable Filming	Xuekun Jiang et.al.	2311.17754v1	null
2023-11-29	PViT-6D: Overclocking Vision Transformers for 6D Pose Estimation with Confidence-Level Prediction and Pose Tokens	Sebastian Stapf et.al.	2311.17504v1	null
2023-11-28	On the Calibration of Human Pose Estimation	Kerui Gu et.al.	2311.17105v1	null
2023-11-28	Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence	Junyi Zhang et.al.	2311.17034v1	link
2023-11-28	HandyPriors: Physically Consistent Perception of Hand-Object Interactions with Differentiable Priors	Shutong Zhang et.al.	2311.16552v1	null
2023-11-28	Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement	Jian Wang et.al.	2311.16495v1	null
2023-11-24	UniHPE: Towards Unified Human Pose Estimation via Contrastive Learning	Zhongyu Jiang et.al.	2311.16477v1	null
2023-11-27	DiffSLVA: Harnessing Diffusion Models for Sign Language Video Anonymization	Zhaoyang Xia et.al.	2311.16060v1	link
2023-11-27	Uncertainty Quantification of Set-Membership Estimation in Control and Perception: Revisiting the Minimum Enclosing Ellipsoid	Yukai Tang et.al.	2311.15962v1	null
2023-11-27	Computer Vision for Carriers: PATRIOT	Ari Goodman et.al.	2311.15914v1	null
2023-11-27	SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation	Jiehong Lin et.al.	2311.15707v1	link
2023-11-24	RSB-Pose: Robust Short-Baseline Binocular 3D Human Pose Estimation with Occlusion Handling	Xiaoyue Wan et.al.	2311.14242v1	null
2023-11-23	Appearance-based gaze estimation enhanced with synthetic images using deep neural networks	Dmytro Herashchenko et.al.	2311.14175v1	link
2023-11-23	GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence	Van Nguyen Nguyen et.al.	2311.14155v1	link
2023-11-23	GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence	Pengyuan Wang et.al.	2311.13777v1	null
2023-11-22	HEViTPose: High-Efficiency Vision Transformer for Human Pose Estimation	Chengpeng Wu et.al.	2311.13615v1	link
2023-11-24	Calibration System and Algorithm Design for a Soft Hinged Micro Scanning Mirror with a Triaxial Hall Effect Sensor	Di Wang et.al.	2311.12778v2	null
2023-11-21	HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation	Yongliang Lin et.al.	2311.12588v1	link
2023-11-21	CoVOR-SLAM: Cooperative SLAM using Visual Odometry and Ranges for Multi-Robot Systems	Young-Hee Lee et.al.	2311.12580v1	null
2023-11-21	HCA-Net: Hierarchical Context Attention Network for Intervertebral Disc Semantic Labeling	Afshin Bozorgpour et.al.	2311.12486v1	link
2023-11-21	Two Views Are Better than One: Monocular 3D Pose Estimation with Multiview Consistency	Christian Keilstrup Ingwersen et.al.	2311.12421v1	null
2023-11-20	Fingerspelling PoseNet: Enhancing Fingerspelling Translation with Pose-Based Transformer Models	Pooya Fayyazsanavi et.al.	2311.12128v1	link
2023-11-20	Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation	Wenhao Li et.al.	2311.12028v1	link
2023-11-20	SniffyArt: The Dataset of Smelling Persons	Mathias Zinnen et.al.	2311.11888v1	null
2023-11-21	Robot Hand-Eye Calibration using Structure-from-Motion	Nicolas Andreff et.al.	2311.11808v2	null
2023-11-18	SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation	Yamei Chen et.al.	2311.11125v1	link
2023-11-18	Synthetic Data Generation for Bridging Sim2Real Gap in a Production Environment	Parth Rawal et.al.	2311.11039v1	null
2023-11-18	Multiple View Geometry Transformers for 3D Human Pose Estimation	Ziwei Liao et.al.	2311.10983v1	link
2023-11-18	Jenga Stacking Based on 6D Pose Estimation for Architectural Form Finding Process	Zixun Huang et.al.	2311.10918v1	null
2023-11-17	BiHRNet: A Binary high-resolution network for Human Pose Estimation	Zhicheng Zhang et.al.	2311.10296v1	null
2023-11-16	Match and Locate: low-frequency monocular odometry based on deep feature matching	Stepan Konev et.al.	2311.10034v1	null
2023-11-16	LIO-EKF: High Frequency LiDAR-Inertial Odometry using Extended Kalman Filters	Yibin Wu et.al.	2311.09887v1	link
2023-11-16	Improved TokenPose with Sparsity	Anning Li et.al.	2311.09653v1	null
2023-11-16	Pseudo-keypoints RKHS Learning for Self-supervised 6DoF Pose Estimation	Yangzheng Wu et.al.	2311.09500v1	null
2023-11-15	NormNet: Scale Normalization for 6D Pose Estimation in Stacked Scenarios	En-Te Lin et.al.	2311.09269v1	link
2023-11-15	Range-Visual-Inertial Sensor Fusion for Micro Aerial Vehicle Localization and Navigation	Abhishek Goudar et.al.	2311.09056v1	link
2023-11-14	LocaliseBot: Multi-view 3D object localisation with differentiable rendering for robot grasping	Sujal Vijayaraghavan et.al.	2311.08438v1	null
2023-11-13	SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models	Ziyi Lin et.al.	2311.07575v1	link
2023-11-13	Bio-Inspired Grasping Controller for Sensorized 2-DoF Grippers	Luca Lach et.al.	2311.07257v1	link
2023-11-10	CESPED: a new benchmark for supervised particle pose estimation in Cryo-EM	Ruben Sanchez-Garcia et.al.	2311.06194v1	link
2023-11-10	2D Image head pose estimation via latent space regression under occlusion settings	José Celestino et.al.	2311.06038v1	link
2023-11-10	Robust Adversarial Attacks Detection for Deep Learning based Relative Pose Estimation for Space Rendezvous	Ziwei Wang et.al.	2311.05992v1	null
2023-11-10	A Practical Guide to Implementing Off-Axis Stereo Projection Using Existing Ray Tracing Libraries	Stefan Zellmann et.al.	2311.05887v1	link
2023-11-09	Visually Guided Model Predictive Robot Control via 6D Object Pose Localization and Tracking	Mederic Fourmy et.al.	2311.05344v1	null
2023-11-09	Spatial Attention-based Distribution Integration Network for Human Pose Estimation	Sihan Gao et.al.	2311.05323v1	null
2023-11-09	SPADES: A Realistic Spacecraft Pose Estimation Dataset using Event Sensing	Arunkumar Rathinam et.al.	2311.05310v1	null
2023-11-09	Differentiable Cloth Parameter Identification and State Estimation in Manipulation	Dongzhe Zheng et.al.	2311.05141v1	null
2023-11-09	POISE: Pose Guided Human Silhouette Extraction under Occlusions	Arindam Dutta et.al.	2311.05077v1	link
2023-11-08	Active Transfer Learning for Efficient Video-Specific Human Pose Estimation	Hiromu Taketsugu et.al.	2311.05041v1	link
2023-11-08	3D Pose Estimation of Tomato Peduncle Nodes using Deep Keypoint Detection and Point Cloud	Jianchao Ci et.al.	2311.04699v1	null
2023-11-09	Rethinking Human Pose Estimation for Autonomous Driving with 3D Event Representations	Xiaoting Yin et.al.	2311.04591v2	link
2023-11-08	Learning Robust Multi-Scale Representation for Neural Radiance Fields from Unposed Images	Nishant Jain et.al.	2311.04521v1	null
2023-11-08	PLV-IEKF: Consistent Visual-Inertial Odometry using Points, Lines, and Vanishing Points	Tong Hua et.al.	2311.04477v1	null
2023-11-08	UP-NeRF: Unconstrained Pose-Prior-Free Neural Radiance Fields	Injae Kim et.al.	2311.03784v2	link
2023-11-06	A Single 2D Pose with Context is Worth Hundreds for 3D Human Pose Estimation	Qitao Zhao et.al.	2311.03312v1	null
2023-11-06	Enabling In-Situ Resources Utilisation by leveraging collaborative robotics and astronaut-robot interaction	Silvia Romero-Azpitarte et.al.	2311.03146v1	null
2023-11-06	Simultaneous Time Synchronization and Mutual Localization for Multi-robot System	Xiangyong Wen et.al.	2311.02948v1	null
2023-11-06	Initialisation of Autonomous Aircraft Visual Inspection Systems via CNN-Based Camera Pose Estimation	Xueyan Oh et.al.	2311.02900v1	null
2023-11-06	Efficient, Self-Supervised Human Pose Estimation with Inductive Prior Tuning	Nobline Yoo et.al.	2311.02815v1	link
2023-11-03	Generating Unbiased Pseudo-labels via a Theoretically Guaranteed Chebyshev Constraint to Unify Semi-supervised Classification and Regression	Jiaqi Wu et.al.	2311.01782v1	link
2023-11-03	Modeling the Uncertainty with Maximum Discrepant Students for Semi-supervised 2D Pose Estimation	Jiaqi Wu et.al.	2311.01770v1	null
2023-11-02	Sim2Real Bilevel Adaptation for Object Surface Classification using Vision-Based Tactile Sensors	Gabriele M. Caddeo et.al.	2311.01380v1	link
2023-11-01	A Spatial-Temporal Transformer based Framework For Human Pose Assessment And Correction in Education Scenarios	Wenyang Hu et.al.	2311.00401v1	null
2023-10-31	HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception	Junkun Yuan et.al.	2310.20695v1	link
2023-10-31	Pose-to-Motion: Cross-Domain Motion Retargeting with Pose Prior	Qingqing Zhao et.al.	2310.20249v1	null
2023-10-30	FetusMapV2: Enhanced Fetal Pose Estimation in 3D Ultrasound	Chaoyu Chen et.al.	2310.19293v1	null
2023-10-29	Distributed Nonlinear Filtering using Triangular Transport Maps	Daniel Grange et.al.	2310.19000v1	null
2023-10-29	TIC-TAC: A Framework To Learn And Evaluate Your Covariance	Megh Shukla et.al.	2310.18953v1	link
2023-10-29	Improving Multi-Person Pose Tracking with A Confidence Network	Zehua Fu et.al.	2310.18920v1	null
2023-10-29	HDMNet: A Hierarchical Matching Network with Double Attention for Large-scale Outdoor LiDAR Point Cloud Registration	Weiyi Xue et.al.	2310.18874v1	null
2023-10-28	Enhancing Grasping Performance of Novel Objects through an Improved Fine-Tuning Process	Xiao Hu et.al.	2310.18569v1	null
2023-10-27	ProcNet: Deep Predictive Coding Model for Robust-to-occlusion Visual Segmentation and Pose Estimation	Michael Zechmair et.al.	2310.18009v1	null
2023-10-26	Learning Extrinsic Dexterity with Parameterized Manipulation Primitives	Shih-Min Yang et.al.	2310.17785v1	null
2023-10-26	6-DoF Stability Field via Diffusion Models	Takuma Yoneda et.al.	2310.17649v1	null
2023-10-26	SE(3) Diffusion Model-based Point Cloud Registration for Robust 6D Object Pose Estimation	Haobo Jiang et.al.	2310.17359v1	null
2023-10-26	Automatic Edge Error Judgment in Figure Skating Using 3D Pose Estimation from a Monocular Camera and IMUs	Ryota Tanaka et.al.	2310.17193v1	link
2023-10-25	Real-time 6-DoF Pose Estimation by an Event-based Camera using Active LED Markers	Gerald Ebmer et.al.	2310.16618v1	null
2023-10-25	ChimpACT: A Longitudinal Dataset for Understanding Chimpanzee Behaviors	Xiaoxuan Ma et.al.	2310.16447v1	link
2023-10-25	MotionAGFormer: Enhancing 3D Human Pose Estimation with a Transformer-GCNFormer Network	Soroush Mehraban et.al.	2310.16288v1	link
2023-10-25	TransPose: 6D Object Pose Estimation with Geometry-Aware Transformer	Xiao Lin et.al.	2310.16279v1	null
2023-10-23	Converting Depth Images and Point Clouds for Feature-based Pose Estimation	Robert Lösch et.al.	2310.14924v1	link
2023-10-23	Object Pose Estimation Annotation Pipeline for Multi-view Monocular Camera Systems in Industrial Settings	Hazem Youssef et.al.	2310.14914v1	null
2023-10-23	Player Re-Identification Using Body Part Appearences	Mahesh Bhosale et.al.	2310.14469v1	null
2023-10-20	LanPose: Language-Instructed 6D Object Pose Estimation for Robotic Assembly	Bowen Fu et.al.	2310.13819v1	null
2023-10-20	FMRT: Learning Accurate Feature Matching with Reconciliatory Transformer	Xinyu Zhang et.al.	2310.13605v1	null
2023-10-20	ColAG: A Collaborative Air-Ground Framework for Perception-Limited UGVs' Navigation	Zhehan Li et.al.	2310.13324v1	link
2023-10-20	CylinderTag: An Accurate and Flexible Marker for Cylinder-Shape Objects Pose Estimation Based on Projective Invariants	Shaoan Wang et.al.	2310.13320v1	link
2023-10-19	Human Pose-based Estimation, Tracking and Action Recognition with Deep Learning: A Survey	Lijuan Zhou et.al.	2310.13039v1	null
2023-10-19	FSD: Fast Self-Supervised Single RGB-D to Categorical 3D Objects	Mayank Lunayach et.al.	2310.12974v1	link
2023-10-18	Mesh Represented Recycle Learning for 3D Hand Pose and Mesh Estimation	Bosang Kim et.al.	2310.12189v1	null
2023-10-18	One-Shot Imitation Learning: A Pose Estimation Perspective	Pietro Vitiello et.al.	2310.12077v1	null
2023-10-18	ShapeGraFormer: GraFormer-Based Network for Hand-Object Reconstruction from a Single Depth Map	Ahmed Tawfik Aboukhadra et.al.	2310.11811v1	null
2023-10-17	Holistic Parking Slot Detection with Polygon-Shaped Representations	Lihao Wang et.al.	2310.11629v1	null
2023-10-17	Diver Interest via Pointing in Three Dimensions: 3D Pointing Reconstruction for Diver-AUV Communication	Chelsey Edge et.al.	2310.11536v1	null
2023-10-18	AP$n$P: A Less-constrained P$n$P Solver for Pose Estimation with Unknown Anisotropic Scaling or Focal Lengths	Jiaxin Wei et.al.	2310.09982v2	link
2023-10-15	Tabletop Transparent Scene Reconstruction via Epipolar-Guided Optical Flow with Monocular Depth Completion Prior	Xiaotong Chen et.al.	2310.09956v1	null
2023-10-15	Socially reactive navigation models for mobile robots in dynamic environments	Ricarte Ribeiro et.al.	2310.09916v1	link
2023-10-15	MoEmo Vision Transformer: Integrating Cross-Attention and Movement Vectors in 3D Pose Estimation for HRI Emotion Detection	David C. Jeong et.al.	2310.09757v1	link
2023-10-16	IMU Preintegration for Multi-Robot Systems in the Presence of Bias and Communication Constraints	Mohammed Ayman Shalaby et.al.	2310.08686v2	null
2023-10-12	Towards Design and Development of an ArUco Markers-Based Quantitative Surface Tactile Sensor	Ozdemir Can Kara et.al.	2310.08398v1	null
2023-10-12	Multimodal Active Measurement for Human Mesh Recovery in Close Proximity	Takahiro Maeda et.al.	2310.08116v1	link
2023-10-12	X-HRNet: Towards Lightweight Human Pose Estimation with Spatially Unidimensional Self-Attention	Yixuan Zhou et.al.	2310.08042v1	link
2023-10-12	PoRF: Pose Residual Field for Accurate Neural Surface Reconstruction	Jia-Wang Bian et.al.	2310.07449v2	link
2023-10-11	SAGE-ICP: Semantic Information-Assisted ICP	Jiaming Cui et.al.	2310.07237v1	link
2023-10-11	DeepSimHO: Stable Pose Estimation for Hand-Object Interaction via Physics Simulation	Rong Wang et.al.	2310.07206v1	link
2023-10-12	FABind: Fast and Accurate Protein-Ligand Binding	Qizhi Pei et.al.	2310.06763v2	link
2023-10-10	EARL: Eye-on-Hand Reinforcement Learner for Dynamic Grasping with Active Pose Estimation	Baichuan Huang et.al.	2310.06751v1	null
2023-10-09	Augmenting Vision-Based Human Pose Estimation with Rotation Matrix	Milad Vazan et.al.	2310.06068v1	null
2023-10-07	Federated Self-Supervised Learning of Monocular Depth Estimators for Autonomous Vehicles	Elton F. de S. Soares et.al.	2310.04837v1	null
2023-10-10	1st Place Solution of Egocentric 3D Hand Pose Estimation Challenge 2023 Technical Report: A Concise Pipeline for Egocentric Hand Pose Reconstruction	Zhishan Zhou et.al.	2310.04769v2	null
2023-10-06	SwimXYZ: A large-scale dataset of synthetic swimming motions and videos	Fiche Guénolé et.al.	2310.04360v1	null
2023-10-05	BID-NeRF: RGB-D image pose estimation with inverted Neural Radiance Fields	Ágoston István Csehi et.al.	2310.03563v1	null
2023-10-05	3D-Aware Hypothesis & Verification for Generalizable Relative Object Pose Estimation	Chen Zhao et.al.	2310.03534v1	null
2023-10-05	RGBManip: Monocular Image-based Robotic Manipulation through Active Object Pose Estimation	Boshi An et.al.	2310.03478v1	null
2023-10-05	Cyber Physical System Information Collection: Robot Location and Navigation Method Based on QR Code	Hongwei Li et.al.	2310.03470v1	null
2023-10-04	Condition numbers in multiview geometry, instability in relative pose estimation, and RANSAC	Hongyi Fan et.al.	2310.02719v1	null
2023-10-05	USB-NeRF: Unrolling Shutter Bundle Adjusted Neural Radiance Fields	Moyang Li et.al.	2310.02687v2	link
2023-10-03	Beyond the Benchmark: Detecting Diverse Anomalies in Videos	Yoav Arad et.al.	2310.01904v1	link
2023-10-03	MFOS: Model-Free & One-Shot Object Pose Estimation	JongMin Lee et.al.	2310.01897v1	null
2023-10-02	LEAP: Liberate Sparse-view 3D Modeling from Camera Poses	Hanwen Jiang et.al.	2310.01410v1	link
2023-10-02	H-InDex: Visual Reinforcement Learning with Hand-Informed Representations for Dexterous Manipulation	Yanjie Ze et.al.	2310.01404v1	link
2023-10-04	Self-supervised Learning of Contextualized Local Visual Embeddings	Thalles Santos Silva et.al.	2310.00527v3	link
2023-09-30	Diff-DOPE: Differentiable Deep Object Pose Estimation	Jonathan Tremblay et.al.	2310.00463v1	null
2023-09-29	Diver Identification Using Anthropometric Data Ratios for Underwater Multi-Human-Robot Collaboration	Jungseok Hong et.al.	2310.00146v1	null
2023-09-29	Denoising and Selecting Pseudo-Heatmaps for Semi-Supervised Human Pose Estimation	Zhuoran Yu et.al.	2310.00099v1	null
2023-09-29	Revisiting Cephalometric Landmark Detection from the view of Human Pose Estimation with Lightweight Super-Resolution Head	Qian Wu et.al.	2309.17143v1	link
2023-09-29	AdaPose: Towards Cross-Site Device-Free Human Pose Estimation with Commodity WiFi	Yunjiao Zhou et.al.	2309.16964v1	null
2023-09-28	End-to-End (Instance)-Image Goal Navigation through Correspondence as an Emergent Phenomenon	Guillaume Bono et.al.	2309.16634v1	null
2023-09-28	Off-the-shelf bin picking workcell with visual pose estimation: A case study on the world robot summit 2018 kitting task	Frederik Hagelskjær et.al.	2309.16221v1	null
2023-09-28	Cloth2Body: Generating 3D Human Body Mesh from 2D Clothing	Lu Dai et.al.	2309.16189v1	null
2023-09-28	Laboratory Automation: Precision Insertion with Adaptive Fingers utilizing Contact through Sliding with Tactile-based Pose Estimation	Sameer Pai et.al.	2309.16170v1	null
2023-09-28	CLIP-Hand3D: Exploiting 3D Hand Pose Estimation via Context-Aware Prompting	Shaoxiang Guo et.al.	2309.16140v1	null
2023-09-28	A Modular Bio-inspired Robotic Hand with High Sensitivity	Chao Liu et.al.	2309.16081v1	null
2023-09-27	Handbook on Leveraging Lines for Two-View Relative Pose Estimation	Petr Hruby et.al.	2309.16040v1	null
2023-09-27	Q-REG: End-to-End Trainable Point Cloud Registration with Surface Curvature	Shengze Jin et.al.	2309.16023v1	null
2023-09-27	Analysis on Multi-robot Relative 6-DOF Pose Estimation Error Based on UWB Range	Xinran Li et.al.	2309.15367v1	null
2023-09-26	Unsupervised Reconstruction of 3D Human Pose Interactions From 2D Poses Alone	Peter Hardy et.al.	2309.14865v1	null
2023-09-26	Learning Vision-Based Bipedal Locomotion for Challenging Terrain	Helei Duan et.al.	2309.14594v1	null
2023-09-25	Spring-IMU Fusion Based Proprioception for Feedback Control of Soft Manipulators	Yinan Meng et.al.	2309.14279v1	null
2023-09-25	Industrial Application of 6D Pose Estimation for Robotic Manipulation in Automotive Internal Logistics	Philipp Quentin et.al.	2309.14265v1	null
2023-09-25	BoIR: Box-Supervised Instance Representation for Multi-Person Pose Estimation	Uyoung Jeong et.al.	2309.14072v1	link
2023-09-24	Towards Subcentimeter Accuracy Digital-Twin Tracking via An RGBD-based Transformer Model and A Comprehensive Mobile Dataset	Zixun Huang et.al.	2309.13570v1	link
2023-09-21	ORTexME: Occlusion-Robust Human Shape and Pose via Temporal Average Texture and Mesh Encoding	Yu Cheng et.al.	2309.12183v1	null
2023-09-21	ZS6D: Zero-shot 6D Object Pose Estimation using Vision Transformers	Philipp Ausserlechner et.al.	2309.11986v1	null
2023-09-21	Ego3DPose: Capturing 3D Cues from Binocular Egocentric Views	Taeho Kang et.al.	2309.11962v1	link
2023-09-21	A Real-Time Multi-Task Learning System for Joint Detection of Face, Facial Landmark and Head Pose	Qingtian Wu et.al.	2309.11773v1	null
2023-09-20	Understanding Pose and Appearance Disentanglement in 3D Human Pose Estimation	Krishna Kanth Nakka et.al.	2309.11667v1	null
2023-09-20	Online Supervised Training of Spaceborne Vision during Proximity Operations using Adaptive Kalman Filtering	Tae Ha Park et.al.	2309.11645v1	null
2023-09-20	OCC-VO: Dense Mapping via 3D Occupancy-Based Visual Odometry for Autonomous Driving	Heng Li et.al.	2309.11011v1	link
2023-09-19	Language-Conditioned Affordance-Pose Detection in 3D Point Clouds	Toan Nguyen et.al.	2309.10911v1	null
2023-09-19	MAGIC-TBR: Multiview Attention Fusion for Transformer-based Bodily Behavior Recognition in Group Settings	Surbhi Madan et.al.	2309.10765v1	link
2023-09-19	SHOWMe: Benchmarking Object-agnostic Hand-Object 3D Reconstruction	Anilkumar Swamy et.al.	2309.10748v1	null
2023-09-20	GloPro: Globally-Consistent Uncertainty-Aware 3D Human Pose Estimation & Tracking in the Wild	Simon Schaefer et.al.	2309.10369v2	null
2023-09-19	RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery	Jiaxin Wei et.al.	2309.10255v1	link
2023-09-18	Hierarchical Attention and Graph Neural Networks: Toward Drift-Free Pose Estimation	Kathia Melbouci et.al.	2309.09934v1	null
2023-09-18	Application-driven Validation of Posteriors in Inverse Problems	Tim J. Adler et.al.	2309.09764v1	null
2023-09-18	RIDE: Self-Supervised Learning of Rotation-Equivariant Keypoint Detection and Invariant Description for Endoscopy	Mert Asim Karaoglu et.al.	2309.09563v1	null
2023-09-18	Sparse and Privacy-enhanced Representation for Human Pose Estimation	Ting-Ying Lin et.al.	2309.09515v1	null
2023-09-19	RenderIH: A Large-scale Synthetic Dataset for 3D Interacting Hand Pose Estimation	Lijun Li et.al.	2309.09301v2	link
2023-09-16	Optimal Initialization Strategies for Range-Only Trajectory Estimation	Abhishek Goudar et.al.	2309.09011v1	null
2023-09-16	DynaMoN: Motion-Aware Fast And Robust Camera Localization for Dynamic NeRF	Mert Asim Karaoglu et.al.	2309.08927v1	link
2023-09-16	Outram: One-shot Global Localization via Triangulated Scene Graph and Global Outlier Pruning	Pengyu Yin et.al.	2309.08914v1	link
2023-09-15	Towards Robust and Smooth 3D Multi-Person Pose Estimation from Monocular Videos in the Wild	Sungchan Park et.al.	2309.08644v1	null
2023-09-15	YCB-Ev: Event-vision dataset for 6DoF object pose estimation	Pavel Rojtberg et.al.	2309.08482v1	link
2023-09-15	Fast and Accurate Deep Loop Closing and Relocalization for Reliable LiDAR SLAM	Chenghao Shi et.al.	2309.08086v1	null
2023-09-14	Gradient based Grasp Pose Optimization on a NeRF that Approximates Grasp Success	Gergely Sóti et.al.	2309.08040v1	null
2023-09-14	TEMPO: Efficient Multi-View Pose Estimation, Tracking, and Forecasting	Rohan Choudhury et.al.	2309.07910v1	null
2023-09-14	Towards Robust and Unconstrained Full Range of Rotation Head Pose Estimation	Thorsten Hempel et.al.	2309.07654v1	link
2023-09-14	EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization	Minjung Kim et.al.	2309.07471v1	link
2023-09-14	Unleashing the Power of Depth and Pose Estimation Neural Networks by Designing Compatible Endoscopic Images	Junyang Wu et.al.	2309.07390v1	null
2023-09-13	LInKs "Lifting Independent Keypoints" -- Partial Pose Lifting for Occlusion Handling with Improved Accuracy in 2D-3D Human Pose Estimation	Peter Hardy et.al.	2309.07243v1	null
2023-09-13	3D Active Metric-Semantic SLAM	Yuezhan Tao et.al.	2309.06950v1	null
2023-09-11	ViHOPE: Visuotactile In-Hand Object 6D Pose Estimation with Shape Completion	Hongyu Li et.al.	2309.05662v1	null
2023-09-11	Towards Intuitive HMI for UAV Control	Filip Zoric et.al.	2309.05460v1	null
2023-09-12	FreeMan: Towards Benchmarking 3D Human Pose Estimation in the Wild	Jiong Wang et.al.	2309.05073v2	link
2023-09-09	Probabilistic Triangulation for Uncalibrated Multi-View 3D Human Pose Estimation	Boyuan Jiang et.al.	2309.04756v1	link
2023-09-09	Mirror-Aware Neural Humans	Daniel Ajisafe et.al.	2309.04750v1	link
2023-09-08	Robot Localization and Mapping Final Report -- Sequential Adversarial Learning for Self-Supervised Deep Visual Odometry	Akankshya Kar et.al.	2309.04147v1	null
2023-09-07	ArtiGrasp: Physically Plausible Synthesis of Bi-Manual Dexterous Grasping and Articulation	Hui Zhang et.al.	2309.03891v1	null
2023-09-05	An automated, high-resolution phenotypic assay for adult Brugia malayi and microfilaria	Upender Kalwa et.al.	2309.03235v1	null
2023-09-05	A Robust Localization Solution for an Uncrewed Ground Vehicle in Unstructured Outdoor GNSS-Denied Environments	W. Jacob Wagner et.al.	2309.02569v1	null
2023-09-05	GO-SLAM: Global Optimization for Consistent 3D Instant Reconstruction	Youmin Zhang et.al.	2309.02436v1	link
2023-09-05	DR-Pose: A Two-stage Deformation-and-Registration Pipeline for Category-level 6D Object Pose Estimation	Lei Zhou et.al.	2309.01925v1	link
2023-09-04	On the Query Strategies for Efficient Online Active Distillation	Michele Boldo et.al.	2309.01612v1	null
2023-09-04	DiffHPE: Robust, Coherent 3D Human Pose Lifting with Diffusion	Cédric Rommel et.al.	2309.01575v1	null
2023-09-06	Refined Temporal Pyramidal Compression-and-Amplification Transformer for 3D Human Pose Estimation	Hanbing Liu et.al.	2309.01365v2	link
2023-09-04	SKoPe3D: A Synthetic Dataset for Vehicle Keypoint Perception in 3D from Traffic Monitoring Cameras	Himanshu Pahadia et.al.	2309.01324v1	null
2023-09-03	BodySLAM++: Fast and Tightly-Coupled Visual-Inertial Camera and Human Motion Tracking	Dorian F. Henning et.al.	2309.01236v1	null
2023-09-02	Mitigating Motion Blur for Robust 3D Baseball Player Pose Modeling for Pitch Analysis	Jerrin Bright et.al.	2309.01010v1	null
2023-09-01	Fusing Monocular Images and Sparse IMU Signals for Real-time Human Motion Capture	Shaohua Pan et.al.	2309.00310v1	link
2023-08-31	EMDB: The Electromagnetic Database of Global 3D Human Pose and Shape in the Wild	Manuel Kaufmann et.al.	2308.16894v1	link
2023-08-31	SA6D: Self-Adaptive Few-Shot 6D Pose Estimator for Novel and Occluded Objects	Ning Gao et.al.	2308.16528v1	null
2023-08-30	Two-Stage Violence Detection Using ViTPose and Classification Models at Smart Airports	İrem Üstek et.al.	2308.16325v1	link
2023-08-30	SignDiff: Learning Diffusion Models for American Sign Language Production	Sen Fang et.al.	2308.16082v1	null
2023-08-30	Learning Structure-from-Motion with Graph Attention Networks	Lucas Brynte et.al.	2308.15984v1	link
2023-08-30	Reconstructing Groups of People with Hypergraph Relational Reasoning	Buzhen Huang et.al.	2308.15844v1	link
2023-08-29	3D-MuPPET: 3D Multi-Pigeon Pose Estimation and Tracking	Urs Waldmann et.al.	2308.15316v1	link
2023-08-29	Spatio-temporal MLP-graph network for 3D human pose estimation	Tanvir Hassan et.al.	2308.15313v1	link
2023-08-29	Pose-Free Neural Radiance Fields via Implicit Pose Regularization	Jiahui Zhang et.al.	2308.15049v1	null
2023-08-28	R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras	Aron Schmied et.al.	2308.14713v1	null
2023-08-28	Video-Based Hand Pose Estimation for Remote Assessment of Bradykinesia in Parkinson's Disease	Gabriela T. Acevedo Trebbau et.al.	2308.14679v1	null
2023-08-28	Active Pose Refinement for Textureless Shiny Objects using the Structured Light Camera	Jun Yang et.al.	2308.14665v1	null
2023-08-28	CPFES: Physical Fitness Evaluation Based on Canadian Agility and Movement Skill Assessment	Pengcheng Dong et.al.	2308.14324v1	null
2023-08-27	LDL: Line Distance Functions for Panoramic Localization	Junho Kim et.al.	2308.13989v1	link
2023-08-26	Prior-guided Source-free Domain Adaptation for Human Pose Estimation	Dripta S. Raychaudhuri et.al.	2308.13954v1	null
2023-08-26	Vision-Based Human Pose Estimation via Deep Learning: A Survey	Gongjin Lan et.al.	2308.13872v1	null
2023-08-24	POCO: 3D Pose and Shape Estimation with Confidence	Sai Kumar Dwivedi et.al.	2308.12965v1	link
2023-08-24	Robot Pose Nowcasting: Forecast the Future to Improve the Present	Alessandro Simoni et.al.	2308.12914v1	null
2023-08-23	Certifiably Optimal Rotation and Pose Estimation Based on the Cayley Map	Timothy D Barfoot et.al.	2308.12418v1	null
2023-08-22	Animal3D: A Comprehensive Dataset of 3D Animal Pose and Shape	Jiacong Xu et.al.	2308.11737v1	null
2023-08-22	TrackFlow: Multi-Object Tracking with Normalizing Flows	Gianluca Mancusi et.al.	2308.11513v1	null
2023-08-22	A LiDAR-Inertial SLAM Tightly-Coupled with Dropout-Tolerant GNSS Fusion for Autonomous Mine Service Vehicles	Yusheng Wang et.al.	2308.11492v1	null
2023-08-22	PoseGraphNet++: Enriching 3D Human Pose with Orientation Estimation	Soubarna Banik et.al.	2308.11440v1	null
2023-08-22	Novel-view Synthesis and Pose Estimation for Hand-Object Interaction from Sparse Views	Wentian Qu et.al.	2308.11198v1	null
2023-08-21	Spectral Graphormer: Spectral Graph-based Transformer for Egocentric Two-Hand Reconstruction using Multi-View Color Images	Tze Ho Elden Tse et.al.	2308.11015v1	null
2023-08-21	Polarimetric Information for Multi-Modal 6D Pose Estimation of Photometrically Challenging Objects with Limited Data	Patrick Ruhkamp et.al.	2308.10627v1	null
2023-08-21	GaitPT: Skeletons Are All You Need For Gait Recognition	Andy Catruna et.al.	2308.10623v1	null
2023-08-21	Approximately Equivariant Graph Networks	Ningyuan Huang et.al.	2308.10436v1	link
2023-08-21	In-Rack Test Tube Pose Estimation Using RGB-D Data	Hao Chen et.al.	2308.10411v1	null
2023-08-20	Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video	Yingxuan You et.al.	2308.10305v1	link
2023-08-20	OCHID-Fi: Occlusion-Robust Hand Pose Estimation in 3D via RF-Vision	Shujie Zhang et.al.	2308.10146v1	link
2023-08-19	3D-Aware Neural Body Fitting for Occlusion Robust 3D Human Pose Estimation	Yi Zhang et.al.	2308.10123v1	link
2023-08-19	Pseudo Flow Consistency for Self-Supervised 6D Object Pose Estimation	Yang Hai et.al.	2308.10016v1	link
2023-08-19	UniAP: Towards Universal Animal Perception in Vision via Few-shot Learning	Meiqi Sun et.al.	2308.09953v1	null
2023-08-22	Scene-Aware Feature Matching	Xiaoyong Lu et.al.	2308.09949v2	null
2023-08-18	PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation	Hanbing Liu et.al.	2308.09678v1	link
2023-08-18	Improving 3D Pose Estimation for Sign Language	Maksym Ivashechkin et.al.	2308.09525v1	null
2023-08-18	Denoising Diffusion for 3D Hand Pose Estimation from Images	Maksym Ivashechkin et.al.	2308.09523v1	null
2023-08-18	ResQ: Residual Quantization for Video Perception	Davide Abati et.al.	2308.09511v1	null
2023-08-17	MovePose: A High-performance Human Pose Estimation Algorithm on Mobile and Edge Devices	Dongyang Yu et.al.	2308.09084v1	null
2023-08-17	Pedestrian Environment Model for Automated Driving	Adrian Holzbock et.al.	2308.09080v1	link
2023-08-17	Exploiting Point-Wise Attention in 6D Object Pose Estimation Based on Bidirectional Prediction	Yuhao Yang et.al.	2308.08518v2	null
2023-08-16	View Consistent Purification for Accurate Cross-View Localization	Shan Wang et.al.	2308.08110v1	null
2023-08-15	Learning Better Keypoints for Multi-Object 6DoF Pose Estimation	Yangzheng Wu et.al.	2308.07827v1	link
2023-08-14	Group Pose: A Simple Baseline for End-to-End Multi-person Pose Estimation	Huan Liu et.al.	2308.07313v1	link
2023-08-12	4DRVO-Net: Deep 4D Radar-Visual Odometry Using Multi-Modal and Multi-Scale Adaptive Fusion	Guirong Zhuo et.al.	2308.06573v1	null
2023-08-17	EgoPoser: Robust Real-Time Ego-Body Pose Estimation in Large Scenes	Jiaxi Jiang et.al.	2308.06493v2	null
2023-08-11	Aggressive Aerial Grasping using a Soft Drone with Onboard Perception	Samuel Ubellacker et.al.	2308.06351v1	null
2023-08-11	VERF: Runtime Monitoring of Pose Estimation with Neural Radiance Fields	Dominic Maggio et.al.	2308.05939v1	null
2023-08-10	Toward Globally Optimal State Estimation Using Automatically Tightened Semidefinite Relaxations	Frederike Dümbgen et.al.	2308.05783v1	link
2023-08-10	KS-APR: Keyframe Selection for Robust Absolute Pose Regression	Changkun Liu et.al.	2308.05459v1	null
2023-08-10	How-to Augmented Lagrangian on Factor Graphs	Barbara Bazzana et.al.	2308.05444v1	null
2023-08-10	Deep Fusion Transformer Network with Weighted Vector-Wise Keypoints Voting for Robust 6D Object Pose Estimation	Jun Zhou et.al.	2308.05438v1	link
2023-08-10	Robust Localization with Visual-Inertial Odometry Constraints for Markerless Mobile AR	Changkun Liu et.al.	2308.05394v1	null
2023-08-10	Double-chain Constraints for 3D Human Pose Estimation in Images and Videos	Hongbo Kang et.al.	2308.05298v1	link
2023-08-09	ACE-HetEM for ab initio Heterogenous Cryo-EM 3D Reconstruction	Weijie Chen et.al.	2308.04956v1	null
2023-08-07	SEM-GAT: Explainable Semantic Pose Estimation using Learned Graph Attention	Efimia Panagiotaki et.al.	2308.03718v1	link
2023-08-07	A Horse with no Labels: Self-Supervised Horse Pose Estimation from Unlabelled Images and Synthetic Prior	Jose Sosa et.al.	2308.03411v1	null
2023-08-06	Source-free Domain Adaptive Human Pose Estimation	Qucheng Peng et.al.	2308.03202v1	link
2023-08-04	Diffusion-Augmented Depth Prediction with Sparse Annotations	Jiaqi Li et.al.	2308.02283v1	null
2023-08-04	DTF-Net: Category-Level Pose Estimation and Shape Reconstruction via Deformable Template Field	Haowen Wang et.al.	2308.02239v1	null
2023-08-07	Robust Self-Supervised Extrinsic Self-Calibration	Takayuki Kanai et.al.	2308.02153v2	null
2023-08-03	Sim-to-Real Vision-depth Fusion CNNs for Robust Pose Estimation Aboard Autonomous Nano-quadcopter	Luca Crupi et.al.	2308.01833v1	null
2023-08-03	Active Acoustic Sensing for Robot Manipulation	Shihan Lu et.al.	2308.01600v1	null
2023-08-02	HANDAL: A Dataset of Real-World Manipulable Object Categories with Pose Annotations, Affordances, and Reconstructions	Andrew Guo et.al.	2308.01477v1	null
2023-08-06	Human-M3: A Multi-view Multi-modal Dataset for 3D Human Pose Estimation in Outdoor Scenes	Bohao Fan et.al.	2308.00628v2	link
2023-08-01	Markerless human pose estimation for biomedical applications: a survey	Andrea Avogaro et.al.	2308.00519v1	null
2023-08-01	Kidnapping Deep Learning-based Multirotors using Optimized Flying Adversarial Patches	Pia Hanfeld et.al.	2308.00344v1	link
2023-08-01	Fine-Grained Sports, Yoga, and Dance Postures Recognition: A Benchmark Analysis	Asish Bera et.al.	2308.00323v1	null
2023-08-01	Robust Single-view Cone-beam X-ray Pose Estimation with Neural Tuned Tomography (NeTT) and Masked Neural Radiance Fields (mNeRF)	Chaochao Zhou et.al.	2308.00214v1	null
2023-07-31	Lightweight Super-Resolution Head for Human Pose Estimation	Haonan Wang et.al.	2307.16765v1	link
2023-07-31	DiffPose: SpatioTemporal Diffusion Model for Video-Based Human Pose Estimation	Runyang Feng et.al.	2307.16687v1	null
2023-07-30	Touch if it's transparent! ACTOR: Active Tactile-based Category-Level Transparent Object Reconstruction	Prajval Kumar Murali et.al.	2307.16254v1	null
2023-07-30	Successive Pose Estimation and Beam Tracking for mmWave Vehicular Communication Systems	Cen Liu et.al.	2307.16117v1	link
2023-07-29	Iterative Graph Filtering Network for 3D Human Pose Estimation	Zaedul Islam et.al.	2307.16074v1	link
2023-07-29	HandMIM: Pose-Aware Self-Supervised Learning for 3D Hand Mesh Estimation	Zuyan Liu et.al.	2307.16061v1	null
2023-07-29	Effective Whole-body Pose Estimation with Two-stages Distillation	Zhendong Yang et.al.	2307.15880v1	link
2023-07-28	TrackAgent: 6D Object Tracking via Reinforcement Learning	Konstantin Röhrl et.al.	2307.15671v1	null
2023-07-28	Revisiting Fully Convolutional Geometric Features for Object 6D Pose Estimation	Jaime Corsetti et.al.	2307.15514v1	link
2023-07-28	Robust Visual Sim-to-Real Transfer for Robotic Manipulation	Ricardo Garcia et.al.	2307.15320v1	null
2023-07-27	Weakly Supervised Multi-Modal 3D Human Body Pose Estimation for Autonomous Driving	Peter Bauer et.al.	2307.14889v1	null
2023-07-26	Attention of Robot Touch: Tactile Saliency Prediction for Robust Sim-to-Real Tactile Control	Yijiong Lin et.al.	2307.14510v1	null
2023-07-28	CBGL: Fast Monte Carlo Passive Global Localisation of 2D LIDAR Sensor	Alexandros Filotheou et.al.	2307.14247v2	link
2023-07-26	Deep Robust Multi-Robot Re-localisation in Natural Environments	Milad Ramezani et.al.	2307.13950v1	null
2023-07-25	Of Mice and Pose: 2D Mouse Pose Estimation from Unlabelled Data and Synthetic Prior	Jose Sosa et.al.	2307.13361v1	null
2023-07-23	TransNet: Transparent Object Manipulation Through Category-Level Pose Estimation	Huijie Zhang et.al.	2307.12400v1	null
2023-07-25	FDCT: Fast Depth Completion for Transparent Objects	Tianan Li et.al.	2307.12274v2	link
2023-07-22	Challenges for Monocular 6D Object Pose Estimation in Robotics	Stefan Thalhammer et.al.	2307.12172v1	null
2023-07-22	Pyramid Semantic Graph-based Global Point Cloud Registration with Low Overlap	Zhijian Qiao et.al.	2307.12116v1	link
2023-07-22	Robot Structure Prior Guided Temporal Attention for Camera-to-Robot Pose Estimation from Image Sequence	Yang Tian et.al.	2307.12106v1	link
2023-07-26	LAMP: Leveraging Language Prompts for Multi-person Pose Estimation	Shengnan Hu et.al.	2307.11934v2	link
2023-07-21	YOLOPose V2: Understanding and Improving Transformer-based 6D Pose Estimation	Arul Selvam Periyasamy et.al.	2307.11550v1	null
2023-07-21	KVN: Keypoints Voting Network with Differentiable RANSAC for Stereo Pose Estimation	Ivano Donadi et.al.	2307.11543v1	link
2023-07-21	Semantically-enhanced Deep Collision Prediction for Autonomous Navigation using Aerial Robots	Mihir Kulkarni et.al.	2307.11522v1	null
2023-07-20	SimCol3D -- 3D Reconstruction during Colonoscopy Challenge	Anita Rau et.al.	2307.11261v1	link
2023-07-20	MSQNet: Actor-agnostic Action Recognition with Multi-modal Query	Anindya Mondal et.al.	2307.10763v1	link
2023-07-19	POV-Surgery: A Dataset for Egocentric Hand and Tool Pose Estimation During Surgical Activities	Rui Wang et.al.	2307.10387v1	link
2023-07-18	ActionPrompt: Action-Guided 3D Human Pose Estimation With Text and Pose Prompting	Hongwei Zheng et.al.	2307.09026v1	null
2023-07-17	Human Emergency Detection during Autonomous Hospital Transports	Andreas Zachariae et.al.	2307.08359v1	link
2023-07-17	Self-supervised Monocular Depth Estimation: Let's Talk About The Weather	Kieran Saunders et.al.	2307.08357v1	null
2023-07-20	Boosting 3-DoF Ground-to-Satellite Camera Localization Accuracy via Geometry-Guided Cross-View Transformer	Yujiao Shi et.al.	2307.08015v3	link
2023-07-15	Tightly-Coupled LiDAR-Visual SLAM Based on Geometric Features for Mobile Agents	Ke Cao et.al.	2307.07763v1	null
2023-07-13	Haptic-guided assisted telemanipulation approach for grasping desired objects from heaps	Maxime Adjigble et.al.	2307.07053v1	null
2023-07-13	Improving 2D Human Pose Estimation across Unseen Camera Views with Synthetic Data	Miroslav Purkrábek et.al.	2307.06737v1	link
2023-07-12	Deep learning-based estimation of whole-body kinematics from multi-view images	Kien X. Nguyen et.al.	2307.05896v1	link
2023-07-12	GLA-GCN: Global-local Adaptive Graph Convolutional Network for 3D Human	Bruce X. B. Yu et.al.	2307.05853v1	link
2023-07-09	TransPose: A Transformer-based 6D Object Pose Estimation Network with Depth Refinement	Mahmoud Abdulsalam et.al.	2307.05561v1	null
2023-07-11	ResMatch: Residual Attention Learning for Local Feature Matching	Yuxin Deng et.al.	2307.05180v1	link
2023-07-07	Proximity and Visuotactile Point Cloud Fusion for Contact Patches in Extreme Deformation	Jessica Yin et.al.	2307.03839v1	null
2023-07-07	Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation	Zhongyu Jiang et.al.	2307.03833v1	link
2023-07-07	Equivariant Single View Pose Prediction Via Induced and Restricted Representations	Owen Howell et.al.	2307.03704v1	null
2023-07-07	RCDN -- Robust X-Corner Detection Algorithm based on Advanced CNN Model	Ben Chen et.al.	2307.03505v1	null
2023-07-06	Self-supervised Optimization of Hand Pose Estimation using Anatomical Features and Iterative Learning	Christian Jauch et.al.	2307.03007v1	null
2023-07-06	Recognition and Estimation of Human Finger Pointing with an RGB Camera for Robot Directive	Eran Bamani et.al.	2307.02949v1	null
2023-07-06	A Real-time Human Pose Estimation Approach for Optimal Sensor Placement in Sensor-based Human Activity Recognition	Orhan Konak et.al.	2307.02906v1	null
2023-07-04	Secure Deep Learning-based Distributed Intelligence on Pocket-sized Drones	Elia Cereda et.al.	2307.01559v1	null
2023-07-03	Joint Coordinate Regression and Association For Multi-Person Pose Estimation, A Pure Neural Network Approach	Dongyang Yu et.al.	2307.01004v1	null
2023-07-01	Automatic Solver Generator for Systems of Laurent Polynomial Equations	Evgeniy Martyushev et.al.	2307.00320v1	link
2023-07-01	SyMFM6D: Symmetry-aware Multi-directional Fusion for Multi-View 6D Object Pose Estimation	Fabian Duffhauss et.al.	2307.00306v1	link
2023-06-30	GIRA: Gaussian Mixture Models for Inference and Robot Autonomy	Kshitij Goel et.al.	2307.00071v1	link
2023-06-30	Towards the extraction of robust sign embeddings for low resource sign language recognition	Mathieu De Coster et.al.	2306.17558v1	null
2023-06-30	Fusion of Visual-Inertial Odometry with LiDAR Relative Localization for Cooperative Guidance of a Micro-Scale Aerial Vehicle	Václav Pritzl et.al.	2306.17544v1	link
2023-06-30	Locking On: Leveraging Dynamic Vehicle-Imposed Motion Constraints to Improve Visual Localization	Stephen Hausler et.al.	2306.17529v1	null
2023-06-29	ID-Pose: Sparse-view Camera Pose Estimation by Inverting Diffusion Models	Weihao Cheng et.al.	2306.17140v1	null
2023-06-29	Learning Structure-Guided Diffusion Model for 2D Human Pose Estimation	Zhongwei Qiu et.al.	2306.17074v1	null
2023-06-28	Hierarchical Graph Neural Networks for Proprioceptive 6D Pose Estimation of In-hand Objects	Alireza Rezazadeh et.al.	2306.15858v1	null
2023-06-09	Data-Link: High Fidelity Manufacturing Datasets for Model2Real Transfer under Industrial Settings	Sunny Katyara et.al.	2306.05766v1	null
2023-05-28	Counter-Hypothetical Particle Filters for Single Object Pose Tracking	Elizabeth A. Olson et.al.	2305.17828v1	null
2023-05-25	Enhanced 6D Pose Estimation for Robotic Fruit Picking	Marco Costanzo et.al.	2305.15856v1	null
2023-05-22	You Only Look at One: Category-Level Object Representations for Pose Estimation From a Single Example	Walter Goodwin et.al.	2305.12626v1	null
2023-05-18	Manifold-Aware Self-Training for Unsupervised Domain Adaptation on Regressing 6D Object Pose	Yichen Zhang et.al.	2305.10808v1	link
2023-05-08	RelPose++: Recovering 6D Poses from Sparse-view Observations	Amy Lin et.al.	2305.04926v1	link
2023-04-17	Uncovering the Background-Induced bias in RGB based 6-DoF Object Pose Estimation	Elena Govi et.al.	2304.08230v1	link
2023-03-28	CARTO: Category and Joint Agnostic Reconstruction of ARTiculated Objects	Nick Heppert et.al.	2303.15782v1	link
2023-03-23	Prior-free Category-level Pose Estimation with Implicit Space Transformation	Jianhui Liu et.al.	2303.13479v1	link
2023-06-21	6D Object Pose Estimation from Approximate 3D Models for Orbital Robotics	Maximilian Ulmer et.al.	2303.13241v3	null
2023-03-22	Rigidity-Aware Detection for 6D Object Pose Estimation	Yang Hai et.al.	2303.12396v1	link
2023-03-22	Object Pose Estimation with Statistical Guarantees: Conformal Keypoint Detection and Geometric Uncertainty Propagation	Heng Yang et.al.	2303.12246v1	link
2023-03-21	Linear-Covariance Loss for End-to-End Learning of 6D Pose Estimation	Fulin Liu et.al.	2303.11516v1	link
2023-03-18	SOCS: Semantically-aware Object Coordinate Space for Category-Level 6D Object Pose Estimation under Large Shape Variations	Boyan Wan et.al.	2303.10346v1	null
2023-03-12	Module-Wise Network Quantization for 6D Object Pose Estimation	Saqib Javed et.al.	2303.06753v1	link
2023-03-09	SpyroPose: Importance Sampling Pyramids for Object Pose Distribution Estimation in SE(3)	Rasmus Laurvig Haugaard et.al.	2303.05308v1	null
2023-03-03	Depth-based 6DoF Object Pose Estimation using Swin Transformer	Zhujun Li et.al.	2303.02133v1	link
2023-03-02	Canonical mapping as a general-purpose object descriptor for robotic manipulation	Benjamin Joffe et.al.	2303.01331v1	null
2023-02-14	MSDA: Monocular Self-supervised Domain Adaptation for 6D Object Pose Estimation	Dingding Cai et.al.	2302.07300v1	null
2023-02-14	Model-Based Underwater 6D Pose Estimation from RGB	Davide Sapienza et.al.	2302.06821v1	null
2023-02-02	A Projective Geometric View for 6D Pose Estimation in mmWave MIMO Systems	Shengqiang Shen et.al.	2302.00227v2	null
2023-01-31	Collision-aware In-hand 6D Object Pose Estimation using Multiple Vision-based Tactile Sensors	Gabriele M. Caddeo et.al.	2301.13667v1	link
2023-01-19	Learning ultrasound plane pose regression: assessing generalized pose coordinates in the fetal brain	Chiara Di Vece et.al.	2301.08317v1	null
2023-01-19	RGB-D-Based Categorical Object Pose and Shape Estimation: Methods, Datasets, and Evaluation	Leonard Bruns et.al.	2301.08147v1	link
2022-12-21	HouseCat6D -- A Large-Scale Multi-Modal Category Level 6D Object Pose Dataset with Household Objects in Realistic Scenarios	HyunJun Jung et.al.	2212.10428v2	link
2022-12-13	MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare	Yann Labbé et.al.	2212.06870v1	null
2022-12-11	Context-aware 6D Pose Estimation of Known Objects using RGB-D data	Ankit Kumar et.al.	2212.05560v1	null
2023-01-30	Category-Level 6D Object Pose Estimation with Flexible Vector-Based Rotation Representation	Wei Chen et.al.	2212.04632v2	null

Point Cloud Registration

Publish Date	Title	Authors	PDF	Code
2024-12-29	Towards Explaining Uncertainty Estimates in Point Cloud Registration	Ziyuan Qin et.al.	2412.20612v1	null
2024-12-26	Resolving the Ambiguity of Complete-to-Partial Point Cloud Registration for Image-Guided Liver Surgery with Patches-to-Partial Matching	Zixin Yang et.al.	2412.19328v1	null
2024-12-25	Cross-PCR: A Robust Cross-Source Point Cloud Registration Framework	Guiyu Zhao et.al.	2412.18873v1	null
2024-12-23	PointVoxelFormer -- Reviving point cloud networks for 3D medical imaging	Mattias Paul Heinrich et.al.	2412.17390v1	null
2024-12-19	3D Registration in 30 Years: A Survey	Jiaqi Yang et.al.	2412.13735v2	link
2024-12-13	TrafficLoc: Localizing Traffic Surveillance Cameras in 3D Scenes	Yan Xia et.al.	2412.10308v1	null
2024-12-10	A Real-time Degeneracy Sensing and Compensation Method for Enhanced LiDAR SLAM	Zongbo Liao et.al.	2412.07513v1	null
2024-12-07	AutoURDF: Unsupervised Robot Modeling from Point Cloud Frames Using Cluster Registration	Jiong Lin et.al.	2412.05507v1	null
2024-12-06	GS-Matching: Reconsidering Feature Matching task in Point Cloud Registration	Yaojie Zhang et.al.	2412.04855v1	null
2024-12-04	AffordDP: Generalizable Diffusion Policy with Transferable Affordance	Shijie Wu et.al.	2412.03142v1	null
2024-12-04	QuadricsReg: Large-Scale Point Cloud Registration using Quadric Primitives	Ji Wu et.al.	2412.02998v1	null
2024-12-01	FlashSLAM: Accelerated RGB-D SLAM for Real-Time 3D Scene Reconstruction with Gaussian Splatting	Phu Pham et.al.	2412.00682v1	null
2024-11-27	XR-MBT: Multi-modal Full Body Tracking for XR through Self-Supervision with Learned Depth Point Cloud Registration	Denys Rozumnyi et.al.	2411.18377v1	null
2024-11-22	EADReg: Probabilistic Correspondence Generation with Efficient Autoregressive Diffusion Model for Outdoor Point Cloud Registration	Linrui Gong et.al.	2411.15271v1	null
2024-11-20	Automatic marker-free registration based on similar tetrahedras for single-tree point clouds	Jing Ren et.al.	2411.13069v1	null
2024-11-19	3D Reconstruction by Looking: Instantaneous Blind Spot Detector for Indoor SLAM through Mixed Reality	Hanbeom Chang et.al.	2411.12514v1	null
2024-11-16	Deep Loss Convexification for Learning Iterative Models	Ziming Zhang et.al.	2411.10649v1	null
2024-11-12	3D Focusing-and-Matching Network for Multi-Instance Point Cloud Registration	Liyuan Zhang et.al.	2411.07740v1	null
2024-11-04	Mining and Transferring Feature-Geometry Coherence for Unsupervised Point Cloud Registration	Kezheng Xiong et.al.	2411.01870v1	link
2024-10-30	UniRiT: Towards Few-Shot Non-Rigid Point Cloud Registration	Geng Li et.al.	2410.22909v1	null
2024-10-29	Micro-Structures Graph-Based Point Cloud Registration for Balancing Efficiency and Accuracy	Rongling Zhang et.al.	2410.21857v1	null
2024-10-29	Memory-Efficient Point Cloud Registration via Overlapping Region Sampling	Tomoyasu Shimada et.al.	2410.21753v1	null
2024-10-21	RANSAC Back to SOTA: A Two-stage Consensus Filtering for Real-time 3D Registration	Pengcheng Shi et.al.	2410.15682v1	link
2024-10-14	A Consistency-Aware Spot-Guided Transformer for Versatile and Hierarchical Point Cloud Registration	Renlang Huang et.al.	2410.10295v1	link
2024-10-14	Kinematic-ICP: Enhancing LiDAR Odometry with Kinematic Constraints for Wheeled Mobile Robots Moving on Planar Surfaces	Tiziano Guadagnino et.al.	2410.10277v1	null
2024-10-10	LiPO: LiDAR Inertial Odometry for ICP Comparison	Darwin Mick et.al.	2410.08097v1	null
2024-10-08	Equi-GSPR: Equivariant SE(3) Graph Network Model for Sparse Point Cloud Registration	Xueyang Kang et.al.	2410.05729v1	link
2024-10-07	Enhanced Multi-Robot SLAM System with Cross-Validation Matching and Exponential Threshold Keyframe Selection	Ang He et.al.	2410.05017v1	null
2024-10-03	LoGDesc: Local geometric features aggregation for robust point cloud registration	Karim Slimani et.al.	2410.02420v1	link
2024-10-01	GERA: Geometric Embedding for Efficient Point Registration Analysis	Geng Li et.al.	2410.00589v1	null
2024-10-01	TFCT-I2P: Three stream fusion network with color aware transformer for image-to-point cloud registration	Muyao Peng et.al.	2410.00360v1	link
2024-10-06	KISS-Matcher: Fast and Robust Point Cloud Registration Revisited	Hyungtae Lim et.al.	2409.15615v2	link
2024-09-23	MATCH POLICY: A Simple Pipeline from Point Cloud Registration to Manipulation Policies	Haojie Huang et.al.	2409.15517v1	null
2024-09-22	SynBench: A Synthetic Benchmark for Non-rigid 3D Point Cloud Registration	Sara Monji-Azad et.al.	2409.14474v1	null
2024-09-27	FracGM: A Fast Fractional Programming Technique for Geman-McClure Robust Estimator	Bang-Shien Chen et.al.	2409.13978v2	link
2024-09-17	Enhancing the Reliability of LiDAR Point Cloud Sampling: A Colorization and Super-Resolution Approach Based on LiDAR-Generated Images	Sier Ha et.al.	2409.11532v1	null
2024-09-14	Registration between Point Cloud Streams and Sequential Bounding Boxes via Gradient Descent	Xuesong Li et.al.	2409.09312v1	null
2024-09-11	Unsupervised Point Cloud Registration with Self-Distillation	Christian Löwens et.al.	2409.07558v1	link
2024-09-10	Mahalanobis k-NN: A Statistical Lens for Robust Point-Cloud Registrations	Tejas Anvekar et.al.	2409.06267v1	link
2024-09-09	From Words to Poses: Enhancing Novel Object Pose Estimation with Vision Language Models	Tessa Pulli et.al.	2409.05413v1	null
2024-09-08	Sight View Constraint for Robust Point Cloud Registration	Yaojie Zhang et.al.	2409.05065v1	null
2024-08-23	UMERegRobust - Universal Manifold Embedding Compatible Features for Robust Point Cloud Registration	Yuval Haitman et.al.	2408.12380v2	link
2024-08-21	Informed, Constrained, Aligned: A Field Analysis on Degeneracy-aware Point Cloud Registration in the Wild	Turcan Tuna et.al.	2408.11809v1	null
2024-08-20	LoopSplat: Loop Closure by Registering 3D Gaussian Splats	Liyuan Zhu et.al.	2408.10154v2	link
2024-08-05	CMR-Agent: Learning a Cross-Modal Agent for Iterative Image-to-Point Cloud Registration	Gongxin Yao et.al.	2408.02394v1	null
2024-08-05	MaFreeI2P: A Matching-Free Image-to-Point Cloud Registration Paradigm with Active Camera Pose Retrieval	Gongxin Yao et.al.	2408.02392v1	null
2024-07-29	Correspondence-Free SE(3) Point Cloud Registration in RKHS via Unsupervised Equivariant Learning	Ray Zhang et.al.	2407.20223v1	null
2024-07-24	Robust Point Cloud Registration in Robotic Inspection with Locally Consistent Gaussian Mixture Model	Lingjie Su et.al.	2407.17183v1	null
2024-07-23	SE3ET: SE(3)-Equivariant Transformer for Low-Overlap Point Cloud Registration	Chien Erh Lin et.al.	2407.16823v1	link
2024-07-19	PointRegGPT: Boosting 3D Point Cloud Registration using Generative Point-Cloud Pairs for Training	Suyi Chen et.al.	2407.14054v1	link
2024-07-19	GlobalPointer: Large-Scale Plane Adjustment with Bi-Convex Relaxation	Bangyan Liao et.al.	2407.13537v2	link
2024-07-22	Snail-Radar: A large-scale diverse dataset for the evaluation of 4D-radar-based SLAM systems	Jianzhu Huai et.al.	2407.11705v2	null
2024-07-14	PARE-Net: Position-Aware Rotation-Equivariant Networks for Robust Point Cloud Registration	Runzhao Yao et.al.	2407.10142v1	link
2024-07-13	ML-SemReg: Boosting Point Cloud Registration with Multi-level Semantic Consistency	Shaocheng Yan et.al.	2407.09862v1	link
2024-07-11	BiEquiFormer: Bi-Equivariant Representations for Global Point Cloud Registration	Stefanos Pertigkiozoglou et.al.	2407.08729v1	null
2024-07-10	Incremental Multiview Point Cloud Registration with Two-stage Candidate Retrieval	Shiqi Li et.al.	2407.07525v1	null
2024-07-08	SGOR: Outlier Removal by Leveraging Semantic and Geometric Information for Robust Point Cloud Registration	Guiyu Zhao et.al.	2407.06297v1	link
2024-07-08	GeoNLF: Geometry guided Pose-Free Neural LiDAR Fields	Weiyi Xue et.al.	2407.05597v1	null
2024-07-07	GaussReg: Fast 3D Registration with Gaussian Splatting	Jiahao Chang et.al.	2407.05254v1	null
2024-07-06	Incremental Multiview Point Cloud Registration	Xiaoya Cheng et.al.	2407.05021v1	link
2024-06-25	Point Tree Transformer for Point Cloud Registration	Meiling Wang et.al.	2406.17530v1	null
2024-06-17	Correspondence Free Multivector Cloud Registration using Conformal Geometric Algebra	Francisco Xavier Vasconcelos et.al.	2406.11732v1	link
2024-06-05	L-PR: Exploiting LiDAR Fiducial Marker for Unordered Low Overlap Multiview Point Cloud Registration	Yibo Liu et.al.	2406.03298v1	link
2024-05-25	Deep-PE: A Learning-Based Pose Evaluator for Point Cloud Registration	Junjie Gao et.al.	2405.16085v1	null
2024-05-26	NV-LIO: LiDAR-Inertial Odometry using Normal Vectors Towards Robust SLAM in Multifloor Environments	Dongha Chung et.al.	2405.12563v2	link
2024-05-13	RGBD-Glue: General Feature Combination for Robust RGB-D Point Cloud Registration	Congjia Chen et.al.	2405.07594v1	null
2024-05-10	Benchmarking Classical and Learning-Based Multibeam Point Cloud Registration	Li Ling et.al.	2405.06279v1	link
2024-05-09	Rotation Initialization and Stepwise Refinement for Universal LiDAR Calibration	Yifan Duan et.al.	2405.05589v1	null
2024-05-07	Speak the Same Language: Global LiDAR Registration on BIM Using Pose Hough Transform	Zhijian Qiao et.al.	2405.03969v1	null
2024-05-06	Deep Learning-based Point Cloud Registration for Augmented Reality-guided Surgery	Maximilian Weber et.al.	2405.03314v1	null
2024-04-27	FRAME: A Modular Framework for Autonomous Map-merging: Advancements in the Field	Nikolaos Stathoulopoulos et.al.	2404.18006v1	null
2024-04-22	PointDifformer: Robust Point Cloud Registration With Neural Diffusion and Transformer	Rui She et.al.	2404.14034v1	null
2024-04-22	A Comprehensive Survey and Taxonomy on Point Cloud Registration Based on Deep Learning	Yu-Xin Zhang et.al.	2404.13830v1	link
2024-04-09	Efficient and Robust Point Cloud Registration via Heuristics-guided Parameter Search	Tianyu Huang et.al.	2404.06155v1	link
2024-04-08	Rendering-Enhanced Automatic Image-to-Point Cloud Registration for Roadside Scenes	Yu Sheng et.al.	2404.05164v1	null
2024-04-06	Learning Instance-Aware Correspondences for Robust Multi-Instance Point Cloud Registration in Cluttered Scenes	Zhiyuan Yu et.al.	2404.04557v1	link
2024-04-05	A Ground Mobile Robot for Autonomous Terrestrial Laser Scanning-Based Field Phenotyping	Javier Rodriguez-Sanchez et.al.	2404.04404v1	null
2024-04-01	FPGA-Accelerated Correspondence-free Point Cloud Registration with PointNet Features	Keisuke Sugiura et.al.	2404.01237v1	null
2024-03-28	SG-PGM: Partial Graph Matching Network with Semantic Geometric Fusion for 3D Scene Graph Alignment and Its Downstream Tasks	Yaxu Xie et.al.	2403.19474v1	link
2024-03-26	Global Point Cloud Registration Network for Large Transformations	Hanz Cuevas-Velasquez et.al.	2403.18040v1	null
2024-03-28	Exploring Accurate 3D Phenotyping in Greenhouse through Neural Radiance Fields	Junhong Zhao et.al.	2403.15981v2	null
2024-03-15	VRHCF: Cross-Source Point Cloud Registration via Voxel Representation and Hierarchical Correspondence Filtering	Guiyu Zhao et.al.	2403.10085v1	link
2024-03-15	MEDPNet: Achieving High-Precision Adaptive Registration for Complex Die Castings	Yu Du et.al.	2403.09996v1	null
2024-03-15	CLOSURE: Fast Quantification of Pose Uncertainty Sets	Yihuai Gao et.al.	2403.09990v1	null
2024-03-13	FastMAC: Stochastic Spectral Sampling of Correspondence Graph	Yifei Zhang et.al.	2403.08770v1	link
2024-03-13	NeRF-Supervised Feature Point Detection and Description	Ali Youssef et.al.	2403.08156v1	link
2024-03-10	PSS-BA: LiDAR Bundle Adjustment with Progressive Spatial Smoothing	Jianping Li et.al.	2403.06124v1	null
2024-03-27	Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance Extension	Quan Liu et.al.	2403.03532v2	link
2024-03-15	RELEAD: Resilient Localization with Enhanced LiDAR Odometry in Adverse Environments	Zhiqiang Chen et.al.	2402.18934v2	null
2024-02-28	PCR-99: A Practical Method for Point Cloud Registration with 99% Outliers	Seong Hun Lee et.al.	2402.16598v2	link
2024-02-23	CLIPPER+: A Fast Maximal Clique Algorithm for Robust Global Registration	Kaveh Fathian et.al.	2402.15464v1	link
2024-02-11	CLIPPER: Robust Data Association without an Initial Guess	Parker C. Lusk et.al.	2402.07284v1	null
2024-02-08	Tightly Coupled Range Inertial Localization on a 3D Prior Map Based on Sliding Window Factor Graph Optimization	Kenji Koide et.al.	2402.05540v1	null
2024-01-16	Registration of algebraic varieties using Riemannian optimization	Florentin Goyens et.al.	2401.08562v1	link
2024-01-09	Iterative Feedback Network for Unsupervised Point Cloud Registration	Yifan Xie et.al.	2401.04357v1	link
2024-01-06	PosDiffNet: Positional Neural Diffusion for Point Cloud Registration in a Large Field of View with Perturbations	Rui She et.al.	2401.03167v1	null
2024-01-04	OptFlow: Fast Optimization-based Scene Flow Estimation without Supervision	Rahul Ahuja et.al.	2401.02550v1	null
2024-01-17	Diff-PCR: Diffusion-Based Correspondence Searching in Doubly Stochastic Matrix Space for Point Cloud Registration	Qianliang Wu et.al.	2401.00436v4	null
2023-12-22	On Partial Optimal Transport: Revising the Infeasibility of Sinkhorn and Efficient Gradient Methods	Anh Duc Nguyen et.al.	2312.13970v2	link
2023-12-20	D3Former: Jointly Learning Repeatable Dense Detectors and Feature-enhanced Descriptors via Saliency-guided Transformer	Junjie Gao et.al.	2312.12970v1	null
2023-12-14	SPEAL: Skeletal Prior Embedded Attention Learning for Cross-Source Point Cloud Registration	Kezheng Xiong et.al.	2312.08664v1	null
2023-12-11	PCRDiffusion: Diffusion Probabilistic Models for Point Cloud Registration	Yue Wu et.al.	2312.06063v1	null
2023-12-05	DiffusionPCR: Diffusion Models for Robust Multi-Step Point Cloud Registration	Zhi Chen et.al.	2312.03053v1	null
2023-12-08	Zero-Shot Point Cloud Registration	Weijie Wang et.al.	2312.03032v2	null
2023-12-05	A Dynamic Network for Efficient Point Cloud Registration	Yang Ai et.al.	2312.02877v1	null
2023-12-05	6D Assembly Pose Estimation by Point Cloud Registration for Robot Manipulation	K. Samarawickrama et.al.	2312.02593v1	link
2023-12-04	Rotation-Invariant Rapid TRISO-Fueled Pebble Identification Based on Feature Matching and Point Cloud Registration	Ming Fang et.al.	2312.02006v1	null
2023-12-27	E2PNet: Event to Point Cloud Registration with Spatio-Temporal Representation Learning	Xiuhong Lin et.al.	2311.18433v2	link
2023-11-15	Nothing Stands Still: A Spatiotemporal Benchmark on 3D Point Cloud Registration Under Large Geometric and Temporal Change	Tao Sun et.al.	2311.09346v1	null
2023-11-02	Transformation Decoupling Strategy based on Screw Theory for Deterministic Point Cloud Registration with Gravity Prior	Xinyi Li et.al.	2311.01432v1	null
2023-11-02	Cross-Modal Information-Guided Network using Contrastive Learning for Point Cloud Registration	Yifan Xie et.al.	2311.01202v1	link
2023-10-29	HDMNet: A Hierarchical Matching Network with Double Attention for Large-scale Outdoor LiDAR Point Cloud Registration	Weiyi Xue et.al.	2310.18874v1	null
2023-10-27	Do we need scan-matching in radar odometry?	Vladimír Kubelka et.al.	2310.18117v1	link
2023-10-26	SE(3) Diffusion Model-based Point Cloud Registration for Robust 6D Object Pose Estimation	Haobo Jiang et.al.	2310.17359v1	null
2023-10-18	DBDNet: Partial-to-Partial Point Cloud Registration with Dual Branches Decoupling	Shiqi Li et.al.	2310.11733v1	null
2023-10-15	OAAFormer: Robust and Efficient Point Cloud Registration Through Overlapping-Aware Attention in Transformer	Junjie Gao et.al.	2310.09817v1	null
2023-10-09	FeatSense -- A Feature-based Registration Algorithm with GPU-accelerated TSDF-Mapping Backend for NVIDIA Jetson Boards	Julian Gaal et.al.	2310.05766v1	link
2023-10-09	Colmap-PCD: An Open-source Tool for Fine Image-to-point cloud Registration	Chunge Bai et.al.	2310.05504v1	link
2023-10-06	Light-LOAM: A Lightweight LiDAR Odometry and Mapping based on Graph-Matching	Shiquan Yi et.al.	2310.04162v1	link
2023-10-05	FreeReg: Image-to-Point Cloud Registration Leveraging Pretrained Diffusion Models and Monocular Depth Estimators	Haiping Wang et.al.	2310.03420v1	link
2023-10-02	COIN-LIO: Complementary Intensity-Augmented LiDAR Inertial Odometry	Patrick Pfreundschuh et.al.	2310.01235v1	link
2023-09-27	Q-REG: End-to-End Trainable Point Cloud Registration with Surface Curvature	Shengze Jin et.al.	2309.16023v1	null
2023-09-27	Partial Transport for Point-Cloud Registration	Yikun Bai et.al.	2309.15787v1	null
2023-09-27	KDD-LOAM: Jointly Learned Keypoint Detector and Descriptors Assisted LiDAR Odometry and Mapping	Renlang Huang et.al.	2309.15394v1	null
2023-09-26	CoFiI2P: Coarse-to-Fine Correspondences for Image-to-Point Cloud Registration	Shuhao Kang et.al.	2309.14660v1	null
2023-09-20	AutoSynth: Learning to Generate 3D Training Data for Object Point Cloud Registration	Zheng Dang et.al.	2309.11170v1	null
2023-09-19	LiDAR-Generated Images Derived Keypoints Assisted Point Cloud Registration Scheme in Odometry Estimation	Haizhou Zhang et.al.	2309.10436v1	link
2023-09-17	Hamiltonian Dynamics Learning from Point Cloud Observations for Nonholonomic Mobile Robot Control	Abdullah Altawaitan et.al.	2309.09163v1	link
2023-09-16	FF-LOGO: Cross-Modality Point Cloud Registration with Feature Filtering and Local to Global Optimization	Nan Ma et.al.	2309.08966v1	null
2023-09-16	Outram: One-shot Global Localization via Triangulated Scene Graph and Global Outlier Pruning	Pengyu Yin et.al.	2309.08914v1	link
2023-09-15	A Ground Segmentation Method Based on Point Cloud Map for Unstructured Roads	Zixuan Li et.al.	2309.08164v1	null
2023-09-15	Fast and Accurate Deep Loop Closing and Relocalization for Reliable LiDAR SLAM	Chenghao Shi et.al.	2309.08086v1	null
2023-09-14	EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization	Minjung Kim et.al.	2309.07471v1	link
2023-09-12	SGFeat: Salient Geometric Feature for Point Cloud Registration	Qianliang Wu et.al.	2309.06207v1	null
2023-09-01	Point-TTA: Test-Time Adaptation for Point Cloud Registration Using Multitask Meta-Auxiliary Learning	Ahmed Hatem et.al.	2308.16481v2	null
2023-08-21	In-Rack Test Tube Pose Estimation Using RGB-D Data	Hao Chen et.al.	2308.10411v1	null
2023-08-18	DReg-NeRF: Deep Registration for Neural Radiance Fields	Yu Chen et.al.	2308.09386v1	link
2023-08-18	Overlap Bias Matching is Necessary for Point Cloud Registration	Pengcheng Shi et.al.	2308.09364v1	null
2023-08-10	Deep Semantic Graph Matching for Large-scale Outdoor Point Clouds Registration	Shaocong Liu et.al.	2308.05314v1	null
2023-08-09	PointMBF: A Multi-scale Bidirectional Fusion Network for Unsupervised RGB-D Point Cloud Registration	Mingzhi Yuan et.al.	2308.04782v1	link
2023-07-25	GeoTransformer: Fast and Robust Point Cloud Registration with Geometric Transformer	Zheng Qin et.al.	2308.03768v1	link
2023-07-26	One-Nearest Neighborhood Guides Inlier Estimation for Unsupervised Point Cloud Registration	Yongzhe Yuan et.al.	2307.14019v1	null
2023-07-22	Pyramid Semantic Graph-based Global Point Cloud Registration with Low Overlap	Zhijian Qiao et.al.	2307.12116v1	link
2023-09-12	ELiOT: End-to-end Lidar Odometry using Transformer Framework	Daegyu Lee et.al.	2307.11998v4	null
2023-08-08	Density-invariant Features for Distant Point Cloud Registration	Quan Liu et.al.	2307.09788v2	link
2023-07-18	SphereNet: Learning a Noise-Robust and General Descriptor for Point Cloud Registration	Guiyu Zhao et.al.	2307.09351v1	null
2023-07-14	CFI2P: Coarse-to-Fine Cross-Modal Correspondence Learning for Image-to-Point Cloud Registration	Gongxin Yao et.al.	2307.07142v1	null
2023-07-11	Exact Point Cloud Downsampling for Fast and Accurate Global Trajectory Optimization	Kenji Koide et.al.	2307.02948v2	link
2023-07-03	Direct Superpoints Matching for Fast and Robust Point Cloud Registration	Aniket Gupta et.al.	2307.01362v1	link
2023-07-04	A denoised Mean Teacher for domain adaptive point cloud registration	Alexander Bigalke et.al.	2306.14749v2	link
2023-06-20	End-to-end 2D-3D Registration between Image and LiDAR Point Cloud for Vehicle Localization	Guangming Wang et.al.	2306.11346v1	null
2023-06-14	ICET Online Accuracy Characterization for Geometry-Based Laser Scan Matching	Matthew McDermott et.al.	2306.08690v1	link
2023-06-12	Volume-DROID: A Real-Time Implementation of Volumetric Mapping with DROID-SLAM	Peter Stratton et.al.	2306.06850v1	link
2023-06-11	PWR-Align: Leveraging Part-Whole Relationships for Part-wise Rigid Point Cloud Registration in Mixed Reality Applications	Manorama Jha et.al.	2306.06717v1	null
2023-06-07	Robust-DefReg: A Robust Deformable Point Cloud Registration Method based on Graph Convolutional Neural Networks	Sara Monji-Azad et.al.	2306.04701v1	null
2023-05-23	Cross-source Point Cloud Registration: Challenges, Progress and Prospects	Xiaoshui Huang et.al.	2305.13570v1	null
2023-05-19	Efficient and Deterministic Search Strategy Based on Residual Projections for Point Cloud Registration	Xinyi Li et.al.	2305.11716v1	null
2023-05-18	3D Registration with Maximal Cliques	Xiyu Zhang et.al.	2305.10854v1	link
2023-05-05	HD2Reg: Hierarchical Descriptors and Detectors for Point Cloud Registration	Canhui Tang et.al.	2305.03487v1	link
2023-05-08	APR: Online Distant Point Cloud Registration Through Aggregated Point Cloud Reconstruction	Quan Liu et.al.	2305.02893v2	link
2023-04-27	RegHEC: Hand-Eye Calibration via Simultaneous Multi-view Point Clouds Registration of Arbitrary Object	Shiyu Xing et.al.	2304.14092v1	link
2023-04-26	Non-rigid Point Cloud Registration for Middle Ear Diagnostics with Endoscopic Optical Coherence Tomography	Peng Liu et.al.	2304.13618v1	link
2023-04-25	BO-ICP: Initialization of Iterative Closest Point Based on Bayesian Optimization	Harel Biggie et.al.	2304.13114v1	link
2023-04-18	SDFReg: Learning Signed Distance Functions for Point Cloud Registration	Leida Zhang et.al.	2304.08929v1	null
2023-04-12	SiLK -- Simple Learned Keypoints	Pierre Gleize et.al.	2304.06194v1	link
2023-04-11	TT-SDF2PC: Registration of Point Cloud and Compressed SDF Directly in the Memory-Efficient Tensor Train Domain	Alexey I. Boyko et.al.	2304.05342v1	null
2023-04-10	HybridFusion: LiDAR and Vision Cross-Source Point Cloud Fusion	Yu Wang et.al.	2304.04508v1	null
2023-04-09	Self-Supervised Learning of Object Segmentation from Unlabeled RGB-D Videos	Shiyang Lu et.al.	2304.04325v1	null
2023-04-09	DSMNet: Deep High-precision 3D Surface Modeling from Sparse Point Cloud Frames	Changjie Qiu et.al.	2304.04200v1	null
2023-04-02	Robust Multiview Point Cloud Registration with Reliable Pose Graph Initialization and History Reweighting	Haiping Wang et.al.	2304.00467v1	link
2023-03-31	kNN-Res: Residual Neural Network with kNN-Graph coherence for point cloud registration	Muhammad S. Battikh et.al.	2304.00050v1	link
2023-03-31	RDMNet: Reliable Dense Matching Based Point Cloud Registration for Autonomous Driving	Chenghao Shi et.al.	2303.18084v1	null
2023-04-23	HybridPoint: Point Cloud Registration Based on Hybrid Point Sampling and Matching	Yiheng Li et.al.	2303.16526v2	link
2023-03-27	Learnable Graph Matching: A Practical Paradigm for Data Association	Jiawei He et.al.	2303.15414v1	link
2023-03-23	Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration	Guofeng Mei et.al.	2303.13290v1	link
2023-03-22	RegFormer: An Efficient Projection-Aware Transformer Network for Large-Scale Point Cloud Registration	Jiuming Liu et.al.	2303.12384v1	link
2023-03-17	Deep Graph-based Spatial Consistency for Robust Non-rigid Point Cloud Registration	Zheng Qin et.al.	2303.09950v1	link
2023-03-14	RoCNet: 3D Robust Registration of Point-Clouds using Deep Learning	Karim Slimani et.al.	2303.07963v1	null
2023-03-07	GMCR: Graph-based Maximum Consensus Estimation for Point Cloud Registration	Michael Gentner et.al.	2303.04032v1	null
2023-03-02	Neural Intrinsic Embedding for Non-rigid Point Cloud Matching	Puhua Jiang et.al.	2303.01038v1	null
2023-03-14	A Unified BEV Model for Joint Learning of 3D Local Features and Overlap Estimation	Lin Li et.al.	2302.14511v2	link
2023-02-28	PCR-CG: Point Cloud Registration via Deep Color and Geometry	Yu Zhang et.al.	2302.14418v1	link
2023-02-28	Efficient Implicit Neural Reconstruction Using LiDAR	Dongyu Yan et.al.	2302.14363v1	link
2023-02-25	Accurate Gaussian Process Distance Fields with applications to Echolocation and Mapping	Cedric Le Gentil et.al.	2302.13005v1	null
2023-02-14	Point Cloud Registration for LiDAR and Photogrammetric Data: a Critical Synthesis and Performance Analysis on Classic and Deep Learning Algorithms	Ningli Xu et.al.	2302.07184v1	link

Point Cloud Segmentation

Publish Date	Title	Authors	PDF	Code
2024-12-02	The Bare Necessities: Designing Simple, Effective Open-Vocabulary Scene Graphs	Christina Kassab et.al.	2412.01539v1	null
2024-11-30	Density-aware Global-Local Attention Network for Point Cloud Segmentation	Chade Li et.al.	2412.00489v1	null
2024-11-28	Textured As-Is BIM via GIS-informed Point Cloud Segmentation	Mohamed S. H. Alabassy et.al.	2411.18898v1	null
2024-11-27	Towards Cross-device and Training-free Robotic Grasping in 3D Open World	Weiguang Zhao et.al.	2411.18133v1	null
2024-11-20	BelHouse3D: A Benchmark Dataset for Assessing Occlusion Robustness in 3D Point Cloud Semantic Segmentation	Umamaheswaran Raman Kumar et.al.	2411.13251v1	null
2024-11-13	Biomass phenotyping of oilseed rape through UAV multi-view oblique imaging with 3DGS and SAM model	Yutao Shen et.al.	2411.08453v1	null
2024-11-13	Multiscale Graph Construction Using Non-local Cluster Features	Reina Kaneko et.al.	2411.08371v1	null
2024-10-30	Automated Image-Based Identification and Consistent Classification of Fire Patterns with Quantitative Shape Analysis and Spatial Location Identification	Pengkun Liu et.al.	2410.23105v1	null
2024-11-03	Multimodality Helps Few-Shot 3D Point Cloud Semantic Segmentation	Zhaochong An et.al.	2410.22489v2	null
2024-10-28	Exploring contextual modeling with linear complexity for point cloud segmentation	Yong Xien Chng et.al.	2410.21211v1	null
2024-10-14	Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies	Yanjie Ze et.al.	2410.10803v1	link
2024-10-09	Evaluating the Impact of Point Cloud Colorization on Semantic Segmentation Accuracy	Qinfeng Zhu et.al.	2410.06725v1	null
2024-09-24	Underground Mapping and Localization Based on Ground-Penetrating Radar	Jinchang Zhang et.al.	2409.16446v1	null
2024-09-22	Lidar Panoptic Segmentation in an Open World	Anirudh S Chakravarthy et.al.	2409.14273v1	link
2024-09-03	When 3D Partial Points Meets SAM: Tooth Point Cloud Segmentation with Sparse Labels	Yifan Liu et.al.	2409.01691v1	null
2024-09-03	Efficiently Expanding Receptive Fields: Local Split Attention and Parallel Aggregation for Enhanced Large-scale Point Cloud Semantic Segmentation	Haodong Wang et.al.	2409.01662v1	null
2024-08-29	Towards Modality-agnostic Label-efficient Segmentation with Entropy-Regularized Distribution Alignment	Liyao Tang et.al.	2408.16520v1	link
2024-08-21	GSTran: Joint Geometric and Semantic Coherence for Point Cloud Segmentation	Abiao Li et.al.	2408.11558v1	link
2024-08-02	Trainable Pointwise Decoder Module for Point Cloud Segmentation	Bike Chen et.al.	2408.01548v1	null
2024-07-31	Fine-grained Metrics for Point Cloud Semantic Segmentation	Zhuheng Lu et.al.	2407.21289v1	null
2024-07-19	Scale Disparity of Instances in Interactive Point Cloud Segmentation	Chenrui Han et.al.	2407.14009v1	null
2024-07-18	SegPoint: Segment Any Point Cloud via Large Language Model	Shuting He et.al.	2407.13761v1	null
2024-07-17	Dual-level Adaptive Self-Labeling for Novel Class Discovery in Point Cloud Segmentation	Ruijie Xu et.al.	2407.12489v1	link
2024-07-17	HGL: Hierarchical Geometry Learning for Test-time Adaptation in 3D Point Cloud Segmentation	Tianpei Zou et.al.	2407.12387v1	link
2024-07-17	Serialized Point Mamba: A Serialized Point Cloud Mamba Segmentation Model	Tao Wang et.al.	2407.12319v1	null
2024-07-12	Uplifting Range-View-based 3D Semantic Segmentation in Real-Time with Multi-Sensor Fusion	Shiqi Tan et.al.	2407.09697v1	null
2024-07-01	fVDB: A Deep-Learning Framework for Sparse, Large-Scale, and High-Performance Spatial Intelligence	Francis Williams et.al.	2407.01781v1	null
2024-06-25	Mamba24/8D: Enhancing Global Interaction in Point Clouds via State Space Model	Zhuoyuan Li et.al.	2406.17442v1	null
2024-08-04	Twin Deformable Point Convolutions for Point Cloud Semantic Segmentation in Remote Sensing Scenes	Yong-Qiang Mao et.al.	2405.19735v2	null
2024-05-24	3D Unsupervised Learning by Distilling 2D Open-Vocabulary Segmentation Models for Autonomous Driving	Boyi Sun et.al.	2405.15286v1	link
2024-05-25	Filling Missing Values Matters for Range Image-Based Point Cloud Segmentation	Bike Chen et.al.	2405.10175v2	null
2024-04-16	ECLAIR: A High-Fidelity Aerial LiDAR Dataset for Semantic Segmentation	Iaroslav Melekhov et.al.	2404.10699v1	link
2024-04-04	OpenNeRF: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views	Francis Engelmann et.al.	2404.03650v1	null
2024-03-28	RiEMann: Near Real-Time SE(3)-Equivariant Robot Manipulation without Point Cloud Segmentation	Chongkai Gao et.al.	2403.19460v1	null
2024-05-30	CurbNet: Curb Detection Framework Based on LiDAR Point Cloud Segmentation	Guoyang Zhao et.al.	2403.16794v2	link
2024-03-18	EffiPerception: an Efficient Framework for Various Perception Tasks	Xinhao Xiang et.al.	2403.12317v1	null
2024-03-11	3DRef: 3D Dataset and Benchmark for Reflection Detection in RGB and Lidar Data	Xiting Zhao et.al.	2403.06538v1	null
2024-03-11	Refining Segmentation On-the-Fly: An Interactive Framework for Point Cloud Semantic Segmentation	Peng Zhang et.al.	2403.06401v1	null
2024-03-03	Region-Transformer: Self-Attention Region Based Class-Agnostic Point Cloud Segmentation	Dipesh Gyawali et.al.	2403.01407v1	null
2024-01-29	Dynamic Prototype Adaptation with Distillation for Few-shot Point Cloud Segmentation	Jie Liu et.al.	2401.16051v1	link
2024-01-19	Symbol as Points: Panoptic Symbol Spotting via Point-based Representation	Wenlong Liu et.al.	2401.10556v1	link
2023-12-29	Multi-modality Affinity Inference for Weakly Supervised 3D Semantic Segmentation	Xiawei Li et.al.	2312.16578v2	link
2023-12-19	Point Cloud Segmentation Using Transfer Learning with RandLA-Net: A Case Study on Urban Areas	Alperen Enes Bayar et.al.	2312.11880v1	null
2023-12-15	T-MAE: Temporal Masked Autoencoders for Point Cloud Representation Learning	Weijie Wei et.al.	2312.10217v1	link
2023-12-14	FAPP: Fast and Adaptive Perception and Planning for UAVs in Dynamic Cluttered Environments	Minghao Lu et.al.	2312.08743v1	null
2023-12-12	Transferring CLIP's Knowledge into Zero-Shot Point Cloud Semantic Segmentation	Yuanbin Wang et.al.	2312.07221v1	null
2023-12-11	Densify Your Labels: Unsupervised Clustering with Bipartite Matching for Weakly Supervised Point Cloud Segmentation	Shaobo Xia et.al.	2312.06799v1	null
2024-01-15	Provable Adversarial Robustness for Group Equivariant Tasks: Graphs, Point Clouds, Molecules, and More	Jan Schuchardt et.al.	2312.02708v2	null
2023-11-24	OneFormer3D: One Transformer for Unified Point Cloud Segmentation	Maxim Kolodiazhnyi et.al.	2311.14405v1	null
2023-11-18	DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields	Yu Chi et.al.	2311.12063v1	link
2023-11-10	U3DS³: Unsupervised 3D Semantic Scene Segmentation	Jiaxu Liu et.al.	2311.06018v1	null
2023-11-06	Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation	Shichao Dong et.al.	2311.01989v2	null
2023-10-19	2D-3D Interlaced Transformer for Point Cloud Segmentation with Scene-Level Supervision	Cheng-Kun Yang et.al.	2310.12817v1	null
2023-10-11	PointHR: Exploring High-Resolution Architectures for 3D Point Cloud Segmentation	Haibo Qiu et.al.	2310.07743v1	link
2023-09-26	Addressing Data Misalignment in Image-LiDAR Fusion on Point Cloud Segmentation	Wei Jong Yang et.al.	2309.14932v1	null
2023-09-20	Towards Robust Few-shot Point Cloud Semantic Segmentation	Yating Xu et.al.	2309.11228v1	link
2023-09-20	Generalized Few-Shot Point Cloud Segmentation Via Geometric Words	Yating Xu et.al.	2309.11222v1	link
2023-08-29	Compositional Semantic Mix for Domain Adaptation in Point Cloud Segmentation	Cristiano Saltori et.al.	2308.14619v2	link
2023-08-22	Hierarchical Point-based Active Learning for Semi-supervised Point Cloud Semantic Segmentation	Zongyi Xu et.al.	2308.11166v1	link
2023-08-14	Autonomous Point Cloud Segmentation for Power Lines Inspection in Smart Grid	Alexander Kyuroson et.al.	2308.07283v1	null
2023-08-08	Boosting Few-shot 3D Point Cloud Segmentation via Query-Guided Enhancement	Zhenhua Ning et.al.	2308.03177v2	link
2023-07-31	pCTFusion: Point Convolution-Transformer Fusion with Semantic Aware Loss for Outdoor LiDAR Point Cloud Segmentation	Abhishek Kuriyal et.al.	2307.14777v2	link
2023-07-27	Clustering based Point Cloud Representation Learning for 3D Analysis	Tuo Feng et.al.	2307.14605v1	link
2023-07-20	See More and Know More: Zero-shot Point Cloud Segmentation via Multi-modal Visual Data	Yuhang Lu et.al.	2307.10782v1	null
2023-07-14	Achelous: A Fast Unified Water-surface Panoptic Perception Framework based on Fusion of Monocular Camera and 4D mmWave Radar	Runwei Guan et.al.	2307.07102v1	link
2023-07-08	BPNet: Bézier Primitive Segmentation on 3D Point Clouds	Rao Fu et.al.	2307.04013v1	link
2023-06-28	Point2Point: A Framework for Efficient Deep Learning on Hilbert sorted Point Clouds with applications in Spatio-Temporal Occupancy Prediction	Athrva Atul Pandhare et.al.	2306.16306v1	null
2023-05-30	Dynamic Clustering Transformer Network for Point Cloud Segmentation	Dening Lu et.al.	2306.08073v1	null
2023-05-23	Prototype Adaption and Projection for Few- and Zero-shot 3D Point Cloud Semantic Segmentation	Shuting He et.al.	2305.14335v1	link
2023-05-22	Contrastive Predictive Autoencoders for Dynamic Point Cloud Self-Supervised Learning	Xiaoxiao Sheng et.al.	2305.12959v1	null
2023-05-17	Tinto: Multisensor Benchmark for 3D Hyperspectral Point Cloud Segmentation in the Geosciences	Ahmed J. Afifi et.al.	2305.09928v1	null
2023-05-08	OctFormer: Octree-based Transformers for 3D Point Clouds	Peng-Shuai Wang et.al.	2305.03045v2	link
2023-05-22	Urban GeoBIM construction by integrating semantic LiDAR point clouds with as-designed BIM models	Jie Shao et.al.	2304.11719v2	null
2023-04-22	Knowledge Distillation from 3D to Bird's-Eye-View for LiDAR Semantic Segmentation	Feng Jiang et.al.	2304.11393v1	link
2023-06-02	Transformer-Based Visual Segmentation: A Survey	Xiangtai Li et.al.	2304.09854v2	link
2023-04-11	Feature-assisted interactive geometry reconstruction in 3D point clouds using incremental region growing	Attila Szabo et.al.	2304.05109v1	null

Zero-shot

Publish Date	Title	Authors	PDF	Code
2025-01-03	GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models	Zhangyang Qi et.al.	2501.01428v2	null
2025-01-02	VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control	Yuanpeng Tu et.al.	2501.01427v1	null
2025-01-02	Unifying Specialized Visual Encoders for Video Language Models	Jihoon Chung et.al.	2501.01426v1	null
2025-01-03	AdaptVC: High Quality Voice Conversion with Adaptive Learning	Jaehun Kim et.al.	2501.01347v2	null
2025-01-02	Digital Guardians: Can GPT-4, Perspective API, and Moderation API reliably detect hate speech in reader comments of German online newspapers?	Manuel Weber et.al.	2501.01256v1	null
2025-01-02	Automated Self-Refinement and Self-Correction for LLM-based Product Attribute Value Extraction	Alexander Brinkmann et.al.	2501.01237v1	null
2025-01-02	Symmetries-enhanced Multi-Agent Reinforcement Learning	Nikolaos Bousias et.al.	2501.01136v1	null
2025-01-03	MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization	Haina Zhu et.al.	2501.01108v2	null
2025-01-02	Are LLMs effective psychological assessors? Leveraging adaptive RAG for interpretable mental health screening through psychometric practice	Federico Ravenda et.al.	2501.00982v1	null
2025-01-01	Text2Earth: Unlocking Text-driven Remote Sensing Image Generation with a Global-Scale Dataset and a Foundation Model	Chenyang Liu et.al.	2501.00895v1	null
2024-12-30	QuantumLLMInstruct: A 500k LLM Instruction-Tuning Dataset with Problem-Solution Pairs for Quantum Computing	Shlomo Kashani et.al.	2412.20956v1	null
2024-12-30	Navigating Chemical-Linguistic Sharing Space with Heterogeneous Molecular Encoding	Liuzhenghao Lv et.al.	2412.20888v1	link
2024-12-30	TimeRAF: Retrieval-Augmented Foundation model for Zero-shot Time Series Forecasting	Huanyu Zhang et.al.	2412.20810v1	null
2024-12-30	Learning to Rank Pre-trained Vision-Language Models for Downstream Tasks	Yuhe Ding et.al.	2412.20682v1	null
2024-12-29	Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond)	Tomer Garber et.al.	2412.20596v1	null
2024-12-27	Zero-shot Hazard Identification in Autonomous Driving: A Case Study on the COOOL Benchmark	Lukas Picek et.al.	2412.19944v1	null
2024-12-27	EEG-Reptile: An Automatized Reptile-Based Meta-Learning Library for BCIs	Daniil A. Berdyshev et.al.	2412.19725v1	link
2024-12-30	VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models	Tao Wu et.al.	2412.19645v2	null
2024-12-27	MINIMA: Modality Invariant Image Matching	Xingyu Jiang et.al.	2412.19412v1	link
2024-12-26	Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment	Ziang Yan et.al.	2412.19326v1	link
2024-12-26	RecLM: Recommendation Instruction Tuning	Yangqin Jiang et.al.	2412.19302v1	null
2024-12-26	Time Series Foundational Models: Their Role in Anomaly Detection and Prediction	Chathurangi Shyalika et.al.	2412.19286v1	link
2024-12-26	Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval	Yang Du et.al.	2412.19178v1	link
2024-12-26	CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting	Siyu Jiao et.al.	2412.19142v1	null
2024-12-26	Semantic Residual for Multimodal Unified Discrete Representation	Hai Huang et.al.	2412.19128v1	null
2024-12-26	Advanced Knowledge Transfer: Refined Feature Distillation for Zero-Shot Quantization in Edge Computing	Inpyo Hong et.al.	2412.19125v1	link
2024-12-24	Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models	Zehan Wang et.al.	2412.18605v1	null
2024-12-24	ZeroHSI: Zero-Shot 4D Human-Scene Interaction by Video Generation	Hongjie Li et.al.	2412.18600v1	null
2024-12-24	Distilling Fine-grained Sentiment Understanding from Large Language Models	Yice Zhang et.al.	2412.18552v1	link
2024-12-24	The Key of Understanding Vision Tasks: Explanatory Instructions	Yang Shen et.al.	2412.18525v1	link
2024-12-24	Multilingual Mathematical Reasoning: Advancing Open-Source LLMs in Hindi and English	Avinash Anand et.al.	2412.18415v1	link
2024-12-24	Extract Free Dense Misalignment from CLIP	JeongYeon Nam et.al.	2412.18404v1	null
2024-12-24	A Zero-Shot Physics-Informed Dictionary Learning Approach for Sound Field Reconstruction	Stefano Damiano et.al.	2412.18348v1	link
2024-12-24	Efficient and Context-Aware Label Propagation for Zero-/Few-Shot Training-Free Adaptation of Vision-Language Model	Yushu Li et.al.	2412.18303v1	null
2024-12-24	Quo Vadis, Anomaly Detection? LLMs and VLMs in the Spotlight	Xi Ding et.al.	2412.18298v1	link
2024-12-24	Improved Feature Generating Framework for Transductive Zero-shot Learning	Zihan Ye et.al.	2412.18282v1	null
2024-12-23	CiteBART: Learning to Generate Citations for Local Citation Recommendation	Ege Yiğit Çelik et.al.	2412.17534v1	link
2024-12-23	Multiple Consistency-guided Test-Time Adaptation for Contrastive Audio-Language Models with Unlabeled Audio	Gongyu Chen et.al.	2412.17306v1	null
2024-12-23	Discriminative Image Generation with Diffusion Models for Zero-Shot Learning	Dingjie Fu et.al.	2412.17219v1	null
2024-12-22	Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis	Ye-Xin Lu et.al.	2412.16977v1	null
2024-12-22	Self-Corrected Flow Distillation for Consistent One-Step and Few-Step Text-to-Image Generation	Quan Dao et.al.	2412.16906v1	null
2024-12-22	Autoregressive Speech Synthesis with Next-Distribution Prediction	Xinfa Zhu et.al.	2412.16846v1	null
2024-12-21	RoomPainter: View-Integrated Diffusion for Consistent Indoor Scene Texturing	Zhipeng Huang et.al.	2412.16778v1	null
2024-12-21	HyperCLIP: Adapting Vision-Language models with Hypernetworks	Victor Akinwande et.al.	2412.16777v1	null
2024-12-21	Large Language Model Can Be a Foundation for Hidden Rationale-Based Retrieval	Luo Ji et.al.	2412.16615v1	null
2024-12-21	Open-Vocabulary Mobile Manipulation Based on Double Relaxed Contrastive Learning with Dense Labeling	Daichi Yashima et.al.	2412.16576v1	link
2024-12-20	Deciphering the Underserved: Benchmarking LLM OCR for Low-Resource Scripts	Muhammad Abdullah Sohail et.al.	2412.16119v1	link
2024-12-20	CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up	Songhua Liu et.al.	2412.16112v1	link
2024-12-20	Interleaved Speech-Text Language Models are Simple Streaming Text to Speech Synthesizers	Yifan Yang et.al.	2412.16102v1	null
2024-12-20	Fearful Falcons and Angry Llamas: Emotion Category Annotations of Arguments by Humans and LLMs	Lynn Greschner et.al.	2412.15993v1	null
2024-12-20	Watertox: The Art of Simplicity in Universal Attacks A Cross-Model Framework for Robust Adversarial Generation	Zhenghao Gao et.al.	2412.15924v1	null
2024-12-20	On the Suitability of pre-trained foundational LLMs for Analysis in German Legal Education	Lorenz Wendlinger et.al.	2412.15902v1	null
2024-12-20	AutoLife: Automatic Life Journaling with Smartphones and LLMs	Huatao Xu et.al.	2412.15714v1	null
2024-12-20	Cracking the Code: Evaluating Zero-Shot Prompting Methods for Providing Programming Feedback	Niklas Ippisch et.al.	2412.15702v1	null
2024-12-20	SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training	Wenxi Chen et.al.	2412.15649v1	null
2024-12-20	A New Method to Capturing Compositional Knowledge in Linguistic Space	Jiahe Wan et.al.	2412.15632v1	null
2024-12-19	Face the Facts! Evaluating RAG-based Fact-checking Pipelines in Realistic Settings	Daniel Russo et.al.	2412.15189v1	link
2024-12-19	STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning	Marius Memmel et.al.	2412.15182v1	null
2024-12-19	Adaptive Pruning for Large Language Models with Structural Importance Awareness	Haotian Zheng et.al.	2412.15127v1	null
2024-12-19	Scale This, Not That: Investigating Key Dataset Attributes for Efficient Speech Enhancement Scaling	Leying Zhang et.al.	2412.14890v1	null
2024-12-19	Zero-Shot Artifact2Artifact: Self-incentive artifact removal for photoacoustic imaging without any data	Shuang Li et.al.	2412.14873v1	link
2024-12-19	Extending TWIG: Zero-Shot Predictive Hyperparameter Selection for KGEs based on Graph Structure	Jeffrey Sardina et.al.	2412.14801v1	null
2024-12-19	Beyond Guilt: Legal Judgment Prediction with Trichotomous Reasoning	Kepu Zhang et.al.	2412.14588v1	null
2024-12-19	MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval	Junjie Zhou et.al.	2412.14475v1	null
2024-12-19	WildSAT: Learning Satellite Image Representations from Wildlife Observations	Rangel Daroya et.al.	2412.14428v1	null
2024-12-18	I0T: Embedding Standardization Method Towards Zero Modality Gap	Na Min An et.al.	2412.14384v1	link
2024-12-18	Autoregressive Video Generation without Vector Quantization	Haoge Deng et.al.	2412.14169v1	link
2024-12-18	Incorporating Feature Pyramid Tokenization and Open Vocabulary Semantic Segmentation	Jianyu Zhang et.al.	2412.14145v1	null
2024-12-18	Foundation Models Meet Low-Cost Sensors: Test-Time Adaptation for Rescaling Disparity for Zero-Shot Metric Depth Estimation	Rémi Marsal et.al.	2412.14103v1	null
2024-12-18	FarExStance: Explainable Stance Detection for Farsi	Majid Zarharan et.al.	2412.14008v1	link
2024-12-18	Real Classification by Description: Extending CLIP's Limits of Part Attributes Recognition	Ethan Baron et.al.	2412.13947v1	null
2024-12-18	Memorizing SAM: 3D Medical Segment Anything Model with Memorizing Transformer	Xinyuan Shao et.al.	2412.13908v1	link
2024-12-18	Zero-Shot Prompting and Few-Shot Fine-Tuning: Revisiting Document Image Classification Using Large Language Models	Anna Scius-Bertrand et.al.	2412.13859v1	null
2024-12-18	SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor	Chenyu Yang et.al.	2412.13786v1	null
2024-12-18	G-VEval: A Versatile Metric for Evaluating Image and Video Captions Using GPT-4o	Tony Cheng Tong et.al.	2412.13647v1	link
2024-12-18	Reverse Region-to-Entity Annotation for Pixel-Level Visual Entity Linking	Zhengfei Xu et.al.	2412.13614v1	null
2024-12-17	GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding	Haoyi Jiang et.al.	2412.13193v1	link
2024-12-17	A Knowledge-enhanced Pathology Vision-language Foundation Model for Cancer Diagnosis	Xiao Zhou et.al.	2412.13126v1	null
2024-12-17	Enabling Low-Resource Language Retrieval: Establishing Baselines for Urdu MS MARCO	Umer Butt et.al.	2412.12997v1	null
2024-12-17	An Agentic Approach to Automatic Creation of P&ID Diagrams from Natural Language Descriptions	Shreeyash Gowaikar et.al.	2412.12898v1	null
2024-12-17	Question: How do Large Language Models perform on the Question Answering tasks? Answer:	Kevin Fischer et.al.	2412.12893v1	null
2024-12-17	MIVE: New Design and Benchmark for Multi-Instance Video Editing	Samuel Teodoro et.al.	2412.12877v1	null
2024-12-17	Comparative Analysis of Zero-Shot Capability of Time-Series Foundation Models in Short-Term Load Prediction	Nan Lin et.al.	2412.12834v1	null
2024-12-17	FocusChat: Text-guided Long Video Understanding via Spatiotemporal Information Filtering	Zheng Cheng et.al.	2412.12833v1	null
2024-12-17	Cross-Dialect Information Retrieval: Information Access in Low-Resource and High-Variance Languages	Robert Litschko et.al.	2412.12806v1	null
2024-12-17	ZoRI: Towards Discriminative Zero-Shot Remote Sensing Instance Segmentation	Shiqi Huang et.al.	2412.12798v1	link
2024-12-16	Causal Diffusion Transformers for Generative Modeling	Chaorui Deng et.al.	2412.12095v1	link
2024-12-16	CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology	Yuxuan Sun et.al.	2412.12077v1	null
2024-12-16	A LoRA is Worth a Thousand Pictures	Chenxi Liu et.al.	2412.12048v1	null
2024-12-16	Learning to Navigate in Mazes with Novel Layouts using Abstract Top-down Maps	Linfeng Zhao et.al.	2412.12024v1	null
2024-12-16	Cost-Effective Label-free Node Classification with LLMs	Taiyan Zhang et.al.	2412.11983v1	null
2024-12-16	Advancing Comprehensive Aesthetic Insight with Multi-Scale Text-Guided Self-Supervised Learning	Yuti Liu et.al.	2412.11952v1	null
2024-12-16	Stepwise Reasoning Error Disruption Attack of LLMs	Jingyu Peng et.al.	2412.11934v1	null
2024-12-16	PICLe: Pseudo-Annotations for In-Context Learning in Low-Resource Named Entity Detection	Sepideh Mamooler et.al.	2412.11923v1	null
2024-12-16	Improved Models for Media Bias Detection and Subcategorization	Tim Menzner et.al.	2412.11835v1	null
2024-12-16	A Distributed Collaborative Retrieval Framework Excelling in All Queries and Corpora based on Zero-shot Rank-Oriented Automatic Evaluation	Tian-Yi Che et.al.	2412.11832v1	null
2024-12-13	UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities	Muhammad Uzair Khattak et.al.	2412.10372v1	link
2024-12-13	Reasoner Outperforms: Generative Stance Detection with Rationalization for Social Media	Jiaqing Yuan et.al.	2412.10266v1	null
2024-12-13	Efficient Generative Modeling with Residual Vector Quantization-Based Tokens	Jaehyeon Kim et.al.	2412.10208v1	null
2024-12-13	Constraint-Aware Zero-Shot Vision-Language Navigation in Continuous Environments	Kehan Chen et.al.	2412.10137v1	null
2024-12-13	Familiarity: Better Evaluation of Zero-Shot Named Entity Recognition by Quantifying Label Shifts in Synthetic Training Data	Jonas Golde et.al.	2412.10121v1	null
2024-12-13	Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP	Yating Yu et.al.	2412.09895v1	link
2024-12-13	CP-DETR: Concept Prompt Guide DETR Toward Stronger Universal Object Detection	Qibo Chen et.al.	2412.09799v1	null
2024-12-12	Toward Foundation Model for Multivariate Wearable Sensing of Physiological Signals	Yunfei Luo et.al.	2412.09758v1	link
2024-12-12	Should We Learn Contact-Rich Manipulation Policies from Sampling-Based Planners?	Huaijiang Zhu et.al.	2412.09743v1	null
2024-12-12	TransferLight: Zero-Shot Traffic Signal Control on any Road-Network	Johann Schmidt et.al.	2412.09719v1	null
2024-12-12	EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM	Zhuofan Zong et.al.	2412.09618v1	null
2024-12-12	Learning to Adapt: Bio-Inspired Gait Strategies for Versatile Quadruped Locomotion	Joseph Humphreys et.al.	2412.09440v1	null
2024-12-12	Distribution free uncertainty quantification in neuroscience-inspired deep operators	Shailesh Garg et.al.	2412.09369v1	null
2024-12-12	Towards Open-Vocabulary Video Semantic Segmentation	Xinhao Li et.al.	2412.09329v1	link
2024-12-12	T-SVG: Text-Driven Stereoscopic Video Generation	Qiao Jin et.al.	2412.09323v1	null
2024-12-12	Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine	Xiaoshuang Huang et.al.	2412.09278v1	link
2024-12-12	Pinpoint Counterfactuals: Reducing social bias in foundation models via localized counterfactual generation	Kirill Sirotkin et.al.	2412.09160v1	null
2024-12-12	Evaluating Pixel Language Models on Non-Standardized Languages	Alberto Muñoz-Ortiz et.al.	2412.09084v1	null
2024-12-12	Cross-View Completion Models are Zero-shot Correspondence Estimators	Honggyu An et.al.	2412.09072v1	null
2024-12-13	An Efficient Framework for Enhancing Discriminative Models via Diffusion Techniques	Chunxiao Li et.al.	2412.09063v2	null
2024-12-11	RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation	Mingfei Han et.al.	2412.08591v1	null
2024-12-11	SenCLIP: Enhancing zero-shot land-use mapping for Sentinel-2 with ground-level prompting	Pallavi Jain et.al.	2412.08536v1	link
2024-12-11	SAM-Mamba: Mamba Guided SAM Architecture for Generalized Zero-Shot Polyp Segmentation	Tapas Kumar Dutta et.al.	2412.08482v1	null
2024-12-11	Assessing Personalized AI Mentoring with Large Language Models in the Computing Field	Xiao Luo et.al.	2412.08430v1	null
2024-12-11	Zero-Shot Mono-to-Binaural Speech Synthesis	Alon Levkovitch et.al.	2412.08356v1	null
2024-12-11	BEIR-NL: Zero-shot Information Retrieval Benchmark for the Dutch Language	Nikolay Banar et.al.	2412.08329v1	null
2024-12-11	Lightweight Method for Interactive 3D Medical Image Segmentation with Multi-Round Result Fusion	Bingzhi Shen et.al.	2412.08315v1	null
2024-12-11	2M-BELEBELE: Highly Multilingual Speech and American Sign Language Comprehension Dataset	Marta R. Costa-jussà et.al.	2412.08274v1	null
2024-12-11	Large Language Models for Scholarly Ontology Generation: An Extensive Analysis in the Engineering Field	Tanay Aggarwal et.al.	2412.08258v1	link
2024-12-11	Can Graph Neural Networks Learn Language with Extremely Weak Text Supervision?	Zihao Li et.al.	2412.08174v1	null
2024-12-10	Video Motion Transfer with Diffusion Transformers	Alexander Pondaven et.al.	2412.07776v1	link
2024-12-10	From Slow Bidirectional to Fast Causal Video Generators	Tianwei Yin et.al.	2412.07772v1	null
2024-12-11	Test-time Correction with Human Feedback: An Online 3D Detection System via Visual Prompting	Zetong Yang et.al.	2412.07768v2	null
2024-12-10	SAT: Spatial Aptitude Training for Multimodal Language Models	Arijit Ray et.al.	2412.07755v1	null
2024-12-10	Zero-Shot ATC Coding with Large Language Models for Clinical Assessments	Zijian Chen et.al.	2412.07743v1	null
2024-12-10	DriveMM: All-in-One Large Multimodal Model for Autonomous Driving	Zhijian Huang et.al.	2412.07689v1	link
2024-12-10	Privacy-Preserving Customer Support: A Framework for Secure and Scalable Interactions	Anant Prakash Awasthi et.al.	2412.07687v1	null
2024-12-10	FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing	Yingying Deng et.al.	2412.07517v1	link
2024-12-10	ConfigX: Modular Configuration for Evolutionary Algorithms via Multitask Reinforcement Learning	Hongshu Guo et.al.	2412.07507v1	null
2024-12-10	Bilingual BSARD: Extending Statutory Article Retrieval to Dutch	Ehsan Lotfi et.al.	2412.07462v1	null
2024-12-09	Visual Lexicon: Rich Image Features in Language Space	XuDong Wang et.al.	2412.06774v1	null
2024-12-09	JAPAGEN: Efficient Few/Zero-shot Learning via Japanese Training Dataset Generation with LLM	Takuro Fujii et.al.	2412.06738v1	link
2024-12-09	You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale	Baorui Ma et.al.	2412.06699v1	link
2024-12-09	Knowledge Transfer and Domain Adaptation for Fine-Grained Remote Sensing Image Segmentation	Shun Zhang et.al.	2412.06664v1	null
2024-12-09	LLM-BIP: Structured Pruning for Large Language Models with Block-Wise Forward Importance Propagation	Haihang Wu et.al.	2412.06419v1	null
2024-12-09	Continual Learning for Segment Anything Model Adaptation	Jinglong Yang et.al.	2412.06418v1	link
2024-12-09	ZeroKey: Point-Level Reasoning and Zero-Shot 3D Keypoint Detection from Large Language Models	Bingchen Gong et.al.	2412.06292v1	null
2024-12-09	No Annotations for Object Detection in Art through Stable Diffusion	Patrick Ramos et.al.	2412.06286v1	link
2024-12-09	DenseVLM: A Retrieval and Decoupled Alignment Framework for Open-Vocabulary Dense Prediction	Yunheng Li et.al.	2412.06244v1	null
2024-12-09	Evaluating and Mitigating Social Bias for Large Language Models in Open-ended Settings	Zhao Liu et.al.	2412.06134v1	null
2024-12-06	DenseMatcher: Learning 3D Semantic Correspondence for Category-Level Manipulation from a Single Demo	Junzhe Zhu et.al.	2412.05268v1	null
2024-12-06	Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization	Luca Masserano et.al.	2412.05244v1	null
2024-12-06	Towards Understanding the Role of Sharpness-Aware Minimization Algorithms for Out-of-Distribution Generalization	Samuel Schapiro et.al.	2412.05169v1	null
2024-12-06	A Practical Examination of AI-Generated Text Detectors for Large Language Models	Brian Tufts et.al.	2412.05139v1	null
2024-12-06	Can Large Language Models Serve as Effective Classifiers for Hierarchical Multi-Label Classification of Scientific Documents at Industrial Scale?	Seyed Amin Tabatabaei et.al.	2412.05137v1	null
2024-12-06	The Silent Prompt: Initial Noise as Implicit Guidance for Goal-Driven Image Generation	Ruoyu Wang et.al.	2412.05101v1	null
2024-12-06	HOLa: HoloLens Object Labeling	Michael Schwimmbeck et.al.	2412.04945v1	link
2024-12-06	$S^3$: Synonymous Semantic Space for Improving Zero-Shot Generalization of Vision-Language Models	Xiaojie Yin et.al.	2412.04925v1	null
2024-12-06	StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching	Jixun Yao et.al.	2412.04724v1	null
2024-12-06	LLM-Align: Utilizing Large Language Models for Entity Alignment in Knowledge Graphs	Xuan Chen et.al.	2412.04690v1	null
2024-12-05	Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail	Luca Bartolomei et.al.	2412.04472v1	link
2024-12-05	Grounding Descriptions in Images informs Zero-Shot Visual Recognition	Shaunak Halbe et.al.	2412.04429v1	link
2024-12-05	SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding	Rong Li et.al.	2412.04383v1	null
2024-12-05	Finer Behavioral Foundation Models via Auto-Regressive Features and Advantage Weighting	Edoardo Cetin et.al.	2412.04368v1	null
2024-12-05	Towards Zero-shot 3D Anomaly Localization	Yizhou Wang et.al.	2412.04304v1	null
2024-12-05	3D Part Segmentation via Geometric Aggregation of 2D Visual Features	Marco Garosi et.al.	2412.04247v1	null
2024-12-05	Quantifying the Limits of Segment Anything Model: Analyzing Challenges in Segmenting Tree-Like and Low-Contrast Structures	Yixin Zhang et.al.	2412.04243v1	link
2024-12-05	Hipandas: Hyperspectral Image Joint Denoising and Super-Resolution by Image Fusion with the Panchromatic Image	Shuang Xu et.al.	2412.04201v1	null
2024-12-05	Unified Framework for Open-World Compositional Zero-shot Learning	Hirunima Jayasekara et.al.	2412.04083v1	link
2024-12-05	Towards Generalizable Autonomous Penetration Testing via Domain Randomization and Meta-Reinforcement Learning	Shicheng Zhou et.al.	2412.04078v1	link
2024-12-04	The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control	Ruili Feng et.al.	2412.03568v1	null
2024-12-04	FLAIR: VLM with Fine-grained Language-informed Image Representations	Rui Xiao et.al.	2412.03561v1	link
2024-12-04	Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression	Junjie Wen et.al.	2412.03293v1	null
2024-12-04	Expanding Event Modality Applications through a Robust CLIP-Based Encoder	Sungheon Jeong et.al.	2412.03093v1	null
2024-12-04	ASR-EC Benchmark: Evaluating Large Language Models on Chinese ASR Error Correction	Victor Junqiu Wei et.al.	2412.03075v1	null
2024-12-04	UTSD: Unified Time Series Diffusion Model	Xiangkai Ma et.al.	2412.03068v1	null
2024-12-03	A Novel Compact LLM Framework for Local, High-Privacy EHR Data Applications	Yixiang Qu et.al.	2412.02868v1	null
2024-12-03	Is Large-Scale Pretraining the Secret to Good Domain Generalization?	Piotr Teterwak et.al.	2412.02856v1	null
2024-12-03	Enhancing Robustness of CLIP to Common Corruptions through Bimodal Test-Time Adaptation	Sarthak Kumar Maharana et.al.	2412.02837v1	null
2024-12-03	Gaussian Splatting Under Attack: Investigating Adversarial Noise in 3D Objects	Abdurrahman Zeybey et.al.	2412.02803v1	null
2024-12-03	FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation	Kefan Chen et.al.	2412.02690v1	null
2024-12-03	Multimodal Remote Sensing Scene Classification Using VLMs and Dual-Cross Attention Networks	Jinjin Cai et.al.	2412.02531v1	null
2024-12-03	LoRA Diffusion: Zero-Shot LoRA Synthesis for Diffusion Model Personalization	Ethan Smith et.al.	2412.02352v1	null
2024-12-03	Improving Language Transfer Capability of Decoder-only Architecture in Multilingual Neural Machine Translation	Zhi Qu et.al.	2412.02101v1	link
2024-12-03	Gaussian Object Carver: Object-Compositional Gaussian Splatting with surfaces completion	Liu Liu et.al.	2412.02075v1	link
2024-12-02	PKRD-CoT: A Unified Chain-of-thought Prompting for Multi-Modal Large Language Models in Autonomous Driving	Xuewen Luo et.al.	2412.02025v1	null
2024-12-04	The use of large language models to enhance cancer clinical trial educational materials	Mingye Gao et.al.	2412.01955v2	null
2024-12-02	RandAR: Decoder-only Autoregressive Visual Generation in Random Orders	Ziqi Pang et.al.	2412.01827v1	null
2024-12-02	COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training	Sanghwan Kim et.al.	2412.01814v1	link
2024-12-02	Hard Constraint Guided Flow Matching for Gradient-Free Generation of PDE Solutions	Chaoran Cheng et.al.	2412.01786v1	null
2024-12-02	T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs	Shukang Yin et.al.	2411.19951v2	link
2024-11-29	Reverse Thinking Makes LLMs Stronger Reasoners	Justin Chih-Yao Chen et.al.	2411.19865v1	null
2024-11-29	Zero-shot Musical Stem Retrieval with Joint-Embedding Predictive Architectures	Alain Riou et.al.	2411.19806v1	null
2024-11-29	Dual Risk Minimization: Towards Next-Level Robustness in Fine-tuning Zero-Shot Models	Kaican Li et.al.	2411.19757v1	link
2024-11-29	Multimodal Whole Slide Foundation Model for Pathology	Tong Ding et.al.	2411.19666v1	link
2024-11-29	LLM Teacher-Student Framework for Text Classification With No Manually Annotated Data: A Case Study in IPTC News Topic Classification	Taja Kuzman et.al.	2411.19638v1	link
2024-11-29	Diorama: Unleashing Zero-shot Single-view 3D Scene Modeling	Qirui Wu et.al.	2411.19492v1	null
2024-11-29	Proto Successor Measure: Representing the Space of All Possible Solutions of Reinforcement Learning	Siddhant Agarwal et.al.	2411.19418v1	null
2024-11-28	CLIP meets DINO for Tuning Zero-Shot Classifier using Unlabeled Image Collections	Mohamed Fazli Imam et.al.	2411.19346v1	link
2024-11-28	OMNI-DC: Highly Robust Depth Completion with Multiresolution Depth Integration	Yiming Zuo et.al.	2411.19278v1	link
2024-11-27	Diffusion Self-Distillation for Zero-Shot Customized Image Generation	Shengqu Cai et.al.	2411.18616v1	null
2024-11-27	Isolating authorship from content with semantic embeddings and contrastive learning	Javier Huertas-Tato et.al.	2411.18472v1	null
2024-11-27	SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation	Duc-Hai Pham et.al.	2411.18229v1	null
2024-11-27	DRS: Deep Question Reformulation With Structured Output	Zhecheng Li et.al.	2411.17993v1	link
2024-11-26	Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient	Zigeng Chen et.al.	2411.17787v1	link
2024-11-26	MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation	Harsh Singh et.al.	2411.17636v1	null
2024-11-26	ShowUI: One Vision-Language-Action Model for GUI Visual Agent	Kevin Qinghong Lin et.al.	2411.17465v1	link
2024-11-26	FLEX-CLIP: Feature-Level GEneration Network Enhanced CLIP for X-shot Cross-modal Retrieval	Jingyou Xie et.al.	2411.17454v1	null
2024-11-26	PEFTGuard: Detecting Backdoor Attacks Against Parameter-Efficient Fine-Tuning	Zhen Sun et.al.	2411.17453v1	null
2024-11-26	CoA: Chain-of-Action for Generative Semantic Labels	Meng Wei et.al.	2411.17406v1	link
2024-11-26	vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation	Bastian Wittmann et.al.	2411.17386v1	null
2024-11-26	2D Matryoshka Training for Information Retrieval	Shuai Wang et.al.	2411.17299v1	link
2024-11-26	APT: Architectural Planning and Text-to-Blueprint Construction Using Large Language Models for Open-World Agents	Jun Yu Chen et.al.	2411.17255v1	link
2024-11-26	Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors	Zhengfei Kuang et.al.	2411.17249v1	null
2024-11-26	Boost 3D Reconstruction using Diffusion-based Monocular Camera Calibration	Junyuan Deng et.al.	2411.17240v1	link
2024-11-25	Diffusion Features for Zero-Shot 6DoF Object Pose Estimation	Bernd Von Gimborn et.al.	2411.16668v1	null
2024-11-25	Generating Out-Of-Distribution Scenarios Using Language Models	Erfan Aasi et.al.	2411.16554v1	null
2024-11-25	TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation	Linqing Zhong et.al.	2411.16425v1	null
2024-11-25	Poster: Could Large Language Models Perform Network Management?	Zine el abidine Kherroubi et.al.	2411.16232v1	null
2024-11-25	SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context	Jungang Li et.al.	2411.16213v1	null
2024-11-25	Learn from Foundation Model: Fruit Detection Model without Manual Annotation	Yanan Wang et.al.	2411.16196v1	link
2024-11-25	Language Driven Occupancy Prediction	Zhu Yu et.al.	2411.16072v1	link
2024-11-25	Style-Pro: Style-Guided Prompt Learning for Generalizable Vision-Language Models	Niloufar Alipour Talemi et.al.	2411.16018v1	null
2024-11-24	PIANIST: Learning Partially Observable World Models with LLMs for Multi-Agent Decision Making	Jonathan Light et.al.	2411.15998v1	null
2024-11-24	Segment to Recognize Robustly -- Enhancing Recognition by Image Decomposition	Klara Janouskova et.al.	2411.15933v1	null
2024-11-22	Context-Aware Multimodal Pretraining	Karsten Roth et.al.	2411.15099v1	null
2024-11-22	Task-Aware Robotic Grasping by evaluating Quality Diversity Solutions through Foundation Models	Aurel X. Appius et.al.	2411.14917v1	null
2024-11-22	Enhancing Exploration with Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile Manipulation	Huy Le et.al.	2411.14913v1	null
2024-11-22	Leveraging Hierarchical Prototypes as the Verbalizer for Implicit Discourse Relation Recognition	Wanqiu Long et.al.	2411.14880v1	null
2024-11-22	VisGraphVar: A Benchmark Generator for Assessing Variability in Graph Analysis Using Large Vision-Language Models	Camilo Chacón Sartori et.al.	2411.14832v1	null
2024-11-22	De-biased Multimodal Electrocardiogram Analysis	Haitao Li et.al.	2411.14795v1	null
2024-11-22	Simplifying CLIP: Unleashing the Power of Large-Scale Models on Consumer-level Computers	Hongbo Liu et.al.	2411.14789v1	null
2024-11-21	Solving Zero-Shot 3D Visual Grounding as Constraint Satisfaction Problems	Qihao Yuan et.al.	2411.14594v1	link
2024-11-21	Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding	Yiming Zhang et.al.	2411.14401v1	null
2024-11-21	DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding	Tianhe Ren et.al.	2411.14347v1	link
2024-11-21	StereoCrafter-Zero: Zero-Shot Stereo Video Generation with Noisy Restart	Jian Shi et.al.	2411.14295v1	null
2024-11-21	Efficient Aspect-Based Summarization of Climate Change Reports with Small Language Models	Iacopo Ghinassi et.al.	2411.14272v1	link
2024-11-21	Generalizing End-To-End Autonomous Driving In Real-World Environments Using Zero-Shot LLMs	Zeyu Dong et.al.	2411.14256v1	null
2024-11-21	Evaluating the Robustness of Analogical Reasoning in Large Language Models	Martha Lewis et.al.	2411.14215v1	link
2024-11-21	Stereo Anything: Unifying Stereo Matching with Large-Scale Mixed Data	Xianda Guo et.al.	2411.14053v1	link
2024-11-21	Zero-Shot Low-Light Image Enhancement via Joint Frequency Domain Priors Guided Diffusion	Jinhong He et.al.	2411.13961v1	link
2024-11-21	Learning to Cooperate with Humans using Generative Agents	Yancheng Liang et.al.	2411.13934v1	link
2024-11-21	CLIPer: Hierarchically Improving Spatial Representation of CLIP for Open-Vocabulary Semantic Segmentation	Lin Sun et.al.	2411.13836v1	link
2024-11-20	Find Any Part in 3D	Ziqi Ma et.al.	2411.13550v1	null
2024-11-20	BIPro: Zero-shot Chinese Poem Generation via Block Inverse Prompting Constrained Generation Framework	Xu Zou et.al.	2411.13237v1	null
2024-11-20	Unlocking Historical Clinical Trial Data with ALIGN: A Compositional Large Language Model System for Medical Coding	Nabeel Seedat et.al.	2411.13163v1	null
2024-11-20	Hard-Synth: Synthesizing Diverse Hard Samples for ASR using Zero-Shot TTS and LLM	Jiawei Yu et.al.	2411.13159v1	null
2024-11-20	Learning Time-Optimal and Speed-Adjustable Tactile In-Hand Manipulation	Johannes Pitz et.al.	2411.13148v1	null
2024-11-20	TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models	Xin Wang et.al.	2411.13136v1	null
2024-11-20	Training Physics-Driven Deep Learning Reconstruction without Raw Data Access for Equitable Fast MRI	Yaşar Utku Alçalar et.al.	2411.13022v1	null
2024-11-20	Evaluating LLMs Capabilities Towards Understanding Social Dynamics	Anique Tahir et.al.	2411.13008v1	null
2024-11-19	Improving Controllability and Editability for Pretrained Text-to-Music Generation Models	Yixiao Zhang et.al.	2411.12641v1	null
2024-11-19	Instant Policy: In-Context Imitation Learning via Graph Diffusion	Vitalis Vosylius et.al.	2411.12633v1	null
2024-11-19	SAM Carries the Burden: A Semi-Supervised Approach Refining Pseudo Labels for Medical Segmentation	Ron Keuth et.al.	2411.12602v1	link
2024-11-19	Probe-Me-Not: Protecting Pre-trained Encoders from Malicious Probing	Ruyi Ding et.al.	2411.12508v1	null
2024-11-19	Predicting User Intents and Musical Attributes from Music Discovery Conversations	Daeyong Kwon et.al.	2411.12254v1	link
2024-11-19	Zero-Shot Crate Digging: DJ Tool Retrieval Using Speech Activity, Music Structure And CLAP Embeddings	Iroro Orife et.al.	2411.12209v1	link
2024-11-19	A More Advanced Group Polarization Measurement Approach Based on LLM-Based Agents and Graphs	Zixin Liu et.al.	2411.12196v1	null
2024-11-19	UrbanDiT: A Foundation Model for Open-World Urban Spatio-Temporal Learning	Yuan Yuan et.al.	2411.12164v1	link
2024-11-19	HEIGHT: Heterogeneous Interaction Graph Transformer for Robot Navigation in Crowded and Constrained Environments	Shuijing Liu et.al.	2411.12150v1	null
2024-11-18	VLN-Game: Vision-Language Equilibrium Search for Zero-Shot Semantic Navigation	Bangguo Yu et.al.	2411.11609v1	null
2024-11-18	Unveiling the Inflexibility of Adaptive Embedding in Traffic Forecasting	Hongjun Wang et.al.	2411.11448v1	link
2024-11-18	Scalable Autoregressive Monocular Depth Estimation	Jinhong Wang et.al.	2411.11361v1	null
2024-11-18	Text-guided Zero-Shot Object Localization	Jingjing Wang et.al.	2411.11357v1	null
2024-11-18	Visual-Semantic Graph Matching Net for Zero-Shot Learning	Bowen Duan et.al.	2411.11351v1	link
2024-11-18	Zero-Shot Load Forecasting with Large Language Models	Wenlong Liao et.al.	2411.11350v1	null
2024-11-18	Transcending Language Boundaries: Harnessing LLMs for Low-Resource Language Translation	Peng Shu et.al.	2411.11295v1	null
2024-11-18	Neuron: Learning Context-Aware Evolving Representations for Zero-Shot Skeleton Action Recognition	Yang Chen et.al.	2411.11288v1	null
2024-11-18	Zero-Shot Automatic Annotation and Instance Segmentation using LLM-Generated Datasets: Eliminating Field Imaging and Manual Annotation for Deep Learning Model Development	Ranjan Sapkota et.al.	2411.11285v1	null
2024-11-18	ZeFaV: Boosting Large Language Models for Zero-shot Fact Verification	Son T. Luu et.al.	2411.11247v1	link
2024-11-15	Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting	Ziqi Xie et.al.	2411.10309v1	link
2024-11-15	CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation	Dengke Zhang et.al.	2411.10086v1	null
2024-11-15	'What did the Robot do in my Absence?' Video Foundation Models to Enhance Intermittent Supervision	Kavindie Katuwandeniya et.al.	2411.10016v1	null
2024-11-15	Zero-shot Voice Conversion with Diffusion Transformers	Songting Liu et.al.	2411.09943v1	link
2024-11-14	LLM Hallucination Reasoning with Zero-shot Knowledge Test	Seongmin Lee et.al.	2411.09689v1	null
2024-11-14	Script-centric behavior understanding for assisted autism spectrum disorder diagnosis	Wenxing Liu et.al.	2411.09413v1	null
2024-11-14	Less is More: Unseen Domain Fake News Detection via Causal Propagation Substructures	Shuzhi Gong et.al.	2411.09389v1	null
2024-11-14	Exploring Zero-Shot Anomaly Detection with CLIP in Medical Imaging: Are We There Yet?	Aldo Marzullo et.al.	2411.09310v1	null
2024-11-14	Mono2Stereo: Monocular Knowledge Transfer for Enhanced Stereo Matching	Yuran Wang et.al.	2411.09151v1	null
2024-11-15	UniHOI: Learning Fast, Dense and Generalizable 4D Reconstruction for Egocentric Hand Object Interaction Videos	Chengbo Yuan et.al.	2411.09145v2	null
2024-11-13	Zero-shot Cross-lingual Transfer Learning with Multiple Source and Target Languages for Information Extraction: Language Selection and Adversarial Training	Nghia Trung Ngo et.al.	2411.08785v1	null
2024-11-13	Measuring similarity between embedding spaces using induced neighborhood graphs	Tiago F. Tavares et.al.	2411.08687v1	null
2024-11-13	Zero-shot capability of SAM-family models for bone segmentation in CT scans	Caroline Magg et.al.	2411.08629v1	null
2024-11-13	Grammarization-Based Grasping with Deep Multi-Autoencoder Latent Space Exploration by Reinforcement Learning Agent	Leonidas Askianakis et.al.	2411.08566v1	null
2024-11-13	CorrSynth -- A Correlated Sampling Method for Diverse Dataset Generation from LLMs	Suhas S Kowshik et.al.	2411.08553v1	null
2024-11-13	An Information Theoretic Approach to Operationalize Right to Data Protection	Abhinav Java et.al.	2411.08506v1	null
2024-11-13	Enhancing Multimodal Query Representation via Visual Dialogues for End-to-End Knowledge Retrieval	Yeong-Joon Ju et.al.	2411.08334v1	link
2024-11-12	Retrieval Augmented Time Series Forecasting	Kutay Tire et.al.	2411.08249v1	link
2024-11-12	Latent Space Disentanglement in Diffusion Transformers Enables Precise Zero-shot Semantic Editing	Zitao Shuai et.al.	2411.08196v1	null
2024-11-12	LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models	Anoop Cherian et.al.	2411.08027v1	null
2024-11-12	Semantic Sleuth: Identifying Ponzi Contracts via Large Language Models	Cong Wu et.al.	2411.07498v1	null
2024-11-11	Untangling Hate Speech Definitions: A Semantic Componential Analysis Across Cultures and Domains	Katerina Korre et.al.	2411.07417v1	null
2024-11-11	Warmstarting for Scaling Language Models	Neeratyoy Mallik et.al.	2411.07340v1	null
2024-11-11	DeepONet as a Multi-Operator Extrapolation Model: Distributed Pretraining with Physics-Informed Fine-Tuning	Zecheng Zhang et.al.	2411.07239v1	null
2024-11-11	The Super Weight in Large Language Models	Mengxia Yu et.al.	2411.07191v1	link
2024-11-11	NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics	David Robinson et.al.	2411.07186v1	null
2024-11-11	SAMPart3D: Segment Any Part in 3D Objects	Yunhan Yang et.al.	2411.07184v1	link
2024-11-11	Decoding Visual Experience and Mapping Semantics through Whole-Brain Analysis Using fMRI Foundation Models	Yanchen Wang et.al.	2411.07121v1	link
2024-11-11	Transformer verbatim in-context retrieval across time and scale	Kristijan Armeni et.al.	2411.07075v1	link
2024-11-11	MapSAM: Adapting Segment Anything Model for Automated Feature Detection in Historical Maps	Xue Xia et.al.	2411.06971v1	null
2024-11-11	Robust Fine-tuning of Zero-shot Models via Variance Reduction	Beier Zhu et.al.	2411.06966v1	link
2024-11-11	UMFC: Unsupervised Multi-Domain Feature Calibration for Vision-Language Models	Jiachen Liang et.al.	2411.06921v1	null
2024-11-11	Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning	Hongsheng Zhang et.al.	2411.06764v1	null
2024-11-08	End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering	Dylan Goetting et.al.	2411.05755v1	link
2024-11-08	Asterisk: Keep it Simple*	Andrew Semenov et.al.	2411.05691v1	null
2024-11-08	Assessing Open-Source Large Language Models on Argumentation Mining Subtasks	Mohammad Yeghaneh Abkenar et.al.	2411.05639v1	null
2024-11-08	An Early FIRST Reproduction and Improvements to Single-Token Decoding for Fast Listwise Reranking	Zijian Chen et.al.	2411.05508v1	null
2024-11-08	WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models	Shengda Fan et.al.	2411.05451v1	link
2024-11-08	Enhancing Visual Classification using Comparative Descriptors	Hankyeol Lee et.al.	2411.05357v1	link
2024-11-08	ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving	Tao Ma et.al.	2411.05311v1	null
2024-11-07	Abstract2Appendix: Academic Reviews Enhance LLM Long-Context Capabilities	Shengzhi Li et.al.	2411.05232v1	link
2024-11-07	Audiobox TTA-RAG: Improving Zero-Shot and Few-Shot Text-To-Audio with Retrieval-Augmented Generation	Mu Yang et.al.	2411.05141v1	null
2024-11-07	SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation	Koichi Namekata et.al.	2411.04989v1	null
2024-11-07	DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning	Gaoyue Zhou et.al.	2411.04983v1	null
2024-11-07	Noisy Zero-Shot Coordination: Breaking The Common Knowledge Assumption In Zero-Shot Coordination Games	Usman Anwar et.al.	2411.04976v1	link
2024-11-07	In the Era of Prompt Learning with Vision-Language Models	Ankit Jha et.al.	2411.04892v1	null
2024-11-07	Zero-Shot Temporal Resolution Domain Adaptation for Spiking Neural Networks	Sanja Karilanova et.al.	2411.04760v1	null
2024-11-07	Vision Language Models are In-Context Value Learners	Yecheng Jason Ma et.al.	2411.04549v1	null
2024-11-07	Best Practices for Distilling Large Language Models into BERT for Web Search Ranking	Dezhi Ye et.al.	2411.04539v1	null
2024-11-07	Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models	Xinyu Zhang et.al.	2411.04530v1	null
2024-11-07	Enabling Adaptive Agent Training in Open-Ended Simulators by Targeting Diversity	Robby Costales et.al.	2411.04466v1	null
2024-11-07	AutoProteinEngine: A Large Language Model Driven Agent Framework for Multimodal AutoML in Protein Engineering	Yungeng Liu et.al.	2411.04440v1	link
2024-11-06	RaVL: Discovering and Mitigating Spurious Correlations in Fine-Tuned Vision-Language Models	Maya Varma et.al.	2411.04097v1	link
2024-11-06	Multi3Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision-Language Models	Minh Duc Bui et.al.	2411.03888v1	link
2024-11-06	SA3DIP: Segment Any 3D Instance with Potential 3D Priors	Xi Yang et.al.	2411.03819v1	link
2024-11-06	No Culture Left Behind: ArtELingo-28, a Benchmark of WikiArt with Captions in 28 Languages	Youssef Mohamed et.al.	2411.03769v1	link
2024-11-06	Zero-shot Dynamic MRI Reconstruction with Global-to-local Diffusion Model	Yu Guan et.al.	2411.03723v1	null
2024-11-06	Fine-Tuning Vision-Language Model for Automated Engineering Drawing Information Extraction	Muhammad Tayyab Khan et.al.	2411.03707v1	null
2024-11-06	3DGS-CD: 3D Gaussian Splatting-based Change Detection for Physical Object Rearrangement	Ziqi Lu et.al.	2411.03706v1	link
2024-11-06	Towards Scalable Automated Grading: Leveraging Large Language Models for Conceptual Question Evaluation in Engineering	Rujun Gao et.al.	2411.03659v1	null
2024-11-05	Exploring the Benefits of Domain-Pretraining of Generative Large Language Models for Chemistry	Anurag Acharya et.al.	2411.03542v1	null
2024-11-05	A Mamba Foundation Model for Time Series Forecasting	Haoyu Ma et.al.	2411.02941v1	null
2024-11-05	DDFAV: Remote Sensing Large Vision Language Models Dataset and Evaluation Benchmark	Haodong Li et.al.	2411.02733v1	link
2024-11-04	EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector	Deok-Hyeon Cho et.al.	2411.02625v1	link
2024-11-04	MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs	Sheng-Chieh Lin et.al.	2411.02571v1	null
2024-11-04	TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives	Maitreya Patel et.al.	2411.02545v1	null
2024-11-04	A Comparative Analysis of Instruction Fine-Tuning LLMs for Financial Text Classification	Sorouralsadat Fatemi et.al.	2411.02476v1	null
2024-11-04	Do Advanced Language Models Eliminate the Need for Prompt Engineering in Software Engineering?	Guoqing Wang et.al.	2411.02093v1	null
2024-11-04	CTEFM-VC: Zero-Shot Voice Conversion Based on Content-Aware Timbre Ensemble Modeling and Flow Matching	Yu Pan et.al.	2411.02026v1	null
2024-11-04	Exploiting Contextual Uncertainty of Visual Data for Efficient Training of Deep Models	Sharat Agarwal et.al.	2411.01925v1	null
2024-11-04	ManiBox: Enhancing Spatial Grasping Generalization via Scalable Simulation Data Generation	Hengkai Tan et.al.	2411.01850v1	null
2024-11-04	DiffuMask-Editor: A Novel Paradigm of Integration Between the Segmentation Diffusion Model and Image Editing to Improve Segmentation Ability	Bo Gao et.al.	2411.01819v1	null
2024-11-03	Investigating Large Language Models for Complex Word Identification in Multilingual and Multidomain Setups	Răzvan-Alexandru Smădu et.al.	2411.01706v1	link
2024-11-03	Object segmentation from common fate: Motion energy processing enables human-like zero-shot generalization to random dot stimuli	Matthias Tangemann et.al.	2411.01505v1	link
2024-11-02	Task-Oriented Hierarchical Object Decomposition for Visuomotor Control	Jianing Qian et.al.	2411.01284v1	null
2024-11-02	MonoPlane: Exploiting Monocular Geometric Cues for Generalizable 3D Plane Reconstruction	Wang Zhao et.al.	2411.01226v1	link
2024-11-02	Transfer Learning for Finetuning Large Language Models	Tobias Strangmann et.al.	2411.01195v1	null
2024-10-31	DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models	Heng-Jui Chang et.al.	2410.24177v1	null
2024-11-02	$π_0$: A Vision-Language-Action Flow Model for General Robot Control	Kevin Black et.al.	2410.24164v2	null
2024-10-31	Scaling Concept With Text-Guided Diffusion Models	Chao Huang et.al.	2410.24151v1	null
2024-10-31	Matchmaker: Self-Improving Large Language Model Programs for Schema Matching	Nabeel Seedat et.al.	2410.24105v1	null
2024-10-31	In-Context Fine-Tuning for Time-Series Foundation Models	Abhimanyu Das et.al.	2410.24087v1	null
2024-10-31	GAMap: Zero-Shot Object Goal Navigation with Multi-Scale Geometric-Affordance Guidance	Shuaihang Yuan et.al.	2410.23978v1	null
2024-10-31	Text-DiFuse: An Interactive Multi-Modal Image Fusion Framework based on Text-modulated Diffusion Model	Hao Zhang et.al.	2410.23905v1	link
2024-10-31	EZ-HOI: VLM Adaptation via Guided Prompt Learning for Zero-Shot HOI Detection	Qinqian Lei et.al.	2410.23904v1	link
2024-10-31	The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge	Dake Guo et.al.	2410.23815v1	null
2024-10-31	RealMind: Zero-Shot EEG-Based Visual Decoding and Captioning Using Multi-Modal Models	Dongyang Li et.al.	2410.23754v1	null
2024-10-30	Multi-student Diffusion Distillation for Better One-step Generators	Yanke Song et.al.	2410.23274v1	null
2024-10-30	Partial Channel Dependence with Channel Masks for Time Series Foundation Models	Seunghan Lee et.al.	2410.23222v1	null
2024-10-30	Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks	Michael Matthews et.al.	2410.23208v1	link
2024-10-30	FlexTSF: A Universal Forecasting Model for Time Series with Variable Regularities	Jingge Xiao et.al.	2410.23160v1	link
2024-10-30	DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes	Jialiang Zhang et.al.	2410.23004v1	null
2024-10-30	SimpsonsVQA: Enhancing Inquiry-Based Learning with a Tailored Dataset	Ngoc Dung Huynh et.al.	2410.22648v1	null
2024-10-30	SleepNetZero: Zero-Burden Zero-Shot Reliable Sleep Staging With Neural Networks Based on Ballistocardiograms	Shuzhen Li et.al.	2410.22646v1	null
2024-10-29	RealCQA-V2: Visual Premise Proving	Saleem Ahmed et.al.	2410.22492v1	null
2024-10-29	Local Policies Enable Zero-shot Long-horizon Manipulation	Murtaza Dalal et.al.	2410.22332v1	null
2024-10-29	Are Decoder-Only Large Language Models the Silver Bullet for Code Search?	Yuxuan Chen et.al.	2410.22240v1	link
2024-10-29	Active Learning for Vision-Language Models	Bardia Safaei et.al.	2410.22187v1	null
2024-10-29	Data Generation for Hardware-Friendly Post-Training Quantization	Lior Dikstein et.al.	2410.22110v1	link
2024-10-29	PACA: Perspective-Aware Cross-Attention Representation for Zero-Shot Scene Rearrangement	Shutong Jin et.al.	2410.22059v1	null
2024-10-29	Reliable Semantic Understanding for Real World Zero-shot Object Goal Navigation	Halil Utku Unlu et.al.	2410.21926v1	null
2024-10-30	Text-Guided Attention is All You Need for Zero-Shot Robustness in Vision-Language Models	Lu Yu et.al.	2410.21802v2	link
2024-10-29	Pushing the Limits of All-Atom Geometric Graph Neural Networks: Pre-Training, Scaling and Zero-Shot Transfer	Zihan Pengmei et.al.	2410.21683v1	null
2024-10-28	SandboxAQ's submission to MRL 2024 Shared Task on Multi-lingual Multi-task Information Retrieval	Isidora Chara Tourni et.al.	2410.21501v1	null
2024-10-28	SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization	Wanhua Li et.al.	2410.21411v1	link
2024-10-28	Zero-Shot Dense Retrieval with Embeddings from Relevance Feedback	Nour Jedidi et.al.	2410.21242v1	null
2024-10-28	Towards Unifying Evaluation of Counterfactual Explanations: Leveraging Large Language Models for Human-Centric Assessments	Marharyta Domnich et.al.	2410.21131v1	link
2024-10-28	Retrieval-Enhanced Mutation Mastery: Augmenting Zero-Shot Prediction of Protein Language Model	Yang Tan et.al.	2410.21127v1	link
2024-10-28	Zero-Shot Action Recognition in Surveillance Videos	Joao Pereira et.al.	2410.21113v1	null
2024-10-28	Exploring the Reliability of Foundation Model-Based Frontier Selection in Zero-Shot Object Goal Navigation	Shuaihang Yuan et.al.	2410.21037v1	null
2024-10-28	Reference-Free Formula Drift with Reinforcement Learning: From Driving Data to Tire Energy-Inspired, Real-World Policies	Franck Djeumou et.al.	2410.20990v1	null
2024-10-28	DeTeCtive: Detecting AI-generated Text via Multi-Level Contrastive Learning	Xun Guo et.al.	2410.20964v1	link
2024-10-28	MrT5: Dynamic Token Merging for Efficient Byte-level Language Models	Julie Kallini et.al.	2410.20771v1	link
2024-10-28	Face-MLLM: A Large Face Perception Model	Haomiao Sun et.al.	2410.20717v1	null
2024-10-28	Reprogramming Pretrained Target-Specific Diffusion Models for Dual-Target Drug Design	Xiangxin Zhou et.al.	2410.20688v1	link
2024-10-25	Adversarial Environment Design via Regret-Guided Diffusion Models	Hojun Chung et.al.	2410.19715v1	null
2024-10-25	TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning	Xiangyu Zeng et.al.	2410.19702v1	null
2024-10-25	IPPON: Common Sense Guided Informative Path Planning for Object Goal Navigation	Kaixian Qu et.al.	2410.19697v1	null
2024-10-25	Context-Based Visual-Language Place Recognition	Soojin Woo et.al.	2410.19341v1	link
2024-10-25	Enhancing Zero-Shot Vision Models by Label-Free Prompt Distribution Learning and Bias Correcting	Xingyu Zhu et.al.	2410.19294v1	null
2024-10-24	Label Set Optimization via Activation Distribution Kurtosis for Zero-shot Classification with Generative Models	Yue Li et.al.	2410.19195v1	null
2024-10-24	AlignCap: Aligning Speech Emotion Captioning to Human Preferences	Ziqi Liang et.al.	2410.19134v1	null
2024-10-24	ConceptDrift: Uncovering Biases through the Lens of Foundational Models	Cristian Daniel Păduraru et.al.	2410.18970v1	null
2024-10-24	BioMistral-NLU: Towards More Generalizable Medical Language Understanding through Instruction Tuning	Yujuan Velvin Fu et.al.	2410.18955v1	null
2024-10-24	SkillMimicGen: Automated Demonstration Generation for Efficient Skill Learning and Deployment	Caelan Garrett et.al.	2410.18907v1	null
2024-10-24	Probabilistic Language-Image Pre-Training	Sanghyuk Chun et.al.	2410.18857v1	link
2024-10-24	Task Calibration: Calibrating Large Language Models on Inference Tasks	Yingjie Li et.al.	2410.18764v1	null
2024-10-24	Data Scaling Laws in Imitation Learning for Robotic Manipulation	Fanqi Lin et.al.	2410.18647v1	null
2024-10-24	Knowledge Distillation Using Frontier Open-source LLMs: Generalizability and the Role of Synthetic Data	Anup Shirgaonkar et.al.	2410.18588v1	null
2024-10-24	Zero-shot Object Navigation with Vision-Language Models Reasoning	Congcong Wen et.al.	2410.18570v1	null
2024-10-24	Beyond Color and Lines: Zero-Shot Style-Specific Image Variations with Coordinated Semantics	Jinghao Hu et.al.	2410.18537v1	null
2024-10-24	Scaling up Masked Diffusion Models on Text	Shen Nie et.al.	2410.18514v1	link
2024-10-23	Key Algorithms for Keyphrase Generation: Instruction-Based LLMs for Russian Scientific Keyphrases	Anna Glazkova et.al.	2410.18040v1	null
2024-10-23	Scaling Robot Policy Learning via Zero-Shot Labeling with Foundation Models	Nils Blank et.al.	2410.17772v1	null
2024-10-23	Learning Versatile Skills with Curriculum Masking	Yao Tang et.al.	2410.17744v1	link
2024-10-23	Entity-based Reinforcement Learning for Autonomous Cyber Defence	Isaac Symes Thompson et.al.	2410.17647v1	link
2024-10-23	Incremental Learning of Affordances using Markov Logic Networks	George Potter et.al.	2410.17624v1	null
2024-10-23	Graphusion: A RAG Framework for Knowledge Graph Construction with a Global Perspective	Rui Yang et.al.	2410.17600v1	null
2024-10-23	Multimodal Information Bottleneck for Deep Reinforcement Learning with Multiple Sensors	Bang You et.al.	2410.17551v1	null
2024-10-23	Generalizable Motion Planning via Operator Learning	Sharath Matada et.al.	2410.17547v1	null
2024-10-23	X-MOBILITY: End-To-End Generalizable Navigation via World Modeling	Wei Liu et.al.	2410.17491v1	null
2024-10-22	Denoise-I2W: Mapping Images to Denoising Words for Accurate Zero-Shot Composed Image Retrieval	Yuanmin Tang et.al.	2410.17393v1	null
2024-10-22	Altogether: Image Captioning via Re-aligning Alt-text	Hu Xu et.al.	2410.17251v1	link
2024-10-22	LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias	Haian Jin et.al.	2410.17242v1	null
2024-10-22	Are Visual-Language Models Effective in Action Recognition? A Comparative Study	Mahmoud Ali et.al.	2410.17149v1	null
2024-10-22	LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging	Ke Wang et.al.	2410.17146v1	link
2024-10-22	SG-FSM: A Self-Guiding Zero-Shot Prompting Paradigm for Multi-Hop Question Answering Based on Finite State Machine	Xiaochen Wang et.al.	2410.17021v1	null
2024-10-22	Towards Real Zero-Shot Camouflaged Object Segmentation without Camouflaged Annotations	Cheng Lei et.al.	2410.16953v1	null
2024-10-22	DNAHLM -- DNA sequence and Human Language mixed large language Model	Wang Liang et.al.	2410.16917v1	link
2024-10-22	AttriPrompter: Auto-Prompting with Attribute Semantics for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models	Yongjian Wu et.al.	2410.16820v1	link
2024-10-22	PLDR-LLM: Large Language Model from Power Law Decoder Representations	Burc Gokden et.al.	2410.16703v1	link
2024-10-22	GE2E-KWS: Generalized End-to-End Training and Evaluation for Zero-shot Keyword Spotting	Pai Zhu et.al.	2410.16647v1	null
2024-10-21	MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report	Samrajya Thapa et.al.	2410.16239v1	link
2024-10-21	IBGP: Imperfect Byzantine Generals Problem for Zero-Shot Robustness in Communicative Multi-Agent Systems	Yihuan Mao et.al.	2410.16237v1	null
2024-10-21	Continuous Speech Synthesis using per-token Latent Diffusion	Arnon Turetzky et.al.	2410.16048v1	null
2024-10-21	Few-shot target-driven instance detection based on open-vocabulary object detection models	Ben Crulis et.al.	2410.16028v1	null
2024-10-21	Zero-Shot Scene Reconstruction from Single Images with Deep Prior Assembly	Junsheng Zhou et.al.	2410.15971v1	null
2024-10-21	Mitigating Object Hallucination via Concentric Causal Attention	Yun Xing et.al.	2410.15926v1	link
2024-10-21	MI-VisionShot: Few-shot adaptation of vision-language models for slide-level classification of histopathological images	Pablo Meseguer et.al.	2410.15881v1	null
2024-10-21	Triplane Grasping: Efficient 6-DoF Grasping with Single RGB Images	Yiming Li et.al.	2410.15879v1	null
2024-10-21	FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL	Woosung Koh et.al.	2410.15876v1	null
2024-10-21	Unleashing the Potential of Vision-Language Pre-Training for 3D Zero-Shot Lesion Segmentation via Mask-Attribute Alignment	Yankai Jiang et.al.	2410.15744v1	null
2024-10-18	BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities	Shaozhe Hao et.al.	2410.14672v1	link
2024-10-18	Dialetto, ma Quanto Dialetto? Transcribing and Evaluating Dialects on a Continuum	Ryan Soh-Eun Shim et.al.	2410.14589v1	null
2024-10-18	SylloBio-NLI: Evaluating Large Language Models on Biomedical Syllogistic Reasoning	Magdalena Wysocka et.al.	2410.14399v1	null
2024-10-18	AnomalyNCD: Towards Novel Anomaly Class Discovery in Industrial Scenarios	Ziming Huang et.al.	2410.14379v1	link
2024-10-18	Zero-shot Action Localization via the Confidence of Large Vision-Language Models	Josiah Aklilu et.al.	2410.14340v1	null
2024-10-18	Storyboard guided Alignment for Fine-grained Video Action Recognition	Enqi Liu et.al.	2410.14238v1	null
2024-10-18	Assessing Open-world Forgetting in Generative Image Model Customization	Héctor Laria et.al.	2410.14159v1	null
2024-10-17	Measuring and Modifying the Readability of English Texts with GPT-4	Sean Trott et.al.	2410.14028v1	link
2024-10-17	Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens	Lijie Fan et.al.	2410.13863v1	null
2024-10-17	VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding	Runsen Xu et.al.	2410.13860v1	link
2024-10-17	DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control	Yujie Wei et.al.	2410.13830v1	null
2024-10-17	AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents	Ke Yang et.al.	2410.13825v1	null
2024-10-17	Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers	Yuchen Liang et.al.	2410.13746v1	null
2024-10-17	ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense Concepts about Actions	Shailaja Keyur Sampat et.al.	2410.13662v1	link
2024-10-17	Help Me Identify: Is an LLM+VQA System All We Need to Identify Visual Concepts?	Shailaja Keyur Sampat et.al.	2410.13651v1	link
2024-10-18	Enhanced Prompt-leveraged Weakly Supervised Cancer Segmentation based on Segment Anything	Joonhyeon Song et.al.	2410.13621v2	link
2024-10-17	Large Language Models as Narrative-Driven Recommenders	Lukas Eberhard et.al.	2410.13604v1	null
2024-10-17	Representing Model Weights with Language using Tree Experts	Eliahu Horwitz et.al.	2410.13569v1	null
2024-10-16	In-Context Learning Enables Robot Action Prediction in LLMs	Yida Yin et.al.	2410.12782v1	null
2024-10-16	Towards Zero-Shot Camera Trap Image Categorization	Jiří Vyskočil et.al.	2410.12769v1	null
2024-10-16	Towards Graph Foundation Models: The Perspective of Zero-shot Reasoning on Knowledge Graphs	Kai Wang et.al.	2410.12609v1	null
2024-10-16	A Claim Decomposition Benchmark for Long-form Answer Verification	Zhihao Zhang et.al.	2410.12558v1	link
2024-10-16	SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling	Loris Gaven et.al.	2410.12481v1	null
2024-10-16	SF-Speech: Straightened Flow for Zero-Shot Voice Clone on Small-Scale Dataset	Xuyuan Li et.al.	2410.12399v1	null
2024-10-16	ERVQ: Enhanced Residual Vector Quantization with Intra-and-Inter-Codebook Optimization for Neural Audio Codecs	Rui-Chen Zheng et.al.	2410.12359v1	null
2024-10-16	MAX: Masked Autoencoder for X-ray Fluorescence in Geological Investigation	An-Sheng Lee et.al.	2410.12330v1	link
2024-10-16	Evaluating Cascaded Methods of Vision-Language Models for Zero-Shot Detection and Association of Hardhats for Increased Construction Safety	Lucas Choi et.al.	2410.12225v1	null
2024-10-15	Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming	Yilun Hao et.al.	2410.12112v1	null
2024-10-15	FoundTS: Comprehensive and Unified Benchmarking of Foundation Models for Time Series Forecasting	Zhe Li et.al.	2410.11802v1	null
2024-10-15	Time-Series Foundation Model for Value-at-Risk	Anubha Goel et.al.	2410.11773v1	link
2024-10-15	Zero-shot Model-based Reinforcement Learning using Large Language Models	Abdelhakim Benechehab et.al.	2410.11711v1	link
2024-10-15	PSVMA+: Exploring Multi-granularity Semantic-visual Adaption for Generalized Zero-shot Learning	Man Liu et.al.	2410.11560v1	null
2024-10-15	AGENTiGraph: An Interactive Knowledge Graph Platform for LLM-based Chatbots Utilizing Private Data	Xinjie Zhao et.al.	2410.11531v1	null
2024-10-15	Leveraging LLM Embeddings for Cross Dataset Label Alignment and Zero Shot Music Emotion Prediction	Renhang Liu et.al.	2410.11522v1	link
2024-10-15	Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement	Zhi Wang et.al.	2410.11448v1	link
2024-10-15	DRACO: A Denoising-Reconstruction Autoencoder for Cryo-EM	Yingjun Shen et.al.	2410.11373v1	null
2024-10-15	Enhance Graph Alignment for Large Language Models	Haitong Luo et.al.	2410.11370v1	null
2024-10-15	In-Context Learning for Long-Context Sentiment Analysis on Infrastructure Project Opinions	Alireza Shamshiri et.al.	2410.11265v1	null
2024-10-14	Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models	Jingzhi Bao et.al.	2410.10821v1	link
2024-10-14	Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations	Litu Rout et.al.	2410.10792v1	null
2024-10-14	SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators	Rasoul Shafipour et.al.	2410.10714v1	null
2024-10-14	MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer	Minghao Zhu et.al.	2410.10589v1	link
2024-10-14	Recipe for Zero-shot POS Tagging: Is It Useful in Realistic Scenarios?	Zeno Vandenbulcke et.al.	2410.10576v1	null
2024-10-14	Continual Learning Improves Zero-Shot Action Recognition	Shreyank N Gowda et.al.	2410.10497v1	null
2024-10-14	Learning to Ground VLMs without Forgetting	Aritra Bhowmik et.al.	2410.10491v1	null
2024-10-14	Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts	Xu Liu et.al.	2410.10469v1	null
2024-10-14	4DStyleGaussian: Zero-shot 4D Style Transfer with Gaussian Splatting	Wanlin Liang et.al.	2410.10412v1	null
2024-10-14	GIFT-Eval: A Benchmark For General Time Series Forecasting Model Evaluation	Taha Aksu et.al.	2410.10393v1	link
2024-10-11	Extra Global Attention Designation Using Keyword Detection in Sparse Transformer Architectures	Evan Lucas et.al.	2410.08971v1	null
2024-10-11	NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models	Zheng Yi Ho et.al.	2410.08970v1	null
2024-10-11	Zero-Shot Pupil Segmentation with SAM 2: A Case Study of Over 14 Million Images	Virmarie Maquiling et.al.	2410.08926v1	null
2024-10-11	SegGrasp: Zero-Shot Task-Oriented Grasping via Semantic and Geometric Guided Segmentation	Haosheng Li et.al.	2410.08901v1	null
2024-10-11	A Benchmark for Cross-Domain Argumentative Stance Classification on Social Media	Jiaqing Yuan et.al.	2410.08900v1	null
2024-10-11	RoRA-VLM: Robust Retrieval-Augmented Vision Language Models	Jingyuan Qi et.al.	2410.08876v1	null
2024-10-11	One-shot Generative Domain Adaptation in 3D GANs	Ziqiang Li et.al.	2410.08824v1	link
2024-10-11	Zero-Shot Offline Imitation Learning via Optimal Transport	Thomas Rupf et.al.	2410.08751v1	link
2024-10-11	Chain-of-Restoration: Multi-Task Image Restoration Models are Zero-Shot Step-by-Step Universal Image Restorers	Jin Cao et.al.	2410.08688v1	link
2024-10-11	Boosting Open-Vocabulary Object Detection by Handling Background Samples	Ruizhe Zeng et.al.	2410.08645v1	null
2024-10-10	LatteCLIP: Unsupervised CLIP Fine-Tuning via LMM-Synthetic Texts	Anh-Quan Cao et.al.	2410.08211v1	null
2024-10-10	SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation	Hang Yin et.al.	2410.08189v1	null
2024-10-10	On the Evaluation of Generative Robotic Simulations	Feng Chen et.al.	2410.08172v1	null
2024-10-10	ZeroComp: Zero-shot Object Compositing from Image Intrinsics via Diffusion	Zitian Zhang et.al.	2410.08168v1	null
2024-10-10	Constrained Skill Discovery: Quadruped Locomotion with Unsupervised Reinforcement Learning	Vassil Atanassov et.al.	2410.07877v1	null
2024-10-10	RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation	Songming Liu et.al.	2410.07864v1	null
2024-10-10	Rewriting Conversational Utterances with Instructed Large Language Models	Elnara Galimzhanova et.al.	2410.07797v1	null
2024-10-10	The Power of Input: Benchmarking Zero-Shot Sim-To-Real Transfer of Reinforcement Learning Control Policies for Quadrotor Control	Alberto Dionigi et.al.	2410.07686v1	null
2024-10-10	Parallel Digital Twin-driven Deep Reinforcement Learning for User Association and Load Balancing in Dynamic Wireless Networks	Zhenyu Tao et.al.	2410.07611v1	null
2024-10-10	CSA: Data-efficient Mapping of Unimodal Features to Multimodal Features	Po-han Li et.al.	2410.07610v1	null
2024-10-09	AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation	Yukang Cao et.al.	2410.07164v1	null
2024-10-09	Exploring the Readiness of Prominent Small Language Models for the Democratization of Financial Literacy	Tagore Rao Kosireddy et.al.	2410.07118v1	link
2024-10-09	Collusion Detection with Graph Neural Networks	Lucas Gomes et.al.	2410.07091v1	null
2024-10-09	Stanceformer: Target-Aware Transformer for Stance Detection	Krishna Garg et.al.	2410.07083v1	link
2024-10-09	Compositional Entailment Learning for Hyperbolic Vision-Language Models	Avik Pal et.al.	2410.06912v1	null
2024-10-09	F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching	Yushen Chen et.al.	2410.06885v1	link
2024-10-09	K-SAM: A Prompting Method Using Pretrained U-Net to Improve Zero Shot Performance of SAM on Lung Segmentation in CXR Images	Mohamed Deriche et.al.	2410.06825v1	null
2024-10-09	Toward Physics-guided Time Series Embedding	Jiaxi Hu et.al.	2410.06651v1	null
2024-10-09	Open-RGBT: Open-vocabulary RGB-T Zero-shot Semantic Segmentation in Open-world Environments	Meng Yu et.al.	2410.06626v1	null
2024-10-09	DCP: Learning Accelerator Dataflow for Neural Network via Propagation	Peng Xu et.al.	2410.06553v1	null
2024-10-07	Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality	Youngtaek Oh et.al.	2410.05210v1	link
2024-10-07	ZEBRA: Zero-Shot Example-Based Retrieval Augmentation for Commonsense Question Answering	Francesco Maria Molfese et.al.	2410.05077v1	link
2024-10-07	PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing	Feng Tian et.al.	2410.04844v1	null
2024-10-07	LPZero: Language Model Zero-cost Proxy Search from Zero	Peijie Dong et.al.	2410.04808v1	null
2024-10-07	Building Damage Assessment in Conflict Zones: A Deep Learning Approach Using Geospatial Sub-Meter Resolution Data	Matteo Risso et.al.	2410.04802v1	null
2024-10-07	Improving Image Clustering with Artifacts Attenuation via Inference-Time Attention Engineering	Kazumoto Nakamura et.al.	2410.04801v1	null
2024-10-07	Document-level Causal Relation Extraction with Knowledge-guided Binary Question Answering	Zimu Wang et.al.	2410.04752v1	null
2024-10-07	ACDC: Autoregressive Coherent Multimodal Generation using Diffusion Correction	Hyungjin Chung et.al.	2410.04721v1	null
2024-10-07	Demo of Zero-Shot Guitar Amplifier Modelling: Enhancing Modeling with Hyper Neural Networks	Yu-Hua Chen et.al.	2410.04702v1	null
2024-10-07	SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech	Minchan Kim et.al.	2410.04690v1	null
2024-10-04	GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs	Pu Hua et.al.	2410.03645v1	null
2024-10-04	What Matters for Model Merging at Scale?	Prateek Yadav et.al.	2410.03617v1	null
2024-10-04	Table Question Answering for Low-resourced Indic Languages	Vaishali Pal et.al.	2410.03576v1	link
2024-10-04	STREAMS: An Assistive Multimodal AI Framework for Empowering Biosignal Based Robotic Controls	Ali Rabiee et.al.	2410.03486v1	null
2024-10-04	Zero-Shot Fact Verification via Natural Logic and Large Language Models	Marek Strong et.al.	2410.03341v1	link
2024-10-04	Selective Test-Time Adaptation for Unsupervised Anomaly Detection using Neural Implicit Representations	Sameer Ambekar et.al.	2410.03306v1	link
2024-10-04	Comparing zero-shot self-explanations with human rationales in multilingual text classification	Stephanie Brandl et.al.	2410.03296v1	null
2024-10-04	Enhanced Transformer architecture for in-context learning of dynamical systems	Matteo Rufolo et.al.	2410.03291v1	null
2024-10-04	What do Large Language Models Need for Machine Translation Evaluation?	Shenbin Qian et.al.	2410.03278v1	link
2024-10-04	PersoBench: Benchmarking Personalized Response Generation in Large Language Models	Saleh Afzoon et.al.	2410.03198v1	null
2024-10-03	Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations	Nick Jiang et.al.	2410.02762v1	link
2024-10-03	Training Language Models on Synthetic Edit Sequences Improves Code Synthesis	Ulyana Piterbarg et.al.	2410.02749v1	link
2024-10-03	Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers	Shijie Chen et.al.	2410.02642v1	null
2024-10-03	Plots Unlock Time-Series Understanding in Multimodal Models	Mayank Daswani et.al.	2410.02637v1	null
2024-10-03	LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model	Duy M. H. Nguyen et.al.	2410.02615v1	null
2024-10-03	Dog-IQA: Standard-guided Zero-shot MLLM for Mix-grained Image Quality Assessment	Kai Liu et.al.	2410.02505v1	link
2024-10-03	Cross-Embodiment Dexterous Grasping with Reinforcement Learning	Haoqi Yuan et.al.	2410.02479v1	null
2024-10-03	Learning Diverse Bimanual Dexterous Manipulation Skills from Human Demonstrations	Bohan Zhou et.al.	2410.02477v1	null
2024-10-03	Unsupervised Meta-Learning via Dynamic Head and Heterogeneous Task Construction for Few-Shot Classification	Yunchuan Guan et.al.	2410.02267v1	link
2024-10-03	Visual Prompting in LLMs for Enhancing Emotion Recognition	Qixuan Zhang et.al.	2410.02244v1	null
2024-10-02	An Exploration of Self-Supervised Mutual Information Alignment for Multi-Task Settings	Soham Govande et.al.	2410.01704v1	link
2024-10-02	Saliency-Guided DETR for Moment Retrieval and Highlight Detection	Aleksandr Gordeev et.al.	2410.01615v1	link
2024-10-02	Coordinate-Based Neural Representation Enabling Zero-Shot Learning for 3D Multiparametric Quantitative MRI	Guoyan Lao et.al.	2410.01577v1	null
2024-10-03	EUFCC-CIR: a Composed Image Retrieval Dataset for GLAM Collections	Francesc Net et.al.	2410.01536v2	link
2024-10-02	Toward a Holistic Evaluation of Robustness in CLIP Models	Weijie Tu et.al.	2410.01534v1	null
2024-10-02	SinkSAM: A Monocular Depth-Guided SAM Framework for Automatic Sinkhole Segmentation	Osher Rafaeli et.al.	2410.01473v1	link
2024-10-02	The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs	Hong Li et.al.	2410.01417v1	null
2024-10-02	AgriCLIP: Adapting CLIP for Agriculture and Livestock via Domain-Specialized Cross-Model Alignment	Umair Nawaz et.al.	2410.01407v1	link
2024-10-02	Toward Zero-Shot Learning for Visual Dehazing of Urological Surgical Robots	Renkai Wu et.al.	2410.01395v1	link
2024-10-02	Takin-VC: Zero-shot Voice Conversion via Jointly Hybrid Content and Memory-Augmented Context-Aware Timbre Modeling	Yuguang Yang et.al.	2410.01350v1	null
2024-09-30	Uni$^2$Det: Unified and Universal Framework for Prompt-Guided Multi-dataset 3D Detection	Yubin Wang et.al.	2409.20558v1	null
2024-09-30	Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos	Md Mohaiminul Islam et.al.	2409.20557v1	null
2024-09-30	Robi Butler: Remote Multimodal Interactions with Household Robot Assistant	Anxing Xiao et.al.	2409.20548v1	null
2024-09-30	FreeMask: Rethinking the Importance of Attention Masks for Zero-Shot Video Editing	Lingling Cai et.al.	2409.20500v1	null
2024-10-01	Instance-adaptive Zero-shot Chain-of-Thought Prompting	Xiaosong Yuan et.al.	2409.20441v2	null
2024-09-30	VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs	Ruotong Liao et.al.	2409.20365v1	link
2024-09-30	CableInspect-AD: An Expert-Annotated Anomaly Detection Dataset	Akshatha Arodi et.al.	2409.20353v1	link
2024-09-30	RL-GSBridge: 3D Gaussian Splatting Based Real2Sim2Real Method for Robotic Manipulation Learning	Yuxuan Wu et.al.	2409.20291v1	null
2024-09-30	Analysing Zero-Shot Readability-Controlled Sentence Simplification	Abdullah Barayan et.al.	2409.20246v1	null
2024-09-30	VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection	Huilin Deng et.al.	2409.20146v1	null
2024-09-27	Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs	Yanyuan Qiao et.al.	2409.18794v1	null
2024-09-27	When SAM2 Meets Video Camouflaged Object Segmentation: A Comprehensive Evaluation and Adaptation	Yuli Zhou et.al.	2409.18653v1	link
2024-09-27	Do LLMs suffer from Multi-Party Hangover? A Diagnostic Approach to Addressee Recognition and Response Selection in Conversations	Nicolò Penzo et.al.	2409.18602v1	link
2024-09-27	"Oh LLM, I'm Asking Thee, Please Give Me a Decision Tree": Zero-Shot Decision Tree Induction and Embedding with Large Language Models	Ricardo Knauer et.al.	2409.18594v1	null
2024-09-27	EmoPro: A Prompt Selection Strategy for Emotional Expression in LM-based Speech Synthesis	Haoyu Wang et.al.	2409.18512v1	null
2024-09-27	Exploring Language Model Generalization in Low-Resource Extractive QA	Saptarshi Sengupta et.al.	2409.18446v1	link
2024-09-26	AER-LLM: Ambiguity-aware Emotion Recognition Leveraging Large Language Models	Xin Hong et.al.	2409.18339v1	null
2024-09-26	Learning to Drive via Asymmetric Self-Play	Chris Zhang et.al.	2409.18218v1	null
2024-09-26	Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction	Jing He et.al.	2409.18124v1	null
2024-09-26	GSON: A Group-based Social Navigation Framework with Large Multimodal Model	Shangyi Luo et.al.	2409.18084v1	null
2024-09-26	FreeEdit: Mask-free Reference-based Image Editing with Multi-modal Instruction	Runze He et.al.	2409.18071v1	null
2024-09-26	DualAD: Dual-Layer Planning for Reasoning in Autonomous Driving	Dingrui Wang et.al.	2409.18053v1	link
2024-09-26	IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning	Soeun Lee et.al.	2409.18046v1	link
2024-09-26	Learning to Love Edge Cases in Formative Math Assessment: Using the AMMORE Dataset and Chain-of-Thought Prompting to Improve Grading Accuracy	Owen Henkel et.al.	2409.17904v1	null
2024-09-26	Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models	Hui-Po Wang et.al.	2409.17836v1	link
2024-09-27	Few-shot Pairwise Rank Prompting: An Effective Non-Parametric Retrieval Model	Nilanjan Sinhababu et.al.	2409.17745v2	null
2024-09-26	AnyLogo: Symbiotic Subject-Driven Diffusion System with Gemini Status	Jinghao Zhang et.al.	2409.17740v1	null
2024-09-26	Robust Ladder Climbing with a Quadrupedal Robot	Dylan Vogel et.al.	2409.17731v1	null
2024-09-25	Can Vision Language Models Learn from Visual Demonstrations of Ambiguous Spatial Reasoning?	Bowen Zhao et.al.	2409.17080v1	link
2024-09-25	ControlCity: A Multimodal Diffusion Model Based Approach for Accurate Geospatial Data Generation and Urban Morphology Analysis	Fangshuo Zhou et.al.	2409.17049v1	link
2024-09-25	Detecting Temporal Ambiguity in Questions	Bhawna Piryani et.al.	2409.17046v1	link
2024-09-25	Zero-Shot Detection of LLM-Generated Text using Token Cohesiveness	Shixuan Ma et.al.	2409.16914v1	link
2024-09-25	Pruning Multilingual Large Language Models for Multilingual Inference	Hwichan Kim et.al.	2409.16911v1	link
2024-09-25	Multi-objective Evolution of Heuristic Using Large Language Model	Shunyu Yao et.al.	2409.16867v1	null
2024-09-25	Towards General Text-guided Image Synthesis for Customized Multimodal Brain MRI Generation	Yulin Wang et.al.	2409.16818v1	link
2024-09-25	Vision-Language Model Fine-Tuning via Simple Parameter-Efficient Modification	Ming Li et.al.	2409.16718v1	link
2024-09-24	Unsupervised Text Representation Learning via Instruction-Tuning for Zero-Shot Dense Retrieval	Qiuhai Zeng et.al.	2409.16497v1	null
2024-09-24	BehAV: Behavioral Rule Guided Autonomy Using VLMs for Robot Navigation in Outdoor Scenes	Kasun Weerakoon et.al.	2409.16484v1	null
2024-09-24	Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation	Homanga Bharadhwaj et.al.	2409.16283v1	null
2024-09-24	Fields of The World: A Machine Learning Benchmark Dataset For Global Agricultural Field Boundary Segmentation	Hannah Kerner et.al.	2409.16252v1	link
2024-09-24	Facial Expression-Enhanced TTS: Combining Face Representation and Emotion Intensity for Adaptive Speech	Yunji Chu et.al.	2409.16203v1	null
2024-09-24	HA-FGOVD: Highlighting Fine-grained Attributes via Explicit Linear Composition for Open-Vocabulary Object Detection	Yuqi Ma et.al.	2409.16136v1	null
2024-09-24	Evaluation of state-of-the-art ASR Models in Child-Adult Interactions	Aditya Ashvin et.al.	2409.16135v1	null
2024-09-24	Bridging Environments and Language with Rendering Functions and Vision-Language Models	Theo Cachet et.al.	2409.16024v1	null
2024-09-24	Finetuning LLMs for Comparative Assessment Tasks	Vatsal Raina et.al.	2409.15979v1	null
2024-09-24	StyleSinger 2: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control	Yu Zhang et.al.	2409.15977v1	link
2024-09-24	SLIMER-IT: Zero-Shot NER on Italian Language	Andrew Zamai et.al.	2409.15933v1	link
2024-09-24	Zero-Shot Detection of AI-Generated Images	Davide Cozzolino et.al.	2409.15875v1	null
2024-09-24	Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models	Sijing Chen et.al.	2409.12139v3	null
2024-09-18	IMRL: Integrating Visual, Physical, Temporal, and Geometric Representations for Enhanced Food Acquisition	Rui Liu et.al.	2409.12092v1	null
2024-09-18	Efficacy of Synthetic Data as a Benchmark	Gaurav Maheshwari et.al.	2409.11968v1	null
2024-09-18	GauTOAO: Gaussian-based Task-Oriented Affordance of Objects	Jiawen Wang et.al.	2409.11941v1	null
2024-09-18	LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Foundation Models	Amaia Cardiel et.al.	2409.11919v1	null
2024-09-18	ABHINAW: A method for Automatic Evaluation of Typography within AI-Generated Images	Abhinaw Jagtap et.al.	2409.11874v1	null
2024-09-18	One Map to Find Them All: Real-time Open-Vocabulary Mapping for Zero-shot Multi-Object Navigation	Finn Lukas Busch et.al.	2409.11764v1	null
2024-09-18	Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation	Haohan Guo et.al.	2409.11630v1	null
2024-09-17	Good Grasps Only: A data engine for self-supervised fine-tuning of pose estimation using grasp poses for verification	Frederik Hagelskjær et.al.	2409.11512v1	null
2024-09-17	Enriching Datasets with Demographics through Large Language Models: What's in a Name?	Khaled AlNuaimi et.al.	2409.11491v1	null
2024-09-17	Says Who? Effective Zero-Shot Annotation of Focalization	Rebecca M. M. Hicke et.al.	2409.11390v1	null
2024-09-17	Towards Time Series Reasoning with LLMs	Winnie Chow et.al.	2409.11376v1	null
2024-09-17	Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think	Gonzalo Martin Garcia et.al.	2409.11355v1	link
2024-09-17	Zero Shot Text to Speech Augmentation for Automatic Speech Recognition on Low-Resource Accented Speech Corpora	Francesco Nespoli et.al.	2409.11107v1	null
2024-09-17	TacDiffusion: Force-domain Diffusion Policy for Precise Tactile Manipulation	Yansong Wu et.al.	2409.11047v1	null
2024-09-18	GEIC: Universal and Multilingual Named Entity Recognition with Large Language Models	Hanjun Luo et.al.	2409.11022v2	link
2024-09-17	Relative Representations: Topological and Geometric Perspectives	Alejandro García-Castellanos et.al.	2409.10967v1	link
2024-09-17	Multi-Floor Zero-Shot Object Navigation Policy	Lingfeng Zhang et.al.	2409.10906v1	null
2024-09-17	Implicit Reasoning in Deep Time Series Forecasting	Willa Potosnak et.al.	2409.10840v1	null
2024-09-18	Context-Dependent Interactable Graphical User Interface Element Detection for Spatial Computing Applications	Shuqing Li et.al.	2409.10811v2	null
2024-09-16	Do Pre-trained Vision-Language Models Encode Object States?	Kaleb Newman et.al.	2409.10488v1	null
2024-09-16	Prompt-and-Transfer: Dynamic Class-aware Enhancement for Few-shot Segmentation	Hanbo Bi et.al.	2409.10389v1	null
2024-09-16	beeFormer: Bridging the Gap Between Semantic and Interaction Similarity in Recommender Systems	Vojtěch Vančura et.al.	2409.10309v1	link
2024-09-16	SteeredMarigold: Steering Diffusion Towards Depth Completion of Largely Incomplete Depth Maps	Jakub Gregorek et.al.	2409.10202v1	null
2024-09-16	SplatSim: Zero-Shot Sim2Real Transfer of RGB Manipulation Policies Using Gaussian Splatting	Mohammad Nomaan Qureshi et.al.	2409.10161v1	null
2024-09-16	StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion	Yinghao Aaron Li et.al.	2409.10058v1	null
2024-09-16	A Study on Zero-shot Non-intrusive Speech Assessment using Large Language Models	Ryandhimas E. Zezario et.al.	2409.09914v1	null
2024-09-15	GRIN: Zero-Shot Metric Depth with Pixel-Level Diffusion	Vitor Guizilini et.al.	2409.09896v1	null
2024-09-15	PROSE-FD: A Multimodal PDE Foundation Model for Learning Multiple Operators for Forecasting Fluid Dynamics	Yuxuan Liu et.al.	2409.09811v1	null
2024-09-15	Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models	Yuan-Hong Liao et.al.	2409.09788v1	null
2024-09-13	Data Efficient Child-Adult Speaker Diarization with Simulated Conversations	Anfeng Xu et.al.	2409.08881v1	link
2024-09-13	A RAG Approach for Generating Competency Questions in Ontology Engineering	Xueli Pan et.al.	2409.08820v1	null
2024-09-13	Electrocardiogram Report Generation and Question Answering via Retrieval-Augmented Self-Supervised Modeling	Jialu Tang et.al.	2409.08788v1	null
2024-09-13	HOLA-Drone: Hypergraphic Open-ended Learning for Zero-Shot Multi-Drone Cooperative Pursuit	Yang Li et.al.	2409.08767v1	null
2024-09-13	DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset	Jiawei Du et.al.	2409.08731v1	link
2024-09-13	Eir: Thai Medical Large Language Models	Yutthakorn Thiprak et.al.	2409.08523v1	null
2024-09-13	GroundingBooth: Grounding Text-to-Image Customization	Zhexiao Xiong et.al.	2409.08520v1	null
2024-09-13	Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection	Haoxuan Wang et.al.	2409.08513v1	link
2024-09-12	SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer	Helin Wang et.al.	2409.08425v1	link
2024-09-12	Sequential Discrete Action Selection via Blocking Conditions and Resolutions	Liam Merz Hoffmeister et.al.	2409.08410v1	null
2024-09-12	DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors	Thomas Hanwen Zhu et.al.	2409.08278v1	null
2024-09-12	AnySkin: Plug-and-play Skin Sensing for Robotic Touch	Raunaq Bhirangi et.al.	2409.08276v1	null
2024-09-12	Fine-tuning Large Language Models for Entity Matching	Aaron Steiner et.al.	2409.08185v1	link
2024-09-12	The CLC-UKET Dataset: Benchmarking Case Outcome Prediction for the UK Employment Tribunal	Huiyuan Xie et.al.	2409.08098v1	null
2024-09-12	EZIGen: Enhancing zero-shot subject-driven image generation with precise subject encoding and decoupled guidance	Zicheng Duan et.al.	2409.08091v1	link
2024-09-12	Zero-Shot Sing Voice Conversion: built upon clustering-based phoneme representations	Wangjin Zhou et.al.	2409.08039v1	null
2024-09-12	From Explanations to Action: A Zero-Shot, Theory-Driven LLM Framework for Student Performance Feedback	Vinitra Swamy et.al.	2409.08027v1	null
2024-09-11	Zero-Shot Machine-Generated Text Detection Using Mixture of Large Language Models	Matthieu Dubois et.al.	2409.07615v1	null
2024-09-11	Minimizing Embedding Distortion for Robust Out-of-Distribution Performance	Tom Shaked et.al.	2409.07582v1	null
2024-09-11	SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis	Helin Wang et.al.	2409.07556v1	link
2024-09-11	Online Decision MetaMorphFormer: A Casual Transformer-Based Reinforcement Learning Framework of Universal Embodied Intelligence	Luo Ji et.al.	2409.07341v1	null
2024-09-11	Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering	Weixi Weng et.al.	2409.07331v1	null
2024-09-11	PaveSAM Segment Anything for Pavement Distress	Neema Jakisa Owor et.al.	2409.07295v1	null
2024-09-11	A Fine-grained Sentiment Analysis of App Reviews using Large Language Models: An Evaluation Study	Faiz Ali Shah et.al.	2409.07162v1	link
2024-09-11	Zero-Shot Text-to-Speech as Golden Speech Generator: A Systematic Framework and its Applicability in Automatic Pronunciation Assessment	Tien-Hong Lo et.al.	2409.07151v1	null
2024-09-11	Pushing the Limits of Vision-Language Models in Remote Sensing without Human Annotations	Keumgang Cha et.al.	2409.07048v1	null
2024-09-10	ExIQA: Explainable Image Quality Assessment Using Distortion Attributes	Sepehr Kazemi Ranjbar et.al.	2409.06853v1	null
2024-09-10	Translating Step-by-Step: Decomposing the Translation Process for Improved Translation Quality of Long-Form Texts	Eleftheria Briakou et.al.	2409.06790v1	null
2024-09-11	EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis	Danli Shi et.al.	2409.06644v2	null
2024-09-10	DemoStart: Demonstration-led auto-curriculum applied to sim-to-real with multi-fingered robots	Maria Bauza et.al.	2409.06613v1	null
2024-09-10	An Effective Context-Balanced Adaptation Approach for Long-Tailed Speech Recognition	Yi-Cheng Wang et.al.	2409.06468v1	null
2024-09-10	SpeechTaxi: On Multilingual Semantic Speech Classification	Lennart Keller et.al.	2409.06372v1	null
2024-09-10	MAGDA: Multi-agent guideline-driven diagnostic assistance	David Bani-Harouni et.al.	2409.06351v1	null
2024-09-10	PharmacoMatch: Efficient 3D Pharmacophore Screening through Neural Subgraph Matching	Daniel Rose et.al.	2409.06316v1	null
2024-09-10	Advancing Topic Segmentation of Broadcasted Speech with Multilingual Semantic Embeddings	Sakshi Deo Shukla et.al.	2409.06222v1	link
2024-09-10	Revisiting Prompt Pretraining of Vision-Language Models	Zhenyuan Chen et.al.	2409.06166v1	null
2024-09-09	Differentiable programming across the PDE and Machine Learning barrier	Nacime Bouziani et.al.	2409.06085v1	null
2024-09-09	FairHome: A Fair Housing and Fair Lending Dataset	Anusha Bagalkotkar et.al.	2409.05990v1	null
2024-09-09	Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments	Haritheja Etukuru et.al.	2409.05865v1	link
2024-09-10	Evaluating Multiview Object Consistency in Humans and Image Models	Tyler Bonnen et.al.	2409.05862v2	link
2024-09-09	A Flexible Framework for Universal Computational Aberration Correction via Automatic Lens Library Generation and Domain Adaptation	Qi Jiang et.al.	2409.05809v1	null
2024-09-09	AnomalyCD: A benchmark for Earth anomaly change detection with high-resolution and time-series observations	Jingtao Li et.al.	2409.05679v1	null
2024-09-09	Zero-shot Outlier Detection via Prior-data Fitted Networks: Model Selection Bygone!	Yuchen Shen et.al.	2409.05672v1	null
2024-09-09	CauseJudger: Identifying the Cause with LLMs for Abductive Logical Reasoning	Jinwei He et.al.	2409.05559v1	null
2024-09-09	EndoOmni: Zero-Shot Cross-Dataset Depth Estimation in Endoscopy by Robust Self-Learning from Noisy Labels	Qingyao Tian et.al.	2409.05442v1	link
2024-09-09	From Words to Poses: Enhancing Novel Object Pose Estimation with Vision Language Models	Tessa Pulli et.al.	2409.05413v1	null
2024-09-09	NLLB-E5: A Scalable Multilingual Retrieval Model	Arkadeep Acharya et.al.	2409.05401v1	null
2024-09-09	IndicVoices-R: Unlocking a Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS	Ashwin Sankar et.al.	2409.05356v1	link
2024-09-06	FS-MedSAM2: Exploring the Potential of SAM2 for Few-Shot Medical Image Segmentation without Fine-tuning	Yunhao Bai et.al.	2409.04298v1	link
2024-09-06	Prompt-based Personality Profiling: Reinforcement Learning for Relevance Filtering	Jan Hofmann et.al.	2409.04122v1	null
2024-09-06	UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity	Yicheng Fu et.al.	2409.04081v1	null
2024-09-06	AnyMatch -- Efficient Zero-Shot Entity Matching with a Small Language Model	Zeyu Zhang et.al.	2409.04073v1	link
2024-09-06	Refining Wikidata Taxonomy using Large Language Models	Yiwen Peng et.al.	2409.04056v1	link
2024-09-05	Deep Clustering of Remote Sensing Scenes through Heterogeneous Transfer Learning	Isaac Ray et.al.	2409.03938v1	null
2024-09-05	A deep learning approach to wall-shear stress quantification: From numerical training to zero-shot experimental application	Esther Lagemann et.al.	2409.03933v1	null
2024-09-05	Few-shot Adaptation of Medical Vision-Language Models	Fereshteh Shakeri et.al.	2409.03868v1	link
2024-09-05	View-Invariant Policy Learning via Zero-Shot Novel View Synthesis	Stephen Tian et.al.	2409.03685v1	null
2024-09-05	Text-Guided Mixup Towards Long-Tailed Image Categorization	Richard Franklin et.al.	2409.03583v1	link
2024-09-05	FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation	Xi Chen et.al.	2409.03525v1	null
2024-09-05	Have Large Vision-Language Models Mastered Art History?	Ombretta Strafforello et.al.	2409.03521v1	null
2024-09-05	RoVi-Aug: Robot and Viewpoint Augmentation for Cross-Embodiment Robot Learning	Lawrence Yunliang Chen et.al.	2409.03403v1	null
2024-09-05	Bringing the RT-1-X Foundation Model to a SCARA robot	Jonathan Salzer et.al.	2409.03299v1	null
2024-09-05	LLM Detectors Still Fall Short of Real World: Case of LLM-Generated Short News-Like Posts	Henrique Da Silva Gameiro et.al.	2409.03291v1	link
2024-09-05	iText2KG: Incremental Knowledge Graphs Construction Using Large Language Models	Yassir Lairgi et.al.	2409.03284v1	link
2024-09-05	FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications	Hao-Han Guo et.al.	2409.03283v1	null
2024-09-04	Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection	Kaiqing Lin et.al.	2409.02664v1	null
2024-09-04	Evaluation Study on SAM 2 for Class-agnostic Instance-level Segmentation	Tiantian Zhang et.al.	2409.02567v1	link
2024-09-04	StyleTokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models	Wen Li et.al.	2409.02543v1	link
2024-09-04	Language is Scary when Over-Analyzed: Unpacking Implied Misogynistic Reasoning with Argumentation Theory-Driven Prompts	Arianna Muti et.al.	2409.02519v1	null
2024-09-04	Dispelling Four Challenges in Inertial Motion Tracking with One Recurrent Inertial Graph-based Estimator (RING)	Simon Bachhuber et.al.	2409.02502v1	null
2024-09-04	Boosting Generalizability towards Zero-Shot Cross-Dataset Single-Image Indoor Depth by Meta-Initialization	Cho-Ying Wu et.al.	2409.02486v1	null
2024-09-04	Large Language Models as Efficient Reward Function Searchers for Custom-Environment Multi-Objective Reinforcement Learning	Guanwen Xie et.al.	2409.02428v1	null
2024-09-03	Evaluation and Comparison of Visual Language Models for Transportation Engineering Problems	Sanjita Prajapati et.al.	2409.02278v1	null
2024-09-05	LinFusion: 1 GPU, 1 Minute, 16K Image	Songhua Liu et.al.	2409.02097v2	link
2024-09-03	DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos	Wenbo Hu et.al.	2409.02095v1	link
2024-08-30	Bridging Episodes and Semantics: A Novel Framework for Long-Form Video Understanding	Gueter Josmy Faure et.al.	2408.17443v1	link
2024-08-30	VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters	Mouxiang Chen et.al.	2408.17253v1	link
2024-08-30	Reasoning AI Performance Degradation in 6G Networks with Large Language Models	Liming Huang et.al.	2408.17097v1	null
2024-08-30	Focus-Consistent Multi-Level Aggregation for Compositional Zero-Shot Learning	Fengyuan Dai et.al.	2408.17083v1	null
2024-08-29	Exploring Multiple Strategies to Improve Multilingual Coreference Resolution in CorefUD	Ondřej Pražák et.al.	2408.16893v1	link
2024-08-29	Fluent and Accurate Image Captioning with a Self-Trained Reward Model	Nicholas Moratelli et.al.	2408.16827v1	null
2024-08-29	PromptSmooth: Certifying Robustness of Medical Vision-Language Models via Prompt Learning	Noor Hussein et.al.	2408.16769v1	link
2024-08-29	SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners	Ziyu Guo et.al.	2408.16768v1	link
2024-08-29	Assessing Large Language Models for Online Extremism Research: Identification, Explanation, and New Knowledge	Beidi Dong et.al.	2408.16749v1	null
2024-08-29	LLMs generate structurally realistic social networks but overestimate political homophily	Serina Chang et.al.	2408.16629v1	link
2024-08-29	Adapting Vision-Language Models to Open Classes via Test-Time Prompt Tuning	Zhengqing Gao et.al.	2408.16486v1	link
2024-08-29	WHISMA: A Speech-LLM to Perform Zero-shot Spoken Language Understanding	Mohan Li et.al.	2408.16423v1	null
2024-08-29	Text-Enhanced Zero-Shot Action Recognition: A training-free approach	Massimo Bosetti et.al.	2408.16412v1	null
2024-08-29	Bootstrap Segmentation Foundation Model under Distribution Shift via Object-Centric Learning	Luyao Tang et.al.	2408.16310v1	link
2024-08-29	Training-free Video Temporal Grounding using Large-scale Pre-trained Models	Minghang Zheng et.al.	2408.16219v1	link
2024-08-28	CardBench: A Benchmark for Learned Cardinality Estimation in Relational Databases	Yannis Chronis et.al.	2408.16170v1	null
2024-08-29	Spatio-Temporal Context Prompting for Zero-Shot Action Detection	Wei-Jhe Huang et.al.	2408.15996v2	null
2024-08-28	Multi-modal Adversarial Training for Zero-Shot Voice Cloning	John Janiczek et.al.	2408.15916v1	null
2024-08-28	Visual Prompt Engineering for Medical Vision Language Models in Radiology	Stefan Denner et.al.	2408.15802v1	null
2024-08-28	Interactive Agents: Simulating Counselor-Client Psychological Counseling via Role-Playing LLM-to-LLM Interactions	Huachuan Qiu et.al.	2408.15787v1	link
2024-08-28	LM-PUB-QUIZ: A Comprehensive Framework for Zero-Shot Evaluation of Relational Knowledge in Language Models	Max Ploner et.al.	2408.15729v1	null
2024-08-28	Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas	Fabio Quattrini et.al.	2408.15660v1	link
2024-08-28	Learning dynamics models for velocity estimation in autonomous racing	Jan Węgrzynowski et.al.	2408.15610v1	null
2024-08-28	Drop the beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation	Ziqian Ning et.al.	2408.15474v1	null
2024-08-28	Adapting Segment Anything Model to Multi-modal Salient Object Detection with Semantic Feature Fusion Guidance	Kunpeng Wang et.al.	2408.15063v2	link
2024-08-26	MONAS: Efficient Zero-Shot Neural Architecture Search for MCUs	Ye Qiao et.al.	2408.15034v1	null
2024-08-27	Cross-Modal Learning for Chemistry Property Prediction: Large Language Models Meet Graph Machine Learning	Sakhinana Sagar Srinivas et.al.	2408.14964v1	null
2024-08-27	ZeroMamba: Exploring Visual State Space Model for Zero-Shot Learning	Wenjin Hou et.al.	2408.14868v1	null
2024-08-27	Points2Plans: From Point Clouds to Long-Horizon Plans with Composable Relational Dynamics	Yixuan Huang et.al.	2408.14769v1	null
2024-08-26	Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning	Xinyang Gu et.al.	2408.14472v1	link
2024-08-28	Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study	Liuchang Xu et.al.	2408.14438v2	null
2024-08-26	Uncertainties of Latent Representations in Computer Vision	Michael Kirchhof et.al.	2408.14281v1	null
2024-08-26	Self-supervised Speech Representations Still Struggle with African American Vernacular English	Kalvin Chang et.al.	2408.14262v1	link
2024-08-26	AgentMove: Predicting Human Mobility Anywhere Using Large Language Model based Agentic Framework	Jie Feng et.al.	2408.13986v1	link
2024-08-25	OpenNav: Efficient Open Vocabulary 3D Object Detection for Smart Wheelchair Navigation	Muhammad Rameez ur Rahman et.al.	2408.13936v1	link
2024-08-25	Infrared Domain Adaptation with Zero-Shot Quantization	Burak Sevsay et.al.	2408.13925v1	null
2024-08-25	LLMs are Superior Feedback Providers: Bootstrapping Reasoning for Lie Detection with Self-Generated Feedback	Tanushree Banerjee et.al.	2408.13915v1	null
2024-08-25	Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs	Brandon Smart et.al.	2408.13912v1	null
2024-08-25	Towards Completeness: A Generalizable Action Proposal Generator for Zero-Shot Temporal Action Localization	Jia-Run Du et.al.	2408.13777v1	link
2024-08-23	On Class Separability Pitfalls In Audio-Text Contrastive Zero-Shot Learning	Tiago Tavares et.al.	2408.13068v1	null
2024-08-23	WildFusion: Individual Animal Identification with Calibrated Similarity Fusion	Vojtěch Cermak et.al.	2408.12934v1	link
2024-08-23	Unleashing the Potential of SAM2 for Biomedical Images and Videos: A Survey	Yichi Zhang et.al.	2408.12889v1	link
2024-08-23	Predicting Affective States from Screen Text Sentiment	Songyan Teng et.al.	2408.12844v1	null
2024-08-23	Examining the Commitments and Difficulties Inherent in Multimodal Foundation Models for Street View Imagery	Zhenyuan Yang et.al.	2408.12821v1	null
2024-08-23	VALE: A Multimodal Visual and Language Explanation Framework for Image Classifiers using eXplainable AI and Language Models	Purushothaman Natarajan et.al.	2408.12808v1	link
2024-08-23	Cap2Sum: Learning to Summarize Videos by Generating Captions	Cairong Zhao et.al.	2408.12800v1	null
2024-08-22	Segment Anything Model for Grain Characterization in Hard Drive Design	Kai Nichols et.al.	2408.12732v1	null
2024-08-22	Cell-ontology guided transcriptome foundation model	Xinyu Yuan et.al.	2408.12373v1	null
2024-08-22	SAM-SP: Self-Prompting Makes SAM Great Again	Chunpeng Zhou et.al.	2408.12364v1	null
2024-08-22	Adapt CLIP as Aggregation Instructor for Image Dehazing	Xiaozhe Zhang et.al.	2408.12317v1	null
2024-08-22	Large Language Models Are Self-Taught Reasoners: Enhancing LLM Applications via Tailored Problem-Solving Demonstrations	Kai Tzu-iunn Ong et.al.	2408.12315v1	null
2024-08-23	Tactile-Morph Skills: Energy-Based Control Meets Data-Driven Learning	Anran Zhang et.al.	2408.12285v2	null
2024-08-22	Epsilon: Exploring Comprehensive Visual-Semantic Projection for Multi-Label Zero-Shot Learning	Ziming Liu et.al.	2408.12253v1	null
2024-08-22	LLMs are not Zero-Shot Reasoners for Biomedical Information Extraction	Aishik Nagar et.al.	2408.12249v1	null
2024-08-22	PRG: Prompt-Based Distillation Without Annotation via Proxy Relational Graph	Yijin Xu et.al.	2408.12248v1	null
2024-08-22	OVA-DETR: Open Vocabulary Aerial Object Detection Using Image-Text Alignment and Fusion	Guoting Wei et.al.	2408.12246v1	link
2024-08-23	Large Language Models as Foundations for Next-Gen Dense Retrieval: A Comprehensive Empirical Assessment	Kun Luo et.al.	2408.12194v2	null
2024-08-21	Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction	Anthony GX-Chen et.al.	2408.11816v1	null
2024-08-21	EmbodiedSAM: Online Segment Any 3D Thing in Real Time	Xiuwei Xu et.al.	2408.11811v1	null
2024-08-21	Iterative Object Count Optimization for Text-to-image Diffusion Models	Oz Zafar et.al.	2408.11721v1	null
2024-08-21	Memorization In In-Context Learning	Shahriar Golchin et.al.	2408.11546v1	null
2024-08-21	Improvement Speaker Similarity for Zero-Shot Any-to-Any Voice Conversion of Whispered and Regular Speech	Anastasia Avdeeva et.al.	2408.11528v1	null
2024-08-21	XDT-CXR: Investigating Cross-Disease Transferability in Zero-Shot Binary Classification of Chest X-Rays	Umaima Rahman et.al.	2408.11493v1	link
2024-08-21	Enabling Small Models for Zero-Shot Classification through Model Label Learning	Jia Zhang et.al.	2408.11449v1	null
2024-08-21	EMO-LLaMA: Enhancing Facial Emotion Understanding with Instruction Tuning	Bohao Xing et.al.	2408.11424v1	link
2024-08-21	Plug, Play, and Fuse: Zero-Shot Joint Decoding via Word-Level Re-ranking Across Diverse Vocabularies	Sai Koneru et.al.	2408.11327v1	null
2024-08-21	Towards Evaluating Large Language Models on Sarcasm Understanding	Yazhou Zhang et.al.	2408.11319v1	null
2024-08-21	CrossFi: A Cross Domain Wi-Fi Sensing Framework Based on Siamese Network	Zijian Zhao et.al.	2408.10919v2	null
2024-08-20	ViLReF: A Chinese Vision-Language Retinal Foundation Model	Shengzhu Yang et.al.	2408.10894v1	link
2024-08-20	Open 3D World in Autonomous Driving	Xinlong Cheng et.al.	2408.10880v1	null
2024-08-20	SSL-TTS: Leveraging Self-Supervised Embeddings and kNN Retrieval for Zero-Shot Multi-speaker TTS	Karl El Hajal et.al.	2408.10771v1	null
2024-08-20	Crafting Tomorrow's Headlines: Neural News Generation and Detection in English, Turkish, Hungarian, and Persian	Cem Üyük et.al.	2408.10724v1	null
2024-08-20	AnyGraph: Graph Foundation Model in the Wild	Lianghao Xia et.al.	2408.10700v1	link
2024-08-20	Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches	Yanjie Dong et.al.	2408.10691v1	null
2024-08-20	A Review of Human-Object Interaction Detection	Yuxiao Wang et.al.	2408.10641v1	null
2024-08-20	LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models	Yupeng Su et.al.	2408.10631v1	link
2024-08-20	Generalizable Facial Expression Recognition	Yuhang Zhang et.al.	2408.10614v1	link
2024-08-19	SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models	Anke Tang et.al.	2408.10174v1	link
2024-08-19	Video Object Segmentation via SAM 2: The 4th Solution for LSVOS Challenge VOS Track	Feiyu Pan et.al.	2408.10125v1	null
2024-08-19	GLIMMER: Incorporating Graph and Lexical Features in Unsupervised Multi-Document Summarization	Ran Liu et.al.	2408.10115v1	link
2024-08-19	Convert and Speak: Zero-shot Accent Conversion with Minimum Supervision	Zhijun Jia et.al.	2408.10096v1	null
2024-08-19	CLIPCleaner: Cleaning Noisy Labels with CLIP	Chen Feng et.al.	2408.10012v1	link
2024-08-19	Boosting Open-Domain Continual Learning via Leveraging Intra-domain Category-aware Prototype	Yadong Lu et.al.	2408.09984v1	null
2024-08-19	Caption-Driven Explorations: Aligning Image and Text Embeddings through Human-Inspired Foveated Vision	Dario Zanca et.al.	2408.09948v1	null
2024-08-19	DiscoNeRF: Class-Agnostic Object Field for 3D Object Discovery	Corentin Dumery et.al.	2408.09928v1	null
2024-08-19	SAM-UNet: Enhancing Zero-Shot Segmentation of SAM for Universal Medical Images	Sihan Yang et.al.	2408.09886v1	link
2024-08-19	Segment-Anything Models Achieve Zero-shot Robustness in Autonomous Driving	Jun Yan et.al.	2408.09839v1	link
2024-08-16	DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models	Eman Ali et.al.	2408.08855v1	null
2024-08-16	ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis	Yubao Zhao et.al.	2408.08849v1	link
2024-08-16	EasyRec: Simple yet Effective Language Models for Recommendation	Xubin Ren et.al.	2408.08821v1	link
2024-08-16	ChatZero: Zero-shot Cross-Lingual Dialogue Generation via Pseudo-Target Language	Yongkang Liu et.al.	2408.08724v1	null
2024-08-16	TsCA: On the Semantic Consistency Alignment via Conditional Transport for Compositional Zero-Shot Learning	Miaoge Li et.al.	2408.08703v1	null
2024-08-16	A Mean Field Ansatz for Zero-Shot Weight Transfer	Xingyuan Chen et.al.	2408.08681v1	null
2024-08-16	GAPS: A Large and Diverse Classical Guitar Dataset and Benchmark Transcription Model	Xavier Riley et.al.	2408.08653v1	null
2024-08-16	Persona is a Double-edged Sword: Enhancing the Zero-shot Reasoning by Ensembling the Role-playing and Neutral Prompts	Junseok Kim et.al.	2408.08631v1	null
2024-08-16	Zero-Shot Dual-Path Integration Framework for Open-Vocabulary 3D Instance Segmentation	Tri Ton et.al.	2408.08591v1	null
2024-08-16	CommunityKG-RAG: Leveraging Community Structures in Knowledge Graphs for Advanced Retrieval-Augmented Generation in Fact-Checking	Rong-Ching Chang et.al.	2408.08535v1	null
2024-08-15	ScalingFilter: Assessing Data Quality through Inverse Utilization of Scaling Laws	Ruihang Li et.al.	2408.08310v1	null
2024-08-16	Covert Bias: The Severity of Social Views' Unalignment in Language Models Towards Implicit and Explicit Opinion	Abeer Aldayel et.al.	2408.08212v2	null
2024-08-15	Navigating Data Scarcity using Foundation Models: A Benchmark of Few-Shot and Zero-Shot Learning Approaches in Medical Imaging	Stefano Woerner et.al.	2408.08058v1	link
2024-08-15	LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning	Jiajie Li et.al.	2408.07981v1	null
2024-08-15	Advancing Multi-grained Alignment for Contrastive Language-Audio Pre-training	Yiming Li et.al.	2408.07919v1	link
2024-08-15	DM2RM: Dual-Mode Multimodal Ranking for Target Objects and Receptacles Based on Open-Vocabulary Instructions	Ryosuke Korekata et.al.	2408.07910v1	null
2024-08-15	A Spitting Image: Modular Superpixel Tokenization in Vision Transformers	Marius Aasan et.al.	2408.07680v2	link
2024-08-14	Enhanced Detection of Conversational Mental Manipulation Through Advanced Prompting Techniques	Ivory Yang et.al.	2408.07676v1	null
2024-08-14	SigmaRL: A Sample-Efficient and Generalizable Multi-Agent Reinforcement Learning Framework for Motion Planning	Jianye Xu et.al.	2408.07644v1	link
2024-08-14	Exploring Large-Scale Language Models to Evaluate EEG-Based Multimodal Data for Mental Health	Yongquan Hu et.al.	2408.07313v1	null
2024-08-14	MultiSurf-GPT: Facilitating Context-Aware Reasoning with Large-Scale Language Models for Multimodal Surface Sensing	Yongquan Hu et.al.	2408.07311v1	null
2024-08-14	GQE: Generalized Query Expansion for Enhanced Text-Video Retrieval	Zechen Bai et.al.	2408.07249v1	null
2024-08-13	Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents	Pranav Putta et.al.	2408.07199v1	null
2024-08-13	PSM: Learning Probabilistic Embeddings for Multi-scale Zero-Shot Soundscape Mapping	Subash Khanal et.al.	2408.07050v1	link
2024-08-15	Prompt-Based Segmentation at Multiple Resolutions and Lighting Conditions using Segment Anything Model 2	Osher Rafaeli et.al.	2408.06970v2	null
2024-08-13	How Aligned are Human Chart Takeaways and LLM Predictions? A Case Study on Bar Charts with Varying Layouts	Huichen Will Wang et.al.	2408.06837v1	null
2024-08-13	PRESENT: Zero-Shot Text-to-Prosody Control	Perry Lam et.al.	2408.06827v1	link
2024-08-13	Visual Neural Decoding via Improved Visual-EEG Semantic Consistency	Hongzhou Chen et.al.	2408.06788v1	null
2024-08-13	Do Vision-Language Foundational models show Robust Visual Perception?	Shivam Chandhok et.al.	2408.06781v1	link
2024-08-13	DC3DO: Diffusion Classifier for 3D Objects	Nursena Koprucu et.al.	2408.06693v1	link
2024-08-13	CROME: Cross-Modal Adapters for Efficient Multimodal LLM	Sayna Ebrahimi et.al.	2408.06610v1	null
2024-08-12	UniT: Unified Tactile Representation for Robot Learning	Zhengtong Xu et.al.	2408.06481v1	link
2024-08-12	From SAM to SAM 2: Exploring Improvements in Meta's Segment Anything Model	Athulya Sundaresan Geetha et.al.	2408.06305v1	null
2024-08-12	Synthetic Patient-Physician Dialogue Generation from Clinical Notes Using LLM	Trisha Das et.al.	2408.06285v1	null
2024-08-12	A RAG-Based Question-Answering Solution for Cyber-Attack Investigation and Attribution	Sampath Rajapaksha et.al.	2408.06272v1	null
2024-08-12	3D Reconstruction of Protein Structures from Multi-view AFM Images using Neural Radiance Fields (NeRFs)	Jaydeep Rade et.al.	2408.06244v1	null
2024-08-12	Zero-shot 3D Segmentation of Abdominal Organs in CT Scans Using Segment Anything Model 2: Adapting Video Tracking Capabilities for 3D Medical Imaging	Yosuke Yamagishi et.al.	2408.06170v1	null
2024-08-12	OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning	Mushui Liu et.al.	2408.06158v1	link
2024-08-12	Text2Interaction: Establishing Safe and Preferable Human-Robot Interaction	Jakob Thumm et.al.	2408.06105v1	link
2024-08-12	Towards Robust Monocular Depth Estimation in Non-Lambertian Surfaces	Junrui Zhang et.al.	2408.06083v1	null
2024-08-12	Perceptual Similarity for Measuring Decision-Making Style and Policy Diversity in Games	Chiu-Chou Lin et.al.	2408.06051v1	link
2024-08-12	Unseen No More: Unlocking the Potential of CLIP for Generative Zero-shot HOI Detection	Yixin Guo et.al.	2408.05974v1	link
2024-08-09	Enhancing the Code Debugging Ability of LLMs via Communicative Agent Based Data Refinement	Weiqing Yang et.al.	2408.05006v1	null
2024-08-09	SEA-SQL: Semantic-Enhanced Text-to-SQL with Adaptive Refinement	Chaofan Li et.al.	2408.04919v1	null
2024-08-09	Towards a Generative Approach for Emotion Detection and Reasoning	Ankita Bhaumik et.al.	2408.04906v1	null
2024-08-09	ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation	Mengcheng Lan et.al.	2408.04883v1	link
2024-08-09	On the Element-Wise Representation and Reasoning in Zero-Shot Image Recognition: A Systematic Survey	Jingcai Guo et.al.	2408.04879v1	link
2024-08-09	ChatGPT Meets Iris Biometrics	Parisa Farmanifard et.al.	2408.04868v1	null
2024-08-09	An Evaluation of Standard Statistical Models and LLMs on Time Series Forecasting	Rui Cao et.al.	2408.04867v1	link
2024-08-09	One Shot is Enough for Sequential Infrared Small Target Segmentation	Bingbing Dan et.al.	2408.04823v1	link
2024-08-09	FUSE-ing Language Models: Zero-Shot Adapter Discovery for Prompt Optimization Across Tokenizers	Joshua Nathaniel Williams et.al.	2408.04816v1	link
2024-08-08	Novel adaptation of video segmentation to 3D MRI: efficient zero-shot knee segmentation with SAM2	Andrew Seohwan Yu et.al.	2408.04762v1	null
2024-08-08	Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics	Ruining Li et.al.	2408.04631v1	null
2024-08-08	SAM 2 in Robotic Surgery: An Empirical Evaluation for Robustness and Generalization in Surgical Video Segmentation	Jieming Yu et.al.	2408.04593v1	null
2024-08-08	SCENE: Evaluating Explainable AI Techniques Using Soft Counterfactuals	Haoran Zheng et.al.	2408.04575v1	null
2024-08-08	Conversational Prompt Engineering	Liat Ein-Dor et.al.	2408.04560v1	null
2024-08-08	Depth Any Canopy: Leveraging Depth Foundation Models for Canopy Height Estimation	Daniele Rege Cambrin et.al.	2408.04523v1	link
2024-08-08	Model-Based Transfer Learning for Contextual Reinforcement Learning	Jung-Hoon Cho et.al.	2408.04498v1	link
2024-08-08	Towards Synergistic Deep Learning Models for Volumetric Cirrhotic Liver Segmentation in MRIs	Vandan Gorade et.al.	2408.04491v1	null
2024-08-08	KnowPC: Knowledge-Driven Programmatic Reinforcement Learning for Zero-shot Coordination	Yin Gu et.al.	2408.04336v1	null
2024-08-08	Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP	François Remy et.al.	2408.04303v1	link
2024-08-08	Learning to Rewrite: Generalized LLM-Generated Text Detection	Wei Hao et.al.	2408.04237v1	null
2024-08-07	Achieving Human Level Competitive Robot Table Tennis	David B. D'Ambrosio et.al.	2408.03906v1	null
2024-08-07	Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond	Beomseok Lee et.al.	2408.03900v1	link
2024-08-07	Unlocking Exocentric Video-Language Data for Egocentric Video Representation Learning	Zi-Yi Dou et.al.	2408.03567v1	null
2024-08-07	Leveraging LLMs for Enhanced Open-Vocabulary 3D Scene Understanding in Autonomous Driving	Amirhosein Chahe et.al.	2408.03516v1	null
2024-08-07	Accuracy and Consistency of LLMs in the Registered Dietitian Exam: The Impact of Prompt Engineering and Knowledge Retrieval	Iman Azimi et.al.	2408.02964v2	link
2024-08-06	Few-shot Scooping Under Domain Shift via Simulated Maximal Deployment Gaps	Yifan Zhu et.al.	2408.02949v1	null
2024-08-05	Interactive 3D Medical Image Segmentation with SAM 2	Chuyun Shen et.al.	2408.02635v1	link
2024-08-05	Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection	Ting Lei et.al.	2408.02484v1	link
2024-08-07	TGS: Trajectory Generation and Selection using Vision Language Models in Mapless Outdoor Environments	Daeun Song et.al.	2408.02454v2	null
2024-08-05	Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages	Carlos Mullov et.al.	2408.02290v1	null
2024-08-05	Geometric Algebra Meets Large Language Models: Instruction-Based Transformations of Separate Meshes in 3D, Interactive and Controllable Scenes	Dimitris Angelis et.al.	2408.02275v1	null
2024-08-05	Explain via Any Concept: Concept Bottleneck Model with Open Vocabulary Concepts	Andong Tan et.al.	2408.02265v1	null
2024-08-05	Evaluating Vision-Language Models for Zero-Shot Detection, Classification, and Association of Motorcycles, Passengers, and Helmets	Lucas Choi et.al.	2408.02244v1	null
2024-08-05	Do Large Language Models Speak All Languages Equally? A Comparative Study in Low-Resource Settings	Md. Arid Hasan et.al.	2408.02237v1	null
2024-08-05	ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning	Yuxuan Wang et.al.	2408.02210v1	null
2024-08-05	Large-scale Deployment of Vision-based Tactile Sensors on Multi-fingered Grippers	Meng Wang et.al.	2408.02206v1	null
2024-08-02	Improving Multilingual Neural Machine Translation by Utilizing Semantic and Linguistic Features	Mengyu Bu et.al.	2408.01394v1	link
2024-08-02	Toward Automatic Relevance Judgment using Vision-Language Models for Image-Text Retrieval Evaluation	Jheng-Hong Yang et.al.	2408.01363v1	null
2024-08-02	Prompt Refinement or Fine-tuning? Best Practices for using LLMs in Computational Social Science Tasks	Anders Giovanni Møller et.al.	2408.01346v1	null
2024-08-02	Out-Of-Distribution Detection for Audio-visual Generalized Zero-Shot Learning: A General Framework	Liuyuan Wen et.al.	2408.01284v1	link
2024-08-02	HeteroMorpheus: Universal Control Based on Morphological Heterogeneity Modeling	YiFan Hao et.al.	2408.01230v1	link
2024-08-05	Agentic LLM Workflows for Generating Patient-Friendly Medical Reports	Malavikha Sudarshan et.al.	2408.01112v2	link
2024-08-02	An Encoding-Searching Separation Perspective on Bi-Encoder Neural Search	Hung-Nghiep Tran et.al.	2408.01094v1	null
2024-08-02	UNER: A Unified Prediction Head for Named Entity Recognition in Visually-rich Documents	Yi Tu et.al.	2408.01038v1	null
2024-08-01	Towards Zero-Shot Annotation of the Built Environment with Vision-Language Models (Vision Paper)	Bin Han et.al.	2408.00932v1	null
2024-08-01	Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation	Siyu Jiao et.al.	2408.00744v1	link
2024-08-01	Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions	Guangzhi Xiong et.al.	2408.00727v1	link
2024-08-01	SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data	Yichen Lu et.al.	2408.00624v1	link
2024-08-01	A new approach for encoding code and assisting code understanding	Mengdan Fan et.al.	2408.00521v1	null
2024-08-01	GalleryGPT: Analyzing Paintings with Large Multimodal Models	Yi Bin et.al.	2408.00491v1	link
2024-08-01	SF-TIM: A Simple Framework for Enhancing Quadrupedal Robot Jumping Agility by Combining Terrain Imagination and Measurement	Ze Wang et.al.	2408.00486v1	null
2024-08-01	Few-shot Defect Image Generation based on Consistency Modeling	Qingfeng Shi et.al.	2408.00372v1	link
2024-08-01	IN-Sight: Interactive Navigation through Sight	Philipp Schoch et.al.	2408.00343v1	null
2024-07-31	Open-Vocabulary Audio-Visual Semantic Segmentation	Ruohao Guo et.al.	2407.21721v1	null
2024-07-31	Zero-Shot Cross-Domain Dialogue State Tracking via Dual Low-Rank Adaptation	Xiang Luo et.al.	2407.21633v1	link
2024-07-31	EZSR: Event-based Zero-Shot Recognition	Yan Yang et.al.	2407.21616v1	null
2024-07-31	Fine-gained Zero-shot Video Sampling	Dengsheng Chen et.al.	2407.21475v1	null
2024-07-31	Generalized Tampered Scene Text Detection in the era of Generative AI	Chenfan Qu et.al.	2407.21422v1	null
2024-07-31	Tree-of-Traversals: A Zero-Shot Reasoning Algorithm for Augmenting Black-box Language Models with Knowledge Graphs	Elan Markowitz et.al.	2407.21358v1	link
2024-07-31	DEF-oriCORN: efficient 3D scene understanding for robust language-directed manipulation without demonstrations	Dongwon Son et.al.	2407.21267v1	null
2024-07-30	Learning Stable Robot Grasping with Transformer-based Tactile Control Policies	En Yen Puang et.al.	2407.21172v1	link
2024-07-30	Zero Shot Health Trajectory Prediction Using Transformer	Pawel Renc et.al.	2407.21124v1	link
2024-07-30	Prompting Encoder Models for Zero-Shot Classification: A Cross-Domain Study in Italian	Serena Auriemma et.al.	2407.20654v1	null
2024-07-30	Pruning Large Language Models with Semi-Structural Adaptive Sparse Training	Weiyu Huang et.al.	2407.20584v1	link
2024-07-29	Evaluating Large Language Models for automatic analysis of teacher simulations	David de-Fitero-Dominguez et.al.	2407.20360v1	null
2024-07-29	Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing	Ekaterina Iakovleva et.al.	2407.20232v1	null
2024-07-29	QAEA-DR: A Unified Text Augmentation Framework for Dense Retrieval	Hongming Tan et.al.	2407.20207v1	null
2024-07-29	Diffusion Feedback Helps CLIP See Better	Wenxuan Wang et.al.	2407.20171v1	link
2024-07-29	Generative Diffusion Model Bootstraps Zero-shot Classification of Fetal Ultrasound Images In Underrepresented African Populations	Fangyijie Wang et.al.	2407.20072v1	link
2024-07-29	Leveraging Foundation Models for Zero-Shot IoT Sensing	Dinghao Xue et.al.	2407.19893v1	link
2024-07-29	Map2Traj: Street Map Piloted Zero-shot Trajectory Generation with Diffusion Model	Zhenyu Tao et.al.	2407.19765v1	null
2024-07-29	Enhancing Code Translation in Language Models with Few-Shot Learning via Retrieval-Augmented Generation	Manish Bhattarai et.al.	2407.19619v1	null
2024-07-29	AgEval: A Benchmark for Zero-Shot and Few-Shot Plant Stress Phenotyping with Multimodal LLMs	Muhammad Arbab Arshad et.al.	2407.19617v1	null
2024-07-28	XLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training	Biao Wu et.al.	2407.19546v1	link
2024-07-28	Motamot: A Dataset for Revealing the Supremacy of Large Language Models over Transformer Models in Bengali Political Sentiment Analysis	Fatema Tuj Johora Faria et.al.	2407.19528v1	link
2024-07-26	Automatic Detection of Moral Values in Music Lyrics	Vjosa Preniqi et.al.	2407.18787v1	link
2024-07-26	Adversarial Robustification via Text-to-Image Diffusion Models	Daewon Choi et.al.	2407.18658v1	link
2024-07-29	Speech Bandwidth Expansion Via High Fidelity Generative Adversarial Networks	Mahmoud Salhab et.al.	2407.18571v2	null
2024-07-26	Is larger always better? Evaluating and prompting large language models for non-generative medical tasks	Yinghao Zhu et.al.	2407.18525v1	link
2024-07-26	Lensless fiber endomicroscopic phase imaging with speckle-conditioned diffusion model	Zhaoqing Chen et.al.	2407.18456v1	null
2024-07-26	HybridDepth: Robust Depth Fusion for Mobile AR by Leveraging Depth from Focus and Single-Image Priors	Ashkan Ganj et.al.	2407.18443v1	link
2024-07-25	HDL-GPT: High-Quality HDL is All You Need	Bhuvnesh Kumar et.al.	2407.18423v1	null
2024-07-25	Adapting Mouse Pathological Model to Human Glomerular Lesion Segmentation	Lining Yu et.al.	2407.18390v1	null
2024-07-25	Robust Claim Verification Through Fact Detection	Nazanin Jafari et.al.	2407.18367v1	link
2024-07-25	SSTD: Stripe-Like Space Target Detection using Single-Point Supervision	Zijian Zhu et.al.	2407.18097v1	null
2024-07-25	Audio Entailment: Assessing Deductive Reasoning for Audio Understanding	Soham Deshmukh et.al.	2407.18062v1	link
2024-07-25	Difficulty Estimation and Simplification of French Text Using LLMs	Henri Jamet et.al.	2407.18061v1	null
2024-07-25	I can listen but cannot read: An evaluation of two-tower multimodal systems for instrument recognition	Yannis Vasilakis et.al.	2407.18058v1	link
2024-07-25	Amortized Active Learning for Nonparametric Functions	Cen-You Li et.al.	2407.17992v1	null
2024-07-25	BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation	Xiang Zhang et.al.	2407.17952v1	null
2024-07-25	DAM: Towards A Foundation Model for Time Series Forecasting	Luke Darlow et.al.	2407.17880v1	null
2024-07-25	Exploring Description-Augmented Dataless Intent Classification	Ruoyu Hu et.al.	2407.17862v1	link
2024-07-25	Scaling A Simple Approach to Zero-Shot Speech Recognition	Jinming Zhao et.al.	2407.17852v1	link
2024-07-24	Large Language Models for Anomaly Detection in Computational Workflows: from Supervised Fine-Tuning to In-Context Learning	Hongwei Jin et.al.	2407.17545v1	link
2024-07-24	3D Question Answering for City Scene Understanding	Penglei Sun et.al.	2407.17398v1	null
2024-07-24	Label Alignment and Reassignment with Generalist Large Language Model for Enhanced Cross-Domain Named Entity Recognition	Ke Bao et.al.	2407.17344v1	null
2024-07-24	Multi-label Cluster Discrimination for Visual Representation Learning	Xiang An et.al.	2407.17331v1	link
2024-07-24	DarSwin-Unet: Distortion Aware Encoder-Decoder Architecture	Akshaya Athwale et.al.	2407.17328v1	null
2024-07-24	Graph Neural Networks: A suitable Alternative to MLPs in Latent 3D Medical Image Classification?	Johannes Kiechle et.al.	2407.17219v1	link
2024-07-24	Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech SpeechT5 Model	Jan Lehečka et.al.	2407.17167v1	null
2024-07-23	PlantTrack: Task-Driven Plant Keypoint Tracking with Zero-Shot Sim2Real Transfer	Samhita Marri et.al.	2407.16829v1	null
2024-07-23	Fusion and Cross-Modal Transfer for Zero-Shot Human Action Recognition	Abhi Kamboj et.al.	2407.16803v1	null
2024-07-23	Category-Extensible Out-of-Distribution Detection via Hierarchical Context Descriptions	Kai Liu et.al.	2407.16725v1	link
2024-07-23	Lawma: The Power of Specialization for Legal Tasks	Ricardo Dominguez-Olmedo et.al.	2407.16615v1	null
2024-07-23	Multimodal Unlearnable Examples: Protecting Data against Multimodal Contrastive Learning	Xinwei Liu et.al.	2407.16307v1	link
2024-07-23	PreAlign: Boosting Cross-Lingual Transfer by Early Establishment of Multilingual Alignment	Jiahuan Li et.al.	2407.16222v1	link
2024-07-23	No Re-Train, More Gain: Upgrading Backbones with Diffusion Model for Few-Shot Segmentation	Shuai Chen et.al.	2407.16182v1	null
2024-07-23	Improved Few-Shot Image Classification Through Multiple-Choice Questions	Dipika Khullar et.al.	2407.16145v1	null
2024-07-22	Test-Time Low Rank Adaptation via Confidence Maximization for Zero-Shot Generalization of Vision-Language Models	Raza Imam et.al.	2407.15913v1	link
2024-07-22	AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description	Junyu Xie et.al.	2407.15850v1	link
2024-07-22	Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget	Vikash Sehwag et.al.	2407.15811v1	null
2024-07-22	AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection	Yunkang Cao et.al.	2407.15795v1	link
2024-07-22	CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning	Emanuele Frascaroli et.al.	2407.15793v1	link
2024-07-22	Zero-Shot Embeddings Inform Learning and Forgetting with Vision-Language Encoders	Laura Niss et.al.	2407.15731v1	null
2024-07-23	Multi-Modality Co-Learning for Efficient Skeleton-based Action Recognition	Jinfu Liu et.al.	2407.15706v2	link
2024-07-22	SLVideo: A Sign Language Video Moment Retrieval Framework	Gonçalo Vinagre Martins et.al.	2407.15668v1	null
2024-07-23	Visual-Semantic Decomposition and Partial Alignment for Document-based Zero-Shot Learning	Xiangyan Qu et.al.	2407.15613v2	link
2024-07-22	High-flexibility reconstruction of small-scale motions in wall turbulence using a generalized zero-shot learning	Haokai Wu et.al.	2407.15604v1	null
2024-07-22	X-Recon: Learning-based Patient-specific High-Resolution CT Reconstruction from Orthogonal X-Ray Images	Yunpeng Wang et.al.	2407.15356v1	link
2024-07-19	Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models	Xuenan Xu et.al.	2407.14355v1	link
2024-07-19	Multimodal Misinformation Detection using Large Vision-Language Models	Sahar Tahmasebi et.al.	2407.14321v1	null
2024-07-19	Foundation Models for Autonomous Robots in Unstructured Environments	Hossein Naderi et.al.	2407.14296v1	null
2024-07-19	OpenSU3D: Open World 3D Scene Understanding using Foundation Models	Rafay Mohiuddin et.al.	2407.14279v1	null
2024-07-19	ESP-MedSAM: Efficient Self-Prompting SAM for Universal Domain-Generalized Medical Image Segmentation	Qing Xu et.al.	2407.14153v1	link
2024-07-19	Zero-Shot Underwater Gesture Recognition	Sandipan Sarma et.al.	2407.14103v1	link
2024-07-19	Multi-modal Relation Distillation for Unified 3D Representation Learning	Huiqun Wang et.al.	2407.14007v1	null
2024-07-19	Enhancing Data-Limited Graph Neural Networks by Actively Distilling Knowledge from Large Language Models	Quan Li et.al.	2407.13989v1	null
2024-07-18	Attention Based Simple Primitives for Open World Compositional Zero-Shot Learning	Ans Munir et.al.	2407.13715v1	link
2024-07-18	MeshSegmenter: Zero-Shot Mesh Semantic Segmentation via Texture Synthesis	Ziming Zhong et.al.	2407.13675v1	link
2024-07-18	Robust Calibration of Large Vision-Language Adapters	Balamurali Murugesan et.al.	2407.13588v1	link
2024-07-18	Towards Zero-Shot Multimodal Machine Translation	Matthieu Futeral et.al.	2407.13579v1	link
2024-07-18	Pushing the Limits of Reactive Planning: Learning to Escape Local Minima	Isar Meijer et.al.	2407.13530v1	null
2024-07-18	INDIC QA BENCHMARK: A Multilingual Benchmark to Evaluate Question Answering capability of LLMs for Indic Languages	Abhishek Kumar Singh et.al.	2407.13522v1	null
2024-07-18	Can Open-Source LLMs Compete with Commercial Models? Exploring the Few-Shot Performance of Current GPT Models in Biomedical Tasks	Samy Ateia et.al.	2407.13511v1	link
2024-07-18	SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders	Sheng-Wei Li et.al.	2407.13460v1	link
2024-07-18	BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models	Moon Ye-Bin et.al.	2407.13442v1	null
2024-07-18	Open-World Visual Reasoning by a Neuro-Symbolic Program of Zero-Shot Symbols	Gertjan Burghouts et.al.	2407.13382v1	null
2024-07-17	Zero-shot Text-guided Infinite Image Synthesis with LLM guidance	Soyeong Kwon et.al.	2407.12642v1	null
2024-07-17	Evaluating the transferability potential of deep learning models for climate downscaling	Ayush Prasad et.al.	2407.12517v1	null
2024-07-17	Evaluating Linguistic Capabilities of Multimodal LLMs in the Lens of Few-Shot Learning	Mustafa Dogan et.al.	2407.12498v1	null
2024-07-17	TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish	Arda Yüksel et.al.	2407.12402v1	link
2024-07-17	Exploring Deeper! Segment Anything Model with Depth Perception for Camouflaged Object Detection	Zhenni Yu et.al.	2407.12339v1	link
2024-07-17	ModalChorus: Visual Probing and Alignment of Multi-modal Embeddings via Modal Fusion Map	Yilin Ye et.al.	2407.12315v1	link
2024-07-17	VCP-CLIP: A visual context prompting model for zero-shot anomaly segmentation	Zhen Qu et.al.	2407.12276v1	link
2024-07-17	Compound Expression Recognition via Multi Model Ensemble for the ABAW7 Challenge	Xuxiong Liu et.al.	2407.12257v1	null
2024-07-17	Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech	Haibin Wu et.al.	2407.12229v1	link
2024-07-16	Scaling Sign Language Translation	Biao Zhang et.al.	2407.11855v1	null
2024-07-16	Zero-shot Cross-Lingual Transfer for Synthetic Data Generation in Grammatical Error Detection	Gaetan Lopez Latouche et.al.	2407.11854v1	null
2024-07-16	Mask-guided cross-image attention for zero-shot in-silico histopathologic image generation with a diffusion model	Dominik Winter et.al.	2407.11664v1	null
2024-07-16	A Comprehensive Evaluation of Large Language Models on Temporal Event Forecasting	He Chang et.al.	2407.11638v1	null
2024-07-16	DiNO-Diffusion. Scaling Medical Diffusion via Self-Supervised Pre-Training	Guillermo Jimenez-Perez et.al.	2407.11594v1	null
2024-07-16	Bootstrapped Pre-training with Dynamic Identifier Prediction for Generative Retrieval	Yubao Tang et.al.	2407.11504v1	null
2024-07-16	Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes	Zhi Cai et.al.	2407.11464v1	link
2024-07-16	InvAgent: A Large Language Model based Multi-Agent System for Inventory Management in Supply Chains	Yinzhu Quan et.al.	2407.11384v1	link
2024-07-16	Large Vision-Language Models as Emotion Recognizers in Context Awareness	Yuxuan Lei et.al.	2407.11300v1	null
2024-07-16	Zero-Shot Adaptation for Approximate Posterior Sampling of Diffusion Models in Inverse Problems	Yaşar Utku Alçalar et.al.	2407.11288v1	null
2024-07-15	Motion-prior Contrast Maximization for Dense Continuous-Time Motion Estimation	Friedhelm Hamann et.al.	2407.10802v1	link
2024-07-15	Graphusion: Leveraging Large Language Models for Scientific Knowledge Graph Fusion and Construction in NLP Education	Rui Yang et.al.	2407.10794v1	link
2024-07-15	Codebook LLMs: Adapting Political Science Codebooks for LLM Use and Adapting LLMs to Follow Codebooks	Andrew Halterman et.al.	2407.10747v1	null
2024-07-15	Anticipating Future Object Compositions without Forgetting	Youssef Zahran et.al.	2407.10723v1	null
2024-07-16	Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning	Yulong Wang et.al.	2407.10718v2	link
2024-07-15	MixGR: Enhancing Retriever Generalization for Scientific Domain through Complementary Granularity	Fengyu Cai et.al.	2407.10691v1	link
2024-07-15	OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer	Yu Wang et.al.	2407.10655v1	link
2024-07-16	Back to Newton's Laws: Learning Vision-based Agile Flight via Differentiable Physics	Yuang Zhang et.al.	2407.10648v2	null
2024-07-15	Towards zero-shot amplifier modeling: One-to-many amplifier modeling via tone embedding control	Yu-Hua Chen et.al.	2407.10646v1	null
2024-07-15	Boosting Zero-Shot Crosslingual Performance using LLM-Based Augmentations with Effective Data Selection	Barah Fazili et.al.	2407.10582v1	link
2024-07-12	Adaptive Prediction Ensemble: Improving Out-of-Distribution Generalization of Motion Forecasting	Jinning Li et.al.	2407.09475v1	null
2024-07-12	From Easy to Hard: Learning Curricular Shape-aware Features for Robust Panoptic Scene Graph Generation	Hanrong Shi et.al.	2407.09191v1	null
2024-07-12	STD-LLM: Understanding Both Spatial and Temporal Properties of Spatial-Temporal Data with LLMs	Yiheng Huang et.al.	2407.09096v1	null
2024-07-12	OVExp: Open Vocabulary Exploration for Object-Oriented Navigation	Meng Wei et.al.	2407.09016v1	null
2024-07-15	Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation	Biqing Qi et.al.	2407.08940v2	link
2024-07-11	DegustaBot: Zero-Shot Visual Preference Estimation for Personalized Multi-Object Rearrangement	Benjamin A. Newman et.al.	2407.08876v1	null
2024-07-11	Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification	Wenshuo Peng et.al.	2407.08787v1	null
2024-07-11	Real-Time Anomaly Detection and Reactive Planning with Large Language Models	Rohan Sinha et.al.	2407.08735v1	null
2024-07-11	Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data	Cherie Ho et.al.	2407.08726v1	null
2024-07-11	HACMan++: Spatially-Grounded Motion Primitives for Manipulation	Bowen Jiang et.al.	2407.08585v1	null
2024-07-11	Tactics, Techniques, and Procedures (TTPs) in Interpreted Malware: A Zero-Shot Generation with Large Language Models	Ying Zhang et.al.	2407.08532v1	null
2024-07-11	Emergent Visual-Semantic Hierarchies in Image-Text Representations	Morris Alper et.al.	2407.08521v1	link
2024-07-11	Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Cross-Regularization	Jinlong Li et.al.	2407.08374v1	null
2024-07-11	Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation	Tong Shao et.al.	2407.08268v1	link
2024-07-11	Adaptive Compressed Sensing with Diffusion-Based Posterior Sampling	Noam Elata et.al.	2407.08256v1	null
2024-07-11	Leveraging LLMs to Predict Affective States via Smartphone Sensor Features	Tianyi Zhang et.al.	2407.08240v1	null
2024-07-11	Spiking Tucker Fusion Transformer for Audio-Visual Zero-Shot Learning	Wenrui Li et.al.	2407.08130v1	null
2024-07-10	Learning In-Hand Translation Using Tactile Skin With Shear and Normal Force Sensing	Jessica Yin et.al.	2407.07885v1	null
2024-07-11	Toto: Time Series Optimized Transformer for Observability	Ben Cohen et.al.	2407.07874v2	null
2024-07-10	OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion	Hao Wang et.al.	2407.07844v1	link
2024-07-10	Systematic Evaluation of Neural Retrieval Models on the Touché 2020 Argument Retrieval Subset of BEIR	Nandan Thakur et.al.	2407.07790v1	link
2024-07-11	SaMoye: Zero-shot Singing Voice Conversion Based on Feature Disentanglement and Synthesis	Zihao Wang et.al.	2407.07728v2	link
2024-07-10	Scaling Law in Neural Data: Non-Invasive Speech Decoding with 175 Hours of EEG Data	Motoshige Sato et.al.	2407.07595v1	null
2024-07-10	Let Occ Flow: Self-Supervised 3D Occupancy Flow Prediction	Yili Liu et.al.	2407.07587v1	null
2024-07-11	InstructLayout: Instruction-Driven 2D and 3D Layout Synthesis with Semantic Graph Prior	Chenguo Lin et.al.	2407.07580v2	null
2024-07-10	Swiss DINO: Efficient and Versatile Vision Framework for On-device Personal Object Search	Kirill Paramonov et.al.	2407.07541v1	link
2024-07-10	IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection	Mingjin Zhang et.al.	2407.07520v1	link
2024-07-09	Induction Heads as an Essential Mechanism for Pattern Matching in In-context Learning	J. Crosbie et.al.	2407.07011v1	null
2024-07-09	Measuring Sustainability Intention of ESG Fund Disclosure using Few-Shot Learning	Mayank Singh et.al.	2407.06893v1	null
2024-07-09	Rethinking Image-to-Video Adaptation: An Object-centric Perspective	Rui Qian et.al.	2407.06871v1	null
2024-07-09	PDEformer-1: A Foundation Model for One-Dimensional Partial Differential Equations	Zhanhong Ye et.al.	2407.06664v1	null
2024-07-09	Variational Zero-shot Multispectral Pansharpening	Xiangyu Rui et.al.	2407.06633v1	link
2024-07-09	CEIA: CLIP-Based Event-Image Alignment for Open-World Event-Based Understanding	Wenhao Xu et.al.	2407.06611v1	null
2024-07-09	VQA-Diff: Exploiting VQA and Diffusion for Zero-Shot Image-to-3D Vehicle Asset Generation in Autonomous Driving	Yibo Liu et.al.	2407.06516v1	null
2024-07-08	CodeCSE: A Simple Multilingual Model for Code and Comment Sentence Embeddings	Anthony Varkey et.al.	2407.06360v1	link
2024-07-08	CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation	Xinying Guo et.al.	2407.06188v1	null
2024-07-08	C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition	Rongchang Li et.al.	2407.06113v1	link
2024-07-08	Pseudo-triplet Guided Few-shot Composed Image Retrieval	Bohan Hou et.al.	2407.06001v1	null
2024-07-08	Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation	Jiaqi Chen et.al.	2407.05890v1	null
2024-07-08	HyCIR: Boosting Zero-Shot Composed Image Retrieval with Synthetic Labels	Yingying Jiang et.al.	2407.05795v1	null
2024-07-08	When is the consistent prediction likely to be a correct prediction?	Alex Nguyen et.al.	2407.05778v1	null
2024-07-08	Is GPT-4 Alone Sufficient for Automated Essay Scoring?: A Comparative Judgment Approach Based on Rater Cognition	Seungju Kim et.al.	2407.05733v1	null
2024-07-08	Learning to Adapt Category Consistent Meta-Feature of CLIP for Few-Shot Classification	Jiaying Shi et.al.	2407.05647v1	null
2024-07-08	GenFollower: Enhancing Car-Following Prediction with Large Language Models	Xianda Chen et.al.	2407.05611v1	null
2024-07-08	Open-world Multi-label Text Classification with Extremely Weak Supervision	Xintong Li et.al.	2407.05609v1	link
2024-07-05	LaRa: Efficient Large-Baseline Radiance Fields	Anpei Chen et.al.	2407.04699v1	null
2024-07-05	ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models	Yuzhe Gu et.al.	2407.04693v1	link
2024-07-05	RAM: Retrieval-Based Affordance Transfer for Generalizable Zero-Shot Robotic Manipulation	Yuxuan Kuang et.al.	2407.04689v1	link
2024-07-05	Entity Decomposition with Filtering: A Zero-Shot Clinical Named Entity Recognition Framework	Reza Averly et.al.	2407.04629v1	null
2024-07-05	AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation	Yuhan Zhu et.al.	2407.04603v1	link
2024-07-05	GPT vs RETRO: Exploring the Intersection of Retrieval and Parameter-Efficient Fine-Tuning	Aleksander Ficek et.al.	2407.04528v1	null
2024-07-05	AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents	Petr Anokhin et.al.	2407.04363v1	link
2024-07-05	Elevating All Zero-Shot Sketch-Based Image Retrieval Through Multimodal Prompt Learning	Mainak Singha et.al.	2407.04207v1	link
2024-07-04	Query-Guided Self-Supervised Summarization of Nursing Notes	Ya Gao et.al.	2407.04125v1	null
2024-07-04	FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs	Tongyi SpeechTeam et.al.	2407.04051v1	link
2024-07-03	Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation	Marco Mistretta et.al.	2407.03056v1	link
2024-07-03	SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning	Bac Nguyen et.al.	2407.03036v1	null
2024-07-03	FSM: A Finite State Machine Based Zero-Shot Prompting Paradigm for Multi-Hop Question Answering	Xiaochen Wang et.al.	2407.02964v1	null
2024-07-03	LANE: Logic Alignment of Non-tuning Large Language Models and Online Recommendation Systems for Explainable Reason Generation	Hongke Zhao et.al.	2407.02833v1	null
2024-07-03	ZEAL: Surgical Skill Assessment with Zero-shot Tool Inference Using Unified Foundation Model	Satoshi Kondo et.al.	2407.02738v1	null
2024-07-02	LLM-Select: Feature Selection with Large Language Models	Daniel P. Jeong et.al.	2407.02694v1	null
2024-07-02	Open Panoramic Segmentation	Junwei Zheng et.al.	2407.02685v1	link
2024-07-02	Lung-CADex: Fully automatic Zero-Shot Detection and Classification of Lung Nodules in Thoracic CT Images	Furqan Shaukat et.al.	2407.02625v1	null
2024-07-02	Open Scene Graphs for Open World Object-Goal Navigation	Joel Loo et.al.	2407.02473v1	null
2024-07-02	SafaRi:Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation	Sayan Nag et.al.	2407.02389v1	null
2024-07-02	Exploring the Role of Transliteration in In-Context Learning for Low-resource Languages Written in Non-Latin Scripts	Chunlan Ma et.al.	2407.02320v1	null
2024-07-02	Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization	Yuchen Hu et.al.	2407.02243v1	null
2024-07-02	FineCLIPER: Multi-modal Fine-grained CLIP for Dynamic Facial Expression Recognition with AdaptERs	Haodong Chen et.al.	2407.02157v1	null
2024-07-02	Zero-shot Video Restoration and Enhancement Using Pre-Trained Image Diffusion Model	Cong Cao et.al.	2407.01960v1	null
2024-07-02	Text-Aware Diffusion for Policy Learning	Calvin Luo et.al.	2407.01903v1	null
2024-07-01	DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models	Chang-Han Yeh et.al.	2407.01519v1	link
2024-07-01	Semantic Compositions Enhance Vision-Language Contrastive Learning	Maxwell Aladago et.al.	2407.01408v1	null
2024-07-01	PanopticRecon: Leverage Open-vocabulary Instance Segmentation for Zero-shot Panoptic Reconstruction	Xuan Yu et.al.	2407.01349v1	null
2024-06-28	STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical	Guohao Sun et.al.	2406.19973v1	link
2024-06-28	Perception Stitching: Zero-Shot Perception Encoder Transfer for Visuomotor Robot Policies	Pingcheng Jian et.al.	2406.19971v1	null
2024-06-28	Untangling the Unrestricted Web: Automatic Identification of Multilingual Registers	Erik Henriksson et.al.	2406.19892v1	link
2024-06-28	Detecting Subtle Differences between Human and Model Languages Using Spectrum of Relative Likelihood	Yang Xu et.al.	2406.19874v1	link
2024-06-27	Leveraging Machine-Generated Rationales to Facilitate Social Meaning Detection in Conversations	Ritam Dutt et.al.	2406.19545v1	link
2024-06-27	The Model Arena for Cross-lingual Sentiment Analysis: A Comparative Study in the Era of Large Language Models	Xiliang Zhu et.al.	2406.19358v1	null
2024-06-27	IndoToxic2024: A Demographically-Enriched Dataset of Hate Speech and Toxicity Types for Indonesian Language	Lucky Susanto et.al.	2406.19349v1	null
2024-06-27	Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment	Hao Fei et.al.	2406.19255v1	null
2024-06-30	Segment Anything Model for automated image data annotation: empirical studies using text prompts from Grounding DINO	Fuseini Mumuni et.al.	2406.19057v2	null
2024-06-27	Zero-shot domain adaptation based on dual-level mix and contrast	Yu Zhe et.al.	2406.18996v1	null
2024-06-28	Manipulate-Anything: Automating Real-World Robots using Vision-Language Models	Jiafei Duan et.al.	2406.18915v2	null
2024-06-27	DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment	Ke-Han Lu et.al.	2406.18871v1	null
2024-06-27	Advancing Cross-domain Discriminability in Continual Learning of Vison-Language Models	Yicheng Xu et.al.	2406.18868v1	link
2024-06-27	Dense Monocular Motion Segmentation Using Optical Flow and Pseudo Depth Map: A Zero-Shot Approach	Yuxiang Huang et.al.	2406.18837v1	null
2024-06-27	Zero-shot Composed Image Retrieval Considering Query-target Relationship Leveraging Masked Image-text Pairs	Huaying Zhang et.al.	2406.18836v1	null
2024-06-26	Role-Play Zero-Shot Prompting with Large Language Models for Open-Domain Human-Machine Conversation	Ahmed Njifenjou et.al.	2406.18460v1	null
2024-06-26	Zero-shot prompt-based classification: topic labeling in times of foundation models in German Tweets	Simon Münker et.al.	2406.18239v1	null
2024-06-26	Open-vocabulary Mobile Manipulation in Unseen Dynamic Environments with 3D Semantic Maps	Dicong Qiu et.al.	2406.18115v1	null
2024-06-26	Boosting Soft Q-Learning by Bounding	Jacob Adamczyk et.al.	2406.18033v1	link
2024-06-26	E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS	Sefik Emre Eskimez et.al.	2406.18009v1	link
2024-06-26	Changen2: Multi-Temporal Remote Sensing Generative Change Foundation Model	Zhuo Zheng et.al.	2406.17998v1	link
2024-06-25	Evaluating Fairness in Large Vision-Language Models Across Diverse Demographic Attributes and Prompts	Xuyang Wu et.al.	2406.17974v1	link
2024-06-25	Efficient Document Ranking with Learnable Late Interactions	Ziwei Ji et.al.	2406.17968v1	null
2024-06-25	The Overcooked Generalisation Challenge	Constantin Ruhdorfer et.al.	2406.17949v1	null
2024-06-25	CTBench: A Comprehensive Benchmark for Evaluating Language Model Capabilities in Clinical Trial Design	Nafis Neehal et.al.	2406.17888v1	link
2024-06-25	Arboretum: A Large Multimodal Dataset Enabling AI for Biodiversity	Chih-Hsuan Yang et.al.	2406.17720v1	link
2024-06-25	LaTable: Towards Large Tabular Models	Boris van Breugel et.al.	2406.17673v1	null
2024-06-26	SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond	Marco Comunità et.al.	2406.17672v2	null
2024-06-26	Mitigate the Gap: Investigating Approaches for Improving Cross-Modal Alignment in CLIP	Sedigheh Eslami et.al.	2406.17639v2	link
2024-06-25	Advancing Cell Detection in Anterior Segment Optical Coherence Tomography Images	Boyu Chen et.al.	2406.17577v1	link
2024-06-25	High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model	Joun Yeop Lee et.al.	2406.17310v1	null
2024-06-25	Zero-Shot Long-Form Video Understanding through Screenplay	Yongliang Wu et.al.	2406.17309v1	null
2024-06-24	CLERC: A Dataset for Legal Case Retrieval and Retrieval-Augmented Analysis Generation	Abe Bohan Hou et.al.	2406.17186v1	link
2024-06-24	Multi-LogiEval: Towards Evaluating Multi-Step Logical Reasoning Ability of Large Language Models	Nisarg Patel et.al.	2406.17169v1	link
2024-06-24	Vastextures: Vast repository of textures and PBR materials extracted from real-world images using unsupervised methods	Sagi Eppel et.al.	2406.17146v1	null
2024-06-24	Can Quantum Computers Do Nothing?	Alexander Nico-Katz et.al.	2406.16861v1	null
2024-06-24	USDC: A Dataset of $\underline{U}$ser $\underline{S}$tance and $\underline{D}$ogmatism in Long $\underline{C}$onversations	Mounika Marreddy et.al.	2406.16833v1	null
2024-06-25	Towards Zero-Shot Text-To-Speech for Arabic Dialects	Khai Duy Doan et.al.	2406.16751v2	null
2024-06-24	Evaluation of Language Models in the Medical Context Under Resource-Constrained Settings	Andrea Posada et.al.	2406.16611v1	link
2024-06-24	eagerlearners at SemEval2024 Task 5: The Legal Argument Reasoning Task in Civil Procedure	Hoorieh Sabzevari et.al.	2406.16490v1	link
2024-06-24	UniPSDA: Unsupervised Pseudo Semantic Data Augmentation for Zero-Shot Cross-Lingual Natural Language Understanding	Dongyang Li et.al.	2406.16372v1	link
2024-06-24	EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records	Yeonsu Kwon et.al.	2406.16341v1	link
2024-06-24	DemoRank: Selecting Effective Demonstrations for Large Language Models in Ranking Task	Wenhan Liu et.al.	2406.16332v1	link
2024-06-24	Anomaly Detection of Tabular Data Using LLMs	Aodong Li et.al.	2406.16308v1	null
2024-06-24	LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments	Zixia Jia et.al.	2406.16294v1	link
2024-06-21	Masked Extended Attention for Zero-Shot Virtual Try-On In The Wild	Nadav Orzech et.al.	2406.15331v1	null
2024-06-21	LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs	Ziyan Jiang et.al.	2406.15319v1	null
2024-06-21	Retrieval Augmented Zero-Shot Text Classification	Tassallah Abdullahi et.al.	2406.15241v1	link
2024-06-21	A LLM-Based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation	Irune Zubiaga et.al.	2406.15227v1	link
2024-06-21	How Effective is GPT-4 Turbo in Generating School-Level Questions from Textbooks Based on Bloom's Revised Taxonomy?	Subhankar Maity et.al.	2406.15211v1	null
2024-06-21	Prompting Whisper for QA-driven Zero-shot End-to-end Spoken Language Understanding	Mohan Li et.al.	2406.15209v1	null
2024-06-21	Latent Space Translation via Inverse Relative Projection	Valentino Maiorca et.al.	2406.15057v1	null
2024-06-21	Behaviour Distillation	Andrei Lupu et.al.	2406.15042v1	link
2024-06-21	Contextual Interaction via Primitive-based Adversarial Training For Compositional Zero-shot Learning	Suyi Li et.al.	2406.14962v1	link
2024-06-21	Sports Intelligence: Assessing the Sports Understanding Capabilities of Language Models through Question Answering from Text to Video	Zhengbang Yang et.al.	2406.14877v1	null
2024-06-20	Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps	Nikita Starodubcev et.al.	2406.14539v1	null
2024-06-20	APEER: Automatic Prompt Engineering Enhances Large Language Model Reranking	Can Jin et.al.	2406.14449v1	null
2024-06-20	Transferable Boltzmann Generators	Leon Klein et.al.	2406.14426v1	null
2024-06-20	Zero-Shot Image Denoising for High-Resolution Electron Microscopy	Xuanyu Tian et.al.	2406.14264v1	link
2024-06-20	SeCoKD: Aligning Large Language Models for In-Context Learning with Fewer Shots	Weixing Wang et.al.	2406.14208v1	null
2024-06-20	A Data-Driven Guided Decoding Mechanism for Diagnostic Captioning	Panagiotis Kaliosis et.al.	2406.14164v1	link
2024-06-20	One Fling to Goal: Environment-aware Dynamics for Goal-conditioned Fabric Flinging	Linhan Yang et.al.	2406.14136v1	null
2024-06-20	An Investigation of Prompt Variations for Zero-shot LLM-based Rankers	Shuoqi Sun et.al.	2406.14117v1	link
2024-06-20	Understanding Different Design Choices in Training Large Time Series Models	Yu-Neng Chuang et.al.	2406.14045v1	null
2024-06-20	Taxonomy-Guided Zero-Shot Recommendations with LLMs	Yueqing Liang et.al.	2406.14043v1	link
2024-06-18	Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation	Ning-Hsu Wang et.al.	2406.12849v1	null
2024-06-18	Generating Educational Materials with Different Levels of Readability using LLMs	Chieh-Yang Huang et.al.	2406.12787v1	null
2024-06-18	MAC: A Benchmark for Multiple Attributes Compositional Zero-Shot Learning	Shuo Xu et.al.	2406.12757v1	null
2024-06-19	Rationale-based Ensemble of Multiple QA Strategies for Zero-shot Knowledge-based VQA	Miaoyu Li et.al.	2406.12746v2	link
2024-06-18	Large Language Model as a Universal Clinical Multi-task Decoder	Yujiang Wu et.al.	2406.12738v1	null
2024-06-18	BIOSCAN-5M: A Multimodal Dataset for Insect Biodiversity	Zahra Gharaee et.al.	2406.12723v1	link
2024-06-18	GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models	Yongtao Ge et.al.	2406.12671v1	link
2024-06-18	Efficient and Long-Tailed Generalization for Pre-trained Vision-Language Model	Jiang-Xin Shi et.al.	2406.12638v1	link
2024-06-18	News Without Borders: Domain Adaptation of Multilingual Sentence Embeddings for Cross-lingual News Recommendation	Andreea Iana et.al.	2406.12634v1	link
2024-06-18	SeTAR: Out-of-Distribution Detection with Selective Low-Rank Approximation	Yixia Li et.al.	2406.12629v1	link
2024-06-17	Zero-Shot Generalization during Instruction Tuning: Insights from Similarity and Granularity	Bingxiang He et.al.	2406.11721v1	link
2024-06-17	TourRank: Utilizing Large Language Models for Documents Ranking with a Tournament-Inspired Strategy	Yiqun Chen et.al.	2406.11678v1	link
2024-06-17	A Two-dimensional Zero-shot Dialogue State Tracking Evaluation Method using GPT-4	Ming Gu et.al.	2406.11651v1	link
2024-06-17	AnyMaker: Zero-shot General Object Customization via Decoupled Dual-Level ID Injection	Lingjie Kong et.al.	2406.11643v1	link
2024-06-17	Can Many-Shot In-Context Learning Help Long-Context LLM Judges? See More, Judge Better!	Mingyang Song et.al.	2406.11629v1	link
2024-06-17	Analysing zero-shot temporal relation extraction on clinical notes using temporal consistency	Vasiliki Kougia et.al.	2406.11486v1	link
2024-06-17	How Far Can In-Context Alignment Go? Exploring the State of In-Context Alignment	Heyan Huang et.al.	2406.11474v1	null
2024-06-17	Fusion Makes Perfection: An Efficient Multi-Grained Matching Approach for Zero-Shot Relation Extraction	Shilong Li et.al.	2406.11429v1	link
2024-06-17	DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer	Keon Lee et.al.	2406.11427v1	null
2024-06-17	BAMBINO-LM: (Bilingual-)Human-Inspired Continual Pretraining of BabyLM	Zhewen Shen et.al.	2406.11418v1	null
2024-06-14	Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation	Nameer Hirschkind et.al.	2406.10223v1	null
2024-06-14	Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models	Carson Denison et.al.	2406.10162v1	link
2024-06-14	Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition	Guinan Li et.al.	2406.10152v1	null
2024-06-14	Shelf-Supervised Multi-Modal Pre-Training for 3D Object Detection	Mehar Khurana et.al.	2406.10115v1	link
2024-06-14	dGrasp: NeRF-Informed Implicit Grasp Policies with Supervised Optimization Slopes	Gergely Sóti et.al.	2406.09939v1	null
2024-06-14	POWN: Prototypical Open-World Node Classification	Marcel Hoffmann et.al.	2406.09926v1	link
2024-06-14	CliBench: Multifaceted Evaluation of Large Language Models in Clinical Decisions on Diagnoses, Procedures, Lab Tests Orders and Prescriptions	Mingyu Derek Ma et.al.	2406.09923v1	link
2024-06-14	Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy	Linhan Ma et.al.	2406.09844v1	null
2024-06-14	Language-Guided Manipulation with Diffusion Policies and Constrained Inpainting	Ce Hao et.al.	2406.09767v1	null
2024-06-14	Learning Language Structures through Grounding	Freda Shi et.al.	2406.09662v1	null
2024-06-13	VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding	Muhammad Maaz et.al.	2406.09418v1	link
2024-06-13	Exploring the Spectrum of Visio-Linguistic Compositionality and Recognition	Youngtaek Oh et.al.	2406.09388v1	link
2024-06-13	Scale-Invariant Monocular Depth Estimation via SSI Depth	S. Mahdi H. Miangoleh et.al.	2406.09374v1	null
2024-06-13	**Learning from Nat

Jianqiuer/Awesome6DPoseEstimation
