Jianqiuer/Awesome6DPoseEstimation

Updated on 2025.01.06

Table of Contents
  1. 6D Pose
  2. Point Cloud Registration
  3. Point Cloud Segmentation
  4. Zero-shot

6D Pose

Publish Date Title Authors PDF Code
2025-01-03 Free-Form Motion Control: A Synthetic Video Generation Dataset with Controllable Camera and Object Motions Xincheng Shuai et.al. 2501.01425v2 null
2025-01-02 On Unifying Video Generation and Camera Pose Estimation Chun-Hao Paul Huang et.al. 2501.01409v1 null
2025-01-02 L3D-Pose: Lifting Pose for 3D Avatars from a Single Camera in the Wild Soumyaratna Debnath et.al. 2501.01174v1 null
2024-12-31 Relative Pose Observability Analysis Using Dual Quaternions Nicholas B. Andrews et.al. 2501.00657v1 null
2024-12-31 VinT-6D: A Large-Scale Object-in-hand Dataset from Vision, Touch and Proprioception Zhaoliang Wan et.al. 2501.00510v1 null
2024-12-30 Hierarchical Pose Estimation and Mapping with Multi-Scale Neural Feature Fields Evgenii Kruzhkov et.al. 2412.20976v1 null
2024-12-30 ReFlow6D: Refraction-Guided Transparent Object 6D Pose Estimation via Intermediate Representation Learning Hrishikesh Gupta et.al. 2412.20830v1 link
2024-12-30 Frequency-aware Event Cloud Network Hongwei Ren et.al. 2412.20803v1 null
2024-12-30 KeyGS: A Keyframe-Centric Gaussian Splatting Method for Monocular Image Sequences Keng-Wei Chang et.al. 2412.20767v1 null
2024-12-30 Towards nation-wide analytical healthcare infrastructures: A privacy-preserving augmented knee rehabilitation case study Boris Bačić et.al. 2412.20733v1 null
2024-12-29 Exploiting Aggregation and Segregation of Representations for Domain Adaptive Human Pose Estimation Qucheng Peng et.al. 2412.20538v1 link
2024-12-28 MambaVO: Deep Visual Odometry Based on Sequential Matching Refinement and Training Smoothing Shuo Wang et.al. 2412.20082v1 null
2024-12-28 GSplatLoc: Ultra-Precise Camera Localization via 3D Gaussian Splatting Atticus J. Zeller et.al. 2412.20056v1 link
2024-12-27 Optimizing Local-Global Dependencies for Accurate 3D Human Pose Estimation Guangsheng Xu et.al. 2412.19676v1 link
2024-12-27 Dust to Tower: Coarse-to-Fine Photo-Realistic Scene Reconstruction from Sparse Uncalibrated Images Xudong Cai et.al. 2412.19518v1 null
2024-12-26 Humans as a Calibration Pattern: Dynamic 3D Scene Reconstruction from Unsynchronized and Uncalibrated Videos Changwoon Choi et.al. 2412.19089v1 null
2024-12-23 Reconstructing People, Places, and Cameras Lea Müller et.al. 2412.17806v1 null
2024-12-22 Leveraging Consistent Spatio-Temporal Correspondence for Robust Visual Odometry Zhaoxing Zhang et.al. 2412.16923v1 null
2024-12-21 EasyVis2: A Real Time Multi-view 3D Visualization for Laparoscopic Surgery Training Enhanced by a Deep Neural Network YOLOv8-Pose Yung-Hong Sun et.al. 2412.16742v1 null
2024-12-21 FACTS: Fine-Grained Action Classification for Tactical Sports Christopher Lai et.al. 2412.16454v1 null
2024-12-20 Can Generative Video Models Help Pose Estimation? Ruojin Cai et.al. 2412.16155v1 null
2024-12-20 Monkey Transfer Learning Can Improve Human Pose Estimation Bradley Scott et.al. 2412.15966v1 null
2024-12-19 Scaling 4D Representations João Carreira et.al. 2412.15212v1 null
2024-12-13 IMPROVE: Impact of Mobile Phones on Remote Online Virtual Education Roberto Daza et.al. 2412.14195v1 link
2024-12-18 Level-Set Parameters: Novel Representation for 3D Shape Analysis Huan Lei et.al. 2412.13502v1 null
2024-12-18 Pre-training a Density-Aware Pose Transformer for Robust LiDAR-based 3D Human Pose Estimation Xiaoqi An et.al. 2412.13454v1 null
2024-12-17 CondiMen: Conditional Multi-Person Mesh Recovery Brégier Romain et.al. 2412.13058v1 null
2024-12-17 ShotVL: Human-Centric Highlight Frame Retrieval via Language Queries Wangyu Xue et.al. 2412.12675v1 null
2024-12-16 Category Level 6D Object Pose Estimation from a Single RGB Image using Diffusion Adam Bethell et.al. 2412.11420v1 null
2024-12-13 ExeChecker: Where Did I Go Wrong? Yiwen Gu et.al. 2412.10573v1 null
2024-12-11 CUPS: Improving Human Pose-Shape Estimators with Conformalized Deep Uncertainty Harry Zhang et.al. 2412.10431v1 null
2024-12-13 RP-SLAM: Real-time Photorealistic SLAM with Efficient 3D Gaussian Splatting Lizhi Bai et.al. 2412.09868v1 null
2024-12-12 Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos Linyi Jin et.al. 2412.09621v1 null
2024-12-12 FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction Jiale Xu et.al. 2412.09573v1 null
2024-12-11 BLADE: Single-view Body Mesh Learning through Accurate Depth Estimation Shengze Wang et.al. 2412.08640v1 null
2024-12-12 Drift-free Visual SLAM using Digital Twins Roxane Merat et.al. 2412.08496v2 null
2024-12-11 Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization Siyan Dong et.al. 2412.08376v1 link
2024-12-10 LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation Models Ziqi Lu et.al. 2412.07746v1 null
2024-12-09 MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds Zhenggang Tang et.al. 2412.06974v1 null
2024-12-09 An Efficient Scene Coordinate Encoding and Relocalization Method Kuan Xu et.al. 2412.06488v1 link
2024-12-09 Attention-Enhanced Lightweight Hourglass Network for Human Pose Estimation Marsha Mariya Kappan et.al. 2412.06227v1 null
2024-12-06 CCS: Continuous Learning for Customized Incremental Wireless Sensing Services Qunhang Fu et.al. 2412.04821v1 null
2024-12-05 ProPLIKS: Probablistic 3D human body pose estimation Karthik Shetty et.al. 2412.04665v1 null
2024-12-05 DualPM: Dual Posed-Canonical Point Maps for 3D Shape and Pose Reconstruction Ben Kaye et.al. 2412.04464v1 null
2024-12-05 Targeted Hard Sample Synthesis Based on Estimated Pose and Occlusion Error for Improved Object Pose Estimation Alan Li et.al. 2412.04279v1 null
2024-12-04 Sparse-view Pose Estimation and Reconstruction via Analysis by Generative Synthesis Qitao Zhao et.al. 2412.03570v1 null
2024-12-06 NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images Lingen Li et.al. 2412.03517v2 null
2024-12-05 A Bidirectional Siamese Recurrent Neural Network for Accurate Gait Recognition Using Body Landmarks Proma Hossain Progga et.al. 2412.03498v2 null
2024-12-04 MCVO: A Generic Visual Odometry for Arbitrarily Arranged Multi-Cameras Huai Yu et.al. 2412.03146v1 link
2024-12-04 An indoor DSO-based ceiling-vision odometry system for indoor industrial environments Abdelhak Bougouffa et.al. 2412.02950v1 null
2024-12-03 EgoCast: Forecasting Egocentric Human Pose in the Wild Maria Escobar et.al. 2412.02903v1 null
2024-12-02 emg2pose: A Large and Diverse Benchmark for Surface Electromyographic Hand Pose Estimation Sasha Salter et.al. 2412.02725v1 link
2024-12-03 ProbPose: A Probabilistic Approach to 2D Human Pose Estimation Miroslav Purkrabek et.al. 2412.02254v1 null
2024-12-03 Cascaded Multi-Scale Attention for Enhanced Multi-Scale Feature Extraction and Interaction with Low-Resolution Images Xiangyong Lu et.al. 2412.02197v1 link
2024-12-03 CLERF: Contrastive LEaRning for Full Range Head Pose Estimation Ting-Ruen Wei et.al. 2412.02066v1 null
2024-12-02 Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle Miroslav Purkrabek et.al. 2412.01562v1 link
2024-12-02 6DOPE-GS: Online 6D Object Pose Estimation using Gaussian Splatting Yufeng Jin et.al. 2412.01543v1 null
2024-12-02 HandOS: 3D Hand Reconstruction in One Stage Xingyu Chen et.al. 2412.01537v1 null
2024-12-02 SF-Loc: A Visual Mapping and Geo-Localization System based on Sparse Visual Structure Frames Yuxuan Zhou et.al. 2412.01500v1 link
2024-12-02 MamKPD: A Simple Mamba Baseline for Real-Time 2D Keypoint Detection Yonghao Dang et.al. 2412.01422v1 null
2024-12-02 Cross-Modal Visual Relocalization in Prior LiDAR Maps Utilizing Intensity Textures Qiyuan Shen et.al. 2412.01299v1 null
2024-12-02 CRISP: Object Pose and Shape Estimation with Test-Time Adaptation Jingnan Shi et.al. 2412.01052v1 null
2024-11-29 Diorama: Unleashing Zero-shot Single-view 3D Scene Modeling Qirui Wu et.al. 2411.19492v1 null
2024-11-29 Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning Yang You et.al. 2411.19458v1 null
2024-11-28 GMS-VINS: Multi-category Dynamic Objects Semantic Segmentation for Enhanced Visual-Inertial Odometry Using a Promptable Foundation Model Rui Zhou et.al. 2411.19289v1 null
2024-11-28 HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos Prithviraj Banerjee et.al. 2411.19167v1 null
2024-11-28 Lost & Found: Updating Dynamic 3D Scene Graphs from Egocentric Observations Tjark Behrens et.al. 2411.19162v1 link
2024-11-28 Distributed Dual Quaternion Extended Kalman Filtering for Spacecraft Pose Estimation Mathias Hudoba de Badyn et.al. 2411.19033v1 null
2024-11-28 Waterfall Transformer for Multi-person Pose Estimation Navin Ranjan et.al. 2411.18944v1 null
2024-12-02 AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers Sherwin Bahmani et.al. 2411.18673v2 null
2024-11-27 XR-MBT: Multi-modal Full Body Tracking for XR through Self-Supervision with Learned Depth Point Cloud Registration Denys Rozumnyi et.al. 2411.18377v1 null
2024-11-27 Manual-PA: Learning 3D Part Assembly from Instruction Diagrams Jiahao Zhang et.al. 2411.18011v1 null
2024-11-26 Self-supervised Monocular Depth and Pose Estimation for Endoscopy with Generative Latent Priors Ziang Xu et.al. 2411.17790v1 null
2024-11-26 Geometric Point Attention Transformer for 3D Shape Reassembly Jiahan Li et.al. 2411.17788v1 null
2024-11-26 RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training Raktim Gautam Goswami et.al. 2411.17662v1 null
2024-11-26 Communication-Efficient Cooperative SLAMMOT via Determining the Number of Collaboration Vehicles Susu Fang et.al. 2411.17432v1 null
2024-11-26 Boost 3D Reconstruction using Diffusion-based Monocular Camera Calibration Junyuan Deng et.al. 2411.17240v1 link
2024-11-28 SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting Gyeongjin Kang et.al. 2411.17190v3 null
2024-11-26 GMFlow: Global Motion-Guided Recurrent Flow for 6D Object Pose Estimation Xin Liu et.al. 2411.17174v1 null
2024-11-25 Diffusion Features for Zero-Shot 6DoF Object Pose Estimation Bernd Von Gimborn et.al. 2411.16668v1 null
2024-11-25 Edge Weight Prediction For Category-Agnostic Pose Estimation Or Hirschorn et.al. 2411.16665v1 link
2024-11-25 SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis Hyojun Go et.al. 2411.16443v1 link
2024-11-25 One Diffusion to Generate Them All Duong H. Le et.al. 2411.16318v1 link
2024-11-25 UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image Xingyu Liu et.al. 2411.16106v1 null
2024-11-24 Generalizable Single-view Object Pose Estimation by Two-side Generating and Matching Yujing Sun et.al. 2411.15860v1 link
2024-11-24 PEnG: Pose-Enhanced Geo-Localisation Tavis Shore et.al. 2411.15742v1 null
2024-11-22 Personalization of Wearable Sensor-Based Joint Kinematic Estimation Using Computer Vision for Hip Exoskeleton Applications Changseob Song et.al. 2411.15366v1 null
2024-11-22 Enhancing Exploration with Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile Manipulation Huy Le et.al. 2411.14913v1 null
2024-11-22 mmWave Radar for Sit-to-Stand Analysis: A Comparative Study with Wearables and Kinect Shuting Hu et.al. 2411.14656v1 null
2024-11-21 DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding Tianhe Ren et.al. 2411.14347v1 link
2024-11-21 SEMPose: A Single End-to-end Network for Multi-object Pose Estimation Xin Liu et.al. 2411.14002v1 null
2024-11-21 Dehazing-aided Multi-Rate Multi-Modal Pose Estimation Framework for Mitigating Visual Disturbances in Extreme Underwater Domain Vidya Sudevan et.al. 2411.13988v1 null
2024-11-21 Hybrid-Neuromorphic Approach for Underwater Robotics Applications: A Conceptual Framework Vidya Sudevan et.al. 2411.13962v1 null
2024-11-20 Developing Normative Gait Cycle Parameters for Clinical Analysis Using Human Pose Estimation Rahm Ranjan et.al. 2411.13716v1 null
2024-11-20 Robust SG-NeRF: Robust Scene Graph Aided Neural Surface Reconstruction Yi Gu et.al. 2411.13620v1 null
2024-11-19 VioPose: Violin Performance 4D Pose Estimation by Hierarchical Audiovisual Inference Seong Jong Yoo et.al. 2411.13607v1 link
2024-11-20 DATAP-SfM: Dynamic-Aware Tracking Any Point for Robust Structure from Motion in the Wild Weicai Ye et.al. 2411.13291v1 null
2024-11-20 X as Supervision: Contending with Depth Ambiguity in Unsupervised Monocular 3D Pose Estimation Yuchen Yang et.al. 2411.13026v1 link
2024-11-19 IoT-Based 3D Pose Estimation and Motion Optimization for Athletes: Application of C3D and OpenPose Fei Ren et.al. 2411.12676v1 null
2024-11-15 SPARS3R: Semantic Prior Alignment and Regularization for Sparse 3D Reconstruction Yutao Tang et.al. 2411.12592v1 link
2024-11-19 GLOVER: Generalizable Open-Vocabulary Affordance Reasoning for Task-Oriented Grasping Teli Ma et.al. 2411.12286v1 null
2024-11-18 IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos Yunong Liu et.al. 2411.11409v1 link
2024-11-15 USP-Gaussian: Unifying Spike-based Image Reconstruction, Pose Correction and Gaussian Splatting Kang Chen et.al. 2411.10504v1 link
2024-11-13 ReMP: Reusable Motion Prior for Multi-domain 3D Human Pose Estimation and Motion Inbetweening Hojun Jang et.al. 2411.09435v1 null
2024-11-13 Generalized Pose Space Embeddings for Training In-the-Wild using Anaylis-by-Synthesis Dominik Borer et.al. 2411.08603v1 null
2024-11-13 DG-SLAM: Robust Dynamic Gaussian Splatting SLAM with Hybrid Pose Optimization Yueming Xu et.al. 2411.08373v1 null
2024-11-16 RINO: Accurate, Robust Radar-Inertial Odometry with Non-Iterative Estimation Shuocheng Yang et.al. 2411.07699v2 link
2024-11-12 Human Arm Pose Estimation with a Shoulder-worn Force-Myography Device for Human-Robot Interaction Rotem Atari et.al. 2411.07644v1 null
2024-11-12 Towards Seamless Integration of Magnetic Tracking into Fluoroscopy-guided Interventions Shuwei Xing et.al. 2411.07495v1 null
2024-11-08 Acoustic-based 3D Human Pose Estimation Robust to Human Position Yusuke Oumi et.al. 2411.07165v1 null
2024-11-11 CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models Junho Kim et.al. 2411.06869v1 null
2024-11-11 GenZ-ICP: Generalizable and Degeneracy-Robust LiDAR Odometry Using an Adaptive Weighting Daehan Lee et.al. 2411.06766v1 link
2024-11-11 GTA-Net: An IoT-Integrated 3D Human Pose Estimation System for Real-Time Adolescent Sports Posture Correction Shizhe Yuan et.al. 2411.06725v1 null
2024-11-10 Magnetic Field Aided Vehicle Localization with Acceleration Correction Mrunmayee Deshpande et.al. 2411.06543v1 null
2024-11-10 Visuotactile-Based Learning for Insertion with Compliant Hands Osher Azulay et.al. 2411.06408v1 link
2024-11-08 Poze: Sports Technique Feedback under Data Constraints Agamdeep Singh et.al. 2411.05734v1 null
2024-11-08 DeepArUco++: Improved detection of square fiducial markers in challenging lighting conditions Rafael Berral-Soler et.al. 2411.05552v1 link
2024-11-08 Tightly-Coupled, Speed-aided Monocular Visual-Inertial Localization in Topological Map Chanuk Yang et.al. 2411.05497v1 null
2024-11-08 Relative Pose Estimation for Nonholonomic Robot Formation with UWB-IO Measurements Kunrui Ze et.al. 2411.05481v1 null
2024-11-07 Social EgoMesh Estimation Luca Scofano et.al. 2411.04598v1 link
2024-11-07 Pose2Trajectory: Using Transformers on Body Pose to Predict Tennis Player's Trajectory Ali K. AlShami et.al. 2411.04501v1 null
2024-11-08 SuperQ-GRASP: Superquadrics-based Grasp Pose Estimation on Larger Objects for Mobile-Manipulation Xun Tu et.al. 2411.04386v2 null
2024-11-08 GS2Pose: Two-stage 6D Object Pose Estimation Guided by Gaussian Splatting Jilan Mei et.al. 2411.03807v3 null
2024-11-06 Estimation of Psychosocial Work Environment Exposures Through Video Object Detection. Proof of Concept Using CCTV Footage Claus D. Hansen et.al. 2411.03724v1 null
2024-11-05 Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data Seunggeun Chi et.al. 2411.03561v1 null
2024-11-05 HFGaussian: Learning Generalizable Gaussian Human with Integrated Human Features Arnab Dey et.al. 2411.03086v1 null
2024-11-04 Semantic Masking and Visual Feature Matching for Robust Localization Luisa Mao et.al. 2411.01804v1 null
2024-11-03 Activating Self-Attention for Multi-Scene Absolute Pose Regression Miso Lee et.al. 2411.01443v1 link
2024-11-04 3D Equivariant Pose Regression via Direct Wigner-D Harmonics Prediction Jongmin Lee et.al. 2411.00543v2 null
2024-10-31 Whole-Herd Elephant Pose Estimation from Drone Data for Collective Behavior Analysis Brody McNutt et.al. 2411.00196v1 null
2024-10-31 No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images Botao Ye et.al. 2410.24207v1 link
2024-11-06 SceneComplete: Open-World 3D Scene Completion in Complex Real World Environments for Robot Manipulation Aditya Agarwal et.al. 2410.23643v2 null
2024-10-30 SCRREAM: SCan, Register, REnder And Map: A Framework for Annotating Accurate and Dense 3D Indoor Scenes with a Benchmark HyunJun Jung et.al. 2410.22715v1 null
2024-10-29 LiVisSfM: Accurate and Robust Structure-from-Motion with LiDAR and Visual Cues Hanqing Jiang et.al. 2410.22213v1 null
2024-10-29 PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting Sunghwan Hong et.al. 2410.22128v1 link
2024-10-29 HRPVT: High-Resolution Pyramid Vision Transformer for medium and small-scale human pose estimation Zhoujie Xu et.al. 2410.22079v1 null
2024-10-29 EI-Nexus: Towards Unmediated and Flexible Inter-Modality Local Feature Extraction and Matching for Event-Image Data Zhonghua Yi et.al. 2410.21743v1 link
2024-10-28 Synthetica: Large Scale Synthetic Data for Robot Perception Ritvik Singh et.al. 2410.21153v1 null
2024-10-29 BLAPose: Enhancing 3D Human Pose Estimation with Bone Length Adjustment Chih-Hsiang Hsu et.al. 2410.20731v2 link
2024-11-01 RopeTP: Global Human Motion Recovery via Integrating Robust Pose Estimation with Diffusion Trajectory Prior Mingjiang Liang et.al. 2410.20358v2 null
2024-10-27 Harmony4D: A Video Dataset for In-The-Wild Close Human Interactions Rawal Khirodkar et.al. 2410.20294v1 null
2024-10-26 Neural Fields in Robotics: A Survey Muhammad Zubair Irshad et.al. 2410.20220v1 link
2024-10-25 DECADE: Towards Designing Efficient-yet-Accurate Distance Estimation Modules for Collision Avoidance in Mobile Advanced Driver Assistance Systems Muhammad Zaeem Shahzad et.al. 2410.19336v1 null
2024-10-24 Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction Junyi Chen et.al. 2410.18962v1 null
2024-10-24 VoxelKeypointFusion: Generalizable Multi-View Multi-Person Pose Estimation Daniel Bermuth et.al. 2410.18723v1 link
2024-10-23 Robust Two-View Geometry Estimation with Implicit Differentiation Vladislav Pyatov et.al. 2410.17983v1 link
2024-10-23 YOLOv11: An Overview of the Key Architectural Enhancements Rahima Khanam et.al. 2410.17725v1 link
2024-10-21 Assisted Physical Interaction: Autonomous Aerial Robots with Neural Network Detection, Navigation, and Safety Layers Andrea Berra et.al. 2410.15802v1 null
2024-10-21 ARTS: Semi-Analytical Regressor using Disentangled Skeletal Representations for Human Mesh Recovery from Videos Tao Tang et.al. 2410.15582v1 link
2024-10-20 Neural Active Structure-from-Motion in Dark and Textureless Environment Kazuto Ichimaru et.al. 2410.15378v1 null
2024-10-20 POSE: Pose estimation Of virtual Sync Exhibit system Hao-Tang Tsui et.al. 2410.15343v1 link
2024-10-18 Graph Optimality-Aware Stochastic LiDAR Bundle Adjustment with Progressive Spatial Smoothing Jianping Li et.al. 2410.14565v1 null
2024-10-18 Multi-modal Pose Diffuser: A Multimodal Generative Conditional Pose Prior Calvin-Khang Ta et.al. 2410.14540v1 null
2024-10-18 Sim2real Cattle Joint Estimation in 3D point clouds Okour Mohammad et.al. 2410.14419v1 null
2024-10-18 Unlabeled Action Quality Assessment Based on Multi-dimensional Adaptive Constrained Dynamic Time Warping Renguang Chen et.al. 2410.14161v1 null
2024-10-15 From Real Artifacts to Virtual Reference: A Robust Framework for Translating Endoscopic Images Junyang Wu et.al. 2410.13896v1 null
2024-10-17 DualQuat-LOAM: LiDAR Odometry and Mapping parametrized on Dual Quaternions Edison P. Velasco-Sánchez et.al. 2410.13541v1 null
2024-10-17 Object Pose Estimation Using Implicit Representation For Transparent Objects Varun Burde et.al. 2410.13465v1 null
2024-10-16 Optimizing Multi-Task Learning for Accurate Spacecraft Pose Estimation Francesco Evangelisti et.al. 2410.12679v1 null
2024-10-15 Contrastive Touch-to-Touch Pretraining Samanta Rodriguez et.al. 2410.11834v1 null
2024-10-18 X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing Xinyan Chen et.al. 2410.10167v2 null
2024-10-13 Occluded Human Pose Estimation based on Limb Joint Augmentation Gangtao Han et.al. 2410.09885v1 null
2024-10-12 Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors Hritam Basak et.al. 2410.09467v1 null
2024-10-12 Towards Multi-Modal Animal Pose Estimation: An In-Depth Analysis Qianyi Deng et.al. 2410.09312v1 link
2024-10-11 CVAM-Pose: Conditional Variational Autoencoder for Multi-Object Monocular Pose Estimation Jianyu Zhao et.al. 2410.09010v1 link
2024-10-11 Look Gauss, No Pose: Novel View Synthesis using Gaussian Splatting without Accurate Pose Initialization Christian Schmidt et.al. 2410.08743v1 link
2024-10-10 Generalizing Stochastic Smoothing for Differentiation and Gradient Estimation Felix Petersen et.al. 2410.08125v1 null
2024-10-10 Robotic framework for autonomous manipulation of laboratory equipment with different degrees of transparency via 6D pose estimation Maria Makarova et.al. 2410.07801v1 null
2024-10-10 Optimal-State Dynamics Estimation for Physics-based Human Motion Capture from Videos Cuong Le et.al. 2410.07795v1 link
2024-10-12 Autonomous Driving in Unstructured Environments: How Far Have We Come? Chen Min et.al. 2410.07701v2 link
2024-10-10 Invisibility Cloak: Disappearance under Human Pose Estimation via Backdoor Attacks Minxing Zhang et.al. 2410.07670v1 null
2024-10-09 OmniPose6D: Towards Short-Term Object Pose Tracking in Dynamic Scenes from Monocular RGB Yunzhi Lin et.al. 2410.06694v1 null
2024-10-08 SpecTrack: Learned Multi-Rotation Tracking via Speckle Imaging Ziyang Chen et.al. 2410.06028v1 link
2024-10-08 AIVIO: Closed-loop, Object-relative Navigation of UAVs with AI-aided Visual Inertial Odometry Thomas Jantos et.al. 2410.05996v1 null
2024-10-08 Are Minimal Radial Distortion Solvers Necessary for Relative Pose Estimation? Charalambos Tzamos et.al. 2410.05984v1 link
2024-10-08 FürElise: Capturing and Physically Synthesizing Hand Motions of Piano Performance Ruocheng Wang et.al. 2410.05791v1 null
2024-10-07 Comparison of marker-less 2D image-based methods for infant pose estimation Lennart Jahn et.al. 2410.04980v1 null
2024-10-06 Enhancing 3D Human Pose Estimation Amidst Severe Occlusion with Dual Transformer Fusion Mehwish Ghafoor et.al. 2410.04574v1 link
2024-10-06 LiteVLoc: Map-Lite Visual Localization for Image Goal Navigation Jianhao Jiao et.al. 2410.04419v1 null
2024-10-05 Test-Time Adaptation for Keypoint-Based Spacecraft Pose Estimation Based on Predicted-View Synthesis Juan Ignacio Bravo Pérez-Villar et.al. 2410.04298v1 link
2024-10-05 A Framework for Reproducible Benchmarking and Performance Diagnosis of SLAM Systems Nikola Radulov et.al. 2410.04242v1 link
2024-10-04 Unsupervised Prior Learning: Discovering Categorical Pose Priors from Videos Ziyu Wang et.al. 2410.03858v1 null
2024-10-04 Universal Global State Estimation for Inertial Navigation Systems Sifeddine Benahmed et.al. 2410.03846v1 null
2024-10-04 MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion Junyi Zhang et.al. 2410.03825v1 null
2024-10-04 Dessie: Disentanglement for Articulated 3D Horse Shape and Pose Estimation from Images Ci Li et.al. 2410.03438v1 null
2024-10-04 HRVMamba: High-Resolution Visual State Space Model for Dense Prediction Hao Zhang et.al. 2410.03174v1 null
2024-10-04 CLIP-Clique: Graph-based Correspondence Matching Augmented by Vision Language Models for Object-based Global Localization Shigemichi Matsuzaki et.al. 2410.03054v1 null
2024-10-03 Why Sample Space Matters: Keyframe Sampling Optimization for LiDAR-based Place Recognition Nikolaos Stathoulopoulos et.al. 2410.02643v1 null
2024-10-03 Key-Grid: Unsupervised 3D Keypoints Detection using Grid Heatmap Features Chengkai Hou et.al. 2410.02237v1 null
2024-10-02 SGBA: Semantic Gaussian Mixture Model-Based LiDAR Bundle Adjustment Xingyu Ji et.al. 2410.01618v1 null
2024-10-02 SurgeoNet: Realtime 3D Pose Estimation of Articulated Surgical Instruments from Stereo Images using a Synthetically-trained Network Ahmed Tawfik Aboukhadra et.al. 2410.01293v1 null
2024-10-01 Pose Estimation of Buried Deep-Sea Objects using 3D Vision Deep Learning Models Jerry Yan et.al. 2410.01061v1 null
2024-10-01 RAD: A Dataset and Benchmark for Real-Life Anomaly Detection with Robotic Observations Kaichen Zhou et.al. 2410.00713v1 link
2024-10-01 GERA: Geometric Embedding for Efficient Point Registration Analysis Geng Li et.al. 2410.00589v1 null
2024-09-30 Continual Human Pose Estimation for Incremental Integration of Keypoints and Pose Variations Muhammad Saif Ullah Khan et.al. 2409.20469v1 null
2024-09-30 Classroom-Inspired Multi-Mentor Distillation with Adaptive Learning Strategies Shalini Sarode et.al. 2409.20237v1 null
2024-09-30 PuzzleBoard: A New Camera Calibration Pattern with Position Encoding Peer Stelldinger et.al. 2409.20127v1 link
2024-09-30 Robust Gaussian Splatting SLAM by Leveraging Loop Closure Zunjie Zhu et.al. 2409.20111v1 null
2024-09-30 GearTrack: Automating 6D Pose Estimation Yu Deng et.al. 2409.19986v1 null
2024-09-29 PPLNs: Parametric Piecewise Linear Networks for Event-Based Temporal Modeling and Beyond Chen Song et.al. 2409.19772v1 link
2024-09-29 GelSlim 4.0: Focusing on Touch and Reproducibility Andrea Sipos et.al. 2409.19770v1 null
2024-09-27 Robust Proximity Operations using Probabilistic Markov Models Deep Parikh et.al. 2409.19062v1 null
2024-09-27 Exploiting Motion Prior for Accurate Pose Estimation of Dashboard Cameras Yipeng Lu et.al. 2409.18673v1 null
2024-09-27 DynaWeightPnP: Toward global real-time 3D-2D solver in PnP without correspondences Jingwei Song et.al. 2409.18457v1 null
2024-09-30 Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation Mengchen Zhang et.al. 2409.18261v2 link
2024-09-26 AI-Powered Augmented Reality for Satellite Assembly, Integration and Test Alvaro Patricio et.al. 2409.18101v1 null
2024-09-27 Leveraging Anthropometric Measurements to Improve Human Mesh Estimation and Ensure Consistent Body Shapes Katja Ludwig et.al. 2409.17671v2 null
2024-09-25 Safe Leaf Manipulation for Accurate Shape and Pose Estimation of Occluded Fruits Shaoxiong Yao et.al. 2409.17389v1 null
2024-09-25 Hierarchical Tri-manual Planning for Vision-assisted Fruit Harvesting with Quadrupedal Robots Zhichao Liu et.al. 2409.17116v1 null
2024-09-25 Self-Sensing for Proprioception and Contact Detection in Soft Robots Using Shape Memory Alloy Artificial Muscles Ran Jing et.al. 2409.17111v1 null
2024-09-25 Online 6DoF Pose Estimation in Forests using Cross-View Factor Graph Optimisation and Deep Learned Re-localisation Lucas Carvalho de Lima et.al. 2409.16680v1 null
2024-09-25 FAFA: Frequency-Aware Flow-Aided Self-Supervision for Underwater Object Pose Estimation Jingyi Tang et.al. 2409.16600v1 null
2024-09-25 Robo-Platform: A Robotic System for Recording Sensors and Controlling Robots Masoud Dayani Najafabadi et.al. 2409.16595v1 link
2024-09-24 PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings Sutharsan Mahendren et.al. 2409.15832v1 null
2024-09-24 LaPose: Laplacian Mixture Shape Modeling for RGB-Based Category-Level Object Pose Estimation Ruida Zhang et.al. 2409.15727v1 link
2024-09-23 Framework for Robust Localization of UUVs and Mapping of Net Pens David Botta et.al. 2409.15475v1 null
2024-09-23 FisheyeDepth: A Real Scale Self-Supervised Depth Estimation Model for Fisheye Camera Guoyang Zhao et.al. 2409.15054v1 link
2024-09-23 BranchPoseNet: Characterizing tree branching with a deep learning-based pose estimation approach Stefano Puliti et.al. 2409.14755v1 link
2024-09-23 ERPoT: Effective and Reliable Pose Tracking for Mobile Robots Based on Lightweight and Compact Polygon Maps Haiming Gao et.al. 2409.14723v1 null
2024-09-22 Tactile Functasets: Neural Implicit Representations of Tactile Datasets Sikai Li et.al. 2409.14592v1 null
2024-09-22 AR Overlay: Training Image Pose Estimation on Curved Surface in a Synthetic Way Sining Huang et.al. 2409.14577v1 null
2024-09-22 DROP: Dexterous Reorientation via Online Planning Albert H. Li et.al. 2409.14562v1 null
2024-09-21 Combining Absolute and Semi-Generalized Relative Poses for Visual Localization Vojtech Panek et.al. 2409.14269v1 null
2024-09-18 SpotLight: Robotic Scene Understanding through Interaction and Affordance Detection Tim Engelbracht et.al. 2409.11870v1 link
2024-09-18 End-to-End Probabilistic Geometry-Guided Regression for 6DoF Object Pose Estimation Thomas Pöllabauer et.al. 2409.11819v1 null
2024-09-18 Bridging Domain Gap for Flight-Ready Spaceborne Vision Tae Ha Park et.al. 2409.11661v1 null
2024-09-17 Good Grasps Only: A data engine for self-supervised fine-tuning of pose estimation using grasp poses for verification Frederik Hagelskjær et.al. 2409.11512v1 null
2024-09-17 Training Datasets Generation for Machine Learning: Application to Vision Based Navigation Jérémy Lebreton et.al. 2409.11383v1 null
2024-09-17 OmniGen: Unified Image Generation Shitao Xiao et.al. 2409.11340v1 link
2024-09-17 ULOC: Learning to Localize in Complex Large-Scale Environments with Ultra-Wideband Ranges Thien-Minh Nguyen et.al. 2409.11122v1 link
2024-09-17 Depth-based Privileged Information for Boosting 3D Human Pose Estimation on RGB Alessandro Simoni et.al. 2409.11104v1 null
2024-09-21 HGSLoc: 3DGS-based Heuristic Camera Pose Refinement Zhongyan Niu et.al. 2409.10925v2 null
2024-09-17 Pose estimation of CubeSats via sensor fusion and Error-State Extended Kalman Filter Deep Parikh et.al. 2409.10815v1 null
2024-09-16 CtRNet-X: Camera-to-Robot Pose Estimation in Real-world Conditions Using a Single Camera Jingpei Lu et.al. 2409.10441v1 null
2024-09-16 HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models Vineet Bhat et.al. 2409.10419v1 null
2024-09-16 2D or not 2D: How Does the Dimensionality of Gesture Representation Affect 3D Co-Speech Gesture Generation? Téo Guichoux et.al. 2409.10357v1 null
2024-09-16 Human Insights Driven Latent Space for Different Driving Perspectives: A Unified Encoder for Efficient Multi-Task Inference Huy-Dung Nguyen et.al. 2409.10095v1 null
2024-09-15 Precise Pick-and-Place using Score-Based Diffusion Networks Shih-Wei Guo et.al. 2409.09725v1 null
2024-09-15 Pre-Training for 3D Hand Pose Estimation with Contrastive Learning on Large-Scale Hand Images in the Wild Nie Lin et.al. 2409.09714v1 null
2024-09-15 Proximity operations of CubeSats via sensor fusion of ultra-wideband range measurements with rate gyroscopes, accelerometers and monocular vision Deep Parikh et.al. 2409.09665v1 null
2024-09-15 A Scalable Tabletop Satellite Automation Testbed: Design And Experiments Deep Parikh et.al. 2409.09633v1 null
2024-09-14 MAC-VO: Metrics-aware Covariance for Learning-based Stereo Visual Odometry Yuheng Qiu et.al. 2409.09479v1 null
2024-09-14 Distributed Invariant Kalman Filter for Object-level Multi-robot Pose SLAM Haoying Li et.al. 2409.09410v1 null
2024-09-13 Causal Transformer for Fusion and Pose Estimation in Deep Visual Inertial Odometry Yunus Bilge Kurt et.al. 2409.08769v1 link
2024-09-13 WheelPoser: Sparse-IMU Based Body Pose Estimation for Wheelchair Users Yunzhi Li et.al. 2409.08494v1 link
2024-09-12 Bayesian Inverse Graphics for Few-Shot Concept Learning Octavio Arriaga et.al. 2409.08351v1 link
2024-09-12 Touch2Touch: Cross-Modal Tactile Generation for Object Manipulation Samanta Rodriguez et.al. 2409.08269v1 null
2024-09-12 Covariance Intersection-based Invariant Kalman Filtering(DInCIKF) for Distributed Pose Estimation Haoying Li et.al. 2409.07933v1 null
2024-09-12 GateAttentionPose: Enhancing Pose Estimation with Agent Attention and Improved Gated Convolutions Liang Feng et.al. 2409.07798v1 null
2024-09-12 GatedUniPose: A Novel Approach for Pose Estimation Combining UniRepLKNet and Gated Convolution Liang Feng et.al. 2409.07752v1 null
2024-09-11 FaVoR: Features via Voxel Rendering for Camera Relocalization Vincenzo Polizzi et.al. 2409.07571v1 null
2024-09-11 Benchmarking 2D Egocentric Hand Pose Datasets Olga Taran et.al. 2409.07337v1 null
2024-09-11 iKalibr-RGBD: Partially-Specialized Target-Free Visual-Inertial Spatiotemporal Calibration For RGBDs via Continuous-Time Velocity Estimation Shuolong Chen et.al. 2409.07116v1 link
2024-09-11 Equivariant Filter for Tightly Coupled LiDAR-Inertial Odometry Anbo Tao et.al. 2409.06948v1 null
2024-09-13 A Bayesian framework for active object recognition, pose estimation and shape transfer learning through touch Haodong Zheng et.al. 2409.06912v2 null
2024-09-11 Alignist: CAD-Informed Orientation Distribution Estimation by Fusing Shape and Correspondences Shishir Reddy Vutukur et.al. 2409.06683v2 link
2024-09-10 PoseEmbroider: Towards a 3D, Visual, Semantic-aware Human Pose Representation Ginger Delmas et.al. 2409.06535v1 null
2024-09-10 Test-Time Certifiable Self-Supervision to Bridge the Sim2Real Gap in Event-Based Satellite Pose Estimation Mohsi Jawaid et.al. 2409.06240v1 null
2024-09-09 From Words to Poses: Enhancing Novel Object Pose Estimation with Vision Language Models Tessa Pulli et.al. 2409.05413v1 null
2024-09-08 HelmetPoser: A Helmet-Mounted IMU Dataset for Data-Driven Estimation of Human Head Motion in Diverse Conditions Jianping Li et.al. 2409.05006v1 null
2024-09-06 Casper DPM: Cascaded Perceptual Dynamic Projection Mapping onto Hands Yotam Erel et.al. 2409.04397v1 null
2024-09-06 GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers Lorenza Prospero et.al. 2409.04196v1 null
2024-09-06 Dense Hand-Object(HO) GraspNet with Full Grasping Taxonomy and Dynamics Woojin Cho et.al. 2409.04033v1 null
2024-09-06 Matched Filtering based LiDAR Place Recognition for Urban and Natural Environments Therese Joseph et.al. 2409.03998v1 null
2024-09-09 The Influence of Faulty Labels in Data Sets on Human Pose Estimation Arnold Schwarz et.al. 2409.03887v2 null
2024-09-05 MaskVal: Simple but Effective Uncertainty Quantification for 6D Pose Estimation Philipp Quentin et.al. 2409.03556v1 null
2024-09-05 UAV (Unmanned Aerial Vehicles): Diverse Applications of UAV Datasets in Segmentation, Classification, Detection, and Tracking Md. Mahfuzur Rahman et.al. 2409.03245v1 null
2024-09-01 Recoverable Anonymization for Pose Estimation: A Privacy-Enhancing Approach Wenjun Huang et.al. 2409.02715v1 null
2024-09-04 Object Gaussian for Monocular 6D Pose Estimation from Sparse Views Luqing Luo et.al. 2409.02581v1 null
2024-09-03 EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision Yiming Zhao et.al. 2409.02224v1 null
2024-09-03 Deep learning for objective estimation of Parkinsonian tremor severity Felipe Duque-Quiceno et.al. 2409.02011v1 null
2024-09-03 SPiKE: 3D Human Pose from Point Cloud Sequences Irene Ballester et.al. 2409.01879v1 link
2024-09-02 Kalman Filtering for Precise Indoor Position and Orientation Estimation Using IMU and Acoustics on Riemannian Manifolds Mohammed H. AlSharif et.al. 2409.01002v1 null
2024-09-01 Detection, Recognition and Pose Estimation of Tabletop Objects Sanjuksha Nirgude et.al. 2409.00869v1 null
2024-09-01 DSLO: Deep Sequence LiDAR Odometry Based on Inconsistent Spatio-temporal Propagation Huixin Zhang et.al. 2409.00744v1 link
2024-09-01 MoManifold: Learning to Measure 3D Human Motion via Decoupled Joint Acceleration Manifolds Ziqiang Dang et.al. 2409.00736v1 null
2024-08-31 ActionPose: Pretraining 3D Human Pose Estimation with the Dark Knowledge of Action Longyun Liao et.al. 2409.00449v1 null
2024-09-04 Augmented Reality without Borders: Achieving Precise Localization Without Maps Albert Gassol Puigjaner et.al. 2408.17373v3 null
2024-08-30 BOP-D: Revisiting 6D Pose Estimation Benchmark for Better Evaluation under Visual Ambiguities Boris Meden et.al. 2408.17297v1 null
2024-08-30 EMHI: A Multimodal Egocentric Human Motion Dataset with HMD and Body-Worn IMUs Zhen Fan et.al. 2408.17168v1 null
2024-09-01 Generic Objects as Pose Probes for Few-Shot View Synthesis Zhirui Gao et.al. 2408.16690v2 null
2024-08-29 OP-Align: Object-level and Part-level Alignment for Self-supervised Category-level Articulated Object Pose Estimation Yuchen Che et.al. 2408.16547v1 link
2024-08-29 GRPose: Learning Graph Relations for Human Image Generation with Pose Priors Xiangchen Yin et.al. 2408.16540v1 link
2024-08-28 Are Pose Estimators Ready for the Open World? STAGE: Synthetic Data Generation Toolkit for Auditing 3D Human Pose Estimators Nikita Kister et.al. 2408.16536v1 null
2024-08-28 Multi-view Pose Fusion for Occlusion-Aware 3D Human Pose Estimation Laura Bragagnolo et.al. 2408.15810v1 link
2024-08-30 Addressing the challenges of loop detection in agricultural environments Nicolás Soncini et.al. 2408.15761v2 link
2024-08-28 Str-L Pose: Integrating Point and Structured Line for Relative Pose Estimation in Dual-Graph Zherong Zhang et.al. 2408.15750v1 null
2024-08-28 Benchmarking ML Approaches to UWB-Based Range-Only Posture Recognition for Human Robot-Interaction Salma Salimi et.al. 2408.15717v1 null
2024-08-26 Bengali Sign Language Recognition through Hand Pose Estimation using Multi-Branch Spatial-Temporal Attention Model Abu Saleh Musa Miah et.al. 2408.14111v1 null
2024-08-25 InterTrack: Tracking Human Object Interaction without Object Templates Xianghui Xie et.al. 2408.13953v1 null
2024-08-24 Temporally-consistent 3D Reconstruction of Birds Johannes Hägerlind et.al. 2408.13629v1 null
2024-08-24 Explainable Convolutional Networks for Crater Detection and Lunar Landing Navigation Jianing Song et.al. 2408.13587v1 null
2024-08-27 Sapiens: Foundation for Human Vision Models Rawal Khirodkar et.al. 2408.12569v3 null
2024-08-21 GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting Wanshui Gan et.al. 2408.11447v1 link
2024-08-20 GSLoc: Efficient Camera Pose Refinement via 3D Gaussian Splatting Changkun Liu et.al. 2408.11085v1 null
2024-08-20 ZebraPose: Zebra Detection and Pose Estimation using only Synthetic Data Elia Bonetto et.al. 2408.10831v1 null
2024-08-20 MPL: Lifting 3D Human Pose from Multi-view 2D Poses Seyed Abolfazl Ghasemzadeh et.al. 2408.10805v1 link
2024-08-19 RUMI: Rummaging Using Mutual Information Sheng Zhong et.al. 2408.10450v1 null
2024-08-19 SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views Chao Xu et.al. 2408.10195v1 null
2024-08-19 SHARP: Segmentation of Hands and Arms by Range using Pseudo-Depth for Enhanced Egocentric 3D Hand Pose Estimation and Action Recognition Wiktor Mucha et.al. 2408.10037v1 link
2024-08-19 Pose-GuideNet: Automatic Scanning Guidance for Fetal Head Ultrasound from Pose Estimation Qianhui Men et.al. 2408.09931v1 null
2024-08-18 OPPH: A Vision-Based Operator for Measuring Body Movements for Personal Healthcare Chen Long-fei et.al. 2408.09409v1 null
2024-08-17 An Open-Source American Sign Language Fingerspell Recognition and Semantic Pose Retrieval Interface Kevin Jose Thomas et.al. 2408.09311v1 link
2024-08-16 ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation Hao Tang et.al. 2408.09042v1 null
2024-08-16 Correspondence-Guided SfM-Free 3D Gaussian Splatting for NVS Wei Sun et.al. 2408.08723v1 null
2024-08-16 SketchRef: A Benchmark Dataset and Evaluation Metrics for Automated Sketch Synthesis Xingyue Lin et.al. 2408.08623v1 null
2024-08-15 HyperTaxel: Hyper-Resolution for Taxel-Based Tactile Signals Through Contrastive Learning Hongyu Li et.al. 2408.08312v1 null
2024-08-15 Comparative Evaluation of 3D Reconstruction Methods for Object Pose Estimation Varun Burde et.al. 2408.08234v1 link
2024-08-15 Towards Practical Human Motion Prediction with LiDAR Point Clouds Xiao Han et.al. 2408.08202v1 null
2024-08-15 Your Turn: Real-World Turning Angle Estimation for Parkinson's Disease Severity Assessment Qiushuo Cheng et.al. 2408.08182v1 null
2024-08-15 Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models Tianyu Wang et.al. 2408.07975v1 null
2024-08-15 GOReloc: Graph-based Object-Level Relocalization for Visual SLAM Yutong Wang et.al. 2408.07917v1 link
2024-08-13 Grasping by Hanging: a Learning-Free Grasping Detection Method for Previously Unseen Objects Wanze Li et.al. 2408.06734v1 null
2024-08-13 A Miniature Vision-Based Localization System for Indoor Blimps Shicong Ma et.al. 2408.06648v1 null
2024-08-12 UniT: Unified Tactile Representation for Robot Learning Zhengtong Xu et.al. 2408.06481v1 link
2024-08-12 Moo-ving Beyond Tradition: Revolutionizing Cattle Behavioural Phenotyping with Pose Estimation Techniques Navid Ghassemi et.al. 2408.06336v1 null
2024-08-12 CAD-Mesher: A Convenient, Accurate, Dense Mesh-based Mapping Module in SLAM for Dynamic Environments Yanpeng Jia et.al. 2408.05981v1 null
2024-08-12 PAFormer: Part Aware Transformer for Person Re-identification Hyeono Jung et.al. 2408.05918v1 null
2024-08-11 SABER-6D: Shape Representation Based Implicit Object Pose Estimation Shishir Reddy Vutukur et.al. 2408.05867v1 null
2024-08-10 Visual SLAM with 3D Gaussian Primitives and Depth Priors Enabling Novel View Synthesis Zhongche Qu et.al. 2408.05635v1 null
2024-08-10 Anticipation through Head Pose Estimation: a preliminary study Federico Figari Tomenotti et.al. 2408.05516v1 null
2024-08-09 Mesh-based Object Tracking for Dynamic Semantic 3D Scene Graphs via Ray Tracing Lennart Niecksch et.al. 2408.04979v1 null
2024-08-07 PoseMamba: Monocular 3D Human Pose Estimation with Bidirectional Global-Local Spatio-Temporal State Space Model Yunlong Huang et.al. 2408.03540v1 null
2024-08-06 Line-based 6-DoF Object Pose Estimation and Tracking With an Event Camera Zibin Liu et.al. 2408.03225v1 link
2024-08-06 Training on the Fly: On-device Self-supervised Learning aboard Nano-drones within 20 mW Elia Cereda et.al. 2408.03168v1 null
2024-08-06 BodySLAM: A Generalized Monocular Visual SLAM Framework for Surgical Applications G. Manni et.al. 2408.03078v1 link
2024-08-07 Pose Magic: Efficient and Temporally Consistent Human Pose Estimation with a Hybrid Mamba-GCN Network Xinyi Zhang et.al. 2408.02922v2 null
2024-08-05 Analyzing Data Efficiency and Performance of Machine Learning Algorithms for Assessing Low Back Pain Physical Rehabilitation Exercises Aleksa Marusic et.al. 2408.02855v1 null
2024-08-05 Joint-Motion Mutual Learning for Pose Estimation in Videos Sifan Wu et.al. 2408.02285v1 null
2024-08-04 AvatarPose: Avatar-guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos Feichi Lu et.al. 2408.02110v1 null
2024-08-04 Generalized Maximum Likelihood Estimation for Perspective-n-Point Problem Tian Zhan et.al. 2408.01945v1 null
2024-08-03 MotionTrace: IMU-based Field of View Prediction for Smartphone AR Interactions Rahul Islam et.al. 2408.01850v1 null
2024-08-03 BEVPlace++: Fast, Robust, and Lightweight LiDAR Global Localization for Unmanned Ground Vehicles Lun Luo et.al. 2408.01841v1 link
2024-08-03 E$^3$NeRF: Efficient Event-Enhanced Neural Radiance Fields from Blurry Images Yunshan Qi et.al. 2408.01840v1 null
2024-08-03 Survey on Emotion Recognition through Posture Detection and the possibility of its application in Virtual Reality Leina Elansary et.al. 2408.01728v1 null
2024-08-03 Stimulating Imagination: Towards General-purpose Object Rearrangement Jianyang Wu et.al. 2408.01655v1 null
2024-08-02 Full-range Head Pose Geometric Data Augmentations Huei-Chung Hu et.al. 2408.01566v1 null
2024-07-31 Adapting Skills to Novel Grasps: A Self-Supervised Approach Georgios Papagiannis et.al. 2408.00178v1 null
2024-07-31 Certifying Robustness of Learning-Based Keypoint Detection and Pose Estimation Methods Xusheng Luo et.al. 2408.00117v1 null
2024-07-30 StackFLOW: Monocular Human-Object Reconstruction by Stacked Normalizing Flow with Offset Chaofan Huo et.al. 2407.20545v1 link
2024-07-30 HandDAGT: A Denoising Adaptive Graph Transformer for 3D Hand Pose Estimation Wencan Cheng et.al. 2407.20542v1 link
2024-07-30 Markers Identification for Relative Pose Estimation of an Uncooperative Target Batu Candan et.al. 2407.20515v1 null
2024-07-29 BaseBoostDepth: Exploiting Larger Baselines For Self-supervised Monocular Depth Estimation Kieran Saunders et.al. 2407.20437v1 null
2024-07-28 Skeleton-based Group Activity Recognition via Spatial-Temporal Panoramic Graph Zhengcen Li et.al. 2407.19497v1 link
2024-07-26 Flexible graph convolutional network for 3D human pose estimation Abu Taib Mohammed Shahjahan et.al. 2407.19077v1 link
2024-07-26 From 2D to 3D: AISG-SLA Visual Localization Challenge Jialin Gao et.al. 2407.18590v1 null
2024-07-28 HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation Zhenzhi Wang et.al. 2407.17438v2 link
2024-07-24 Active Loop Closure for OSM-guided Robotic Mapping in Large-Scale Urban Environments Wei Gao et.al. 2407.17078v1 null
2024-07-30 DreamCar: Leveraging Car-specific Prior for in-the-wild 3D Car Reconstruction Xiaobiao Du et.al. 2407.16988v2 link
2024-07-24 Pose Estimation from Camera Images for Underwater Inspection Luyuan Peng et.al. 2407.16961v1 null
2024-07-23 COALA: A Practical and Vision-Centric Federated Learning Platform Weiming Zhuang et.al. 2407.16560v1 link
2024-07-23 Probabilistic Parameter Estimators and Calibration Metrics for Pose Estimation from Image Features Romeo Valentin et.al. 2407.16223v1 null
2024-07-23 Optimal camera-robot pose estimation in linear time from points and lines Guangyang Zeng et.al. 2407.16151v1 null
2024-07-23 3D-UGCN: A Unified Graph Convolutional Network for Robust 3D Human Pose Estimation from Monocular RGB Images Jie Zhao et.al. 2407.16137v1 null
2024-07-21 CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models Zheng Chong et.al. 2407.15886v1 link
2024-07-22 RADA: Robust and Accurate Feature Learning with Domain Adaptation Jingtai He et.al. 2407.15791v1 null
2024-07-22 Local Occupancy-Enhanced Object Grasping with Multiple Triplanar Projection Kangqi Ma et.al. 2407.15771v1 null
2024-07-22 6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model Matteo Bortolon et.al. 2407.15484v1 null
2024-07-23 Domain-Adaptive 2D Human Pose Estimation via Dual Teachers in Extremely Low-Light Conditions Yihao Ai et.al. 2407.15451v2 link
2024-07-22 avaTTAR: Table Tennis Stroke Training with On-body and Detached Visualization in Augmented Reality Dizhi Ma et.al. 2407.15373v1 null
2024-07-20 From Underground Mines to Offices: A Versatile and Robust Framework for Range-Inertial SLAM Lorenzo Montano-Oliván et.al. 2407.14797v1 null
2024-07-19 ESCAPE: Energy-based Selective Adaptive Correction for Out-of-distribution 3D Human Pose Estimation Luke Bidulka et.al. 2407.14605v1 null
2024-07-19 6DoF Head Pose Estimation through Explicit Bidirectional Interaction with Face Geometry Sungho Chun et.al. 2407.14136v1 link
2024-07-18 RT-Pose: A 4D Radar Tensor-based 3D Human Pose Estimation and Localization Benchmark Yuan-Hao Ho et.al. 2407.13930v1 null
2024-07-19 GlobalPointer: Large-Scale Plane Adjustment with Bi-Convex Relaxation Bangyan Liao et.al. 2407.13537v2 link
2024-07-18 SCAPE: A Simple and Strong Category-Agnostic Pose Estimator Yujia Liang et.al. 2407.13483v1 link
2024-07-17 SG-NeRF: Neural Surface Reconstruction with Scene Graph Optimization Yiyang Chen et.al. 2407.12667v1 link
2024-07-17 Invertible Neural Warp for NeRF Shin-Fang Chng et.al. 2407.12354v1 null
2024-07-16 NeuSurfEmb: A Complete Pipeline for Dense Correspondence-based 6D Object Pose Estimation without CAD Models Francesco Milano et.al. 2407.12207v1 link
2024-07-16 Monocular pose estimation of articulated surgical instruments in open surgery Robert Spektor et.al. 2407.12138v1 null
2024-07-17 GV-Bench: Benchmarking Local Feature Matching for Geometric Verification of Long-term Loop Closure Detection Jingwen Yu et.al. 2407.11736v2 link
2024-07-16 TCFormer: Visual Recognition via Token Clustering Transformer Wang Zeng et.al. 2407.11321v1 link
2024-07-15 A BlueROV2-based platform for underwater mapping experiments Tudor Alinei-Poiana et.al. 2407.10901v1 link
2024-07-15 LVCP: LiDAR-Vision Tightly Coupled Collaborative Real-time Relative Positioning Zhuozhu Jian et.al. 2407.10782v1 null
2024-07-15 Domain Generalization for 6D Pose Estimation Through NeRF-based Image Synthesis Antoine Legrand et.al. 2407.10762v1 null
2024-07-16 GTPT: Group-based Token Pruning Transformer for Efficient Human Pose Estimation Haonan Wang et.al. 2407.10756v2 null
2024-07-15 Learning to Estimate the Pose of a Peer Robot in a Camera Image by Predicting the States of its LEDs Nicholas Carlotti et.al. 2407.10661v1 null
2024-07-15 Deep-Learning-Based Markerless Pose Estimation Systems in Gait Analysis: DeepLabCut Custom Training and the Refinement Function Giulia Panconi et.al. 2407.10590v1 null
2024-07-14 3D Foundation Models Enable Simultaneous Geometry and Pose Estimation of Grasped Objects Weiming Zhi et.al. 2407.10331v1 null
2024-07-16 psifx -- Psychological and Social Interactions Feature Extraction Package Guillaume Rochette et.al. 2407.10266v2 null
2024-07-14 PAFUSE: Part-based Diffusion for 3D Whole-Body Pose Estimation Nermin Samet et.al. 2407.10220v1 link
2024-07-14 3DEgo: 3D Editing on the Go! Umar Khalid et.al. 2407.10102v1 null
2024-07-12 iNeMo: Incremental Neural Mesh Models for Robust Class-Incremental Learning Tom Fischer et.al. 2407.09271v1 link
2024-07-12 HUP-3D: A 3D multi-view synthetic dataset for assisted-egocentric hand-ultrasound pose estimation Manuel Birlo et.al. 2407.09215v1 null
2024-07-12 KGpose: Keypoint-Graph Driven End-to-End Multi-Object 6D Pose Estimation via Point-Wise Pose Voting Andrew Jeong et.al. 2407.08909v1 null
2024-07-11 RTMW: Real-Time Multi-Person 2D and 3D Whole-body Pose Estimation Tao Jiang et.al. 2407.08634v1 link
2024-07-11 SRPose: Two-view Relative Pose Estimation with Sparse Keypoints Rui Yin et.al. 2407.08199v1 link
2024-07-11 SGLC: Semantic Graph-Guided Coarse-Fine-Refine Full Loop Closing for LiDAR SLAM Neng Wang et.al. 2407.08106v1 link
2024-07-10 RoCap: A Robotic Data Collection Pipeline for the Pose Estimation of Appearance-Changing Objects Jiahao Nick Li et.al. 2407.08081v1 null
2024-07-10 Hybrid Structure-from-Motion and Camera Relocalization for Enhanced Egocentric Localization Jinjie Mai et.al. 2407.08023v1 link
2024-07-10 Greit-HRNet: Grouped Lightweight High-Resolution Network for Human Pose Estimation Junjia Han et.al. 2407.07389v1 null
2024-07-09 Category-level Object Detection, Pose Estimation and Reconstruction from Stereo Images Chuanrui Zhang et.al. 2407.06984v1 null
2024-07-09 Computer vision tasks for intelligent aerospace missions: An overview Huilin Chen et.al. 2407.06513v1 null
2024-07-08 GeoNLF: Geometry guided Pose-Free Neural LiDAR Fields Weiyi Xue et.al. 2407.05597v1 null
2024-07-10 On the power of data augmentation for head pose estimation Michael Welter et.al. 2407.05357v2 link
2024-07-07 SCIPaD: Incorporating Spatial Clues into Unsupervised Pose-Depth Joint Learning Yi Feng et.al. 2407.05283v1 link
2024-07-05 Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos Leonhard Sommer et.al. 2407.04384v1 link
2024-07-04 Towards Cross-View-Consistent Self-Supervised Surround Depth Estimation Laiyan Ding et.al. 2407.04041v1 link
2024-07-04 Markerless Multi-view 3D Human Pose Estimation: a survey Ana Filipa Rodrigues Nogueira et.al. 2407.03817v1 null
2024-07-04 A Fast Dynamic Point Detection Method for LiDAR-Inertial Odometry in Driving Scenarios Zikang Yuan et.al. 2407.03590v1 link
2024-07-03 Graph and Skipped Transformer: Exploiting Spatial and Temporal Modeling Capacities for Efficient 3D Human Pose Estimation Mengmeng Cui et.al. 2407.02990v1 null
2024-07-03 Free-SurGS: SfM-Free 3D Gaussian Splatting for Surgical Scene Reconstruction Jiaxin Guo et.al. 2407.02918v1 link
2024-07-02 SUPER: Seated Upper Body Pose Estimation using mmWave Radars Bo Zhang et.al. 2407.02455v1 null
2024-07-02 ReliaAvatar: A Robust Real-Time Avatar Animator with Integrated Motion Prediction Bo Qian et.al. 2407.02129v1 null
2024-07-02 Joint-Dataset Learning and Cross-Consistent Regularization for Text-to-Motion Retrieval Nicola Messina et.al. 2407.02104v1 null
2024-07-01 Active Human Pose Estimation via an Autonomous UAV Agent Jingxi Chen et.al. 2407.01811v1 null
2024-07-01 RoDyn-SLAM: Robust Dynamic Dense RGB-D SLAM with Neural Radiance Fields Haochen Jiang et.al. 2407.01303v1 link
2024-07-01 Collaborative Graph Exploration with Reduced Pose-SLAM Uncertainty via Submodular Optimization Ruofei Bai et.al. 2407.01013v1 link
2024-06-30 Ego-to-Exo: Interfacing Third Person Visuals from Egocentric Views in Real-time for Improved ROV Teleoperation Adnan Abdullah et.al. 2407.00848v1 null
2024-06-29 When Robots Get Chatty: Grounding Multimodal Human-Robot Conversation and Collaboration Philipp Allgeuer et.al. 2407.00518v1 link
2024-06-28 Assistive Image Annotation Systems with Deep Learning and Natural Language Capabilities: A Review Moseli Mots'oehli et.al. 2407.00252v1 null
2024-06-28 EPOCH: Jointly Estimating the 3D Pose of Cameras and Humans Nicola Garau et.al. 2406.19726v1 null
2024-06-28 CLOi-Mapper: Consistent, Lightweight, Robust, and Incremental Mapper With Embedded Systems for Commercial Robot Services DongKi Noh et.al. 2406.19634v1 null
2024-06-27 Multimodal Visual-haptic pose estimation in the presence of transient occlusion Michael Zechmair et.al. 2406.19323v1 null
2024-06-27 Human Modelling and Pose Estimation Overview Pawel Knap et.al. 2406.19290v1 null
2024-06-26 Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference Yuan Gao et.al. 2406.18453v1 link
2024-06-27 Automatic infant 2D pose estimation from videos: comparing seven deep neural network methods Filipe Gama et.al. 2406.17382v2 null
2024-06-24 High-resolution open-vocabulary object 6D pose estimation Jaime Corsetti et.al. 2406.16384v1 null
2024-06-23 Breaking the Frame: Image Retrieval by Visual Overlap Prediction Tong Wei et.al. 2406.16204v1 link
2024-06-21 Efficient Human Pose Estimation: Leveraging Advanced Techniques with MediaPipe Sandeep Singh Sengar et.al. 2406.15649v1 link
2024-06-24 Investigating the impact of 2D gesture representation on co-speech gesture generation Teo Guichoux et.al. 2406.15111v2 null
2024-06-20 Benchmarking Monocular 3D Dog Pose Estimation Using In-The-Wild Motion Capture Data Moira Shooter et.al. 2406.14412v1 null
2024-06-20 PoseBench: Benchmarking the Robustness of Pose Estimation Models under Corruptions Sihan Ma et.al. 2406.14367v1 null
2024-06-19 NeRF-Feat: 6D Object Pose Estimation using Feature Rendering Shishir Reddy Vutukur et.al. 2406.13796v1 null
2024-06-19 CNN Based Flank Predictor for Quadruped Animal Species Vanessa Suessle et.al. 2406.13588v1 null
2024-06-19 MVSBoost: An Efficient Point Cloud-based 3D Reconstruction Umair Haroon et.al. 2406.13515v1 null
2024-06-19 An Efficient yet High-Performance Method for Precise Radar-Based Imaging of Human Hand Poses Johanna Bräunig et.al. 2406.13464v1 null
2024-06-18 Head Pose Estimation and 3D Neural Surface Reconstruction via Monocular Camera in situ for Navigation and Safe Insertion into Natural Openings Ruijie Tang et.al. 2406.13048v1 null
2024-06-17 Matching Query Image Against Selected NeRF Feature for Efficient and Scalable Localization Huaiji Zhou et.al. 2406.11766v1 null
2024-06-17 Domain Generalization for In-Orbit 6D Pose Estimation Antoine Legrand et.al. 2406.11743v1 null
2024-06-17 SeamPose: Repurposing Seams as Capacitive Sensors in a Shirt for Upper-Body Pose Tracking Tianhong Catherine Yu et.al. 2406.11645v1 null
2024-06-14 Galibr: Targetless LiDAR-Camera Extrinsic Calibration Method via Ground Plane Initialization Wonho Song et.al. 2406.11599v1 null
2024-06-15 MMVR: Millimeter-wave Multi-View Radar Dataset and Benchmark for Indoor Perception M. Mahbubur Rahman et.al. 2406.10708v1 link
2024-06-15 Improving Ab-Initio Cryo-EM Reconstruction with Semi-Amortized Pose Inference Shayan Shekarforoush et.al. 2406.10455v1 null
2024-06-14 The BabyView dataset: High-resolution egocentric videos of infants' and young children's everyday experiences Bria Long et.al. 2406.10447v1 null
2024-06-14 OpenCapBench: A Benchmark to Bridge Pose Estimation and Biomechanics Yoni Gozlan et.al. 2406.09788v1 null
2024-06-13 ImageNet3D: Towards General-Purpose Object-Level 3D Understanding Wufei Ma et.al. 2406.09613v1 link
2024-06-13 Deep Transformer Network for Monocular Pose Estimation of Ship-Based UAV Maneesha Wickramasuriya et.al. 2406.09260v1 link
2024-06-14 Language-Driven Closed-Loop Grasping with Model-Predictive Trajectory Replanning Huy Hoang Nguyen et.al. 2406.09039v2 null
2024-06-14 VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks Jiannan Wu et.al. 2406.08394v2 link
2024-06-12 Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization Jiaxin Deng et.al. 2406.08001v1 null
2024-06-12 IFTD: Image Feature Triangle Descriptor for Loop Detection in Driving Scenes Fengtian Lang et.al. 2406.07937v1 link
2024-06-12 From Variance to Veracity: Unbundling and Mitigating Gradient Variance in Differentiable Bundle Adjustment Layers Swaminathan Gurumurthy et.al. 2406.07785v1 link
2024-06-12 SPIN: Spacecraft Imagery for Navigation Javier Montalvo et.al. 2406.07500v2 link
2024-06-11 Realistic Data Generation for 6D Pose Estimation of Surgical Instruments Juan Antonio Barragan et.al. 2406.07328v1 link
2024-06-11 SignMusketeers: An Efficient Multi-Stream Approach for Sign Language Translation at Scale Shester Gueuwou et.al. 2406.06907v1 null
2024-06-10 Multicam-SLAM: Non-overlapping Multi-camera SLAM for Indirect Visual Localization and Navigation Shenghao Li et.al. 2406.06374v1 link
2024-06-08 A preprocessing-based planning framework for utilizing contacts in high-precision insertion tasks Muhammad Suhail Saleem et.al. 2406.05522v1 null
2024-06-06 GLACE: Global Local Accelerated Coordinate Encoding Fangjinhua Wang et.al. 2406.04340v1 link
2024-06-06 Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking Jiyao Zhang et.al. 2406.04316v1 null
2024-06-05 Hi5: 2D Hand Pose Estimation with Zero Human Annotation Masum Hasan et.al. 2406.03599v1 null
2024-06-05 Sparse Color-Code Net: Real-Time RGB-Based 6D Object Pose Estimation on Edge Devices Xingjian Yang et.al. 2406.02977v1 null
2024-06-04 CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation Dejia Xu et.al. 2406.02509v1 null
2024-06-04 HPE-CogVLM: New Head Pose Grounding Task Exploration on Vision Language Model Yu Tian et.al. 2406.01914v1 null
2024-06-03 A Robust Filter for Marker-less Multi-person Tracking in Human-Robot Interaction Scenarios Enrico Martini et.al. 2406.01832v1 link
2024-06-01 Equivariant amortized inference of poses for cryo-EM Larissa de Ruijter et.al. 2406.01630v1 null
2024-06-03 3D WholeBody Pose Estimation based on Semantic Graph Attention Network and Distance Information Sihan Wen et.al. 2406.01196v1 null
2024-06-01 CapeX: Category-Agnostic Pose Estimation from Textual Point Explanation Matan Rusanovsky et.al. 2406.00384v1 link
2024-05-30 Estimating Human Poses Across Datasets: A Unified Skeleton and Multi-Teacher Distillation Approach Muhammad Saif Ullah Khan et.al. 2405.20084v1 null
2024-05-30 TAMBRIDGE: Bridging Frame-Centered Tracking and 3D Gaussian Splatting for Enhanced SLAM Peifeng Jiang et.al. 2405.19614v1 null
2024-05-29 Real-Time Dynamic Robot-Assisted Hand-Object Interaction via Motion Primitives Mingqi Yuan et.al. 2405.19531v1 null
2024-05-29 Exploring AI-based Anonymization of Industrial Image and Video Data in the Context of Feature Preservation Sabrina Cynthia Triess et.al. 2405.19173v1 null
2024-05-28 World Models for General Surgical Grasping Hongbin Lin et.al. 2405.17940v1 null
2024-05-27 MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds Jiahui Lei et.al. 2405.17421v1 link
2024-05-27 Occlusion Handling in 3D Human Pose Estimation with Perturbed Positional Encoding Niloofar Azizi et.al. 2405.17397v1 null
2024-05-27 $\text{Di}^2\text{Pose}$: Discrete Diffusion Model for Occluded 3D Human Pose Estimation Weiquan Wang et.al. 2405.17016v1 null
2024-05-27 Clustering-based Learning for UAV Tracking and Pose Estimation Jiaping Xiao et.al. 2405.16867v1 null
2024-05-26 Multi-Modal UAV Detection, Classification and Tracking Algorithm -- Technical Report for CVPR 2024 UG2 Challenge Tianchen Deng et.al. 2405.16464v1 link
2024-05-25 Intensity and Texture Correction of Omnidirectional Image Using Camera Images for Indirect Augmented Reality Hakim Ikebayashi et.al. 2405.16008v1 null
2024-05-23 CoPeD-Advancing Multi-Robot Collaborative Perception: A Comprehensive Dataset in Real-World Environments Yang Zhou et.al. 2405.14731v1 link
2024-05-23 Segformer++: Efficient Token-Merging Strategies for High-Resolution Semantic Segmentation Daniel Kienzle et.al. 2405.14467v1 link
2024-05-21 Geometric Transformation Uncertainty for Improving 3D Fetal Brain Pose Prediction from Freehand 2D Ultrasound Videos Jayroop Ramesh et.al. 2405.13235v1 link
2024-05-21 Leveraging Neural Radiance Fields for Pose Estimation of an Unknown Space Object during Proximity Operations Antoine Legrand et.al. 2405.12728v1 null
2024-05-21 PoseGravity: Pose Estimation from Points and Lines with Axis Prior Akshay Chandrasekhar et.al. 2405.12646v1 link
2024-05-19 Focus on Low-Resolution Information: Multi-Granular Information-Lossless Model for Low-Resolution Human Pose Estimation Zejun Gu et.al. 2405.12247v1 null
2024-05-20 AutoSoccerPose: Automated 3D posture Analysis of Soccer Shot Movements Calvin Yeung et.al. 2405.12070v1 link
2024-05-19 Advancing 6-DoF Instrument Pose Estimation in Variable X-Ray Imaging Geometries Christiaan G. A. Viviers et.al. 2405.11677v1 link
2024-05-19 Cross-Domain Knowledge Distillation for Low-Resolution Human Pose Estimation Zejun Gu et.al. 2405.11448v1 null
2024-05-18 PS6D: Point Cloud Based Symmetry-Aware 6D Object Pose Estimation in Robot Bin-Picking Yifan Yang et.al. 2405.11257v1 null
2024-05-18 MotionGS : Compact Gaussian Splatting SLAM by Motion Filter Xinli Guo et.al. 2405.11129v1 link
2024-05-17 Resolving Symmetry Ambiguity in Correspondence-based Methods for Instance-level Object Pose Estimation Yongliang Lin et.al. 2405.10557v1 null
2024-05-16 Diversity-Aware Sign Language Production through a Pose Encoding Variational Autoencoder Mohamed Ilyes Lakhal et.al. 2405.10423v1 null
2024-05-17 Toon3D: Seeing Cartoons from a New Perspective Ethan Weber et.al. 2405.10320v2 null
2024-05-15 Task-adaptive Q-Face Haomiao Sun et.al. 2405.09059v1 null
2024-05-14 RDPN6D: Residual-based Dense Point-wise Network for 6Dof Object Pose Estimation Based on RGB-D Images Zong-Wei Hong et.al. 2405.08483v1 link
2024-05-14 TP3M: Transformer-based Pseudo 3D Image Matching with Reference Liming Han et.al. 2405.08434v1 null
2024-05-13 Deep Learning-Based Object Pose Estimation: A Comprehensive Survey Jian Liu et.al. 2405.07801v1 link
2024-05-13 JointLoc: A Real-time Visual Localization Framework for Planetary UAVs Based on Joint Relative and Absolute Pose Estimation Xubo Luo et.al. 2405.07429v1 link
2024-05-11 TD-NeRF: Novel Truncated Depth Prior for Joint Camera Pose and Neural Radiance Field Optimization Zhen Tan et.al. 2405.07027v1 link
2024-05-11 AHPPEBot: Autonomous Robot for Tomato Harvesting based on Phenotyping and Pose Estimation Xingxu Li et.al. 2405.06959v1 null
2024-05-10 CasCalib: Cascaded Calibration for Motion Capture from Sparse Unsynchronized Cameras James Tang et.al. 2405.06845v1 link
2024-05-10 MGS-SLAM: Monocular Sparse Tracking and Gaussian Mapping with Depth Smooth Regularization Pengcheng Zhu et.al. 2405.06241v1 null
2024-05-10 Free-Moving Object Reconstruction and Pose Estimation with Virtual Camera Haixin Shi et.al. 2405.05858v2 null
2024-05-09 Semi-Autonomous Laparoscopic Robot Docking with Learned Hand-Eye Information Fusion Huanyu Tian et.al. 2405.05817v1 null
2024-05-09 NeuRSS: Enhancing AUV Localization and Bathymetric Mapping with Neural Rendering for Sidescan SLAM Yiping Xie et.al. 2405.05807v1 null
2024-05-09 Benchmarking Neural Radiance Fields for Autonomous Robots: An Overview Yuhang Ming et.al. 2405.05526v1 null
2024-05-08 Adversary-Guided Motion Retargeting for Skeleton Anonymization Thomas Carr et.al. 2405.05428v1 null
2024-05-08 FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models Jinglin Xu et.al. 2405.05216v1 link
2024-05-08 ProbRadarM3F: mmWave Radar based Human Skeletal Pose Estimation with Probability Map Guided Multi-Format Feature Fusion Bing Zhu et.al. 2405.05164v1 null
2024-05-08 GISR: Geometric Initialization and Silhouette-based Refinement for Single-View Robot Pose and Configuration Estimation Ivan Bilić et.al. 2405.04890v1 null
2024-05-07 Learning Distributional Demonstration Spaces for Task-Specific Cross-Pose Estimation Jenny Wang et.al. 2405.04609v1 null
2024-05-07 Bayesian Simultaneous Localization and Multi-Lane Tracking Using Onboard Sensors and a SD Map Yuxuan Xia et.al. 2405.04290v1 null
2024-05-07 Speak the Same Language: Global LiDAR Registration on BIM Using Pose Hough Transform Zhijian Qiao et.al. 2405.03969v1 null
2024-05-07 Joint Estimation of Identity Verification and Relative Pose for Partial Fingerprints Xiongjun Guan et.al. 2405.03959v1 link
2024-05-06 Pose Priors from Language Models Sanjay Subramanian et.al. 2405.03689v1 null
2024-05-06 Optimizing Hand Region Detection in MediaPipe Holistic Full-Body Pose Estimation to Improve Accuracy and Avoid Downstream Errors Amit Moryossef et.al. 2405.03545v1 link
2024-05-05 Multi-hop graph transformer network for 3D human pose estimation Zaedul Islam et.al. 2405.03055v1 null
2024-05-05 Blending Distributed NeRFs with Tri-stage Robust Pose Optimization Baijun Ye et.al. 2405.02880v1 null
2024-05-03 WeightedPose: Generalizable Cross-Pose Estimation via Weighted SVD Xuxin Cheng et.al. 2405.02241v1 link
2024-05-03 Probablistic Restoration with Adaptive Noise Sampling for 3D Human Pose Estimation Xianzhou Zeng et.al. 2405.02114v1 link
2024-05-03 An Onboard Framework for Staircases Modeling Based on Point Clouds Chun Qing et.al. 2405.01918v1 null
2024-05-06 ShadowNav: Autonomous Global Localization for Lunar Navigation in Darkness Deegan Atha et.al. 2405.01673v2 null
2024-05-02 IntervenGen: Interventional Data Generation for Robust and Data-Efficient Robot Imitation Learning Ryan Hoque et.al. 2405.01472v1 null
2024-05-02 Behavior Imitation for Manipulator Control and Grasping with Deep Reinforcement Learning Liu Qiyuan et.al. 2405.01284v1 null
2024-05-02 Sports Analysis and VR Viewing System Based on Player Tracking and Pose Estimation with Multimodal and Multiview Sensors Wenxuan Guo et.al. 2405.01112v1 null
2024-05-02 CoViS-Net: A Cooperative Visual Spatial Foundation Model for Multi-Robot Applications Jan Blumenkamp et.al. 2405.01107v1 null
2024-05-04 HandSSCA: 3D Hand Mesh Reconstruction with State Space Channel Attention from RGB images Zixun Jiao et.al. 2405.01066v2 null
2024-05-01 Radar-Based Localization For Autonomous Ground Vehicles In Suburban Neighborhoods Andrew J. Kramer et.al. 2405.00600v1 null
2024-04-30 Ultra Inertial Poser: Scalable Motion Capture and Tracking from Sparse Inertial Sensors and Ultra-Wideband Ranging Rayan Armani et.al. 2404.19541v1 link
2024-04-30 UniFS: Universal Few-shot Instance Perception with Point Representations Sheng Jin et.al. 2404.19401v1 link
2024-04-30 Quater-GCN: Enhancing 3D Human Pose Estimation with Orientation and Semi-supervised Training Xingyu Song et.al. 2404.19279v1 link
2024-04-30 XFeat: Accelerated Features for Lightweight Image Matching Guilherme Potje et.al. 2404.19174v1 null
2024-04-29 Self-Avatar Animation in Virtual Reality: Impact of Motion Signals Artifacts on the Full-Body Pose Reconstruction Antoine Maiorca et.al. 2404.18628v1 null
2024-04-29 Mesh-based Photorealistic and Real-time 3D Mapping for Robust Visual Perception of Autonomous Underwater Vehicle Jungwoo Lee et.al. 2404.18395v1 null
2024-04-29 Reconstructing Satellites in 3D from Amateur Telescope Images Zhiming Chang et.al. 2404.18394v1 null
2024-04-27 Hybrid 3D Human Pose Estimation with Monocular Video and Sparse IMUs Yiming Bao et.al. 2404.17837v1 null
2024-04-26 Localization Through Particle Filter Powered Neural Network Estimated Monocular Camera Poses Yi Shen et.al. 2404.17685v1 null
2024-04-26 SLAM for Indoor Mapping of Wide Area Construction Environments Vincent Ress et.al. 2404.17215v1 null
2024-04-25 WheelPose: Data Synthesis Techniques to Improve Pose Estimation Performance on Wheelchair Users William Huang et.al. 2404.17063v1 link
2024-04-25 Transformer-Based Local Feature Matching for Multimodal Image Registration Remi Delaunay et.al. 2404.16802v1 null
2024-04-25 DeepKalPose: An Enhanced Deep-Learning Kalman Filter for Temporally Consistent Monocular Vehicle Pose Estimation Leandro Di Bella et.al. 2404.16558v1 null
2024-04-25 Efficient Solution of Point-Line Absolute Pose Petr Hruby et.al. 2404.16552v1 link
2024-04-25 COBRA -- COnfidence score Based on shape Regression Analysis for method-independent quality assessment of object pose estimation from single images Panagiotis Sapoutzoglou et.al. 2404.16471v1 link
2024-04-25 MegaParticles: Range-based 6-DoF Monte Carlo Localization with GPU-Accelerated Stein Particle Filter Kenji Koide et.al. 2404.16370v1 null
2024-04-24 3D Human Pose Estimation with Occlusions: Introducing BlendMimic3D Dataset and GCN Refinement Filipa Lino et.al. 2404.16136v1 link
2024-04-23 SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation Xiangyu Xu et.al. 2404.15276v1 link
2024-04-25 Domain adaptive pose estimation via multi-level alignment Yugan Chen et.al. 2404.14885v2 link
2024-04-23 Semi-supervised 2D Human Pose Estimation via Adaptive Keypoint Masking Kexin Meng et.al. 2404.14835v1 null
2024-04-23 UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues Vandad Davoodnia et.al. 2404.14634v1 null
2024-04-22 DHRNet: A Dual-Path Hierarchical Relation Network for Multi-Person Pose Estimation Yonghao Dang et.al. 2404.14025v1 link
2024-04-23 CT-NeRF: Incremental Optimizing Neural Radiance Field and Poses with Complex Trajectory Yunlong Ran et.al. 2404.13896v2 null
2024-04-21 Resampling-free Particle Filters in High-dimensions Akhilan Boopathy et.al. 2404.13698v1 link
2024-04-20 EC-SLAM: Real-time Dense Neural RGB-D SLAM System with Effectively Constrained Global Bundle Adjustment Guanghao Li et.al. 2404.13346v1 link
2024-04-18 Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds Oliver Lemke et.al. 2404.12440v1 null
2024-04-18 Gait Recognition from Highly Compressed Videos Andrei Niculae et.al. 2404.12183v1 null
2024-04-17 Mushroom Segmentation and 3D Pose Estimation from Point Clouds using Fully Convolutional Geometric Features and Implicit Pose Encoding George Retsinas et.al. 2404.12144v1 link
2024-04-17 Kathakali Hand Gesture Recognition With Minimal Data Kavitha Raju et.al. 2404.11205v1 null
2024-04-17 GeoReF: Geometric Alignment Across Shape Variation for Category-level Object Pose Refinement Linfang Zheng et.al. 2404.11139v1 null
2024-04-17 CorrNet+: Sign Language Recognition and Translation via Spatial-Temporal Correlation Lianyu Hu et.al. 2404.11111v1 link
2024-04-16 HumMUSS: Human Motion Understanding using State Space Models Arnab Kumar Mondal et.al. 2404.10880v1 null
2024-04-16 Invariant Kalman Filtering with Noise-Free Pseudo-Measurements Sven Goffin et.al. 2404.10687v1 null
2024-04-16 The Unreasonable Effectiveness of Pre-Trained Features for Camera Pose Refinement Gabriele Trivigno et.al. 2404.10438v1 null
2024-04-16 GaitPoint+: A Gait Recognition Network Incorporating Point Cloud Analysis and Recycling Huantao Ren et.al. 2404.10213v1 null
2024-04-16 LWIRPOSE: A novel LWIR Thermal Image Dataset and Benchmark Avinash Upadhyay et.al. 2404.10212v1 link
2024-04-15 LetsGo: Large-Scale Garage Modeling and Rendering via LiDAR-Assisted Gaussian Primitives Jiadi Cui et.al. 2404.09748v1 null
2024-04-14 In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition Wiktor Mucha et.al. 2404.09308v1 link
2024-04-13 DeDoDe v2: Analyzing and Improving the DeDoDe Keypoint Detector Johan Edstedt et.al. 2404.08928v1 link
2024-04-16 3D Human Scan With A Moving Event Camera Kai Kohyama et.al. 2404.08504v2 null
2024-04-11 Separated Attention: An Improved Cycle GAN Based Under Water Image Enhancement Method Tashmoy Ghosh et.al. 2404.07649v1 null
2024-04-11 GLID: Pre-training a Generalist Encoder-Decoder Vision Model Jihao Liu et.al. 2404.07603v1 null
2024-04-10 Measuring proximity to standard planes during fetal brain ultrasound scanning Chiara Di Vece et.al. 2404.07124v1 null
2024-04-10 MoCap-to-Visual Domain Adaptation for Efficient Human Mesh Estimation from 2D Keypoints Bedirhan Uguz et.al. 2404.07094v1 null
2024-04-10 Gaussian-LIC: Photo-realistic LiDAR-Inertial-Camera SLAM with 3D Gaussian Splatting Xiaolei Lang et.al. 2404.06926v1 null
2024-04-09 Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences Axel Barroso-Laguna et.al. 2404.06337v1 link
2024-04-09 Incremental Joint Learning of Depth, Pose and Implicit Scene Representation on Monocular Camera in Large-scale Scenes Tianchen Deng et.al. 2404.06050v1 null
2024-04-08 Learning 3D-Aware GANs from Unposed Images with Template Feature Field Xinya Chen et.al. 2404.05705v1 null
2024-04-08 Learning a Category-level Object Pose Estimator without Pose Annotations Fengrui Tian et.al. 2404.05626v1 null
2024-04-08 DepthMOT: Depth Cues Lead to a Strong Multi-Object Tracker Jiapeng Wu et.al. 2404.05518v1 link
2024-04-08 Two Hands Are Better Than One: Resolving Hand to Hand Intersections via Occupancy Networks Maksym Ivashechkin et.al. 2404.05414v1 null
2024-04-08 STITCH: Augmented Dexterity for Suture Throws Including Thread Coordination and Handoffs Kush Hari et.al. 2404.05151v1 null
2024-04-05 ToolEENet: Tool Affordance 6D Pose Estimation Yunlong Wang et.al. 2404.04193v1 null
2024-04-04 SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation Sichen Chen et.al. 2404.03518v1 link
2024-04-04 Multi Positive Contrastive Learning with Pose-Consistent Generated Images Sho Inayoshi et.al. 2404.03256v1 null
2024-04-04 HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud Wencan Cheng et.al. 2404.03159v1 link
2024-04-03 Fusing Multi-sensor Input with State Information on TinyML Brains for Autonomous Nano-drones Luca Crupi et.al. 2404.02567v1 null
2024-04-03 Semi-Supervised Unconstrained Head Pose Estimation in the Wild Huayi Zhou et.al. 2404.02544v1 link
2024-04-02 3D Congealing: 3D-Aware Image Alignment in the Wild Yunzhi Zhang et.al. 2404.02125v1 null
2024-04-02 SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation Vinkle Srivastav et.al. 2404.02041v1 link
2024-04-01 Marrying NeRF with Feature Matching for One-step Pose Estimation Ronghan Chen et.al. 2404.00891v1 null
2024-03-31 Graph-Based vs. Error State Kalman Filter-Based Fusion Of 5G And Inertial Data For MAV Indoor Pose Estimation Meisam Kabiri et.al. 2404.00691v1 null
2024-03-31 OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos Dongyoung Choi et.al. 2404.00676v1 null
2024-04-02 KTPFormer: Kinematics and Trajectory Prior Knowledge-Enhanced Transformer for 3D Human Pose Estimation Jihua Peng et.al. 2404.00658v2 link
2024-03-29 FetalDiffusion: Pose-Controllable 3D Fetal MRI Synthesis with Conditional Diffusion Model Molin Zhang et.al. 2404.00132v1 null
2024-03-29 Latent Embedding Clustering for Occlusion Robust Head Pose Estimation José Celestino et.al. 2403.20251v1 null
2024-03-29 A Unified Framework for Human-centric Point Cloud Video Understanding Yiteng Xu et.al. 2403.20031v1 null
2024-04-01 Video-Based Human Pose Regression via Decoupled Space-Time Aggregation Jijie He et.al. 2403.19926v2 link
2024-03-28 Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation Xiao Lin et.al. 2403.19527v1 link
2024-03-27 Object Pose Estimation via the Aggregation of Diffusion Features Tianfu Wang et.al. 2403.18791v1 link
2024-03-27 RoboKeyGen: Robot Pose and Joint Angles Estimation via Diffusion-based 3D Keypoint Generation Yang Tian et.al. 2403.18259v1 null
2024-03-26 Mathematical Foundation and Corrections for Full Range Head Pose Estimation Huei-Chung Hu et.al. 2403.18104v1 null
2024-03-26 EgoPoseFormer: A Simple Baseline for Egocentric 3D Human Pose Estimation Chenhongyi Yang et.al. 2403.18080v1 link
2024-03-26 A Survey on 3D Egocentric Human Pose Estimation Md Mushfiqur Azam et.al. 2403.17893v1 link
2024-03-26 GTA-HDR: A Large-Scale Synthetic Dataset for HDR Image Reconstruction Hrishav Bakul Barua et.al. 2403.17837v1 link
2024-03-26 DiffH2O: Diffusion-Based Synthesis of Hand-Object Interactions from Textual Descriptions Sammy Christen et.al. 2403.17827v1 null
2024-03-26 System Calibration of a Field Phenotyping Robot with Multiple High-Precision Profile Laser Scanners Felix Esser et.al. 2403.17788v1 null
2024-03-25 Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos Remy Sabathier et.al. 2403.17103v1 link
2024-03-25 Characterisation of the Intel RealSense D415 Stereo Depth Camera for Motion-Corrected CT Perfusion Imaging Mahdieh Dashtbani Moghari et.al. 2403.16490v1 null
2024-03-25 Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects Zicong Fan et.al. 2403.16428v1 link
2024-03-25 A Geometric Perspective on Fusing Gaussian Distributions on Lie Groups Yixiao Ge et.al. 2403.16411v1 null
2024-03-25 ASDF: Assembly State Detection Utilizing Late Fusion by Integrating 6D Pose Estimation Hannah Schieber et.al. 2403.16400v1 link
2024-03-24 KITchen: A Real-World Benchmark and Dataset for 6D Object Pose Estimation in Kitchen Environments Abdelrahman Younes et.al. 2403.16238v1 null
2024-03-24 Diffusion Model is a Good Pose Estimator from 3D RF-Vision Junqiao Fan et.al. 2403.16198v1 null
2024-03-23 UPNeRF: A Unified Framework for Monocular 3D Object Reconstruction and Pose Estimation Yuliang Guo et.al. 2403.15705v1 link
2024-03-22 InterFusion: Text-Driven Generation of 3D Human-Object Interaction Sisi Dai et.al. 2403.15612v1 link
2024-03-22 Augmented Reality Warnings in Roadway Work Zones: Evaluating the Effect of Modality on Worker Reaction Times Sepehr Sabeti et.al. 2403.15571v1 null
2024-03-22 Gesture-Controlled Aerial Robot Formation for Human-Swarm Interaction in Safety Monitoring Applications Vít Krátký et.al. 2403.15333v1 null
2024-03-22 WSCLoc: Weakly-Supervised Sparse-View Camera Relocalization Jialu Wang et.al. 2403.15272v1 null
2024-03-22 DITTO: Demonstration Imitation by Trajectory Transformation Nick Heppert et.al. 2403.15203v1 null
2024-03-22 Cartoon Hallucinations Detection: Pose-aware In Context Visual Learning Bumsoo Kim et.al. 2403.15048v1 null
2024-03-22 Trajectory Regularization Enhances Self-Supervised Geometric Representation Jiayun Wang et.al. 2403.14973v1 link
2024-03-21 VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding Ahmad Mahmood et.al. 2403.14743v1 link
2024-03-21 Visibility-Aware Keypoint Localization for 6DoF Object Pose Estimation Ruyi Lian et.al. 2403.14559v1 null
2024-03-23 Exploring 3D Human Pose Estimation and Forecasting from the Robot's Perspective: The HARPER Dataset Andrea Avogaro et.al. 2403.14447v2 null
2024-03-21 Evaluation and Deployment of LiDAR-based Place Recognition in Dense Forests Haedam Oh et.al. 2403.14326v1 null
2024-03-21 Zero123-6D: Zero-shot Novel View Synthesis for RGB Category-level 6D Pose Estimation Francesco Di Felice et.al. 2403.14279v1 null
2024-03-20 DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses Chen Zhao et.al. 2403.13683v1 link
2024-03-20 Meta-Point Learning and Refining for Category-Agnostic Pose Estimation Junjie Chen et.al. 2403.13647v1 link
2024-03-20 Advancing 6D Pose Estimation in Augmented Reality -- Overcoming Projection Ambiguity with Uncontrolled Imagery Mayura Manawadu et.al. 2403.13434v1 null
2024-03-20 DOR3D-Net: Dense Ordinal Regression Network for 3D Hand Pose Estimation Yamin Mao et.al. 2403.13405v1 null
2024-03-20 ManiPose: A Comprehensive Benchmark for Pose-aware Object Manipulation in Robotics Qiaojun Yu et.al. 2403.13365v1 null
2024-03-20 MULAN-WC: Multi-Robot Localization Uncertainty-aware Active NeRF with Wireless Coordination Weiying Wang et.al. 2403.13348v1 null
2024-03-19 FaceXFormer: A Unified Transformer for Facial Analysis Kartik Narayan et.al. 2403.12960v1 link
2024-03-19 WHAC: World-grounded Humans and Cameras Wanqi Yin et.al. 2403.12959v1 link
2024-03-19 Diffusion-Driven Self-Supervised Learning for Shape Reconstruction and Pose Estimation Jingtao Sun et.al. 2403.12728v1 link
2024-03-19 IFFNeRF: Initialisation Free and Fast 6DoF pose estimation from a single image and a NeRF model Matteo Bortolon et.al. 2403.12682v1 null
2024-03-19 In-Hand Following of Deformable Linear Objects Using Dexterous Fingers with Tactile Sensing Mingrui Yu et.al. 2403.12676v1 null
2024-03-19 Self-learning Canonical Space for Multi-view 3D Human Pose Estimation Xiaoben Li et.al. 2403.12440v1 null
2024-03-20 Human Mesh Recovery from Arbitrary Multi-view Images Xiaoben Li et.al. 2403.12434v2 link
2024-03-19 XPose: eXplainable Human Pose Estimation Luyu Qiu et.al. 2403.12370v1 null
2024-03-18 HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data Mengqi Zhang et.al. 2403.12011v1 null
2024-03-18 Normalized Validity Scores for DNNs in Regression based Eye Feature Extraction Wolfgang Fuhl et.al. 2403.11665v1 null
2024-03-18 An Accurate and Real-time Relative Pose Estimation from Triple Point-line Images by Decoupling Rotation and Translation Zewen Xu et.al. 2403.11639v1 null
2024-03-18 LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models Yang Yang et.al. 2403.11627v1 link
2024-03-18 GenFlow: Generalizable Recurrent Flow for 6D Pose Refinement of Novel Objects Sungphill Moon et.al. 2403.11510v1 null
2024-03-17 A Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation Qucheng Peng et.al. 2403.11310v1 link
2024-03-17 Compact 3D Gaussian Splatting For Dense Visual SLAM Tianchen Deng et.al. 2403.11247v1 link
2024-03-16 Robotic Task Success Evaluation Under Multi-modal Non-Parametric Object Pose Uncertainty Lakshadeep Naik et.al. 2403.10874v1 null
2024-03-16 DPPE: Dense Pose Estimation in a Plenoxels Environment using Gradient Approximation Christopher Kolios et.al. 2403.10773v1 null
2024-03-15 GS-Pose: Cascaded Framework for Generalizable Segmentation-based 6D Object Pose Estimation Dingding Cai et.al. 2403.10683v1 null
2024-03-15 CLOSURE: Fast Quantification of Pose Uncertainty Sets Yihuai Gao et.al. 2403.09990v1 null
2024-03-14 ThermoHands: A Benchmark for 3D Hand Pose Estimation from Egocentric Thermal Image Fangqiang Ding et.al. 2403.09871v1 null
2024-03-14 BOP Challenge 2023 on Detection, Segmentation and Pose Estimation of Seen and Unseen Rigid Objects Tomas Hodan et.al. 2403.09799v1 null
2024-03-14 Scalable Autonomous Drone Flight in the Forest with Visual-Inertial SLAM and Dense Submaps Built without LiDAR Sebastián Barbas Laina et.al. 2403.09596v1 null
2024-03-14 Improving Real-Time Omnidirectional 3D Multi-Person Human Pose Estimation with People Matching and Unsupervised 2D-3D Lifting Pawel Knap et.al. 2403.09437v1 null
2024-03-14 LM2D: Lyrics- and Music-Driven Dance Synthesis Wenjie Yin et.al. 2403.09407v1 null
2024-03-14 SD-Net: Symmetric-Aware Keypoint Prediction and Domain Adaptation for 6D Pose Estimation In Bin-picking Scenarios Ding-Tao Huang et.al. 2403.09317v1 link
2024-03-14 MOTPose: Multi-object 6D Pose Estimation for Dynamic Video Sequences using Attention-based Temporal Fusion Arul Selvam Periyasamy et.al. 2403.09309v1 null
2024-03-13 Data Augmentation in Human-Centric Vision Wentao Jiang et.al. 2403.08650v1 null
2024-03-15 PRAGO: Differentiable Multi-View Pose Optimization From Objectness Detections Matteo Taiana et.al. 2403.08586v2 null
2024-03-13 NeRF-Supervised Feature Point Detection and Description Ali Youssef et.al. 2403.08156v1 link
2024-03-12 Q-SLAM: Quadric Representations for Monocular SLAM Chensheng Peng et.al. 2403.08125v1 null
2024-03-12 MRC-Net: 6-DoF Pose Estimation with MultiScale Residual Correlation Yuelong Li et.al. 2403.08019v1 link
2024-03-12 Uncertainty Quantification with Deep Ensembles for 6D Object Pose Estimation Kira Wursthorn et.al. 2403.07741v1 null
2024-03-12 Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving JunDa Cheng et.al. 2403.07535v1 link
2024-03-12 Category-Agnostic Pose Estimation for Point Clouds Bowen Liu et.al. 2403.07437v1 null
2024-03-12 Monocular Microscope to CT Registration using Pose Estimation of the Incus for Augmented Reality Cochlear Implant Surgery Yike Zhang et.al. 2403.07219v1 null
2024-03-11 Real-Time Simulated Avatar from Head-Mounted Sensors Zhengyi Luo et.al. 2403.06862v1 null
2024-03-11 Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition Erkut Akdag et.al. 2403.06577v1 null
2024-03-10 Platypose: Calibrated Zero-Shot Multi-Hypothesis 3D Human Motion Estimation Paweł A. Pierzchlewicz et.al. 2403.06164v1 link
2024-03-10 Diffusion Models Trained with Large Data Are Transferable Visual Models Guangkai Xu et.al. 2403.06090v1 link
2024-03-08 Prepared for the Worst: A Learning-Based Adversarial Attack for Resilience Analysis of the ICP Algorithm Ziyu Zhang et.al. 2403.05666v1 null
2024-03-11 Exploiting polar symmetry in designing equivariant observers for vision-based motion estimation Tarek Bouazza et.al. 2403.05450v2 null
2024-03-07 Real-Time Planning Under Uncertainty for AUVs Using Virtual Maps Ivana Collado-Gonzalez et.al. 2403.04936v1 null
2024-03-07 That's My Point: Compact Object-centric LiDAR Pose Estimation for Large-scale Outdoor Localisation Georgi Pramatarov et.al. 2403.04755v1 null
2024-03-07 Disentangled Diffusion-Based 3D Human Pose Estimation with Hierarchical Spatial and Temporal Denoiser Qingyuan Cai et.al. 2403.04444v1 link
2024-03-09 Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation Ruicong Liu et.al. 2403.04381v2 link
2024-03-05 FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation Chris Rockwell et.al. 2403.03221v1 null
2024-03-05 NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors Yannan He et.al. 2403.03122v1 null
2024-03-05 Improved LiDAR Odometry and Mapping using Deep Semantic Segmentation and Novel Outliers Detection Mohamed Afifi et.al. 2403.03111v1 null
2024-03-05 Splat-Nav: Safe Real-Time Robot Navigation in Gaussian Splatting Maps Timothy Chen et.al. 2403.02751v1 null
2024-03-04 PowerSkel: A Device-Free Framework Using CSI Signal for Human Skeleton Estimation in Power Station Cunyi Yin et.al. 2403.01913v1 link
2024-03-04 A Simple Baseline for Efficient Hand Mesh Reconstruction Zhishan Zhou et.al. 2403.01813v1 null
2024-03-03 MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images Junwen Huang et.al. 2403.01517v1 null
2024-03-02 Single-image camera calibration with model-free distortion correction Katia Genovese et.al. 2403.01263v1 null
2024-03-02 Grid-based Fast and Structural Visual Odometry Zhang Zhihe et.al. 2403.01110v1 null
2024-03-01 Optimal Robot Formations: Balancing Range-Based Observability and User-Defined Configurations Syed Shabbir Ahmed et.al. 2403.00988v1 null
2024-03-04 TEXterity -- Tactile Extrinsic deXterity: Simultaneous Tactile Estimation and Control for Extrinsic Dexterity Sangwoon Kim et.al. 2403.00049v2 null
2024-03-01 Graph Convolutional Neural Networks for Automated Echocardiography View Recognition: A Holistic Approach Sarina Thomas et.al. 2402.19062v2 null
2024-02-29 Deep Learning for 3D Human Pose Estimation and Mesh Recovery: A Survey Yang Liu et.al. 2402.18844v1 link
2024-02-28 Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting Taeho Kang et.al. 2402.18330v1 link
2024-02-28 Location-guided Head Pose Estimation for Fisheye Image Bing Li et.al. 2402.18320v1 null
2024-02-28 NToP: NeRF-Powered Large-scale Dataset Generation for 2D and 3D Human Pose Estimation in Top-View Fisheye Images Jingrui Yu et.al. 2402.18196v1 link
2024-02-28 Six-Point Method for Multi-Camera Systems with Reduced Solution Space Banglei Guan et.al. 2402.18066v1 link
2024-02-27 Real-Time Estimation of Relative Pose for UAVs Using a Dual-Channel Feature Association Zhaoying Wang et.al. 2402.17504v1 null
2024-02-26 HOISDF: Constraining 3D Hand-Object Pose Estimation with Global Signed Distance Fields Haozhe Qi et.al. 2402.17062v1 link
2024-02-26 DRSI-Net: Dual-Residual Spatial Interaction Network for Multi-Person Pose Estimation Shang Wu et.al. 2402.16640v1 null
2024-02-26 GEA: Reconstructing Expressive 3D Gaussian Avatar from Monocular Video Xinqi Liu et.al. 2402.16607v1 null
2024-02-26 DreamUp3D: Object-Centric Generative Models for Single-View 3D Scene Understanding and Real-to-Sim Transfer Yizhe Wu et.al. 2402.16308v1 null
2024-02-25 XAI-based gait analysis of patients walking with Knee-Ankle-Foot orthosis using video cameras Arnav Mishra et.al. 2402.16175v1 null
2024-02-25 VOLoc: Visual Place Recognition by Querying Compressed Lidar Map Xudong Cai et.al. 2402.15961v1 link
2024-02-24 CLIPose: Category-Level Object Pose Estimation with Pre-trained Vision-Language Knowledge Xiao Lin et.al. 2402.15726v1 null
2024-02-23 Optimized Deployment of Deep Neural Networks for Visual Pose Estimation on Nano-drones Matteo Risso et.al. 2402.15273v1 null
2024-02-22 Cameras as Rays: Pose Estimation via Ray Diffusion Jason Y. Zhang et.al. 2402.14817v1 null
2024-02-22 S^2Former-OR: Single-Stage Bimodal Transformer for Scene Graph Generation in OR Jialun Pei et.al. 2402.14461v1 link
2024-02-22 VLPose: Bridging the Domain Gap in Pose Estimation with Language-Vision Tuning Jingyao Li et.al. 2402.14456v1 null
2024-02-22 Modeling 3D Infant Kinetics Using Adaptive Graph Convolutional Networks Daniel Holmberg et.al. 2402.14400v1 link
2024-02-22 Secure Navigation using Landmark-based Localization in a GPS-denied Environment Ganesh Sapkota et.al. 2402.14280v1 null
2024-02-21 SecurePose: Automated Face Blurring and Human Movement Kinematics Extraction from Videos Recorded in Clinical Settings Rishabh Bajpai et.al. 2402.14143v1 null
2024-02-21 High-throughput Visual Nano-drone to Nano-drone Relative Localization using Onboard Fully Convolutional Networks Luca Crupi et.al. 2402.13756v1 null
2024-02-21 EffLoc: Lightweight Vision Transformer for Efficient 6-DOF Camera Relocalization Zhendong Xiao et.al. 2402.13537v1 null
2024-02-20 DiffusionNOCS: Managing Symmetry and Uncertainty in Sim2Real Multi-Modal Category-level Pose Estimation Takuya Ikeda et.al. 2402.12647v1 link
2024-02-19 Landmark-based Localization using Stereo Vision and Deep Learning in GPS-Denied Battlefield Environment Ganesh Sapkota et.al. 2402.12551v1 null
2024-02-18 Boosting Semi-Supervised 2D Human Pose Estimation by Revisiting Data Augmentation and Consistency Training Huayi Zhou et.al. 2402.11566v1 link
2024-02-17 Enhancing Surgical Performance in Cardiothoracic Surgery with Innovations from Computer Vision and Artificial Intelligence: A Narrative Review Merryn D. Constable et.al. 2402.11288v1 null
2024-02-17 Dense Matchers for Dense Tracking Tomáš Jelínek et.al. 2402.11287v1 null
2024-02-16 Occlusion Resilient 3D Human Pose Estimation Soumava Kumar Roy et.al. 2402.11036v1 null
2024-02-16 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations Tsung-Wei Ke et.al. 2402.10885v1 null
2024-02-15 Lester: rotoscope animation through video object segmentation and tracking Ruben Tous et.al. 2402.09883v1 link
2024-02-15 Foul prediction with estimated poses from soccer broadcast video Jiale Fang et.al. 2402.09650v1 null
2024-02-16 IMUOptimize: A Data-Driven Approach to Optimal IMU Placement for Human Pose Estimation with Transformer Architecture Varun Ramani et.al. 2402.08923v2 null
2024-02-13 Are Semi-Dense Detector-Free Methods Good at Matching Local Features? Matthieu Vilain et.al. 2402.08671v1 null
2024-02-13 Gaussian-Sum Filter for Range-based 3D Relative Pose Estimation in the Presence of Ambiguities Syed S. Ahmed et.al. 2402.08566v1 null
2024-02-13 Learning to Produce Semi-dense Correspondences for Visual Localization Khang Truong Giang et.al. 2402.08359v1 link
2024-02-12 Extending 3D body pose estimation for robotic-assistive therapies of autistic children Laura Santos et.al. 2402.08006v1 null
2024-02-12 GBOT: Graph-Based 3D Object Tracking for Augmented Reality-Assisted Assembly Guidance Shiyu Li et.al. 2402.07677v1 link
2024-02-12 UAV-assisted Visual SLAM Generating Reconstructed 3D Scene Graphs in GPS-denied Environments Ahmed Radwan et.al. 2402.07537v1 null
2024-02-09 Improving 2D-3D Dense Correspondences with Diffusion Models for 6D Object Pose Estimation Peter Hönig et.al. 2402.06436v1 null
2024-02-08 Real-time Holistic Robot Pose Estimation with Unknown States Shikun Ban et.al. 2402.05655v1 link
2024-02-08 Extending 6D Object Pose Estimators for Stereo Vision Thomas Pöllabauer et.al. 2402.05610v1 null
2024-02-09 NCRF: Neural Contact Radiance Fields for Free-Viewpoint Rendering of Hand-Object Interaction Zhongqun Zhang et.al. 2402.05532v2 null
2024-02-07 Detection and Pose Estimation of flat, Texture-less Industry Objects on HoloLens using synthetic Training Thomas Pöllabauer et.al. 2402.04979v1 null
2024-02-07 4-Dimensional deformation part model for pose estimation using Kalman filter constraints Enrique Martinez-Berti et.al. 2402.04953v1 null
2024-02-07 STAR: Shape-focused Texture Agnostic Representations for Improved Object Detection and 6D Pose Estimation Peter Hönig et.al. 2402.04878v1 link
2024-02-05 A Computer Vision Based Approach for Stalking Detection Using a CNN-LSTM-MLP Hybrid Fusion Model Murad Hasan et.al. 2402.03417v1 null
2024-02-05 SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM Mingrui Li et.al. 2402.03246v1 link
2024-02-05 Extreme Two-View Geometry From Object Poses with Diffusion Models Yujing Sun et.al. 2402.02800v1 link
2024-02-04 Uncertainty-Aware Testing-Time Optimization for 3D Human Pose Estimation Ti Wang et.al. 2402.02339v1 null
2024-02-01 mmID: High-Resolution mmWave Imaging for Human Identification Sakila S. Jayaweera et.al. 2402.00996v1 null
2024-02-01 In-Bed Pose Estimation: A Review Ziya Ata Yazıcı et.al. 2402.00700v1 null
2024-02-01 WayFASTER: a Self-Supervised Traversability Prediction for Increased Navigation Awareness Mateus Valverde Gasparino et.al. 2402.00683v1 link
2024-02-02 CMRNext: Camera to LiDAR Matching in the Wild for Localization and Extrinsic Calibration Daniele Cattaneo et.al. 2402.00129v2 null
2024-01-31 Improved Scene Landmark Detection for Camera Localization Tien Do et.al. 2401.18083v1 link
2024-01-30 Navigating the Unknown: Uncertainty-Aware Compute-in-Memory Autonomy of Edge Robotics Nastaran Darabi et.al. 2401.17481v1 null
2024-01-30 MESA: Matching Everything by Segmenting Anything Yesheng Zhang et.al. 2401.16741v1 null
2024-01-30 Towards Precise 3D Human Pose Estimation with Multi-Perspective Spatial-Temporal Relational Transformers Jianbin Jiao et.al. 2401.16700v1 link
2024-01-29 Leveraging Positional Encoding for Robust Multi-Reference-Based Object 6D Pose Estimation Jaewoo Park et.al. 2401.16284v1 null
2024-01-29 Reconstructing Close Human Interactions from Multiple Views Qing Shuai et.al. 2401.16173v1 link
2024-01-28 Multi-Person 3D Pose Estimation from Multi-View Uncalibrated Depth Cameras Yu-Jhe Li et.al. 2401.15616v1 null
2024-01-30 Multi-Robot Relative Pose Estimation in SE(2) with Observability Analysis: A Comparison of Extended Kalman Filtering and Robust Pose Graph Optimization Kihoon Shin et.al. 2401.15313v2 null
2024-01-26 Adaptive Deep Learning for Efficient Visual Pose Estimation aboard Ultra-low-power Nano-drones Beatrice Alessandra Motetti et.al. 2401.15236v1 null
2024-01-26 SimpleEgo: Predicting Probabilistic Body Pose from Egocentric Cameras Hanz Cuevas-Velasquez et.al. 2401.14785v1 null
2024-01-24 Synthetic data enables faster annotation and robust segmentation for multi-object grasping in clutter Dongmyoung Lee et.al. 2401.13405v1 null
2024-01-24 Linear Relative Pose Estimation Founded on Pose-only Imaging Geometry Qi Cai et.al. 2401.13357v1 null
2024-01-23 SemanticSLAM: Learning based Semantic Map Construction and Robust Camera Localization Mingyang Li et.al. 2401.13076v1 link
2024-01-24 RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos Hongchi Xia et.al. 2401.12592v2 null
2024-01-26 MobileARLoc: On-device Robust Absolute Localisation for Pervasive Markerless Mobile AR Changkun Liu et.al. 2401.11511v2 null
2024-01-19 SCENES: Subpixel Correspondence Estimation With Epipolar Supervision Dominik A. Kloepfer et.al. 2401.10886v1 null
2024-01-19 Source-Free and Image-Only Unsupervised Domain Adaptation for Category Level Object Pose Estimation Prakhar Kaushik et.al. 2401.10848v1 null
2024-01-22 TEXterity: Tactile Extrinsic deXterity Antonia Bronars et.al. 2401.10230v2 null
2024-01-18 Exploring Latent Cross-Channel Embedding for Accurate 3D Human Pose Reconstruction in a Diffusion Framework Junkun Jiang et.al. 2401.09836v1 link
2024-01-17 DK-SLAM: Monocular Visual SLAM with Deep Keypoints Adaptive Learning, Tracking and Loop-Closing Hao Qu et.al. 2401.09160v1 null
2024-01-17 PIN-SLAM: LiDAR SLAM Using a Point-Based Implicit Neural Representation for Achieving Global Map Consistency Yue Pan et.al. 2401.09101v1 link
2024-01-16 AdaSem: Adaptive Goal-Oriented Semantic Communications for End-to-End Camera Relocalization Qi Liao et.al. 2401.08360v1 null
2024-01-16 S3M: Semantic Segmentation Sparse Mapping for UAVs with RGB-D Camera Thanh Nguyen Canh et.al. 2401.08134v1 null
2024-01-15 Collaboratively Self-supervised Video Representation Learning for Action Recognition Jie Zhang et.al. 2401.07584v1 null
2024-01-14 3D Landmark Detection on Human Point Clouds: A Benchmark and A Dual Cascade Point Transformer Framework Fan Zhang et.al. 2401.07251v1 null
2024-01-11 On the representation and methodology for wide and short range head pose estimation Alejandro Cobo et.al. 2401.05807v1 link
2024-01-10 Structure from Duplicates: Neural Inverse Graphics from a Pile of Objects Tianhang Cheng et.al. 2401.05236v1 link
2024-01-10 Video-based Automatic Lameness Detection of Dairy Cows using Pose Estimation and Multiple Locomotion Traits Helena Russello et.al. 2401.05202v1 null
2024-01-10 Diffusion-based Pose Refinement and Muti-hypothesis Generation for 3D Human Pose Estimaiton Hongbo Kang et.al. 2401.04921v1 link
2024-01-15 Towards Real-World Aerial Vision Guidance with Categorical 6D Pose Tracker Jingtao Sun et.al. 2401.04377v2 link
2024-01-07 RHOBIN Challenge: Reconstruction of Human Object Interaction Xianghui Xie et.al. 2401.04143v1 null
2024-01-08 D3PRefiner: A Diffusion-based Denoise Method for 3D Human Pose Refinement Danqi Yan et.al. 2401.03914v1 null
2024-01-07 Big Data and Deep Learning in Smart Cities: A Comprehensive Dataset for AI-Driven Traffic Accident Detection and Computer Vision Systems Victor Adewopo et.al. 2401.03587v1 null
2024-01-04 Survey of 3D Human Body Pose and Shape Estimation Methods for Contemporary Dance Applications Darshan Venkatrayappa et.al. 2401.02383v1 null
2024-01-04 Fit-NGP: Fitting Object Models to Neural Graphics Primitives Marwan Taher et.al. 2401.02357v1 null
2024-01-04 PEGASUS: Physically Enhanced Gaussian Splatting Simulation System for 6DOF Object Pose Dataset Generation Lukas Meyer et.al. 2401.02281v1 link
2024-01-03 Real-Time Human Fall Detection using a Lightweight Pose Estimation Technique Ekram Alam et.al. 2401.01587v1 link
2024-01-05 PLE-SLAM: A Visual-Inertial SLAM Based on Point-Line Features and Efficient IMU Initialization Jiaming He et.al. 2401.01081v2 link
2023-12-30 3D Human Pose Perception from Egocentric Stereo Videos Hiroyasu Akada et.al. 2401.00889v1 null
2024-01-01 Geometry Depth Consistency in RGBD Relative Pose Estimation Sourav Kumar et.al. 2401.00639v1 null
2023-12-30 A comprehensive framework for occluded human pose estimation Linhao Xu et.al. 2401.00155v1 null
2024-01-02 6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation Li Xu et.al. 2401.00029v2 null
2023-12-29 MURP: Multi-Agent Ultra-Wideband Relative Pose Estimation with Constrained Communications in 3D Environments Andrew Fishberg et.al. 2312.17731v1 link
2023-12-28 iFusion: Inverting Diffusion for Pose-Free Reconstruction from Sparse Views Chin-Hsuan Wu et.al. 2312.17250v1 link
2023-12-28 EvPlug: Learn a Plug-and-Play Module for Event and Image Fusion Jianping Jiang et.al. 2312.16933v1 null
2023-12-28 SR-LIVO: LiDAR-Inertial-Visual Odometry and Mapping with Sweep Reconstruction Zikang Yuan et.al. 2312.16800v1 link
2023-12-28 L-LO: Enhancing Pose Estimation Precision via a Landmark-Based LiDAR Odometry Feiya Li et.al. 2312.16787v1 null
2023-12-27 HMP: Hand Motion Priors for Pose and Shape Estimation from Video Enes Duran et.al. 2312.16737v1 null
2023-12-27 Camera calibration for the surround-view system: a benchmark and dataset L Qin et.al. 2312.16499v1 null
2023-12-24 TEMP3D: Temporally Continuous 3D Human Pose Estimation Under Occlusions Rohit Lal et.al. 2312.16221v1 link
2023-12-26 Graph Context Transformation Learning for Progressive Correspondence Pruning Junwen Guo et.al. 2312.15971v1 link
2023-12-25 Lifting by Image -- Leveraging Image Cues for Accurate 3D Human Pose Estimation Feng Zhou et.al. 2312.15636v1 null
2023-12-25 APTv2: Benchmarking Animal Pose Estimation and Tracking with a Large-scale Dataset and Beyond Yuxiang Yang et.al. 2312.15612v1 link
2023-12-23 PACE: Pose Annotations in Cluttered Environments Yang You et.al. 2312.15130v1 link
2023-12-22 PoseGen: Learning to Generate 3D Human Pose Dataset with NeRF Mohsen Gholami et.al. 2312.14915v1 link
2023-12-22 Harnessing Diffusion Models for Visual Perception with Meta Prompts Qiang Wan et.al. 2312.14733v1 link
2023-12-22 Pola4All: survey of polarimetric applications and an open-source toolkit to analyze polarization Joaquin Rodriguez et.al. 2312.14697v1 link
2023-12-22 PoseViNet: Distracted Driver Action Recognition Framework Using Multi-View Pose Estimation and Vision Transformer Neha Sengar et.al. 2312.14577v1 null
2023-12-22 Scalable 3D Reconstruction From Single Particle X-Ray Diffraction Images Based on Online Machine Learning Jay Shenoy et.al. 2312.14432v1 null
2023-12-21 3D Pose Estimation of Two Interacting Hands from a Monocular Event Camera Christen Millerdurai et.al. 2312.14157v1 null
2023-12-21 DUSt3R: Geometric 3D Vision Made Easy Shuzhe Wang et.al. 2312.14132v1 link
2023-12-20 NeRF-VO: Real-Time Sparse Visual Odometry with Neural Radiance Fields Jens Naumann et.al. 2312.13471v1 null
2023-12-20 Brain-Inspired Visual Odometry: Balancing Speed and Interpretability through a System of Systems Approach Habib Boloorchi Tabrizi et.al. 2312.13162v1 link
2023-12-18 Unified framework for diffusion generative models in SO(3): applications in computer vision and astrophysics Yesukhei Jagvaral et.al. 2312.11707v1 null
2023-12-18 Underwater Robot Pose Estimation Using Acoustic Methods and Intermittent Position Measurements at the Surface Vicu-Mihalis Maer et.al. 2312.11401v1 null
2023-12-17 SHaRPose: Sparse High-Resolution Representation for Human Pose Estimation Xiaoqi An et.al. 2312.10758v1 link
2023-12-17 PNeRFLoc: Visual Localization with Point-based Neural Radiance Fields Boming Zhao et.al. 2312.10649v1 null
2023-12-15 SoloPose: One-Shot Kinematic 3D Human Pose Estimation with Video Data Augmentation David C. Jeong et.al. 2312.10195v1 link
2023-12-14 iComMa: Inverting 3D Gaussians Splatting for Camera Pose Estimation via Comparing and Matching Yuan Sun et.al. 2312.09031v1 null
2023-12-14 Scene 3-D Reconstruction System in Scattering Medium Zhuoyifan Zhang et.al. 2312.09005v1 null
2023-12-14 CattleEyeView: A Multi-task Top-down View Cattle Dataset for Smarter Precision Livestock Farming Kian Eng Ong et.al. 2312.08764v1 link
2023-12-20 PnP for Two-Dimensional Pose Estimation Joshua Wang et.al. 2312.08488v2 link
2023-12-13 Pose and shear-based tactile servoing John Lloyd et.al. 2312.08411v1 null
2023-12-13 FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects Bowen Wen et.al. 2312.08344v1 link
2023-12-13 Efficient Multi-Object Pose Estimation using Multi-Resolution Deformable Attention and Query Aggregation Arul Selvam Periyasamy et.al. 2312.08268v1 null
2023-12-13 CenterGrasp: Object-Aware Implicit Representation Learning for Simultaneous Shape Reconstruction and 6-DoF Grasp Estimation Eugenio Chisari et.al. 2312.08240v1 null
2023-12-13 C-BEV: Contrastive Bird's Eye View Training for Cross-View Image Retrieval and 3-DoF Pose Estimation Florian Fervers et.al. 2312.08060v1 null
2023-12-13 Three-Filters-to-Normal+: Revisiting Discontinuity Discrimination in Depth-to-Normal Translation Jingwei Yang et.al. 2312.07964v1 null
2023-12-13 Diffusion Models Enable Zero-Shot Pose Estimation for Lower-Limb Prosthetic Users Tianxun Zhou et.al. 2312.07854v1 null
2023-12-12 RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation Peng Lu et.al. 2312.07526v1 link
2023-12-12 COLMAP-Free 3D Gaussian Splatting Yang Fu et.al. 2312.07504v1 null
2023-12-12 RMS: Redundancy-Minimizing Point Cloud Sampling for Real-Time Pose Estimation in Degenerated Environments Pavel Petracek et.al. 2312.07337v1 link
2023-12-12 Unifying Correspondence, Pose and NeRF for Pose-Free Novel View Synthesis from Stereo Pairs Sunghwan Hong et.al. 2312.07246v1 link
2023-12-12 Mask as Supervision: Leveraging Unified Mask Information for Unsupervised 3D Pose Estimation Yuchen Yang et.al. 2312.07051v1 link
2023-12-12 Towards Enhanced Human Activity Recognition through Natural Language Generation and Pose Estimation Nikhil Kashyap et.al. 2312.06965v1 null
2023-12-12 Exploring Novel Object Recognition and Spontaneous Location Recognition Machine Learning Analysis Techniques in Alzheimer's Mice Soham Bafana et.al. 2312.06914v1 link
2023-12-11 Keypoint-based Stereophotoclinometry for Characterizing and Navigating Small Bodies: A Factor Graph Approach Travis Driver et.al. 2312.06865v1 link
2023-12-11 Improving the Robustness of 3D Human Pose Estimation: A Benchmark and Learning from Noisy Input Trung-Hieu Hoang et.al. 2312.06797v1 null
2023-12-11 3D Hand Pose Estimation in Egocentric Images in the Wild Aditya Prakash et.al. 2312.06583v1 null
2023-12-11 PointVoxel: A Simple and Effective Pipeline for Multi-View Multi-Modal 3D Human Pose Estimation Zhiyu Pan et.al. 2312.06409v1 null
2023-12-11 ManiPose: Manifold-Constrained Multi-Hypothesis 3D Human Pose Estimation Cédric Rommel et.al. 2312.06386v1 link
2023-12-10 From Correspondences to Pose: Non-minimal Certifiably Optimal Relative Pose without Disambiguation Javier Tirado-Garín et.al. 2312.05995v1 link
2023-12-09 You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception Sheng Jin et.al. 2312.05525v1 link
2023-12-07 Image and AIS Data Fusion Technique for Maritime Computer Vision Applications Emre Gülsoylu et.al. 2312.05270v1 link
2023-12-07 Correspondences of the Third Kind: Camera Pose Estimation from Object Reflection Kohei Yamashita et.al. 2312.04527v1 null
2023-12-07 Detecting and Restoring Non-Standard Hands in Stable Diffusion Generated Images Yiqun Zhang et.al. 2312.04236v1 null
2023-12-06 Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning Xinshun Wang et.al. 2312.03703v1 link
2023-12-06 Cooperative Probabilistic Trajectory Forecasting under Occlusion Anshul Nayak et.al. 2312.03296v1 null
2023-12-05 A Unified Simulation Framework for Visual and Behavioral Fidelity in Crowd Analysis Niccolò Bisagno et.al. 2312.02613v1 null
2023-12-05 6D Assembly Pose Estimation by Point Cloud Registration for Robot Manipulation K. Samarawickrama et.al. 2312.02593v1 link
2023-12-05 PolyFit: A Peg-in-hole Assembly Framework for Unseen Polygon Shapes via Sim-to-real Adaptation Geonhyup Lee et.al. 2312.02531v1 null
2023-12-04 GenEM: Physics-Informed Generative Cryo-Electron Microscopy Jiakai Zhang et.al. 2312.02235v1 null
2023-12-02 Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors Yu Zhang et.al. 2312.02196v1 link
2023-12-04 iMatching: Imperative Correspondence Learning Zitong Zhan et.al. 2312.02141v1 link
2023-12-04 SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM Nikhil Keetha et.al. 2312.02126v1 link
2023-12-04 Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection Xubin Zhong et.al. 2312.01713v1 null
2023-12-05 Hulk: A Universal Knowledge Translator for Human-Centric Tasks Yizhou Wang et.al. 2312.01697v2 link
2023-12-04 Multi-View Person Matching and 3D Pose Estimation with Arbitrary Uncalibrated Camera Networks Yan Xu et.al. 2312.01561v1 null
2023-12-01 Object 6D pose estimation meets zero-shot learning Andrea Caraffa et.al. 2312.00947v1 null
2023-12-01 Open-vocabulary object 6D pose estimation Jaime Corsetti et.al. 2312.00690v1 null
2023-12-01 Global Localization: Utilizing Relative Spatio-Temporal Geometric Constraints from Adjacent and Distant Cameras Mohammad Altillawi et.al. 2312.00500v1 null
2023-12-01 Learning Unorthogonalized Matrices for Rotation Estimation Kerui Gu et.al. 2312.00462v1 null
2023-11-30 PoseGPT: Chatting about 3D Human Pose Yao Feng et.al. 2311.18836v1 null
2023-11-30 FoundPose: Unseen Object Pose Estimation with Foundation Features Evin Pınar Örnek et.al. 2311.18809v1 null
2023-11-30 Pose Estimation and Tracking for ASIST Ari Goodman et.al. 2311.18665v1 null
2023-11-29 A Stochastic-Geometrical Framework for Object Pose Estimation based on Mixture Models Avoiding the Correspondence Problem Wolfgang Hoegele et.al. 2311.18107v1 null
2023-11-29 Pose Anything: A Graph-Based Approach for Category-Agnostic Pose Estimation Or Hirschorn et.al. 2311.17891v1 link
2023-11-29 Cinematic Behavior Transfer via NeRF-based Differentiable Filming Xuekun Jiang et.al. 2311.17754v1 null
2023-11-29 PViT-6D: Overclocking Vision Transformers for 6D Pose Estimation with Confidence-Level Prediction and Pose Tokens Sebastian Stapf et.al. 2311.17504v1 null
2023-11-28 On the Calibration of Human Pose Estimation Kerui Gu et.al. 2311.17105v1 null
2023-11-28 Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence Junyi Zhang et.al. 2311.17034v1 link
2023-11-28 HandyPriors: Physically Consistent Perception of Hand-Object Interactions with Differentiable Priors Shutong Zhang et.al. 2311.16552v1 null
2023-11-28 Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement Jian Wang et.al. 2311.16495v1 null
2023-11-24 UniHPE: Towards Unified Human Pose Estimation via Contrastive Learning Zhongyu Jiang et.al. 2311.16477v1 null
2023-11-27 DiffSLVA: Harnessing Diffusion Models for Sign Language Video Anonymization Zhaoyang Xia et.al. 2311.16060v1 link
2023-11-27 Uncertainty Quantification of Set-Membership Estimation in Control and Perception: Revisiting the Minimum Enclosing Ellipsoid Yukai Tang et.al. 2311.15962v1 null
2023-11-27 Computer Vision for Carriers: PATRIOT Ari Goodman et.al. 2311.15914v1 null
2023-11-27 SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation Jiehong Lin et.al. 2311.15707v1 link
2023-11-24 RSB-Pose: Robust Short-Baseline Binocular 3D Human Pose Estimation with Occlusion Handling Xiaoyue Wan et.al. 2311.14242v1 null
2023-11-23 Appearance-based gaze estimation enhanced with synthetic images using deep neural networks Dmytro Herashchenko et.al. 2311.14175v1 link
2023-11-23 GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence Van Nguyen Nguyen et.al. 2311.14155v1 link
2023-11-23 GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence Pengyuan Wang et.al. 2311.13777v1 null
2023-11-22 HEViTPose: High-Efficiency Vision Transformer for Human Pose Estimation Chengpeng Wu et.al. 2311.13615v1 link
2023-11-24 Calibration System and Algorithm Design for a Soft Hinged Micro Scanning Mirror with a Triaxial Hall Effect Sensor Di Wang et.al. 2311.12778v2 null
2023-11-21 HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation Yongliang Lin et.al. 2311.12588v1 link
2023-11-21 CoVOR-SLAM: Cooperative SLAM using Visual Odometry and Ranges for Multi-Robot Systems Young-Hee Lee et.al. 2311.12580v1 null
2023-11-21 HCA-Net: Hierarchical Context Attention Network for Intervertebral Disc Semantic Labeling Afshin Bozorgpour et.al. 2311.12486v1 link
2023-11-21 Two Views Are Better than One: Monocular 3D Pose Estimation with Multiview Consistency Christian Keilstrup Ingwersen et.al. 2311.12421v1 null
2023-11-20 Fingerspelling PoseNet: Enhancing Fingerspelling Translation with Pose-Based Transformer Models Pooya Fayyazsanavi et.al. 2311.12128v1 link
2023-11-20 Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation Wenhao Li et.al. 2311.12028v1 link
2023-11-20 SniffyArt: The Dataset of Smelling Persons Mathias Zinnen et.al. 2311.11888v1 null
2023-11-21 Robot Hand-Eye Calibration using Structure-from-Motion Nicolas Andreff et.al. 2311.11808v2 null
2023-11-18 SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation Yamei Chen et.al. 2311.11125v1 link
2023-11-18 Synthetic Data Generation for Bridging Sim2Real Gap in a Production Environment Parth Rawal et.al. 2311.11039v1 null
2023-11-18 Multiple View Geometry Transformers for 3D Human Pose Estimation Ziwei Liao et.al. 2311.10983v1 link
2023-11-18 Jenga Stacking Based on 6D Pose Estimation for Architectural Form Finding Process Zixun Huang et.al. 2311.10918v1 null
2023-11-17 BiHRNet: A Binary high-resolution network for Human Pose Estimation Zhicheng Zhang et.al. 2311.10296v1 null
2023-11-16 Match and Locate: low-frequency monocular odometry based on deep feature matching Stepan Konev et.al. 2311.10034v1 null
2023-11-16 LIO-EKF: High Frequency LiDAR-Inertial Odometry using Extended Kalman Filters Yibin Wu et.al. 2311.09887v1 link
2023-11-16 Improved TokenPose with Sparsity Anning Li et.al. 2311.09653v1 null
2023-11-16 Pseudo-keypoints RKHS Learning for Self-supervised 6DoF Pose Estimation Yangzheng Wu et.al. 2311.09500v1 null
2023-11-15 NormNet: Scale Normalization for 6D Pose Estimation in Stacked Scenarios En-Te Lin et.al. 2311.09269v1 link
2023-11-15 Range-Visual-Inertial Sensor Fusion for Micro Aerial Vehicle Localization and Navigation Abhishek Goudar et.al. 2311.09056v1 link
2023-11-14 LocaliseBot: Multi-view 3D object localisation with differentiable rendering for robot grasping Sujal Vijayaraghavan et.al. 2311.08438v1 null
2023-11-13 SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models Ziyi Lin et.al. 2311.07575v1 link
2023-11-13 Bio-Inspired Grasping Controller for Sensorized 2-DoF Grippers Luca Lach et.al. 2311.07257v1 link
2023-11-10 CESPED: a new benchmark for supervised particle pose estimation in Cryo-EM Ruben Sanchez-Garcia et.al. 2311.06194v1 link
2023-11-10 2D Image head pose estimation via latent space regression under occlusion settings José Celestino et.al. 2311.06038v1 link
2023-11-10 Robust Adversarial Attacks Detection for Deep Learning based Relative Pose Estimation for Space Rendezvous Ziwei Wang et.al. 2311.05992v1 null
2023-11-10 A Practical Guide to Implementing Off-Axis Stereo Projection Using Existing Ray Tracing Libraries Stefan Zellmann et.al. 2311.05887v1 link
2023-11-09 Visually Guided Model Predictive Robot Control via 6D Object Pose Localization and Tracking Mederic Fourmy et.al. 2311.05344v1 null
2023-11-09 Spatial Attention-based Distribution Integration Network for Human Pose Estimation Sihan Gao et.al. 2311.05323v1 null
2023-11-09 SPADES: A Realistic Spacecraft Pose Estimation Dataset using Event Sensing Arunkumar Rathinam et.al. 2311.05310v1 null
2023-11-09 Differentiable Cloth Parameter Identification and State Estimation in Manipulation Dongzhe Zheng et.al. 2311.05141v1 null
2023-11-09 POISE: Pose Guided Human Silhouette Extraction under Occlusions Arindam Dutta et.al. 2311.05077v1 link
2023-11-08 Active Transfer Learning for Efficient Video-Specific Human Pose Estimation Hiromu Taketsugu et.al. 2311.05041v1 link
2023-11-08 3D Pose Estimation of Tomato Peduncle Nodes using Deep Keypoint Detection and Point Cloud Jianchao Ci et.al. 2311.04699v1 null
2023-11-09 Rethinking Human Pose Estimation for Autonomous Driving with 3D Event Representations Xiaoting Yin et.al. 2311.04591v2 link
2023-11-08 Learning Robust Multi-Scale Representation for Neural Radiance Fields from Unposed Images Nishant Jain et.al. 2311.04521v1 null
2023-11-08 PLV-IEKF: Consistent Visual-Inertial Odometry using Points, Lines, and Vanishing Points Tong Hua et.al. 2311.04477v1 null
2023-11-08 UP-NeRF: Unconstrained Pose-Prior-Free Neural Radiance Fields Injae Kim et.al. 2311.03784v2 link
2023-11-06 A Single 2D Pose with Context is Worth Hundreds for 3D Human Pose Estimation Qitao Zhao et.al. 2311.03312v1 null
2023-11-06 Enabling In-Situ Resources Utilisation by leveraging collaborative robotics and astronaut-robot interaction Silvia Romero-Azpitarte et.al. 2311.03146v1 null
2023-11-06 Simultaneous Time Synchronization and Mutual Localization for Multi-robot System Xiangyong Wen et.al. 2311.02948v1 null
2023-11-06 Initialisation of Autonomous Aircraft Visual Inspection Systems via CNN-Based Camera Pose Estimation Xueyan Oh et.al. 2311.02900v1 null
2023-11-06 Efficient, Self-Supervised Human Pose Estimation with Inductive Prior Tuning Nobline Yoo et.al. 2311.02815v1 link
2023-11-03 Generating Unbiased Pseudo-labels via a Theoretically Guaranteed Chebyshev Constraint to Unify Semi-supervised Classification and Regression Jiaqi Wu et.al. 2311.01782v1 link
2023-11-03 Modeling the Uncertainty with Maximum Discrepant Students for Semi-supervised 2D Pose Estimation Jiaqi Wu et.al. 2311.01770v1 null
2023-11-02 Sim2Real Bilevel Adaptation for Object Surface Classification using Vision-Based Tactile Sensors Gabriele M. Caddeo et.al. 2311.01380v1 link
2023-11-01 A Spatial-Temporal Transformer based Framework For Human Pose Assessment And Correction in Education Scenarios Wenyang Hu et.al. 2311.00401v1 null
2023-10-31 HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception Junkun Yuan et.al. 2310.20695v1 link
2023-10-31 Pose-to-Motion: Cross-Domain Motion Retargeting with Pose Prior Qingqing Zhao et.al. 2310.20249v1 null
2023-10-30 FetusMapV2: Enhanced Fetal Pose Estimation in 3D Ultrasound Chaoyu Chen et.al. 2310.19293v1 null
2023-10-29 Distributed Nonlinear Filtering using Triangular Transport Maps Daniel Grange et.al. 2310.19000v1 null
2023-10-29 TIC-TAC: A Framework To Learn And Evaluate Your Covariance Megh Shukla et.al. 2310.18953v1 link
2023-10-29 Improving Multi-Person Pose Tracking with A Confidence Network Zehua Fu et.al. 2310.18920v1 null
2023-10-29 HDMNet: A Hierarchical Matching Network with Double Attention for Large-scale Outdoor LiDAR Point Cloud Registration Weiyi Xue et.al. 2310.18874v1 null
2023-10-28 Enhancing Grasping Performance of Novel Objects through an Improved Fine-Tuning Process Xiao Hu et.al. 2310.18569v1 null
2023-10-27 ProcNet: Deep Predictive Coding Model for Robust-to-occlusion Visual Segmentation and Pose Estimation Michael Zechmair et.al. 2310.18009v1 null
2023-10-26 Learning Extrinsic Dexterity with Parameterized Manipulation Primitives Shih-Min Yang et.al. 2310.17785v1 null
2023-10-26 6-DoF Stability Field via Diffusion Models Takuma Yoneda et.al. 2310.17649v1 null
2023-10-26 SE(3) Diffusion Model-based Point Cloud Registration for Robust 6D Object Pose Estimation Haobo Jiang et.al. 2310.17359v1 null
2023-10-26 Automatic Edge Error Judgment in Figure Skating Using 3D Pose Estimation from a Monocular Camera and IMUs Ryota Tanaka et.al. 2310.17193v1 link
2023-10-25 Real-time 6-DoF Pose Estimation by an Event-based Camera using Active LED Markers Gerald Ebmer et.al. 2310.16618v1 null
2023-10-25 ChimpACT: A Longitudinal Dataset for Understanding Chimpanzee Behaviors Xiaoxuan Ma et.al. 2310.16447v1 link
2023-10-25 MotionAGFormer: Enhancing 3D Human Pose Estimation with a Transformer-GCNFormer Network Soroush Mehraban et.al. 2310.16288v1 link
2023-10-25 TransPose: 6D Object Pose Estimation with Geometry-Aware Transformer Xiao Lin et.al. 2310.16279v1 null
2023-10-23 Converting Depth Images and Point Clouds for Feature-based Pose Estimation Robert Lösch et.al. 2310.14924v1 link
2023-10-23 Object Pose Estimation Annotation Pipeline for Multi-view Monocular Camera Systems in Industrial Settings Hazem Youssef et.al. 2310.14914v1 null
2023-10-23 Player Re-Identification Using Body Part Appearences Mahesh Bhosale et.al. 2310.14469v1 null
2023-10-20 LanPose: Language-Instructed 6D Object Pose Estimation for Robotic Assembly Bowen Fu et.al. 2310.13819v1 null
2023-10-20 FMRT: Learning Accurate Feature Matching with Reconciliatory Transformer Xinyu Zhang et.al. 2310.13605v1 null
2023-10-20 ColAG: A Collaborative Air-Ground Framework for Perception-Limited UGVs' Navigation Zhehan Li et.al. 2310.13324v1 link
2023-10-20 CylinderTag: An Accurate and Flexible Marker for Cylinder-Shape Objects Pose Estimation Based on Projective Invariants Shaoan Wang et.al. 2310.13320v1 link
2023-10-19 Human Pose-based Estimation, Tracking and Action Recognition with Deep Learning: A Survey Lijuan Zhou et.al. 2310.13039v1 null
2023-10-19 FSD: Fast Self-Supervised Single RGB-D to Categorical 3D Objects Mayank Lunayach et.al. 2310.12974v1 link
2023-10-18 Mesh Represented Recycle Learning for 3D Hand Pose and Mesh Estimation Bosang Kim et.al. 2310.12189v1 null
2023-10-18 One-Shot Imitation Learning: A Pose Estimation Perspective Pietro Vitiello et.al. 2310.12077v1 null
2023-10-18 ShapeGraFormer: GraFormer-Based Network for Hand-Object Reconstruction from a Single Depth Map Ahmed Tawfik Aboukhadra et.al. 2310.11811v1 null
2023-10-17 Holistic Parking Slot Detection with Polygon-Shaped Representations Lihao Wang et.al. 2310.11629v1 null
2023-10-17 Diver Interest via Pointing in Three Dimensions: 3D Pointing Reconstruction for Diver-AUV Communication Chelsey Edge et.al. 2310.11536v1 null
2023-10-18 AP$n$P: A Less-constrained P$n$P Solver for Pose Estimation with Unknown Anisotropic Scaling or Focal Lengths Jiaxin Wei et.al. 2310.09982v2 link
2023-10-15 Tabletop Transparent Scene Reconstruction via Epipolar-Guided Optical Flow with Monocular Depth Completion Prior Xiaotong Chen et.al. 2310.09956v1 null
2023-10-15 Socially reactive navigation models for mobile robots in dynamic environments Ricarte Ribeiro et.al. 2310.09916v1 link
2023-10-15 MoEmo Vision Transformer: Integrating Cross-Attention and Movement Vectors in 3D Pose Estimation for HRI Emotion Detection David C. Jeong et.al. 2310.09757v1 link
2023-10-16 IMU Preintegration for Multi-Robot Systems in the Presence of Bias and Communication Constraints Mohammed Ayman Shalaby et.al. 2310.08686v2 null
2023-10-12 Towards Design and Development of an ArUco Markers-Based Quantitative Surface Tactile Sensor Ozdemir Can Kara et.al. 2310.08398v1 null
2023-10-12 Multimodal Active Measurement for Human Mesh Recovery in Close Proximity Takahiro Maeda et.al. 2310.08116v1 link
2023-10-12 X-HRNet: Towards Lightweight Human Pose Estimation with Spatially Unidimensional Self-Attention Yixuan Zhou et.al. 2310.08042v1 link
2023-10-12 PoRF: Pose Residual Field for Accurate Neural Surface Reconstruction Jia-Wang Bian et.al. 2310.07449v2 link
2023-10-11 SAGE-ICP: Semantic Information-Assisted ICP Jiaming Cui et.al. 2310.07237v1 link
2023-10-11 DeepSimHO: Stable Pose Estimation for Hand-Object Interaction via Physics Simulation Rong Wang et.al. 2310.07206v1 link
2023-10-12 FABind: Fast and Accurate Protein-Ligand Binding Qizhi Pei et.al. 2310.06763v2 link
2023-10-10 EARL: Eye-on-Hand Reinforcement Learner for Dynamic Grasping with Active Pose Estimation Baichuan Huang et.al. 2310.06751v1 null
2023-10-09 Augmenting Vision-Based Human Pose Estimation with Rotation Matrix Milad Vazan et.al. 2310.06068v1 null
2023-10-07 Federated Self-Supervised Learning of Monocular Depth Estimators for Autonomous Vehicles Elton F. de S. Soares et.al. 2310.04837v1 null
2023-10-10 1st Place Solution of Egocentric 3D Hand Pose Estimation Challenge 2023 Technical Report: A Concise Pipeline for Egocentric Hand Pose Reconstruction Zhishan Zhou et.al. 2310.04769v2 null
2023-10-06 SwimXYZ: A large-scale dataset of synthetic swimming motions and videos Fiche Guénolé et.al. 2310.04360v1 null
2023-10-05 BID-NeRF: RGB-D image pose estimation with inverted Neural Radiance Fields Ágoston István Csehi et.al. 2310.03563v1 null
2023-10-05 3D-Aware Hypothesis & Verification for Generalizable Relative Object Pose Estimation Chen Zhao et.al. 2310.03534v1 null
2023-10-05 RGBManip: Monocular Image-based Robotic Manipulation through Active Object Pose Estimation Boshi An et.al. 2310.03478v1 null
2023-10-05 Cyber Physical System Information Collection: Robot Location and Navigation Method Based on QR Code Hongwei Li et.al. 2310.03470v1 null
2023-10-04 Condition numbers in multiview geometry, instability in relative pose estimation, and RANSAC Hongyi Fan et.al. 2310.02719v1 null
2023-10-05 USB-NeRF: Unrolling Shutter Bundle Adjusted Neural Radiance Fields Moyang Li et.al. 2310.02687v2 link
2023-10-03 Beyond the Benchmark: Detecting Diverse Anomalies in Videos Yoav Arad et.al. 2310.01904v1 link
2023-10-03 MFOS: Model-Free & One-Shot Object Pose Estimation JongMin Lee et.al. 2310.01897v1 null
2023-10-02 LEAP: Liberate Sparse-view 3D Modeling from Camera Poses Hanwen Jiang et.al. 2310.01410v1 link
2023-10-02 H-InDex: Visual Reinforcement Learning with Hand-Informed Representations for Dexterous Manipulation Yanjie Ze et.al. 2310.01404v1 link
2023-10-04 Self-supervised Learning of Contextualized Local Visual Embeddings Thalles Santos Silva et.al. 2310.00527v3 link
2023-09-30 Diff-DOPE: Differentiable Deep Object Pose Estimation Jonathan Tremblay et.al. 2310.00463v1 null
2023-09-29 Diver Identification Using Anthropometric Data Ratios for Underwater Multi-Human-Robot Collaboration Jungseok Hong et.al. 2310.00146v1 null
2023-09-29 Denoising and Selecting Pseudo-Heatmaps for Semi-Supervised Human Pose Estimation Zhuoran Yu et.al. 2310.00099v1 null
2023-09-29 Revisiting Cephalometric Landmark Detection from the view of Human Pose Estimation with Lightweight Super-Resolution Head Qian Wu et.al. 2309.17143v1 link
2023-09-29 AdaPose: Towards Cross-Site Device-Free Human Pose Estimation with Commodity WiFi Yunjiao Zhou et.al. 2309.16964v1 null
2023-09-28 End-to-End (Instance)-Image Goal Navigation through Correspondence as an Emergent Phenomenon Guillaume Bono et.al. 2309.16634v1 null
2023-09-28 Off-the-shelf bin picking workcell with visual pose estimation: A case study on the world robot summit 2018 kitting task Frederik Hagelskjær et.al. 2309.16221v1 null
2023-09-28 Cloth2Body: Generating 3D Human Body Mesh from 2D Clothing Lu Dai et.al. 2309.16189v1 null
2023-09-28 Laboratory Automation: Precision Insertion with Adaptive Fingers utilizing Contact through Sliding with Tactile-based Pose Estimation Sameer Pai et.al. 2309.16170v1 null
2023-09-28 CLIP-Hand3D: Exploiting 3D Hand Pose Estimation via Context-Aware Prompting Shaoxiang Guo et.al. 2309.16140v1 null
2023-09-28 A Modular Bio-inspired Robotic Hand with High Sensitivity Chao Liu et.al. 2309.16081v1 null
2023-09-27 Handbook on Leveraging Lines for Two-View Relative Pose Estimation Petr Hruby et.al. 2309.16040v1 null
2023-09-27 Q-REG: End-to-End Trainable Point Cloud Registration with Surface Curvature Shengze Jin et.al. 2309.16023v1 null
2023-09-27 Analysis on Multi-robot Relative 6-DOF Pose Estimation Error Based on UWB Range Xinran Li et.al. 2309.15367v1 null
2023-09-26 Unsupervised Reconstruction of 3D Human Pose Interactions From 2D Poses Alone Peter Hardy et.al. 2309.14865v1 null
2023-09-26 Learning Vision-Based Bipedal Locomotion for Challenging Terrain Helei Duan et.al. 2309.14594v1 null
2023-09-25 Spring-IMU Fusion Based Proprioception for Feedback Control of Soft Manipulators Yinan Meng et.al. 2309.14279v1 null
2023-09-25 Industrial Application of 6D Pose Estimation for Robotic Manipulation in Automotive Internal Logistics Philipp Quentin et.al. 2309.14265v1 null
2023-09-25 BoIR: Box-Supervised Instance Representation for Multi-Person Pose Estimation Uyoung Jeong et.al. 2309.14072v1 link
2023-09-24 Towards Subcentimeter Accuracy Digital-Twin Tracking via An RGBD-based Transformer Model and A Comprehensive Mobile Dataset Zixun Huang et.al. 2309.13570v1 link
2023-09-21 ORTexME: Occlusion-Robust Human Shape and Pose via Temporal Average Texture and Mesh Encoding Yu Cheng et.al. 2309.12183v1 null
2023-09-21 ZS6D: Zero-shot 6D Object Pose Estimation using Vision Transformers Philipp Ausserlechner et.al. 2309.11986v1 null
2023-09-21 Ego3DPose: Capturing 3D Cues from Binocular Egocentric Views Taeho Kang et.al. 2309.11962v1 link
2023-09-21 A Real-Time Multi-Task Learning System for Joint Detection of Face, Facial Landmark and Head Pose Qingtian Wu et.al. 2309.11773v1 null
2023-09-20 Understanding Pose and Appearance Disentanglement in 3D Human Pose Estimation Krishna Kanth Nakka et.al. 2309.11667v1 null
2023-09-20 Online Supervised Training of Spaceborne Vision during Proximity Operations using Adaptive Kalman Filtering Tae Ha Park et.al. 2309.11645v1 null
2023-09-20 OCC-VO: Dense Mapping via 3D Occupancy-Based Visual Odometry for Autonomous Driving Heng Li et.al. 2309.11011v1 link
2023-09-19 Language-Conditioned Affordance-Pose Detection in 3D Point Clouds Toan Nguyen et.al. 2309.10911v1 null
2023-09-19 MAGIC-TBR: Multiview Attention Fusion for Transformer-based Bodily Behavior Recognition in Group Settings Surbhi Madan et.al. 2309.10765v1 link
2023-09-19 SHOWMe: Benchmarking Object-agnostic Hand-Object 3D Reconstruction Anilkumar Swamy et.al. 2309.10748v1 null
2023-09-20 GloPro: Globally-Consistent Uncertainty-Aware 3D Human Pose Estimation & Tracking in the Wild Simon Schaefer et.al. 2309.10369v2 null
2023-09-19 RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery Jiaxin Wei et.al. 2309.10255v1 link
2023-09-18 Hierarchical Attention and Graph Neural Networks: Toward Drift-Free Pose Estimation Kathia Melbouci et.al. 2309.09934v1 null
2023-09-18 Application-driven Validation of Posteriors in Inverse Problems Tim J. Adler et.al. 2309.09764v1 null
2023-09-18 RIDE: Self-Supervised Learning of Rotation-Equivariant Keypoint Detection and Invariant Description for Endoscopy Mert Asim Karaoglu et.al. 2309.09563v1 null
2023-09-18 Sparse and Privacy-enhanced Representation for Human Pose Estimation Ting-Ying Lin et.al. 2309.09515v1 null
2023-09-19 RenderIH: A Large-scale Synthetic Dataset for 3D Interacting Hand Pose Estimation Lijun Li et.al. 2309.09301v2 link
2023-09-16 Optimal Initialization Strategies for Range-Only Trajectory Estimation Abhishek Goudar et.al. 2309.09011v1 null
2023-09-16 DynaMoN: Motion-Aware Fast And Robust Camera Localization for Dynamic NeRF Mert Asim Karaoglu et.al. 2309.08927v1 link
2023-09-16 Outram: One-shot Global Localization via Triangulated Scene Graph and Global Outlier Pruning Pengyu Yin et.al. 2309.08914v1 link
2023-09-15 Towards Robust and Smooth 3D Multi-Person Pose Estimation from Monocular Videos in the Wild Sungchan Park et.al. 2309.08644v1 null
2023-09-15 YCB-Ev: Event-vision dataset for 6DoF object pose estimation Pavel Rojtberg et.al. 2309.08482v1 link
2023-09-15 Fast and Accurate Deep Loop Closing and Relocalization for Reliable LiDAR SLAM Chenghao Shi et.al. 2309.08086v1 null
2023-09-14 Gradient based Grasp Pose Optimization on a NeRF that Approximates Grasp Success Gergely Sóti et.al. 2309.08040v1 null
2023-09-14 TEMPO: Efficient Multi-View Pose Estimation, Tracking, and Forecasting Rohan Choudhury et.al. 2309.07910v1 null
2023-09-14 Towards Robust and Unconstrained Full Range of Rotation Head Pose Estimation Thorsten Hempel et.al. 2309.07654v1 link
2023-09-14 EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization Minjung Kim et.al. 2309.07471v1 link
2023-09-14 Unleashing the Power of Depth and Pose Estimation Neural Networks by Designing Compatible Endoscopic Images Junyang Wu et.al. 2309.07390v1 null
2023-09-13 LInKs "Lifting Independent Keypoints" -- Partial Pose Lifting for Occlusion Handling with Improved Accuracy in 2D-3D Human Pose Estimation Peter Hardy et.al. 2309.07243v1 null
2023-09-13 3D Active Metric-Semantic SLAM Yuezhan Tao et.al. 2309.06950v1 null
2023-09-11 ViHOPE: Visuotactile In-Hand Object 6D Pose Estimation with Shape Completion Hongyu Li et.al. 2309.05662v1 null
2023-09-11 Towards Intuitive HMI for UAV Control Filip Zoric et.al. 2309.05460v1 null
2023-09-12 FreeMan: Towards Benchmarking 3D Human Pose Estimation in the Wild Jiong Wang et.al. 2309.05073v2 link
2023-09-09 Probabilistic Triangulation for Uncalibrated Multi-View 3D Human Pose Estimation Boyuan Jiang et.al. 2309.04756v1 link
2023-09-09 Mirror-Aware Neural Humans Daniel Ajisafe et.al. 2309.04750v1 link
2023-09-08 Robot Localization and Mapping Final Report -- Sequential Adversarial Learning for Self-Supervised Deep Visual Odometry Akankshya Kar et.al. 2309.04147v1 null
2023-09-07 ArtiGrasp: Physically Plausible Synthesis of Bi-Manual Dexterous Grasping and Articulation Hui Zhang et.al. 2309.03891v1 null
2023-09-05 An automated, high-resolution phenotypic assay for adult Brugia malayi and microfilaria Upender Kalwa et.al. 2309.03235v1 null
2023-09-05 A Robust Localization Solution for an Uncrewed Ground Vehicle in Unstructured Outdoor GNSS-Denied Environments W. Jacob Wagner et.al. 2309.02569v1 null
2023-09-05 GO-SLAM: Global Optimization for Consistent 3D Instant Reconstruction Youmin Zhang et.al. 2309.02436v1 link
2023-09-05 DR-Pose: A Two-stage Deformation-and-Registration Pipeline for Category-level 6D Object Pose Estimation Lei Zhou et.al. 2309.01925v1 link
2023-09-04 On the Query Strategies for Efficient Online Active Distillation Michele Boldo et.al. 2309.01612v1 null
2023-09-04 DiffHPE: Robust, Coherent 3D Human Pose Lifting with Diffusion Cédric Rommel et.al. 2309.01575v1 null
2023-09-06 Refined Temporal Pyramidal Compression-and-Amplification Transformer for 3D Human Pose Estimation Hanbing Liu et.al. 2309.01365v2 link
2023-09-04 SKoPe3D: A Synthetic Dataset for Vehicle Keypoint Perception in 3D from Traffic Monitoring Cameras Himanshu Pahadia et.al. 2309.01324v1 null
2023-09-03 BodySLAM++: Fast and Tightly-Coupled Visual-Inertial Camera and Human Motion Tracking Dorian F. Henning et.al. 2309.01236v1 null
2023-09-02 Mitigating Motion Blur for Robust 3D Baseball Player Pose Modeling for Pitch Analysis Jerrin Bright et.al. 2309.01010v1 null
2023-09-01 Fusing Monocular Images and Sparse IMU Signals for Real-time Human Motion Capture Shaohua Pan et.al. 2309.00310v1 link
2023-08-31 EMDB: The Electromagnetic Database of Global 3D Human Pose and Shape in the Wild Manuel Kaufmann et.al. 2308.16894v1 link
2023-08-31 SA6D: Self-Adaptive Few-Shot 6D Pose Estimator for Novel and Occluded Objects Ning Gao et.al. 2308.16528v1 null
2023-08-30 Two-Stage Violence Detection Using ViTPose and Classification Models at Smart Airports İrem Üstek et.al. 2308.16325v1 link
2023-08-30 SignDiff: Learning Diffusion Models for American Sign Language Production Sen Fang et.al. 2308.16082v1 null
2023-08-30 Learning Structure-from-Motion with Graph Attention Networks Lucas Brynte et.al. 2308.15984v1 link
2023-08-30 Reconstructing Groups of People with Hypergraph Relational Reasoning Buzhen Huang et.al. 2308.15844v1 link
2023-08-29 3D-MuPPET: 3D Multi-Pigeon Pose Estimation and Tracking Urs Waldmann et.al. 2308.15316v1 link
2023-08-29 Spatio-temporal MLP-graph network for 3D human pose estimation Tanvir Hassan et.al. 2308.15313v1 link
2023-08-29 Pose-Free Neural Radiance Fields via Implicit Pose Regularization Jiahui Zhang et.al. 2308.15049v1 null
2023-08-28 R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras Aron Schmied et.al. 2308.14713v1 null
2023-08-28 Video-Based Hand Pose Estimation for Remote Assessment of Bradykinesia in Parkinson's Disease Gabriela T. Acevedo Trebbau et.al. 2308.14679v1 null
2023-08-28 Active Pose Refinement for Textureless Shiny Objects using the Structured Light Camera Jun Yang et.al. 2308.14665v1 null
2023-08-28 CPFES: Physical Fitness Evaluation Based on Canadian Agility and Movement Skill Assessment Pengcheng Dong et.al. 2308.14324v1 null
2023-08-27 LDL: Line Distance Functions for Panoramic Localization Junho Kim et.al. 2308.13989v1 link
2023-08-26 Prior-guided Source-free Domain Adaptation for Human Pose Estimation Dripta S. Raychaudhuri et.al. 2308.13954v1 null
2023-08-26 Vision-Based Human Pose Estimation via Deep Learning: A Survey Gongjin Lan et.al. 2308.13872v1 null
2023-08-24 POCO: 3D Pose and Shape Estimation with Confidence Sai Kumar Dwivedi et.al. 2308.12965v1 link
2023-08-24 Robot Pose Nowcasting: Forecast the Future to Improve the Present Alessandro Simoni et.al. 2308.12914v1 null
2023-08-23 Certifiably Optimal Rotation and Pose Estimation Based on the Cayley Map Timothy D Barfoot et.al. 2308.12418v1 null
2023-08-22 Animal3D: A Comprehensive Dataset of 3D Animal Pose and Shape Jiacong Xu et.al. 2308.11737v1 null
2023-08-22 TrackFlow: Multi-Object Tracking with Normalizing Flows Gianluca Mancusi et.al. 2308.11513v1 null
2023-08-22 A LiDAR-Inertial SLAM Tightly-Coupled with Dropout-Tolerant GNSS Fusion for Autonomous Mine Service Vehicles Yusheng Wang et.al. 2308.11492v1 null
2023-08-22 PoseGraphNet++: Enriching 3D Human Pose with Orientation Estimation Soubarna Banik et.al. 2308.11440v1 null
2023-08-22 Novel-view Synthesis and Pose Estimation for Hand-Object Interaction from Sparse Views Wentian Qu et.al. 2308.11198v1 null
2023-08-21 Spectral Graphormer: Spectral Graph-based Transformer for Egocentric Two-Hand Reconstruction using Multi-View Color Images Tze Ho Elden Tse et.al. 2308.11015v1 null
2023-08-21 Polarimetric Information for Multi-Modal 6D Pose Estimation of Photometrically Challenging Objects with Limited Data Patrick Ruhkamp et.al. 2308.10627v1 null
2023-08-21 GaitPT: Skeletons Are All You Need For Gait Recognition Andy Catruna et.al. 2308.10623v1 null
2023-08-21 Approximately Equivariant Graph Networks Ningyuan Huang et.al. 2308.10436v1 link
2023-08-21 In-Rack Test Tube Pose Estimation Using RGB-D Data Hao Chen et.al. 2308.10411v1 null
2023-08-20 Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video Yingxuan You et.al. 2308.10305v1 link
2023-08-20 OCHID-Fi: Occlusion-Robust Hand Pose Estimation in 3D via RF-Vision Shujie Zhang et.al. 2308.10146v1 link
2023-08-19 3D-Aware Neural Body Fitting for Occlusion Robust 3D Human Pose Estimation Yi Zhang et.al. 2308.10123v1 link
2023-08-19 Pseudo Flow Consistency for Self-Supervised 6D Object Pose Estimation Yang Hai et.al. 2308.10016v1 link
2023-08-19 UniAP: Towards Universal Animal Perception in Vision via Few-shot Learning Meiqi Sun et.al. 2308.09953v1 null
2023-08-22 Scene-Aware Feature Matching Xiaoyong Lu et.al. 2308.09949v2 null
2023-08-18 PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation Hanbing Liu et.al. 2308.09678v1 link
2023-08-18 Improving 3D Pose Estimation for Sign Language Maksym Ivashechkin et.al. 2308.09525v1 null
2023-08-18 Denoising Diffusion for 3D Hand Pose Estimation from Images Maksym Ivashechkin et.al. 2308.09523v1 null
2023-08-18 ResQ: Residual Quantization for Video Perception Davide Abati et.al. 2308.09511v1 null
2023-08-17 MovePose: A High-performance Human Pose Estimation Algorithm on Mobile and Edge Devices Dongyang Yu et.al. 2308.09084v1 null
2023-08-17 Pedestrian Environment Model for Automated Driving Adrian Holzbock et.al. 2308.09080v1 link
2023-08-17 Exploiting Point-Wise Attention in 6D Object Pose Estimation Based on Bidirectional Prediction Yuhao Yang et.al. 2308.08518v2 null
2023-08-16 View Consistent Purification for Accurate Cross-View Localization Shan Wang et.al. 2308.08110v1 null
2023-08-15 Learning Better Keypoints for Multi-Object 6DoF Pose Estimation Yangzheng Wu et.al. 2308.07827v1 link
2023-08-14 Group Pose: A Simple Baseline for End-to-End Multi-person Pose Estimation Huan Liu et.al. 2308.07313v1 link
2023-08-12 4DRVO-Net: Deep 4D Radar-Visual Odometry Using Multi-Modal and Multi-Scale Adaptive Fusion Guirong Zhuo et.al. 2308.06573v1 null
2023-08-17 EgoPoser: Robust Real-Time Ego-Body Pose Estimation in Large Scenes Jiaxi Jiang et.al. 2308.06493v2 null
2023-08-11 Aggressive Aerial Grasping using a Soft Drone with Onboard Perception Samuel Ubellacker et.al. 2308.06351v1 null
2023-08-11 VERF: Runtime Monitoring of Pose Estimation with Neural Radiance Fields Dominic Maggio et.al. 2308.05939v1 null
2023-08-10 Toward Globally Optimal State Estimation Using Automatically Tightened Semidefinite Relaxations Frederike Dümbgen et.al. 2308.05783v1 link
2023-08-10 KS-APR: Keyframe Selection for Robust Absolute Pose Regression Changkun Liu et.al. 2308.05459v1 null
2023-08-10 How-to Augmented Lagrangian on Factor Graphs Barbara Bazzana et.al. 2308.05444v1 null
2023-08-10 Deep Fusion Transformer Network with Weighted Vector-Wise Keypoints Voting for Robust 6D Object Pose Estimation Jun Zhou et.al. 2308.05438v1 link
2023-08-10 Robust Localization with Visual-Inertial Odometry Constraints for Markerless Mobile AR Changkun Liu et.al. 2308.05394v1 null
2023-08-10 Double-chain Constraints for 3D Human Pose Estimation in Images and Videos Hongbo Kang et.al. 2308.05298v1 link
2023-08-09 ACE-HetEM for ab initio Heterogenous Cryo-EM 3D Reconstruction Weijie Chen et.al. 2308.04956v1 null
2023-08-07 SEM-GAT: Explainable Semantic Pose Estimation using Learned Graph Attention Efimia Panagiotaki et.al. 2308.03718v1 link
2023-08-07 A Horse with no Labels: Self-Supervised Horse Pose Estimation from Unlabelled Images and Synthetic Prior Jose Sosa et.al. 2308.03411v1 null
2023-08-06 Source-free Domain Adaptive Human Pose Estimation Qucheng Peng et.al. 2308.03202v1 link
2023-08-04 Diffusion-Augmented Depth Prediction with Sparse Annotations Jiaqi Li et.al. 2308.02283v1 null
2023-08-04 DTF-Net: Category-Level Pose Estimation and Shape Reconstruction via Deformable Template Field Haowen Wang et.al. 2308.02239v1 null
2023-08-07 Robust Self-Supervised Extrinsic Self-Calibration Takayuki Kanai et.al. 2308.02153v2 null
2023-08-03 Sim-to-Real Vision-depth Fusion CNNs for Robust Pose Estimation Aboard Autonomous Nano-quadcopter Luca Crupi et.al. 2308.01833v1 null
2023-08-03 Active Acoustic Sensing for Robot Manipulation Shihan Lu et.al. 2308.01600v1 null
2023-08-02 HANDAL: A Dataset of Real-World Manipulable Object Categories with Pose Annotations, Affordances, and Reconstructions Andrew Guo et.al. 2308.01477v1 null
2023-08-06 Human-M3: A Multi-view Multi-modal Dataset for 3D Human Pose Estimation in Outdoor Scenes Bohao Fan et.al. 2308.00628v2 link
2023-08-01 Markerless human pose estimation for biomedical applications: a survey Andrea Avogaro et.al. 2308.00519v1 null
2023-08-01 Kidnapping Deep Learning-based Multirotors using Optimized Flying Adversarial Patches Pia Hanfeld et.al. 2308.00344v1 link
2023-08-01 Fine-Grained Sports, Yoga, and Dance Postures Recognition: A Benchmark Analysis Asish Bera et.al. 2308.00323v1 null
2023-08-01 Robust Single-view Cone-beam X-ray Pose Estimation with Neural Tuned Tomography (NeTT) and Masked Neural Radiance Fields (mNeRF) Chaochao Zhou et.al. 2308.00214v1 null
2023-07-31 Lightweight Super-Resolution Head for Human Pose Estimation Haonan Wang et.al. 2307.16765v1 link
2023-07-31 DiffPose: SpatioTemporal Diffusion Model for Video-Based Human Pose Estimation Runyang Feng et.al. 2307.16687v1 null
2023-07-30 Touch if it's transparent! ACTOR: Active Tactile-based Category-Level Transparent Object Reconstruction Prajval Kumar Murali et.al. 2307.16254v1 null
2023-07-30 Successive Pose Estimation and Beam Tracking for mmWave Vehicular Communication Systems Cen Liu et.al. 2307.16117v1 link
2023-07-29 Iterative Graph Filtering Network for 3D Human Pose Estimation Zaedul Islam et.al. 2307.16074v1 link
2023-07-29 HandMIM: Pose-Aware Self-Supervised Learning for 3D Hand Mesh Estimation Zuyan Liu et.al. 2307.16061v1 null
2023-07-29 Effective Whole-body Pose Estimation with Two-stages Distillation Zhendong Yang et.al. 2307.15880v1 link
2023-07-28 TrackAgent: 6D Object Tracking via Reinforcement Learning Konstantin Röhrl et.al. 2307.15671v1 null
2023-07-28 Revisiting Fully Convolutional Geometric Features for Object 6D Pose Estimation Jaime Corsetti et.al. 2307.15514v1 link
2023-07-28 Robust Visual Sim-to-Real Transfer for Robotic Manipulation Ricardo Garcia et.al. 2307.15320v1 null
2023-07-27 Weakly Supervised Multi-Modal 3D Human Body Pose Estimation for Autonomous Driving Peter Bauer et.al. 2307.14889v1 null
2023-07-26 Attention of Robot Touch: Tactile Saliency Prediction for Robust Sim-to-Real Tactile Control Yijiong Lin et.al. 2307.14510v1 null
2023-07-28 CBGL: Fast Monte Carlo Passive Global Localisation of 2D LIDAR Sensor Alexandros Filotheou et.al. 2307.14247v2 link
2023-07-26 Deep Robust Multi-Robot Re-localisation in Natural Environments Milad Ramezani et.al. 2307.13950v1 null
2023-07-25 Of Mice and Pose: 2D Mouse Pose Estimation from Unlabelled Data and Synthetic Prior Jose Sosa et.al. 2307.13361v1 null
2023-07-23 TransNet: Transparent Object Manipulation Through Category-Level Pose Estimation Huijie Zhang et.al. 2307.12400v1 null
2023-07-25 FDCT: Fast Depth Completion for Transparent Objects Tianan Li et.al. 2307.12274v2 link
2023-07-22 Challenges for Monocular 6D Object Pose Estimation in Robotics Stefan Thalhammer et.al. 2307.12172v1 null
2023-07-22 Pyramid Semantic Graph-based Global Point Cloud Registration with Low Overlap Zhijian Qiao et.al. 2307.12116v1 link
2023-07-22 Robot Structure Prior Guided Temporal Attention for Camera-to-Robot Pose Estimation from Image Sequence Yang Tian et.al. 2307.12106v1 link
2023-07-26 LAMP: Leveraging Language Prompts for Multi-person Pose Estimation Shengnan Hu et.al. 2307.11934v2 link
2023-07-21 YOLOPose V2: Understanding and Improving Transformer-based 6D Pose Estimation Arul Selvam Periyasamy et.al. 2307.11550v1 null
2023-07-21 KVN: Keypoints Voting Network with Differentiable RANSAC for Stereo Pose Estimation Ivano Donadi et.al. 2307.11543v1 link
2023-07-21 Semantically-enhanced Deep Collision Prediction for Autonomous Navigation using Aerial Robots Mihir Kulkarni et.al. 2307.11522v1 null
2023-07-20 SimCol3D -- 3D Reconstruction during Colonoscopy Challenge Anita Rau et.al. 2307.11261v1 link
2023-07-20 MSQNet: Actor-agnostic Action Recognition with Multi-modal Query Anindya Mondal et.al. 2307.10763v1 link
2023-07-19 POV-Surgery: A Dataset for Egocentric Hand and Tool Pose Estimation During Surgical Activities Rui Wang et.al. 2307.10387v1 link
2023-07-18 ActionPrompt: Action-Guided 3D Human Pose Estimation With Text and Pose Prompting Hongwei Zheng et.al. 2307.09026v1 null
2023-07-17 Human Emergency Detection during Autonomous Hospital Transports Andreas Zachariae et.al. 2307.08359v1 link
2023-07-17 Self-supervised Monocular Depth Estimation: Let's Talk About The Weather Kieran Saunders et.al. 2307.08357v1 null
2023-07-20 Boosting 3-DoF Ground-to-Satellite Camera Localization Accuracy via Geometry-Guided Cross-View Transformer Yujiao Shi et.al. 2307.08015v3 link
2023-07-15 Tightly-Coupled LiDAR-Visual SLAM Based on Geometric Features for Mobile Agents Ke Cao et.al. 2307.07763v1 null
2023-07-13 Haptic-guided assisted telemanipulation approach for grasping desired objects from heaps Maxime Adjigble et.al. 2307.07053v1 null
2023-07-13 Improving 2D Human Pose Estimation across Unseen Camera Views with Synthetic Data Miroslav Purkrábek et.al. 2307.06737v1 link
2023-07-12 Deep learning-based estimation of whole-body kinematics from multi-view images Kien X. Nguyen et.al. 2307.05896v1 link
2023-07-12 GLA-GCN: Global-local Adaptive Graph Convolutional Network for 3D Human Pose Estimation from Monocular Video Bruce X. B. Yu et.al. 2307.05853v1 link
2023-07-09 TransPose: A Transformer-based 6D Object Pose Estimation Network with Depth Refinement Mahmoud Abdulsalam et.al. 2307.05561v1 null
2023-07-11 ResMatch: Residual Attention Learning for Local Feature Matching Yuxin Deng et.al. 2307.05180v1 link
2023-07-07 Proximity and Visuotactile Point Cloud Fusion for Contact Patches in Extreme Deformation Jessica Yin et.al. 2307.03839v1 null
2023-07-07 Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation Zhongyu Jiang et.al. 2307.03833v1 link
2023-07-07 Equivariant Single View Pose Prediction Via Induced and Restricted Representations Owen Howell et.al. 2307.03704v1 null
2023-07-07 RCDN -- Robust X-Corner Detection Algorithm based on Advanced CNN Model Ben Chen et.al. 2307.03505v1 null
2023-07-06 Self-supervised Optimization of Hand Pose Estimation using Anatomical Features and Iterative Learning Christian Jauch et.al. 2307.03007v1 null
2023-07-06 Recognition and Estimation of Human Finger Pointing with an RGB Camera for Robot Directive Eran Bamani et.al. 2307.02949v1 null
2023-07-06 A Real-time Human Pose Estimation Approach for Optimal Sensor Placement in Sensor-based Human Activity Recognition Orhan Konak et.al. 2307.02906v1 null
2023-07-04 Secure Deep Learning-based Distributed Intelligence on Pocket-sized Drones Elia Cereda et.al. 2307.01559v1 null
2023-07-03 Joint Coordinate Regression and Association For Multi-Person Pose Estimation, A Pure Neural Network Approach Dongyang Yu et.al. 2307.01004v1 null
2023-07-01 Automatic Solver Generator for Systems of Laurent Polynomial Equations Evgeniy Martyushev et.al. 2307.00320v1 link
2023-07-01 SyMFM6D: Symmetry-aware Multi-directional Fusion for Multi-View 6D Object Pose Estimation Fabian Duffhauss et.al. 2307.00306v1 link
2023-06-30 GIRA: Gaussian Mixture Models for Inference and Robot Autonomy Kshitij Goel et.al. 2307.00071v1 link
2023-06-30 Towards the extraction of robust sign embeddings for low resource sign language recognition Mathieu De Coster et.al. 2306.17558v1 null
2023-06-30 Fusion of Visual-Inertial Odometry with LiDAR Relative Localization for Cooperative Guidance of a Micro-Scale Aerial Vehicle Václav Pritzl et.al. 2306.17544v1 link
2023-06-30 Locking On: Leveraging Dynamic Vehicle-Imposed Motion Constraints to Improve Visual Localization Stephen Hausler et.al. 2306.17529v1 null
2023-06-29 ID-Pose: Sparse-view Camera Pose Estimation by Inverting Diffusion Models Weihao Cheng et.al. 2306.17140v1 null
2023-06-29 Learning Structure-Guided Diffusion Model for 2D Human Pose Estimation Zhongwei Qiu et.al. 2306.17074v1 null
2023-06-28 Hierarchical Graph Neural Networks for Proprioceptive 6D Pose Estimation of In-hand Objects Alireza Rezazadeh et.al. 2306.15858v1 null
2023-06-09 Data-Link: High Fidelity Manufacturing Datasets for Model2Real Transfer under Industrial Settings Sunny Katyara et.al. 2306.05766v1 null
2023-05-28 Counter-Hypothetical Particle Filters for Single Object Pose Tracking Elizabeth A. Olson et.al. 2305.17828v1 null
2023-05-25 Enhanced 6D Pose Estimation for Robotic Fruit Picking Marco Costanzo et.al. 2305.15856v1 null
2023-05-22 You Only Look at One: Category-Level Object Representations for Pose Estimation From a Single Example Walter Goodwin et.al. 2305.12626v1 null
2023-05-18 Manifold-Aware Self-Training for Unsupervised Domain Adaptation on Regressing 6D Object Pose Yichen Zhang et.al. 2305.10808v1 link
2023-05-08 RelPose++: Recovering 6D Poses from Sparse-view Observations Amy Lin et.al. 2305.04926v1 link
2023-04-17 Uncovering the Background-Induced bias in RGB based 6-DoF Object Pose Estimation Elena Govi et.al. 2304.08230v1 link
2023-03-28 CARTO: Category and Joint Agnostic Reconstruction of ARTiculated Objects Nick Heppert et.al. 2303.15782v1 link
2023-03-23 Prior-free Category-level Pose Estimation with Implicit Space Transformation Jianhui Liu et.al. 2303.13479v1 link
2023-06-21 6D Object Pose Estimation from Approximate 3D Models for Orbital Robotics Maximilian Ulmer et.al. 2303.13241v3 null
2023-03-22 Rigidity-Aware Detection for 6D Object Pose Estimation Yang Hai et.al. 2303.12396v1 link
2023-03-22 Object Pose Estimation with Statistical Guarantees: Conformal Keypoint Detection and Geometric Uncertainty Propagation Heng Yang et.al. 2303.12246v1 link
2023-03-21 Linear-Covariance Loss for End-to-End Learning of 6D Pose Estimation Fulin Liu et.al. 2303.11516v1 link
2023-03-18 SOCS: Semantically-aware Object Coordinate Space for Category-Level 6D Object Pose Estimation under Large Shape Variations Boyan Wan et.al. 2303.10346v1 null
2023-03-12 Module-Wise Network Quantization for 6D Object Pose Estimation Saqib Javed et.al. 2303.06753v1 link
2023-03-09 SpyroPose: Importance Sampling Pyramids for Object Pose Distribution Estimation in SE(3) Rasmus Laurvig Haugaard et.al. 2303.05308v1 null
2023-03-03 Depth-based 6DoF Object Pose Estimation using Swin Transformer Zhujun Li et.al. 2303.02133v1 link
2023-03-02 Canonical mapping as a general-purpose object descriptor for robotic manipulation Benjamin Joffe et.al. 2303.01331v1 null
2023-02-14 MSDA: Monocular Self-supervised Domain Adaptation for 6D Object Pose Estimation Dingding Cai et.al. 2302.07300v1 null
2023-02-14 Model-Based Underwater 6D Pose Estimation from RGB Davide Sapienza et.al. 2302.06821v1 null
2023-02-02 A Projective Geometric View for 6D Pose Estimation in mmWave MIMO Systems Shengqiang Shen et.al. 2302.00227v2 null
2023-01-31 Collision-aware In-hand 6D Object Pose Estimation using Multiple Vision-based Tactile Sensors Gabriele M. Caddeo et.al. 2301.13667v1 link
2023-01-19 Learning ultrasound plane pose regression: assessing generalized pose coordinates in the fetal brain Chiara Di Vece et.al. 2301.08317v1 null
2023-01-19 RGB-D-Based Categorical Object Pose and Shape Estimation: Methods, Datasets, and Evaluation Leonard Bruns et.al. 2301.08147v1 link
2022-12-21 HouseCat6D -- A Large-Scale Multi-Modal Category Level 6D Object Pose Dataset with Household Objects in Realistic Scenarios HyunJun Jung et.al. 2212.10428v2 link
2022-12-13 MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare Yann Labbé et.al. 2212.06870v1 null
2022-12-11 Context-aware 6D Pose Estimation of Known Objects using RGB-D data Ankit Kumar et.al. 2212.05560v1 null
2023-01-30 Category-Level 6D Object Pose Estimation with Flexible Vector-Based Rotation Representation Wei Chen et.al. 2212.04632v2 null

Point Cloud Registration

Publish Date Title Authors PDF Code
2024-12-29 Towards Explaining Uncertainty Estimates in Point Cloud Registration Ziyuan Qin et.al. 2412.20612v1 null
2024-12-26 Resolving the Ambiguity of Complete-to-Partial Point Cloud Registration for Image-Guided Liver Surgery with Patches-to-Partial Matching Zixin Yang et.al. 2412.19328v1 null
2024-12-25 Cross-PCR: A Robust Cross-Source Point Cloud Registration Framework Guiyu Zhao et.al. 2412.18873v1 null
2024-12-23 PointVoxelFormer -- Reviving point cloud networks for 3D medical imaging Mattias Paul Heinrich et.al. 2412.17390v1 null
2024-12-19 3D Registration in 30 Years: A Survey Jiaqi Yang et.al. 2412.13735v2 link
2024-12-13 TrafficLoc: Localizing Traffic Surveillance Cameras in 3D Scenes Yan Xia et.al. 2412.10308v1 null
2024-12-10 A Real-time Degeneracy Sensing and Compensation Method for Enhanced LiDAR SLAM Zongbo Liao et.al. 2412.07513v1 null
2024-12-07 AutoURDF: Unsupervised Robot Modeling from Point Cloud Frames Using Cluster Registration Jiong Lin et.al. 2412.05507v1 null
2024-12-06 GS-Matching: Reconsidering Feature Matching task in Point Cloud Registration Yaojie Zhang et.al. 2412.04855v1 null
2024-12-04 AffordDP: Generalizable Diffusion Policy with Transferable Affordance Shijie Wu et.al. 2412.03142v1 null
2024-12-04 QuadricsReg: Large-Scale Point Cloud Registration using Quadric Primitives Ji Wu et.al. 2412.02998v1 null
2024-12-01 FlashSLAM: Accelerated RGB-D SLAM for Real-Time 3D Scene Reconstruction with Gaussian Splatting Phu Pham et.al. 2412.00682v1 null
2024-11-27 XR-MBT: Multi-modal Full Body Tracking for XR through Self-Supervision with Learned Depth Point Cloud Registration Denys Rozumnyi et.al. 2411.18377v1 null
2024-11-22 EADReg: Probabilistic Correspondence Generation with Efficient Autoregressive Diffusion Model for Outdoor Point Cloud Registration Linrui Gong et.al. 2411.15271v1 null
2024-11-20 Automatic marker-free registration based on similar tetrahedras for single-tree point clouds Jing Ren et.al. 2411.13069v1 null
2024-11-19 3D Reconstruction by Looking: Instantaneous Blind Spot Detector for Indoor SLAM through Mixed Reality Hanbeom Chang et.al. 2411.12514v1 null
2024-11-16 Deep Loss Convexification for Learning Iterative Models Ziming Zhang et.al. 2411.10649v1 null
2024-11-12 3D Focusing-and-Matching Network for Multi-Instance Point Cloud Registration Liyuan Zhang et.al. 2411.07740v1 null
2024-11-04 Mining and Transferring Feature-Geometry Coherence for Unsupervised Point Cloud Registration Kezheng Xiong et.al. 2411.01870v1 link
2024-10-30 UniRiT: Towards Few-Shot Non-Rigid Point Cloud Registration Geng Li et.al. 2410.22909v1 null
2024-10-29 Micro-Structures Graph-Based Point Cloud Registration for Balancing Efficiency and Accuracy Rongling Zhang et.al. 2410.21857v1 null
2024-10-29 Memory-Efficient Point Cloud Registration via Overlapping Region Sampling Tomoyasu Shimada et.al. 2410.21753v1 null
2024-10-21 RANSAC Back to SOTA: A Two-stage Consensus Filtering for Real-time 3D Registration Pengcheng Shi et.al. 2410.15682v1 link
2024-10-14 A Consistency-Aware Spot-Guided Transformer for Versatile and Hierarchical Point Cloud Registration Renlang Huang et.al. 2410.10295v1 link
2024-10-14 Kinematic-ICP: Enhancing LiDAR Odometry with Kinematic Constraints for Wheeled Mobile Robots Moving on Planar Surfaces Tiziano Guadagnino et.al. 2410.10277v1 null
2024-10-10 LiPO: LiDAR Inertial Odometry for ICP Comparison Darwin Mick et.al. 2410.08097v1 null
2024-10-08 Equi-GSPR: Equivariant SE(3) Graph Network Model for Sparse Point Cloud Registration Xueyang Kang et.al. 2410.05729v1 link
2024-10-07 Enhanced Multi-Robot SLAM System with Cross-Validation Matching and Exponential Threshold Keyframe Selection Ang He et.al. 2410.05017v1 null
2024-10-03 LoGDesc: Local geometric features aggregation for robust point cloud registration Karim Slimani et.al. 2410.02420v1 link
2024-10-01 GERA: Geometric Embedding for Efficient Point Registration Analysis Geng Li et.al. 2410.00589v1 null
2024-10-01 TFCT-I2P: Three stream fusion network with color aware transformer for image-to-point cloud registration Muyao Peng et.al. 2410.00360v1 link
2024-10-06 KISS-Matcher: Fast and Robust Point Cloud Registration Revisited Hyungtae Lim et.al. 2409.15615v2 link
2024-09-23 MATCH POLICY: A Simple Pipeline from Point Cloud Registration to Manipulation Policies Haojie Huang et.al. 2409.15517v1 null
2024-09-22 SynBench: A Synthetic Benchmark for Non-rigid 3D Point Cloud Registration Sara Monji-Azad et.al. 2409.14474v1 null
2024-09-27 FracGM: A Fast Fractional Programming Technique for Geman-McClure Robust Estimator Bang-Shien Chen et.al. 2409.13978v2 link
2024-09-17 Enhancing the Reliability of LiDAR Point Cloud Sampling: A Colorization and Super-Resolution Approach Based on LiDAR-Generated Images Sier Ha et.al. 2409.11532v1 null
2024-09-14 Registration between Point Cloud Streams and Sequential Bounding Boxes via Gradient Descent Xuesong Li et.al. 2409.09312v1 null
2024-09-11 Unsupervised Point Cloud Registration with Self-Distillation Christian Löwens et.al. 2409.07558v1 link
2024-09-10 Mahalanobis k-NN: A Statistical Lens for Robust Point-Cloud Registrations Tejas Anvekar et.al. 2409.06267v1 link
2024-09-09 From Words to Poses: Enhancing Novel Object Pose Estimation with Vision Language Models Tessa Pulli et.al. 2409.05413v1 null
2024-09-08 Sight View Constraint for Robust Point Cloud Registration Yaojie Zhang et.al. 2409.05065v1 null
2024-08-23 UMERegRobust - Universal Manifold Embedding Compatible Features for Robust Point Cloud Registration Yuval Haitman et.al. 2408.12380v2 link
2024-08-21 Informed, Constrained, Aligned: A Field Analysis on Degeneracy-aware Point Cloud Registration in the Wild Turcan Tuna et.al. 2408.11809v1 null
2024-08-20 LoopSplat: Loop Closure by Registering 3D Gaussian Splats Liyuan Zhu et.al. 2408.10154v2 link
2024-08-05 CMR-Agent: Learning a Cross-Modal Agent for Iterative Image-to-Point Cloud Registration Gongxin Yao et.al. 2408.02394v1 null
2024-08-05 MaFreeI2P: A Matching-Free Image-to-Point Cloud Registration Paradigm with Active Camera Pose Retrieval Gongxin Yao et.al. 2408.02392v1 null
2024-07-29 Correspondence-Free SE(3) Point Cloud Registration in RKHS via Unsupervised Equivariant Learning Ray Zhang et.al. 2407.20223v1 null
2024-07-24 Robust Point Cloud Registration in Robotic Inspection with Locally Consistent Gaussian Mixture Model Lingjie Su et.al. 2407.17183v1 null
2024-07-23 SE3ET: SE(3)-Equivariant Transformer for Low-Overlap Point Cloud Registration Chien Erh Lin et.al. 2407.16823v1 link
2024-07-19 PointRegGPT: Boosting 3D Point Cloud Registration using Generative Point-Cloud Pairs for Training Suyi Chen et.al. 2407.14054v1 link
2024-07-19 GlobalPointer: Large-Scale Plane Adjustment with Bi-Convex Relaxation Bangyan Liao et.al. 2407.13537v2 link
2024-07-22 Snail-Radar: A large-scale diverse dataset for the evaluation of 4D-radar-based SLAM systems Jianzhu Huai et.al. 2407.11705v2 null
2024-07-14 PARE-Net: Position-Aware Rotation-Equivariant Networks for Robust Point Cloud Registration Runzhao Yao et.al. 2407.10142v1 link
2024-07-13 ML-SemReg: Boosting Point Cloud Registration with Multi-level Semantic Consistency Shaocheng Yan et.al. 2407.09862v1 link
2024-07-11 BiEquiFormer: Bi-Equivariant Representations for Global Point Cloud Registration Stefanos Pertigkiozoglou et.al. 2407.08729v1 null
2024-07-10 Incremental Multiview Point Cloud Registration with Two-stage Candidate Retrieval Shiqi Li et.al. 2407.07525v1 null
2024-07-08 SGOR: Outlier Removal by Leveraging Semantic and Geometric Information for Robust Point Cloud Registration Guiyu Zhao et.al. 2407.06297v1 link
2024-07-08 GeoNLF: Geometry guided Pose-Free Neural LiDAR Fields Weiyi Xue et.al. 2407.05597v1 null
2024-07-07 GaussReg: Fast 3D Registration with Gaussian Splatting Jiahao Chang et.al. 2407.05254v1 null
2024-07-06 Incremental Multiview Point Cloud Registration Xiaoya Cheng et.al. 2407.05021v1 link
2024-06-25 Point Tree Transformer for Point Cloud Registration Meiling Wang et.al. 2406.17530v1 null
2024-06-17 Correspondence Free Multivector Cloud Registration using Conformal Geometric Algebra Francisco Xavier Vasconcelos et.al. 2406.11732v1 link
2024-06-05 L-PR: Exploiting LiDAR Fiducial Marker for Unordered Low Overlap Multiview Point Cloud Registration Yibo Liu et.al. 2406.03298v1 link
2024-05-25 Deep-PE: A Learning-Based Pose Evaluator for Point Cloud Registration Junjie Gao et.al. 2405.16085v1 null
2024-05-26 NV-LIO: LiDAR-Inertial Odometry using Normal Vectors Towards Robust SLAM in Multifloor Environments Dongha Chung et.al. 2405.12563v2 link
2024-05-13 RGBD-Glue: General Feature Combination for Robust RGB-D Point Cloud Registration Congjia Chen et.al. 2405.07594v1 null
2024-05-10 Benchmarking Classical and Learning-Based Multibeam Point Cloud Registration Li Ling et.al. 2405.06279v1 link
2024-05-09 Rotation Initialization and Stepwise Refinement for Universal LiDAR Calibration Yifan Duan et.al. 2405.05589v1 null
2024-05-07 Speak the Same Language: Global LiDAR Registration on BIM Using Pose Hough Transform Zhijian Qiao et.al. 2405.03969v1 null
2024-05-06 Deep Learning-based Point Cloud Registration for Augmented Reality-guided Surgery Maximilian Weber et.al. 2405.03314v1 null
2024-04-27 FRAME: A Modular Framework for Autonomous Map-merging: Advancements in the Field Nikolaos Stathoulopoulos et.al. 2404.18006v1 null
2024-04-22 PointDifformer: Robust Point Cloud Registration With Neural Diffusion and Transformer Rui She et.al. 2404.14034v1 null
2024-04-22 A Comprehensive Survey and Taxonomy on Point Cloud Registration Based on Deep Learning Yu-Xin Zhang et.al. 2404.13830v1 link
2024-04-09 Efficient and Robust Point Cloud Registration via Heuristics-guided Parameter Search Tianyu Huang et.al. 2404.06155v1 link
2024-04-08 Rendering-Enhanced Automatic Image-to-Point Cloud Registration for Roadside Scenes Yu Sheng et.al. 2404.05164v1 null
2024-04-06 Learning Instance-Aware Correspondences for Robust Multi-Instance Point Cloud Registration in Cluttered Scenes Zhiyuan Yu et.al. 2404.04557v1 link
2024-04-05 A Ground Mobile Robot for Autonomous Terrestrial Laser Scanning-Based Field Phenotyping Javier Rodriguez-Sanchez et.al. 2404.04404v1 null
2024-04-01 FPGA-Accelerated Correspondence-free Point Cloud Registration with PointNet Features Keisuke Sugiura et.al. 2404.01237v1 null
2024-03-28 SG-PGM: Partial Graph Matching Network with Semantic Geometric Fusion for 3D Scene Graph Alignment and Its Downstream Tasks Yaxu Xie et.al. 2403.19474v1 link
2024-03-26 Global Point Cloud Registration Network for Large Transformations Hanz Cuevas-Velasquez et.al. 2403.18040v1 null
2024-03-28 Exploring Accurate 3D Phenotyping in Greenhouse through Neural Radiance Fields Junhong Zhao et.al. 2403.15981v2 null
2024-03-15 VRHCF: Cross-Source Point Cloud Registration via Voxel Representation and Hierarchical Correspondence Filtering Guiyu Zhao et.al. 2403.10085v1 link
2024-03-15 MEDPNet: Achieving High-Precision Adaptive Registration for Complex Die Castings Yu Du et.al. 2403.09996v1 null
2024-03-15 CLOSURE: Fast Quantification of Pose Uncertainty Sets Yihuai Gao et.al. 2403.09990v1 null
2024-03-13 FastMAC: Stochastic Spectral Sampling of Correspondence Graph Yifei Zhang et.al. 2403.08770v1 link
2024-03-13 NeRF-Supervised Feature Point Detection and Description Ali Youssef et.al. 2403.08156v1 link
2024-03-10 PSS-BA: LiDAR Bundle Adjustment with Progressive Spatial Smoothing Jianping Li et.al. 2403.06124v1 null
2024-03-27 Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance Extension Quan Liu et.al. 2403.03532v2 link
2024-03-15 RELEAD: Resilient Localization with Enhanced LiDAR Odometry in Adverse Environments Zhiqiang Chen et.al. 2402.18934v2 null
2024-02-28 PCR-99: A Practical Method for Point Cloud Registration with 99% Outliers Seong Hun Lee et.al. 2402.16598v2 link
2024-02-23 CLIPPER+: A Fast Maximal Clique Algorithm for Robust Global Registration Kaveh Fathian et.al. 2402.15464v1 link
2024-02-11 CLIPPER: Robust Data Association without an Initial Guess Parker C. Lusk et.al. 2402.07284v1 null
2024-02-08 Tightly Coupled Range Inertial Localization on a 3D Prior Map Based on Sliding Window Factor Graph Optimization Kenji Koide et.al. 2402.05540v1 null
2024-01-16 Registration of algebraic varieties using Riemannian optimization Florentin Goyens et.al. 2401.08562v1 link
2024-01-09 Iterative Feedback Network for Unsupervised Point Cloud Registration Yifan Xie et.al. 2401.04357v1 link
2024-01-06 PosDiffNet: Positional Neural Diffusion for Point Cloud Registration in a Large Field of View with Perturbations Rui She et.al. 2401.03167v1 null
2024-01-04 OptFlow: Fast Optimization-based Scene Flow Estimation without Supervision Rahul Ahuja et.al. 2401.02550v1 null
2024-01-17 Diff-PCR: Diffusion-Based Correspondence Searching in Doubly Stochastic Matrix Space for Point Cloud Registration Qianliang Wu et.al. 2401.00436v4 null
2023-12-22 On Partial Optimal Transport: Revising the Infeasibility of Sinkhorn and Efficient Gradient Methods Anh Duc Nguyen et.al. 2312.13970v2 link
2023-12-20 D3Former: Jointly Learning Repeatable Dense Detectors and Feature-enhanced Descriptors via Saliency-guided Transformer Junjie Gao et.al. 2312.12970v1 null
2023-12-14 SPEAL: Skeletal Prior Embedded Attention Learning for Cross-Source Point Cloud Registration Kezheng Xiong et.al. 2312.08664v1 null
2023-12-11 PCRDiffusion: Diffusion Probabilistic Models for Point Cloud Registration Yue Wu et.al. 2312.06063v1 null
2023-12-05 DiffusionPCR: Diffusion Models for Robust Multi-Step Point Cloud Registration Zhi Chen et.al. 2312.03053v1 null
2023-12-08 Zero-Shot Point Cloud Registration Weijie Wang et.al. 2312.03032v2 null
2023-12-05 A Dynamic Network for Efficient Point Cloud Registration Yang Ai et.al. 2312.02877v1 null
2023-12-05 6D Assembly Pose Estimation by Point Cloud Registration for Robot Manipulation K. Samarawickrama et.al. 2312.02593v1 link
2023-12-04 Rotation-Invariant Rapid TRISO-Fueled Pebble Identification Based on Feature Matching and Point Cloud Registration Ming Fang et.al. 2312.02006v1 null
2023-12-27 E2PNet: Event to Point Cloud Registration with Spatio-Temporal Representation Learning Xiuhong Lin et.al. 2311.18433v2 link
2023-11-15 Nothing Stands Still: A Spatiotemporal Benchmark on 3D Point Cloud Registration Under Large Geometric and Temporal Change Tao Sun et.al. 2311.09346v1 null
2023-11-02 Transformation Decoupling Strategy based on Screw Theory for Deterministic Point Cloud Registration with Gravity Prior Xinyi Li et.al. 2311.01432v1 null
2023-11-02 Cross-Modal Information-Guided Network using Contrastive Learning for Point Cloud Registration Yifan Xie et.al. 2311.01202v1 link
2023-10-29 HDMNet: A Hierarchical Matching Network with Double Attention for Large-scale Outdoor LiDAR Point Cloud Registration Weiyi Xue et.al. 2310.18874v1 null
2023-10-27 Do we need scan-matching in radar odometry? Vladimír Kubelka et.al. 2310.18117v1 link
2023-10-26 SE(3) Diffusion Model-based Point Cloud Registration for Robust 6D Object Pose Estimation Haobo Jiang et.al. 2310.17359v1 null
2023-10-18 DBDNet:Partial-to-Partial Point Cloud Registration with Dual Branches Decoupling Shiqi Li et.al. 2310.11733v1 null
2023-10-15 OAAFormer: Robust and Efficient Point Cloud Registration Through Overlapping-Aware Attention in Transformer Junjie Gao et.al. 2310.09817v1 null
2023-10-09 FeatSense -- A Feature-based Registration Algorithm with GPU-accelerated TSDF-Mapping Backend for NVIDIA Jetson Boards Julian Gaal et.al. 2310.05766v1 link
2023-10-09 Colmap-PCD: An Open-source Tool for Fine Image-to-point cloud Registration Chunge Bai et.al. 2310.05504v1 link
2023-10-06 Light-LOAM: A Lightweight LiDAR Odometry and Mapping based on Graph-Matching Shiquan Yi et.al. 2310.04162v1 link
2023-10-05 FreeReg: Image-to-Point Cloud Registration Leveraging Pretrained Diffusion Models and Monocular Depth Estimators Haiping Wang et.al. 2310.03420v1 link
2023-10-02 COIN-LIO: Complementary Intensity-Augmented LiDAR Inertial Odometry Patrick Pfreundschuh et.al. 2310.01235v1 link
2023-09-27 Q-REG: End-to-End Trainable Point Cloud Registration with Surface Curvature Shengze Jin et.al. 2309.16023v1 null
2023-09-27 Partial Transport for Point-Cloud Registration Yikun Bai et.al. 2309.15787v1 null
2023-09-27 KDD-LOAM: Jointly Learned Keypoint Detector and Descriptors Assisted LiDAR Odometry and Mapping Renlang Huang et.al. 2309.15394v1 null
2023-09-26 CoFiI2P: Coarse-to-Fine Correspondences for Image-to-Point Cloud Registration Shuhao Kang et.al. 2309.14660v1 null
2023-09-20 AutoSynth: Learning to Generate 3D Training Data for Object Point Cloud Registration Zheng Dang et.al. 2309.11170v1 null
2023-09-19 LiDAR-Generated Images Derived Keypoints Assisted Point Cloud Registration Scheme in Odometry Estimation Haizhou Zhang et.al. 2309.10436v1 link
2023-09-17 Hamiltonian Dynamics Learning from Point Cloud Observations for Nonholonomic Mobile Robot Control Abdullah Altawaitan et.al. 2309.09163v1 link
2023-09-16 FF-LOGO: Cross-Modality Point Cloud Registration with Feature Filtering and Local to Global Optimization Nan Ma et.al. 2309.08966v1 null
2023-09-16 Outram: One-shot Global Localization via Triangulated Scene Graph and Global Outlier Pruning Pengyu Yin et.al. 2309.08914v1 link
2023-09-15 A Ground Segmentation Method Based on Point Cloud Map for Unstructured Roads Zixuan Li et.al. 2309.08164v1 null
2023-09-15 Fast and Accurate Deep Loop Closing and Relocalization for Reliable LiDAR SLAM Chenghao Shi et.al. 2309.08086v1 null
2023-09-14 EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization Minjung Kim et.al. 2309.07471v1 link
2023-09-12 SGFeat: Salient Geometric Feature for Point Cloud Registration Qianliang Wu et.al. 2309.06207v1 null
2023-09-01 Point-TTA: Test-Time Adaptation for Point Cloud Registration Using Multitask Meta-Auxiliary Learning Ahmed Hatem et.al. 2308.16481v2 null
2023-08-21 In-Rack Test Tube Pose Estimation Using RGB-D Data Hao Chen et.al. 2308.10411v1 null
2023-08-18 DReg-NeRF: Deep Registration for Neural Radiance Fields Yu Chen et.al. 2308.09386v1 link
2023-08-18 Overlap Bias Matching is Necessary for Point Cloud Registration Pengcheng Shi et.al. 2308.09364v1 null
2023-08-10 Deep Semantic Graph Matching for Large-scale Outdoor Point Clouds Registration Shaocong Liu et.al. 2308.05314v1 null
2023-08-09 PointMBF: A Multi-scale Bidirectional Fusion Network for Unsupervised RGB-D Point Cloud Registration Mingzhi Yuan et.al. 2308.04782v1 link
2023-07-25 GeoTransformer: Fast and Robust Point Cloud Registration with Geometric Transformer Zheng Qin et.al. 2308.03768v1 link
2023-07-26 One-Nearest Neighborhood Guides Inlier Estimation for Unsupervised Point Cloud Registration Yongzhe Yuan et.al. 2307.14019v1 null
2023-07-22 Pyramid Semantic Graph-based Global Point Cloud Registration with Low Overlap Zhijian Qiao et.al. 2307.12116v1 link
2023-09-12 ELiOT : End-to-end Lidar Odometry using Transformer Framework Daegyu Lee et.al. 2307.11998v4 null
2023-08-08 Density-invariant Features for Distant Point Cloud Registration Quan Liu et.al. 2307.09788v2 link
2023-07-18 SphereNet: Learning a Noise-Robust and General Descriptor for Point Cloud Registration Guiyu Zhao et.al. 2307.09351v1 null
2023-07-14 CFI2P: Coarse-to-Fine Cross-Modal Correspondence Learning for Image-to-Point Cloud Registration Gongxin Yao et.al. 2307.07142v1 null
2023-07-11 Exact Point Cloud Downsampling for Fast and Accurate Global Trajectory Optimization Kenji Koide et.al. 2307.02948v2 link
2023-07-03 Direct Superpoints Matching for Fast and Robust Point Cloud Registration Aniket Gupta et.al. 2307.01362v1 link
2023-07-04 A denoised Mean Teacher for domain adaptive point cloud registration Alexander Bigalke et.al. 2306.14749v2 link
2023-06-20 End-to-end 2D-3D Registration between Image and LiDAR Point Cloud for Vehicle Localization Guangming Wang et.al. 2306.11346v1 null
2023-06-14 ICET Online Accuracy Characterization for Geometry-Based Laser Scan Matching Matthew McDermott et.al. 2306.08690v1 link
2023-06-12 Volume-DROID: A Real-Time Implementation of Volumetric Mapping with DROID-SLAM Peter Stratton et.al. 2306.06850v1 link
2023-06-11 PWR-Align: Leveraging Part-Whole Relationships for Part-wise Rigid Point Cloud Registration in Mixed Reality Applications Manorama Jha et.al. 2306.06717v1 null
2023-06-07 Robust-DefReg: A Robust Deformable Point Cloud Registration Method based on Graph Convolutional Neural Networks Sara Monji-Azad et.al. 2306.04701v1 null
2023-05-23 Cross-source Point Cloud Registration: Challenges, Progress and Prospects Xiaoshui Huang et.al. 2305.13570v1 null
2023-05-19 Efficient and Deterministic Search Strategy Based on Residual Projections for Point Cloud Registration Xinyi Li et.al. 2305.11716v1 null
2023-05-18 3D Registration with Maximal Cliques Xiyu Zhang et.al. 2305.10854v1 link
2023-05-05 HD2Reg: Hierarchical Descriptors and Detectors for Point Cloud Registration Canhui Tang et.al. 2305.03487v1 link
2023-05-08 APR: Online Distant Point Cloud Registration Through Aggregated Point Cloud Reconstruction Quan Liu et.al. 2305.02893v2 link
2023-04-27 RegHEC: Hand-Eye Calibration via Simultaneous Multi-view Point Clouds Registration of Arbitrary Object Shiyu Xing et.al. 2304.14092v1 link
2023-04-26 Non-rigid Point Cloud Registration for Middle Ear Diagnostics with Endoscopic Optical Coherence Tomography Peng Liu et.al. 2304.13618v1 link
2023-04-25 BO-ICP: Initialization of Iterative Closest Point Based on Bayesian Optimization Harel Biggie et.al. 2304.13114v1 link
2023-04-18 SDFReg: Learning Signed Distance Functions for Point Cloud Registration Leida Zhang et.al. 2304.08929v1 null
2023-04-12 SiLK -- Simple Learned Keypoints Pierre Gleize et.al. 2304.06194v1 link
2023-04-11 TT-SDF2PC: Registration of Point Cloud and Compressed SDF Directly in the Memory-Efficient Tensor Train Domain Alexey I. Boyko et.al. 2304.05342v1 null
2023-04-10 HybridFusion: LiDAR and Vision Cross-Source Point Cloud Fusion Yu Wang et.al. 2304.04508v1 null
2023-04-09 Self-Supervised Learning of Object Segmentation from Unlabeled RGB-D Videos Shiyang Lu et.al. 2304.04325v1 null
2023-04-09 DSMNet: Deep High-precision 3D Surface Modeling from Sparse Point Cloud Frames Changjie Qiu et.al. 2304.04200v1 null
2023-04-02 Robust Multiview Point Cloud Registration with Reliable Pose Graph Initialization and History Reweighting Haiping Wang et.al. 2304.00467v1 link
2023-03-31 kNN-Res: Residual Neural Network with kNN-Graph coherence for point cloud registration Muhammad S. Battikh et.al. 2304.00050v1 link
2023-03-31 RDMNet: Reliable Dense Matching Based Point Cloud Registration for Autonomous Driving Chenghao Shi et.al. 2303.18084v1 null
2023-04-23 HybridPoint: Point Cloud Registration Based on Hybrid Point Sampling and Matching Yiheng Li et.al. 2303.16526v2 link
2023-03-27 Learnable Graph Matching: A Practical Paradigm for Data Association Jiawei He et.al. 2303.15414v1 link
2023-03-23 Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration Guofeng Mei et.al. 2303.13290v1 link
2023-03-22 RegFormer: An Efficient Projection-Aware Transformer Network for Large-Scale Point Cloud Registration Jiuming Liu et.al. 2303.12384v1 link
2023-03-17 Deep Graph-based Spatial Consistency for Robust Non-rigid Point Cloud Registration Zheng Qin et.al. 2303.09950v1 link
2023-03-14 RoCNet: 3D Robust Registration of Point-Clouds using Deep Learning Karim Slimani et.al. 2303.07963v1 null
2023-03-07 GMCR: Graph-based Maximum Consensus Estimation for Point Cloud Registration Michael Gentner et.al. 2303.04032v1 null
2023-03-02 Neural Intrinsic Embedding for Non-rigid Point Cloud Matching Puhua Jiang et.al. 2303.01038v1 null
2023-03-14 A Unified BEV Model for Joint Learning of 3D Local Features and Overlap Estimation Lin Li et.al. 2302.14511v2 link
2023-02-28 PCR-CG: Point Cloud Registration via Deep Color and Geometry Yu Zhang et.al. 2302.14418v1 link
2023-02-28 Efficient Implicit Neural Reconstruction Using LiDAR Dongyu Yan et.al. 2302.14363v1 link
2023-02-25 Accurate Gaussian Process Distance Fields with applications to Echolocation and Mapping Cedric Le Gentil et.al. 2302.13005v1 null
2023-02-14 Point Cloud Registration for LiDAR and Photogrammetric Data: a Critical Synthesis and Performance Analysis on Classic and Deep Learning Algorithms Ningli Xu et.al. 2302.07184v1 link

Point Cloud Segmentation

Publish Date Title Authors PDF Code
2024-12-02 The Bare Necessities: Designing Simple, Effective Open-Vocabulary Scene Graphs Christina Kassab et.al. 2412.01539v1 null
2024-11-30 Density-aware Global-Local Attention Network for Point Cloud Segmentation Chade Li et.al. 2412.00489v1 null
2024-11-28 Textured As-Is BIM via GIS-informed Point Cloud Segmentation Mohamed S. H. Alabassy et.al. 2411.18898v1 null
2024-11-27 Towards Cross-device and Training-free Robotic Grasping in 3D Open World Weiguang Zhao et.al. 2411.18133v1 null
2024-11-20 BelHouse3D: A Benchmark Dataset for Assessing Occlusion Robustness in 3D Point Cloud Semantic Segmentation Umamaheswaran Raman Kumar et.al. 2411.13251v1 null
2024-11-13 Biomass phenotyping of oilseed rape through UAV multi-view oblique imaging with 3DGS and SAM model Yutao Shen et.al. 2411.08453v1 null
2024-11-13 Multiscale Graph Construction Using Non-local Cluster Features Reina Kaneko et.al. 2411.08371v1 null
2024-10-30 Automated Image-Based Identification and Consistent Classification of Fire Patterns with Quantitative Shape Analysis and Spatial Location Identification Pengkun Liu et.al. 2410.23105v1 null
2024-11-03 Multimodality Helps Few-Shot 3D Point Cloud Semantic Segmentation Zhaochong An et.al. 2410.22489v2 null
2024-10-28 Exploring contextual modeling with linear complexity for point cloud segmentation Yong Xien Chng et.al. 2410.21211v1 null
2024-10-14 Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies Yanjie Ze et.al. 2410.10803v1 link
2024-10-09 Evaluating the Impact of Point Cloud Colorization on Semantic Segmentation Accuracy Qinfeng Zhu et.al. 2410.06725v1 null
2024-09-24 Underground Mapping and Localization Based on Ground-Penetrating Radar Jinchang Zhang et.al. 2409.16446v1 null
2024-09-22 Lidar Panoptic Segmentation in an Open World Anirudh S Chakravarthy et.al. 2409.14273v1 link
2024-09-03 When 3D Partial Points Meets SAM: Tooth Point Cloud Segmentation with Sparse Labels Yifan Liu et.al. 2409.01691v1 null
2024-09-03 Efficiently Expanding Receptive Fields: Local Split Attention and Parallel Aggregation for Enhanced Large-scale Point Cloud Semantic Segmentation Haodong Wang et.al. 2409.01662v1 null
2024-08-29 Towards Modality-agnostic Label-efficient Segmentation with Entropy-Regularized Distribution Alignment Liyao Tang et.al. 2408.16520v1 link
2024-08-21 GSTran: Joint Geometric and Semantic Coherence for Point Cloud Segmentation Abiao Li et.al. 2408.11558v1 link
2024-08-02 Trainable Pointwise Decoder Module for Point Cloud Segmentation Bike Chen et.al. 2408.01548v1 null
2024-07-31 Fine-grained Metrics for Point Cloud Semantic Segmentation Zhuheng Lu et.al. 2407.21289v1 null
2024-07-19 Scale Disparity of Instances in Interactive Point Cloud Segmentation Chenrui Han et.al. 2407.14009v1 null
2024-07-18 SegPoint: Segment Any Point Cloud via Large Language Model Shuting He et.al. 2407.13761v1 null
2024-07-17 Dual-level Adaptive Self-Labeling for Novel Class Discovery in Point Cloud Segmentation Ruijie Xu et.al. 2407.12489v1 link
2024-07-17 HGL: Hierarchical Geometry Learning for Test-time Adaptation in 3D Point Cloud Segmentation Tianpei Zou et.al. 2407.12387v1 link
2024-07-17 Serialized Point Mamba: A Serialized Point Cloud Mamba Segmentation Model Tao Wang et.al. 2407.12319v1 null
2024-07-12 Uplifting Range-View-based 3D Semantic Segmentation in Real-Time with Multi-Sensor Fusion Shiqi Tan et.al. 2407.09697v1 null
2024-07-01 fVDB: A Deep-Learning Framework for Sparse, Large-Scale, and High-Performance Spatial Intelligence Francis Williams et.al. 2407.01781v1 null
2024-06-25 Mamba24/8D: Enhancing Global Interaction in Point Clouds via State Space Model Zhuoyuan Li et.al. 2406.17442v1 null
2024-08-04 Twin Deformable Point Convolutions for Point Cloud Semantic Segmentation in Remote Sensing Scenes Yong-Qiang Mao et.al. 2405.19735v2 null
2024-05-24 3D Unsupervised Learning by Distilling 2D Open-Vocabulary Segmentation Models for Autonomous Driving Boyi Sun et.al. 2405.15286v1 link
2024-05-25 Filling Missing Values Matters for Range Image-Based Point Cloud Segmentation Bike Chen et.al. 2405.10175v2 null
2024-04-16 ECLAIR: A High-Fidelity Aerial LiDAR Dataset for Semantic Segmentation Iaroslav Melekhov et.al. 2404.10699v1 link
2024-04-04 OpenNeRF: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views Francis Engelmann et.al. 2404.03650v1 null
2024-03-28 RiEMann: Near Real-Time SE(3)-Equivariant Robot Manipulation without Point Cloud Segmentation Chongkai Gao et.al. 2403.19460v1 null
2024-05-30 CurbNet: Curb Detection Framework Based on LiDAR Point Cloud Segmentation Guoyang Zhao et.al. 2403.16794v2 link
2024-03-18 EffiPerception: an Efficient Framework for Various Perception Tasks Xinhao Xiang et.al. 2403.12317v1 null
2024-03-11 3DRef: 3D Dataset and Benchmark for Reflection Detection in RGB and Lidar Data Xiting Zhao et.al. 2403.06538v1 null
2024-03-11 Refining Segmentation On-the-Fly: An Interactive Framework for Point Cloud Semantic Segmentation Peng Zhang et.al. 2403.06401v1 null
2024-03-03 Region-Transformer: Self-Attention Region Based Class-Agnostic Point Cloud Segmentation Dipesh Gyawali et.al. 2403.01407v1 null
2024-01-29 Dynamic Prototype Adaptation with Distillation for Few-shot Point Cloud Segmentation Jie Liu et.al. 2401.16051v1 link
2024-01-19 Symbol as Points: Panoptic Symbol Spotting via Point-based Representation Wenlong Liu et.al. 2401.10556v1 link
2023-12-29 Multi-modality Affinity Inference for Weakly Supervised 3D Semantic Segmentation Xiawei Li et.al. 2312.16578v2 link
2023-12-19 Point Cloud Segmentation Using Transfer Learning with RandLA-Net: A Case Study on Urban Areas Alperen Enes Bayar et.al. 2312.11880v1 null
2023-12-15 T-MAE: Temporal Masked Autoencoders for Point Cloud Representation Learning Weijie Wei et.al. 2312.10217v1 link
2023-12-14 FAPP: Fast and Adaptive Perception and Planning for UAVs in Dynamic Cluttered Environments Minghao Lu et.al. 2312.08743v1 null
2023-12-12 Transferring CLIP's Knowledge into Zero-Shot Point Cloud Semantic Segmentation Yuanbin Wang et.al. 2312.07221v1 null
2023-12-11 Densify Your Labels: Unsupervised Clustering with Bipartite Matching for Weakly Supervised Point Cloud Segmentation Shaobo Xia et.al. 2312.06799v1 null
2024-01-15 Provable Adversarial Robustness for Group Equivariant Tasks: Graphs, Point Clouds, Molecules, and More Jan Schuchardt et.al. 2312.02708v2 null
2023-11-24 OneFormer3D: One Transformer for Unified Point Cloud Segmentation Maxim Kolodiazhnyi et.al. 2311.14405v1 null
2023-11-18 DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields Yu Chi et.al. 2311.12063v1 link
2023-11-10 U3DS$^3$: Unsupervised 3D Semantic Scene Segmentation Jiaxu Liu et.al. 2311.06018v1 null
2023-11-06 Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation Shichao Dong et.al. 2311.01989v2 null
2023-10-19 2D-3D Interlaced Transformer for Point Cloud Segmentation with Scene-Level Supervision Cheng-Kun Yang et.al. 2310.12817v1 null
2023-10-11 PointHR: Exploring High-Resolution Architectures for 3D Point Cloud Segmentation Haibo Qiu et.al. 2310.07743v1 link
2023-09-26 Addressing Data Misalignment in Image-LiDAR Fusion on Point Cloud Segmentation Wei Jong Yang et.al. 2309.14932v1 null
2023-09-20 Towards Robust Few-shot Point Cloud Semantic Segmentation Yating Xu et.al. 2309.11228v1 link
2023-09-20 Generalized Few-Shot Point Cloud Segmentation Via Geometric Words Yating Xu et.al. 2309.11222v1 link
2023-08-29 Compositional Semantic Mix for Domain Adaptation in Point Cloud Segmentation Cristiano Saltori et.al. 2308.14619v2 link
2023-08-22 Hierarchical Point-based Active Learning for Semi-supervised Point Cloud Semantic Segmentation Zongyi Xu et.al. 2308.11166v1 link
2023-08-14 Autonomous Point Cloud Segmentation for Power Lines Inspection in Smart Grid Alexander Kyuroson et.al. 2308.07283v1 null
2023-08-08 Boosting Few-shot 3D Point Cloud Segmentation via Query-Guided Enhancement Zhenhua Ning et.al. 2308.03177v2 link
2023-07-31 pCTFusion: Point Convolution-Transformer Fusion with Semantic Aware Loss for Outdoor LiDAR Point Cloud Segmentation Abhishek Kuriyal et.al. 2307.14777v2 link
2023-07-27 Clustering based Point Cloud Representation Learning for 3D Analysis Tuo Feng et.al. 2307.14605v1 link
2023-07-20 See More and Know More: Zero-shot Point Cloud Segmentation via Multi-modal Visual Data Yuhang Lu et.al. 2307.10782v1 null
2023-07-14 Achelous: A Fast Unified Water-surface Panoptic Perception Framework based on Fusion of Monocular Camera and 4D mmWave Radar Runwei Guan et.al. 2307.07102v1 link
2023-07-08 BPNet: Bézier Primitive Segmentation on 3D Point Clouds Rao Fu et.al. 2307.04013v1 link
2023-06-28 Point2Point : A Framework for Efficient Deep Learning on Hilbert sorted Point Clouds with applications in Spatio-Temporal Occupancy Prediction Athrva Atul Pandhare et.al. 2306.16306v1 null
2023-05-30 Dynamic Clustering Transformer Network for Point Cloud Segmentation Dening Lu et.al. 2306.08073v1 null
2023-05-23 Prototype Adaption and Projection for Few- and Zero-shot 3D Point Cloud Semantic Segmentation Shuting He et.al. 2305.14335v1 link
2023-05-22 Contrastive Predictive Autoencoders for Dynamic Point Cloud Self-Supervised Learning Xiaoxiao Sheng et.al. 2305.12959v1 null
2023-05-17 Tinto: Multisensor Benchmark for 3D Hyperspectral Point Cloud Segmentation in the Geosciences Ahmed J. Afifi et.al. 2305.09928v1 null
2023-05-08 OctFormer: Octree-based Transformers for 3D Point Clouds Peng-Shuai Wang et.al. 2305.03045v2 link
2023-05-22 Urban GeoBIM construction by integrating semantic LiDAR point clouds with as-designed BIM models Jie Shao et.al. 2304.11719v2 null
2023-04-22 Knowledge Distillation from 3D to Bird's-Eye-View for LiDAR Semantic Segmentation Feng Jiang et.al. 2304.11393v1 link
2023-06-02 Transformer-Based Visual Segmentation: A Survey Xiangtai Li et.al. 2304.09854v2 link
2023-04-11 Feature-assisted interactive geometry reconstruction in 3D point clouds using incremental region growing Attila Szabo et.al. 2304.05109v1 null

Zero-shot

Publish Date Title Authors PDF Code
2025-01-03 GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models Zhangyang Qi et.al. 2501.01428v2 null
2025-01-02 VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control Yuanpeng Tu et.al. 2501.01427v1 null
2025-01-02 Unifying Specialized Visual Encoders for Video Language Models Jihoon Chung et.al. 2501.01426v1 null
2025-01-03 AdaptVC: High Quality Voice Conversion with Adaptive Learning Jaehun Kim et.al. 2501.01347v2 null
2025-01-02 Digital Guardians: Can GPT-4, Perspective API, and Moderation API reliably detect hate speech in reader comments of German online newspapers? Manuel Weber et.al. 2501.01256v1 null
2025-01-02 Automated Self-Refinement and Self-Correction for LLM-based Product Attribute Value Extraction Alexander Brinkmann et.al. 2501.01237v1 null
2025-01-02 Symmetries-enhanced Multi-Agent Reinforcement Learning Nikolaos Bousias et.al. 2501.01136v1 null
2025-01-03 MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization Haina Zhu et.al. 2501.01108v2 null
2025-01-02 Are LLMs effective psychological assessors? Leveraging adaptive RAG for interpretable mental health screening through psychometric practice Federico Ravenda et.al. 2501.00982v1 null
2025-01-01 Text2Earth: Unlocking Text-driven Remote Sensing Image Generation with a Global-Scale Dataset and a Foundation Model Chenyang Liu et.al. 2501.00895v1 null
2024-12-30 QuantumLLMInstruct: A 500k LLM Instruction-Tuning Dataset with Problem-Solution Pairs for Quantum Computing Shlomo Kashani et.al. 2412.20956v1 null
2024-12-30 Navigating Chemical-Linguistic Sharing Space with Heterogeneous Molecular Encoding Liuzhenghao Lv et.al. 2412.20888v1 link
2024-12-30 TimeRAF: Retrieval-Augmented Foundation model for Zero-shot Time Series Forecasting Huanyu Zhang et.al. 2412.20810v1 null
2024-12-30 Learning to Rank Pre-trained Vision-Language Models for Downstream Tasks Yuhe Ding et.al. 2412.20682v1 null
2024-12-29 Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond) Tomer Garber et.al. 2412.20596v1 null
2024-12-27 Zero-shot Hazard Identification in Autonomous Driving: A Case Study on the COOOL Benchmark Lukas Picek et.al. 2412.19944v1 null
2024-12-27 EEG-Reptile: An Automatized Reptile-Based Meta-Learning Library for BCIs Daniil A. Berdyshev et.al. 2412.19725v1 link
2024-12-30 VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models Tao Wu et.al. 2412.19645v2 null
2024-12-27 MINIMA: Modality Invariant Image Matching Xingyu Jiang et.al. 2412.19412v1 link
2024-12-26 Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment Ziang Yan et.al. 2412.19326v1 link
2024-12-26 RecLM: Recommendation Instruction Tuning Yangqin Jiang et.al. 2412.19302v1 null
2024-12-26 Time Series Foundational Models: Their Role in Anomaly Detection and Prediction Chathurangi Shyalika et.al. 2412.19286v1 link
2024-12-26 Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval Yang Du et.al. 2412.19178v1 link
2024-12-26 CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting Siyu Jiao et.al. 2412.19142v1 null
2024-12-26 Semantic Residual for Multimodal Unified Discrete Representation Hai Huang et.al. 2412.19128v1 null
2024-12-26 Advanced Knowledge Transfer: Refined Feature Distillation for Zero-Shot Quantization in Edge Computing Inpyo Hong et.al. 2412.19125v1 link
2024-12-24 Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models Zehan Wang et.al. 2412.18605v1 null
2024-12-24 ZeroHSI: Zero-Shot 4D Human-Scene Interaction by Video Generation Hongjie Li et.al. 2412.18600v1 null
2024-12-24 Distilling Fine-grained Sentiment Understanding from Large Language Models Yice Zhang et.al. 2412.18552v1 link
2024-12-24 The Key of Understanding Vision Tasks: Explanatory Instructions Yang Shen et.al. 2412.18525v1 link
2024-12-24 Multilingual Mathematical Reasoning: Advancing Open-Source LLMs in Hindi and English Avinash Anand et.al. 2412.18415v1 link
2024-12-24 Extract Free Dense Misalignment from CLIP JeongYeon Nam et.al. 2412.18404v1 null
2024-12-24 A Zero-Shot Physics-Informed Dictionary Learning Approach for Sound Field Reconstruction Stefano Damiano et.al. 2412.18348v1 link
2024-12-24 Efficient and Context-Aware Label Propagation for Zero-/Few-Shot Training-Free Adaptation of Vision-Language Model Yushu Li et.al. 2412.18303v1 null
2024-12-24 Quo Vadis, Anomaly Detection? LLMs and VLMs in the Spotlight Xi Ding et.al. 2412.18298v1 link
2024-12-24 Improved Feature Generating Framework for Transductive Zero-shot Learning Zihan Ye et.al. 2412.18282v1 null
2024-12-23 CiteBART: Learning to Generate Citations for Local Citation Recommendation Ege Yiğit Çelik et.al. 2412.17534v1 link
2024-12-23 Multiple Consistency-guided Test-Time Adaptation for Contrastive Audio-Language Models with Unlabeled Audio Gongyu Chen et.al. 2412.17306v1 null
2024-12-23 Discriminative Image Generation with Diffusion Models for Zero-Shot Learning Dingjie Fu et.al. 2412.17219v1 null
2024-12-22 Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis Ye-Xin Lu et.al. 2412.16977v1 null
2024-12-22 Self-Corrected Flow Distillation for Consistent One-Step and Few-Step Text-to-Image Generation Quan Dao et.al. 2412.16906v1 null
2024-12-22 Autoregressive Speech Synthesis with Next-Distribution Prediction Xinfa Zhu et.al. 2412.16846v1 null
2024-12-21 RoomPainter: View-Integrated Diffusion for Consistent Indoor Scene Texturing Zhipeng Huang et.al. 2412.16778v1 null
2024-12-21 HyperCLIP: Adapting Vision-Language models with Hypernetworks Victor Akinwande et.al. 2412.16777v1 null
2024-12-21 Large Language Model Can Be a Foundation for Hidden Rationale-Based Retrieval Luo Ji et.al. 2412.16615v1 null
2024-12-21 Open-Vocabulary Mobile Manipulation Based on Double Relaxed Contrastive Learning with Dense Labeling Daichi Yashima et.al. 2412.16576v1 link
2024-12-20 Deciphering the Underserved: Benchmarking LLM OCR for Low-Resource Scripts Muhammad Abdullah Sohail et.al. 2412.16119v1 link
2024-12-20 CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up Songhua Liu et.al. 2412.16112v1 link
2024-12-20 Interleaved Speech-Text Language Models are Simple Streaming Text to Speech Synthesizers Yifan Yang et.al. 2412.16102v1 null
2024-12-20 Fearful Falcons and Angry Llamas: Emotion Category Annotations of Arguments by Humans and LLMs Lynn Greschner et.al. 2412.15993v1 null
2024-12-20 Watertox: The Art of Simplicity in Universal Attacks A Cross-Model Framework for Robust Adversarial Generation Zhenghao Gao et.al. 2412.15924v1 null
2024-12-20 On the Suitability of pre-trained foundational LLMs for Analysis in German Legal Education Lorenz Wendlinger et.al. 2412.15902v1 null
2024-12-20 AutoLife: Automatic Life Journaling with Smartphones and LLMs Huatao Xu et.al. 2412.15714v1 null
2024-12-20 Cracking the Code: Evaluating Zero-Shot Prompting Methods for Providing Programming Feedback Niklas Ippisch et.al. 2412.15702v1 null
2024-12-20 SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training Wenxi Chen et.al. 2412.15649v1 null
2024-12-20 A New Method to Capturing Compositional Knowledge in Linguistic Space Jiahe Wan et.al. 2412.15632v1 null
2024-12-19 Face the Facts! Evaluating RAG-based Fact-checking Pipelines in Realistic Settings Daniel Russo et.al. 2412.15189v1 link
2024-12-19 STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning Marius Memmel et.al. 2412.15182v1 null
2024-12-19 Adaptive Pruning for Large Language Models with Structural Importance Awareness Haotian Zheng et.al. 2412.15127v1 null
2024-12-19 Scale This, Not That: Investigating Key Dataset Attributes for Efficient Speech Enhancement Scaling Leying Zhang et.al. 2412.14890v1 null
2024-12-19 Zero-Shot Artifact2Artifact: Self-incentive artifact removal for photoacoustic imaging without any data Shuang Li et.al. 2412.14873v1 link
2024-12-19 Extending TWIG: Zero-Shot Predictive Hyperparameter Selection for KGEs based on Graph Structure Jeffrey Sardina et.al. 2412.14801v1 null
2024-12-19 Beyond Guilt: Legal Judgment Prediction with Trichotomous Reasoning Kepu Zhang et.al. 2412.14588v1 null
2024-12-19 MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval Junjie Zhou et.al. 2412.14475v1 null
2024-12-19 WildSAT: Learning Satellite Image Representations from Wildlife Observations Rangel Daroya et.al. 2412.14428v1 null
2024-12-18 I0T: Embedding Standardization Method Towards Zero Modality Gap Na Min An et.al. 2412.14384v1 link
2024-12-18 Autoregressive Video Generation without Vector Quantization Haoge Deng et.al. 2412.14169v1 link
2024-12-18 Incorporating Feature Pyramid Tokenization and Open Vocabulary Semantic Segmentation Jianyu Zhang et.al. 2412.14145v1 null
2024-12-18 Foundation Models Meet Low-Cost Sensors: Test-Time Adaptation for Rescaling Disparity for Zero-Shot Metric Depth Estimation Rémi Marsal et.al. 2412.14103v1 null
2024-12-18 FarExStance: Explainable Stance Detection for Farsi Majid Zarharan et.al. 2412.14008v1 link
2024-12-18 Real Classification by Description: Extending CLIP's Limits of Part Attributes Recognition Ethan Baron et.al. 2412.13947v1 null
2024-12-18 Memorizing SAM: 3D Medical Segment Anything Model with Memorizing Transformer Xinyuan Shao et.al. 2412.13908v1 link
2024-12-18 Zero-Shot Prompting and Few-Shot Fine-Tuning: Revisiting Document Image Classification Using Large Language Models Anna Scius-Bertrand et.al. 2412.13859v1 null
2024-12-18 SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor Chenyu Yang et.al. 2412.13786v1 null
2024-12-18 G-VEval: A Versatile Metric for Evaluating Image and Video Captions Using GPT-4o Tony Cheng Tong et.al. 2412.13647v1 link
2024-12-18 Reverse Region-to-Entity Annotation for Pixel-Level Visual Entity Linking Zhengfei Xu et.al. 2412.13614v1 null
2024-12-17 GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding Haoyi Jiang et.al. 2412.13193v1 link
2024-12-17 A Knowledge-enhanced Pathology Vision-language Foundation Model for Cancer Diagnosis Xiao Zhou et.al. 2412.13126v1 null
2024-12-17 Enabling Low-Resource Language Retrieval: Establishing Baselines for Urdu MS MARCO Umer Butt et.al. 2412.12997v1 null
2024-12-17 An Agentic Approach to Automatic Creation of P&ID Diagrams from Natural Language Descriptions Shreeyash Gowaikar et.al. 2412.12898v1 null
2024-12-17 Question: How do Large Language Models perform on the Question Answering tasks? Answer: Kevin Fischer et.al. 2412.12893v1 null
2024-12-17 MIVE: New Design and Benchmark for Multi-Instance Video Editing Samuel Teodoro et.al. 2412.12877v1 null
2024-12-17 Comparative Analysis of Zero-Shot Capability of Time-Series Foundation Models in Short-Term Load Prediction Nan Lin et.al. 2412.12834v1 null
2024-12-17 FocusChat: Text-guided Long Video Understanding via Spatiotemporal Information Filtering Zheng Cheng et.al. 2412.12833v1 null
2024-12-17 Cross-Dialect Information Retrieval: Information Access in Low-Resource and High-Variance Languages Robert Litschko et.al. 2412.12806v1 null
2024-12-17 ZoRI: Towards Discriminative Zero-Shot Remote Sensing Instance Segmentation Shiqi Huang et.al. 2412.12798v1 link
2024-12-16 Causal Diffusion Transformers for Generative Modeling Chaorui Deng et.al. 2412.12095v1 link
2024-12-16 CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology Yuxuan Sun et.al. 2412.12077v1 null
2024-12-16 A LoRA is Worth a Thousand Pictures Chenxi Liu et.al. 2412.12048v1 null
2024-12-16 Learning to Navigate in Mazes with Novel Layouts using Abstract Top-down Maps Linfeng Zhao et.al. 2412.12024v1 null
2024-12-16 Cost-Effective Label-free Node Classification with LLMs Taiyan Zhang et.al. 2412.11983v1 null
2024-12-16 Advancing Comprehensive Aesthetic Insight with Multi-Scale Text-Guided Self-Supervised Learning Yuti Liu et.al. 2412.11952v1 null
2024-12-16 Stepwise Reasoning Error Disruption Attack of LLMs Jingyu Peng et.al. 2412.11934v1 null
2024-12-16 PICLe: Pseudo-Annotations for In-Context Learning in Low-Resource Named Entity Detection Sepideh Mamooler et.al. 2412.11923v1 null
2024-12-16 Improved Models for Media Bias Detection and Subcategorization Tim Menzner et.al. 2412.11835v1 null
2024-12-16 A Distributed Collaborative Retrieval Framework Excelling in All Queries and Corpora based on Zero-shot Rank-Oriented Automatic Evaluation Tian-Yi Che et.al. 2412.11832v1 null
2024-12-13 UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities Muhammad Uzair Khattak et.al. 2412.10372v1 link
2024-12-13 Reasoner Outperforms: Generative Stance Detection with Rationalization for Social Media Jiaqing Yuan et.al. 2412.10266v1 null
2024-12-13 Efficient Generative Modeling with Residual Vector Quantization-Based Tokens Jaehyeon Kim et.al. 2412.10208v1 null
2024-12-13 Constraint-Aware Zero-Shot Vision-Language Navigation in Continuous Environments Kehan Chen et.al. 2412.10137v1 null
2024-12-13 Familiarity: Better Evaluation of Zero-Shot Named Entity Recognition by Quantifying Label Shifts in Synthetic Training Data Jonas Golde et.al. 2412.10121v1 null
2024-12-13 Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP Yating Yu et.al. 2412.09895v1 link
2024-12-13 CP-DETR: Concept Prompt Guide DETR Toward Stronger Universal Object Detection Qibo Chen et.al. 2412.09799v1 null
2024-12-12 Toward Foundation Model for Multivariate Wearable Sensing of Physiological Signals Yunfei Luo et.al. 2412.09758v1 link
2024-12-12 Should We Learn Contact-Rich Manipulation Policies from Sampling-Based Planners? Huaijiang Zhu et.al. 2412.09743v1 null
2024-12-12 TransferLight: Zero-Shot Traffic Signal Control on any Road-Network Johann Schmidt et.al. 2412.09719v1 null
2024-12-12 EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM Zhuofan Zong et.al. 2412.09618v1 null
2024-12-12 Learning to Adapt: Bio-Inspired Gait Strategies for Versatile Quadruped Locomotion Joseph Humphreys et.al. 2412.09440v1 null
2024-12-12 Distribution free uncertainty quantification in neuroscience-inspired deep operators Shailesh Garg et.al. 2412.09369v1 null
2024-12-12 Towards Open-Vocabulary Video Semantic Segmentation Xinhao Li et.al. 2412.09329v1 link
2024-12-12 T-SVG: Text-Driven Stereoscopic Video Generation Qiao Jin et.al. 2412.09323v1 null
2024-12-12 Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine Xiaoshuang Huang et.al. 2412.09278v1 link
2024-12-12 Pinpoint Counterfactuals: Reducing social bias in foundation models via localized counterfactual generation Kirill Sirotkin et.al. 2412.09160v1 null
2024-12-12 Evaluating Pixel Language Models on Non-Standardized Languages Alberto Muñoz-Ortiz et.al. 2412.09084v1 null
2024-12-12 Cross-View Completion Models are Zero-shot Correspondence Estimators Honggyu An et.al. 2412.09072v1 null
2024-12-13 An Efficient Framework for Enhancing Discriminative Models via Diffusion Techniques Chunxiao Li et.al. 2412.09063v2 null
2024-12-11 RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation Mingfei Han et.al. 2412.08591v1 null
2024-12-11 SenCLIP: Enhancing zero-shot land-use mapping for Sentinel-2 with ground-level prompting Pallavi Jain et.al. 2412.08536v1 link
2024-12-11 SAM-Mamba: Mamba Guided SAM Architecture for Generalized Zero-Shot Polyp Segmentation Tapas Kumar Dutta et.al. 2412.08482v1 null
2024-12-11 Assessing Personalized AI Mentoring with Large Language Models in the Computing Field Xiao Luo et.al. 2412.08430v1 null
2024-12-11 Zero-Shot Mono-to-Binaural Speech Synthesis Alon Levkovitch et.al. 2412.08356v1 null
2024-12-11 BEIR-NL: Zero-shot Information Retrieval Benchmark for the Dutch Language Nikolay Banar et.al. 2412.08329v1 null
2024-12-11 Lightweight Method for Interactive 3D Medical Image Segmentation with Multi-Round Result Fusion Bingzhi Shen et.al. 2412.08315v1 null
2024-12-11 2M-BELEBELE: Highly Multilingual Speech and American Sign Language Comprehension Dataset Marta R. Costa-jussà et.al. 2412.08274v1 null
2024-12-11 Large Language Models for Scholarly Ontology Generation: An Extensive Analysis in the Engineering Field Tanay Aggarwal et.al. 2412.08258v1 link
2024-12-11 Can Graph Neural Networks Learn Language with Extremely Weak Text Supervision? Zihao Li et.al. 2412.08174v1 null
2024-12-10 Video Motion Transfer with Diffusion Transformers Alexander Pondaven et.al. 2412.07776v1 link
2024-12-10 From Slow Bidirectional to Fast Causal Video Generators Tianwei Yin et.al. 2412.07772v1 null
2024-12-11 Test-time Correction with Human Feedback: An Online 3D Detection System via Visual Prompting Zetong Yang et.al. 2412.07768v2 null
2024-12-10 SAT: Spatial Aptitude Training for Multimodal Language Models Arijit Ray et.al. 2412.07755v1 null
2024-12-10 Zero-Shot ATC Coding with Large Language Models for Clinical Assessments Zijian Chen et.al. 2412.07743v1 null
2024-12-10 DriveMM: All-in-One Large Multimodal Model for Autonomous Driving Zhijian Huang et.al. 2412.07689v1 link
2024-12-10 Privacy-Preserving Customer Support: A Framework for Secure and Scalable Interactions Anant Prakash Awasthi et.al. 2412.07687v1 null
2024-12-10 FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing Yingying Deng et.al. 2412.07517v1 link
2024-12-10 ConfigX: Modular Configuration for Evolutionary Algorithms via Multitask Reinforcement Learning Hongshu Guo et.al. 2412.07507v1 null
2024-12-10 Bilingual BSARD: Extending Statutory Article Retrieval to Dutch Ehsan Lotfi et.al. 2412.07462v1 null
2024-12-09 Visual Lexicon: Rich Image Features in Language Space XuDong Wang et.al. 2412.06774v1 null
2024-12-09 JAPAGEN: Efficient Few/Zero-shot Learning via Japanese Training Dataset Generation with LLM Takuro Fujii et.al. 2412.06738v1 link
2024-12-09 You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale Baorui Ma et.al. 2412.06699v1 link
2024-12-09 Knowledge Transfer and Domain Adaptation for Fine-Grained Remote Sensing Image Segmentation Shun Zhang et.al. 2412.06664v1 null
2024-12-09 LLM-BIP: Structured Pruning for Large Language Models with Block-Wise Forward Importance Propagation Haihang Wu et.al. 2412.06419v1 null
2024-12-09 Continual Learning for Segment Anything Model Adaptation Jinglong Yang et.al. 2412.06418v1 link
2024-12-09 ZeroKey: Point-Level Reasoning and Zero-Shot 3D Keypoint Detection from Large Language Models Bingchen Gong et.al. 2412.06292v1 null
2024-12-09 No Annotations for Object Detection in Art through Stable Diffusion Patrick Ramos et.al. 2412.06286v1 link
2024-12-09 DenseVLM: A Retrieval and Decoupled Alignment Framework for Open-Vocabulary Dense Prediction Yunheng Li et.al. 2412.06244v1 null
2024-12-09 Evaluating and Mitigating Social Bias for Large Language Models in Open-ended Settings Zhao Liu et.al. 2412.06134v1 null
2024-12-06 DenseMatcher: Learning 3D Semantic Correspondence for Category-Level Manipulation from a Single Demo Junzhe Zhu et.al. 2412.05268v1 null
2024-12-06 Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization Luca Masserano et.al. 2412.05244v1 null
2024-12-06 Towards Understanding the Role of Sharpness-Aware Minimization Algorithms for Out-of-Distribution Generalization Samuel Schapiro et.al. 2412.05169v1 null
2024-12-06 A Practical Examination of AI-Generated Text Detectors for Large Language Models Brian Tufts et.al. 2412.05139v1 null
2024-12-06 Can Large Language Models Serve as Effective Classifiers for Hierarchical Multi-Label Classification of Scientific Documents at Industrial Scale? Seyed Amin Tabatabaei et.al. 2412.05137v1 null
2024-12-06 The Silent Prompt: Initial Noise as Implicit Guidance for Goal-Driven Image Generation Ruoyu Wang et.al. 2412.05101v1 null
2024-12-06 HOLa: HoloLens Object Labeling Michael Schwimmbeck et.al. 2412.04945v1 link
2024-12-06 $S^3$: Synonymous Semantic Space for Improving Zero-Shot Generalization of Vision-Language Models Xiaojie Yin et.al. 2412.04925v1 null
2024-12-06 StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching Jixun Yao et.al. 2412.04724v1 null
2024-12-06 LLM-Align: Utilizing Large Language Models for Entity Alignment in Knowledge Graphs Xuan Chen et.al. 2412.04690v1 null
2024-12-05 Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail Luca Bartolomei et.al. 2412.04472v1 link
2024-12-05 Grounding Descriptions in Images informs Zero-Shot Visual Recognition Shaunak Halbe et.al. 2412.04429v1 link
2024-12-05 SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding Rong Li et.al. 2412.04383v1 null
2024-12-05 Finer Behavioral Foundation Models via Auto-Regressive Features and Advantage Weighting Edoardo Cetin et.al. 2412.04368v1 null
2024-12-05 Towards Zero-shot 3D Anomaly Localization Yizhou Wang et.al. 2412.04304v1 null
2024-12-05 3D Part Segmentation via Geometric Aggregation of 2D Visual Features Marco Garosi et.al. 2412.04247v1 null
2024-12-05 Quantifying the Limits of Segment Anything Model: Analyzing Challenges in Segmenting Tree-Like and Low-Contrast Structures Yixin Zhang et.al. 2412.04243v1 link
2024-12-05 Hipandas: Hyperspectral Image Joint Denoising and Super-Resolution by Image Fusion with the Panchromatic Image Shuang Xu et.al. 2412.04201v1 null
2024-12-05 Unified Framework for Open-World Compositional Zero-shot Learning Hirunima Jayasekara et.al. 2412.04083v1 link
2024-12-05 Towards Generalizable Autonomous Penetration Testing via Domain Randomization and Meta-Reinforcement Learning Shicheng Zhou et.al. 2412.04078v1 link
2024-12-04 The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control Ruili Feng et.al. 2412.03568v1 null
2024-12-04 FLAIR: VLM with Fine-grained Language-informed Image Representations Rui Xiao et.al. 2412.03561v1 link
2024-12-04 Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression Junjie Wen et.al. 2412.03293v1 null
2024-12-04 Expanding Event Modality Applications through a Robust CLIP-Based Encoder Sungheon Jeong et.al. 2412.03093v1 null
2024-12-04 ASR-EC Benchmark: Evaluating Large Language Models on Chinese ASR Error Correction Victor Junqiu Wei et.al. 2412.03075v1 null
2024-12-04 UTSD: Unified Time Series Diffusion Model Xiangkai Ma et.al. 2412.03068v1 null
2024-12-03 A Novel Compact LLM Framework for Local, High-Privacy EHR Data Applications Yixiang Qu et.al. 2412.02868v1 null
2024-12-03 Is Large-Scale Pretraining the Secret to Good Domain Generalization? Piotr Teterwak et.al. 2412.02856v1 null
2024-12-03 Enhancing Robustness of CLIP to Common Corruptions through Bimodal Test-Time Adaptation Sarthak Kumar Maharana et.al. 2412.02837v1 null
2024-12-03 Gaussian Splatting Under Attack: Investigating Adversarial Noise in 3D Objects Abdurrahman Zeybey et.al. 2412.02803v1 null
2024-12-03 FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation Kefan Chen et.al. 2412.02690v1 null
2024-12-03 Multimodal Remote Sensing Scene Classification Using VLMs and Dual-Cross Attention Networks Jinjin Cai et.al. 2412.02531v1 null
2024-12-03 LoRA Diffusion: Zero-Shot LoRA Synthesis for Diffusion Model Personalization Ethan Smith et.al. 2412.02352v1 null
2024-12-03 Improving Language Transfer Capability of Decoder-only Architecture in Multilingual Neural Machine Translation Zhi Qu et.al. 2412.02101v1 link
2024-12-03 Gaussian Object Carver: Object-Compositional Gaussian Splatting with surfaces completion Liu Liu et.al. 2412.02075v1 link
2024-12-02 PKRD-CoT: A Unified Chain-of-thought Prompting for Multi-Modal Large Language Models in Autonomous Driving Xuewen Luo et.al. 2412.02025v1 null
2024-12-04 The use of large language models to enhance cancer clinical trial educational materials Mingye Gao et.al. 2412.01955v2 null
2024-12-02 RandAR: Decoder-only Autoregressive Visual Generation in Random Orders Ziqi Pang et.al. 2412.01827v1 null
2024-12-02 COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training Sanghwan Kim et.al. 2412.01814v1 link
2024-12-02 Hard Constraint Guided Flow Matching for Gradient-Free Generation of PDE Solutions Chaoran Cheng et.al. 2412.01786v1 null
2024-12-02 T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs Shukang Yin et.al. 2411.19951v2 link
2024-11-29 Reverse Thinking Makes LLMs Stronger Reasoners Justin Chih-Yao Chen et.al. 2411.19865v1 null
2024-11-29 Zero-shot Musical Stem Retrieval with Joint-Embedding Predictive Architectures Alain Riou et.al. 2411.19806v1 null
2024-11-29 Dual Risk Minimization: Towards Next-Level Robustness in Fine-tuning Zero-Shot Models Kaican Li et.al. 2411.19757v1 link
2024-11-29 Multimodal Whole Slide Foundation Model for Pathology Tong Ding et.al. 2411.19666v1 link
2024-11-29 LLM Teacher-Student Framework for Text Classification With No Manually Annotated Data: A Case Study in IPTC News Topic Classification Taja Kuzman et.al. 2411.19638v1 link
2024-11-29 Diorama: Unleashing Zero-shot Single-view 3D Scene Modeling Qirui Wu et.al. 2411.19492v1 null
2024-11-29 Proto Successor Measure: Representing the Space of All Possible Solutions of Reinforcement Learning Siddhant Agarwal et.al. 2411.19418v1 null
2024-11-28 CLIP meets DINO for Tuning Zero-Shot Classifier using Unlabeled Image Collections Mohamed Fazli Imam et.al. 2411.19346v1 link
2024-11-28 OMNI-DC: Highly Robust Depth Completion with Multiresolution Depth Integration Yiming Zuo et.al. 2411.19278v1 link
2024-11-27 Diffusion Self-Distillation for Zero-Shot Customized Image Generation Shengqu Cai et.al. 2411.18616v1 null
2024-11-27 Isolating authorship from content with semantic embeddings and contrastive learning Javier Huertas-Tato et.al. 2411.18472v1 null
2024-11-27 SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation Duc-Hai Pham et.al. 2411.18229v1 null
2024-11-27 DRS: Deep Question Reformulation With Structured Output Zhecheng Li et.al. 2411.17993v1 link
2024-11-26 Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient Zigeng Chen et.al. 2411.17787v1 link
2024-11-26 MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation Harsh Singh et.al. 2411.17636v1 null
2024-11-26 ShowUI: One Vision-Language-Action Model for GUI Visual Agent Kevin Qinghong Lin et.al. 2411.17465v1 link
2024-11-26 FLEX-CLIP: Feature-Level GEneration Network Enhanced CLIP for X-shot Cross-modal Retrieval Jingyou Xie et.al. 2411.17454v1 null
2024-11-26 PEFTGuard: Detecting Backdoor Attacks Against Parameter-Efficient Fine-Tuning Zhen Sun et.al. 2411.17453v1 null
2024-11-26 CoA: Chain-of-Action for Generative Semantic Labels Meng Wei et.al. 2411.17406v1 link
2024-11-26 vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation Bastian Wittmann et.al. 2411.17386v1 null
2024-11-26 2D Matryoshka Training for Information Retrieval Shuai Wang et.al. 2411.17299v1 link
2024-11-26 APT: Architectural Planning and Text-to-Blueprint Construction Using Large Language Models for Open-World Agents Jun Yu Chen et.al. 2411.17255v1 link
2024-11-26 Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors Zhengfei Kuang et.al. 2411.17249v1 null
2024-11-26 Boost 3D Reconstruction using Diffusion-based Monocular Camera Calibration Junyuan Deng et.al. 2411.17240v1 link
2024-11-25 Diffusion Features for Zero-Shot 6DoF Object Pose Estimation Bernd Von Gimborn et.al. 2411.16668v1 null
2024-11-25 Generating Out-Of-Distribution Scenarios Using Language Models Erfan Aasi et.al. 2411.16554v1 null
2024-11-25 TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation Linqing Zhong et.al. 2411.16425v1 null
2024-11-25 Poster: Could Large Language Models Perform Network Management? Zine el abidine Kherroubi et.al. 2411.16232v1 null
2024-11-25 SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context Jungang Li et.al. 2411.16213v1 null
2024-11-25 Learn from Foundation Model: Fruit Detection Model without Manual Annotation Yanan Wang et.al. 2411.16196v1 link
2024-11-25 Language Driven Occupancy Prediction Zhu Yu et.al. 2411.16072v1 link
2024-11-25 Style-Pro: Style-Guided Prompt Learning for Generalizable Vision-Language Models Niloufar Alipour Talemi et.al. 2411.16018v1 null
2024-11-24 PIANIST: Learning Partially Observable World Models with LLMs for Multi-Agent Decision Making Jonathan Light et.al. 2411.15998v1 null
2024-11-24 Segment to Recognize Robustly -- Enhancing Recognition by Image Decomposition Klara Janouskova et.al. 2411.15933v1 null
2024-11-22 Context-Aware Multimodal Pretraining Karsten Roth et.al. 2411.15099v1 null
2024-11-22 Task-Aware Robotic Grasping by evaluating Quality Diversity Solutions through Foundation Models Aurel X. Appius et.al. 2411.14917v1 null
2024-11-22 Enhancing Exploration with Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile Manipulation Huy Le et.al. 2411.14913v1 null
2024-11-22 Leveraging Hierarchical Prototypes as the Verbalizer for Implicit Discourse Relation Recognition Wanqiu Long et.al. 2411.14880v1 null
2024-11-22 VisGraphVar: A Benchmark Generator for Assessing Variability in Graph Analysis Using Large Vision-Language Models Camilo Chacón Sartori et.al. 2411.14832v1 null
2024-11-22 De-biased Multimodal Electrocardiogram Analysis Haitao Li et.al. 2411.14795v1 null
2024-11-22 Simplifying CLIP: Unleashing the Power of Large-Scale Models on Consumer-level Computers Hongbo Liu et.al. 2411.14789v1 null
2024-11-21 Solving Zero-Shot 3D Visual Grounding as Constraint Satisfaction Problems Qihao Yuan et.al. 2411.14594v1 link
2024-11-21 Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding Yiming Zhang et.al. 2411.14401v1 null
2024-11-21 DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding Tianhe Ren et.al. 2411.14347v1 link
2024-11-21 StereoCrafter-Zero: Zero-Shot Stereo Video Generation with Noisy Restart Jian Shi et.al. 2411.14295v1 null
2024-11-21 Efficient Aspect-Based Summarization of Climate Change Reports with Small Language Models Iacopo Ghinassi et.al. 2411.14272v1 link
2024-11-21 Generalizing End-To-End Autonomous Driving In Real-World Environments Using Zero-Shot LLMs Zeyu Dong et.al. 2411.14256v1 null
2024-11-21 Evaluating the Robustness of Analogical Reasoning in Large Language Models Martha Lewis et.al. 2411.14215v1 link
2024-11-21 Stereo Anything: Unifying Stereo Matching with Large-Scale Mixed Data Xianda Guo et.al. 2411.14053v1 link
2024-11-21 Zero-Shot Low-Light Image Enhancement via Joint Frequency Domain Priors Guided Diffusion Jinhong He et.al. 2411.13961v1 link
2024-11-21 Learning to Cooperate with Humans using Generative Agents Yancheng Liang et.al. 2411.13934v1 link
2024-11-21 CLIPer: Hierarchically Improving Spatial Representation of CLIP for Open-Vocabulary Semantic Segmentation Lin Sun et.al. 2411.13836v1 link
2024-11-20 Find Any Part in 3D Ziqi Ma et.al. 2411.13550v1 null
2024-11-20 BIPro: Zero-shot Chinese Poem Generation via Block Inverse Prompting Constrained Generation Framework Xu Zou et.al. 2411.13237v1 null
2024-11-20 Unlocking Historical Clinical Trial Data with ALIGN: A Compositional Large Language Model System for Medical Coding Nabeel Seedat et.al. 2411.13163v1 null
2024-11-20 Hard-Synth: Synthesizing Diverse Hard Samples for ASR using Zero-Shot TTS and LLM Jiawei Yu et.al. 2411.13159v1 null
2024-11-20 Learning Time-Optimal and Speed-Adjustable Tactile In-Hand Manipulation Johannes Pitz et.al. 2411.13148v1 null
2024-11-20 TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models Xin Wang et.al. 2411.13136v1 null
2024-11-20 Training Physics-Driven Deep Learning Reconstruction without Raw Data Access for Equitable Fast MRI Yaşar Utku Alçalar et.al. 2411.13022v1 null
2024-11-20 Evaluating LLMs Capabilities Towards Understanding Social Dynamics Anique Tahir et.al. 2411.13008v1 null
2024-11-19 Improving Controllability and Editability for Pretrained Text-to-Music Generation Models Yixiao Zhang et.al. 2411.12641v1 null
2024-11-19 Instant Policy: In-Context Imitation Learning via Graph Diffusion Vitalis Vosylius et.al. 2411.12633v1 null
2024-11-19 SAM Carries the Burden: A Semi-Supervised Approach Refining Pseudo Labels for Medical Segmentation Ron Keuth et.al. 2411.12602v1 link
2024-11-19 Probe-Me-Not: Protecting Pre-trained Encoders from Malicious Probing Ruyi Ding et.al. 2411.12508v1 null
2024-11-19 Predicting User Intents and Musical Attributes from Music Discovery Conversations Daeyong Kwon et.al. 2411.12254v1 link
2024-11-19 Zero-Shot Crate Digging: DJ Tool Retrieval Using Speech Activity, Music Structure And CLAP Embeddings Iroro Orife et.al. 2411.12209v1 link
2024-11-19 A More Advanced Group Polarization Measurement Approach Based on LLM-Based Agents and Graphs Zixin Liu et.al. 2411.12196v1 null
2024-11-19 UrbanDiT: A Foundation Model for Open-World Urban Spatio-Temporal Learning Yuan Yuan et.al. 2411.12164v1 link
2024-11-19 HEIGHT: Heterogeneous Interaction Graph Transformer for Robot Navigation in Crowded and Constrained Environments Shuijing Liu et.al. 2411.12150v1 null
2024-11-18 VLN-Game: Vision-Language Equilibrium Search for Zero-Shot Semantic Navigation Bangguo Yu et.al. 2411.11609v1 null
2024-11-18 Unveiling the Inflexibility of Adaptive Embedding in Traffic Forecasting Hongjun Wang et.al. 2411.11448v1 link
2024-11-18 Scalable Autoregressive Monocular Depth Estimation Jinhong Wang et.al. 2411.11361v1 null
2024-11-18 Text-guided Zero-Shot Object Localization Jingjing Wang et.al. 2411.11357v1 null
2024-11-18 Visual-Semantic Graph Matching Net for Zero-Shot Learning Bowen Duan et.al. 2411.11351v1 link
2024-11-18 Zero-Shot Load Forecasting with Large Language Models Wenlong Liao et.al. 2411.11350v1 null
2024-11-18 Transcending Language Boundaries: Harnessing LLMs for Low-Resource Language Translation Peng Shu et.al. 2411.11295v1 null
2024-11-18 Neuron: Learning Context-Aware Evolving Representations for Zero-Shot Skeleton Action Recognition Yang Chen et.al. 2411.11288v1 null
2024-11-18 Zero-Shot Automatic Annotation and Instance Segmentation using LLM-Generated Datasets: Eliminating Field Imaging and Manual Annotation for Deep Learning Model Development Ranjan Sapkota et.al. 2411.11285v1 null
2024-11-18 ZeFaV: Boosting Large Language Models for Zero-shot Fact Verification Son T. Luu et.al. 2411.11247v1 link
2024-11-15 Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting Ziqi Xie et.al. 2411.10309v1 link
2024-11-15 CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation Dengke Zhang et.al. 2411.10086v1 null
2024-11-15 'What did the Robot do in my Absence?' Video Foundation Models to Enhance Intermittent Supervision Kavindie Katuwandeniya et.al. 2411.10016v1 null
2024-11-15 Zero-shot Voice Conversion with Diffusion Transformers Songting Liu et.al. 2411.09943v1 link
2024-11-14 LLM Hallucination Reasoning with Zero-shot Knowledge Test Seongmin Lee et.al. 2411.09689v1 null
2024-11-14 Script-centric behavior understanding for assisted autism spectrum disorder diagnosis Wenxing Liu et.al. 2411.09413v1 null
2024-11-14 Less is More: Unseen Domain Fake News Detection via Causal Propagation Substructures Shuzhi Gong et.al. 2411.09389v1 null
2024-11-14 Exploring Zero-Shot Anomaly Detection with CLIP in Medical Imaging: Are We There Yet? Aldo Marzullo et.al. 2411.09310v1 null
2024-11-14 Mono2Stereo: Monocular Knowledge Transfer for Enhanced Stereo Matching Yuran Wang et.al. 2411.09151v1 null
2024-11-15 UniHOI: Learning Fast, Dense and Generalizable 4D Reconstruction for Egocentric Hand Object Interaction Videos Chengbo Yuan et.al. 2411.09145v2 null
2024-11-13 Zero-shot Cross-lingual Transfer Learning with Multiple Source and Target Languages for Information Extraction: Language Selection and Adversarial Training Nghia Trung Ngo et.al. 2411.08785v1 null
2024-11-13 Measuring similarity between embedding spaces using induced neighborhood graphs Tiago F. Tavares et.al. 2411.08687v1 null
2024-11-13 Zero-shot capability of SAM-family models for bone segmentation in CT scans Caroline Magg et.al. 2411.08629v1 null
2024-11-13 Grammarization-Based Grasping with Deep Multi-Autoencoder Latent Space Exploration by Reinforcement Learning Agent Leonidas Askianakis et.al. 2411.08566v1 null
2024-11-13 CorrSynth -- A Correlated Sampling Method for Diverse Dataset Generation from LLMs Suhas S Kowshik et.al. 2411.08553v1 null
2024-11-13 An Information Theoretic Approach to Operationalize Right to Data Protection Abhinav Java et.al. 2411.08506v1 null
2024-11-13 Enhancing Multimodal Query Representation via Visual Dialogues for End-to-End Knowledge Retrieval Yeong-Joon Ju et.al. 2411.08334v1 link
2024-11-12 Retrieval Augmented Time Series Forecasting Kutay Tire et.al. 2411.08249v1 link
2024-11-12 Latent Space Disentanglement in Diffusion Transformers Enables Precise Zero-shot Semantic Editing Zitao Shuai et.al. 2411.08196v1 null
2024-11-12 LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models Anoop Cherian et.al. 2411.08027v1 null
2024-11-12 Semantic Sleuth: Identifying Ponzi Contracts via Large Language Models Cong Wu et.al. 2411.07498v1 null
2024-11-11 Untangling Hate Speech Definitions: A Semantic Componential Analysis Across Cultures and Domains Katerina Korre et.al. 2411.07417v1 null
2024-11-11 Warmstarting for Scaling Language Models Neeratyoy Mallik et.al. 2411.07340v1 null
2024-11-11 DeepONet as a Multi-Operator Extrapolation Model: Distributed Pretraining with Physics-Informed Fine-Tuning Zecheng Zhang et.al. 2411.07239v1 null
2024-11-11 The Super Weight in Large Language Models Mengxia Yu et.al. 2411.07191v1 link
2024-11-11 NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics David Robinson et.al. 2411.07186v1 null
2024-11-11 SAMPart3D: Segment Any Part in 3D Objects Yunhan Yang et.al. 2411.07184v1 link
2024-11-11 Decoding Visual Experience and Mapping Semantics through Whole-Brain Analysis Using fMRI Foundation Models Yanchen Wang et.al. 2411.07121v1 link
2024-11-11 Transformer verbatim in-context retrieval across time and scale Kristijan Armeni et.al. 2411.07075v1 link
2024-11-11 MapSAM: Adapting Segment Anything Model for Automated Feature Detection in Historical Maps Xue Xia et.al. 2411.06971v1 null
2024-11-11 Robust Fine-tuning of Zero-shot Models via Variance Reduction Beier Zhu et.al. 2411.06966v1 link
2024-11-11 UMFC: Unsupervised Multi-Domain Feature Calibration for Vision-Language Models Jiachen Liang et.al. 2411.06921v1 null
2024-11-11 Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning Hongsheng Zhang et.al. 2411.06764v1 null
2024-11-08 End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering Dylan Goetting et.al. 2411.05755v1 link
2024-11-08 Asterisk: Keep it Simple* Andrew Semenov et.al. 2411.05691v1 null
2024-11-08 Assessing Open-Source Large Language Models on Argumentation Mining Subtasks Mohammad Yeghaneh Abkenar et.al. 2411.05639v1 null
2024-11-08 An Early FIRST Reproduction and Improvements to Single-Token Decoding for Fast Listwise Reranking Zijian Chen et.al. 2411.05508v1 null
2024-11-08 WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models Shengda Fan et.al. 2411.05451v1 link
2024-11-08 Enhancing Visual Classification using Comparative Descriptors Hankyeol Lee et.al. 2411.05357v1 link
2024-11-08 ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving Tao Ma et.al. 2411.05311v1 null
2024-11-07 Abstract2Appendix: Academic Reviews Enhance LLM Long-Context Capabilities Shengzhi Li et.al. 2411.05232v1 link
2024-11-07 Audiobox TTA-RAG: Improving Zero-Shot and Few-Shot Text-To-Audio with Retrieval-Augmented Generation Mu Yang et.al. 2411.05141v1 null
2024-11-07 SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation Koichi Namekata et.al. 2411.04989v1 null
2024-11-07 DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning Gaoyue Zhou et.al. 2411.04983v1 null
2024-11-07 Noisy Zero-Shot Coordination: Breaking The Common Knowledge Assumption In Zero-Shot Coordination Games Usman Anwar et.al. 2411.04976v1 link
2024-11-07 In the Era of Prompt Learning with Vision-Language Models Ankit Jha et.al. 2411.04892v1 null
2024-11-07 Zero-Shot Temporal Resolution Domain Adaptation for Spiking Neural Networks Sanja Karilanova et.al. 2411.04760v1 null
2024-11-07 Vision Language Models are In-Context Value Learners Yecheng Jason Ma et.al. 2411.04549v1 null
2024-11-07 Best Practices for Distilling Large Language Models into BERT for Web Search Ranking Dezhi Ye et.al. 2411.04539v1 null
2024-11-07 Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models Xinyu Zhang et.al. 2411.04530v1 null
2024-11-07 Enabling Adaptive Agent Training in Open-Ended Simulators by Targeting Diversity Robby Costales et.al. 2411.04466v1 null
2024-11-07 AutoProteinEngine: A Large Language Model Driven Agent Framework for Multimodal AutoML in Protein Engineering Yungeng Liu et.al. 2411.04440v1 link
2024-11-06 RaVL: Discovering and Mitigating Spurious Correlations in Fine-Tuned Vision-Language Models Maya Varma et.al. 2411.04097v1 link
2024-11-06 Multi3Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision-Language Models Minh Duc Bui et.al. 2411.03888v1 link
2024-11-06 SA3DIP: Segment Any 3D Instance with Potential 3D Priors Xi Yang et.al. 2411.03819v1 link
2024-11-06 No Culture Left Behind: ArtELingo-28, a Benchmark of WikiArt with Captions in 28 Languages Youssef Mohamed et.al. 2411.03769v1 link
2024-11-06 Zero-shot Dynamic MRI Reconstruction with Global-to-local Diffusion Model Yu Guan et.al. 2411.03723v1 null
2024-11-06 Fine-Tuning Vision-Language Model for Automated Engineering Drawing Information Extraction Muhammad Tayyab Khan et.al. 2411.03707v1 null
2024-11-06 3DGS-CD: 3D Gaussian Splatting-based Change Detection for Physical Object Rearrangement Ziqi Lu et.al. 2411.03706v1 link
2024-11-06 Towards Scalable Automated Grading: Leveraging Large Language Models for Conceptual Question Evaluation in Engineering Rujun Gao et.al. 2411.03659v1 null
2024-11-05 Exploring the Benefits of Domain-Pretraining of Generative Large Language Models for Chemistry Anurag Acharya et.al. 2411.03542v1 null
2024-11-05 A Mamba Foundation Model for Time Series Forecasting Haoyu Ma et.al. 2411.02941v1 null
2024-11-05 DDFAV: Remote Sensing Large Vision Language Models Dataset and Evaluation Benchmark Haodong Li et.al. 2411.02733v1 link
2024-11-04 EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector Deok-Hyeon Cho et.al. 2411.02625v1 link
2024-11-04 MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs Sheng-Chieh Lin et.al. 2411.02571v1 null
2024-11-04 TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives Maitreya Patel et.al. 2411.02545v1 null
2024-11-04 A Comparative Analysis of Instruction Fine-Tuning LLMs for Financial Text Classification Sorouralsadat Fatemi et.al. 2411.02476v1 null
2024-11-04 Do Advanced Language Models Eliminate the Need for Prompt Engineering in Software Engineering? Guoqing Wang et.al. 2411.02093v1 null
2024-11-04 CTEFM-VC: Zero-Shot Voice Conversion Based on Content-Aware Timbre Ensemble Modeling and Flow Matching Yu Pan et.al. 2411.02026v1 null
2024-11-04 Exploiting Contextual Uncertainty of Visual Data for Efficient Training of Deep Models Sharat Agarwal et.al. 2411.01925v1 null
2024-11-04 ManiBox: Enhancing Spatial Grasping Generalization via Scalable Simulation Data Generation Hengkai Tan et.al. 2411.01850v1 null
2024-11-04 DiffuMask-Editor: A Novel Paradigm of Integration Between the Segmentation Diffusion Model and Image Editing to Improve Segmentation Ability Bo Gao et.al. 2411.01819v1 null
2024-11-03 Investigating Large Language Models for Complex Word Identification in Multilingual and Multidomain Setups Răzvan-Alexandru Smădu et.al. 2411.01706v1 link
2024-11-03 Object segmentation from common fate: Motion energy processing enables human-like zero-shot generalization to random dot stimuli Matthias Tangemann et.al. 2411.01505v1 link
2024-11-02 Task-Oriented Hierarchical Object Decomposition for Visuomotor Control Jianing Qian et.al. 2411.01284v1 null
2024-11-02 MonoPlane: Exploiting Monocular Geometric Cues for Generalizable 3D Plane Reconstruction Wang Zhao et.al. 2411.01226v1 link
2024-11-02 Transfer Learning for Finetuning Large Language Models Tobias Strangmann et.al. 2411.01195v1 null
2024-10-31 DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models Heng-Jui Chang et.al. 2410.24177v1 null
2024-11-02 $π_0$: A Vision-Language-Action Flow Model for General Robot Control Kevin Black et.al. 2410.24164v2 null
2024-10-31 Scaling Concept With Text-Guided Diffusion Models Chao Huang et.al. 2410.24151v1 null
2024-10-31 Matchmaker: Self-Improving Large Language Model Programs for Schema Matching Nabeel Seedat et.al. 2410.24105v1 null
2024-10-31 In-Context Fine-Tuning for Time-Series Foundation Models Abhimanyu Das et.al. 2410.24087v1 null
2024-10-31 GAMap: Zero-Shot Object Goal Navigation with Multi-Scale Geometric-Affordance Guidance Shuaihang Yuan et.al. 2410.23978v1 null
2024-10-31 Text-DiFuse: An Interactive Multi-Modal Image Fusion Framework based on Text-modulated Diffusion Model Hao Zhang et.al. 2410.23905v1 link
2024-10-31 EZ-HOI: VLM Adaptation via Guided Prompt Learning for Zero-Shot HOI Detection Qinqian Lei et.al. 2410.23904v1 link
2024-10-31 The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge Dake Guo et.al. 2410.23815v1 null
2024-10-31 RealMind: Zero-Shot EEG-Based Visual Decoding and Captioning Using Multi-Modal Models Dongyang Li et.al. 2410.23754v1 null
2024-10-30 Multi-student Diffusion Distillation for Better One-step Generators Yanke Song et.al. 2410.23274v1 null
2024-10-30 Partial Channel Dependence with Channel Masks for Time Series Foundation Models Seunghan Lee et.al. 2410.23222v1 null
2024-10-30 Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks Michael Matthews et.al. 2410.23208v1 link
2024-10-30 FlexTSF: A Universal Forecasting Model for Time Series with Variable Regularities Jingge Xiao et.al. 2410.23160v1 link
2024-10-30 DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes Jialiang Zhang et.al. 2410.23004v1 null
2024-10-30 SimpsonsVQA: Enhancing Inquiry-Based Learning with a Tailored Dataset Ngoc Dung Huynh et.al. 2410.22648v1 null
2024-10-30 SleepNetZero: Zero-Burden Zero-Shot Reliable Sleep Staging With Neural Networks Based on Ballistocardiograms Shuzhen Li et.al. 2410.22646v1 null
2024-10-29 RealCQA-V2 : Visual Premise Proving Saleem Ahmed et.al. 2410.22492v1 null
2024-10-29 Local Policies Enable Zero-shot Long-horizon Manipulation Murtaza Dalal et.al. 2410.22332v1 null
2024-10-29 Are Decoder-Only Large Language Models the Silver Bullet for Code Search? Yuxuan Chen et.al. 2410.22240v1 link
2024-10-29 Active Learning for Vision-Language Models Bardia Safaei et.al. 2410.22187v1 null
2024-10-29 Data Generation for Hardware-Friendly Post-Training Quantization Lior Dikstein et.al. 2410.22110v1 link
2024-10-29 PACA: Perspective-Aware Cross-Attention Representation for Zero-Shot Scene Rearrangement Shutong Jin et.al. 2410.22059v1 null
2024-10-29 Reliable Semantic Understanding for Real World Zero-shot Object Goal Navigation Halil Utku Unlu et.al. 2410.21926v1 null
2024-10-30 Text-Guided Attention is All You Need for Zero-Shot Robustness in Vision-Language Models Lu Yu et.al. 2410.21802v2 link
2024-10-29 Pushing the Limits of All-Atom Geometric Graph Neural Networks: Pre-Training, Scaling and Zero-Shot Transfer Zihan Pengmei et.al. 2410.21683v1 null
2024-10-28 SandboxAQ's submission to MRL 2024 Shared Task on Multi-lingual Multi-task Information Retrieval Isidora Chara Tourni et.al. 2410.21501v1 null
2024-10-28 SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization Wanhua Li et.al. 2410.21411v1 link
2024-10-28 Zero-Shot Dense Retrieval with Embeddings from Relevance Feedback Nour Jedidi et.al. 2410.21242v1 null
2024-10-28 Towards Unifying Evaluation of Counterfactual Explanations: Leveraging Large Language Models for Human-Centric Assessments Marharyta Domnich et.al. 2410.21131v1 link
2024-10-28 Retrieval-Enhanced Mutation Mastery: Augmenting Zero-Shot Prediction of Protein Language Model Yang Tan et.al. 2410.21127v1 link
2024-10-28 Zero-Shot Action Recognition in Surveillance Videos Joao Pereira et.al. 2410.21113v1 null
2024-10-28 Exploring the Reliability of Foundation Model-Based Frontier Selection in Zero-Shot Object Goal Navigation Shuaihang Yuan et.al. 2410.21037v1 null
2024-10-28 Reference-Free Formula Drift with Reinforcement Learning: From Driving Data to Tire Energy-Inspired, Real-World Policies Franck Djeumou et.al. 2410.20990v1 null
2024-10-28 DeTeCtive: Detecting AI-generated Text via Multi-Level Contrastive Learning Xun Guo et.al. 2410.20964v1 link
2024-10-28 MrT5: Dynamic Token Merging for Efficient Byte-level Language Models Julie Kallini et.al. 2410.20771v1 link
2024-10-28 Face-MLLM: A Large Face Perception Model Haomiao Sun et.al. 2410.20717v1 null
2024-10-28 Reprogramming Pretrained Target-Specific Diffusion Models for Dual-Target Drug Design Xiangxin Zhou et.al. 2410.20688v1 link
2024-10-25 Adversarial Environment Design via Regret-Guided Diffusion Models Hojun Chung et.al. 2410.19715v1 null
2024-10-25 TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning Xiangyu Zeng et.al. 2410.19702v1 null
2024-10-25 IPPON: Common Sense Guided Informative Path Planning for Object Goal Navigation Kaixian Qu et.al. 2410.19697v1 null
2024-10-25 Context-Based Visual-Language Place Recognition Soojin Woo et.al. 2410.19341v1 link
2024-10-25 Enhancing Zero-Shot Vision Models by Label-Free Prompt Distribution Learning and Bias Correcting Xingyu Zhu et.al. 2410.19294v1 null
2024-10-24 Label Set Optimization via Activation Distribution Kurtosis for Zero-shot Classification with Generative Models Yue Li et.al. 2410.19195v1 null
2024-10-24 AlignCap: Aligning Speech Emotion Captioning to Human Preferences Ziqi Liang et.al. 2410.19134v1 null
2024-10-24 ConceptDrift: Uncovering Biases through the Lens of Foundational Models Cristian Daniel Păduraru et.al. 2410.18970v1 null
2024-10-24 BioMistral-NLU: Towards More Generalizable Medical Language Understanding through Instruction Tuning Yujuan Velvin Fu et.al. 2410.18955v1 null
2024-10-24 SkillMimicGen: Automated Demonstration Generation for Efficient Skill Learning and Deployment Caelan Garrett et.al. 2410.18907v1 null
2024-10-24 Probabilistic Language-Image Pre-Training Sanghyuk Chun et.al. 2410.18857v1 link
2024-10-24 Task Calibration: Calibrating Large Language Models on Inference Tasks Yingjie Li et.al. 2410.18764v1 null
2024-10-24 Data Scaling Laws in Imitation Learning for Robotic Manipulation Fanqi Lin et.al. 2410.18647v1 null
2024-10-24 Knowledge Distillation Using Frontier Open-source LLMs: Generalizability and the Role of Synthetic Data Anup Shirgaonkar et.al. 2410.18588v1 null
2024-10-24 Zero-shot Object Navigation with Vision-Language Models Reasoning Congcong Wen et.al. 2410.18570v1 null
2024-10-24 Beyond Color and Lines: Zero-Shot Style-Specific Image Variations with Coordinated Semantics Jinghao Hu et.al. 2410.18537v1 null
2024-10-24 Scaling up Masked Diffusion Models on Text Shen Nie et.al. 2410.18514v1 link
2024-10-23 Key Algorithms for Keyphrase Generation: Instruction-Based LLMs for Russian Scientific Keyphrases Anna Glazkova et.al. 2410.18040v1 null
2024-10-23 Scaling Robot Policy Learning via Zero-Shot Labeling with Foundation Models Nils Blank et.al. 2410.17772v1 null
2024-10-23 Learning Versatile Skills with Curriculum Masking Yao Tang et.al. 2410.17744v1 link
2024-10-23 Entity-based Reinforcement Learning for Autonomous Cyber Defence Isaac Symes Thompson et.al. 2410.17647v1 link
2024-10-23 Incremental Learning of Affordances using Markov Logic Networks George Potter et.al. 2410.17624v1 null
2024-10-23 Graphusion: A RAG Framework for Knowledge Graph Construction with a Global Perspective Rui Yang et.al. 2410.17600v1 null
2024-10-23 Multimodal Information Bottleneck for Deep Reinforcement Learning with Multiple Sensors Bang You et.al. 2410.17551v1 null
2024-10-23 Generalizable Motion Planning via Operator Learning Sharath Matada et.al. 2410.17547v1 null
2024-10-23 X-MOBILITY: End-To-End Generalizable Navigation via World Modeling Wei Liu et.al. 2410.17491v1 null
2024-10-22 Denoise-I2W: Mapping Images to Denoising Words for Accurate Zero-Shot Composed Image Retrieval Yuanmin Tang et.al. 2410.17393v1 null
2024-10-22 Altogether: Image Captioning via Re-aligning Alt-text Hu Xu et.al. 2410.17251v1 link
2024-10-22 LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias Haian Jin et.al. 2410.17242v1 null
2024-10-22 Are Visual-Language Models Effective in Action Recognition? A Comparative Study Mahmoud Ali et.al. 2410.17149v1 null
2024-10-22 LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging Ke Wang et.al. 2410.17146v1 link
2024-10-22 SG-FSM: A Self-Guiding Zero-Shot Prompting Paradigm for Multi-Hop Question Answering Based on Finite State Machine Xiaochen Wang et.al. 2410.17021v1 null
2024-10-22 Towards Real Zero-Shot Camouflaged Object Segmentation without Camouflaged Annotations Cheng Lei et.al. 2410.16953v1 null
2024-10-22 DNAHLM -- DNA sequence and Human Language mixed large language Model Wang Liang et.al. 2410.16917v1 link
2024-10-22 AttriPrompter: Auto-Prompting with Attribute Semantics for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models Yongjian Wu et.al. 2410.16820v1 link
2024-10-22 PLDR-LLM: Large Language Model from Power Law Decoder Representations Burc Gokden et.al. 2410.16703v1 link
2024-10-22 GE2E-KWS: Generalized End-to-End Training and Evaluation for Zero-shot Keyword Spotting Pai Zhu et.al. 2410.16647v1 null
2024-10-21 MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report Samrajya Thapa et.al. 2410.16239v1 link
2024-10-21 IBGP: Imperfect Byzantine Generals Problem for Zero-Shot Robustness in Communicative Multi-Agent Systems Yihuan Mao et.al. 2410.16237v1 null
2024-10-21 Continuous Speech Synthesis using per-token Latent Diffusion Arnon Turetzky et.al. 2410.16048v1 null
2024-10-21 Few-shot target-driven instance detection based on open-vocabulary object detection models Ben Crulis et.al. 2410.16028v1 null
2024-10-21 Zero-Shot Scene Reconstruction from Single Images with Deep Prior Assembly Junsheng Zhou et.al. 2410.15971v1 null
2024-10-21 Mitigating Object Hallucination via Concentric Causal Attention Yun Xing et.al. 2410.15926v1 link
2024-10-21 MI-VisionShot: Few-shot adaptation of vision-language models for slide-level classification of histopathological images Pablo Meseguer et.al. 2410.15881v1 null
2024-10-21 Triplane Grasping: Efficient 6-DoF Grasping with Single RGB Images Yiming Li et.al. 2410.15879v1 null
2024-10-21 FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL Woosung Koh et.al. 2410.15876v1 null
2024-10-21 Unleashing the Potential of Vision-Language Pre-Training for 3D Zero-Shot Lesion Segmentation via Mask-Attribute Alignment Yankai Jiang et.al. 2410.15744v1 null
2024-10-18 BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities Shaozhe Hao et.al. 2410.14672v1 link
2024-10-18 Dialetto, ma Quanto Dialetto? Transcribing and Evaluating Dialects on a Continuum Ryan Soh-Eun Shim et.al. 2410.14589v1 null
2024-10-18 SylloBio-NLI: Evaluating Large Language Models on Biomedical Syllogistic Reasoning Magdalena Wysocka et.al. 2410.14399v1 null
2024-10-18 AnomalyNCD: Towards Novel Anomaly Class Discovery in Industrial Scenarios Ziming Huang et.al. 2410.14379v1 link
2024-10-18 Zero-shot Action Localization via the Confidence of Large Vision-Language Models Josiah Aklilu et.al. 2410.14340v1 null
2024-10-18 Storyboard guided Alignment for Fine-grained Video Action Recognition Enqi Liu et.al. 2410.14238v1 null
2024-10-18 Assessing Open-world Forgetting in Generative Image Model Customization Héctor Laria et.al. 2410.14159v1 null
2024-10-17 Measuring and Modifying the Readability of English Texts with GPT-4 Sean Trott et.al. 2410.14028v1 link
2024-10-17 Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens Lijie Fan et.al. 2410.13863v1 null
2024-10-17 VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding Runsen Xu et.al. 2410.13860v1 link
2024-10-17 DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control Yujie Wei et.al. 2410.13830v1 null
2024-10-17 AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents Ke Yang et.al. 2410.13825v1 null
2024-10-17 Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers Yuchen Liang et.al. 2410.13746v1 null
2024-10-17 ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense Concepts about Actions Shailaja Keyur Sampat et.al. 2410.13662v1 link
2024-10-17 Help Me Identify: Is an LLM+VQA System All We Need to Identify Visual Concepts? Shailaja Keyur Sampat et.al. 2410.13651v1 link
2024-10-18 Enhanced Prompt-leveraged Weakly Supervised Cancer Segmentation based on Segment Anything Joonhyeon Song et.al. 2410.13621v2 link
2024-10-17 Large Language Models as Narrative-Driven Recommenders Lukas Eberhard et.al. 2410.13604v1 null
2024-10-17 Representing Model Weights with Language using Tree Experts Eliahu Horwitz et.al. 2410.13569v1 null
2024-10-16 In-Context Learning Enables Robot Action Prediction in LLMs Yida Yin et.al. 2410.12782v1 null
2024-10-16 Towards Zero-Shot Camera Trap Image Categorization Jiří Vyskočil et.al. 2410.12769v1 null
2024-10-16 Towards Graph Foundation Models: The Perspective of Zero-shot Reasoning on Knowledge Graphs Kai Wang et.al. 2410.12609v1 null
2024-10-16 A Claim Decomposition Benchmark for Long-form Answer Verification Zhihao Zhang et.al. 2410.12558v1 link
2024-10-16 SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling Loris Gaven et.al. 2410.12481v1 null
2024-10-16 SF-Speech: Straightened Flow for Zero-Shot Voice Clone on Small-Scale Dataset Xuyuan Li et.al. 2410.12399v1 null
2024-10-16 ERVQ: Enhanced Residual Vector Quantization with Intra-and-Inter-Codebook Optimization for Neural Audio Codecs Rui-Chen Zheng et.al. 2410.12359v1 null
2024-10-16 MAX: Masked Autoencoder for X-ray Fluorescence in Geological Investigation An-Sheng Lee et.al. 2410.12330v1 link
2024-10-16 Evaluating Cascaded Methods of Vision-Language Models for Zero-Shot Detection and Association of Hardhats for Increased Construction Safety Lucas Choi et.al. 2410.12225v1 null
2024-10-15 Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming Yilun Hao et.al. 2410.12112v1 null
2024-10-15 FoundTS: Comprehensive and Unified Benchmarking of Foundation Models for Time Series Forecasting Zhe Li et.al. 2410.11802v1 null
2024-10-15 Time-Series Foundation Model for Value-at-Risk Anubha Goel et.al. 2410.11773v1 link
2024-10-15 Zero-shot Model-based Reinforcement Learning using Large Language Models Abdelhakim Benechehab et.al. 2410.11711v1 link
2024-10-15 PSVMA+: Exploring Multi-granularity Semantic-visual Adaption for Generalized Zero-shot Learning Man Liu et.al. 2410.11560v1 null
2024-10-15 AGENTiGraph: An Interactive Knowledge Graph Platform for LLM-based Chatbots Utilizing Private Data Xinjie Zhao et.al. 2410.11531v1 null
2024-10-15 Leveraging LLM Embeddings for Cross Dataset Label Alignment and Zero Shot Music Emotion Prediction Renhang Liu et.al. 2410.11522v1 link
2024-10-15 Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement Zhi Wang et.al. 2410.11448v1 link
2024-10-15 DRACO: A Denoising-Reconstruction Autoencoder for Cryo-EM Yingjun Shen et.al. 2410.11373v1 null
2024-10-15 Enhance Graph Alignment for Large Language Models Haitong Luo et.al. 2410.11370v1 null
2024-10-15 In-Context Learning for Long-Context Sentiment Analysis on Infrastructure Project Opinions Alireza Shamshiri et.al. 2410.11265v1 null
2024-10-14 Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models Jingzhi Bao et.al. 2410.10821v1 link
2024-10-14 Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations Litu Rout et.al. 2410.10792v1 null
2024-10-14 SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators Rasoul Shafipour et.al. 2410.10714v1 null
2024-10-14 MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer Minghao Zhu et.al. 2410.10589v1 link
2024-10-14 Recipe for Zero-shot POS Tagging: Is It Useful in Realistic Scenarios? Zeno Vandenbulcke et.al. 2410.10576v1 null
2024-10-14 Continual Learning Improves Zero-Shot Action Recognition Shreyank N Gowda et.al. 2410.10497v1 null
2024-10-14 Learning to Ground VLMs without Forgetting Aritra Bhowmik et.al. 2410.10491v1 null
2024-10-14 Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts Xu Liu et.al. 2410.10469v1 null
2024-10-14 4DStyleGaussian: Zero-shot 4D Style Transfer with Gaussian Splatting Wanlin Liang et.al. 2410.10412v1 null
2024-10-14 GIFT-Eval: A Benchmark For General Time Series Forecasting Model Evaluation Taha Aksu et.al. 2410.10393v1 link
2024-10-11 Extra Global Attention Designation Using Keyword Detection in Sparse Transformer Architectures Evan Lucas et.al. 2410.08971v1 null
2024-10-11 NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models Zheng Yi Ho et.al. 2410.08970v1 null
2024-10-11 Zero-Shot Pupil Segmentation with SAM 2: A Case Study of Over 14 Million Images Virmarie Maquiling et.al. 2410.08926v1 null
2024-10-11 SegGrasp: Zero-Shot Task-Oriented Grasping via Semantic and Geometric Guided Segmentation Haosheng Li et.al. 2410.08901v1 null
2024-10-11 A Benchmark for Cross-Domain Argumentative Stance Classification on Social Media Jiaqing Yuan et.al. 2410.08900v1 null
2024-10-11 RoRA-VLM: Robust Retrieval-Augmented Vision Language Models Jingyuan Qi et.al. 2410.08876v1 null
2024-10-11 One-shot Generative Domain Adaptation in 3D GANs Ziqiang Li et.al. 2410.08824v1 link
2024-10-11 Zero-Shot Offline Imitation Learning via Optimal Transport Thomas Rupf et.al. 2410.08751v1 link
2024-10-11 Chain-of-Restoration: Multi-Task Image Restoration Models are Zero-Shot Step-by-Step Universal Image Restorers Jin Cao et.al. 2410.08688v1 link
2024-10-11 Boosting Open-Vocabulary Object Detection by Handling Background Samples Ruizhe Zeng et.al. 2410.08645v1 null
2024-10-10 LatteCLIP: Unsupervised CLIP Fine-Tuning via LMM-Synthetic Texts Anh-Quan Cao et.al. 2410.08211v1 null
2024-10-10 SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation Hang Yin et.al. 2410.08189v1 null
2024-10-10 On the Evaluation of Generative Robotic Simulations Feng Chen et.al. 2410.08172v1 null
2024-10-10 ZeroComp: Zero-shot Object Compositing from Image Intrinsics via Diffusion Zitian Zhang et.al. 2410.08168v1 null
2024-10-10 Constrained Skill Discovery: Quadruped Locomotion with Unsupervised Reinforcement Learning Vassil Atanassov et.al. 2410.07877v1 null
2024-10-10 RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation Songming Liu et.al. 2410.07864v1 null
2024-10-10 Rewriting Conversational Utterances with Instructed Large Language Models Elnara Galimzhanova et.al. 2410.07797v1 null
2024-10-10 The Power of Input: Benchmarking Zero-Shot Sim-To-Real Transfer of Reinforcement Learning Control Policies for Quadrotor Control Alberto Dionigi et.al. 2410.07686v1 null
2024-10-10 Parallel Digital Twin-driven Deep Reinforcement Learning for User Association and Load Balancing in Dynamic Wireless Networks Zhenyu Tao et.al. 2410.07611v1 null
2024-10-10 CSA: Data-efficient Mapping of Unimodal Features to Multimodal Features Po-han Li et.al. 2410.07610v1 null
2024-10-09 AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation Yukang Cao et.al. 2410.07164v1 null
2024-10-09 Exploring the Readiness of Prominent Small Language Models for the Democratization of Financial Literacy Tagore Rao Kosireddy et.al. 2410.07118v1 link
2024-10-09 Collusion Detection with Graph Neural Networks Lucas Gomes et.al. 2410.07091v1 null
2024-10-09 Stanceformer: Target-Aware Transformer for Stance Detection Krishna Garg et.al. 2410.07083v1 link
2024-10-09 Compositional Entailment Learning for Hyperbolic Vision-Language Models Avik Pal et.al. 2410.06912v1 null
2024-10-09 F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching Yushen Chen et.al. 2410.06885v1 link
2024-10-09 K-SAM: A Prompting Method Using Pretrained U-Net to Improve Zero Shot Performance of SAM on Lung Segmentation in CXR Images Mohamed Deriche et.al. 2410.06825v1 null
2024-10-09 Toward Physics-guided Time Series Embedding Jiaxi Hu et.al. 2410.06651v1 null
2024-10-09 Open-RGBT: Open-vocabulary RGB-T Zero-shot Semantic Segmentation in Open-world Environments Meng Yu et.al. 2410.06626v1 null
2024-10-09 DCP: Learning Accelerator Dataflow for Neural Network via Propagation Peng Xu et.al. 2410.06553v1 null
2024-10-07 Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality Youngtaek Oh et.al. 2410.05210v1 link
2024-10-07 ZEBRA: Zero-Shot Example-Based Retrieval Augmentation for Commonsense Question Answering Francesco Maria Molfese et.al. 2410.05077v1 link
2024-10-07 PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing Feng Tian et.al. 2410.04844v1 null
2024-10-07 LPZero: Language Model Zero-cost Proxy Search from Zero Peijie Dong et.al. 2410.04808v1 null
2024-10-07 Building Damage Assessment in Conflict Zones: A Deep Learning Approach Using Geospatial Sub-Meter Resolution Data Matteo Risso et.al. 2410.04802v1 null
2024-10-07 Improving Image Clustering with Artifacts Attenuation via Inference-Time Attention Engineering Kazumoto Nakamura et.al. 2410.04801v1 null
2024-10-07 Document-level Causal Relation Extraction with Knowledge-guided Binary Question Answering Zimu Wang et.al. 2410.04752v1 null
2024-10-07 ACDC: Autoregressive Coherent Multimodal Generation using Diffusion Correction Hyungjin Chung et.al. 2410.04721v1 null
2024-10-07 Demo of Zero-Shot Guitar Amplifier Modelling: Enhancing Modeling with Hyper Neural Networks Yu-Hua Chen et.al. 2410.04702v1 null
2024-10-07 SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech Minchan Kim et.al. 2410.04690v1 null
2024-10-04 GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs Pu Hua et.al. 2410.03645v1 null
2024-10-04 What Matters for Model Merging at Scale? Prateek Yadav et.al. 2410.03617v1 null
2024-10-04 Table Question Answering for Low-resourced Indic Languages Vaishali Pal et.al. 2410.03576v1 link
2024-10-04 STREAMS: An Assistive Multimodal AI Framework for Empowering Biosignal Based Robotic Controls Ali Rabiee et.al. 2410.03486v1 null
2024-10-04 Zero-Shot Fact Verification via Natural Logic and Large Language Models Marek Strong et.al. 2410.03341v1 link
2024-10-04 Selective Test-Time Adaptation for Unsupervised Anomaly Detection using Neural Implicit Representations Sameer Ambekar et.al. 2410.03306v1 link
2024-10-04 Comparing zero-shot self-explanations with human rationales in multilingual text classification Stephanie Brandl et.al. 2410.03296v1 null
2024-10-04 Enhanced Transformer architecture for in-context learning of dynamical systems Matteo Rufolo et.al. 2410.03291v1 null
2024-10-04 What do Large Language Models Need for Machine Translation Evaluation? Shenbin Qian et.al. 2410.03278v1 link
2024-10-04 PersoBench: Benchmarking Personalized Response Generation in Large Language Models Saleh Afzoon et.al. 2410.03198v1 null
2024-10-03 Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations Nick Jiang et.al. 2410.02762v1 link
2024-10-03 Training Language Models on Synthetic Edit Sequences Improves Code Synthesis Ulyana Piterbarg et.al. 2410.02749v1 link
2024-10-03 Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers Shijie Chen et.al. 2410.02642v1 null
2024-10-03 Plots Unlock Time-Series Understanding in Multimodal Models Mayank Daswani et.al. 2410.02637v1 null
2024-10-03 LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model Duy M. H. Nguyen et.al. 2410.02615v1 null
2024-10-03 Dog-IQA: Standard-guided Zero-shot MLLM for Mix-grained Image Quality Assessment Kai Liu et.al. 2410.02505v1 link
2024-10-03 Cross-Embodiment Dexterous Grasping with Reinforcement Learning Haoqi Yuan et.al. 2410.02479v1 null
2024-10-03 Learning Diverse Bimanual Dexterous Manipulation Skills from Human Demonstrations Bohan Zhou et.al. 2410.02477v1 null
2024-10-03 Unsupervised Meta-Learning via Dynamic Head and Heterogeneous Task Construction for Few-Shot Classification Yunchuan Guan et.al. 2410.02267v1 link
2024-10-03 Visual Prompting in LLMs for Enhancing Emotion Recognition Qixuan Zhang et.al. 2410.02244v1 null
2024-10-02 An Exploration of Self-Supervised Mutual Information Alignment for Multi-Task Settings Soham Govande et.al. 2410.01704v1 link
2024-10-02 Saliency-Guided DETR for Moment Retrieval and Highlight Detection Aleksandr Gordeev et.al. 2410.01615v1 link
2024-10-02 Coordinate-Based Neural Representation Enabling Zero-Shot Learning for 3D Multiparametric Quantitative MRI Guoyan Lao et.al. 2410.01577v1 null
2024-10-03 EUFCC-CIR: a Composed Image Retrieval Dataset for GLAM Collections Francesc Net et.al. 2410.01536v2 link
2024-10-02 Toward a Holistic Evaluation of Robustness in CLIP Models Weijie Tu et.al. 2410.01534v1 null
2024-10-02 SinkSAM: A Monocular Depth-Guided SAM Framework for Automatic Sinkhole Segmentation Osher Rafaeli et.al. 2410.01473v1 link
2024-10-02 The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs Hong Li et.al. 2410.01417v1 null
2024-10-02 AgriCLIP: Adapting CLIP for Agriculture and Livestock via Domain-Specialized Cross-Model Alignment Umair Nawaz et.al. 2410.01407v1 link
2024-10-02 Toward Zero-Shot Learning for Visual Dehazing of Urological Surgical Robots Renkai Wu et.al. 2410.01395v1 link
2024-10-02 Takin-VC: Zero-shot Voice Conversion via Jointly Hybrid Content and Memory-Augmented Context-Aware Timbre Modeling Yuguang Yang et.al. 2410.01350v1 null
2024-09-30 Uni$^2$Det: Unified and Universal Framework for Prompt-Guided Multi-dataset 3D Detection Yubin Wang et.al. 2409.20558v1 null
2024-09-30 Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos Md Mohaiminul Islam et.al. 2409.20557v1 null
2024-09-30 Robi Butler: Remote Multimodal Interactions with Household Robot Assistant Anxing Xiao et.al. 2409.20548v1 null
2024-09-30 FreeMask: Rethinking the Importance of Attention Masks for Zero-Shot Video Editing Lingling Cai et.al. 2409.20500v1 null
2024-10-01 Instance-adaptive Zero-shot Chain-of-Thought Prompting Xiaosong Yuan et.al. 2409.20441v2 null
2024-09-30 VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs Ruotong Liao et.al. 2409.20365v1 link
2024-09-30 CableInspect-AD: An Expert-Annotated Anomaly Detection Dataset Akshatha Arodi et.al. 2409.20353v1 link
2024-09-30 RL-GSBridge: 3D Gaussian Splatting Based Real2Sim2Real Method for Robotic Manipulation Learning Yuxuan Wu et.al. 2409.20291v1 null
2024-09-30 Analysing Zero-Shot Readability-Controlled Sentence Simplification Abdullah Barayan et.al. 2409.20246v1 null
2024-09-30 VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection Huilin Deng et.al. 2409.20146v1 null
2024-09-27 Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs Yanyuan Qiao et.al. 2409.18794v1 null
2024-09-27 When SAM2 Meets Video Camouflaged Object Segmentation: A Comprehensive Evaluation and Adaptation Yuli Zhou et.al. 2409.18653v1 link
2024-09-27 Do LLMs suffer from Multi-Party Hangover? A Diagnostic Approach to Addressee Recognition and Response Selection in Conversations Nicolò Penzo et.al. 2409.18602v1 link
2024-09-27 "Oh LLM, I'm Asking Thee, Please Give Me a Decision Tree": Zero-Shot Decision Tree Induction and Embedding with Large Language Models Ricardo Knauer et.al. 2409.18594v1 null
2024-09-27 EmoPro: A Prompt Selection Strategy for Emotional Expression in LM-based Speech Synthesis Haoyu Wang et.al. 2409.18512v1 null
2024-09-27 Exploring Language Model Generalization in Low-Resource Extractive QA Saptarshi Sengupta et.al. 2409.18446v1 link
2024-09-26 AER-LLM: Ambiguity-aware Emotion Recognition Leveraging Large Language Models Xin Hong et.al. 2409.18339v1 null
2024-09-26 Learning to Drive via Asymmetric Self-Play Chris Zhang et.al. 2409.18218v1 null
2024-09-26 Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction Jing He et.al. 2409.18124v1 null
2024-09-26 GSON: A Group-based Social Navigation Framework with Large Multimodal Model Shangyi Luo et.al. 2409.18084v1 null
2024-09-26 FreeEdit: Mask-free Reference-based Image Editing with Multi-modal Instruction Runze He et.al. 2409.18071v1 null
2024-09-26 DualAD: Dual-Layer Planning for Reasoning in Autonomous Driving Dingrui Wang et.al. 2409.18053v1 link
2024-09-26 IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning Soeun Lee et.al. 2409.18046v1 link
2024-09-26 Learning to Love Edge Cases in Formative Math Assessment: Using the AMMORE Dataset and Chain-of-Thought Prompting to Improve Grading Accuracy Owen Henkel et.al. 2409.17904v1 null
2024-09-26 Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models Hui-Po Wang et.al. 2409.17836v1 link
2024-09-27 Few-shot Pairwise Rank Prompting: An Effective Non-Parametric Retrieval Model Nilanjan Sinhababu et.al. 2409.17745v2 null
2024-09-26 AnyLogo: Symbiotic Subject-Driven Diffusion System with Gemini Status Jinghao Zhang et.al. 2409.17740v1 null
2024-09-26 Robust Ladder Climbing with a Quadrupedal Robot Dylan Vogel et.al. 2409.17731v1 null
2024-09-25 Can Vision Language Models Learn from Visual Demonstrations of Ambiguous Spatial Reasoning? Bowen Zhao et.al. 2409.17080v1 link
2024-09-25 ControlCity: A Multimodal Diffusion Model Based Approach for Accurate Geospatial Data Generation and Urban Morphology Analysis Fangshuo Zhou et.al. 2409.17049v1 link
2024-09-25 Detecting Temporal Ambiguity in Questions Bhawna Piryani et.al. 2409.17046v1 link
2024-09-25 Zero-Shot Detection of LLM-Generated Text using Token Cohesiveness Shixuan Ma et.al. 2409.16914v1 link
2024-09-25 Pruning Multilingual Large Language Models for Multilingual Inference Hwichan Kim et.al. 2409.16911v1 link
2024-09-25 Multi-objective Evolution of Heuristic Using Large Language Model Shunyu Yao et.al. 2409.16867v1 null
2024-09-25 Towards General Text-guided Image Synthesis for Customized Multimodal Brain MRI Generation Yulin Wang et.al. 2409.16818v1 link
2024-09-25 Vision-Language Model Fine-Tuning via Simple Parameter-Efficient Modification Ming Li et.al. 2409.16718v1 link
2024-09-24 Unsupervised Text Representation Learning via Instruction-Tuning for Zero-Shot Dense Retrieval Qiuhai Zeng et.al. 2409.16497v1 null
2024-09-24 BehAV: Behavioral Rule Guided Autonomy Using VLMs for Robot Navigation in Outdoor Scenes Kasun Weerakoon et.al. 2409.16484v1 null
2024-09-24 Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation Homanga Bharadhwaj et.al. 2409.16283v1 null
2024-09-24 Fields of The World: A Machine Learning Benchmark Dataset For Global Agricultural Field Boundary Segmentation Hannah Kerner et.al. 2409.16252v1 link
2024-09-24 Facial Expression-Enhanced TTS: Combining Face Representation and Emotion Intensity for Adaptive Speech Yunji Chu et.al. 2409.16203v1 null
2024-09-24 HA-FGOVD: Highlighting Fine-grained Attributes via Explicit Linear Composition for Open-Vocabulary Object Detection Yuqi Ma et.al. 2409.16136v1 null
2024-09-24 Evaluation of state-of-the-art ASR Models in Child-Adult Interactions Aditya Ashvin et.al. 2409.16135v1 null
2024-09-24 Bridging Environments and Language with Rendering Functions and Vision-Language Models Theo Cachet et.al. 2409.16024v1 null
2024-09-24 Finetuning LLMs for Comparative Assessment Tasks Vatsal Raina et.al. 2409.15979v1 null
2024-09-24 StyleSinger 2: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control Yu Zhang et.al. 2409.15977v1 link
2024-09-24 SLIMER-IT: Zero-Shot NER on Italian Language Andrew Zamai et.al. 2409.15933v1 link
2024-09-24 Zero-Shot Detection of AI-Generated Images Davide Cozzolino et.al. 2409.15875v1 null
2024-09-24 Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models Sijing Chen et.al. 2409.12139v3 null
2024-09-18 IMRL: Integrating Visual, Physical, Temporal, and Geometric Representations for Enhanced Food Acquisition Rui Liu et.al. 2409.12092v1 null
2024-09-18 Efficacy of Synthetic Data as a Benchmark Gaurav Maheshwari et.al. 2409.11968v1 null
2024-09-18 GauTOAO: Gaussian-based Task-Oriented Affordance of Objects Jiawen Wang et.al. 2409.11941v1 null
2024-09-18 LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Foundation Models Amaia Cardiel et.al. 2409.11919v1 null
2024-09-18 ABHINAW: A method for Automatic Evaluation of Typography within AI-Generated Images Abhinaw Jagtap et.al. 2409.11874v1 null
2024-09-18 One Map to Find Them All: Real-time Open-Vocabulary Mapping for Zero-shot Multi-Object Navigation Finn Lukas Busch et.al. 2409.11764v1 null
2024-09-18 Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation Haohan Guo et.al. 2409.11630v1 null
2024-09-17 Good Grasps Only: A data engine for self-supervised fine-tuning of pose estimation using grasp poses for verification Frederik Hagelskjær et.al. 2409.11512v1 null
2024-09-17 Enriching Datasets with Demographics through Large Language Models: What's in a Name? Khaled AlNuaimi et.al. 2409.11491v1 null
2024-09-17 Says Who? Effective Zero-Shot Annotation of Focalization Rebecca M. M. Hicke et.al. 2409.11390v1 null
2024-09-17 Towards Time Series Reasoning with LLMs Winnie Chow et.al. 2409.11376v1 null
2024-09-17 Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think Gonzalo Martin Garcia et.al. 2409.11355v1 link
2024-09-17 Zero Shot Text to Speech Augmentation for Automatic Speech Recognition on Low-Resource Accented Speech Corpora Francesco Nespoli et.al. 2409.11107v1 null
2024-09-17 TacDiffusion: Force-domain Diffusion Policy for Precise Tactile Manipulation Yansong Wu et.al. 2409.11047v1 null
2024-09-18 GEIC: Universal and Multilingual Named Entity Recognition with Large Language Models Hanjun Luo et.al. 2409.11022v2 link
2024-09-17 Relative Representations: Topological and Geometric Perspectives Alejandro García-Castellanos et.al. 2409.10967v1 link
2024-09-17 Multi-Floor Zero-Shot Object Navigation Policy Lingfeng Zhang et.al. 2409.10906v1 null
2024-09-17 Implicit Reasoning in Deep Time Series Forecasting Willa Potosnak et.al. 2409.10840v1 null
2024-09-18 Context-Dependent Interactable Graphical User Interface Element Detection for Spatial Computing Applications Shuqing Li et.al. 2409.10811v2 null
2024-09-16 Do Pre-trained Vision-Language Models Encode Object States? Kaleb Newman et.al. 2409.10488v1 null
2024-09-16 Prompt-and-Transfer: Dynamic Class-aware Enhancement for Few-shot Segmentation Hanbo Bi et.al. 2409.10389v1 null
2024-09-16 beeFormer: Bridging the Gap Between Semantic and Interaction Similarity in Recommender Systems Vojtěch Vančura et.al. 2409.10309v1 link
2024-09-16 SteeredMarigold: Steering Diffusion Towards Depth Completion of Largely Incomplete Depth Maps Jakub Gregorek et.al. 2409.10202v1 null
2024-09-16 SplatSim: Zero-Shot Sim2Real Transfer of RGB Manipulation Policies Using Gaussian Splatting Mohammad Nomaan Qureshi et.al. 2409.10161v1 null
2024-09-16 StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion Yinghao Aaron Li et.al. 2409.10058v1 null
2024-09-16 A Study on Zero-shot Non-intrusive Speech Assessment using Large Language Models Ryandhimas E. Zezario et.al. 2409.09914v1 null
2024-09-15 GRIN: Zero-Shot Metric Depth with Pixel-Level Diffusion Vitor Guizilini et.al. 2409.09896v1 null
2024-09-15 PROSE-FD: A Multimodal PDE Foundation Model for Learning Multiple Operators for Forecasting Fluid Dynamics Yuxuan Liu et.al. 2409.09811v1 null
2024-09-15 Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models Yuan-Hong Liao et.al. 2409.09788v1 null
2024-09-13 Data Efficient Child-Adult Speaker Diarization with Simulated Conversations Anfeng Xu et.al. 2409.08881v1 link
2024-09-13 A RAG Approach for Generating Competency Questions in Ontology Engineering Xueli Pan et.al. 2409.08820v1 null
2024-09-13 Electrocardiogram Report Generation and Question Answering via Retrieval-Augmented Self-Supervised Modeling Jialu Tang et.al. 2409.08788v1 null
2024-09-13 HOLA-Drone: Hypergraphic Open-ended Learning for Zero-Shot Multi-Drone Cooperative Pursuit Yang Li et.al. 2409.08767v1 null
2024-09-13 DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset Jiawei Du et.al. 2409.08731v1 link
2024-09-13 Eir: Thai Medical Large Language Models Yutthakorn Thiprak et.al. 2409.08523v1 null
2024-09-13 GroundingBooth: Grounding Text-to-Image Customization Zhexiao Xiong et.al. 2409.08520v1 null
2024-09-13 Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection Haoxuan Wang et.al. 2409.08513v1 link
2024-09-12 SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer Helin Wang et.al. 2409.08425v1 link
2024-09-12 Sequential Discrete Action Selection via Blocking Conditions and Resolutions Liam Merz Hoffmeister et.al. 2409.08410v1 null
2024-09-12 DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors Thomas Hanwen Zhu et.al. 2409.08278v1 null
2024-09-12 AnySkin: Plug-and-play Skin Sensing for Robotic Touch Raunaq Bhirangi et.al. 2409.08276v1 null
2024-09-12 Fine-tuning Large Language Models for Entity Matching Aaron Steiner et.al. 2409.08185v1 link
2024-09-12 The CLC-UKET Dataset: Benchmarking Case Outcome Prediction for the UK Employment Tribunal Huiyuan Xie et.al. 2409.08098v1 null
2024-09-12 EZIGen: Enhancing zero-shot subject-driven image generation with precise subject encoding and decoupled guidance Zicheng Duan et.al. 2409.08091v1 link
2024-09-12 Zero-Shot Sing Voice Conversion: built upon clustering-based phoneme representations Wangjin Zhou et.al. 2409.08039v1 null
2024-09-12 From Explanations to Action: A Zero-Shot, Theory-Driven LLM Framework for Student Performance Feedback Vinitra Swamy et.al. 2409.08027v1 null
2024-09-11 Zero-Shot Machine-Generated Text Detection Using Mixture of Large Language Models Matthieu Dubois et.al. 2409.07615v1 null
2024-09-11 Minimizing Embedding Distortion for Robust Out-of-Distribution Performance Tom Shaked et.al. 2409.07582v1 null
2024-09-11 SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis Helin Wang et.al. 2409.07556v1 link
2024-09-11 Online Decision MetaMorphFormer: A Causal Transformer-Based Reinforcement Learning Framework of Universal Embodied Intelligence Luo Ji et.al. 2409.07341v1 null
2024-09-11 Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering Weixi Weng et.al. 2409.07331v1 null
2024-09-11 PaveSAM Segment Anything for Pavement Distress Neema Jakisa Owor et.al. 2409.07295v1 null
2024-09-11 A Fine-grained Sentiment Analysis of App Reviews using Large Language Models: An Evaluation Study Faiz Ali Shah et.al. 2409.07162v1 link
2024-09-11 Zero-Shot Text-to-Speech as Golden Speech Generator: A Systematic Framework and its Applicability in Automatic Pronunciation Assessment Tien-Hong Lo et.al. 2409.07151v1 null
2024-09-11 Pushing the Limits of Vision-Language Models in Remote Sensing without Human Annotations Keumgang Cha et.al. 2409.07048v1 null
2024-09-10 ExIQA: Explainable Image Quality Assessment Using Distortion Attributes Sepehr Kazemi Ranjbar et.al. 2409.06853v1 null
2024-09-10 Translating Step-by-Step: Decomposing the Translation Process for Improved Translation Quality of Long-Form Texts Eleftheria Briakou et.al. 2409.06790v1 null
2024-09-11 EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis Danli Shi et.al. 2409.06644v2 null
2024-09-10 DemoStart: Demonstration-led auto-curriculum applied to sim-to-real with multi-fingered robots Maria Bauza et.al. 2409.06613v1 null
2024-09-10 An Effective Context-Balanced Adaptation Approach for Long-Tailed Speech Recognition Yi-Cheng Wang et.al. 2409.06468v1 null
2024-09-10 SpeechTaxi: On Multilingual Semantic Speech Classification Lennart Keller et.al. 2409.06372v1 null
2024-09-10 MAGDA: Multi-agent guideline-driven diagnostic assistance David Bani-Harouni et.al. 2409.06351v1 null
2024-09-10 PharmacoMatch: Efficient 3D Pharmacophore Screening through Neural Subgraph Matching Daniel Rose et.al. 2409.06316v1 null
2024-09-10 Advancing Topic Segmentation of Broadcasted Speech with Multilingual Semantic Embeddings Sakshi Deo Shukla et.al. 2409.06222v1 link
2024-09-10 Revisiting Prompt Pretraining of Vision-Language Models Zhenyuan Chen et.al. 2409.06166v1 null
2024-09-09 Differentiable programming across the PDE and Machine Learning barrier Nacime Bouziani et.al. 2409.06085v1 null
2024-09-09 FairHome: A Fair Housing and Fair Lending Dataset Anusha Bagalkotkar et.al. 2409.05990v1 null
2024-09-09 Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments Haritheja Etukuru et.al. 2409.05865v1 link
2024-09-10 Evaluating Multiview Object Consistency in Humans and Image Models Tyler Bonnen et.al. 2409.05862v2 link
2024-09-09 A Flexible Framework for Universal Computational Aberration Correction via Automatic Lens Library Generation and Domain Adaptation Qi Jiang et.al. 2409.05809v1 null
2024-09-09 AnomalyCD: A benchmark for Earth anomaly change detection with high-resolution and time-series observations Jingtao Li et.al. 2409.05679v1 null
2024-09-09 Zero-shot Outlier Detection via Prior-data Fitted Networks: Model Selection Bygone! Yuchen Shen et.al. 2409.05672v1 null
2024-09-09 CauseJudger: Identifying the Cause with LLMs for Abductive Logical Reasoning Jinwei He et.al. 2409.05559v1 null
2024-09-09 EndoOmni: Zero-Shot Cross-Dataset Depth Estimation in Endoscopy by Robust Self-Learning from Noisy Labels Qingyao Tian et.al. 2409.05442v1 link
2024-09-09 From Words to Poses: Enhancing Novel Object Pose Estimation with Vision Language Models Tessa Pulli et.al. 2409.05413v1 null
2024-09-09 NLLB-E5: A Scalable Multilingual Retrieval Model Arkadeep Acharya et.al. 2409.05401v1 null
2024-09-09 IndicVoices-R: Unlocking a Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS Ashwin Sankar et.al. 2409.05356v1 link
2024-09-06 FS-MedSAM2: Exploring the Potential of SAM2 for Few-Shot Medical Image Segmentation without Fine-tuning Yunhao Bai et.al. 2409.04298v1 link
2024-09-06 Prompt-based Personality Profiling: Reinforcement Learning for Relevance Filtering Jan Hofmann et.al. 2409.04122v1 null
2024-09-06 UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity Yicheng Fu et.al. 2409.04081v1 null
2024-09-06 AnyMatch -- Efficient Zero-Shot Entity Matching with a Small Language Model Zeyu Zhang et.al. 2409.04073v1 link
2024-09-06 Refining Wikidata Taxonomy using Large Language Models Yiwen Peng et.al. 2409.04056v1 link
2024-09-05 Deep Clustering of Remote Sensing Scenes through Heterogeneous Transfer Learning Isaac Ray et.al. 2409.03938v1 null
2024-09-05 A deep learning approach to wall-shear stress quantification: From numerical training to zero-shot experimental application Esther Lagemann et.al. 2409.03933v1 null
2024-09-05 Few-shot Adaptation of Medical Vision-Language Models Fereshteh Shakeri et.al. 2409.03868v1 link
2024-09-05 View-Invariant Policy Learning via Zero-Shot Novel View Synthesis Stephen Tian et.al. 2409.03685v1 null
2024-09-05 Text-Guided Mixup Towards Long-Tailed Image Categorization Richard Franklin et.al. 2409.03583v1 link
2024-09-05 FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation Xi Chen et.al. 2409.03525v1 null
2024-09-05 Have Large Vision-Language Models Mastered Art History? Ombretta Strafforello et.al. 2409.03521v1 null
2024-09-05 RoVi-Aug: Robot and Viewpoint Augmentation for Cross-Embodiment Robot Learning Lawrence Yunliang Chen et.al. 2409.03403v1 null
2024-09-05 Bringing the RT-1-X Foundation Model to a SCARA robot Jonathan Salzer et.al. 2409.03299v1 null
2024-09-05 LLM Detectors Still Fall Short of Real World: Case of LLM-Generated Short News-Like Posts Henrique Da Silva Gameiro et.al. 2409.03291v1 link
2024-09-05 iText2KG: Incremental Knowledge Graphs Construction Using Large Language Models Yassir Lairgi et.al. 2409.03284v1 link
2024-09-05 FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications Hao-Han Guo et.al. 2409.03283v1 null
2024-09-04 Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection Kaiqing Lin et.al. 2409.02664v1 null
2024-09-04 Evaluation Study on SAM 2 for Class-agnostic Instance-level Segmentation Tiantian Zhang et.al. 2409.02567v1 link
2024-09-04 StyleTokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models Wen Li et.al. 2409.02543v1 link
2024-09-04 Language is Scary when Over-Analyzed: Unpacking Implied Misogynistic Reasoning with Argumentation Theory-Driven Prompts Arianna Muti et.al. 2409.02519v1 null
2024-09-04 Dispelling Four Challenges in Inertial Motion Tracking with One Recurrent Inertial Graph-based Estimator (RING) Simon Bachhuber et.al. 2409.02502v1 null
2024-09-04 Boosting Generalizability towards Zero-Shot Cross-Dataset Single-Image Indoor Depth by Meta-Initialization Cho-Ying Wu et.al. 2409.02486v1 null
2024-09-04 Large Language Models as Efficient Reward Function Searchers for Custom-Environment Multi-Objective Reinforcement Learning Guanwen Xie et.al. 2409.02428v1 null
2024-09-03 Evaluation and Comparison of Visual Language Models for Transportation Engineering Problems Sanjita Prajapati et.al. 2409.02278v1 null
2024-09-05 LinFusion: 1 GPU, 1 Minute, 16K Image Songhua Liu et.al. 2409.02097v2 link
2024-09-03 DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos Wenbo Hu et.al. 2409.02095v1 link
2024-08-30 Bridging Episodes and Semantics: A Novel Framework for Long-Form Video Understanding Gueter Josmy Faure et.al. 2408.17443v1 link
2024-08-30 VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters Mouxiang Chen et.al. 2408.17253v1 link
2024-08-30 Reasoning AI Performance Degradation in 6G Networks with Large Language Models Liming Huang et.al. 2408.17097v1 null
2024-08-30 Focus-Consistent Multi-Level Aggregation for Compositional Zero-Shot Learning Fengyuan Dai et.al. 2408.17083v1 null
2024-08-29 Exploring Multiple Strategies to Improve Multilingual Coreference Resolution in CorefUD Ondřej Pražák et.al. 2408.16893v1 link
2024-08-29 Fluent and Accurate Image Captioning with a Self-Trained Reward Model Nicholas Moratelli et.al. 2408.16827v1 null
2024-08-29 PromptSmooth: Certifying Robustness of Medical Vision-Language Models via Prompt Learning Noor Hussein et.al. 2408.16769v1 link
2024-08-29 SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners Ziyu Guo et.al. 2408.16768v1 link
2024-08-29 Assessing Large Language Models for Online Extremism Research: Identification, Explanation, and New Knowledge Beidi Dong et.al. 2408.16749v1 null
2024-08-29 LLMs generate structurally realistic social networks but overestimate political homophily Serina Chang et.al. 2408.16629v1 link
2024-08-29 Adapting Vision-Language Models to Open Classes via Test-Time Prompt Tuning Zhengqing Gao et.al. 2408.16486v1 link
2024-08-29 WHISMA: A Speech-LLM to Perform Zero-shot Spoken Language Understanding Mohan Li et.al. 2408.16423v1 null
2024-08-29 Text-Enhanced Zero-Shot Action Recognition: A training-free approach Massimo Bosetti et.al. 2408.16412v1 null
2024-08-29 Bootstrap Segmentation Foundation Model under Distribution Shift via Object-Centric Learning Luyao Tang et.al. 2408.16310v1 link
2024-08-29 Training-free Video Temporal Grounding using Large-scale Pre-trained Models Minghang Zheng et.al. 2408.16219v1 link
2024-08-28 CardBench: A Benchmark for Learned Cardinality Estimation in Relational Databases Yannis Chronis et.al. 2408.16170v1 null
2024-08-29 Spatio-Temporal Context Prompting for Zero-Shot Action Detection Wei-Jhe Huang et.al. 2408.15996v2 null
2024-08-28 Multi-modal Adversarial Training for Zero-Shot Voice Cloning John Janiczek et.al. 2408.15916v1 null
2024-08-28 Visual Prompt Engineering for Medical Vision Language Models in Radiology Stefan Denner et.al. 2408.15802v1 null
2024-08-28 Interactive Agents: Simulating Counselor-Client Psychological Counseling via Role-Playing LLM-to-LLM Interactions Huachuan Qiu et.al. 2408.15787v1 link
2024-08-28 LM-PUB-QUIZ: A Comprehensive Framework for Zero-Shot Evaluation of Relational Knowledge in Language Models Max Ploner et.al. 2408.15729v1 null
2024-08-28 Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas Fabio Quattrini et.al. 2408.15660v1 link
2024-08-28 Learning dynamics models for velocity estimation in autonomous racing Jan Węgrzynowski et.al. 2408.15610v1 null
2024-08-28 Drop the beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation Ziqian Ning et.al. 2408.15474v1 null
2024-08-28 Adapting Segment Anything Model to Multi-modal Salient Object Detection with Semantic Feature Fusion Guidance Kunpeng Wang et.al. 2408.15063v2 link
2024-08-26 MONAS: Efficient Zero-Shot Neural Architecture Search for MCUs Ye Qiao et.al. 2408.15034v1 null
2024-08-27 Cross-Modal Learning for Chemistry Property Prediction: Large Language Models Meet Graph Machine Learning Sakhinana Sagar Srinivas et.al. 2408.14964v1 null
2024-08-27 ZeroMamba: Exploring Visual State Space Model for Zero-Shot Learning Wenjin Hou et.al. 2408.14868v1 null
2024-08-27 Points2Plans: From Point Clouds to Long-Horizon Plans with Composable Relational Dynamics Yixuan Huang et.al. 2408.14769v1 null
2024-08-26 Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning Xinyang Gu et.al. 2408.14472v1 link
2024-08-28 Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study Liuchang Xu et.al. 2408.14438v2 null
2024-08-26 Uncertainties of Latent Representations in Computer Vision Michael Kirchhof et.al. 2408.14281v1 null
2024-08-26 Self-supervised Speech Representations Still Struggle with African American Vernacular English Kalvin Chang et.al. 2408.14262v1 link
2024-08-26 AgentMove: Predicting Human Mobility Anywhere Using Large Language Model based Agentic Framework Jie Feng et.al. 2408.13986v1 link
2024-08-25 OpenNav: Efficient Open Vocabulary 3D Object Detection for Smart Wheelchair Navigation Muhammad Rameez ur Rahman et.al. 2408.13936v1 link
2024-08-25 Infrared Domain Adaptation with Zero-Shot Quantization Burak Sevsay et.al. 2408.13925v1 null
2024-08-25 LLMs are Superior Feedback Providers: Bootstrapping Reasoning for Lie Detection with Self-Generated Feedback Tanushree Banerjee et.al. 2408.13915v1 null
2024-08-25 Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs Brandon Smart et.al. 2408.13912v1 null
2024-08-25 Towards Completeness: A Generalizable Action Proposal Generator for Zero-Shot Temporal Action Localization Jia-Run Du et.al. 2408.13777v1 link
2024-08-23 On Class Separability Pitfalls In Audio-Text Contrastive Zero-Shot Learning Tiago Tavares et.al. 2408.13068v1 null
2024-08-23 WildFusion: Individual Animal Identification with Calibrated Similarity Fusion Vojtěch Cermak et.al. 2408.12934v1 link
2024-08-23 Unleashing the Potential of SAM2 for Biomedical Images and Videos: A Survey Yichi Zhang et.al. 2408.12889v1 link
2024-08-23 Predicting Affective States from Screen Text Sentiment Songyan Teng et.al. 2408.12844v1 null
2024-08-23 Examining the Commitments and Difficulties Inherent in Multimodal Foundation Models for Street View Imagery Zhenyuan Yang et.al. 2408.12821v1 null
2024-08-23 VALE: A Multimodal Visual and Language Explanation Framework for Image Classifiers using eXplainable AI and Language Models Purushothaman Natarajan et.al. 2408.12808v1 link
2024-08-23 Cap2Sum: Learning to Summarize Videos by Generating Captions Cairong Zhao et.al. 2408.12800v1 null
2024-08-22 Segment Anything Model for Grain Characterization in Hard Drive Design Kai Nichols et.al. 2408.12732v1 null
2024-08-22 Cell-ontology guided transcriptome foundation model Xinyu Yuan et.al. 2408.12373v1 null
2024-08-22 SAM-SP: Self-Prompting Makes SAM Great Again Chunpeng Zhou et.al. 2408.12364v1 null
2024-08-22 Adapt CLIP as Aggregation Instructor for Image Dehazing Xiaozhe Zhang et.al. 2408.12317v1 null
2024-08-22 Large Language Models Are Self-Taught Reasoners: Enhancing LLM Applications via Tailored Problem-Solving Demonstrations Kai Tzu-iunn Ong et.al. 2408.12315v1 null
2024-08-23 Tactile-Morph Skills: Energy-Based Control Meets Data-Driven Learning Anran Zhang et.al. 2408.12285v2 null
2024-08-22 Epsilon: Exploring Comprehensive Visual-Semantic Projection for Multi-Label Zero-Shot Learning Ziming Liu et.al. 2408.12253v1 null
2024-08-22 LLMs are not Zero-Shot Reasoners for Biomedical Information Extraction Aishik Nagar et.al. 2408.12249v1 null
2024-08-22 PRG: Prompt-Based Distillation Without Annotation via Proxy Relational Graph Yijin Xu et.al. 2408.12248v1 null
2024-08-22 OVA-DETR: Open Vocabulary Aerial Object Detection Using Image-Text Alignment and Fusion Guoting Wei et.al. 2408.12246v1 link
2024-08-23 Large Language Models as Foundations for Next-Gen Dense Retrieval: A Comprehensive Empirical Assessment Kun Luo et.al. 2408.12194v2 null
2024-08-21 Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction Anthony GX-Chen et.al. 2408.11816v1 null
2024-08-21 EmbodiedSAM: Online Segment Any 3D Thing in Real Time Xiuwei Xu et.al. 2408.11811v1 null
2024-08-21 Iterative Object Count Optimization for Text-to-image Diffusion Models Oz Zafar et.al. 2408.11721v1 null
2024-08-21 Memorization In In-Context Learning Shahriar Golchin et.al. 2408.11546v1 null
2024-08-21 Improvement Speaker Similarity for Zero-Shot Any-to-Any Voice Conversion of Whispered and Regular Speech Anastasia Avdeeva et.al. 2408.11528v1 null
2024-08-21 XDT-CXR: Investigating Cross-Disease Transferability in Zero-Shot Binary Classification of Chest X-Rays Umaima Rahman et.al. 2408.11493v1 link
2024-08-21 Enabling Small Models for Zero-Shot Classification through Model Label Learning Jia Zhang et.al. 2408.11449v1 null
2024-08-21 EMO-LLaMA: Enhancing Facial Emotion Understanding with Instruction Tuning Bohao Xing et.al. 2408.11424v1 link
2024-08-21 Plug, Play, and Fuse: Zero-Shot Joint Decoding via Word-Level Re-ranking Across Diverse Vocabularies Sai Koneru et.al. 2408.11327v1 null
2024-08-21 Towards Evaluating Large Language Models on Sarcasm Understanding Yazhou Zhang et.al. 2408.11319v1 null
2024-08-21 CrossFi: A Cross Domain Wi-Fi Sensing Framework Based on Siamese Network Zijian Zhao et.al. 2408.10919v2 null
2024-08-20 ViLReF: A Chinese Vision-Language Retinal Foundation Model Shengzhu Yang et.al. 2408.10894v1 link
2024-08-20 Open 3D World in Autonomous Driving Xinlong Cheng et.al. 2408.10880v1 null
2024-08-20 SSL-TTS: Leveraging Self-Supervised Embeddings and kNN Retrieval for Zero-Shot Multi-speaker TTS Karl El Hajal et.al. 2408.10771v1 null
2024-08-20 Crafting Tomorrow's Headlines: Neural News Generation and Detection in English, Turkish, Hungarian, and Persian Cem Üyük et.al. 2408.10724v1 null
2024-08-20 AnyGraph: Graph Foundation Model in the Wild Lianghao Xia et.al. 2408.10700v1 link
2024-08-20 Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches Yanjie Dong et.al. 2408.10691v1 null
2024-08-20 A Review of Human-Object Interaction Detection Yuxiao Wang et.al. 2408.10641v1 null
2024-08-20 LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models Yupeng Su et.al. 2408.10631v1 link
2024-08-20 Generalizable Facial Expression Recognition Yuhang Zhang et.al. 2408.10614v1 link
2024-08-19 SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models Anke Tang et.al. 2408.10174v1 link
2024-08-19 Video Object Segmentation via SAM 2: The 4th Solution for LSVOS Challenge VOS Track Feiyu Pan et.al. 2408.10125v1 null
2024-08-19 GLIMMER: Incorporating Graph and Lexical Features in Unsupervised Multi-Document Summarization Ran Liu et.al. 2408.10115v1 link
2024-08-19 Convert and Speak: Zero-shot Accent Conversion with Minimum Supervision Zhijun Jia et.al. 2408.10096v1 null
2024-08-19 CLIPCleaner: Cleaning Noisy Labels with CLIP Chen Feng et.al. 2408.10012v1 link
2024-08-19 Boosting Open-Domain Continual Learning via Leveraging Intra-domain Category-aware Prototype Yadong Lu et.al. 2408.09984v1 null
2024-08-19 Caption-Driven Explorations: Aligning Image and Text Embeddings through Human-Inspired Foveated Vision Dario Zanca et.al. 2408.09948v1 null
2024-08-19 DiscoNeRF: Class-Agnostic Object Field for 3D Object Discovery Corentin Dumery et.al. 2408.09928v1 null
2024-08-19 SAM-UNet: Enhancing Zero-Shot Segmentation of SAM for Universal Medical Images Sihan Yang et.al. 2408.09886v1 link
2024-08-19 Segment-Anything Models Achieve Zero-shot Robustness in Autonomous Driving Jun Yan et.al. 2408.09839v1 link
2024-08-16 DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models Eman Ali et.al. 2408.08855v1 null
2024-08-16 ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis Yubao Zhao et.al. 2408.08849v1 link
2024-08-16 EasyRec: Simple yet Effective Language Models for Recommendation Xubin Ren et.al. 2408.08821v1 link
2024-08-16 ChatZero: Zero-shot Cross-Lingual Dialogue Generation via Pseudo-Target Language Yongkang Liu et.al. 2408.08724v1 null
2024-08-16 TsCA: On the Semantic Consistency Alignment via Conditional Transport for Compositional Zero-Shot Learning Miaoge Li et.al. 2408.08703v1 null
2024-08-16 A Mean Field Ansatz for Zero-Shot Weight Transfer Xingyuan Chen et.al. 2408.08681v1 null
2024-08-16 GAPS: A Large and Diverse Classical Guitar Dataset and Benchmark Transcription Model Xavier Riley et.al. 2408.08653v1 null
2024-08-16 Persona is a Double-edged Sword: Enhancing the Zero-shot Reasoning by Ensembling the Role-playing and Neutral Prompts Junseok Kim et.al. 2408.08631v1 null
2024-08-16 Zero-Shot Dual-Path Integration Framework for Open-Vocabulary 3D Instance Segmentation Tri Ton et.al. 2408.08591v1 null
2024-08-16 CommunityKG-RAG: Leveraging Community Structures in Knowledge Graphs for Advanced Retrieval-Augmented Generation in Fact-Checking Rong-Ching Chang et.al. 2408.08535v1 null
2024-08-15 ScalingFilter: Assessing Data Quality through Inverse Utilization of Scaling Laws Ruihang Li et.al. 2408.08310v1 null
2024-08-16 Covert Bias: The Severity of Social Views' Unalignment in Language Models Towards Implicit and Explicit Opinion Abeer Aldayel et.al. 2408.08212v2 null
2024-08-15 Navigating Data Scarcity using Foundation Models: A Benchmark of Few-Shot and Zero-Shot Learning Approaches in Medical Imaging Stefano Woerner et.al. 2408.08058v1 link
2024-08-15 LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning Jiajie Li et.al. 2408.07981v1 null
2024-08-15 Advancing Multi-grained Alignment for Contrastive Language-Audio Pre-training Yiming Li et.al. 2408.07919v1 link
2024-08-15 DM2RM: Dual-Mode Multimodal Ranking for Target Objects and Receptacles Based on Open-Vocabulary Instructions Ryosuke Korekata et.al. 2408.07910v1 null
2024-08-15 A Spitting Image: Modular Superpixel Tokenization in Vision Transformers Marius Aasan et.al. 2408.07680v2 link
2024-08-14 Enhanced Detection of Conversational Mental Manipulation Through Advanced Prompting Techniques Ivory Yang et.al. 2408.07676v1 null
2024-08-14 SigmaRL: A Sample-Efficient and Generalizable Multi-Agent Reinforcement Learning Framework for Motion Planning Jianye Xu et.al. 2408.07644v1 link
2024-08-14 Exploring Large-Scale Language Models to Evaluate EEG-Based Multimodal Data for Mental Health Yongquan Hu et.al. 2408.07313v1 null
2024-08-14 MultiSurf-GPT: Facilitating Context-Aware Reasoning with Large-Scale Language Models for Multimodal Surface Sensing Yongquan Hu et.al. 2408.07311v1 null
2024-08-14 GQE: Generalized Query Expansion for Enhanced Text-Video Retrieval Zechen Bai et.al. 2408.07249v1 null
2024-08-13 Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents Pranav Putta et.al. 2408.07199v1 null
2024-08-13 PSM: Learning Probabilistic Embeddings for Multi-scale Zero-Shot Soundscape Mapping Subash Khanal et.al. 2408.07050v1 link
2024-08-15 Prompt-Based Segmentation at Multiple Resolutions and Lighting Conditions using Segment Anything Model 2 Osher Rafaeli et.al. 2408.06970v2 null
2024-08-13 How Aligned are Human Chart Takeaways and LLM Predictions? A Case Study on Bar Charts with Varying Layouts Huichen Will Wang et.al. 2408.06837v1 null
2024-08-13 PRESENT: Zero-Shot Text-to-Prosody Control Perry Lam et.al. 2408.06827v1 link
2024-08-13 Visual Neural Decoding via Improved Visual-EEG Semantic Consistency Hongzhou Chen et.al. 2408.06788v1 null
2024-08-13 Do Vision-Language Foundational models show Robust Visual Perception? Shivam Chandhok et.al. 2408.06781v1 link
2024-08-13 DC3DO: Diffusion Classifier for 3D Objects Nursena Koprucu et.al. 2408.06693v1 link
2024-08-13 CROME: Cross-Modal Adapters for Efficient Multimodal LLM Sayna Ebrahimi et.al. 2408.06610v1 null
2024-08-12 UniT: Unified Tactile Representation for Robot Learning Zhengtong Xu et.al. 2408.06481v1 link
2024-08-12 From SAM to SAM 2: Exploring Improvements in Meta's Segment Anything Model Athulya Sundaresan Geetha et.al. 2408.06305v1 null
2024-08-12 Synthetic Patient-Physician Dialogue Generation from Clinical Notes Using LLM Trisha Das et.al. 2408.06285v1 null
2024-08-12 A RAG-Based Question-Answering Solution for Cyber-Attack Investigation and Attribution Sampath Rajapaksha et.al. 2408.06272v1 null
2024-08-12 3D Reconstruction of Protein Structures from Multi-view AFM Images using Neural Radiance Fields (NeRFs) Jaydeep Rade et.al. 2408.06244v1 null
2024-08-12 Zero-shot 3D Segmentation of Abdominal Organs in CT Scans Using Segment Anything Model 2: Adapting Video Tracking Capabilities for 3D Medical Imaging Yosuke Yamagishi et.al. 2408.06170v1 null
2024-08-12 OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning Mushui Liu et.al. 2408.06158v1 link
2024-08-12 Text2Interaction: Establishing Safe and Preferable Human-Robot Interaction Jakob Thumm et.al. 2408.06105v1 link
2024-08-12 Towards Robust Monocular Depth Estimation in Non-Lambertian Surfaces Junrui Zhang et.al. 2408.06083v1 null
2024-08-12 Perceptual Similarity for Measuring Decision-Making Style and Policy Diversity in Games Chiu-Chou Lin et.al. 2408.06051v1 link
2024-08-12 Unseen No More: Unlocking the Potential of CLIP for Generative Zero-shot HOI Detection Yixin Guo et.al. 2408.05974v1 link
2024-08-09 Enhancing the Code Debugging Ability of LLMs via Communicative Agent Based Data Refinement Weiqing Yang et.al. 2408.05006v1 null
2024-08-09 SEA-SQL: Semantic-Enhanced Text-to-SQL with Adaptive Refinement Chaofan Li et.al. 2408.04919v1 null
2024-08-09 Towards a Generative Approach for Emotion Detection and Reasoning Ankita Bhaumik et.al. 2408.04906v1 null
2024-08-09 ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation Mengcheng Lan et.al. 2408.04883v1 link
2024-08-09 On the Element-Wise Representation and Reasoning in Zero-Shot Image Recognition: A Systematic Survey Jingcai Guo et.al. 2408.04879v1 link
2024-08-09 ChatGPT Meets Iris Biometrics Parisa Farmanifard et.al. 2408.04868v1 null
2024-08-09 An Evaluation of Standard Statistical Models and LLMs on Time Series Forecasting Rui Cao et.al. 2408.04867v1 link
2024-08-09 One Shot is Enough for Sequential Infrared Small Target Segmentation Bingbing Dan et.al. 2408.04823v1 link
2024-08-09 FUSE-ing Language Models: Zero-Shot Adapter Discovery for Prompt Optimization Across Tokenizers Joshua Nathaniel Williams et.al. 2408.04816v1 link
2024-08-08 Novel adaptation of video segmentation to 3D MRI: efficient zero-shot knee segmentation with SAM2 Andrew Seohwan Yu et.al. 2408.04762v1 null
2024-08-08 Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics Ruining Li et.al. 2408.04631v1 null
2024-08-08 SAM 2 in Robotic Surgery: An Empirical Evaluation for Robustness and Generalization in Surgical Video Segmentation Jieming Yu et.al. 2408.04593v1 null
2024-08-08 SCENE: Evaluating Explainable AI Techniques Using Soft Counterfactuals Haoran Zheng et.al. 2408.04575v1 null
2024-08-08 Conversational Prompt Engineering Liat Ein-Dor et.al. 2408.04560v1 null
2024-08-08 Depth Any Canopy: Leveraging Depth Foundation Models for Canopy Height Estimation Daniele Rege Cambrin et.al. 2408.04523v1 link
2024-08-08 Model-Based Transfer Learning for Contextual Reinforcement Learning Jung-Hoon Cho et.al. 2408.04498v1 link
2024-08-08 Towards Synergistic Deep Learning Models for Volumetric Cirrhotic Liver Segmentation in MRIs Vandan Gorade et.al. 2408.04491v1 null
2024-08-08 KnowPC: Knowledge-Driven Programmatic Reinforcement Learning for Zero-shot Coordination Yin Gu et.al. 2408.04336v1 null
2024-08-08 Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP François Remy et.al. 2408.04303v1 link
2024-08-08 Learning to Rewrite: Generalized LLM-Generated Text Detection Wei Hao et.al. 2408.04237v1 null
2024-08-07 Achieving Human Level Competitive Robot Table Tennis David B. D'Ambrosio et.al. 2408.03906v1 null
2024-08-07 Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond Beomseok Lee et.al. 2408.03900v1 link
2024-08-07 Unlocking Exocentric Video-Language Data for Egocentric Video Representation Learning Zi-Yi Dou et.al. 2408.03567v1 null
2024-08-07 Leveraging LLMs for Enhanced Open-Vocabulary 3D Scene Understanding in Autonomous Driving Amirhosein Chahe et.al. 2408.03516v1 null
2024-08-07 Accuracy and Consistency of LLMs in the Registered Dietitian Exam: The Impact of Prompt Engineering and Knowledge Retrieval Iman Azimi et.al. 2408.02964v2 link
2024-08-06 Few-shot Scooping Under Domain Shift via Simulated Maximal Deployment Gaps Yifan Zhu et.al. 2408.02949v1 null
2024-08-05 Interactive 3D Medical Image Segmentation with SAM 2 Chuyun Shen et.al. 2408.02635v1 link
2024-08-05 Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection Ting Lei et.al. 2408.02484v1 link
2024-08-07 TGS: Trajectory Generation and Selection using Vision Language Models in Mapless Outdoor Environments Daeun Song et.al. 2408.02454v2 null
2024-08-05 Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages Carlos Mullov et.al. 2408.02290v1 null
2024-08-05 Geometric Algebra Meets Large Language Models: Instruction-Based Transformations of Separate Meshes in 3D, Interactive and Controllable Scenes Dimitris Angelis et.al. 2408.02275v1 null
2024-08-05 Explain via Any Concept: Concept Bottleneck Model with Open Vocabulary Concepts Andong Tan et.al. 2408.02265v1 null
2024-08-05 Evaluating Vision-Language Models for Zero-Shot Detection, Classification, and Association of Motorcycles, Passengers, and Helmets Lucas Choi et.al. 2408.02244v1 null
2024-08-05 Do Large Language Models Speak All Languages Equally? A Comparative Study in Low-Resource Settings Md. Arid Hasan et.al. 2408.02237v1 null
2024-08-05 ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning Yuxuan Wang et.al. 2408.02210v1 null
2024-08-05 Large-scale Deployment of Vision-based Tactile Sensors on Multi-fingered Grippers Meng Wang et.al. 2408.02206v1 null
2024-08-02 Improving Multilingual Neural Machine Translation by Utilizing Semantic and Linguistic Features Mengyu Bu et.al. 2408.01394v1 link
2024-08-02 Toward Automatic Relevance Judgment using Vision-Language Models for Image-Text Retrieval Evaluation Jheng-Hong Yang et.al. 2408.01363v1 null
2024-08-02 Prompt Refinement or Fine-tuning? Best Practices for using LLMs in Computational Social Science Tasks Anders Giovanni Møller et.al. 2408.01346v1 null
2024-08-02 Out-Of-Distribution Detection for Audio-visual Generalized Zero-Shot Learning: A General Framework Liuyuan Wen et.al. 2408.01284v1 link
2024-08-02 HeteroMorpheus: Universal Control Based on Morphological Heterogeneity Modeling YiFan Hao et.al. 2408.01230v1 link
2024-08-05 Agentic LLM Workflows for Generating Patient-Friendly Medical Reports Malavikha Sudarshan et.al. 2408.01112v2 link
2024-08-02 An Encoding-Searching Separation Perspective on Bi-Encoder Neural Search Hung-Nghiep Tran et.al. 2408.01094v1 null
2024-08-02 UNER: A Unified Prediction Head for Named Entity Recognition in Visually-rich Documents Yi Tu et.al. 2408.01038v1 null
2024-08-01 Towards Zero-Shot Annotation of the Built Environment with Vision-Language Models (Vision Paper) Bin Han et.al. 2408.00932v1 null
2024-08-01 Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation Siyu Jiao et.al. 2408.00744v1 link
2024-08-01 Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions Guangzhi Xiong et.al. 2408.00727v1 link
2024-08-01 SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data Yichen Lu et.al. 2408.00624v1 link
2024-08-01 A new approach for encoding code and assisting code understanding Mengdan Fan et.al. 2408.00521v1 null
2024-08-01 GalleryGPT: Analyzing Paintings with Large Multimodal Models Yi Bin et.al. 2408.00491v1 link
2024-08-01 SF-TIM: A Simple Framework for Enhancing Quadrupedal Robot Jumping Agility by Combining Terrain Imagination and Measurement Ze Wang et.al. 2408.00486v1 null
2024-08-01 Few-shot Defect Image Generation based on Consistency Modeling Qingfeng Shi et.al. 2408.00372v1 link
2024-08-01 IN-Sight: Interactive Navigation through Sight Philipp Schoch et.al. 2408.00343v1 null
2024-07-31 Open-Vocabulary Audio-Visual Semantic Segmentation Ruohao Guo et.al. 2407.21721v1 null
2024-07-31 Zero-Shot Cross-Domain Dialogue State Tracking via Dual Low-Rank Adaptation Xiang Luo et.al. 2407.21633v1 link
2024-07-31 EZSR: Event-based Zero-Shot Recognition Yan Yang et.al. 2407.21616v1 null
2024-07-31 Fine-gained Zero-shot Video Sampling Dengsheng Chen et.al. 2407.21475v1 null
2024-07-31 Generalized Tampered Scene Text Detection in the era of Generative AI Chenfan Qu et.al. 2407.21422v1 null
2024-07-31 Tree-of-Traversals: A Zero-Shot Reasoning Algorithm for Augmenting Black-box Language Models with Knowledge Graphs Elan Markowitz et.al. 2407.21358v1 link
2024-07-31 DEF-oriCORN: efficient 3D scene understanding for robust language-directed manipulation without demonstrations Dongwon Son et.al. 2407.21267v1 null
2024-07-30 Learning Stable Robot Grasping with Transformer-based Tactile Control Policies En Yen Puang et.al. 2407.21172v1 link
2024-07-30 Zero Shot Health Trajectory Prediction Using Transformer Pawel Renc et.al. 2407.21124v1 link
2024-07-30 Prompting Encoder Models for Zero-Shot Classification: A Cross-Domain Study in Italian Serena Auriemma et.al. 2407.20654v1 null
2024-07-30 Pruning Large Language Models with Semi-Structural Adaptive Sparse Training Weiyu Huang et.al. 2407.20584v1 link
2024-07-29 Evaluating Large Language Models for automatic analysis of teacher simulations David de-Fitero-Dominguez et.al. 2407.20360v1 null
2024-07-29 Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing Ekaterina Iakovleva et.al. 2407.20232v1 null
2024-07-29 QAEA-DR: A Unified Text Augmentation Framework for Dense Retrieval Hongming Tan et.al. 2407.20207v1 null
2024-07-29 Diffusion Feedback Helps CLIP See Better Wenxuan Wang et.al. 2407.20171v1 link
2024-07-29 Generative Diffusion Model Bootstraps Zero-shot Classification of Fetal Ultrasound Images In Underrepresented African Populations Fangyijie Wang et.al. 2407.20072v1 link
2024-07-29 Leveraging Foundation Models for Zero-Shot IoT Sensing Dinghao Xue et.al. 2407.19893v1 link
2024-07-29 Map2Traj: Street Map Piloted Zero-shot Trajectory Generation with Diffusion Model Zhenyu Tao et.al. 2407.19765v1 null
2024-07-29 Enhancing Code Translation in Language Models with Few-Shot Learning via Retrieval-Augmented Generation Manish Bhattarai et.al. 2407.19619v1 null
2024-07-29 AgEval: A Benchmark for Zero-Shot and Few-Shot Plant Stress Phenotyping with Multimodal LLMs Muhammad Arbab Arshad et.al. 2407.19617v1 null
2024-07-28 XLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training Biao Wu et.al. 2407.19546v1 link
2024-07-28 Motamot: A Dataset for Revealing the Supremacy of Large Language Models over Transformer Models in Bengali Political Sentiment Analysis Fatema Tuj Johora Faria et.al. 2407.19528v1 link
2024-07-26 Automatic Detection of Moral Values in Music Lyrics Vjosa Preniqi et.al. 2407.18787v1 link
2024-07-26 Adversarial Robustification via Text-to-Image Diffusion Models Daewon Choi et.al. 2407.18658v1 link
2024-07-29 Speech Bandwidth Expansion Via High Fidelity Generative Adversarial Networks Mahmoud Salhab et.al. 2407.18571v2 null
2024-07-26 Is larger always better? Evaluating and prompting large language models for non-generative medical tasks Yinghao Zhu et.al. 2407.18525v1 link
2024-07-26 Lensless fiber endomicroscopic phase imaging with speckle-conditioned diffusion model Zhaoqing Chen et.al. 2407.18456v1 null
2024-07-26 HybridDepth: Robust Depth Fusion for Mobile AR by Leveraging Depth from Focus and Single-Image Priors Ashkan Ganj et.al. 2407.18443v1 link
2024-07-25 HDL-GPT: High-Quality HDL is All You Need Bhuvnesh Kumar et.al. 2407.18423v1 null
2024-07-25 Adapting Mouse Pathological Model to Human Glomerular Lesion Segmentation Lining Yu et.al. 2407.18390v1 null
2024-07-25 Robust Claim Verification Through Fact Detection Nazanin Jafari et.al. 2407.18367v1 link
2024-07-25 SSTD: Stripe-Like Space Target Detection using Single-Point Supervision Zijian Zhu et.al. 2407.18097v1 null
2024-07-25 Audio Entailment: Assessing Deductive Reasoning for Audio Understanding Soham Deshmukh et.al. 2407.18062v1 link
2024-07-25 Difficulty Estimation and Simplification of French Text Using LLMs Henri Jamet et.al. 2407.18061v1 null
2024-07-25 I can listen but cannot read: An evaluation of two-tower multimodal systems for instrument recognition Yannis Vasilakis et.al. 2407.18058v1 link
2024-07-25 Amortized Active Learning for Nonparametric Functions Cen-You Li et.al. 2407.17992v1 null
2024-07-25 BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation Xiang Zhang et.al. 2407.17952v1 null
2024-07-25 DAM: Towards A Foundation Model for Time Series Forecasting Luke Darlow et.al. 2407.17880v1 null
2024-07-25 Exploring Description-Augmented Dataless Intent Classification Ruoyu Hu et.al. 2407.17862v1 link
2024-07-25 Scaling A Simple Approach to Zero-Shot Speech Recognition Jinming Zhao et.al. 2407.17852v1 link
2024-07-24 Large Language Models for Anomaly Detection in Computational Workflows: from Supervised Fine-Tuning to In-Context Learning Hongwei Jin et.al. 2407.17545v1 link
2024-07-24 3D Question Answering for City Scene Understanding Penglei Sun et.al. 2407.17398v1 null
2024-07-24 Label Alignment and Reassignment with Generalist Large Language Model for Enhanced Cross-Domain Named Entity Recognition Ke Bao et.al. 2407.17344v1 null
2024-07-24 Multi-label Cluster Discrimination for Visual Representation Learning Xiang An et.al. 2407.17331v1 link
2024-07-24 DarSwin-Unet: Distortion Aware Encoder-Decoder Architecture Akshaya Athwale et.al. 2407.17328v1 null
2024-07-24 Graph Neural Networks: A suitable Alternative to MLPs in Latent 3D Medical Image Classification? Johannes Kiechle et.al. 2407.17219v1 link
2024-07-24 Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech SpeechT5 Model Jan Lehečka et.al. 2407.17167v1 null
2024-07-23 PlantTrack: Task-Driven Plant Keypoint Tracking with Zero-Shot Sim2Real Transfer Samhita Marri et.al. 2407.16829v1 null
2024-07-23 Fusion and Cross-Modal Transfer for Zero-Shot Human Action Recognition Abhi Kamboj et.al. 2407.16803v1 null
2024-07-23 Category-Extensible Out-of-Distribution Detection via Hierarchical Context Descriptions Kai Liu et.al. 2407.16725v1 link
2024-07-23 Lawma: The Power of Specialization for Legal Tasks Ricardo Dominguez-Olmedo et.al. 2407.16615v1 null
2024-07-23 Multimodal Unlearnable Examples: Protecting Data against Multimodal Contrastive Learning Xinwei Liu et.al. 2407.16307v1 link
2024-07-23 PreAlign: Boosting Cross-Lingual Transfer by Early Establishment of Multilingual Alignment Jiahuan Li et.al. 2407.16222v1 link
2024-07-23 No Re-Train, More Gain: Upgrading Backbones with Diffusion Model for Few-Shot Segmentation Shuai Chen et.al. 2407.16182v1 null
2024-07-23 Improved Few-Shot Image Classification Through Multiple-Choice Questions Dipika Khullar et.al. 2407.16145v1 null
2024-07-22 Test-Time Low Rank Adaptation via Confidence Maximization for Zero-Shot Generalization of Vision-Language Models Raza Imam et.al. 2407.15913v1 link
2024-07-22 AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description Junyu Xie et.al. 2407.15850v1 link
2024-07-22 Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget Vikash Sehwag et.al. 2407.15811v1 null
2024-07-22 AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection Yunkang Cao et.al. 2407.15795v1 link
2024-07-22 CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning Emanuele Frascaroli et.al. 2407.15793v1 link
2024-07-22 Zero-Shot Embeddings Inform Learning and Forgetting with Vision-Language Encoders Laura Niss et.al. 2407.15731v1 null
2024-07-23 Multi-Modality Co-Learning for Efficient Skeleton-based Action Recognition Jinfu Liu et.al. 2407.15706v2 link
2024-07-22 SLVideo: A Sign Language Video Moment Retrieval Framework Gonçalo Vinagre Martins et.al. 2407.15668v1 null
2024-07-23 Visual-Semantic Decomposition and Partial Alignment for Document-based Zero-Shot Learning Xiangyan Qu et.al. 2407.15613v2 link
2024-07-22 High-flexibility reconstruction of small-scale motions in wall turbulence using a generalized zero-shot learning Haokai Wu et.al. 2407.15604v1 null
2024-07-22 X-Recon: Learning-based Patient-specific High-Resolution CT Reconstruction from Orthogonal X-Ray Images Yunpeng Wang et.al. 2407.15356v1 link
2024-07-19 Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models Xuenan Xu et.al. 2407.14355v1 link
2024-07-19 Multimodal Misinformation Detection using Large Vision-Language Models Sahar Tahmasebi et.al. 2407.14321v1 null
2024-07-19 Foundation Models for Autonomous Robots in Unstructured Environments Hossein Naderi et.al. 2407.14296v1 null
2024-07-19 OpenSU3D: Open World 3D Scene Understanding using Foundation Models Rafay Mohiuddin et.al. 2407.14279v1 null
2024-07-19 ESP-MedSAM: Efficient Self-Prompting SAM for Universal Domain-Generalized Medical Image Segmentation Qing Xu et.al. 2407.14153v1 link
2024-07-19 Zero-Shot Underwater Gesture Recognition Sandipan Sarma et.al. 2407.14103v1 link
2024-07-19 Multi-modal Relation Distillation for Unified 3D Representation Learning Huiqun Wang et.al. 2407.14007v1 null
2024-07-19 Enhancing Data-Limited Graph Neural Networks by Actively Distilling Knowledge from Large Language Models Quan Li et.al. 2407.13989v1 null
2024-07-18 Attention Based Simple Primitives for Open World Compositional Zero-Shot Learning Ans Munir et.al. 2407.13715v1 link
2024-07-18 MeshSegmenter: Zero-Shot Mesh Semantic Segmentation via Texture Synthesis Ziming Zhong et.al. 2407.13675v1 link
2024-07-18 Robust Calibration of Large Vision-Language Adapters Balamurali Murugesan et.al. 2407.13588v1 link
2024-07-18 Towards Zero-Shot Multimodal Machine Translation Matthieu Futeral et.al. 2407.13579v1 link
2024-07-18 Pushing the Limits of Reactive Planning: Learning to Escape Local Minima Isar Meijer et.al. 2407.13530v1 null
2024-07-18 INDIC QA BENCHMARK: A Multilingual Benchmark to Evaluate Question Answering capability of LLMs for Indic Languages Abhishek Kumar Singh et.al. 2407.13522v1 null
2024-07-18 Can Open-Source LLMs Compete with Commercial Models? Exploring the Few-Shot Performance of Current GPT Models in Biomedical Tasks Samy Ateia et.al. 2407.13511v1 link
2024-07-18 SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders Sheng-Wei Li et.al. 2407.13460v1 link
2024-07-18 BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models Moon Ye-Bin et.al. 2407.13442v1 null
2024-07-18 Open-World Visual Reasoning by a Neuro-Symbolic Program of Zero-Shot Symbols Gertjan Burghouts et.al. 2407.13382v1 null
2024-07-17 Zero-shot Text-guided Infinite Image Synthesis with LLM guidance Soyeong Kwon et.al. 2407.12642v1 null
2024-07-17 Evaluating the transferability potential of deep learning models for climate downscaling Ayush Prasad et.al. 2407.12517v1 null
2024-07-17 Evaluating Linguistic Capabilities of Multimodal LLMs in the Lens of Few-Shot Learning Mustafa Dogan et.al. 2407.12498v1 null
2024-07-17 TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish Arda Yüksel et.al. 2407.12402v1 link
2024-07-17 Exploring Deeper! Segment Anything Model with Depth Perception for Camouflaged Object Detection Zhenni Yu et.al. 2407.12339v1 link
2024-07-17 ModalChorus: Visual Probing and Alignment of Multi-modal Embeddings via Modal Fusion Map Yilin Ye et.al. 2407.12315v1 link
2024-07-17 VCP-CLIP: A visual context prompting model for zero-shot anomaly segmentation Zhen Qu et.al. 2407.12276v1 link
2024-07-17 Compound Expression Recognition via Multi Model Ensemble for the ABAW7 Challenge Xuxiong Liu et.al. 2407.12257v1 null
2024-07-17 Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech Haibin Wu et.al. 2407.12229v1 link
2024-07-16 Scaling Sign Language Translation Biao Zhang et.al. 2407.11855v1 null
2024-07-16 Zero-shot Cross-Lingual Transfer for Synthetic Data Generation in Grammatical Error Detection Gaetan Lopez Latouche et.al. 2407.11854v1 null
2024-07-16 Mask-guided cross-image attention for zero-shot in-silico histopathologic image generation with a diffusion model Dominik Winter et.al. 2407.11664v1 null
2024-07-16 A Comprehensive Evaluation of Large Language Models on Temporal Event Forecasting He Chang et.al. 2407.11638v1 null
2024-07-16 DiNO-Diffusion. Scaling Medical Diffusion via Self-Supervised Pre-Training Guillermo Jimenez-Perez et.al. 2407.11594v1 null
2024-07-16 Bootstrapped Pre-training with Dynamic Identifier Prediction for Generative Retrieval Yubao Tang et.al. 2407.11504v1 null
2024-07-16 Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes Zhi Cai et.al. 2407.11464v1 link
2024-07-16 InvAgent: A Large Language Model based Multi-Agent System for Inventory Management in Supply Chains Yinzhu Quan et.al. 2407.11384v1 link
2024-07-16 Large Vision-Language Models as Emotion Recognizers in Context Awareness Yuxuan Lei et.al. 2407.11300v1 null
2024-07-16 Zero-Shot Adaptation for Approximate Posterior Sampling of Diffusion Models in Inverse Problems Yaşar Utku Alçalar et.al. 2407.11288v1 null
2024-07-15 Motion-prior Contrast Maximization for Dense Continuous-Time Motion Estimation Friedhelm Hamann et.al. 2407.10802v1 link
2024-07-15 Graphusion: Leveraging Large Language Models for Scientific Knowledge Graph Fusion and Construction in NLP Education Rui Yang et.al. 2407.10794v1 link
2024-07-15 Codebook LLMs: Adapting Political Science Codebooks for LLM Use and Adapting LLMs to Follow Codebooks Andrew Halterman et.al. 2407.10747v1 null
2024-07-15 Anticipating Future Object Compositions without Forgetting Youssef Zahran et.al. 2407.10723v1 null
2024-07-16 Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning Yulong Wang et.al. 2407.10718v2 link
2024-07-15 MixGR: Enhancing Retriever Generalization for Scientific Domain through Complementary Granularity Fengyu Cai et.al. 2407.10691v1 link
2024-07-15 OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer Yu Wang et.al. 2407.10655v1 link
2024-07-16 Back to Newton's Laws: Learning Vision-based Agile Flight via Differentiable Physics Yuang Zhang et.al. 2407.10648v2 null
2024-07-15 Towards zero-shot amplifier modeling: One-to-many amplifier modeling via tone embedding control Yu-Hua Chen et.al. 2407.10646v1 null
2024-07-15 Boosting Zero-Shot Crosslingual Performance using LLM-Based Augmentations with Effective Data Selection Barah Fazili et.al. 2407.10582v1 link
2024-07-12 Adaptive Prediction Ensemble: Improving Out-of-Distribution Generalization of Motion Forecasting Jinning Li et.al. 2407.09475v1 null
2024-07-12 From Easy to Hard: Learning Curricular Shape-aware Features for Robust Panoptic Scene Graph Generation Hanrong Shi et.al. 2407.09191v1 null
2024-07-12 STD-LLM: Understanding Both Spatial and Temporal Properties of Spatial-Temporal Data with LLMs Yiheng Huang et.al. 2407.09096v1 null
2024-07-12 OVExp: Open Vocabulary Exploration for Object-Oriented Navigation Meng Wei et.al. 2407.09016v1 null
2024-07-15 Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation Biqing Qi et.al. 2407.08940v2 link
2024-07-11 DegustaBot: Zero-Shot Visual Preference Estimation for Personalized Multi-Object Rearrangement Benjamin A. Newman et.al. 2407.08876v1 null
2024-07-11 Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification Wenshuo Peng et.al. 2407.08787v1 null
2024-07-11 Real-Time Anomaly Detection and Reactive Planning with Large Language Models Rohan Sinha et.al. 2407.08735v1 null
2024-07-11 Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data Cherie Ho et.al. 2407.08726v1 null
2024-07-11 HACMan++: Spatially-Grounded Motion Primitives for Manipulation Bowen Jiang et.al. 2407.08585v1 null
2024-07-11 Tactics, Techniques, and Procedures (TTPs) in Interpreted Malware: A Zero-Shot Generation with Large Language Models Ying Zhang et.al. 2407.08532v1 null
2024-07-11 Emergent Visual-Semantic Hierarchies in Image-Text Representations Morris Alper et.al. 2407.08521v1 link
2024-07-11 Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Cross-Regularization Jinlong Li et.al. 2407.08374v1 null
2024-07-11 Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation Tong Shao et.al. 2407.08268v1 link
2024-07-11 Adaptive Compressed Sensing with Diffusion-Based Posterior Sampling Noam Elata et.al. 2407.08256v1 null
2024-07-11 Leveraging LLMs to Predict Affective States via Smartphone Sensor Features Tianyi Zhang et.al. 2407.08240v1 null
2024-07-11 Spiking Tucker Fusion Transformer for Audio-Visual Zero-Shot Learning Wenrui Li et.al. 2407.08130v1 null
2024-07-10 Learning In-Hand Translation Using Tactile Skin With Shear and Normal Force Sensing Jessica Yin et.al. 2407.07885v1 null
2024-07-11 Toto: Time Series Optimized Transformer for Observability Ben Cohen et.al. 2407.07874v2 null
2024-07-10 OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion Hao Wang et.al. 2407.07844v1 link
2024-07-10 Systematic Evaluation of Neural Retrieval Models on the Touché 2020 Argument Retrieval Subset of BEIR Nandan Thakur et.al. 2407.07790v1 link
2024-07-11 SaMoye: Zero-shot Singing Voice Conversion Based on Feature Disentanglement and Synthesis Zihao Wang et.al. 2407.07728v2 link
2024-07-10 Scaling Law in Neural Data: Non-Invasive Speech Decoding with 175 Hours of EEG Data Motoshige Sato et.al. 2407.07595v1 null
2024-07-10 Let Occ Flow: Self-Supervised 3D Occupancy Flow Prediction Yili Liu et.al. 2407.07587v1 null
2024-07-11 InstructLayout: Instruction-Driven 2D and 3D Layout Synthesis with Semantic Graph Prior Chenguo Lin et.al. 2407.07580v2 null
2024-07-10 Swiss DINO: Efficient and Versatile Vision Framework for On-device Personal Object Search Kirill Paramonov et.al. 2407.07541v1 link
2024-07-10 IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection Mingjin Zhang et.al. 2407.07520v1 link
2024-07-09 Induction Heads as an Essential Mechanism for Pattern Matching in In-context Learning J. Crosbie et.al. 2407.07011v1 null
2024-07-09 Measuring Sustainability Intention of ESG Fund Disclosure using Few-Shot Learning Mayank Singh et.al. 2407.06893v1 null
2024-07-09 Rethinking Image-to-Video Adaptation: An Object-centric Perspective Rui Qian et.al. 2407.06871v1 null
2024-07-09 PDEformer-1: A Foundation Model for One-Dimensional Partial Differential Equations Zhanhong Ye et.al. 2407.06664v1 null
2024-07-09 Variational Zero-shot Multispectral Pansharpening Xiangyu Rui et.al. 2407.06633v1 link
2024-07-09 CEIA: CLIP-Based Event-Image Alignment for Open-World Event-Based Understanding Wenhao Xu et.al. 2407.06611v1 null
2024-07-09 VQA-Diff: Exploiting VQA and Diffusion for Zero-Shot Image-to-3D Vehicle Asset Generation in Autonomous Driving Yibo Liu et.al. 2407.06516v1 null
2024-07-08 CodeCSE: A Simple Multilingual Model for Code and Comment Sentence Embeddings Anthony Varkey et.al. 2407.06360v1 link
2024-07-08 CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation Xinying Guo et.al. 2407.06188v1 null
2024-07-08 C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition Rongchang Li et.al. 2407.06113v1 link
2024-07-08 Pseudo-triplet Guided Few-shot Composed Image Retrieval Bohan Hou et.al. 2407.06001v1 null
2024-07-08 Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation Jiaqi Chen et.al. 2407.05890v1 null
2024-07-08 HyCIR: Boosting Zero-Shot Composed Image Retrieval with Synthetic Labels Yingying Jiang et.al. 2407.05795v1 null
2024-07-08 When is the consistent prediction likely to be a correct prediction? Alex Nguyen et.al. 2407.05778v1 null
2024-07-08 Is GPT-4 Alone Sufficient for Automated Essay Scoring?: A Comparative Judgment Approach Based on Rater Cognition Seungju Kim et.al. 2407.05733v1 null
2024-07-08 Learning to Adapt Category Consistent Meta-Feature of CLIP for Few-Shot Classification Jiaying Shi et.al. 2407.05647v1 null
2024-07-08 GenFollower: Enhancing Car-Following Prediction with Large Language Models Xianda Chen et.al. 2407.05611v1 null
2024-07-08 Open-world Multi-label Text Classification with Extremely Weak Supervision Xintong Li et.al. 2407.05609v1 link
2024-07-05 LaRa: Efficient Large-Baseline Radiance Fields Anpei Chen et.al. 2407.04699v1 null
2024-07-05 ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models Yuzhe Gu et.al. 2407.04693v1 link
2024-07-05 RAM: Retrieval-Based Affordance Transfer for Generalizable Zero-Shot Robotic Manipulation Yuxuan Kuang et.al. 2407.04689v1 link
2024-07-05 Entity Decomposition with Filtering: A Zero-Shot Clinical Named Entity Recognition Framework Reza Averly et.al. 2407.04629v1 null
2024-07-05 AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation Yuhan Zhu et.al. 2407.04603v1 link
2024-07-05 GPT vs RETRO: Exploring the Intersection of Retrieval and Parameter-Efficient Fine-Tuning Aleksander Ficek et.al. 2407.04528v1 null
2024-07-05 AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents Petr Anokhin et.al. 2407.04363v1 link
2024-07-05 Elevating All Zero-Shot Sketch-Based Image Retrieval Through Multimodal Prompt Learning Mainak Singha et.al. 2407.04207v1 link
2024-07-04 Query-Guided Self-Supervised Summarization of Nursing Notes Ya Gao et.al. 2407.04125v1 null
2024-07-04 FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs Tongyi SpeechTeam et.al. 2407.04051v1 link
2024-07-03 Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation Marco Mistretta et.al. 2407.03056v1 link
2024-07-03 SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning Bac Nguyen et.al. 2407.03036v1 null
2024-07-03 FSM: A Finite State Machine Based Zero-Shot Prompting Paradigm for Multi-Hop Question Answering Xiaochen Wang et.al. 2407.02964v1 null
2024-07-03 LANE: Logic Alignment of Non-tuning Large Language Models and Online Recommendation Systems for Explainable Reason Generation Hongke Zhao et.al. 2407.02833v1 null
2024-07-03 ZEAL: Surgical Skill Assessment with Zero-shot Tool Inference Using Unified Foundation Model Satoshi Kondo et.al. 2407.02738v1 null
2024-07-02 LLM-Select: Feature Selection with Large Language Models Daniel P. Jeong et.al. 2407.02694v1 null
2024-07-02 Open Panoramic Segmentation Junwei Zheng et.al. 2407.02685v1 link
2024-07-02 Lung-CADex: Fully automatic Zero-Shot Detection and Classification of Lung Nodules in Thoracic CT Images Furqan Shaukat et.al. 2407.02625v1 null
2024-07-02 Open Scene Graphs for Open World Object-Goal Navigation Joel Loo et.al. 2407.02473v1 null
2024-07-02 SafaRi: Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation Sayan Nag et.al. 2407.02389v1 null
2024-07-02 Exploring the Role of Transliteration in In-Context Learning for Low-resource Languages Written in Non-Latin Scripts Chunlan Ma et.al. 2407.02320v1 null
2024-07-02 Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization Yuchen Hu et.al. 2407.02243v1 null
2024-07-02 FineCLIPER: Multi-modal Fine-grained CLIP for Dynamic Facial Expression Recognition with AdaptERs Haodong Chen et.al. 2407.02157v1 null
2024-07-02 Zero-shot Video Restoration and Enhancement Using Pre-Trained Image Diffusion Model Cong Cao et.al. 2407.01960v1 null
2024-07-02 Text-Aware Diffusion for Policy Learning Calvin Luo et.al. 2407.01903v1 null
2024-07-01 DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models Chang-Han Yeh et.al. 2407.01519v1 link
2024-07-01 Semantic Compositions Enhance Vision-Language Contrastive Learning Maxwell Aladago et.al. 2407.01408v1 null
2024-07-01 PanopticRecon: Leverage Open-vocabulary Instance Segmentation for Zero-shot Panoptic Reconstruction Xuan Yu et.al. 2407.01349v1 null
2024-06-28 STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical Guohao Sun et.al. 2406.19973v1 link
2024-06-28 Perception Stitching: Zero-Shot Perception Encoder Transfer for Visuomotor Robot Policies Pingcheng Jian et.al. 2406.19971v1 null
2024-06-28 Untangling the Unrestricted Web: Automatic Identification of Multilingual Registers Erik Henriksson et.al. 2406.19892v1 link
2024-06-28 Detecting Subtle Differences between Human and Model Languages Using Spectrum of Relative Likelihood Yang Xu et.al. 2406.19874v1 link
2024-06-27 Leveraging Machine-Generated Rationales to Facilitate Social Meaning Detection in Conversations Ritam Dutt et.al. 2406.19545v1 link
2024-06-27 The Model Arena for Cross-lingual Sentiment Analysis: A Comparative Study in the Era of Large Language Models Xiliang Zhu et.al. 2406.19358v1 null
2024-06-27 IndoToxic2024: A Demographically-Enriched Dataset of Hate Speech and Toxicity Types for Indonesian Language Lucky Susanto et.al. 2406.19349v1 null
2024-06-27 Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment Hao Fei et.al. 2406.19255v1 null
2024-06-30 Segment Anything Model for automated image data annotation: empirical studies using text prompts from Grounding DINO Fuseini Mumuni et.al. 2406.19057v2 null
2024-06-27 Zero-shot domain adaptation based on dual-level mix and contrast Yu Zhe et.al. 2406.18996v1 null
2024-06-28 Manipulate-Anything: Automating Real-World Robots using Vision-Language Models Jiafei Duan et.al. 2406.18915v2 null
2024-06-27 DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment Ke-Han Lu et.al. 2406.18871v1 null
2024-06-27 Advancing Cross-domain Discriminability in Continual Learning of Vison-Language Models Yicheng Xu et.al. 2406.18868v1 link
2024-06-27 Dense Monocular Motion Segmentation Using Optical Flow and Pseudo Depth Map: A Zero-Shot Approach Yuxiang Huang et.al. 2406.18837v1 null
2024-06-27 Zero-shot Composed Image Retrieval Considering Query-target Relationship Leveraging Masked Image-text Pairs Huaying Zhang et.al. 2406.18836v1 null
2024-06-26 Role-Play Zero-Shot Prompting with Large Language Models for Open-Domain Human-Machine Conversation Ahmed Njifenjou et.al. 2406.18460v1 null
2024-06-26 Zero-shot prompt-based classification: topic labeling in times of foundation models in German Tweets Simon Münker et.al. 2406.18239v1 null
2024-06-26 Open-vocabulary Mobile Manipulation in Unseen Dynamic Environments with 3D Semantic Maps Dicong Qiu et.al. 2406.18115v1 null
2024-06-26 Boosting Soft Q-Learning by Bounding Jacob Adamczyk et.al. 2406.18033v1 link
2024-06-26 E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS Sefik Emre Eskimez et.al. 2406.18009v1 link
2024-06-26 Changen2: Multi-Temporal Remote Sensing Generative Change Foundation Model Zhuo Zheng et.al. 2406.17998v1 link
2024-06-25 Evaluating Fairness in Large Vision-Language Models Across Diverse Demographic Attributes and Prompts Xuyang Wu et.al. 2406.17974v1 link
2024-06-25 Efficient Document Ranking with Learnable Late Interactions Ziwei Ji et.al. 2406.17968v1 null
2024-06-25 The Overcooked Generalisation Challenge Constantin Ruhdorfer et.al. 2406.17949v1 null
2024-06-25 CTBench: A Comprehensive Benchmark for Evaluating Language Model Capabilities in Clinical Trial Design Nafis Neehal et.al. 2406.17888v1 link
2024-06-25 Arboretum: A Large Multimodal Dataset Enabling AI for Biodiversity Chih-Hsuan Yang et.al. 2406.17720v1 link
2024-06-25 LaTable: Towards Large Tabular Models Boris van Breugel et.al. 2406.17673v1 null
2024-06-26 SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond Marco Comunità et.al. 2406.17672v2 null
2024-06-26 Mitigate the Gap: Investigating Approaches for Improving Cross-Modal Alignment in CLIP Sedigheh Eslami et.al. 2406.17639v2 link
2024-06-25 Advancing Cell Detection in Anterior Segment Optical Coherence Tomography Images Boyu Chen et.al. 2406.17577v1 link
2024-06-25 High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model Joun Yeop Lee et.al. 2406.17310v1 null
2024-06-25 Zero-Shot Long-Form Video Understanding through Screenplay Yongliang Wu et.al. 2406.17309v1 null
2024-06-24 CLERC: A Dataset for Legal Case Retrieval and Retrieval-Augmented Analysis Generation Abe Bohan Hou et.al. 2406.17186v1 link
2024-06-24 Multi-LogiEval: Towards Evaluating Multi-Step Logical Reasoning Ability of Large Language Models Nisarg Patel et.al. 2406.17169v1 link
2024-06-24 Vastextures: Vast repository of textures and PBR materials extracted from real-world images using unsupervised methods Sagi Eppel et.al. 2406.17146v1 null
2024-06-24 Can Quantum Computers Do Nothing? Alexander Nico-Katz et.al. 2406.16861v1 null
2024-06-24 USDC: A Dataset of User Stance and Dogmatism in Long Conversations Mounika Marreddy et.al. 2406.16833v1 null
2024-06-25 Towards Zero-Shot Text-To-Speech for Arabic Dialects Khai Duy Doan et.al. 2406.16751v2 null
2024-06-24 Evaluation of Language Models in the Medical Context Under Resource-Constrained Settings Andrea Posada et.al. 2406.16611v1 link
2024-06-24 eagerlearners at SemEval2024 Task 5: The Legal Argument Reasoning Task in Civil Procedure Hoorieh Sabzevari et.al. 2406.16490v1 link
2024-06-24 UniPSDA: Unsupervised Pseudo Semantic Data Augmentation for Zero-Shot Cross-Lingual Natural Language Understanding Dongyang Li et.al. 2406.16372v1 link
2024-06-24 EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records Yeonsu Kwon et.al. 2406.16341v1 link
2024-06-24 DemoRank: Selecting Effective Demonstrations for Large Language Models in Ranking Task Wenhan Liu et.al. 2406.16332v1 link
2024-06-24 Anomaly Detection of Tabular Data Using LLMs Aodong Li et.al. 2406.16308v1 null
2024-06-24 LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments Zixia Jia et.al. 2406.16294v1 link
2024-06-21 Masked Extended Attention for Zero-Shot Virtual Try-On In The Wild Nadav Orzech et.al. 2406.15331v1 null
2024-06-21 LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs Ziyan Jiang et.al. 2406.15319v1 null
2024-06-21 Retrieval Augmented Zero-Shot Text Classification Tassallah Abdullahi et.al. 2406.15241v1 link
2024-06-21 A LLM-Based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation Irune Zubiaga et.al. 2406.15227v1 link
2024-06-21 How Effective is GPT-4 Turbo in Generating School-Level Questions from Textbooks Based on Bloom's Revised Taxonomy? Subhankar Maity et.al. 2406.15211v1 null
2024-06-21 Prompting Whisper for QA-driven Zero-shot End-to-end Spoken Language Understanding Mohan Li et.al. 2406.15209v1 null
2024-06-21 Latent Space Translation via Inverse Relative Projection Valentino Maiorca et.al. 2406.15057v1 null
2024-06-21 Behaviour Distillation Andrei Lupu et.al. 2406.15042v1 link
2024-06-21 Contextual Interaction via Primitive-based Adversarial Training For Compositional Zero-shot Learning Suyi Li et.al. 2406.14962v1 link
2024-06-21 Sports Intelligence: Assessing the Sports Understanding Capabilities of Language Models through Question Answering from Text to Video Zhengbang Yang et.al. 2406.14877v1 null
2024-06-20 Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps Nikita Starodubcev et.al. 2406.14539v1 null
2024-06-20 APEER: Automatic Prompt Engineering Enhances Large Language Model Reranking Can Jin et.al. 2406.14449v1 null
2024-06-20 Transferable Boltzmann Generators Leon Klein et.al. 2406.14426v1 null
2024-06-20 Zero-Shot Image Denoising for High-Resolution Electron Microscopy Xuanyu Tian et.al. 2406.14264v1 link
2024-06-20 SeCoKD: Aligning Large Language Models for In-Context Learning with Fewer Shots Weixing Wang et.al. 2406.14208v1 null
2024-06-20 A Data-Driven Guided Decoding Mechanism for Diagnostic Captioning Panagiotis Kaliosis et.al. 2406.14164v1 link
2024-06-20 One Fling to Goal: Environment-aware Dynamics for Goal-conditioned Fabric Flinging Linhan Yang et.al. 2406.14136v1 null
2024-06-20 An Investigation of Prompt Variations for Zero-shot LLM-based Rankers Shuoqi Sun et.al. 2406.14117v1 link
2024-06-20 Understanding Different Design Choices in Training Large Time Series Models Yu-Neng Chuang et.al. 2406.14045v1 null
2024-06-20 Taxonomy-Guided Zero-Shot Recommendations with LLMs Yueqing Liang et.al. 2406.14043v1 link
2024-06-18 Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation Ning-Hsu Wang et.al. 2406.12849v1 null
2024-06-18 Generating Educational Materials with Different Levels of Readability using LLMs Chieh-Yang Huang et.al. 2406.12787v1 null
2024-06-18 MAC: A Benchmark for Multiple Attributes Compositional Zero-Shot Learning Shuo Xu et.al. 2406.12757v1 null
2024-06-19 Rationale-based Ensemble of Multiple QA Strategies for Zero-shot Knowledge-based VQA Miaoyu Li et.al. 2406.12746v2 link
2024-06-18 Large Language Model as a Universal Clinical Multi-task Decoder Yujiang Wu et.al. 2406.12738v1 null
2024-06-18 BIOSCAN-5M: A Multimodal Dataset for Insect Biodiversity Zahra Gharaee et.al. 2406.12723v1 link
2024-06-18 GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models Yongtao Ge et.al. 2406.12671v1 link
2024-06-18 Efficient and Long-Tailed Generalization for Pre-trained Vision-Language Model Jiang-Xin Shi et.al. 2406.12638v1 link
2024-06-18 News Without Borders: Domain Adaptation of Multilingual Sentence Embeddings for Cross-lingual News Recommendation Andreea Iana et.al. 2406.12634v1 link
2024-06-18 SeTAR: Out-of-Distribution Detection with Selective Low-Rank Approximation Yixia Li et.al. 2406.12629v1 link
2024-06-17 Zero-Shot Generalization during Instruction Tuning: Insights from Similarity and Granularity Bingxiang He et.al. 2406.11721v1 link
2024-06-17 TourRank: Utilizing Large Language Models for Documents Ranking with a Tournament-Inspired Strategy Yiqun Chen et.al. 2406.11678v1 link
2024-06-17 A Two-dimensional Zero-shot Dialogue State Tracking Evaluation Method using GPT-4 Ming Gu et.al. 2406.11651v1 link
2024-06-17 AnyMaker: Zero-shot General Object Customization via Decoupled Dual-Level ID Injection Lingjie Kong et.al. 2406.11643v1 link
2024-06-17 Can Many-Shot In-Context Learning Help Long-Context LLM Judges? See More, Judge Better! Mingyang Song et.al. 2406.11629v1 link
2024-06-17 Analysing zero-shot temporal relation extraction on clinical notes using temporal consistency Vasiliki Kougia et.al. 2406.11486v1 link
2024-06-17 How Far Can In-Context Alignment Go? Exploring the State of In-Context Alignment Heyan Huang et.al. 2406.11474v1 null
2024-06-17 Fusion Makes Perfection: An Efficient Multi-Grained Matching Approach for Zero-Shot Relation Extraction Shilong Li et.al. 2406.11429v1 link
2024-06-17 DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer Keon Lee et.al. 2406.11427v1 null
2024-06-17 BAMBINO-LM: (Bilingual-)Human-Inspired Continual Pretraining of BabyLM Zhewen Shen et.al. 2406.11418v1 null
2024-06-14 Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation Nameer Hirschkind et.al. 2406.10223v1 null
2024-06-14 Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models Carson Denison et.al. 2406.10162v1 link
2024-06-14 Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition Guinan Li et.al. 2406.10152v1 null
2024-06-14 Shelf-Supervised Multi-Modal Pre-Training for 3D Object Detection Mehar Khurana et.al. 2406.10115v1 link
2024-06-14 dGrasp: NeRF-Informed Implicit Grasp Policies with Supervised Optimization Slopes Gergely Sóti et.al. 2406.09939v1 null
2024-06-14 POWN: Prototypical Open-World Node Classification Marcel Hoffmann et.al. 2406.09926v1 link
2024-06-14 CliBench: Multifaceted Evaluation of Large Language Models in Clinical Decisions on Diagnoses, Procedures, Lab Tests Orders and Prescriptions Mingyu Derek Ma et.al. 2406.09923v1 link
2024-06-14 Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy Linhan Ma et.al. 2406.09844v1 null
2024-06-14 Language-Guided Manipulation with Diffusion Policies and Constrained Inpainting Ce Hao et.al. 2406.09767v1 null
2024-06-14 Learning Language Structures through Grounding Freda Shi et.al. 2406.09662v1 null
2024-06-13 VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding Muhammad Maaz et.al. 2406.09418v1 link
2024-06-13 Exploring the Spectrum of Visio-Linguistic Compositionality and Recognition Youngtaek Oh et.al. 2406.09388v1 link
2024-06-13 Scale-Invariant Monocular Depth Estimation via SSI Depth S. Mahdi H. Miangoleh et.al. 2406.09374v1 null
2024-06-13 **Learning from Nat
