Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-01-03 | Free-Form Motion Control: A Synthetic Video Generation Dataset with Controllable Camera and Object Motions | Xincheng Shuai et.al. | 2501.01425v2 | null |
2025-01-02 | On Unifying Video Generation and Camera Pose Estimation | Chun-Hao Paul Huang et.al. | 2501.01409v1 | null |
2025-01-02 | L3D-Pose: Lifting Pose for 3D Avatars from a Single Camera in the Wild | Soumyaratna Debnath et.al. | 2501.01174v1 | null |
2024-12-31 | Relative Pose Observability Analysis Using Dual Quaternions | Nicholas B. Andrews et.al. | 2501.00657v1 | null |
2024-12-31 | VinT-6D: A Large-Scale Object-in-hand Dataset from Vision, Touch and Proprioception | Zhaoliang Wan et.al. | 2501.00510v1 | null |
2024-12-30 | Hierarchical Pose Estimation and Mapping with Multi-Scale Neural Feature Fields | Evgenii Kruzhkov et.al. | 2412.20976v1 | null |
2024-12-30 | ReFlow6D: Refraction-Guided Transparent Object 6D Pose Estimation via Intermediate Representation Learning | Hrishikesh Gupta et.al. | 2412.20830v1 | link |
2024-12-30 | Frequency-aware Event Cloud Network | Hongwei Ren et.al. | 2412.20803v1 | null |
2024-12-30 | KeyGS: A Keyframe-Centric Gaussian Splatting Method for Monocular Image Sequences | Keng-Wei Chang et.al. | 2412.20767v1 | null |
2024-12-30 | Towards nation-wide analytical healthcare infrastructures: A privacy-preserving augmented knee rehabilitation case study | Boris Bačić et.al. | 2412.20733v1 | null |
2024-12-29 | Exploiting Aggregation and Segregation of Representations for Domain Adaptive Human Pose Estimation | Qucheng Peng et.al. | 2412.20538v1 | link |
2024-12-28 | MambaVO: Deep Visual Odometry Based on Sequential Matching Refinement and Training Smoothing | Shuo Wang et.al. | 2412.20082v1 | null |
2024-12-28 | GSplatLoc: Ultra-Precise Camera Localization via 3D Gaussian Splatting | Atticus J. Zeller et.al. | 2412.20056v1 | link |
2024-12-27 | Optimizing Local-Global Dependencies for Accurate 3D Human Pose Estimation | Guangsheng Xu et.al. | 2412.19676v1 | link |
2024-12-27 | Dust to Tower: Coarse-to-Fine Photo-Realistic Scene Reconstruction from Sparse Uncalibrated Images | Xudong Cai et.al. | 2412.19518v1 | null |
2024-12-26 | Humans as a Calibration Pattern: Dynamic 3D Scene Reconstruction from Unsynchronized and Uncalibrated Videos | Changwoon Choi et.al. | 2412.19089v1 | null |
2024-12-23 | Reconstructing People, Places, and Cameras | Lea Müller et.al. | 2412.17806v1 | null |
2024-12-22 | Leveraging Consistent Spatio-Temporal Correspondence for Robust Visual Odometry | Zhaoxing Zhang et.al. | 2412.16923v1 | null |
2024-12-21 | EasyVis2: A Real Time Multi-view 3D Visualization for Laparoscopic Surgery Training Enhanced by a Deep Neural Network YOLOv8-Pose | Yung-Hong Sun et.al. | 2412.16742v1 | null |
2024-12-21 | FACTS: Fine-Grained Action Classification for Tactical Sports | Christopher Lai et.al. | 2412.16454v1 | null |
2024-12-20 | Can Generative Video Models Help Pose Estimation? | Ruojin Cai et.al. | 2412.16155v1 | null |
2024-12-20 | Monkey Transfer Learning Can Improve Human Pose Estimation | Bradley Scott et.al. | 2412.15966v1 | null |
2024-12-19 | Scaling 4D Representations | João Carreira et.al. | 2412.15212v1 | null |
2024-12-13 | IMPROVE: Impact of Mobile Phones on Remote Online Virtual Education | Roberto Daza et.al. | 2412.14195v1 | link |
2024-12-18 | Level-Set Parameters: Novel Representation for 3D Shape Analysis | Huan Lei et.al. | 2412.13502v1 | null |
2024-12-18 | Pre-training a Density-Aware Pose Transformer for Robust LiDAR-based 3D Human Pose Estimation | Xiaoqi An et.al. | 2412.13454v1 | null |
2024-12-17 | CondiMen: Conditional Multi-Person Mesh Recovery | Brégier Romain et.al. | 2412.13058v1 | null |
2024-12-17 | ShotVL: Human-Centric Highlight Frame Retrieval via Language Queries | Wangyu Xue et.al. | 2412.12675v1 | null |
2024-12-16 | Category Level 6D Object Pose Estimation from a Single RGB Image using Diffusion | Adam Bethell et.al. | 2412.11420v1 | null |
2024-12-13 | ExeChecker: Where Did I Go Wrong? | Yiwen Gu et.al. | 2412.10573v1 | null |
2024-12-11 | CUPS: Improving Human Pose-Shape Estimators with Conformalized Deep Uncertainty | Harry Zhang et.al. | 2412.10431v1 | null |
2024-12-13 | RP-SLAM: Real-time Photorealistic SLAM with Efficient 3D Gaussian Splatting | Lizhi Bai et.al. | 2412.09868v1 | null |
2024-12-12 | Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos | Linyi Jin et.al. | 2412.09621v1 | null |
2024-12-12 | FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction | Jiale Xu et.al. | 2412.09573v1 | null |
2024-12-11 | BLADE: Single-view Body Mesh Learning through Accurate Depth Estimation | Shengze Wang et.al. | 2412.08640v1 | null |
2024-12-12 | Drift-free Visual SLAM using Digital Twins | Roxane Merat et.al. | 2412.08496v2 | null |
2024-12-11 | Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization | Siyan Dong et.al. | 2412.08376v1 | link |
2024-12-10 | LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation Models | Ziqi Lu et.al. | 2412.07746v1 | null |
2024-12-09 | MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds | Zhenggang Tang et.al. | 2412.06974v1 | null |
2024-12-09 | An Efficient Scene Coordinate Encoding and Relocalization Method | Kuan Xu et.al. | 2412.06488v1 | link |
2024-12-09 | Attention-Enhanced Lightweight Hourglass Network for Human Pose Estimation | Marsha Mariya Kappan et.al. | 2412.06227v1 | null |
2024-12-06 | CCS: Continuous Learning for Customized Incremental Wireless Sensing Services | Qunhang Fu et.al. | 2412.04821v1 | null |
2024-12-05 | ProPLIKS: Probablistic 3D human body pose estimation | Karthik Shetty et.al. | 2412.04665v1 | null |
2024-12-05 | DualPM: Dual Posed-Canonical Point Maps for 3D Shape and Pose Reconstruction | Ben Kaye et.al. | 2412.04464v1 | null |
2024-12-05 | Targeted Hard Sample Synthesis Based on Estimated Pose and Occlusion Error for Improved Object Pose Estimation | Alan Li et.al. | 2412.04279v1 | null |
2024-12-04 | Sparse-view Pose Estimation and Reconstruction via Analysis by Generative Synthesis | Qitao Zhao et.al. | 2412.03570v1 | null |
2024-12-06 | NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images | Lingen Li et.al. | 2412.03517v2 | null |
2024-12-05 | A Bidirectional Siamese Recurrent Neural Network for Accurate Gait Recognition Using Body Landmarks | Proma Hossain Progga et.al. | 2412.03498v2 | null |
2024-12-04 | MCVO: A Generic Visual Odometry for Arbitrarily Arranged Multi-Cameras | Huai Yu et.al. | 2412.03146v1 | link |
2024-12-04 | An indoor DSO-based ceiling-vision odometry system for indoor industrial environments | Abdelhak Bougouffa et.al. | 2412.02950v1 | null |
2024-12-03 | EgoCast: Forecasting Egocentric Human Pose in the Wild | Maria Escobar et.al. | 2412.02903v1 | null |
2024-12-02 | emg2pose: A Large and Diverse Benchmark for Surface Electromyographic Hand Pose Estimation | Sasha Salter et.al. | 2412.02725v1 | link |
2024-12-03 | ProbPose: A Probabilistic Approach to 2D Human Pose Estimation | Miroslav Purkrabek et.al. | 2412.02254v1 | null |
2024-12-03 | Cascaded Multi-Scale Attention for Enhanced Multi-Scale Feature Extraction and Interaction with Low-Resolution Images | Xiangyong Lu et.al. | 2412.02197v1 | link |
2024-12-03 | CLERF: Contrastive LEaRning for Full Range Head Pose Estimation | Ting-Ruen Wei et.al. | 2412.02066v1 | null |
2024-12-02 | Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle | Miroslav Purkrabek et.al. | 2412.01562v1 | link |
2024-12-02 | 6DOPE-GS: Online 6D Object Pose Estimation using Gaussian Splatting | Yufeng Jin et.al. | 2412.01543v1 | null |
2024-12-02 | HandOS: 3D Hand Reconstruction in One Stage | Xingyu Chen et.al. | 2412.01537v1 | null |
2024-12-02 | SF-Loc: A Visual Mapping and Geo-Localization System based on Sparse Visual Structure Frames | Yuxuan Zhou et.al. | 2412.01500v1 | link |
2024-12-02 | MamKPD: A Simple Mamba Baseline for Real-Time 2D Keypoint Detection | Yonghao Dang et.al. | 2412.01422v1 | null |
2024-12-02 | Cross-Modal Visual Relocalization in Prior LiDAR Maps Utilizing Intensity Textures | Qiyuan Shen et.al. | 2412.01299v1 | null |
2024-12-02 | CRISP: Object Pose and Shape Estimation with Test-Time Adaptation | Jingnan Shi et.al. | 2412.01052v1 | null |
2024-11-29 | Diorama: Unleashing Zero-shot Single-view 3D Scene Modeling | Qirui Wu et.al. | 2411.19492v1 | null |
2024-11-29 | Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning | Yang You et.al. | 2411.19458v1 | null |
2024-11-28 | GMS-VINS: Multi-category Dynamic Objects Semantic Segmentation for Enhanced Visual-Inertial Odometry Using a Promptable Foundation Model | Rui Zhou et.al. | 2411.19289v1 | null |
2024-11-28 | HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos | Prithviraj Banerjee et.al. | 2411.19167v1 | null |
2024-11-28 | Lost & Found: Updating Dynamic 3D Scene Graphs from Egocentric Observations | Tjark Behrens et.al. | 2411.19162v1 | link |
2024-11-28 | Distributed Dual Quaternion Extended Kalman Filtering for Spacecraft Pose Estimation | Mathias Hudoba de Badyn et.al. | 2411.19033v1 | null |
2024-11-28 | Waterfall Transformer for Multi-person Pose Estimation | Navin Ranjan et.al. | 2411.18944v1 | null |
2024-12-02 | AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers | Sherwin Bahmani et.al. | 2411.18673v2 | null |
2024-11-27 | XR-MBT: Multi-modal Full Body Tracking for XR through Self-Supervision with Learned Depth Point Cloud Registration | Denys Rozumnyi et.al. | 2411.18377v1 | null |
2024-11-27 | Manual-PA: Learning 3D Part Assembly from Instruction Diagrams | Jiahao Zhang et.al. | 2411.18011v1 | null |
2024-11-26 | Self-supervised Monocular Depth and Pose Estimation for Endoscopy with Generative Latent Priors | Ziang Xu et.al. | 2411.17790v1 | null |
2024-11-26 | Geometric Point Attention Transformer for 3D Shape Reassembly | Jiahan Li et.al. | 2411.17788v1 | null |
2024-11-26 | RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training | Raktim Gautam Goswami et.al. | 2411.17662v1 | null |
2024-11-26 | Communication-Efficient Cooperative SLAMMOT via Determining the Number of Collaboration Vehicles | Susu Fang et.al. | 2411.17432v1 | null |
2024-11-26 | Boost 3D Reconstruction using Diffusion-based Monocular Camera Calibration | Junyuan Deng et.al. | 2411.17240v1 | link |
2024-11-28 | SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting | Gyeongjin Kang et.al. | 2411.17190v3 | null |
2024-11-26 | GMFlow: Global Motion-Guided Recurrent Flow for 6D Object Pose Estimation | Xin Liu et.al. | 2411.17174v1 | null |
2024-11-25 | Diffusion Features for Zero-Shot 6DoF Object Pose Estimation | Bernd Von Gimborn et.al. | 2411.16668v1 | null |
2024-11-25 | Edge Weight Prediction For Category-Agnostic Pose Estimation | Or Hirschorn et.al. | 2411.16665v1 | link |
2024-11-25 | SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis | Hyojun Go et.al. | 2411.16443v1 | link |
2024-11-25 | One Diffusion to Generate Them All | Duong H. Le et.al. | 2411.16318v1 | link |
2024-11-25 | UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image | Xingyu Liu et.al. | 2411.16106v1 | null |
2024-11-24 | Generalizable Single-view Object Pose Estimation by Two-side Generating and Matching | Yujing Sun et.al. | 2411.15860v1 | link |
2024-11-24 | PEnG: Pose-Enhanced Geo-Localisation | Tavis Shore et.al. | 2411.15742v1 | null |
2024-11-22 | Personalization of Wearable Sensor-Based Joint Kinematic Estimation Using Computer Vision for Hip Exoskeleton Applications | Changseob Song et.al. | 2411.15366v1 | null |
2024-11-22 | Enhancing Exploration with Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile Manipulation | Huy Le et.al. | 2411.14913v1 | null |
2024-11-22 | mmWave Radar for Sit-to-Stand Analysis: A Comparative Study with Wearables and Kinect | Shuting Hu et.al. | 2411.14656v1 | null |
2024-11-21 | DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding | Tianhe Ren et.al. | 2411.14347v1 | link |
2024-11-21 | SEMPose: A Single End-to-end Network for Multi-object Pose Estimation | Xin Liu et.al. | 2411.14002v1 | null |
2024-11-21 | Dehazing-aided Multi-Rate Multi-Modal Pose Estimation Framework for Mitigating Visual Disturbances in Extreme Underwater Domain | Vidya Sudevan et.al. | 2411.13988v1 | null |
2024-11-21 | Hybrid-Neuromorphic Approach for Underwater Robotics Applications: A Conceptual Framework | Vidya Sudevan et.al. | 2411.13962v1 | null |
2024-11-20 | Developing Normative Gait Cycle Parameters for Clinical Analysis Using Human Pose Estimation | Rahm Ranjan et.al. | 2411.13716v1 | null |
2024-11-20 | Robust SG-NeRF: Robust Scene Graph Aided Neural Surface Reconstruction | Yi Gu et.al. | 2411.13620v1 | null |
2024-11-19 | VioPose: Violin Performance 4D Pose Estimation by Hierarchical Audiovisual Inference | Seong Jong Yoo et.al. | 2411.13607v1 | link |
2024-11-20 | DATAP-SfM: Dynamic-Aware Tracking Any Point for Robust Structure from Motion in the Wild | Weicai Ye et.al. | 2411.13291v1 | null |
2024-11-20 | X as Supervision: Contending with Depth Ambiguity in Unsupervised Monocular 3D Pose Estimation | Yuchen Yang et.al. | 2411.13026v1 | link |
2024-11-19 | IoT-Based 3D Pose Estimation and Motion Optimization for Athletes: Application of C3D and OpenPose | Fei Ren et.al. | 2411.12676v1 | null |
2024-11-15 | SPARS3R: Semantic Prior Alignment and Regularization for Sparse 3D Reconstruction | Yutao Tang et.al. | 2411.12592v1 | link |
2024-11-19 | GLOVER: Generalizable Open-Vocabulary Affordance Reasoning for Task-Oriented Grasping | Teli Ma et.al. | 2411.12286v1 | null |
2024-11-18 | IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos | Yunong Liu et.al. | 2411.11409v1 | link |
2024-11-15 | USP-Gaussian: Unifying Spike-based Image Reconstruction, Pose Correction and Gaussian Splatting | Kang Chen et.al. | 2411.10504v1 | link |
2024-11-13 | ReMP: Reusable Motion Prior for Multi-domain 3D Human Pose Estimation and Motion Inbetweening | Hojun Jang et.al. | 2411.09435v1 | null |
2024-11-13 | Generalized Pose Space Embeddings for Training In-the-Wild using Anaylis-by-Synthesis | Dominik Borer et.al. | 2411.08603v1 | null |
2024-11-13 | DG-SLAM: Robust Dynamic Gaussian Splatting SLAM with Hybrid Pose Optimization | Yueming Xu et.al. | 2411.08373v1 | null |
2024-11-16 | RINO: Accurate, Robust Radar-Inertial Odometry with Non-Iterative Estimation | Shuocheng Yang et.al. | 2411.07699v2 | link |
2024-11-12 | Human Arm Pose Estimation with a Shoulder-worn Force-Myography Device for Human-Robot Interaction | Rotem Atari et.al. | 2411.07644v1 | null |
2024-11-12 | Towards Seamless Integration of Magnetic Tracking into Fluoroscopy-guided Interventions | Shuwei Xing et.al. | 2411.07495v1 | null |
2024-11-08 | Acoustic-based 3D Human Pose Estimation Robust to Human Position | Yusuke Oumi et.al. | 2411.07165v1 | null |
2024-11-11 | CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models | Junho Kim et.al. | 2411.06869v1 | null |
2024-11-11 | GenZ-ICP: Generalizable and Degeneracy-Robust LiDAR Odometry Using an Adaptive Weighting | Daehan Lee et.al. | 2411.06766v1 | link |
2024-11-11 | GTA-Net: An IoT-Integrated 3D Human Pose Estimation System for Real-Time Adolescent Sports Posture Correction | Shizhe Yuan et.al. | 2411.06725v1 | null |
2024-11-10 | Magnetic Field Aided Vehicle Localization with Acceleration Correction | Mrunmayee Deshpande et.al. | 2411.06543v1 | null |
2024-11-10 | Visuotactile-Based Learning for Insertion with Compliant Hands | Osher Azulay et.al. | 2411.06408v1 | link |
2024-11-08 | Poze: Sports Technique Feedback under Data Constraints | Agamdeep Singh et.al. | 2411.05734v1 | null |
2024-11-08 | DeepArUco++: Improved detection of square fiducial markers in challenging lighting conditions | Rafael Berral-Soler et.al. | 2411.05552v1 | link |
2024-11-08 | Tightly-Coupled, Speed-aided Monocular Visual-Inertial Localization in Topological Map | Chanuk Yang et.al. | 2411.05497v1 | null |
2024-11-08 | Relative Pose Estimation for Nonholonomic Robot Formation with UWB-IO Measurements | Kunrui Ze et.al. | 2411.05481v1 | null |
2024-11-07 | Social EgoMesh Estimation | Luca Scofano et.al. | 2411.04598v1 | link |
2024-11-07 | Pose2Trajectory: Using Transformers on Body Pose to Predict Tennis Player's Trajectory | Ali K. AlShami et.al. | 2411.04501v1 | null |
2024-11-08 | SuperQ-GRASP: Superquadrics-based Grasp Pose Estimation on Larger Objects for Mobile-Manipulation | Xun Tu et.al. | 2411.04386v2 | null |
2024-11-08 | GS2Pose: Two-stage 6D Object Pose Estimation Guided by Gaussian Splatting | Jilan Mei et.al. | 2411.03807v3 | null |
2024-11-06 | Estimation of Psychosocial Work Environment Exposures Through Video Object Detection. Proof of Concept Using CCTV Footage | Claus D. Hansen et.al. | 2411.03724v1 | null |
2024-11-05 | Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data | Seunggeun Chi et.al. | 2411.03561v1 | null |
2024-11-05 | HFGaussian: Learning Generalizable Gaussian Human with Integrated Human Features | Arnab Dey et.al. | 2411.03086v1 | null |
2024-11-04 | Semantic Masking and Visual Feature Matching for Robust Localization | Luisa Mao et.al. | 2411.01804v1 | null |
2024-11-03 | Activating Self-Attention for Multi-Scene Absolute Pose Regression | Miso Lee et.al. | 2411.01443v1 | link |
2024-11-04 | 3D Equivariant Pose Regression via Direct Wigner-D Harmonics Prediction | Jongmin Lee et.al. | 2411.00543v2 | null |
2024-10-31 | Whole-Herd Elephant Pose Estimation from Drone Data for Collective Behavior Analysis | Brody McNutt et.al. | 2411.00196v1 | null |
2024-10-31 | No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images | Botao Ye et.al. | 2410.24207v1 | link |
2024-11-06 | SceneComplete: Open-World 3D Scene Completion in Complex Real World Environments for Robot Manipulation | Aditya Agarwal et.al. | 2410.23643v2 | null |
2024-10-30 | SCRREAM: SCan, Register, REnder And Map: A Framework for Annotating Accurate and Dense 3D Indoor Scenes with a Benchmark | HyunJun Jung et.al. | 2410.22715v1 | null |
2024-10-29 | LiVisSfM: Accurate and Robust Structure-from-Motion with LiDAR and Visual Cues | Hanqing Jiang et.al. | 2410.22213v1 | null |
2024-10-29 | PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting | Sunghwan Hong et.al. | 2410.22128v1 | link |
2024-10-29 | HRPVT: High-Resolution Pyramid Vision Transformer for medium and small-scale human pose estimation | Zhoujie Xu et.al. | 2410.22079v1 | null |
2024-10-29 | EI-Nexus: Towards Unmediated and Flexible Inter-Modality Local Feature Extraction and Matching for Event-Image Data | Zhonghua Yi et.al. | 2410.21743v1 | link |
2024-10-28 | Synthetica: Large Scale Synthetic Data for Robot Perception | Ritvik Singh et.al. | 2410.21153v1 | null |
2024-10-29 | BLAPose: Enhancing 3D Human Pose Estimation with Bone Length Adjustment | Chih-Hsiang Hsu et.al. | 2410.20731v2 | link |
2024-11-01 | RopeTP: Global Human Motion Recovery via Integrating Robust Pose Estimation with Diffusion Trajectory Prior | Mingjiang Liang et.al. | 2410.20358v2 | null |
2024-10-27 | Harmony4D: A Video Dataset for In-The-Wild Close Human Interactions | Rawal Khirodkar et.al. | 2410.20294v1 | null |
2024-10-26 | Neural Fields in Robotics: A Survey | Muhammad Zubair Irshad et.al. | 2410.20220v1 | link |
2024-10-25 | DECADE: Towards Designing Efficient-yet-Accurate Distance Estimation Modules for Collision Avoidance in Mobile Advanced Driver Assistance Systems | Muhammad Zaeem Shahzad et.al. | 2410.19336v1 | null |
2024-10-24 | Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction | Junyi Chen et.al. | 2410.18962v1 | null |
2024-10-24 | VoxelKeypointFusion: Generalizable Multi-View Multi-Person Pose Estimation | Daniel Bermuth et.al. | 2410.18723v1 | link |
2024-10-23 | Robust Two-View Geometry Estimation with Implicit Differentiation | Vladislav Pyatov et.al. | 2410.17983v1 | link |
2024-10-23 | YOLOv11: An Overview of the Key Architectural Enhancements | Rahima Khanam et.al. | 2410.17725v1 | link |
2024-10-21 | Assisted Physical Interaction: Autonomous Aerial Robots with Neural Network Detection, Navigation, and Safety Layers | Andrea Berra et.al. | 2410.15802v1 | null |
2024-10-21 | ARTS: Semi-Analytical Regressor using Disentangled Skeletal Representations for Human Mesh Recovery from Videos | Tao Tang et.al. | 2410.15582v1 | link |
2024-10-20 | Neural Active Structure-from-Motion in Dark and Textureless Environment | Kazuto Ichimaru et.al. | 2410.15378v1 | null |
2024-10-20 | POSE: Pose estimation Of virtual Sync Exhibit system | Hao-Tang Tsui et.al. | 2410.15343v1 | link |
2024-10-18 | Graph Optimality-Aware Stochastic LiDAR Bundle Adjustment with Progressive Spatial Smoothing | Jianping Li et.al. | 2410.14565v1 | null |
2024-10-18 | Multi-modal Pose Diffuser: A Multimodal Generative Conditional Pose Prior | Calvin-Khang Ta et.al. | 2410.14540v1 | null |
2024-10-18 | Sim2real Cattle Joint Estimation in 3D point clouds | Okour Mohammad et.al. | 2410.14419v1 | null |
2024-10-18 | Unlabeled Action Quality Assessment Based on Multi-dimensional Adaptive Constrained Dynamic Time Warping | Renguang Chen et.al. | 2410.14161v1 | null |
2024-10-15 | From Real Artifacts to Virtual Reference: A Robust Framework for Translating Endoscopic Images | Junyang Wu et.al. | 2410.13896v1 | null |
2024-10-17 | DualQuat-LOAM: LiDAR Odometry and Mapping parametrized on Dual Quaternions | Edison P. Velasco-Sánchez et.al. | 2410.13541v1 | null |
2024-10-17 | Object Pose Estimation Using Implicit Representation For Transparent Objects | Varun Burde et.al. | 2410.13465v1 | null |
2024-10-16 | Optimizing Multi-Task Learning for Accurate Spacecraft Pose Estimation | Francesco Evangelisti et.al. | 2410.12679v1 | null |
2024-10-15 | Contrastive Touch-to-Touch Pretraining | Samanta Rodriguez et.al. | 2410.11834v1 | null |
2024-10-18 | X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing | Xinyan Chen et.al. | 2410.10167v2 | null |
2024-10-13 | Occluded Human Pose Estimation based on Limb Joint Augmentation | Gangtao Han et.al. | 2410.09885v1 | null |
2024-10-12 | Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors | Hritam Basak et.al. | 2410.09467v1 | null |
2024-10-12 | Towards Multi-Modal Animal Pose Estimation: An In-Depth Analysis | Qianyi Deng et.al. | 2410.09312v1 | link |
2024-10-11 | CVAM-Pose: Conditional Variational Autoencoder for Multi-Object Monocular Pose Estimation | Jianyu Zhao et.al. | 2410.09010v1 | link |
2024-10-11 | Look Gauss, No Pose: Novel View Synthesis using Gaussian Splatting without Accurate Pose Initialization | Christian Schmidt et.al. | 2410.08743v1 | link |
2024-10-10 | Generalizing Stochastic Smoothing for Differentiation and Gradient Estimation | Felix Petersen et.al. | 2410.08125v1 | null |
2024-10-10 | Robotic framework for autonomous manipulation of laboratory equipment with different degrees of transparency via 6D pose estimation | Maria Makarova et.al. | 2410.07801v1 | null |
2024-10-10 | Optimal-State Dynamics Estimation for Physics-based Human Motion Capture from Videos | Cuong Le et.al. | 2410.07795v1 | link |
2024-10-12 | Autonomous Driving in Unstructured Environments: How Far Have We Come? | Chen Min et.al. | 2410.07701v2 | link |
2024-10-10 | Invisibility Cloak: Disappearance under Human Pose Estimation via Backdoor Attacks | Minxing Zhang et.al. | 2410.07670v1 | null |
2024-10-09 | OmniPose6D: Towards Short-Term Object Pose Tracking in Dynamic Scenes from Monocular RGB | Yunzhi Lin et.al. | 2410.06694v1 | null |
2024-10-08 | SpecTrack: Learned Multi-Rotation Tracking via Speckle Imaging | Ziyang Chen et.al. | 2410.06028v1 | link |
2024-10-08 | AIVIO: Closed-loop, Object-relative Navigation of UAVs with AI-aided Visual Inertial Odometry | Thomas Jantos et.al. | 2410.05996v1 | null |
2024-10-08 | Are Minimal Radial Distortion Solvers Necessary for Relative Pose Estimation? | Charalambos Tzamos et.al. | 2410.05984v1 | link |
2024-10-08 | FürElise: Capturing and Physically Synthesizing Hand Motions of Piano Performance | Ruocheng Wang et.al. | 2410.05791v1 | null |
2024-10-07 | Comparison of marker-less 2D image-based methods for infant pose estimation | Lennart Jahn et.al. | 2410.04980v1 | null |
2024-10-06 | Enhancing 3D Human Pose Estimation Amidst Severe Occlusion with Dual Transformer Fusion | Mehwish Ghafoor et.al. | 2410.04574v1 | link |
2024-10-06 | LiteVLoc: Map-Lite Visual Localization for Image Goal Navigation | Jianhao Jiao et.al. | 2410.04419v1 | null |
2024-10-05 | Test-Time Adaptation for Keypoint-Based Spacecraft Pose Estimation Based on Predicted-View Synthesis | Juan Ignacio Bravo Pérez-Villar et.al. | 2410.04298v1 | link |
2024-10-05 | A Framework for Reproducible Benchmarking and Performance Diagnosis of SLAM Systems | Nikola Radulov et.al. | 2410.04242v1 | link |
2024-10-04 | Unsupervised Prior Learning: Discovering Categorical Pose Priors from Videos | Ziyu Wang et.al. | 2410.03858v1 | null |
2024-10-04 | Universal Global State Estimation for Inertial Navigation Systems | Sifeddine Benahmed et.al. | 2410.03846v1 | null |
2024-10-04 | MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion | Junyi Zhang et.al. | 2410.03825v1 | null |
2024-10-04 | Dessie: Disentanglement for Articulated 3D Horse Shape and Pose Estimation from Images | Ci Li et.al. | 2410.03438v1 | null |
2024-10-04 | HRVMamba: High-Resolution Visual State Space Model for Dense Prediction | Hao Zhang et.al. | 2410.03174v1 | null |
2024-10-04 | CLIP-Clique: Graph-based Correspondence Matching Augmented by Vision Language Models for Object-based Global Localization | Shigemichi Matsuzaki et.al. | 2410.03054v1 | null |
2024-10-03 | Why Sample Space Matters: Keyframe Sampling Optimization for LiDAR-based Place Recognition | Nikolaos Stathoulopoulos et.al. | 2410.02643v1 | null |
2024-10-03 | Key-Grid: Unsupervised 3D Keypoints Detection using Grid Heatmap Features | Chengkai Hou et.al. | 2410.02237v1 | null |
2024-10-02 | SGBA: Semantic Gaussian Mixture Model-Based LiDAR Bundle Adjustment | Xingyu Ji et.al. | 2410.01618v1 | null |
2024-10-02 | SurgeoNet: Realtime 3D Pose Estimation of Articulated Surgical Instruments from Stereo Images using a Synthetically-trained Network | Ahmed Tawfik Aboukhadra et.al. | 2410.01293v1 | null |
2024-10-01 | Pose Estimation of Buried Deep-Sea Objects using 3D Vision Deep Learning Models | Jerry Yan et.al. | 2410.01061v1 | null |
2024-10-01 | RAD: A Dataset and Benchmark for Real-Life Anomaly Detection with Robotic Observations | Kaichen Zhou et.al. | 2410.00713v1 | link |
2024-10-01 | GERA: Geometric Embedding for Efficient Point Registration Analysis | Geng Li et.al. | 2410.00589v1 | null |
2024-09-30 | Continual Human Pose Estimation for Incremental Integration of Keypoints and Pose Variations | Muhammad Saif Ullah Khan et.al. | 2409.20469v1 | null |
2024-09-30 | Classroom-Inspired Multi-Mentor Distillation with Adaptive Learning Strategies | Shalini Sarode et.al. | 2409.20237v1 | null |
2024-09-30 | PuzzleBoard: A New Camera Calibration Pattern with Position Encoding | Peer Stelldinger et.al. | 2409.20127v1 | link |
2024-09-30 | Robust Gaussian Splatting SLAM by Leveraging Loop Closure | Zunjie Zhu et.al. | 2409.20111v1 | null |
2024-09-30 | GearTrack: Automating 6D Pose Estimation | Yu Deng et.al. | 2409.19986v1 | null |
2024-09-29 | PPLNs: Parametric Piecewise Linear Networks for Event-Based Temporal Modeling and Beyond | Chen Song et.al. | 2409.19772v1 | link |
2024-09-29 | GelSlim 4.0: Focusing on Touch and Reproducibility | Andrea Sipos et.al. | 2409.19770v1 | null |
2024-09-27 | Robust Proximity Operations using Probabilistic Markov Models | Deep Parikh et.al. | 2409.19062v1 | null |
2024-09-27 | Exploiting Motion Prior for Accurate Pose Estimation of Dashboard Cameras | Yipeng Lu et.al. | 2409.18673v1 | null |
2024-09-27 | DynaWeightPnP: Toward global real-time 3D-2D solver in PnP without correspondences | Jingwei Song et.al. | 2409.18457v1 | null |
2024-09-30 | Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation | Mengchen Zhang et.al. | 2409.18261v2 | link |
2024-09-26 | AI-Powered Augmented Reality for Satellite Assembly, Integration and Test | Alvaro Patricio et.al. | 2409.18101v1 | null |
2024-09-27 | Leveraging Anthropometric Measurements to Improve Human Mesh Estimation and Ensure Consistent Body Shapes | Katja Ludwig et.al. | 2409.17671v2 | null |
2024-09-25 | Safe Leaf Manipulation for Accurate Shape and Pose Estimation of Occluded Fruits | Shaoxiong Yao et.al. | 2409.17389v1 | null |
2024-09-25 | Hierarchical Tri-manual Planning for Vision-assisted Fruit Harvesting with Quadrupedal Robots | Zhichao Liu et.al. | 2409.17116v1 | null |
2024-09-25 | Self-Sensing for Proprioception and Contact Detection in Soft Robots Using Shape Memory Alloy Artificial Muscles | Ran Jing et.al. | 2409.17111v1 | null |
2024-09-25 | Online 6DoF Pose Estimation in Forests using Cross-View Factor Graph Optimisation and Deep Learned Re-localisation | Lucas Carvalho de Lima et.al. | 2409.16680v1 | null |
2024-09-25 | FAFA: Frequency-Aware Flow-Aided Self-Supervision for Underwater Object Pose Estimation | Jingyi Tang et.al. | 2409.16600v1 | null |
2024-09-25 | Robo-Platform: A Robotic System for Recording Sensors and Controlling Robots | Masoud Dayani Najafabadi et.al. | 2409.16595v1 | link |
2024-09-24 | PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings | Sutharsan Mahendren et.al. | 2409.15832v1 | null |
2024-09-24 | LaPose: Laplacian Mixture Shape Modeling for RGB-Based Category-Level Object Pose Estimation | Ruida Zhang et.al. | 2409.15727v1 | link |
2024-09-23 | Framework for Robust Localization of UUVs and Mapping of Net Pens | David Botta et.al. | 2409.15475v1 | null |
2024-09-23 | FisheyeDepth: A Real Scale Self-Supervised Depth Estimation Model for Fisheye Camera | Guoyang Zhao et.al. | 2409.15054v1 | link |
2024-09-23 | BranchPoseNet: Characterizing tree branching with a deep learning-based pose estimation approach | Stefano Puliti et.al. | 2409.14755v1 | link |
2024-09-23 | ERPoT: Effective and Reliable Pose Tracking for Mobile Robots Based on Lightweight and Compact Polygon Maps | Haiming Gao et.al. | 2409.14723v1 | null |
2024-09-22 | Tactile Functasets: Neural Implicit Representations of Tactile Datasets | Sikai Li et.al. | 2409.14592v1 | null |
2024-09-22 | AR Overlay: Training Image Pose Estimation on Curved Surface in a Synthetic Way | Sining Huang et.al. | 2409.14577v1 | null |
2024-09-22 | DROP: Dexterous Reorientation via Online Planning | Albert H. Li et.al. | 2409.14562v1 | null |
2024-09-21 | Combining Absolute and Semi-Generalized Relative Poses for Visual Localization | Vojtech Panek et.al. | 2409.14269v1 | null |
2024-09-18 | SpotLight: Robotic Scene Understanding through Interaction and Affordance Detection | Tim Engelbracht et.al. | 2409.11870v1 | link |
2024-09-18 | End-to-End Probabilistic Geometry-Guided Regression for 6DoF Object Pose Estimation | Thomas Pöllabauer et.al. | 2409.11819v1 | null |
2024-09-18 | Bridging Domain Gap for Flight-Ready Spaceborne Vision | Tae Ha Park et.al. | 2409.11661v1 | null |
2024-09-17 | Good Grasps Only: A data engine for self-supervised fine-tuning of pose estimation using grasp poses for verification | Frederik Hagelskjær et.al. | 2409.11512v1 | null |
2024-09-17 | Training Datasets Generation for Machine Learning: Application to Vision Based Navigation | Jérémy Lebreton et.al. | 2409.11383v1 | null |
2024-09-17 | OmniGen: Unified Image Generation | Shitao Xiao et.al. | 2409.11340v1 | link |
2024-09-17 | ULOC: Learning to Localize in Complex Large-Scale Environments with Ultra-Wideband Ranges | Thien-Minh Nguyen et.al. | 2409.11122v1 | link |
2024-09-17 | Depth-based Privileged Information for Boosting 3D Human Pose Estimation on RGB | Alessandro Simoni et.al. | 2409.11104v1 | null |
2024-09-21 | HGSLoc: 3DGS-based Heuristic Camera Pose Refinement | Zhongyan Niu et.al. | 2409.10925v2 | null |
2024-09-17 | Pose estimation of CubeSats via sensor fusion and Error-State Extended Kalman Filter | Deep Parikh et.al. | 2409.10815v1 | null |
2024-09-16 | CtRNet-X: Camera-to-Robot Pose Estimation in Real-world Conditions Using a Single Camera | Jingpei Lu et.al. | 2409.10441v1 | null |
2024-09-16 | HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models | Vineet Bhat et.al. | 2409.10419v1 | null |
2024-09-16 | 2D or not 2D: How Does the Dimensionality of Gesture Representation Affect 3D Co-Speech Gesture Generation? | Téo Guichoux et.al. | 2409.10357v1 | null |
2024-09-16 | Human Insights Driven Latent Space for Different Driving Perspectives: A Unified Encoder for Efficient Multi-Task Inference | Huy-Dung Nguyen et.al. | 2409.10095v1 | null |
2024-09-15 | Precise Pick-and-Place using Score-Based Diffusion Networks | Shih-Wei Guo et.al. | 2409.09725v1 | null |
2024-09-15 | Pre-Training for 3D Hand Pose Estimation with Contrastive Learning on Large-Scale Hand Images in the Wild | Nie Lin et.al. | 2409.09714v1 | null |
2024-09-15 | Proximity operations of CubeSats via sensor fusion of ultra-wideband range measurements with rate gyroscopes, accelerometers and monocular vision | Deep Parikh et.al. | 2409.09665v1 | null |
2024-09-15 | A Scalable Tabletop Satellite Automation Testbed: Design And Experiments | Deep Parikh et.al. | 2409.09633v1 | null |
2024-09-14 | MAC-VO: Metrics-aware Covariance for Learning-based Stereo Visual Odometry | Yuheng Qiu et.al. | 2409.09479v1 | null |
2024-09-14 | Distributed Invariant Kalman Filter for Object-level Multi-robot Pose SLAM | Haoying Li et.al. | 2409.09410v1 | null |
2024-09-13 | Causal Transformer for Fusion and Pose Estimation in Deep Visual Inertial Odometry | Yunus Bilge Kurt et.al. | 2409.08769v1 | link |
2024-09-13 | WheelPoser: Sparse-IMU Based Body Pose Estimation for Wheelchair Users | Yunzhi Li et.al. | 2409.08494v1 | link |
2024-09-12 | Bayesian Inverse Graphics for Few-Shot Concept Learning | Octavio Arriaga et.al. | 2409.08351v1 | link |
2024-09-12 | Touch2Touch: Cross-Modal Tactile Generation for Object Manipulation | Samanta Rodriguez et.al. | 2409.08269v1 | null |
2024-09-12 | Covariance Intersection-based Invariant Kalman Filtering (DInCIKF) for Distributed Pose Estimation | Haoying Li et.al. | 2409.07933v1 | null |
2024-09-12 | GateAttentionPose: Enhancing Pose Estimation with Agent Attention and Improved Gated Convolutions | Liang Feng et.al. | 2409.07798v1 | null |
2024-09-12 | GatedUniPose: A Novel Approach for Pose Estimation Combining UniRepLKNet and Gated Convolution | Liang Feng et.al. | 2409.07752v1 | null |
2024-09-11 | FaVoR: Features via Voxel Rendering for Camera Relocalization | Vincenzo Polizzi et.al. | 2409.07571v1 | null |
2024-09-11 | Benchmarking 2D Egocentric Hand Pose Datasets | Olga Taran et.al. | 2409.07337v1 | null |
2024-09-11 | iKalibr-RGBD: Partially-Specialized Target-Free Visual-Inertial Spatiotemporal Calibration For RGBDs via Continuous-Time Velocity Estimation | Shuolong Chen et.al. | 2409.07116v1 | link |
2024-09-11 | Equivariant Filter for Tightly Coupled LiDAR-Inertial Odometry | Anbo Tao et.al. | 2409.06948v1 | null |
2024-09-13 | A Bayesian framework for active object recognition, pose estimation and shape transfer learning through touch | Haodong Zheng et.al. | 2409.06912v2 | null |
2024-09-11 | Alignist: CAD-Informed Orientation Distribution Estimation by Fusing Shape and Correspondences | Shishir Reddy Vutukur et.al. | 2409.06683v2 | link |
2024-09-10 | PoseEmbroider: Towards a 3D, Visual, Semantic-aware Human Pose Representation | Ginger Delmas et.al. | 2409.06535v1 | null |
2024-09-10 | Test-Time Certifiable Self-Supervision to Bridge the Sim2Real Gap in Event-Based Satellite Pose Estimation | Mohsi Jawaid et.al. | 2409.06240v1 | null |
2024-09-09 | From Words to Poses: Enhancing Novel Object Pose Estimation with Vision Language Models | Tessa Pulli et.al. | 2409.05413v1 | null |
2024-09-08 | HelmetPoser: A Helmet-Mounted IMU Dataset for Data-Driven Estimation of Human Head Motion in Diverse Conditions | Jianping Li et.al. | 2409.05006v1 | null |
2024-09-06 | Casper DPM: Cascaded Perceptual Dynamic Projection Mapping onto Hands | Yotam Erel et.al. | 2409.04397v1 | null |
2024-09-06 | GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers | Lorenza Prospero et.al. | 2409.04196v1 | null |
2024-09-06 | Dense Hand-Object (HO) GraspNet with Full Grasping Taxonomy and Dynamics | Woojin Cho et.al. | 2409.04033v1 | null |
2024-09-06 | Matched Filtering based LiDAR Place Recognition for Urban and Natural Environments | Therese Joseph et.al. | 2409.03998v1 | null |
2024-09-09 | The Influence of Faulty Labels in Data Sets on Human Pose Estimation | Arnold Schwarz et.al. | 2409.03887v2 | null |
2024-09-05 | MaskVal: Simple but Effective Uncertainty Quantification for 6D Pose Estimation | Philipp Quentin et.al. | 2409.03556v1 | null |
2024-09-05 | UAV (Unmanned Aerial Vehicles): Diverse Applications of UAV Datasets in Segmentation, Classification, Detection, and Tracking | Md. Mahfuzur Rahman et.al. | 2409.03245v1 | null |
2024-09-01 | Recoverable Anonymization for Pose Estimation: A Privacy-Enhancing Approach | Wenjun Huang et.al. | 2409.02715v1 | null |
2024-09-04 | Object Gaussian for Monocular 6D Pose Estimation from Sparse Views | Luqing Luo et.al. | 2409.02581v1 | null |
2024-09-03 | EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision | Yiming Zhao et.al. | 2409.02224v1 | null |
2024-09-03 | Deep learning for objective estimation of Parkinsonian tremor severity | Felipe Duque-Quiceno et.al. | 2409.02011v1 | null |
2024-09-03 | SPiKE: 3D Human Pose from Point Cloud Sequences | Irene Ballester et.al. | 2409.01879v1 | link |
2024-09-02 | Kalman Filtering for Precise Indoor Position and Orientation Estimation Using IMU and Acoustics on Riemannian Manifolds | Mohammed H. AlSharif et.al. | 2409.01002v1 | null |
2024-09-01 | Detection, Recognition and Pose Estimation of Tabletop Objects | Sanjuksha Nirgude et.al. | 2409.00869v1 | null |
2024-09-01 | DSLO: Deep Sequence LiDAR Odometry Based on Inconsistent Spatio-temporal Propagation | Huixin Zhang et.al. | 2409.00744v1 | link |
2024-09-01 | MoManifold: Learning to Measure 3D Human Motion via Decoupled Joint Acceleration Manifolds | Ziqiang Dang et.al. | 2409.00736v1 | null |
2024-08-31 | ActionPose: Pretraining 3D Human Pose Estimation with the Dark Knowledge of Action | Longyun Liao et.al. | 2409.00449v1 | null |
2024-09-04 | Augmented Reality without Borders: Achieving Precise Localization Without Maps | Albert Gassol Puigjaner et.al. | 2408.17373v3 | null |
2024-08-30 | BOP-D: Revisiting 6D Pose Estimation Benchmark for Better Evaluation under Visual Ambiguities | Boris Meden et.al. | 2408.17297v1 | null |
2024-08-30 | EMHI: A Multimodal Egocentric Human Motion Dataset with HMD and Body-Worn IMUs | Zhen Fan et.al. | 2408.17168v1 | null |
2024-09-01 | Generic Objects as Pose Probes for Few-Shot View Synthesis | Zhirui Gao et.al. | 2408.16690v2 | null |
2024-08-29 | OP-Align: Object-level and Part-level Alignment for Self-supervised Category-level Articulated Object Pose Estimation | Yuchen Che et.al. | 2408.16547v1 | link |
2024-08-29 | GRPose: Learning Graph Relations for Human Image Generation with Pose Priors | Xiangchen Yin et.al. | 2408.16540v1 | link |
2024-08-28 | Are Pose Estimators Ready for the Open World? STAGE: Synthetic Data Generation Toolkit for Auditing 3D Human Pose Estimators | Nikita Kister et.al. | 2408.16536v1 | null |
2024-08-28 | Multi-view Pose Fusion for Occlusion-Aware 3D Human Pose Estimation | Laura Bragagnolo et.al. | 2408.15810v1 | link |
2024-08-30 | Addressing the challenges of loop detection in agricultural environments | Nicolás Soncini et.al. | 2408.15761v2 | link |
2024-08-28 | Str-L Pose: Integrating Point and Structured Line for Relative Pose Estimation in Dual-Graph | Zherong Zhang et.al. | 2408.15750v1 | null |
2024-08-28 | Benchmarking ML Approaches to UWB-Based Range-Only Posture Recognition for Human Robot-Interaction | Salma Salimi et.al. | 2408.15717v1 | null |
2024-08-26 | Bengali Sign Language Recognition through Hand Pose Estimation using Multi-Branch Spatial-Temporal Attention Model | Abu Saleh Musa Miah et.al. | 2408.14111v1 | null |
2024-08-25 | InterTrack: Tracking Human Object Interaction without Object Templates | Xianghui Xie et.al. | 2408.13953v1 | null |
2024-08-24 | Temporally-consistent 3D Reconstruction of Birds | Johannes Hägerlind et.al. | 2408.13629v1 | null |
2024-08-24 | Explainable Convolutional Networks for Crater Detection and Lunar Landing Navigation | Jianing Song et.al. | 2408.13587v1 | null |
2024-08-27 | Sapiens: Foundation for Human Vision Models | Rawal Khirodkar et.al. | 2408.12569v3 | null |
2024-08-21 | GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting | Wanshui Gan et.al. | 2408.11447v1 | link |
2024-08-20 | GSLoc: Efficient Camera Pose Refinement via 3D Gaussian Splatting | Changkun Liu et.al. | 2408.11085v1 | null |
2024-08-20 | ZebraPose: Zebra Detection and Pose Estimation using only Synthetic Data | Elia Bonetto et.al. | 2408.10831v1 | null |
2024-08-20 | MPL: Lifting 3D Human Pose from Multi-view 2D Poses | Seyed Abolfazl Ghasemzadeh et.al. | 2408.10805v1 | link |
2024-08-19 | RUMI: Rummaging Using Mutual Information | Sheng Zhong et.al. | 2408.10450v1 | null |
2024-08-19 | SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views | Chao Xu et.al. | 2408.10195v1 | null |
2024-08-19 | SHARP: Segmentation of Hands and Arms by Range using Pseudo-Depth for Enhanced Egocentric 3D Hand Pose Estimation and Action Recognition | Wiktor Mucha et.al. | 2408.10037v1 | link |
2024-08-19 | Pose-GuideNet: Automatic Scanning Guidance for Fetal Head Ultrasound from Pose Estimation | Qianhui Men et.al. | 2408.09931v1 | null |
2024-08-18 | OPPH: A Vision-Based Operator for Measuring Body Movements for Personal Healthcare | Chen Long-fei et.al. | 2408.09409v1 | null |
2024-08-17 | An Open-Source American Sign Language Fingerspell Recognition and Semantic Pose Retrieval Interface | Kevin Jose Thomas et.al. | 2408.09311v1 | link |
2024-08-16 | ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation | Hao Tang et.al. | 2408.09042v1 | null |
2024-08-16 | Correspondence-Guided SfM-Free 3D Gaussian Splatting for NVS | Wei Sun et.al. | 2408.08723v1 | null |
2024-08-16 | SketchRef: A Benchmark Dataset and Evaluation Metrics for Automated Sketch Synthesis | Xingyue Lin et.al. | 2408.08623v1 | null |
2024-08-15 | HyperTaxel: Hyper-Resolution for Taxel-Based Tactile Signals Through Contrastive Learning | Hongyu Li et.al. | 2408.08312v1 | null |
2024-08-15 | Comparative Evaluation of 3D Reconstruction Methods for Object Pose Estimation | Varun Burde et.al. | 2408.08234v1 | link |
2024-08-15 | Towards Practical Human Motion Prediction with LiDAR Point Clouds | Xiao Han et.al. | 2408.08202v1 | null |
2024-08-15 | Your Turn: Real-World Turning Angle Estimation for Parkinson's Disease Severity Assessment | Qiushuo Cheng et.al. | 2408.08182v1 | null |
2024-08-15 | Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models | Tianyu Wang et.al. | 2408.07975v1 | null |
2024-08-15 | GOReloc: Graph-based Object-Level Relocalization for Visual SLAM | Yutong Wang et.al. | 2408.07917v1 | link |
2024-08-13 | Grasping by Hanging: a Learning-Free Grasping Detection Method for Previously Unseen Objects | Wanze Li et.al. | 2408.06734v1 | null |
2024-08-13 | A Miniature Vision-Based Localization System for Indoor Blimps | Shicong Ma et.al. | 2408.06648v1 | null |
2024-08-12 | UniT: Unified Tactile Representation for Robot Learning | Zhengtong Xu et.al. | 2408.06481v1 | link |
2024-08-12 | Moo-ving Beyond Tradition: Revolutionizing Cattle Behavioural Phenotyping with Pose Estimation Techniques | Navid Ghassemi et.al. | 2408.06336v1 | null |
2024-08-12 | CAD-Mesher: A Convenient, Accurate, Dense Mesh-based Mapping Module in SLAM for Dynamic Environments | Yanpeng Jia et.al. | 2408.05981v1 | null |
2024-08-12 | PAFormer: Part Aware Transformer for Person Re-identification | Hyeono Jung et.al. | 2408.05918v1 | null |
2024-08-11 | SABER-6D: Shape Representation Based Implicit Object Pose Estimation | Shishir Reddy Vutukur et.al. | 2408.05867v1 | null |
2024-08-10 | Visual SLAM with 3D Gaussian Primitives and Depth Priors Enabling Novel View Synthesis | Zhongche Qu et.al. | 2408.05635v1 | null |
2024-08-10 | Anticipation through Head Pose Estimation: a preliminary study | Federico Figari Tomenotti et.al. | 2408.05516v1 | null |
2024-08-09 | Mesh-based Object Tracking for Dynamic Semantic 3D Scene Graphs via Ray Tracing | Lennart Niecksch et.al. | 2408.04979v1 | null |
2024-08-07 | PoseMamba: Monocular 3D Human Pose Estimation with Bidirectional Global-Local Spatio-Temporal State Space Model | Yunlong Huang et.al. | 2408.03540v1 | null |
2024-08-06 | Line-based 6-DoF Object Pose Estimation and Tracking With an Event Camera | Zibin Liu et.al. | 2408.03225v1 | link |
2024-08-06 | Training on the Fly: On-device Self-supervised Learning aboard Nano-drones within 20 mW | Elia Cereda et.al. | 2408.03168v1 | null |
2024-08-06 | BodySLAM: A Generalized Monocular Visual SLAM Framework for Surgical Applications | G. Manni et.al. | 2408.03078v1 | link |
2024-08-07 | Pose Magic: Efficient and Temporally Consistent Human Pose Estimation with a Hybrid Mamba-GCN Network | Xinyi Zhang et.al. | 2408.02922v2 | null |
2024-08-05 | Analyzing Data Efficiency and Performance of Machine Learning Algorithms for Assessing Low Back Pain Physical Rehabilitation Exercises | Aleksa Marusic et.al. | 2408.02855v1 | null |
2024-08-05 | Joint-Motion Mutual Learning for Pose Estimation in Videos | Sifan Wu et.al. | 2408.02285v1 | null |
2024-08-04 | AvatarPose: Avatar-guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos | Feichi Lu et.al. | 2408.02110v1 | null |
2024-08-04 | Generalized Maximum Likelihood Estimation for Perspective-n-Point Problem | Tian Zhan et.al. | 2408.01945v1 | null |
2024-08-03 | MotionTrace: IMU-based Field of View Prediction for Smartphone AR Interactions | Rahul Islam et.al. | 2408.01850v1 | null |
2024-08-03 | BEVPlace++: Fast, Robust, and Lightweight LiDAR Global Localization for Unmanned Ground Vehicles | Lun Luo et.al. | 2408.01841v1 | link |
2024-08-03 | E | Yunshan Qi et.al. | 2408.01840v1 | null |
2024-08-03 | Survey on Emotion Recognition through Posture Detection and the possibility of its application in Virtual Reality | Leina Elansary et.al. | 2408.01728v1 | null |
2024-08-03 | Stimulating Imagination: Towards General-purpose Object Rearrangement | Jianyang Wu et.al. | 2408.01655v1 | null |
2024-08-02 | Full-range Head Pose Geometric Data Augmentations | Huei-Chung Hu et.al. | 2408.01566v1 | null |
2024-07-31 | Adapting Skills to Novel Grasps: A Self-Supervised Approach | Georgios Papagiannis et.al. | 2408.00178v1 | null |
2024-07-31 | Certifying Robustness of Learning-Based Keypoint Detection and Pose Estimation Methods | Xusheng Luo et.al. | 2408.00117v1 | null |
2024-07-30 | StackFLOW: Monocular Human-Object Reconstruction by Stacked Normalizing Flow with Offset | Chaofan Huo et.al. | 2407.20545v1 | link |
2024-07-30 | HandDAGT: A Denoising Adaptive Graph Transformer for 3D Hand Pose Estimation | Wencan Cheng et.al. | 2407.20542v1 | link |
2024-07-30 | Markers Identification for Relative Pose Estimation of an Uncooperative Target | Batu Candan et.al. | 2407.20515v1 | null |
2024-07-29 | BaseBoostDepth: Exploiting Larger Baselines For Self-supervised Monocular Depth Estimation | Kieran Saunders et.al. | 2407.20437v1 | null |
2024-07-28 | Skeleton-based Group Activity Recognition via Spatial-Temporal Panoramic Graph | Zhengcen Li et.al. | 2407.19497v1 | link |
2024-07-26 | Flexible graph convolutional network for 3D human pose estimation | Abu Taib Mohammed Shahjahan et.al. | 2407.19077v1 | link |
2024-07-26 | From 2D to 3D: AISG-SLA Visual Localization Challenge | Jialin Gao et.al. | 2407.18590v1 | null |
2024-07-28 | HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation | Zhenzhi Wang et.al. | 2407.17438v2 | link |
2024-07-24 | Active Loop Closure for OSM-guided Robotic Mapping in Large-Scale Urban Environments | Wei Gao et.al. | 2407.17078v1 | null |
2024-07-30 | DreamCar: Leveraging Car-specific Prior for in-the-wild 3D Car Reconstruction | Xiaobiao Du et.al. | 2407.16988v2 | link |
2024-07-24 | Pose Estimation from Camera Images for Underwater Inspection | Luyuan Peng et.al. | 2407.16961v1 | null |
2024-07-23 | COALA: A Practical and Vision-Centric Federated Learning Platform | Weiming Zhuang et.al. | 2407.16560v1 | link |
2024-07-23 | Probabilistic Parameter Estimators and Calibration Metrics for Pose Estimation from Image Features | Romeo Valentin et.al. | 2407.16223v1 | null |
2024-07-23 | Optimal camera-robot pose estimation in linear time from points and lines | Guangyang Zeng et.al. | 2407.16151v1 | null |
2024-07-23 | 3D-UGCN: A Unified Graph Convolutional Network for Robust 3D Human Pose Estimation from Monocular RGB Images | Jie Zhao et.al. | 2407.16137v1 | null |
2024-07-21 | CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models | Zheng Chong et.al. | 2407.15886v1 | link |
2024-07-22 | RADA: Robust and Accurate Feature Learning with Domain Adaptation | Jingtai He et.al. | 2407.15791v1 | null |
2024-07-22 | Local Occupancy-Enhanced Object Grasping with Multiple Triplanar Projection | Kangqi Ma et.al. | 2407.15771v1 | null |
2024-07-22 | 6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model | Matteo Bortolon et.al. | 2407.15484v1 | null |
2024-07-23 | Domain-Adaptive 2D Human Pose Estimation via Dual Teachers in Extremely Low-Light Conditions | Yihao Ai et.al. | 2407.15451v2 | link |
2024-07-22 | avaTTAR: Table Tennis Stroke Training with On-body and Detached Visualization in Augmented Reality | Dizhi Ma et.al. | 2407.15373v1 | null |
2024-07-20 | From Underground Mines to Offices: A Versatile and Robust Framework for Range-Inertial SLAM | Lorenzo Montano-Oliván et.al. | 2407.14797v1 | null |
2024-07-19 | ESCAPE: Energy-based Selective Adaptive Correction for Out-of-distribution 3D Human Pose Estimation | Luke Bidulka et.al. | 2407.14605v1 | null |
2024-07-19 | 6DoF Head Pose Estimation through Explicit Bidirectional Interaction with Face Geometry | Sungho Chun et.al. | 2407.14136v1 | link |
2024-07-18 | RT-Pose: A 4D Radar Tensor-based 3D Human Pose Estimation and Localization Benchmark | Yuan-Hao Ho et.al. | 2407.13930v1 | null |
2024-07-19 | GlobalPointer: Large-Scale Plane Adjustment with Bi-Convex Relaxation | Bangyan Liao et.al. | 2407.13537v2 | link |
2024-07-18 | SCAPE: A Simple and Strong Category-Agnostic Pose Estimator | Yujia Liang et.al. | 2407.13483v1 | link |
2024-07-17 | SG-NeRF: Neural Surface Reconstruction with Scene Graph Optimization | Yiyang Chen et.al. | 2407.12667v1 | link |
2024-07-17 | Invertible Neural Warp for NeRF | Shin-Fang Chng et.al. | 2407.12354v1 | null |
2024-07-16 | NeuSurfEmb: A Complete Pipeline for Dense Correspondence-based 6D Object Pose Estimation without CAD Models | Francesco Milano et.al. | 2407.12207v1 | link |
2024-07-16 | Monocular pose estimation of articulated surgical instruments in open surgery | Robert Spektor et.al. | 2407.12138v1 | null |
2024-07-17 | GV-Bench: Benchmarking Local Feature Matching for Geometric Verification of Long-term Loop Closure Detection | Jingwen Yu et.al. | 2407.11736v2 | link |
2024-07-16 | TCFormer: Visual Recognition via Token Clustering Transformer | Wang Zeng et.al. | 2407.11321v1 | link |
2024-07-15 | A BlueROV2-based platform for underwater mapping experiments | Tudor Alinei-Poiana et.al. | 2407.10901v1 | link |
2024-07-15 | LVCP: LiDAR-Vision Tightly Coupled Collaborative Real-time Relative Positioning | Zhuozhu Jian et.al. | 2407.10782v1 | null |
2024-07-15 | Domain Generalization for 6D Pose Estimation Through NeRF-based Image Synthesis | Antoine Legrand et.al. | 2407.10762v1 | null |
2024-07-16 | GTPT: Group-based Token Pruning Transformer for Efficient Human Pose Estimation | Haonan Wang et.al. | 2407.10756v2 | null |
2024-07-15 | Learning to Estimate the Pose of a Peer Robot in a Camera Image by Predicting the States of its LEDs | Nicholas Carlotti et.al. | 2407.10661v1 | null |
2024-07-15 | Deep-Learning-Based Markerless Pose Estimation Systems in Gait Analysis: DeepLabCut Custom Training and the Refinement Function | Giulia Panconi et.al. | 2407.10590v1 | null |
2024-07-14 | 3D Foundation Models Enable Simultaneous Geometry and Pose Estimation of Grasped Objects | Weiming Zhi et.al. | 2407.10331v1 | null |
2024-07-16 | psifx -- Psychological and Social Interactions Feature Extraction Package | Guillaume Rochette et.al. | 2407.10266v2 | null |
2024-07-14 | PAFUSE: Part-based Diffusion for 3D Whole-Body Pose Estimation | Nermin Samet et.al. | 2407.10220v1 | link |
2024-07-14 | 3DEgo: 3D Editing on the Go! | Umar Khalid et.al. | 2407.10102v1 | null |
2024-07-12 | iNeMo: Incremental Neural Mesh Models for Robust Class-Incremental Learning | Tom Fischer et.al. | 2407.09271v1 | link |
2024-07-12 | HUP-3D: A 3D multi-view synthetic dataset for assisted-egocentric hand-ultrasound pose estimation | Manuel Birlo et.al. | 2407.09215v1 | null |
2024-07-12 | KGpose: Keypoint-Graph Driven End-to-End Multi-Object 6D Pose Estimation via Point-Wise Pose Voting | Andrew Jeong et.al. | 2407.08909v1 | null |
2024-07-11 | RTMW: Real-Time Multi-Person 2D and 3D Whole-body Pose Estimation | Tao Jiang et.al. | 2407.08634v1 | link |
2024-07-11 | SRPose: Two-view Relative Pose Estimation with Sparse Keypoints | Rui Yin et.al. | 2407.08199v1 | link |
2024-07-11 | SGLC: Semantic Graph-Guided Coarse-Fine-Refine Full Loop Closing for LiDAR SLAM | Neng Wang et.al. | 2407.08106v1 | link |
2024-07-10 | RoCap: A Robotic Data Collection Pipeline for the Pose Estimation of Appearance-Changing Objects | Jiahao Nick Li et.al. | 2407.08081v1 | null |
2024-07-10 | Hybrid Structure-from-Motion and Camera Relocalization for Enhanced Egocentric Localization | Jinjie Mai et.al. | 2407.08023v1 | link |
2024-07-10 | Greit-HRNet: Grouped Lightweight High-Resolution Network for Human Pose Estimation | Junjia Han et.al. | 2407.07389v1 | null |
2024-07-09 | Category-level Object Detection, Pose Estimation and Reconstruction from Stereo Images | Chuanrui Zhang et.al. | 2407.06984v1 | null |
2024-07-09 | Computer vision tasks for intelligent aerospace missions: An overview | Huilin Chen et.al. | 2407.06513v1 | null |
2024-07-08 | GeoNLF: Geometry guided Pose-Free Neural LiDAR Fields | Weiyi Xue et.al. | 2407.05597v1 | null |
2024-07-10 | On the power of data augmentation for head pose estimation | Michael Welter et.al. | 2407.05357v2 | link |
2024-07-07 | SCIPaD: Incorporating Spatial Clues into Unsupervised Pose-Depth Joint Learning | Yi Feng et.al. | 2407.05283v1 | link |
2024-07-05 | Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos | Leonhard Sommer et.al. | 2407.04384v1 | link |
2024-07-04 | Towards Cross-View-Consistent Self-Supervised Surround Depth Estimation | Laiyan Ding et.al. | 2407.04041v1 | link |
2024-07-04 | Markerless Multi-view 3D Human Pose Estimation: a survey | Ana Filipa Rodrigues Nogueira et.al. | 2407.03817v1 | null |
2024-07-04 | A Fast Dynamic Point Detection Method for LiDAR-Inertial Odometry in Driving Scenarios | Zikang Yuan et.al. | 2407.03590v1 | link |
2024-07-03 | Graph and Skipped Transformer: Exploiting Spatial and Temporal Modeling Capacities for Efficient 3D Human Pose Estimation | Mengmeng Cui et.al. | 2407.02990v1 | null |
2024-07-03 | Free-SurGS: SfM-Free 3D Gaussian Splatting for Surgical Scene Reconstruction | Jiaxin Guo et.al. | 2407.02918v1 | link |
2024-07-02 | SUPER: Seated Upper Body Pose Estimation using mmWave Radars | Bo Zhang et.al. | 2407.02455v1 | null |
2024-07-02 | ReliaAvatar: A Robust Real-Time Avatar Animator with Integrated Motion Prediction | Bo Qian et.al. | 2407.02129v1 | null |
2024-07-02 | Joint-Dataset Learning and Cross-Consistent Regularization for Text-to-Motion Retrieval | Nicola Messina et.al. | 2407.02104v1 | null |
2024-07-01 | Active Human Pose Estimation via an Autonomous UAV Agent | Jingxi Chen et.al. | 2407.01811v1 | null |
2024-07-01 | RoDyn-SLAM: Robust Dynamic Dense RGB-D SLAM with Neural Radiance Fields | Haochen Jiang et.al. | 2407.01303v1 | link |
2024-07-01 | Collaborative Graph Exploration with Reduced Pose-SLAM Uncertainty via Submodular Optimization | Ruofei Bai et.al. | 2407.01013v1 | link |
2024-06-30 | Ego-to-Exo: Interfacing Third Person Visuals from Egocentric Views in Real-time for Improved ROV Teleoperation | Adnan Abdullah et.al. | 2407.00848v1 | null |
2024-06-29 | When Robots Get Chatty: Grounding Multimodal Human-Robot Conversation and Collaboration | Philipp Allgeuer et.al. | 2407.00518v1 | link |
2024-06-28 | Assistive Image Annotation Systems with Deep Learning and Natural Language Capabilities: A Review | Moseli Mots'oehli et.al. | 2407.00252v1 | null |
2024-06-28 | EPOCH: Jointly Estimating the 3D Pose of Cameras and Humans | Nicola Garau et.al. | 2406.19726v1 | null |
2024-06-28 | CLOi-Mapper: Consistent, Lightweight, Robust, and Incremental Mapper With Embedded Systems for Commercial Robot Services | DongKi Noh et.al. | 2406.19634v1 | null |
2024-06-27 | Multimodal Visual-haptic pose estimation in the presence of transient occlusion | Michael Zechmair et.al. | 2406.19323v1 | null |
2024-06-27 | Human Modelling and Pose Estimation Overview | Pawel Knap et.al. | 2406.19290v1 | null |
2024-06-26 | Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference | Yuan Gao et.al. | 2406.18453v1 | link |
2024-06-27 | Automatic infant 2D pose estimation from videos: comparing seven deep neural network methods | Filipe Gama et.al. | 2406.17382v2 | null |
2024-06-24 | High-resolution open-vocabulary object 6D pose estimation | Jaime Corsetti et.al. | 2406.16384v1 | null |
2024-06-23 | Breaking the Frame: Image Retrieval by Visual Overlap Prediction | Tong Wei et.al. | 2406.16204v1 | link |
2024-06-21 | Efficient Human Pose Estimation: Leveraging Advanced Techniques with MediaPipe | Sandeep Singh Sengar et.al. | 2406.15649v1 | link |
2024-06-24 | Investigating the impact of 2D gesture representation on co-speech gesture generation | Teo Guichoux et.al. | 2406.15111v2 | null |
2024-06-20 | Benchmarking Monocular 3D Dog Pose Estimation Using In-The-Wild Motion Capture Data | Moira Shooter et.al. | 2406.14412v1 | null |
2024-06-20 | PoseBench: Benchmarking the Robustness of Pose Estimation Models under Corruptions | Sihan Ma et.al. | 2406.14367v1 | null |
2024-06-19 | NeRF-Feat: 6D Object Pose Estimation using Feature Rendering | Shishir Reddy Vutukur et.al. | 2406.13796v1 | null |
2024-06-19 | CNN Based Flank Predictor for Quadruped Animal Species | Vanessa Suessle et.al. | 2406.13588v1 | null |
2024-06-19 | MVSBoost: An Efficient Point Cloud-based 3D Reconstruction | Umair Haroon et.al. | 2406.13515v1 | null |
2024-06-19 | An Efficient yet High-Performance Method for Precise Radar-Based Imaging of Human Hand Poses | Johanna Bräunig et.al. | 2406.13464v1 | null |
2024-06-18 | Head Pose Estimation and 3D Neural Surface Reconstruction via Monocular Camera in situ for Navigation and Safe Insertion into Natural Openings | Ruijie Tang et.al. | 2406.13048v1 | null |
2024-06-17 | Matching Query Image Against Selected NeRF Feature for Efficient and Scalable Localization | Huaiji Zhou et.al. | 2406.11766v1 | null |
2024-06-17 | Domain Generalization for In-Orbit 6D Pose Estimation | Antoine Legrand et.al. | 2406.11743v1 | null |
2024-06-17 | SeamPose: Repurposing Seams as Capacitive Sensors in a Shirt for Upper-Body Pose Tracking | Tianhong Catherine Yu et.al. | 2406.11645v1 | null |
2024-06-14 | Galibr: Targetless LiDAR-Camera Extrinsic Calibration Method via Ground Plane Initialization | Wonho Song et.al. | 2406.11599v1 | null |
2024-06-15 | MMVR: Millimeter-wave Multi-View Radar Dataset and Benchmark for Indoor Perception | M. Mahbubur Rahman et.al. | 2406.10708v1 | link |
2024-06-15 | Improving Ab-Initio Cryo-EM Reconstruction with Semi-Amortized Pose Inference | Shayan Shekarforoush et.al. | 2406.10455v1 | null |
2024-06-14 | The BabyView dataset: High-resolution egocentric videos of infants' and young children's everyday experiences | Bria Long et.al. | 2406.10447v1 | null |
2024-06-14 | OpenCapBench: A Benchmark to Bridge Pose Estimation and Biomechanics | Yoni Gozlan et.al. | 2406.09788v1 | null |
2024-06-13 | ImageNet3D: Towards General-Purpose Object-Level 3D Understanding | Wufei Ma et.al. | 2406.09613v1 | link |
2024-06-13 | Deep Transformer Network for Monocular Pose Estimation of Ship-Based UAV | Maneesha Wickramasuriya et.al. | 2406.09260v1 | link |
2024-06-14 | Language-Driven Closed-Loop Grasping with Model-Predictive Trajectory Replanning | Huy Hoang Nguyen et.al. | 2406.09039v2 | null |
2024-06-14 | VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks | Jiannan Wu et.al. | 2406.08394v2 | link |
2024-06-12 | Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization | Jiaxin Deng et.al. | 2406.08001v1 | null |
2024-06-12 | IFTD: Image Feature Triangle Descriptor for Loop Detection in Driving Scenes | Fengtian Lang et.al. | 2406.07937v1 | link |
2024-06-12 | From Variance to Veracity: Unbundling and Mitigating Gradient Variance in Differentiable Bundle Adjustment Layers | Swaminathan Gurumurthy et.al. | 2406.07785v1 | link |
2024-06-12 | SPIN: Spacecraft Imagery for Navigation | Javier Montalvo et.al. | 2406.07500v2 | link |
2024-06-11 | Realistic Data Generation for 6D Pose Estimation of Surgical Instruments | Juan Antonio Barragan et.al. | 2406.07328v1 | link |
2024-06-11 | SignMusketeers: An Efficient Multi-Stream Approach for Sign Language Translation at Scale | Shester Gueuwou et.al. | 2406.06907v1 | null |
2024-06-10 | Multicam-SLAM: Non-overlapping Multi-camera SLAM for Indirect Visual Localization and Navigation | Shenghao Li et.al. | 2406.06374v1 | link |
2024-06-08 | A preprocessing-based planning framework for utilizing contacts in high-precision insertion tasks | Muhammad Suhail Saleem et.al. | 2406.05522v1 | null |
2024-06-06 | GLACE: Global Local Accelerated Coordinate Encoding | Fangjinhua Wang et.al. | 2406.04340v1 | link |
2024-06-06 | Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking | Jiyao Zhang et.al. | 2406.04316v1 | null |
2024-06-05 | Hi5: 2D Hand Pose Estimation with Zero Human Annotation | Masum Hasan et.al. | 2406.03599v1 | null |
2024-06-05 | Sparse Color-Code Net: Real-Time RGB-Based 6D Object Pose Estimation on Edge Devices | Xingjian Yang et.al. | 2406.02977v1 | null |
2024-06-04 | CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation | Dejia Xu et.al. | 2406.02509v1 | null |
2024-06-04 | HPE-CogVLM: New Head Pose Grounding Task Exploration on Vision Language Model | Yu Tian et.al. | 2406.01914v1 | null |
2024-06-03 | A Robust Filter for Marker-less Multi-person Tracking in Human-Robot Interaction Scenarios | Enrico Martini et.al. | 2406.01832v1 | link |
2024-06-01 | Equivariant amortized inference of poses for cryo-EM | Larissa de Ruijter et.al. | 2406.01630v1 | null |
2024-06-03 | 3D WholeBody Pose Estimation based on Semantic Graph Attention Network and Distance Information | Sihan Wen et.al. | 2406.01196v1 | null |
2024-06-01 | CapeX: Category-Agnostic Pose Estimation from Textual Point Explanation | Matan Rusanovsky et.al. | 2406.00384v1 | link |
2024-05-30 | Estimating Human Poses Across Datasets: A Unified Skeleton and Multi-Teacher Distillation Approach | Muhammad Saif Ullah Khan et.al. | 2405.20084v1 | null |
2024-05-30 | TAMBRIDGE: Bridging Frame-Centered Tracking and 3D Gaussian Splatting for Enhanced SLAM | Peifeng Jiang et.al. | 2405.19614v1 | null |
2024-05-29 | Real-Time Dynamic Robot-Assisted Hand-Object Interaction via Motion Primitives | Mingqi Yuan et.al. | 2405.19531v1 | null |
2024-05-29 | Exploring AI-based Anonymization of Industrial Image and Video Data in the Context of Feature Preservation | Sabrina Cynthia Triess et.al. | 2405.19173v1 | null |
2024-05-28 | World Models for General Surgical Grasping | Hongbin Lin et.al. | 2405.17940v1 | null |
2024-05-27 | MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds | Jiahui Lei et.al. | 2405.17421v1 | link |
2024-05-27 | Occlusion Handling in 3D Human Pose Estimation with Perturbed Positional Encoding | Niloofar Azizi et.al. | 2405.17397v1 | null |
2024-05-27 | | Weiquan Wang et.al. | 2405.17016v1 | null |
2024-05-27 | Clustering-based Learning for UAV Tracking and Pose Estimation | Jiaping Xiao et.al. | 2405.16867v1 | null |
2024-05-26 | Multi-Modal UAV Detection, Classification and Tracking Algorithm -- Technical Report for CVPR 2024 UG2 Challenge | Tianchen Deng et.al. | 2405.16464v1 | link |
2024-05-25 | Intensity and Texture Correction of Omnidirectional Image Using Camera Images for Indirect Augmented Reality | Hakim Ikebayashi et.al. | 2405.16008v1 | null |
2024-05-23 | CoPeD-Advancing Multi-Robot Collaborative Perception: A Comprehensive Dataset in Real-World Environments | Yang Zhou et.al. | 2405.14731v1 | link |
2024-05-23 | Segformer++: Efficient Token-Merging Strategies for High-Resolution Semantic Segmentation | Daniel Kienzle et.al. | 2405.14467v1 | link |
2024-05-21 | Geometric Transformation Uncertainty for Improving 3D Fetal Brain Pose Prediction from Freehand 2D Ultrasound Videos | Jayroop Ramesh et.al. | 2405.13235v1 | link |
2024-05-21 | Leveraging Neural Radiance Fields for Pose Estimation of an Unknown Space Object during Proximity Operations | Antoine Legrand et.al. | 2405.12728v1 | null |
2024-05-21 | PoseGravity: Pose Estimation from Points and Lines with Axis Prior | Akshay Chandrasekhar et.al. | 2405.12646v1 | link |
2024-05-19 | Focus on Low-Resolution Information: Multi-Granular Information-Lossless Model for Low-Resolution Human Pose Estimation | Zejun Gu et.al. | 2405.12247v1 | null |
2024-05-20 | AutoSoccerPose: Automated 3D posture Analysis of Soccer Shot Movements | Calvin Yeung et.al. | 2405.12070v1 | link |
2024-05-19 | Advancing 6-DoF Instrument Pose Estimation in Variable X-Ray Imaging Geometries | Christiaan G. A. Viviers et.al. | 2405.11677v1 | link |
2024-05-19 | Cross-Domain Knowledge Distillation for Low-Resolution Human Pose Estimation | Zejun Gu et.al. | 2405.11448v1 | null |
2024-05-18 | PS6D: Point Cloud Based Symmetry-Aware 6D Object Pose Estimation in Robot Bin-Picking | Yifan Yang et.al. | 2405.11257v1 | null |
2024-05-18 | MotionGS : Compact Gaussian Splatting SLAM by Motion Filter | Xinli Guo et.al. | 2405.11129v1 | link |
2024-05-17 | Resolving Symmetry Ambiguity in Correspondence-based Methods for Instance-level Object Pose Estimation | Yongliang Lin et.al. | 2405.10557v1 | null |
2024-05-16 | Diversity-Aware Sign Language Production through a Pose Encoding Variational Autoencoder | Mohamed Ilyes Lakhal et.al. | 2405.10423v1 | null |
2024-05-17 | Toon3D: Seeing Cartoons from a New Perspective | Ethan Weber et.al. | 2405.10320v2 | null |
2024-05-15 | Task-adaptive Q-Face | Haomiao Sun et.al. | 2405.09059v1 | null |
2024-05-14 | RDPN6D: Residual-based Dense Point-wise Network for 6Dof Object Pose Estimation Based on RGB-D Images | Zong-Wei Hong et.al. | 2405.08483v1 | link |
2024-05-14 | TP3M: Transformer-based Pseudo 3D Image Matching with Reference | Liming Han et.al. | 2405.08434v1 | null |
2024-05-13 | Deep Learning-Based Object Pose Estimation: A Comprehensive Survey | Jian Liu et.al. | 2405.07801v1 | link |
2024-05-13 | JointLoc: A Real-time Visual Localization Framework for Planetary UAVs Based on Joint Relative and Absolute Pose Estimation | Xubo Luo et.al. | 2405.07429v1 | link |
2024-05-11 | TD-NeRF: Novel Truncated Depth Prior for Joint Camera Pose and Neural Radiance Field Optimization | Zhen Tan et.al. | 2405.07027v1 | link |
2024-05-11 | AHPPEBot: Autonomous Robot for Tomato Harvesting based on Phenotyping and Pose Estimation | Xingxu Li et.al. | 2405.06959v1 | null |
2024-05-10 | CasCalib: Cascaded Calibration for Motion Capture from Sparse Unsynchronized Cameras | James Tang et.al. | 2405.06845v1 | link |
2024-05-10 | MGS-SLAM: Monocular Sparse Tracking and Gaussian Mapping with Depth Smooth Regularization | Pengcheng Zhu et.al. | 2405.06241v1 | null |
2024-05-10 | Free-Moving Object Reconstruction and Pose Estimation with Virtual Camera | Haixin Shi et.al. | 2405.05858v2 | null |
2024-05-09 | Semi-Autonomous Laparoscopic Robot Docking with Learned Hand-Eye Information Fusion | Huanyu Tian et.al. | 2405.05817v1 | null |
2024-05-09 | NeuRSS: Enhancing AUV Localization and Bathymetric Mapping with Neural Rendering for Sidescan SLAM | Yiping Xie et.al. | 2405.05807v1 | null |
2024-05-09 | Benchmarking Neural Radiance Fields for Autonomous Robots: An Overview | Yuhang Ming et.al. | 2405.05526v1 | null |
2024-05-08 | Adversary-Guided Motion Retargeting for Skeleton Anonymization | Thomas Carr et.al. | 2405.05428v1 | null |
2024-05-08 | FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models | Jinglin Xu et.al. | 2405.05216v1 | link |
2024-05-08 | ProbRadarM3F: mmWave Radar based Human Skeletal Pose Estimation with Probability Map Guided Multi-Format Feature Fusion | Bing Zhu et.al. | 2405.05164v1 | null |
2024-05-08 | GISR: Geometric Initialization and Silhouette-based Refinement for Single-View Robot Pose and Configuration Estimation | Ivan Bilić et.al. | 2405.04890v1 | null |
2024-05-07 | Learning Distributional Demonstration Spaces for Task-Specific Cross-Pose Estimation | Jenny Wang et.al. | 2405.04609v1 | null |
2024-05-07 | Bayesian Simultaneous Localization and Multi-Lane Tracking Using Onboard Sensors and a SD Map | Yuxuan Xia et.al. | 2405.04290v1 | null |
2024-05-07 | Speak the Same Language: Global LiDAR Registration on BIM Using Pose Hough Transform | Zhijian Qiao et.al. | 2405.03969v1 | null |
2024-05-07 | Joint Estimation of Identity Verification and Relative Pose for Partial Fingerprints | Xiongjun Guan et.al. | 2405.03959v1 | link |
2024-05-06 | Pose Priors from Language Models | Sanjay Subramanian et.al. | 2405.03689v1 | null |
2024-05-06 | Optimizing Hand Region Detection in MediaPipe Holistic Full-Body Pose Estimation to Improve Accuracy and Avoid Downstream Errors | Amit Moryossef et.al. | 2405.03545v1 | link |
2024-05-05 | Multi-hop graph transformer network for 3D human pose estimation | Zaedul Islam et.al. | 2405.03055v1 | null |
2024-05-05 | Blending Distributed NeRFs with Tri-stage Robust Pose Optimization | Baijun Ye et.al. | 2405.02880v1 | null |
2024-05-03 | WeightedPose: Generalizable Cross-Pose Estimation via Weighted SVD | Xuxin Cheng et.al. | 2405.02241v1 | link |
2024-05-03 | Probablistic Restoration with Adaptive Noise Sampling for 3D Human Pose Estimation | Xianzhou Zeng et.al. | 2405.02114v1 | link |
2024-05-03 | An Onboard Framework for Staircases Modeling Based on Point Clouds | Chun Qing et.al. | 2405.01918v1 | null |
2024-05-06 | ShadowNav: Autonomous Global Localization for Lunar Navigation in Darkness | Deegan Atha et.al. | 2405.01673v2 | null |
2024-05-02 | IntervenGen: Interventional Data Generation for Robust and Data-Efficient Robot Imitation Learning | Ryan Hoque et.al. | 2405.01472v1 | null |
2024-05-02 | Behavior Imitation for Manipulator Control and Grasping with Deep Reinforcement Learning | Liu Qiyuan et.al. | 2405.01284v1 | null |
2024-05-02 | Sports Analysis and VR Viewing System Based on Player Tracking and Pose Estimation with Multimodal and Multiview Sensors | Wenxuan Guo et.al. | 2405.01112v1 | null |
2024-05-02 | CoViS-Net: A Cooperative Visual Spatial Foundation Model for Multi-Robot Applications | Jan Blumenkamp et.al. | 2405.01107v1 | null |
2024-05-04 | HandSSCA: 3D Hand Mesh Reconstruction with State Space Channel Attention from RGB images | Zixun Jiao et.al. | 2405.01066v2 | null |
2024-05-01 | Radar-Based Localization For Autonomous Ground Vehicles In Suburban Neighborhoods | Andrew J. Kramer et.al. | 2405.00600v1 | null |
2024-04-30 | Ultra Inertial Poser: Scalable Motion Capture and Tracking from Sparse Inertial Sensors and Ultra-Wideband Ranging | Rayan Armani et.al. | 2404.19541v1 | link |
2024-04-30 | UniFS: Universal Few-shot Instance Perception with Point Representations | Sheng Jin et.al. | 2404.19401v1 | link |
2024-04-30 | Quater-GCN: Enhancing 3D Human Pose Estimation with Orientation and Semi-supervised Training | Xingyu Song et.al. | 2404.19279v1 | link |
2024-04-30 | XFeat: Accelerated Features for Lightweight Image Matching | Guilherme Potje et.al. | 2404.19174v1 | null |
2024-04-29 | Self-Avatar Animation in Virtual Reality: Impact of Motion Signals Artifacts on the Full-Body Pose Reconstruction | Antoine Maiorca et.al. | 2404.18628v1 | null |
2024-04-29 | Mesh-based Photorealistic and Real-time 3D Mapping for Robust Visual Perception of Autonomous Underwater Vehicle | Jungwoo Lee et.al. | 2404.18395v1 | null |
2024-04-29 | Reconstructing Satellites in 3D from Amateur Telescope Images | Zhiming Chang et.al. | 2404.18394v1 | null |
2024-04-27 | Hybrid 3D Human Pose Estimation with Monocular Video and Sparse IMUs | Yiming Bao et.al. | 2404.17837v1 | null |
2024-04-26 | Localization Through Particle Filter Powered Neural Network Estimated Monocular Camera Poses | Yi Shen et.al. | 2404.17685v1 | null |
2024-04-26 | SLAM for Indoor Mapping of Wide Area Construction Environments | Vincent Ress et.al. | 2404.17215v1 | null |
2024-04-25 | WheelPose: Data Synthesis Techniques to Improve Pose Estimation Performance on Wheelchair Users | William Huang et.al. | 2404.17063v1 | link |
2024-04-25 | Transformer-Based Local Feature Matching for Multimodal Image Registration | Remi Delaunay et.al. | 2404.16802v1 | null |
2024-04-25 | DeepKalPose: An Enhanced Deep-Learning Kalman Filter for Temporally Consistent Monocular Vehicle Pose Estimation | Leandro Di Bella et.al. | 2404.16558v1 | null |
2024-04-25 | Efficient Solution of Point-Line Absolute Pose | Petr Hruby et.al. | 2404.16552v1 | link |
2024-04-25 | COBRA -- COnfidence score Based on shape Regression Analysis for method-independent quality assessment of object pose estimation from single images | Panagiotis Sapoutzoglou et.al. | 2404.16471v1 | link |
2024-04-25 | MegaParticles: Range-based 6-DoF Monte Carlo Localization with GPU-Accelerated Stein Particle Filter | Kenji Koide et.al. | 2404.16370v1 | null |
2024-04-24 | 3D Human Pose Estimation with Occlusions: Introducing BlendMimic3D Dataset and GCN Refinement | Filipa Lino et.al. | 2404.16136v1 | link |
2024-04-23 | SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation | Xiangyu Xu et.al. | 2404.15276v1 | link |
2024-04-25 | Domain adaptive pose estimation via multi-level alignment | Yugan Chen et.al. | 2404.14885v2 | link |
2024-04-23 | Semi-supervised 2D Human Pose Estimation via Adaptive Keypoint Masking | Kexin Meng et.al. | 2404.14835v1 | null |
2024-04-23 | UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues | Vandad Davoodnia et.al. | 2404.14634v1 | null |
2024-04-22 | DHRNet: A Dual-Path Hierarchical Relation Network for Multi-Person Pose Estimation | Yonghao Dang et.al. | 2404.14025v1 | link |
2024-04-23 | CT-NeRF: Incremental Optimizing Neural Radiance Field and Poses with Complex Trajectory | Yunlong Ran et.al. | 2404.13896v2 | null |
2024-04-21 | Resampling-free Particle Filters in High-dimensions | Akhilan Boopathy et.al. | 2404.13698v1 | link |
2024-04-20 | EC-SLAM: Real-time Dense Neural RGB-D SLAM System with Effectively Constrained Global Bundle Adjustment | Guanghao Li et.al. | 2404.13346v1 | link |
2024-04-18 | Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds | Oliver Lemke et.al. | 2404.12440v1 | null |
2024-04-18 | Gait Recognition from Highly Compressed Videos | Andrei Niculae et.al. | 2404.12183v1 | null |
2024-04-17 | Mushroom Segmentation and 3D Pose Estimation from Point Clouds using Fully Convolutional Geometric Features and Implicit Pose Encoding | George Retsinas et.al. | 2404.12144v1 | link |
2024-04-17 | Kathakali Hand Gesture Recognition With Minimal Data | Kavitha Raju et.al. | 2404.11205v1 | null |
2024-04-17 | GeoReF: Geometric Alignment Across Shape Variation for Category-level Object Pose Refinement | Linfang Zheng et.al. | 2404.11139v1 | null |
2024-04-17 | CorrNet+: Sign Language Recognition and Translation via Spatial-Temporal Correlation | Lianyu Hu et.al. | 2404.11111v1 | link |
2024-04-16 | HumMUSS: Human Motion Understanding using State Space Models | Arnab Kumar Mondal et.al. | 2404.10880v1 | null |
2024-04-16 | Invariant Kalman Filtering with Noise-Free Pseudo-Measurements | Sven Goffin et.al. | 2404.10687v1 | null |
2024-04-16 | The Unreasonable Effectiveness of Pre-Trained Features for Camera Pose Refinement | Gabriele Trivigno et.al. | 2404.10438v1 | null |
2024-04-16 | GaitPoint+: A Gait Recognition Network Incorporating Point Cloud Analysis and Recycling | Huantao Ren et.al. | 2404.10213v1 | null |
2024-04-16 | LWIRPOSE: A novel LWIR Thermal Image Dataset and Benchmark | Avinash Upadhyay et.al. | 2404.10212v1 | link |
2024-04-15 | LetsGo: Large-Scale Garage Modeling and Rendering via LiDAR-Assisted Gaussian Primitives | Jiadi Cui et.al. | 2404.09748v1 | null |
2024-04-14 | In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition | Wiktor Mucha et.al. | 2404.09308v1 | link |
2024-04-13 | DeDoDe v2: Analyzing and Improving the DeDoDe Keypoint Detector | Johan Edstedt et.al. | 2404.08928v1 | link |
2024-04-16 | 3D Human Scan With A Moving Event Camera | Kai Kohyama et.al. | 2404.08504v2 | null |
2024-04-11 | Separated Attention: An Improved Cycle GAN Based Under Water Image Enhancement Method | Tashmoy Ghosh et.al. | 2404.07649v1 | null |
2024-04-11 | GLID: Pre-training a Generalist Encoder-Decoder Vision Model | Jihao Liu et.al. | 2404.07603v1 | null |
2024-04-10 | Measuring proximity to standard planes during fetal brain ultrasound scanning | Chiara Di Vece et.al. | 2404.07124v1 | null |
2024-04-10 | MoCap-to-Visual Domain Adaptation for Efficient Human Mesh Estimation from 2D Keypoints | Bedirhan Uguz et.al. | 2404.07094v1 | null |
2024-04-10 | Gaussian-LIC: Photo-realistic LiDAR-Inertial-Camera SLAM with 3D Gaussian Splatting | Xiaolei Lang et.al. | 2404.06926v1 | null |
2024-04-09 | Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences | Axel Barroso-Laguna et.al. | 2404.06337v1 | link |
2024-04-09 | Incremental Joint Learning of Depth, Pose and Implicit Scene Representation on Monocular Camera in Large-scale Scenes | Tianchen Deng et.al. | 2404.06050v1 | null |
2024-04-08 | Learning 3D-Aware GANs from Unposed Images with Template Feature Field | Xinya Chen et.al. | 2404.05705v1 | null |
2024-04-08 | Learning a Category-level Object Pose Estimator without Pose Annotations | Fengrui Tian et.al. | 2404.05626v1 | null |
2024-04-08 | DepthMOT: Depth Cues Lead to a Strong Multi-Object Tracker | Jiapeng Wu et.al. | 2404.05518v1 | link |
2024-04-08 | Two Hands Are Better Than One: Resolving Hand to Hand Intersections via Occupancy Networks | Maksym Ivashechkin et.al. | 2404.05414v1 | null |
2024-04-08 | STITCH: Augmented Dexterity for Suture Throws Including Thread Coordination and Handoffs | Kush Hari et.al. | 2404.05151v1 | null |
2024-04-05 | ToolEENet: Tool Affordance 6D Pose Estimation | Yunlong Wang et.al. | 2404.04193v1 | null |
2024-04-04 | SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation | Sichen Chen et.al. | 2404.03518v1 | link |
2024-04-04 | Multi Positive Contrastive Learning with Pose-Consistent Generated Images | Sho Inayoshi et.al. | 2404.03256v1 | null |
2024-04-04 | HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud | Wencan Cheng et.al. | 2404.03159v1 | link |
2024-04-03 | Fusing Multi-sensor Input with State Information on TinyML Brains for Autonomous Nano-drones | Luca Crupi et.al. | 2404.02567v1 | null |
2024-04-03 | Semi-Supervised Unconstrained Head Pose Estimation in the Wild | Huayi Zhou et.al. | 2404.02544v1 | link |
2024-04-02 | 3D Congealing: 3D-Aware Image Alignment in the Wild | Yunzhi Zhang et.al. | 2404.02125v1 | null |
2024-04-02 | SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation | Vinkle Srivastav et.al. | 2404.02041v1 | link |
2024-04-01 | Marrying NeRF with Feature Matching for One-step Pose Estimation | Ronghan Chen et.al. | 2404.00891v1 | null |
2024-03-31 | Graph-Based vs. Error State Kalman Filter-Based Fusion Of 5G And Inertial Data For MAV Indoor Pose Estimation | Meisam Kabiri et.al. | 2404.00691v1 | null |
2024-03-31 | OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos | Dongyoung Choi et.al. | 2404.00676v1 | null |
2024-04-02 | KTPFormer: Kinematics and Trajectory Prior Knowledge-Enhanced Transformer for 3D Human Pose Estimation | Jihua Peng et.al. | 2404.00658v2 | link |
2024-03-29 | FetalDiffusion: Pose-Controllable 3D Fetal MRI Synthesis with Conditional Diffusion Model | Molin Zhang et.al. | 2404.00132v1 | null |
2024-03-29 | Latent Embedding Clustering for Occlusion Robust Head Pose Estimation | José Celestino et.al. | 2403.20251v1 | null |
2024-03-29 | A Unified Framework for Human-centric Point Cloud Video Understanding | Yiteng Xu et.al. | 2403.20031v1 | null |
2024-04-01 | Video-Based Human Pose Regression via Decoupled Space-Time Aggregation | Jijie He et.al. | 2403.19926v2 | link |
2024-03-28 | Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation | Xiao Lin et.al. | 2403.19527v1 | link |
2024-03-27 | Object Pose Estimation via the Aggregation of Diffusion Features | Tianfu Wang et.al. | 2403.18791v1 | link |
2024-03-27 | RoboKeyGen: Robot Pose and Joint Angles Estimation via Diffusion-based 3D Keypoint Generation | Yang Tian et.al. | 2403.18259v1 | null |
2024-03-26 | Mathematical Foundation and Corrections for Full Range Head Pose Estimation | Huei-Chung Hu et.al. | 2403.18104v1 | null |
2024-03-26 | EgoPoseFormer: A Simple Baseline for Egocentric 3D Human Pose Estimation | Chenhongyi Yang et.al. | 2403.18080v1 | link |
2024-03-26 | A Survey on 3D Egocentric Human Pose Estimation | Md Mushfiqur Azam et.al. | 2403.17893v1 | link |
2024-03-26 | GTA-HDR: A Large-Scale Synthetic Dataset for HDR Image Reconstruction | Hrishav Bakul Barua et.al. | 2403.17837v1 | link |
2024-03-26 | DiffH2O: Diffusion-Based Synthesis of Hand-Object Interactions from Textual Descriptions | Sammy Christen et.al. | 2403.17827v1 | null |
2024-03-26 | System Calibration of a Field Phenotyping Robot with Multiple High-Precision Profile Laser Scanners | Felix Esser et.al. | 2403.17788v1 | null |
2024-03-25 | Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos | Remy Sabathier et.al. | 2403.17103v1 | link |
2024-03-25 | Characterisation of the Intel RealSense D415 Stereo Depth Camera for Motion-Corrected CT Perfusion Imaging | Mahdieh Dashtbani Moghari et.al. | 2403.16490v1 | null |
2024-03-25 | Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects | Zicong Fan et.al. | 2403.16428v1 | link |
2024-03-25 | A Geometric Perspective on Fusing Gaussian Distributions on Lie Groups | Yixiao Ge et.al. | 2403.16411v1 | null |
2024-03-25 | ASDF: Assembly State Detection Utilizing Late Fusion by Integrating 6D Pose Estimation | Hannah Schieber et.al. | 2403.16400v1 | link |
2024-03-24 | KITchen: A Real-World Benchmark and Dataset for 6D Object Pose Estimation in Kitchen Environments | Abdelrahman Younes et.al. | 2403.16238v1 | null |
2024-03-24 | Diffusion Model is a Good Pose Estimator from 3D RF-Vision | Junqiao Fan et.al. | 2403.16198v1 | null |
2024-03-23 | UPNeRF: A Unified Framework for Monocular 3D Object Reconstruction and Pose Estimation | Yuliang Guo et.al. | 2403.15705v1 | link |
2024-03-22 | InterFusion: Text-Driven Generation of 3D Human-Object Interaction | Sisi Dai et.al. | 2403.15612v1 | link |
2024-03-22 | Augmented Reality Warnings in Roadway Work Zones: Evaluating the Effect of Modality on Worker Reaction Times | Sepehr Sabeti et.al. | 2403.15571v1 | null |
2024-03-22 | Gesture-Controlled Aerial Robot Formation for Human-Swarm Interaction in Safety Monitoring Applications | Vít Krátký et.al. | 2403.15333v1 | null |
2024-03-22 | WSCLoc: Weakly-Supervised Sparse-View Camera Relocalization | Jialu Wang et.al. | 2403.15272v1 | null |
2024-03-22 | DITTO: Demonstration Imitation by Trajectory Transformation | Nick Heppert et.al. | 2403.15203v1 | null |
2024-03-22 | Cartoon Hallucinations Detection: Pose-aware In Context Visual Learning | Bumsoo Kim et.al. | 2403.15048v1 | null |
2024-03-22 | Trajectory Regularization Enhances Self-Supervised Geometric Representation | Jiayun Wang et.al. | 2403.14973v1 | link |
2024-03-21 | VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding | Ahmad Mahmood et.al. | 2403.14743v1 | link |
2024-03-21 | Visibility-Aware Keypoint Localization for 6DoF Object Pose Estimation | Ruyi Lian et.al. | 2403.14559v1 | null |
2024-03-23 | Exploring 3D Human Pose Estimation and Forecasting from the Robot's Perspective: The HARPER Dataset | Andrea Avogaro et.al. | 2403.14447v2 | null |
2024-03-21 | Evaluation and Deployment of LiDAR-based Place Recognition in Dense Forests | Haedam Oh et.al. | 2403.14326v1 | null |
2024-03-21 | Zero123-6D: Zero-shot Novel View Synthesis for RGB Category-level 6D Pose Estimation | Francesco Di Felice et.al. | 2403.14279v1 | null |
2024-03-20 | DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses | Chen Zhao et.al. | 2403.13683v1 | link |
2024-03-20 | Meta-Point Learning and Refining for Category-Agnostic Pose Estimation | Junjie Chen et.al. | 2403.13647v1 | link |
2024-03-20 | Advancing 6D Pose Estimation in Augmented Reality -- Overcoming Projection Ambiguity with Uncontrolled Imagery | Mayura Manawadu et.al. | 2403.13434v1 | null |
2024-03-20 | DOR3D-Net: Dense Ordinal Regression Network for 3D Hand Pose Estimation | Yamin Mao et.al. | 2403.13405v1 | null |
2024-03-20 | ManiPose: A Comprehensive Benchmark for Pose-aware Object Manipulation in Robotics | Qiaojun Yu et.al. | 2403.13365v1 | null |
2024-03-20 | MULAN-WC: Multi-Robot Localization Uncertainty-aware Active NeRF with Wireless Coordination | Weiying Wang et.al. | 2403.13348v1 | null |
2024-03-19 | FaceXFormer: A Unified Transformer for Facial Analysis | Kartik Narayan et.al. | 2403.12960v1 | link |
2024-03-19 | WHAC: World-grounded Humans and Cameras | Wanqi Yin et.al. | 2403.12959v1 | link |
2024-03-19 | Diffusion-Driven Self-Supervised Learning for Shape Reconstruction and Pose Estimation | Jingtao Sun et.al. | 2403.12728v1 | link |
2024-03-19 | IFFNeRF: Initialisation Free and Fast 6DoF pose estimation from a single image and a NeRF model | Matteo Bortolon et.al. | 2403.12682v1 | null |
2024-03-19 | In-Hand Following of Deformable Linear Objects Using Dexterous Fingers with Tactile Sensing | Mingrui Yu et.al. | 2403.12676v1 | null |
2024-03-19 | Self-learning Canonical Space for Multi-view 3D Human Pose Estimation | Xiaoben Li et.al. | 2403.12440v1 | null |
2024-03-20 | Human Mesh Recovery from Arbitrary Multi-view Images | Xiaoben Li et.al. | 2403.12434v2 | link |
2024-03-19 | XPose: eXplainable Human Pose Estimation | Luyu Qiu et.al. | 2403.12370v1 | null |
2024-03-18 | HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data | Mengqi Zhang et.al. | 2403.12011v1 | null |
2024-03-18 | Normalized Validity Scores for DNNs in Regression based Eye Feature Extraction | Wolfgang Fuhl et.al. | 2403.11665v1 | null |
2024-03-18 | An Accurate and Real-time Relative Pose Estimation from Triple Point-line Images by Decoupling Rotation and Translation | Zewen Xu et.al. | 2403.11639v1 | null |
2024-03-18 | LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models | Yang Yang et.al. | 2403.11627v1 | link |
2024-03-18 | GenFlow: Generalizable Recurrent Flow for 6D Pose Refinement of Novel Objects | Sungphill Moon et.al. | 2403.11510v1 | null |
2024-03-17 | A Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation | Qucheng Peng et.al. | 2403.11310v1 | link |
2024-03-17 | Compact 3D Gaussian Splatting For Dense Visual SLAM | Tianchen Deng et.al. | 2403.11247v1 | link |
2024-03-16 | Robotic Task Success Evaluation Under Multi-modal Non-Parametric Object Pose Uncertainty | Lakshadeep Naik et.al. | 2403.10874v1 | null |
2024-03-16 | DPPE: Dense Pose Estimation in a Plenoxels Environment using Gradient Approximation | Christopher Kolios et.al. | 2403.10773v1 | null |
2024-03-15 | GS-Pose: Cascaded Framework for Generalizable Segmentation-based 6D Object Pose Estimation | Dingding Cai et.al. | 2403.10683v1 | null |
2024-03-15 | CLOSURE: Fast Quantification of Pose Uncertainty Sets | Yihuai Gao et.al. | 2403.09990v1 | null |
2024-03-14 | ThermoHands: A Benchmark for 3D Hand Pose Estimation from Egocentric Thermal Image | Fangqiang Ding et.al. | 2403.09871v1 | null |
2024-03-14 | BOP Challenge 2023 on Detection, Segmentation and Pose Estimation of Seen and Unseen Rigid Objects | Tomas Hodan et.al. | 2403.09799v1 | null |
2024-03-14 | Scalable Autonomous Drone Flight in the Forest with Visual-Inertial SLAM and Dense Submaps Built without LiDAR | Sebastián Barbas Laina et.al. | 2403.09596v1 | null |
2024-03-14 | Improving Real-Time Omnidirectional 3D Multi-Person Human Pose Estimation with People Matching and Unsupervised 2D-3D Lifting | Pawel Knap et.al. | 2403.09437v1 | null |
2024-03-14 | LM2D: Lyrics- and Music-Driven Dance Synthesis | Wenjie Yin et.al. | 2403.09407v1 | null |
2024-03-14 | SD-Net: Symmetric-Aware Keypoint Prediction and Domain Adaptation for 6D Pose Estimation In Bin-picking Scenarios | Ding-Tao Huang et.al. | 2403.09317v1 | link |
2024-03-14 | MOTPose: Multi-object 6D Pose Estimation for Dynamic Video Sequences using Attention-based Temporal Fusion | Arul Selvam Periyasamy et.al. | 2403.09309v1 | null |
2024-03-13 | Data Augmentation in Human-Centric Vision | Wentao Jiang et.al. | 2403.08650v1 | null |
2024-03-15 | PRAGO: Differentiable Multi-View Pose Optimization From Objectness Detections | Matteo Taiana et.al. | 2403.08586v2 | null |
2024-03-13 | NeRF-Supervised Feature Point Detection and Description | Ali Youssef et.al. | 2403.08156v1 | link |
2024-03-12 | Q-SLAM: Quadric Representations for Monocular SLAM | Chensheng Peng et.al. | 2403.08125v1 | null |
2024-03-12 | MRC-Net: 6-DoF Pose Estimation with MultiScale Residual Correlation | Yuelong Li et.al. | 2403.08019v1 | link |
2024-03-12 | Uncertainty Quantification with Deep Ensembles for 6D Object Pose Estimation | Kira Wursthorn et.al. | 2403.07741v1 | null |
2024-03-12 | Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving | JunDa Cheng et.al. | 2403.07535v1 | link |
2024-03-12 | Category-Agnostic Pose Estimation for Point Clouds | Bowen Liu et.al. | 2403.07437v1 | null |
2024-03-12 | Monocular Microscope to CT Registration using Pose Estimation of the Incus for Augmented Reality Cochlear Implant Surgery | Yike Zhang et.al. | 2403.07219v1 | null |
2024-03-11 | Real-Time Simulated Avatar from Head-Mounted Sensors | Zhengyi Luo et.al. | 2403.06862v1 | null |
2024-03-11 | Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition | Erkut Akdag et.al. | 2403.06577v1 | null |
2024-03-10 | Platypose: Calibrated Zero-Shot Multi-Hypothesis 3D Human Motion Estimation | Paweł A. Pierzchlewicz et.al. | 2403.06164v1 | link |
2024-03-10 | Diffusion Models Trained with Large Data Are Transferable Visual Models | Guangkai Xu et.al. | 2403.06090v1 | link |
2024-03-08 | Prepared for the Worst: A Learning-Based Adversarial Attack for Resilience Analysis of the ICP Algorithm | Ziyu Zhang et.al. | 2403.05666v1 | null |
2024-03-11 | Exploiting polar symmetry in designing equivariant observers for vision-based motion estimation | Tarek Bouazza et.al. | 2403.05450v2 | null |
2024-03-07 | Real-Time Planning Under Uncertainty for AUVs Using Virtual Maps | Ivana Collado-Gonzalez et.al. | 2403.04936v1 | null |
2024-03-07 | That's My Point: Compact Object-centric LiDAR Pose Estimation for Large-scale Outdoor Localisation | Georgi Pramatarov et.al. | 2403.04755v1 | null |
2024-03-07 | Disentangled Diffusion-Based 3D Human Pose Estimation with Hierarchical Spatial and Temporal Denoiser | Qingyuan Cai et.al. | 2403.04444v1 | link |
2024-03-09 | Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation | Ruicong Liu et.al. | 2403.04381v2 | link |
2024-03-05 | FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation | Chris Rockwell et.al. | 2403.03221v1 | null |
2024-03-05 | NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors | Yannan He et.al. | 2403.03122v1 | null |
2024-03-05 | Improved LiDAR Odometry and Mapping using Deep Semantic Segmentation and Novel Outliers Detection | Mohamed Afifi et.al. | 2403.03111v1 | null |
2024-03-05 | Splat-Nav: Safe Real-Time Robot Navigation in Gaussian Splatting Maps | Timothy Chen et.al. | 2403.02751v1 | null |
2024-03-04 | PowerSkel: A Device-Free Framework Using CSI Signal for Human Skeleton Estimation in Power Station | Cunyi Yin et.al. | 2403.01913v1 | link |
2024-03-04 | A Simple Baseline for Efficient Hand Mesh Reconstruction | Zhishan Zhou et.al. | 2403.01813v1 | null |
2024-03-03 | MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images | Junwen Huang et.al. | 2403.01517v1 | null |
2024-03-02 | Single-image camera calibration with model-free distortion correction | Katia Genovese et.al. | 2403.01263v1 | null |
2024-03-02 | Grid-based Fast and Structural Visual Odometry | Zhang Zhihe et.al. | 2403.01110v1 | null |
2024-03-01 | Optimal Robot Formations: Balancing Range-Based Observability and User-Defined Configurations | Syed Shabbir Ahmed et.al. | 2403.00988v1 | null |
2024-03-04 | TEXterity -- Tactile Extrinsic deXterity: Simultaneous Tactile Estimation and Control for Extrinsic Dexterity | Sangwoon Kim et.al. | 2403.00049v2 | null |
2024-03-01 | Graph Convolutional Neural Networks for Automated Echocardiography View Recognition: A Holistic Approach | Sarina Thomas et.al. | 2402.19062v2 | null |
2024-02-29 | Deep Learning for 3D Human Pose Estimation and Mesh Recovery: A Survey | Yang Liu et.al. | 2402.18844v1 | link |
2024-02-28 | Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting | Taeho Kang et.al. | 2402.18330v1 | link |
2024-02-28 | Location-guided Head Pose Estimation for Fisheye Image | Bing Li et.al. | 2402.18320v1 | null |
2024-02-28 | NToP: NeRF-Powered Large-scale Dataset Generation for 2D and 3D Human Pose Estimation in Top-View Fisheye Images | Jingrui Yu et.al. | 2402.18196v1 | link |
2024-02-28 | Six-Point Method for Multi-Camera Systems with Reduced Solution Space | Banglei Guan et.al. | 2402.18066v1 | link |
2024-02-27 | Real-Time Estimation of Relative Pose for UAVs Using a Dual-Channel Feature Association | Zhaoying Wang et.al. | 2402.17504v1 | null |
2024-02-26 | HOISDF: Constraining 3D Hand-Object Pose Estimation with Global Signed Distance Fields | Haozhe Qi et.al. | 2402.17062v1 | link |
2024-02-26 | DRSI-Net: Dual-Residual Spatial Interaction Network for Multi-Person Pose Estimation | Shang Wu et.al. | 2402.16640v1 | null |
2024-02-26 | GEA: Reconstructing Expressive 3D Gaussian Avatar from Monocular Video | Xinqi Liu et.al. | 2402.16607v1 | null |
2024-02-26 | DreamUp3D: Object-Centric Generative Models for Single-View 3D Scene Understanding and Real-to-Sim Transfer | Yizhe Wu et.al. | 2402.16308v1 | null |
2024-02-25 | XAI-based gait analysis of patients walking with Knee-Ankle-Foot orthosis using video cameras | Arnav Mishra et.al. | 2402.16175v1 | null |
2024-02-25 | VOLoc: Visual Place Recognition by Querying Compressed Lidar Map | Xudong Cai et.al. | 2402.15961v1 | link |
2024-02-24 | CLIPose: Category-Level Object Pose Estimation with Pre-trained Vision-Language Knowledge | Xiao Lin et.al. | 2402.15726v1 | null |
2024-02-23 | Optimized Deployment of Deep Neural Networks for Visual Pose Estimation on Nano-drones | Matteo Risso et.al. | 2402.15273v1 | null |
2024-02-22 | Cameras as Rays: Pose Estimation via Ray Diffusion | Jason Y. Zhang et.al. | 2402.14817v1 | null |
2024-02-22 | S^2Former-OR: Single-Stage Bimodal Transformer for Scene Graph Generation in OR | Jialun Pei et.al. | 2402.14461v1 | link |
2024-02-22 | VLPose: Bridging the Domain Gap in Pose Estimation with Language-Vision Tuning | Jingyao Li et.al. | 2402.14456v1 | null |
2024-02-22 | Modeling 3D Infant Kinetics Using Adaptive Graph Convolutional Networks | Daniel Holmberg et.al. | 2402.14400v1 | link |
2024-02-22 | Secure Navigation using Landmark-based Localization in a GPS-denied Environment | Ganesh Sapkota et.al. | 2402.14280v1 | null |
2024-02-21 | SecurePose: Automated Face Blurring and Human Movement Kinematics Extraction from Videos Recorded in Clinical Settings | Rishabh Bajpai et.al. | 2402.14143v1 | null |
2024-02-21 | High-throughput Visual Nano-drone to Nano-drone Relative Localization using Onboard Fully Convolutional Networks | Luca Crupi et.al. | 2402.13756v1 | null |
2024-02-21 | EffLoc: Lightweight Vision Transformer for Efficient 6-DOF Camera Relocalization | Zhendong Xiao et.al. | 2402.13537v1 | null |
2024-02-20 | DiffusionNOCS: Managing Symmetry and Uncertainty in Sim2Real Multi-Modal Category-level Pose Estimation | Takuya Ikeda et.al. | 2402.12647v1 | link |
2024-02-19 | Landmark-based Localization using Stereo Vision and Deep Learning in GPS-Denied Battlefield Environment | Ganesh Sapkota et.al. | 2402.12551v1 | null |
2024-02-18 | Boosting Semi-Supervised 2D Human Pose Estimation by Revisiting Data Augmentation and Consistency Training | Huayi Zhou et.al. | 2402.11566v1 | link |
2024-02-17 | Enhancing Surgical Performance in Cardiothoracic Surgery with Innovations from Computer Vision and Artificial Intelligence: A Narrative Review | Merryn D. Constable et.al. | 2402.11288v1 | null |
2024-02-17 | Dense Matchers for Dense Tracking | Tomáš Jelínek et.al. | 2402.11287v1 | null |
2024-02-16 | Occlusion Resilient 3D Human Pose Estimation | Soumava Kumar Roy et.al. | 2402.11036v1 | null |
2024-02-16 | 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations | Tsung-Wei Ke et.al. | 2402.10885v1 | null |
2024-02-15 | Lester: rotoscope animation through video object segmentation and tracking | Ruben Tous et.al. | 2402.09883v1 | link |
2024-02-15 | Foul prediction with estimated poses from soccer broadcast video | Jiale Fang et.al. | 2402.09650v1 | null |
2024-02-16 | IMUOptimize: A Data-Driven Approach to Optimal IMU Placement for Human Pose Estimation with Transformer Architecture | Varun Ramani et.al. | 2402.08923v2 | null |
2024-02-13 | Are Semi-Dense Detector-Free Methods Good at Matching Local Features? | Matthieu Vilain et.al. | 2402.08671v1 | null |
2024-02-13 | Gaussian-Sum Filter for Range-based 3D Relative Pose Estimation in the Presence of Ambiguities | Syed S. Ahmed et.al. | 2402.08566v1 | null |
2024-02-13 | Learning to Produce Semi-dense Correspondences for Visual Localization | Khang Truong Giang et.al. | 2402.08359v1 | link |
2024-02-12 | Extending 3D body pose estimation for robotic-assistive therapies of autistic children | Laura Santos et.al. | 2402.08006v1 | null |
2024-02-12 | GBOT: Graph-Based 3D Object Tracking for Augmented Reality-Assisted Assembly Guidance | Shiyu Li et.al. | 2402.07677v1 | link |
2024-02-12 | UAV-assisted Visual SLAM Generating Reconstructed 3D Scene Graphs in GPS-denied Environments | Ahmed Radwan et.al. | 2402.07537v1 | null |
2024-02-09 | Improving 2D-3D Dense Correspondences with Diffusion Models for 6D Object Pose Estimation | Peter Hönig et.al. | 2402.06436v1 | null |
2024-02-08 | Real-time Holistic Robot Pose Estimation with Unknown States | Shikun Ban et.al. | 2402.05655v1 | link |
2024-02-08 | Extending 6D Object Pose Estimators for Stereo Vision | Thomas Pöllabauer et.al. | 2402.05610v1 | null |
2024-02-09 | NCRF: Neural Contact Radiance Fields for Free-Viewpoint Rendering of Hand-Object Interaction | Zhongqun Zhang et.al. | 2402.05532v2 | null |
2024-02-07 | Detection and Pose Estimation of flat, Texture-less Industry Objects on HoloLens using synthetic Training | Thomas Pöllabauer et.al. | 2402.04979v1 | null |
2024-02-07 | 4-Dimensional deformation part model for pose estimation using Kalman filter constraints | Enrique Martinez-Berti et.al. | 2402.04953v1 | null |
2024-02-07 | STAR: Shape-focused Texture Agnostic Representations for Improved Object Detection and 6D Pose Estimation | Peter Hönig et.al. | 2402.04878v1 | link |
2024-02-05 | A Computer Vision Based Approach for Stalking Detection Using a CNN-LSTM-MLP Hybrid Fusion Model | Murad Hasan et.al. | 2402.03417v1 | null |
2024-02-05 | SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM | Mingrui Li et.al. | 2402.03246v1 | link |
2024-02-05 | Extreme Two-View Geometry From Object Poses with Diffusion Models | Yujing Sun et.al. | 2402.02800v1 | link |
2024-02-04 | Uncertainty-Aware Testing-Time Optimization for 3D Human Pose Estimation | Ti Wang et.al. | 2402.02339v1 | null |
2024-02-01 | mmID: High-Resolution mmWave Imaging for Human Identification | Sakila S. Jayaweera et.al. | 2402.00996v1 | null |
2024-02-01 | In-Bed Pose Estimation: A Review | Ziya Ata Yazıcı et.al. | 2402.00700v1 | null |
2024-02-01 | WayFASTER: a Self-Supervised Traversability Prediction for Increased Navigation Awareness | Mateus Valverde Gasparino et.al. | 2402.00683v1 | link |
2024-02-02 | CMRNext: Camera to LiDAR Matching in the Wild for Localization and Extrinsic Calibration | Daniele Cattaneo et.al. | 2402.00129v2 | null |
2024-01-31 | Improved Scene Landmark Detection for Camera Localization | Tien Do et.al. | 2401.18083v1 | link |
2024-01-30 | Navigating the Unknown: Uncertainty-Aware Compute-in-Memory Autonomy of Edge Robotics | Nastaran Darabi et.al. | 2401.17481v1 | null |
2024-01-30 | MESA: Matching Everything by Segmenting Anything | Yesheng Zhang et.al. | 2401.16741v1 | null |
2024-01-30 | Towards Precise 3D Human Pose Estimation with Multi-Perspective Spatial-Temporal Relational Transformers | Jianbin Jiao et.al. | 2401.16700v1 | link |
2024-01-29 | Leveraging Positional Encoding for Robust Multi-Reference-Based Object 6D Pose Estimation | Jaewoo Park et.al. | 2401.16284v1 | null |
2024-01-29 | Reconstructing Close Human Interactions from Multiple Views | Qing Shuai et.al. | 2401.16173v1 | link |
2024-01-28 | Multi-Person 3D Pose Estimation from Multi-View Uncalibrated Depth Cameras | Yu-Jhe Li et.al. | 2401.15616v1 | null |
2024-01-30 | Multi-Robot Relative Pose Estimation in SE(2) with Observability Analysis: A Comparison of Extended Kalman Filtering and Robust Pose Graph Optimization | Kihoon Shin et.al. | 2401.15313v2 | null |
2024-01-26 | Adaptive Deep Learning for Efficient Visual Pose Estimation aboard Ultra-low-power Nano-drones | Beatrice Alessandra Motetti et.al. | 2401.15236v1 | null |
2024-01-26 | SimpleEgo: Predicting Probabilistic Body Pose from Egocentric Cameras | Hanz Cuevas-Velasquez et.al. | 2401.14785v1 | null |
2024-01-24 | Synthetic data enables faster annotation and robust segmentation for multi-object grasping in clutter | Dongmyoung Lee et.al. | 2401.13405v1 | null |
2024-01-24 | Linear Relative Pose Estimation Founded on Pose-only Imaging Geometry | Qi Cai et.al. | 2401.13357v1 | null |
2024-01-23 | SemanticSLAM: Learning based Semantic Map Construction and Robust Camera Localization | Mingyang Li et.al. | 2401.13076v1 | link |
2024-01-24 | RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos | Hongchi Xia et.al. | 2401.12592v2 | null |
2024-01-26 | MobileARLoc: On-device Robust Absolute Localisation for Pervasive Markerless Mobile AR | Changkun Liu et.al. | 2401.11511v2 | null |
2024-01-19 | SCENES: Subpixel Correspondence Estimation With Epipolar Supervision | Dominik A. Kloepfer et.al. | 2401.10886v1 | null |
2024-01-19 | Source-Free and Image-Only Unsupervised Domain Adaptation for Category Level Object Pose Estimation | Prakhar Kaushik et.al. | 2401.10848v1 | null |
2024-01-22 | TEXterity: Tactile Extrinsic deXterity | Antonia Bronars et.al. | 2401.10230v2 | null |
2024-01-18 | Exploring Latent Cross-Channel Embedding for Accurate 3D Human Pose Reconstruction in a Diffusion Framework | Junkun Jiang et.al. | 2401.09836v1 | link |
2024-01-17 | DK-SLAM: Monocular Visual SLAM with Deep Keypoints Adaptive Learning, Tracking and Loop-Closing | Hao Qu et.al. | 2401.09160v1 | null |
2024-01-17 | PIN-SLAM: LiDAR SLAM Using a Point-Based Implicit Neural Representation for Achieving Global Map Consistency | Yue Pan et.al. | 2401.09101v1 | link |
2024-01-16 | AdaSem: Adaptive Goal-Oriented Semantic Communications for End-to-End Camera Relocalization | Qi Liao et.al. | 2401.08360v1 | null |
2024-01-16 | S3M: Semantic Segmentation Sparse Mapping for UAVs with RGB-D Camera | Thanh Nguyen Canh et.al. | 2401.08134v1 | null |
2024-01-15 | Collaboratively Self-supervised Video Representation Learning for Action Recognition | Jie Zhang et.al. | 2401.07584v1 | null |
2024-01-14 | 3D Landmark Detection on Human Point Clouds: A Benchmark and A Dual Cascade Point Transformer Framework | Fan Zhang et.al. | 2401.07251v1 | null |
2024-01-11 | On the representation and methodology for wide and short range head pose estimation | Alejandro Cobo et.al. | 2401.05807v1 | link |
2024-01-10 | Structure from Duplicates: Neural Inverse Graphics from a Pile of Objects | Tianhang Cheng et.al. | 2401.05236v1 | link |
2024-01-10 | Video-based Automatic Lameness Detection of Dairy Cows using Pose Estimation and Multiple Locomotion Traits | Helena Russello et.al. | 2401.05202v1 | null |
2024-01-10 | Diffusion-based Pose Refinement and Multi-hypothesis Generation for 3D Human Pose Estimation | Hongbo Kang et.al. | 2401.04921v1 | link |
2024-01-15 | Towards Real-World Aerial Vision Guidance with Categorical 6D Pose Tracker | Jingtao Sun et.al. | 2401.04377v2 | link |
2024-01-07 | RHOBIN Challenge: Reconstruction of Human Object Interaction | Xianghui Xie et.al. | 2401.04143v1 | null |
2024-01-08 | D3PRefiner: A Diffusion-based Denoise Method for 3D Human Pose Refinement | Danqi Yan et.al. | 2401.03914v1 | null |
2024-01-07 | Big Data and Deep Learning in Smart Cities: A Comprehensive Dataset for AI-Driven Traffic Accident Detection and Computer Vision Systems | Victor Adewopo et.al. | 2401.03587v1 | null |
2024-01-04 | Survey of 3D Human Body Pose and Shape Estimation Methods for Contemporary Dance Applications | Darshan Venkatrayappa et.al. | 2401.02383v1 | null |
2024-01-04 | Fit-NGP: Fitting Object Models to Neural Graphics Primitives | Marwan Taher et.al. | 2401.02357v1 | null |
2024-01-04 | PEGASUS: Physically Enhanced Gaussian Splatting Simulation System for 6DOF Object Pose Dataset Generation | Lukas Meyer et.al. | 2401.02281v1 | link |
2024-01-03 | Real-Time Human Fall Detection using a Lightweight Pose Estimation Technique | Ekram Alam et.al. | 2401.01587v1 | link |
2024-01-05 | PLE-SLAM: A Visual-Inertial SLAM Based on Point-Line Features and Efficient IMU Initialization | Jiaming He et.al. | 2401.01081v2 | link |
2023-12-30 | 3D Human Pose Perception from Egocentric Stereo Videos | Hiroyasu Akada et.al. | 2401.00889v1 | null |
2024-01-01 | Geometry Depth Consistency in RGBD Relative Pose Estimation | Sourav Kumar et.al. | 2401.00639v1 | null |
2023-12-30 | A comprehensive framework for occluded human pose estimation | Linhao Xu et.al. | 2401.00155v1 | null |
2024-01-02 | 6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation | Li Xu et.al. | 2401.00029v2 | null |
2023-12-29 | MURP: Multi-Agent Ultra-Wideband Relative Pose Estimation with Constrained Communications in 3D Environments | Andrew Fishberg et.al. | 2312.17731v1 | link |
2023-12-28 | iFusion: Inverting Diffusion for Pose-Free Reconstruction from Sparse Views | Chin-Hsuan Wu et.al. | 2312.17250v1 | link |
2023-12-28 | EvPlug: Learn a Plug-and-Play Module for Event and Image Fusion | Jianping Jiang et.al. | 2312.16933v1 | null |
2023-12-28 | SR-LIVO: LiDAR-Inertial-Visual Odometry and Mapping with Sweep Reconstruction | Zikang Yuan et.al. | 2312.16800v1 | link |
2023-12-28 | L-LO: Enhancing Pose Estimation Precision via a Landmark-Based LiDAR Odometry | Feiya Li et.al. | 2312.16787v1 | null |
2023-12-27 | HMP: Hand Motion Priors for Pose and Shape Estimation from Video | Enes Duran et.al. | 2312.16737v1 | null |
2023-12-27 | Camera calibration for the surround-view system: a benchmark and dataset | L Qin et.al. | 2312.16499v1 | null |
2023-12-24 | TEMP3D: Temporally Continuous 3D Human Pose Estimation Under Occlusions | Rohit Lal et.al. | 2312.16221v1 | link |
2023-12-26 | Graph Context Transformation Learning for Progressive Correspondence Pruning | Junwen Guo et.al. | 2312.15971v1 | link |
2023-12-25 | Lifting by Image -- Leveraging Image Cues for Accurate 3D Human Pose Estimation | Feng Zhou et.al. | 2312.15636v1 | null |
2023-12-25 | APTv2: Benchmarking Animal Pose Estimation and Tracking with a Large-scale Dataset and Beyond | Yuxiang Yang et.al. | 2312.15612v1 | link |
2023-12-23 | PACE: Pose Annotations in Cluttered Environments | Yang You et.al. | 2312.15130v1 | link |
2023-12-22 | PoseGen: Learning to Generate 3D Human Pose Dataset with NeRF | Mohsen Gholami et.al. | 2312.14915v1 | link |
2023-12-22 | Harnessing Diffusion Models for Visual Perception with Meta Prompts | Qiang Wan et.al. | 2312.14733v1 | link |
2023-12-22 | Pola4All: survey of polarimetric applications and an open-source toolkit to analyze polarization | Joaquin Rodriguez et.al. | 2312.14697v1 | link |
2023-12-22 | PoseViNet: Distracted Driver Action Recognition Framework Using Multi-View Pose Estimation and Vision Transformer | Neha Sengar et.al. | 2312.14577v1 | null |
2023-12-22 | Scalable 3D Reconstruction From Single Particle X-Ray Diffraction Images Based on Online Machine Learning | Jay Shenoy et.al. | 2312.14432v1 | null |
2023-12-21 | 3D Pose Estimation of Two Interacting Hands from a Monocular Event Camera | Christen Millerdurai et.al. | 2312.14157v1 | null |
2023-12-21 | DUSt3R: Geometric 3D Vision Made Easy | Shuzhe Wang et.al. | 2312.14132v1 | link |
2023-12-20 | NeRF-VO: Real-Time Sparse Visual Odometry with Neural Radiance Fields | Jens Naumann et.al. | 2312.13471v1 | null |
2023-12-20 | Brain-Inspired Visual Odometry: Balancing Speed and Interpretability through a System of Systems Approach | Habib Boloorchi Tabrizi et.al. | 2312.13162v1 | link |
2023-12-18 | Unified framework for diffusion generative models in SO(3): applications in computer vision and astrophysics | Yesukhei Jagvaral et.al. | 2312.11707v1 | null |
2023-12-18 | Underwater Robot Pose Estimation Using Acoustic Methods and Intermittent Position Measurements at the Surface | Vicu-Mihalis Maer et.al. | 2312.11401v1 | null |
2023-12-17 | SHaRPose: Sparse High-Resolution Representation for Human Pose Estimation | Xiaoqi An et.al. | 2312.10758v1 | link |
2023-12-17 | PNeRFLoc: Visual Localization with Point-based Neural Radiance Fields | Boming Zhao et.al. | 2312.10649v1 | null |
2023-12-15 | SoloPose: One-Shot Kinematic 3D Human Pose Estimation with Video Data Augmentation | David C. Jeong et.al. | 2312.10195v1 | link |
2023-12-14 | iComMa: Inverting 3D Gaussians Splatting for Camera Pose Estimation via Comparing and Matching | Yuan Sun et.al. | 2312.09031v1 | null |
2023-12-14 | Scene 3-D Reconstruction System in Scattering Medium | Zhuoyifan Zhang et.al. | 2312.09005v1 | null |
2023-12-14 | CattleEyeView: A Multi-task Top-down View Cattle Dataset for Smarter Precision Livestock Farming | Kian Eng Ong et.al. | 2312.08764v1 | link |
2023-12-20 | PnP for Two-Dimensional Pose Estimation | Joshua Wang et.al. | 2312.08488v2 | link |
2023-12-13 | Pose and shear-based tactile servoing | John Lloyd et.al. | 2312.08411v1 | null |
2023-12-13 | FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects | Bowen Wen et.al. | 2312.08344v1 | link |
2023-12-13 | Efficient Multi-Object Pose Estimation using Multi-Resolution Deformable Attention and Query Aggregation | Arul Selvam Periyasamy et.al. | 2312.08268v1 | null |
2023-12-13 | CenterGrasp: Object-Aware Implicit Representation Learning for Simultaneous Shape Reconstruction and 6-DoF Grasp Estimation | Eugenio Chisari et.al. | 2312.08240v1 | null |
2023-12-13 | C-BEV: Contrastive Bird's Eye View Training for Cross-View Image Retrieval and 3-DoF Pose Estimation | Florian Fervers et.al. | 2312.08060v1 | null |
2023-12-13 | Three-Filters-to-Normal+: Revisiting Discontinuity Discrimination in Depth-to-Normal Translation | Jingwei Yang et.al. | 2312.07964v1 | null |
2023-12-13 | Diffusion Models Enable Zero-Shot Pose Estimation for Lower-Limb Prosthetic Users | Tianxun Zhou et.al. | 2312.07854v1 | null |
2023-12-12 | RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation | Peng Lu et.al. | 2312.07526v1 | link |
2023-12-12 | COLMAP-Free 3D Gaussian Splatting | Yang Fu et.al. | 2312.07504v1 | null |
2023-12-12 | RMS: Redundancy-Minimizing Point Cloud Sampling for Real-Time Pose Estimation in Degenerated Environments | Pavel Petracek et.al. | 2312.07337v1 | link |
2023-12-12 | Unifying Correspondence, Pose and NeRF for Pose-Free Novel View Synthesis from Stereo Pairs | Sunghwan Hong et.al. | 2312.07246v1 | link |
2023-12-12 | Mask as Supervision: Leveraging Unified Mask Information for Unsupervised 3D Pose Estimation | Yuchen Yang et.al. | 2312.07051v1 | link |
2023-12-12 | Towards Enhanced Human Activity Recognition through Natural Language Generation and Pose Estimation | Nikhil Kashyap et.al. | 2312.06965v1 | null |
2023-12-12 | Exploring Novel Object Recognition and Spontaneous Location Recognition Machine Learning Analysis Techniques in Alzheimer's Mice | Soham Bafana et.al. | 2312.06914v1 | link |
2023-12-11 | Keypoint-based Stereophotoclinometry for Characterizing and Navigating Small Bodies: A Factor Graph Approach | Travis Driver et.al. | 2312.06865v1 | link |
2023-12-11 | Improving the Robustness of 3D Human Pose Estimation: A Benchmark and Learning from Noisy Input | Trung-Hieu Hoang et.al. | 2312.06797v1 | null |
2023-12-11 | 3D Hand Pose Estimation in Egocentric Images in the Wild | Aditya Prakash et.al. | 2312.06583v1 | null |
2023-12-11 | PointVoxel: A Simple and Effective Pipeline for Multi-View Multi-Modal 3D Human Pose Estimation | Zhiyu Pan et.al. | 2312.06409v1 | null |
2023-12-11 | ManiPose: Manifold-Constrained Multi-Hypothesis 3D Human Pose Estimation | Cédric Rommel et.al. | 2312.06386v1 | link |
2023-12-10 | From Correspondences to Pose: Non-minimal Certifiably Optimal Relative Pose without Disambiguation | Javier Tirado-Garín et.al. | 2312.05995v1 | link |
2023-12-09 | You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception | Sheng Jin et.al. | 2312.05525v1 | link |
2023-12-07 | Image and AIS Data Fusion Technique for Maritime Computer Vision Applications | Emre Gülsoylu et.al. | 2312.05270v1 | link |
2023-12-07 | Correspondences of the Third Kind: Camera Pose Estimation from Object Reflection | Kohei Yamashita et.al. | 2312.04527v1 | null |
2023-12-07 | Detecting and Restoring Non-Standard Hands in Stable Diffusion Generated Images | Yiqun Zhang et.al. | 2312.04236v1 | null |
2023-12-06 | Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning | Xinshun Wang et.al. | 2312.03703v1 | link |
2023-12-06 | Cooperative Probabilistic Trajectory Forecasting under Occlusion | Anshul Nayak et.al. | 2312.03296v1 | null |
2023-12-05 | A Unified Simulation Framework for Visual and Behavioral Fidelity in Crowd Analysis | Niccolò Bisagno et.al. | 2312.02613v1 | null |
2023-12-05 | 6D Assembly Pose Estimation by Point Cloud Registration for Robot Manipulation | K. Samarawickrama et.al. | 2312.02593v1 | link |
2023-12-05 | PolyFit: A Peg-in-hole Assembly Framework for Unseen Polygon Shapes via Sim-to-real Adaptation | Geonhyup Lee et.al. | 2312.02531v1 | null |
2023-12-04 | GenEM: Physics-Informed Generative Cryo-Electron Microscopy | Jiakai Zhang et.al. | 2312.02235v1 | null |
2023-12-02 | Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors | Yu Zhang et.al. | 2312.02196v1 | link |
2023-12-04 | iMatching: Imperative Correspondence Learning | Zitong Zhan et.al. | 2312.02141v1 | link |
2023-12-04 | SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM | Nikhil Keetha et.al. | 2312.02126v1 | link |
2023-12-04 | Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection | Xubin Zhong et.al. | 2312.01713v1 | null |
2023-12-05 | Hulk: A Universal Knowledge Translator for Human-Centric Tasks | Yizhou Wang et.al. | 2312.01697v2 | link |
2023-12-04 | Multi-View Person Matching and 3D Pose Estimation with Arbitrary Uncalibrated Camera Networks | Yan Xu et.al. | 2312.01561v1 | null |
2023-12-01 | Object 6D pose estimation meets zero-shot learning | Andrea Caraffa et.al. | 2312.00947v1 | null |
2023-12-01 | Open-vocabulary object 6D pose estimation | Jaime Corsetti et.al. | 2312.00690v1 | null |
2023-12-01 | Global Localization: Utilizing Relative Spatio-Temporal Geometric Constraints from Adjacent and Distant Cameras | Mohammad Altillawi et.al. | 2312.00500v1 | null |
2023-12-01 | Learning Unorthogonalized Matrices for Rotation Estimation | Kerui Gu et.al. | 2312.00462v1 | null |
2023-11-30 | PoseGPT: Chatting about 3D Human Pose | Yao Feng et.al. | 2311.18836v1 | null |
2023-11-30 | FoundPose: Unseen Object Pose Estimation with Foundation Features | Evin Pınar Örnek et.al. | 2311.18809v1 | null |
2023-11-30 | Pose Estimation and Tracking for ASIST | Ari Goodman et.al. | 2311.18665v1 | null |
2023-11-29 | A Stochastic-Geometrical Framework for Object Pose Estimation based on Mixture Models Avoiding the Correspondence Problem | Wolfgang Hoegele et.al. | 2311.18107v1 | null |
2023-11-29 | Pose Anything: A Graph-Based Approach for Category-Agnostic Pose Estimation | Or Hirschorn et.al. | 2311.17891v1 | link |
2023-11-29 | Cinematic Behavior Transfer via NeRF-based Differentiable Filming | Xuekun Jiang et.al. | 2311.17754v1 | null |
2023-11-29 | PViT-6D: Overclocking Vision Transformers for 6D Pose Estimation with Confidence-Level Prediction and Pose Tokens | Sebastian Stapf et.al. | 2311.17504v1 | null |
2023-11-28 | On the Calibration of Human Pose Estimation | Kerui Gu et.al. | 2311.17105v1 | null |
2023-11-28 | Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence | Junyi Zhang et.al. | 2311.17034v1 | link |
2023-11-28 | HandyPriors: Physically Consistent Perception of Hand-Object Interactions with Differentiable Priors | Shutong Zhang et.al. | 2311.16552v1 | null |
2023-11-28 | Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement | Jian Wang et.al. | 2311.16495v1 | null |
2023-11-24 | UniHPE: Towards Unified Human Pose Estimation via Contrastive Learning | Zhongyu Jiang et.al. | 2311.16477v1 | null |
2023-11-27 | DiffSLVA: Harnessing Diffusion Models for Sign Language Video Anonymization | Zhaoyang Xia et.al. | 2311.16060v1 | link |
2023-11-27 | Uncertainty Quantification of Set-Membership Estimation in Control and Perception: Revisiting the Minimum Enclosing Ellipsoid | Yukai Tang et.al. | 2311.15962v1 | null |
2023-11-27 | Computer Vision for Carriers: PATRIOT | Ari Goodman et.al. | 2311.15914v1 | null |
2023-11-27 | SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation | Jiehong Lin et.al. | 2311.15707v1 | link |
2023-11-24 | RSB-Pose: Robust Short-Baseline Binocular 3D Human Pose Estimation with Occlusion Handling | Xiaoyue Wan et.al. | 2311.14242v1 | null |
2023-11-23 | Appearance-based gaze estimation enhanced with synthetic images using deep neural networks | Dmytro Herashchenko et.al. | 2311.14175v1 | link |
2023-11-23 | GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence | Van Nguyen Nguyen et.al. | 2311.14155v1 | link |
2023-11-23 | GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence | Pengyuan Wang et.al. | 2311.13777v1 | null |
2023-11-22 | HEViTPose: High-Efficiency Vision Transformer for Human Pose Estimation | Chengpeng Wu et.al. | 2311.13615v1 | link |
2023-11-24 | Calibration System and Algorithm Design for a Soft Hinged Micro Scanning Mirror with a Triaxial Hall Effect Sensor | Di Wang et.al. | 2311.12778v2 | null |
2023-11-21 | HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation | Yongliang Lin et.al. | 2311.12588v1 | link |
2023-11-21 | CoVOR-SLAM: Cooperative SLAM using Visual Odometry and Ranges for Multi-Robot Systems | Young-Hee Lee et.al. | 2311.12580v1 | null |
2023-11-21 | HCA-Net: Hierarchical Context Attention Network for Intervertebral Disc Semantic Labeling | Afshin Bozorgpour et.al. | 2311.12486v1 | link |
2023-11-21 | Two Views Are Better than One: Monocular 3D Pose Estimation with Multiview Consistency | Christian Keilstrup Ingwersen et.al. | 2311.12421v1 | null |
2023-11-20 | Fingerspelling PoseNet: Enhancing Fingerspelling Translation with Pose-Based Transformer Models | Pooya Fayyazsanavi et.al. | 2311.12128v1 | link |
2023-11-20 | Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation | Wenhao Li et.al. | 2311.12028v1 | link |
2023-11-20 | SniffyArt: The Dataset of Smelling Persons | Mathias Zinnen et.al. | 2311.11888v1 | null |
2023-11-21 | Robot Hand-Eye Calibration using Structure-from-Motion | Nicolas Andreff et.al. | 2311.11808v2 | null |
2023-11-18 | SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation | Yamei Chen et.al. | 2311.11125v1 | link |
2023-11-18 | Synthetic Data Generation for Bridging Sim2Real Gap in a Production Environment | Parth Rawal et.al. | 2311.11039v1 | null |
2023-11-18 | Multiple View Geometry Transformers for 3D Human Pose Estimation | Ziwei Liao et.al. | 2311.10983v1 | link |
2023-11-18 | Jenga Stacking Based on 6D Pose Estimation for Architectural Form Finding Process | Zixun Huang et.al. | 2311.10918v1 | null |
2023-11-17 | BiHRNet: A Binary high-resolution network for Human Pose Estimation | Zhicheng Zhang et.al. | 2311.10296v1 | null |
2023-11-16 | Match and Locate: low-frequency monocular odometry based on deep feature matching | Stepan Konev et.al. | 2311.10034v1 | null |
2023-11-16 | LIO-EKF: High Frequency LiDAR-Inertial Odometry using Extended Kalman Filters | Yibin Wu et.al. | 2311.09887v1 | link |
2023-11-16 | Improved TokenPose with Sparsity | Anning Li et.al. | 2311.09653v1 | null |
2023-11-16 | Pseudo-keypoints RKHS Learning for Self-supervised 6DoF Pose Estimation | Yangzheng Wu et.al. | 2311.09500v1 | null |
2023-11-15 | NormNet: Scale Normalization for 6D Pose Estimation in Stacked Scenarios | En-Te Lin et.al. | 2311.09269v1 | link |
2023-11-15 | Range-Visual-Inertial Sensor Fusion for Micro Aerial Vehicle Localization and Navigation | Abhishek Goudar et.al. | 2311.09056v1 | link |
2023-11-14 | LocaliseBot: Multi-view 3D object localisation with differentiable rendering for robot grasping | Sujal Vijayaraghavan et.al. | 2311.08438v1 | null |
2023-11-13 | SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models | Ziyi Lin et.al. | 2311.07575v1 | link |
2023-11-13 | Bio-Inspired Grasping Controller for Sensorized 2-DoF Grippers | Luca Lach et.al. | 2311.07257v1 | link |
2023-11-10 | CESPED: a new benchmark for supervised particle pose estimation in Cryo-EM | Ruben Sanchez-Garcia et.al. | 2311.06194v1 | link |
2023-11-10 | 2D Image head pose estimation via latent space regression under occlusion settings | José Celestino et.al. | 2311.06038v1 | link |
2023-11-10 | Robust Adversarial Attacks Detection for Deep Learning based Relative Pose Estimation for Space Rendezvous | Ziwei Wang et.al. | 2311.05992v1 | null |
2023-11-10 | A Practical Guide to Implementing Off-Axis Stereo Projection Using Existing Ray Tracing Libraries | Stefan Zellmann et.al. | 2311.05887v1 | link |
2023-11-09 | Visually Guided Model Predictive Robot Control via 6D Object Pose Localization and Tracking | Mederic Fourmy et.al. | 2311.05344v1 | null |
2023-11-09 | Spatial Attention-based Distribution Integration Network for Human Pose Estimation | Sihan Gao et.al. | 2311.05323v1 | null |
2023-11-09 | SPADES: A Realistic Spacecraft Pose Estimation Dataset using Event Sensing | Arunkumar Rathinam et.al. | 2311.05310v1 | null |
2023-11-09 | Differentiable Cloth Parameter Identification and State Estimation in Manipulation | Dongzhe Zheng et.al. | 2311.05141v1 | null |
2023-11-09 | POISE: Pose Guided Human Silhouette Extraction under Occlusions | Arindam Dutta et.al. | 2311.05077v1 | link |
2023-11-08 | Active Transfer Learning for Efficient Video-Specific Human Pose Estimation | Hiromu Taketsugu et.al. | 2311.05041v1 | link |
2023-11-08 | 3D Pose Estimation of Tomato Peduncle Nodes using Deep Keypoint Detection and Point Cloud | Jianchao Ci et.al. | 2311.04699v1 | null |
2023-11-09 | Rethinking Human Pose Estimation for Autonomous Driving with 3D Event Representations | Xiaoting Yin et.al. | 2311.04591v2 | link |
2023-11-08 | Learning Robust Multi-Scale Representation for Neural Radiance Fields from Unposed Images | Nishant Jain et.al. | 2311.04521v1 | null |
2023-11-08 | PLV-IEKF: Consistent Visual-Inertial Odometry using Points, Lines, and Vanishing Points | Tong Hua et.al. | 2311.04477v1 | null |
2023-11-08 | UP-NeRF: Unconstrained Pose-Prior-Free Neural Radiance Fields | Injae Kim et.al. | 2311.03784v2 | link |
2023-11-06 | A Single 2D Pose with Context is Worth Hundreds for 3D Human Pose Estimation | Qitao Zhao et.al. | 2311.03312v1 | null |
2023-11-06 | Enabling In-Situ Resources Utilisation by leveraging collaborative robotics and astronaut-robot interaction | Silvia Romero-Azpitarte et.al. | 2311.03146v1 | null |
2023-11-06 | Simultaneous Time Synchronization and Mutual Localization for Multi-robot System | Xiangyong Wen et.al. | 2311.02948v1 | null |
2023-11-06 | Initialisation of Autonomous Aircraft Visual Inspection Systems via CNN-Based Camera Pose Estimation | Xueyan Oh et.al. | 2311.02900v1 | null |
2023-11-06 | Efficient, Self-Supervised Human Pose Estimation with Inductive Prior Tuning | Nobline Yoo et.al. | 2311.02815v1 | link |
2023-11-03 | Generating Unbiased Pseudo-labels via a Theoretically Guaranteed Chebyshev Constraint to Unify Semi-supervised Classification and Regression | Jiaqi Wu et.al. | 2311.01782v1 | link |
2023-11-03 | Modeling the Uncertainty with Maximum Discrepant Students for Semi-supervised 2D Pose Estimation | Jiaqi Wu et.al. | 2311.01770v1 | null |
2023-11-02 | Sim2Real Bilevel Adaptation for Object Surface Classification using Vision-Based Tactile Sensors | Gabriele M. Caddeo et.al. | 2311.01380v1 | link |
2023-11-01 | A Spatial-Temporal Transformer based Framework For Human Pose Assessment And Correction in Education Scenarios | Wenyang Hu et.al. | 2311.00401v1 | null |
2023-10-31 | HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception | Junkun Yuan et.al. | 2310.20695v1 | link |
2023-10-31 | Pose-to-Motion: Cross-Domain Motion Retargeting with Pose Prior | Qingqing Zhao et.al. | 2310.20249v1 | null |
2023-10-30 | FetusMapV2: Enhanced Fetal Pose Estimation in 3D Ultrasound | Chaoyu Chen et.al. | 2310.19293v1 | null |
2023-10-29 | Distributed Nonlinear Filtering using Triangular Transport Maps | Daniel Grange et.al. | 2310.19000v1 | null |
2023-10-29 | TIC-TAC: A Framework To Learn And Evaluate Your Covariance | Megh Shukla et.al. | 2310.18953v1 | link |
2023-10-29 | Improving Multi-Person Pose Tracking with A Confidence Network | Zehua Fu et.al. | 2310.18920v1 | null |
2023-10-29 | HDMNet: A Hierarchical Matching Network with Double Attention for Large-scale Outdoor LiDAR Point Cloud Registration | Weiyi Xue et.al. | 2310.18874v1 | null |
2023-10-28 | Enhancing Grasping Performance of Novel Objects through an Improved Fine-Tuning Process | Xiao Hu et.al. | 2310.18569v1 | null |
2023-10-27 | ProcNet: Deep Predictive Coding Model for Robust-to-occlusion Visual Segmentation and Pose Estimation | Michael Zechmair et.al. | 2310.18009v1 | null |
2023-10-26 | Learning Extrinsic Dexterity with Parameterized Manipulation Primitives | Shih-Min Yang et.al. | 2310.17785v1 | null |
2023-10-26 | 6-DoF Stability Field via Diffusion Models | Takuma Yoneda et.al. | 2310.17649v1 | null |
2023-10-26 | SE(3) Diffusion Model-based Point Cloud Registration for Robust 6D Object Pose Estimation | Haobo Jiang et.al. | 2310.17359v1 | null |
2023-10-26 | Automatic Edge Error Judgment in Figure Skating Using 3D Pose Estimation from a Monocular Camera and IMUs | Ryota Tanaka et.al. | 2310.17193v1 | link |
2023-10-25 | Real-time 6-DoF Pose Estimation by an Event-based Camera using Active LED Markers | Gerald Ebmer et.al. | 2310.16618v1 | null |
2023-10-25 | ChimpACT: A Longitudinal Dataset for Understanding Chimpanzee Behaviors | Xiaoxuan Ma et.al. | 2310.16447v1 | link |
2023-10-25 | MotionAGFormer: Enhancing 3D Human Pose Estimation with a Transformer-GCNFormer Network | Soroush Mehraban et.al. | 2310.16288v1 | link |
2023-10-25 | TransPose: 6D Object Pose Estimation with Geometry-Aware Transformer | Xiao Lin et.al. | 2310.16279v1 | null |
2023-10-23 | Converting Depth Images and Point Clouds for Feature-based Pose Estimation | Robert Lösch et.al. | 2310.14924v1 | link |
2023-10-23 | Object Pose Estimation Annotation Pipeline for Multi-view Monocular Camera Systems in Industrial Settings | Hazem Youssef et.al. | 2310.14914v1 | null |
2023-10-23 | Player Re-Identification Using Body Part Appearences | Mahesh Bhosale et.al. | 2310.14469v1 | null |
2023-10-20 | LanPose: Language-Instructed 6D Object Pose Estimation for Robotic Assembly | Bowen Fu et.al. | 2310.13819v1 | null |
2023-10-20 | FMRT: Learning Accurate Feature Matching with Reconciliatory Transformer | Xinyu Zhang et.al. | 2310.13605v1 | null |
2023-10-20 | ColAG: A Collaborative Air-Ground Framework for Perception-Limited UGVs' Navigation | Zhehan Li et.al. | 2310.13324v1 | link |
2023-10-20 | CylinderTag: An Accurate and Flexible Marker for Cylinder-Shape Objects Pose Estimation Based on Projective Invariants | Shaoan Wang et.al. | 2310.13320v1 | link |
2023-10-19 | Human Pose-based Estimation, Tracking and Action Recognition with Deep Learning: A Survey | Lijuan Zhou et.al. | 2310.13039v1 | null |
2023-10-19 | FSD: Fast Self-Supervised Single RGB-D to Categorical 3D Objects | Mayank Lunayach et.al. | 2310.12974v1 | link |
2023-10-18 | Mesh Represented Recycle Learning for 3D Hand Pose and Mesh Estimation | Bosang Kim et.al. | 2310.12189v1 | null |
2023-10-18 | One-Shot Imitation Learning: A Pose Estimation Perspective | Pietro Vitiello et.al. | 2310.12077v1 | null |
2023-10-18 | ShapeGraFormer: GraFormer-Based Network for Hand-Object Reconstruction from a Single Depth Map | Ahmed Tawfik Aboukhadra et.al. | 2310.11811v1 | null |
2023-10-17 | Holistic Parking Slot Detection with Polygon-Shaped Representations | Lihao Wang et.al. | 2310.11629v1 | null |
2023-10-17 | Diver Interest via Pointing in Three Dimensions: 3D Pointing Reconstruction for Diver-AUV Communication | Chelsey Edge et.al. | 2310.11536v1 | null |
2023-10-18 | AP$n$P: A Less-constrained P$n$P Solver for Pose Estimation with Unknown Anisotropic Scaling or Focal Lengths | Jiaxin Wei et.al. | 2310.09982v2 | link |
2023-10-15 | Tabletop Transparent Scene Reconstruction via Epipolar-Guided Optical Flow with Monocular Depth Completion Prior | Xiaotong Chen et.al. | 2310.09956v1 | null |
2023-10-15 | Socially reactive navigation models for mobile robots in dynamic environments | Ricarte Ribeiro et.al. | 2310.09916v1 | link |
2023-10-15 | MoEmo Vision Transformer: Integrating Cross-Attention and Movement Vectors in 3D Pose Estimation for HRI Emotion Detection | David C. Jeong et.al. | 2310.09757v1 | link |
2023-10-16 | IMU Preintegration for Multi-Robot Systems in the Presence of Bias and Communication Constraints | Mohammed Ayman Shalaby et.al. | 2310.08686v2 | null |
2023-10-12 | Towards Design and Development of an ArUco Markers-Based Quantitative Surface Tactile Sensor | Ozdemir Can Kara et.al. | 2310.08398v1 | null |
2023-10-12 | Multimodal Active Measurement for Human Mesh Recovery in Close Proximity | Takahiro Maeda et.al. | 2310.08116v1 | link |
2023-10-12 | X-HRNet: Towards Lightweight Human Pose Estimation with Spatially Unidimensional Self-Attention | Yixuan Zhou et.al. | 2310.08042v1 | link |
2023-10-12 | PoRF: Pose Residual Field for Accurate Neural Surface Reconstruction | Jia-Wang Bian et.al. | 2310.07449v2 | link |
2023-10-11 | SAGE-ICP: Semantic Information-Assisted ICP | Jiaming Cui et.al. | 2310.07237v1 | link |
2023-10-11 | DeepSimHO: Stable Pose Estimation for Hand-Object Interaction via Physics Simulation | Rong Wang et.al. | 2310.07206v1 | link |
2023-10-12 | FABind: Fast and Accurate Protein-Ligand Binding | Qizhi Pei et.al. | 2310.06763v2 | link |
2023-10-10 | EARL: Eye-on-Hand Reinforcement Learner for Dynamic Grasping with Active Pose Estimation | Baichuan Huang et.al. | 2310.06751v1 | null |
2023-10-09 | Augmenting Vision-Based Human Pose Estimation with Rotation Matrix | Milad Vazan et.al. | 2310.06068v1 | null |
2023-10-07 | Federated Self-Supervised Learning of Monocular Depth Estimators for Autonomous Vehicles | Elton F. de S. Soares et.al. | 2310.04837v1 | null |
2023-10-10 | 1st Place Solution of Egocentric 3D Hand Pose Estimation Challenge 2023 Technical Report: A Concise Pipeline for Egocentric Hand Pose Reconstruction | Zhishan Zhou et.al. | 2310.04769v2 | null |
2023-10-06 | SwimXYZ: A large-scale dataset of synthetic swimming motions and videos | Fiche Guénolé et.al. | 2310.04360v1 | null |
2023-10-05 | BID-NeRF: RGB-D image pose estimation with inverted Neural Radiance Fields | Ágoston István Csehi et.al. | 2310.03563v1 | null |
2023-10-05 | 3D-Aware Hypothesis & Verification for Generalizable Relative Object Pose Estimation | Chen Zhao et.al. | 2310.03534v1 | null |
2023-10-05 | RGBManip: Monocular Image-based Robotic Manipulation through Active Object Pose Estimation | Boshi An et.al. | 2310.03478v1 | null |
2023-10-05 | Cyber Physical System Information Collection: Robot Location and Navigation Method Based on QR Code | Hongwei Li et.al. | 2310.03470v1 | null |
2023-10-04 | Condition numbers in multiview geometry, instability in relative pose estimation, and RANSAC | Hongyi Fan et.al. | 2310.02719v1 | null |
2023-10-05 | USB-NeRF: Unrolling Shutter Bundle Adjusted Neural Radiance Fields | Moyang Li et.al. | 2310.02687v2 | link |
2023-10-03 | Beyond the Benchmark: Detecting Diverse Anomalies in Videos | Yoav Arad et.al. | 2310.01904v1 | link |
2023-10-03 | MFOS: Model-Free & One-Shot Object Pose Estimation | JongMin Lee et.al. | 2310.01897v1 | null |
2023-10-02 | LEAP: Liberate Sparse-view 3D Modeling from Camera Poses | Hanwen Jiang et.al. | 2310.01410v1 | link |
2023-10-02 | H-InDex: Visual Reinforcement Learning with Hand-Informed Representations for Dexterous Manipulation | Yanjie Ze et.al. | 2310.01404v1 | link |
2023-10-04 | Self-supervised Learning of Contextualized Local Visual Embeddings | Thalles Santos Silva et.al. | 2310.00527v3 | link |
2023-09-30 | Diff-DOPE: Differentiable Deep Object Pose Estimation | Jonathan Tremblay et.al. | 2310.00463v1 | null |
2023-09-29 | Diver Identification Using Anthropometric Data Ratios for Underwater Multi-Human-Robot Collaboration | Jungseok Hong et.al. | 2310.00146v1 | null |
2023-09-29 | Denoising and Selecting Pseudo-Heatmaps for Semi-Supervised Human Pose Estimation | Zhuoran Yu et.al. | 2310.00099v1 | null |
2023-09-29 | Revisiting Cephalometric Landmark Detection from the view of Human Pose Estimation with Lightweight Super-Resolution Head | Qian Wu et.al. | 2309.17143v1 | link |
2023-09-29 | AdaPose: Towards Cross-Site Device-Free Human Pose Estimation with Commodity WiFi | Yunjiao Zhou et.al. | 2309.16964v1 | null |
2023-09-28 | End-to-End (Instance)-Image Goal Navigation through Correspondence as an Emergent Phenomenon | Guillaume Bono et.al. | 2309.16634v1 | null |
2023-09-28 | Off-the-shelf bin picking workcell with visual pose estimation: A case study on the world robot summit 2018 kitting task | Frederik Hagelskjær et.al. | 2309.16221v1 | null |
2023-09-28 | Cloth2Body: Generating 3D Human Body Mesh from 2D Clothing | Lu Dai et.al. | 2309.16189v1 | null |
2023-09-28 | Laboratory Automation: Precision Insertion with Adaptive Fingers utilizing Contact through Sliding with Tactile-based Pose Estimation | Sameer Pai et.al. | 2309.16170v1 | null |
2023-09-28 | CLIP-Hand3D: Exploiting 3D Hand Pose Estimation via Context-Aware Prompting | Shaoxiang Guo et.al. | 2309.16140v1 | null |
2023-09-28 | A Modular Bio-inspired Robotic Hand with High Sensitivity | Chao Liu et.al. | 2309.16081v1 | null |
2023-09-27 | Handbook on Leveraging Lines for Two-View Relative Pose Estimation | Petr Hruby et.al. | 2309.16040v1 | null |
2023-09-27 | Q-REG: End-to-End Trainable Point Cloud Registration with Surface Curvature | Shengze Jin et.al. | 2309.16023v1 | null |
2023-09-27 | Analysis on Multi-robot Relative 6-DOF Pose Estimation Error Based on UWB Range | Xinran Li et.al. | 2309.15367v1 | null |
2023-09-26 | Unsupervised Reconstruction of 3D Human Pose Interactions From 2D Poses Alone | Peter Hardy et.al. | 2309.14865v1 | null |
2023-09-26 | Learning Vision-Based Bipedal Locomotion for Challenging Terrain | Helei Duan et.al. | 2309.14594v1 | null |
2023-09-25 | Spring-IMU Fusion Based Proprioception for Feedback Control of Soft Manipulators | Yinan Meng et.al. | 2309.14279v1 | null |
2023-09-25 | Industrial Application of 6D Pose Estimation for Robotic Manipulation in Automotive Internal Logistics | Philipp Quentin et.al. | 2309.14265v1 | null |
2023-09-25 | BoIR: Box-Supervised Instance Representation for Multi-Person Pose Estimation | Uyoung Jeong et.al. | 2309.14072v1 | link |
2023-09-24 | Towards Subcentimeter Accuracy Digital-Twin Tracking via An RGBD-based Transformer Model and A Comprehensive Mobile Dataset | Zixun Huang et.al. | 2309.13570v1 | link |
2023-09-21 | ORTexME: Occlusion-Robust Human Shape and Pose via Temporal Average Texture and Mesh Encoding | Yu Cheng et.al. | 2309.12183v1 | null |
2023-09-21 | ZS6D: Zero-shot 6D Object Pose Estimation using Vision Transformers | Philipp Ausserlechner et.al. | 2309.11986v1 | null |
2023-09-21 | Ego3DPose: Capturing 3D Cues from Binocular Egocentric Views | Taeho Kang et.al. | 2309.11962v1 | link |
2023-09-21 | A Real-Time Multi-Task Learning System for Joint Detection of Face, Facial Landmark and Head Pose | Qingtian Wu et.al. | 2309.11773v1 | null |
2023-09-20 | Understanding Pose and Appearance Disentanglement in 3D Human Pose Estimation | Krishna Kanth Nakka et.al. | 2309.11667v1 | null |
2023-09-20 | Online Supervised Training of Spaceborne Vision during Proximity Operations using Adaptive Kalman Filtering | Tae Ha Park et.al. | 2309.11645v1 | null |
2023-09-20 | OCC-VO: Dense Mapping via 3D Occupancy-Based Visual Odometry for Autonomous Driving | Heng Li et.al. | 2309.11011v1 | link |
2023-09-19 | Language-Conditioned Affordance-Pose Detection in 3D Point Clouds | Toan Nguyen et.al. | 2309.10911v1 | null |
2023-09-19 | MAGIC-TBR: Multiview Attention Fusion for Transformer-based Bodily Behavior Recognition in Group Settings | Surbhi Madan et.al. | 2309.10765v1 | link |
2023-09-19 | SHOWMe: Benchmarking Object-agnostic Hand-Object 3D Reconstruction | Anilkumar Swamy et.al. | 2309.10748v1 | null |
2023-09-20 | GloPro: Globally-Consistent Uncertainty-Aware 3D Human Pose Estimation & Tracking in the Wild | Simon Schaefer et.al. | 2309.10369v2 | null |
2023-09-19 | RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery | Jiaxin Wei et.al. | 2309.10255v1 | link |
2023-09-18 | Hierarchical Attention and Graph Neural Networks: Toward Drift-Free Pose Estimation | Kathia Melbouci et.al. | 2309.09934v1 | null |
2023-09-18 | Application-driven Validation of Posteriors in Inverse Problems | Tim J. Adler et.al. | 2309.09764v1 | null |
2023-09-18 | RIDE: Self-Supervised Learning of Rotation-Equivariant Keypoint Detection and Invariant Description for Endoscopy | Mert Asim Karaoglu et.al. | 2309.09563v1 | null |
2023-09-18 | Sparse and Privacy-enhanced Representation for Human Pose Estimation | Ting-Ying Lin et.al. | 2309.09515v1 | null |
2023-09-19 | RenderIH: A Large-scale Synthetic Dataset for 3D Interacting Hand Pose Estimation | Lijun Li et.al. | 2309.09301v2 | link |
2023-09-16 | Optimal Initialization Strategies for Range-Only Trajectory Estimation | Abhishek Goudar et.al. | 2309.09011v1 | null |
2023-09-16 | DynaMoN: Motion-Aware Fast And Robust Camera Localization for Dynamic NeRF | Mert Asim Karaoglu et.al. | 2309.08927v1 | link |
2023-09-16 | Outram: One-shot Global Localization via Triangulated Scene Graph and Global Outlier Pruning | Pengyu Yin et.al. | 2309.08914v1 | link |
2023-09-15 | Towards Robust and Smooth 3D Multi-Person Pose Estimation from Monocular Videos in the Wild | Sungchan Park et.al. | 2309.08644v1 | null |
2023-09-15 | YCB-Ev: Event-vision dataset for 6DoF object pose estimation | Pavel Rojtberg et.al. | 2309.08482v1 | link |
2023-09-15 | Fast and Accurate Deep Loop Closing and Relocalization for Reliable LiDAR SLAM | Chenghao Shi et.al. | 2309.08086v1 | null |
2023-09-14 | Gradient based Grasp Pose Optimization on a NeRF that Approximates Grasp Success | Gergely Sóti et.al. | 2309.08040v1 | null |
2023-09-14 | TEMPO: Efficient Multi-View Pose Estimation, Tracking, and Forecasting | Rohan Choudhury et.al. | 2309.07910v1 | null |
2023-09-14 | Towards Robust and Unconstrained Full Range of Rotation Head Pose Estimation | Thorsten Hempel et.al. | 2309.07654v1 | link |
2023-09-14 | EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization | Minjung Kim et.al. | 2309.07471v1 | link |
2023-09-14 | Unleashing the Power of Depth and Pose Estimation Neural Networks by Designing Compatible Endoscopic Images | Junyang Wu et.al. | 2309.07390v1 | null |
2023-09-13 | LInKs "Lifting Independent Keypoints" -- Partial Pose Lifting for Occlusion Handling with Improved Accuracy in 2D-3D Human Pose Estimation | Peter Hardy et.al. | 2309.07243v1 | null |
2023-09-13 | 3D Active Metric-Semantic SLAM | Yuezhan Tao et.al. | 2309.06950v1 | null |
2023-09-11 | ViHOPE: Visuotactile In-Hand Object 6D Pose Estimation with Shape Completion | Hongyu Li et.al. | 2309.05662v1 | null |
2023-09-11 | Towards Intuitive HMI for UAV Control | Filip Zoric et.al. | 2309.05460v1 | null |
2023-09-12 | FreeMan: Towards Benchmarking 3D Human Pose Estimation in the Wild | Jiong Wang et.al. | 2309.05073v2 | link |
2023-09-09 | Probabilistic Triangulation for Uncalibrated Multi-View 3D Human Pose Estimation | Boyuan Jiang et.al. | 2309.04756v1 | link |
2023-09-09 | Mirror-Aware Neural Humans | Daniel Ajisafe et.al. | 2309.04750v1 | link |
2023-09-08 | Robot Localization and Mapping Final Report -- Sequential Adversarial Learning for Self-Supervised Deep Visual Odometry | Akankshya Kar et.al. | 2309.04147v1 | null |
2023-09-07 | ArtiGrasp: Physically Plausible Synthesis of Bi-Manual Dexterous Grasping and Articulation | Hui Zhang et.al. | 2309.03891v1 | null |
2023-09-05 | An automated, high-resolution phenotypic assay for adult Brugia malayi and microfilaria | Upender Kalwa et.al. | 2309.03235v1 | null |
2023-09-05 | A Robust Localization Solution for an Uncrewed Ground Vehicle in Unstructured Outdoor GNSS-Denied Environments | W. Jacob Wagner et.al. | 2309.02569v1 | null |
2023-09-05 | GO-SLAM: Global Optimization for Consistent 3D Instant Reconstruction | Youmin Zhang et.al. | 2309.02436v1 | link |
2023-09-05 | DR-Pose: A Two-stage Deformation-and-Registration Pipeline for Category-level 6D Object Pose Estimation | Lei Zhou et.al. | 2309.01925v1 | link |
2023-09-04 | On the Query Strategies for Efficient Online Active Distillation | Michele Boldo et.al. | 2309.01612v1 | null |
2023-09-04 | DiffHPE: Robust, Coherent 3D Human Pose Lifting with Diffusion | Cédric Rommel et.al. | 2309.01575v1 | null |
2023-09-06 | Refined Temporal Pyramidal Compression-and-Amplification Transformer for 3D Human Pose Estimation | Hanbing Liu et.al. | 2309.01365v2 | link |
2023-09-04 | SKoPe3D: A Synthetic Dataset for Vehicle Keypoint Perception in 3D from Traffic Monitoring Cameras | Himanshu Pahadia et.al. | 2309.01324v1 | null |
2023-09-03 | BodySLAM++: Fast and Tightly-Coupled Visual-Inertial Camera and Human Motion Tracking | Dorian F. Henning et.al. | 2309.01236v1 | null |
2023-09-02 | Mitigating Motion Blur for Robust 3D Baseball Player Pose Modeling for Pitch Analysis | Jerrin Bright et.al. | 2309.01010v1 | null |
2023-09-01 | Fusing Monocular Images and Sparse IMU Signals for Real-time Human Motion Capture | Shaohua Pan et.al. | 2309.00310v1 | link |
2023-08-31 | EMDB: The Electromagnetic Database of Global 3D Human Pose and Shape in the Wild | Manuel Kaufmann et.al. | 2308.16894v1 | link |
2023-08-31 | SA6D: Self-Adaptive Few-Shot 6D Pose Estimator for Novel and Occluded Objects | Ning Gao et.al. | 2308.16528v1 | null |
2023-08-30 | Two-Stage Violence Detection Using ViTPose and Classification Models at Smart Airports | İrem Üstek et.al. | 2308.16325v1 | link |
2023-08-30 | SignDiff: Learning Diffusion Models for American Sign Language Production | Sen Fang et.al. | 2308.16082v1 | null |
2023-08-30 | Learning Structure-from-Motion with Graph Attention Networks | Lucas Brynte et.al. | 2308.15984v1 | link |
2023-08-30 | Reconstructing Groups of People with Hypergraph Relational Reasoning | Buzhen Huang et.al. | 2308.15844v1 | link |
2023-08-29 | 3D-MuPPET: 3D Multi-Pigeon Pose Estimation and Tracking | Urs Waldmann et.al. | 2308.15316v1 | link |
2023-08-29 | Spatio-temporal MLP-graph network for 3D human pose estimation | Tanvir Hassan et.al. | 2308.15313v1 | link |
2023-08-29 | Pose-Free Neural Radiance Fields via Implicit Pose Regularization | Jiahui Zhang et.al. | 2308.15049v1 | null |
2023-08-28 | R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras | Aron Schmied et.al. | 2308.14713v1 | null |
2023-08-28 | Video-Based Hand Pose Estimation for Remote Assessment of Bradykinesia in Parkinson's Disease | Gabriela T. Acevedo Trebbau et.al. | 2308.14679v1 | null |
2023-08-28 | Active Pose Refinement for Textureless Shiny Objects using the Structured Light Camera | Jun Yang et.al. | 2308.14665v1 | null |
2023-08-28 | CPFES: Physical Fitness Evaluation Based on Canadian Agility and Movement Skill Assessment | Pengcheng Dong et.al. | 2308.14324v1 | null |
2023-08-27 | LDL: Line Distance Functions for Panoramic Localization | Junho Kim et.al. | 2308.13989v1 | link |
2023-08-26 | Prior-guided Source-free Domain Adaptation for Human Pose Estimation | Dripta S. Raychaudhuri et.al. | 2308.13954v1 | null |
2023-08-26 | Vision-Based Human Pose Estimation via Deep Learning: A Survey | Gongjin Lan et.al. | 2308.13872v1 | null |
2023-08-24 | POCO: 3D Pose and Shape Estimation with Confidence | Sai Kumar Dwivedi et.al. | 2308.12965v1 | link |
2023-08-24 | Robot Pose Nowcasting: Forecast the Future to Improve the Present | Alessandro Simoni et.al. | 2308.12914v1 | null |
2023-08-23 | Certifiably Optimal Rotation and Pose Estimation Based on the Cayley Map | Timothy D Barfoot et.al. | 2308.12418v1 | null |
2023-08-22 | Animal3D: A Comprehensive Dataset of 3D Animal Pose and Shape | Jiacong Xu et.al. | 2308.11737v1 | null |
2023-08-22 | TrackFlow: Multi-Object Tracking with Normalizing Flows | Gianluca Mancusi et.al. | 2308.11513v1 | null |
2023-08-22 | A LiDAR-Inertial SLAM Tightly-Coupled with Dropout-Tolerant GNSS Fusion for Autonomous Mine Service Vehicles | Yusheng Wang et.al. | 2308.11492v1 | null |
2023-08-22 | PoseGraphNet++: Enriching 3D Human Pose with Orientation Estimation | Soubarna Banik et.al. | 2308.11440v1 | null |
2023-08-22 | Novel-view Synthesis and Pose Estimation for Hand-Object Interaction from Sparse Views | Wentian Qu et.al. | 2308.11198v1 | null |
2023-08-21 | Spectral Graphormer: Spectral Graph-based Transformer for Egocentric Two-Hand Reconstruction using Multi-View Color Images | Tze Ho Elden Tse et.al. | 2308.11015v1 | null |
2023-08-21 | Polarimetric Information for Multi-Modal 6D Pose Estimation of Photometrically Challenging Objects with Limited Data | Patrick Ruhkamp et.al. | 2308.10627v1 | null |
2023-08-21 | GaitPT: Skeletons Are All You Need For Gait Recognition | Andy Catruna et.al. | 2308.10623v1 | null |
2023-08-21 | Approximately Equivariant Graph Networks | Ningyuan Huang et.al. | 2308.10436v1 | link |
2023-08-21 | In-Rack Test Tube Pose Estimation Using RGB-D Data | Hao Chen et.al. | 2308.10411v1 | null |
2023-08-20 | Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video | Yingxuan You et.al. | 2308.10305v1 | link |
2023-08-20 | OCHID-Fi: Occlusion-Robust Hand Pose Estimation in 3D via RF-Vision | Shujie Zhang et.al. | 2308.10146v1 | link |
2023-08-19 | 3D-Aware Neural Body Fitting for Occlusion Robust 3D Human Pose Estimation | Yi Zhang et.al. | 2308.10123v1 | link |
2023-08-19 | Pseudo Flow Consistency for Self-Supervised 6D Object Pose Estimation | Yang Hai et.al. | 2308.10016v1 | link |
2023-08-19 | UniAP: Towards Universal Animal Perception in Vision via Few-shot Learning | Meiqi Sun et.al. | 2308.09953v1 | null |
2023-08-22 | Scene-Aware Feature Matching | Xiaoyong Lu et.al. | 2308.09949v2 | null |
2023-08-18 | PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation | Hanbing Liu et.al. | 2308.09678v1 | link |
2023-08-18 | Improving 3D Pose Estimation for Sign Language | Maksym Ivashechkin et.al. | 2308.09525v1 | null |
2023-08-18 | Denoising Diffusion for 3D Hand Pose Estimation from Images | Maksym Ivashechkin et.al. | 2308.09523v1 | null |
2023-08-18 | ResQ: Residual Quantization for Video Perception | Davide Abati et.al. | 2308.09511v1 | null |
2023-08-17 | MovePose: A High-performance Human Pose Estimation Algorithm on Mobile and Edge Devices | Dongyang Yu et.al. | 2308.09084v1 | null |
2023-08-17 | Pedestrian Environment Model for Automated Driving | Adrian Holzbock et.al. | 2308.09080v1 | link |
2023-08-17 | Exploiting Point-Wise Attention in 6D Object Pose Estimation Based on Bidirectional Prediction | Yuhao Yang et.al. | 2308.08518v2 | null |
2023-08-16 | View Consistent Purification for Accurate Cross-View Localization | Shan Wang et.al. | 2308.08110v1 | null |
2023-08-15 | Learning Better Keypoints for Multi-Object 6DoF Pose Estimation | Yangzheng Wu et.al. | 2308.07827v1 | link |
2023-08-14 | Group Pose: A Simple Baseline for End-to-End Multi-person Pose Estimation | Huan Liu et.al. | 2308.07313v1 | link |
2023-08-12 | 4DRVO-Net: Deep 4D Radar-Visual Odometry Using Multi-Modal and Multi-Scale Adaptive Fusion | Guirong Zhuo et.al. | 2308.06573v1 | null |
2023-08-17 | EgoPoser: Robust Real-Time Ego-Body Pose Estimation in Large Scenes | Jiaxi Jiang et.al. | 2308.06493v2 | null |
2023-08-11 | Aggressive Aerial Grasping using a Soft Drone with Onboard Perception | Samuel Ubellacker et.al. | 2308.06351v1 | null |
2023-08-11 | VERF: Runtime Monitoring of Pose Estimation with Neural Radiance Fields | Dominic Maggio et.al. | 2308.05939v1 | null |
2023-08-10 | Toward Globally Optimal State Estimation Using Automatically Tightened Semidefinite Relaxations | Frederike Dümbgen et.al. | 2308.05783v1 | link |
2023-08-10 | KS-APR: Keyframe Selection for Robust Absolute Pose Regression | Changkun Liu et.al. | 2308.05459v1 | null |
2023-08-10 | How-to Augmented Lagrangian on Factor Graphs | Barbara Bazzana et.al. | 2308.05444v1 | null |
2023-08-10 | Deep Fusion Transformer Network with Weighted Vector-Wise Keypoints Voting for Robust 6D Object Pose Estimation | Jun Zhou et.al. | 2308.05438v1 | link |
2023-08-10 | Robust Localization with Visual-Inertial Odometry Constraints for Markerless Mobile AR | Changkun Liu et.al. | 2308.05394v1 | null |
2023-08-10 | Double-chain Constraints for 3D Human Pose Estimation in Images and Videos | Hongbo Kang et.al. | 2308.05298v1 | link |
2023-08-09 | ACE-HetEM for ab initio Heterogenous Cryo-EM 3D Reconstruction | Weijie Chen et.al. | 2308.04956v1 | null |
2023-08-07 | SEM-GAT: Explainable Semantic Pose Estimation using Learned Graph Attention | Efimia Panagiotaki et.al. | 2308.03718v1 | link |
2023-08-07 | A Horse with no Labels: Self-Supervised Horse Pose Estimation from Unlabelled Images and Synthetic Prior | Jose Sosa et.al. | 2308.03411v1 | null |
2023-08-06 | Source-free Domain Adaptive Human Pose Estimation | Qucheng Peng et.al. | 2308.03202v1 | link |
2023-08-04 | Diffusion-Augmented Depth Prediction with Sparse Annotations | Jiaqi Li et.al. | 2308.02283v1 | null |
2023-08-04 | DTF-Net: Category-Level Pose Estimation and Shape Reconstruction via Deformable Template Field | Haowen Wang et.al. | 2308.02239v1 | null |
2023-08-07 | Robust Self-Supervised Extrinsic Self-Calibration | Takayuki Kanai et.al. | 2308.02153v2 | null |
2023-08-03 | Sim-to-Real Vision-depth Fusion CNNs for Robust Pose Estimation Aboard Autonomous Nano-quadcopter | Luca Crupi et.al. | 2308.01833v1 | null |
2023-08-03 | Active Acoustic Sensing for Robot Manipulation | Shihan Lu et.al. | 2308.01600v1 | null |
2023-08-02 | HANDAL: A Dataset of Real-World Manipulable Object Categories with Pose Annotations, Affordances, and Reconstructions | Andrew Guo et.al. | 2308.01477v1 | null |
2023-08-06 | Human-M3: A Multi-view Multi-modal Dataset for 3D Human Pose Estimation in Outdoor Scenes | Bohao Fan et.al. | 2308.00628v2 | link |
2023-08-01 | Markerless human pose estimation for biomedical applications: a survey | Andrea Avogaro et.al. | 2308.00519v1 | null |
2023-08-01 | Kidnapping Deep Learning-based Multirotors using Optimized Flying Adversarial Patches | Pia Hanfeld et.al. | 2308.00344v1 | link |
2023-08-01 | Fine-Grained Sports, Yoga, and Dance Postures Recognition: A Benchmark Analysis | Asish Bera et.al. | 2308.00323v1 | null |
2023-08-01 | Robust Single-view Cone-beam X-ray Pose Estimation with Neural Tuned Tomography (NeTT) and Masked Neural Radiance Fields (mNeRF) | Chaochao Zhou et.al. | 2308.00214v1 | null |
2023-07-31 | Lightweight Super-Resolution Head for Human Pose Estimation | Haonan Wang et.al. | 2307.16765v1 | link |
2023-07-31 | DiffPose: SpatioTemporal Diffusion Model for Video-Based Human Pose Estimation | Runyang Feng et.al. | 2307.16687v1 | null |
2023-07-30 | Touch if it's transparent! ACTOR: Active Tactile-based Category-Level Transparent Object Reconstruction | Prajval Kumar Murali et.al. | 2307.16254v1 | null |
2023-07-30 | Successive Pose Estimation and Beam Tracking for mmWave Vehicular Communication Systems | Cen Liu et.al. | 2307.16117v1 | link |
2023-07-29 | Iterative Graph Filtering Network for 3D Human Pose Estimation | Zaedul Islam et.al. | 2307.16074v1 | link |
2023-07-29 | HandMIM: Pose-Aware Self-Supervised Learning for 3D Hand Mesh Estimation | Zuyan Liu et.al. | 2307.16061v1 | null |
2023-07-29 | Effective Whole-body Pose Estimation with Two-stages Distillation | Zhendong Yang et.al. | 2307.15880v1 | link |
2023-07-28 | TrackAgent: 6D Object Tracking via Reinforcement Learning | Konstantin Röhrl et.al. | 2307.15671v1 | null |
2023-07-28 | Revisiting Fully Convolutional Geometric Features for Object 6D Pose Estimation | Jaime Corsetti et.al. | 2307.15514v1 | link |
2023-07-28 | Robust Visual Sim-to-Real Transfer for Robotic Manipulation | Ricardo Garcia et.al. | 2307.15320v1 | null |
2023-07-27 | Weakly Supervised Multi-Modal 3D Human Body Pose Estimation for Autonomous Driving | Peter Bauer et.al. | 2307.14889v1 | null |
2023-07-26 | Attention of Robot Touch: Tactile Saliency Prediction for Robust Sim-to-Real Tactile Control | Yijiong Lin et.al. | 2307.14510v1 | null |
2023-07-28 | CBGL: Fast Monte Carlo Passive Global Localisation of 2D LIDAR Sensor | Alexandros Filotheou et.al. | 2307.14247v2 | link |
2023-07-26 | Deep Robust Multi-Robot Re-localisation in Natural Environments | Milad Ramezani et.al. | 2307.13950v1 | null |
2023-07-25 | Of Mice and Pose: 2D Mouse Pose Estimation from Unlabelled Data and Synthetic Prior | Jose Sosa et.al. | 2307.13361v1 | null |
2023-07-23 | TransNet: Transparent Object Manipulation Through Category-Level Pose Estimation | Huijie Zhang et.al. | 2307.12400v1 | null |
2023-07-25 | FDCT: Fast Depth Completion for Transparent Objects | Tianan Li et.al. | 2307.12274v2 | link |
2023-07-22 | Challenges for Monocular 6D Object Pose Estimation in Robotics | Stefan Thalhammer et.al. | 2307.12172v1 | null |
2023-07-22 | Pyramid Semantic Graph-based Global Point Cloud Registration with Low Overlap | Zhijian Qiao et.al. | 2307.12116v1 | link |
2023-07-22 | Robot Structure Prior Guided Temporal Attention for Camera-to-Robot Pose Estimation from Image Sequence | Yang Tian et.al. | 2307.12106v1 | link |
2023-07-26 | LAMP: Leveraging Language Prompts for Multi-person Pose Estimation | Shengnan Hu et.al. | 2307.11934v2 | link |
2023-07-21 | YOLOPose V2: Understanding and Improving Transformer-based 6D Pose Estimation | Arul Selvam Periyasamy et.al. | 2307.11550v1 | null |
2023-07-21 | KVN: Keypoints Voting Network with Differentiable RANSAC for Stereo Pose Estimation | Ivano Donadi et.al. | 2307.11543v1 | link |
2023-07-21 | Semantically-enhanced Deep Collision Prediction for Autonomous Navigation using Aerial Robots | Mihir Kulkarni et.al. | 2307.11522v1 | null |
2023-07-20 | SimCol3D -- 3D Reconstruction during Colonoscopy Challenge | Anita Rau et.al. | 2307.11261v1 | link |
2023-07-20 | MSQNet: Actor-agnostic Action Recognition with Multi-modal Query | Anindya Mondal et.al. | 2307.10763v1 | link |
2023-07-19 | POV-Surgery: A Dataset for Egocentric Hand and Tool Pose Estimation During Surgical Activities | Rui Wang et.al. | 2307.10387v1 | link |
2023-07-18 | ActionPrompt: Action-Guided 3D Human Pose Estimation With Text and Pose Prompting | Hongwei Zheng et.al. | 2307.09026v1 | null |
2023-07-17 | Human Emergency Detection during Autonomous Hospital Transports | Andreas Zachariae et.al. | 2307.08359v1 | link |
2023-07-17 | Self-supervised Monocular Depth Estimation: Let's Talk About The Weather | Kieran Saunders et.al. | 2307.08357v1 | null |
2023-07-20 | Boosting 3-DoF Ground-to-Satellite Camera Localization Accuracy via Geometry-Guided Cross-View Transformer | Yujiao Shi et.al. | 2307.08015v3 | link |
2023-07-15 | Tightly-Coupled LiDAR-Visual SLAM Based on Geometric Features for Mobile Agents | Ke Cao et.al. | 2307.07763v1 | null |
2023-07-13 | Haptic-guided assisted telemanipulation approach for grasping desired objects from heaps | Maxime Adjigble et.al. | 2307.07053v1 | null |
2023-07-13 | Improving 2D Human Pose Estimation across Unseen Camera Views with Synthetic Data | Miroslav Purkrábek et.al. | 2307.06737v1 | link |
2023-07-12 | Deep learning-based estimation of whole-body kinematics from multi-view images | Kien X. Nguyen et.al. | 2307.05896v1 | link |
2023-07-12 | GLA-GCN: Global-local Adaptive Graph Convolutional Network for 3D Human Pose Estimation from Monocular Video | Bruce X. B. Yu et.al. | 2307.05853v1 | link |
2023-07-09 | TransPose: A Transformer-based 6D Object Pose Estimation Network with Depth Refinement | Mahmoud Abdulsalam et.al. | 2307.05561v1 | null |
2023-07-11 | ResMatch: Residual Attention Learning for Local Feature Matching | Yuxin Deng et.al. | 2307.05180v1 | link |
2023-07-07 | Proximity and Visuotactile Point Cloud Fusion for Contact Patches in Extreme Deformation | Jessica Yin et.al. | 2307.03839v1 | null |
2023-07-07 | Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation | Zhongyu Jiang et.al. | 2307.03833v1 | link |
2023-07-07 | Equivariant Single View Pose Prediction Via Induced and Restricted Representations | Owen Howell et.al. | 2307.03704v1 | null |
2023-07-07 | RCDN -- Robust X-Corner Detection Algorithm based on Advanced CNN Model | Ben Chen et.al. | 2307.03505v1 | null |
2023-07-06 | Self-supervised Optimization of Hand Pose Estimation using Anatomical Features and Iterative Learning | Christian Jauch et.al. | 2307.03007v1 | null |
2023-07-06 | Recognition and Estimation of Human Finger Pointing with an RGB Camera for Robot Directive | Eran Bamani et.al. | 2307.02949v1 | null |
2023-07-06 | A Real-time Human Pose Estimation Approach for Optimal Sensor Placement in Sensor-based Human Activity Recognition | Orhan Konak et.al. | 2307.02906v1 | null |
2023-07-04 | Secure Deep Learning-based Distributed Intelligence on Pocket-sized Drones | Elia Cereda et.al. | 2307.01559v1 | null |
2023-07-03 | Joint Coordinate Regression and Association For Multi-Person Pose Estimation, A Pure Neural Network Approach | Dongyang Yu et.al. | 2307.01004v1 | null |
2023-07-01 | Automatic Solver Generator for Systems of Laurent Polynomial Equations | Evgeniy Martyushev et.al. | 2307.00320v1 | link |
2023-07-01 | SyMFM6D: Symmetry-aware Multi-directional Fusion for Multi-View 6D Object Pose Estimation | Fabian Duffhauss et.al. | 2307.00306v1 | link |
2023-06-30 | GIRA: Gaussian Mixture Models for Inference and Robot Autonomy | Kshitij Goel et.al. | 2307.00071v1 | link |
2023-06-30 | Towards the extraction of robust sign embeddings for low resource sign language recognition | Mathieu De Coster et.al. | 2306.17558v1 | null |
2023-06-30 | Fusion of Visual-Inertial Odometry with LiDAR Relative Localization for Cooperative Guidance of a Micro-Scale Aerial Vehicle | Václav Pritzl et.al. | 2306.17544v1 | link |
2023-06-30 | Locking On: Leveraging Dynamic Vehicle-Imposed Motion Constraints to Improve Visual Localization | Stephen Hausler et.al. | 2306.17529v1 | null |
2023-06-29 | ID-Pose: Sparse-view Camera Pose Estimation by Inverting Diffusion Models | Weihao Cheng et.al. | 2306.17140v1 | null |
2023-06-29 | Learning Structure-Guided Diffusion Model for 2D Human Pose Estimation | Zhongwei Qiu et.al. | 2306.17074v1 | null |
2023-06-28 | Hierarchical Graph Neural Networks for Proprioceptive 6D Pose Estimation of In-hand Objects | Alireza Rezazadeh et.al. | 2306.15858v1 | null |
2023-06-09 | Data-Link: High Fidelity Manufacturing Datasets for Model2Real Transfer under Industrial Settings | Sunny Katyara et.al. | 2306.05766v1 | null |
2023-05-28 | Counter-Hypothetical Particle Filters for Single Object Pose Tracking | Elizabeth A. Olson et.al. | 2305.17828v1 | null |
2023-05-25 | Enhanced 6D Pose Estimation for Robotic Fruit Picking | Marco Costanzo et.al. | 2305.15856v1 | null |
2023-05-22 | You Only Look at One: Category-Level Object Representations for Pose Estimation From a Single Example | Walter Goodwin et.al. | 2305.12626v1 | null |
2023-05-18 | Manifold-Aware Self-Training for Unsupervised Domain Adaptation on Regressing 6D Object Pose | Yichen Zhang et.al. | 2305.10808v1 | link |
2023-05-08 | RelPose++: Recovering 6D Poses from Sparse-view Observations | Amy Lin et.al. | 2305.04926v1 | link |
2023-04-17 | Uncovering the Background-Induced bias in RGB based 6-DoF Object Pose Estimation | Elena Govi et.al. | 2304.08230v1 | link |
2023-03-28 | CARTO: Category and Joint Agnostic Reconstruction of ARTiculated Objects | Nick Heppert et.al. | 2303.15782v1 | link |
2023-03-23 | Prior-free Category-level Pose Estimation with Implicit Space Transformation | Jianhui Liu et.al. | 2303.13479v1 | link |
2023-06-21 | 6D Object Pose Estimation from Approximate 3D Models for Orbital Robotics | Maximilian Ulmer et.al. | 2303.13241v3 | null |
2023-03-22 | Rigidity-Aware Detection for 6D Object Pose Estimation | Yang Hai et.al. | 2303.12396v1 | link |
2023-03-22 | Object Pose Estimation with Statistical Guarantees: Conformal Keypoint Detection and Geometric Uncertainty Propagation | Heng Yang et.al. | 2303.12246v1 | link |
2023-03-21 | Linear-Covariance Loss for End-to-End Learning of 6D Pose Estimation | Fulin Liu et.al. | 2303.11516v1 | link |
2023-03-18 | SOCS: Semantically-aware Object Coordinate Space for Category-Level 6D Object Pose Estimation under Large Shape Variations | Boyan Wan et.al. | 2303.10346v1 | null |
2023-03-12 | Module-Wise Network Quantization for 6D Object Pose Estimation | Saqib Javed et.al. | 2303.06753v1 | link |
2023-03-09 | SpyroPose: Importance Sampling Pyramids for Object Pose Distribution Estimation in SE(3) | Rasmus Laurvig Haugaard et.al. | 2303.05308v1 | null |
2023-03-03 | Depth-based 6DoF Object Pose Estimation using Swin Transformer | Zhujun Li et.al. | 2303.02133v1 | link |
2023-03-02 | Canonical mapping as a general-purpose object descriptor for robotic manipulation | Benjamin Joffe et.al. | 2303.01331v1 | null |
2023-02-14 | MSDA: Monocular Self-supervised Domain Adaptation for 6D Object Pose Estimation | Dingding Cai et.al. | 2302.07300v1 | null |
2023-02-14 | Model-Based Underwater 6D Pose Estimation from RGB | Davide Sapienza et.al. | 2302.06821v1 | null |
2023-02-02 | A Projective Geometric View for 6D Pose Estimation in mmWave MIMO Systems | Shengqiang Shen et.al. | 2302.00227v2 | null |
2023-01-31 | Collision-aware In-hand 6D Object Pose Estimation using Multiple Vision-based Tactile Sensors | Gabriele M. Caddeo et.al. | 2301.13667v1 | link |
2023-01-19 | Learning ultrasound plane pose regression: assessing generalized pose coordinates in the fetal brain | Chiara Di Vece et.al. | 2301.08317v1 | null |
2023-01-19 | RGB-D-Based Categorical Object Pose and Shape Estimation: Methods, Datasets, and Evaluation | Leonard Bruns et.al. | 2301.08147v1 | link |
2022-12-21 | HouseCat6D -- A Large-Scale Multi-Modal Category Level 6D Object Pose Dataset with Household Objects in Realistic Scenarios | HyunJun Jung et.al. | 2212.10428v2 | link |
2022-12-13 | MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare | Yann Labbé et.al. | 2212.06870v1 | null |
2022-12-11 | Context-aware 6D Pose Estimation of Known Objects using RGB-D data | Ankit Kumar et.al. | 2212.05560v1 | null |
2023-01-30 | Category-Level 6D Object Pose Estimation with Flexible Vector-Based Rotation Representation | Wei Chen et.al. | 2212.04632v2 | null |
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2024-12-29 | Towards Explaining Uncertainty Estimates in Point Cloud Registration | Ziyuan Qin et.al. | 2412.20612v1 | null |
2024-12-26 | Resolving the Ambiguity of Complete-to-Partial Point Cloud Registration for Image-Guided Liver Surgery with Patches-to-Partial Matching | Zixin Yang et.al. | 2412.19328v1 | null |
2024-12-25 | Cross-PCR: A Robust Cross-Source Point Cloud Registration Framework | Guiyu Zhao et.al. | 2412.18873v1 | null |
2024-12-23 | PointVoxelFormer -- Reviving point cloud networks for 3D medical imaging | Mattias Paul Heinrich et.al. | 2412.17390v1 | null |
2024-12-19 | 3D Registration in 30 Years: A Survey | Jiaqi Yang et.al. | 2412.13735v2 | link |
2024-12-13 | TrafficLoc: Localizing Traffic Surveillance Cameras in 3D Scenes | Yan Xia et.al. | 2412.10308v1 | null |
2024-12-10 | A Real-time Degeneracy Sensing and Compensation Method for Enhanced LiDAR SLAM | Zongbo Liao et.al. | 2412.07513v1 | null |
2024-12-07 | AutoURDF: Unsupervised Robot Modeling from Point Cloud Frames Using Cluster Registration | Jiong Lin et.al. | 2412.05507v1 | null |
2024-12-06 | GS-Matching: Reconsidering Feature Matching task in Point Cloud Registration | Yaojie Zhang et.al. | 2412.04855v1 | null |
2024-12-04 | AffordDP: Generalizable Diffusion Policy with Transferable Affordance | Shijie Wu et.al. | 2412.03142v1 | null |
2024-12-04 | QuadricsReg: Large-Scale Point Cloud Registration using Quadric Primitives | Ji Wu et.al. | 2412.02998v1 | null |
2024-12-01 | FlashSLAM: Accelerated RGB-D SLAM for Real-Time 3D Scene Reconstruction with Gaussian Splatting | Phu Pham et.al. | 2412.00682v1 | null |
2024-11-27 | XR-MBT: Multi-modal Full Body Tracking for XR through Self-Supervision with Learned Depth Point Cloud Registration | Denys Rozumnyi et.al. | 2411.18377v1 | null |
2024-11-22 | EADReg: Probabilistic Correspondence Generation with Efficient Autoregressive Diffusion Model for Outdoor Point Cloud Registration | Linrui Gong et.al. | 2411.15271v1 | null |
2024-11-20 | Automatic marker-free registration based on similar tetrahedras for single-tree point clouds | Jing Ren et.al. | 2411.13069v1 | null |
2024-11-19 | 3D Reconstruction by Looking: Instantaneous Blind Spot Detector for Indoor SLAM through Mixed Reality | Hanbeom Chang et.al. | 2411.12514v1 | null |
2024-11-16 | Deep Loss Convexification for Learning Iterative Models | Ziming Zhang et.al. | 2411.10649v1 | null |
2024-11-12 | 3D Focusing-and-Matching Network for Multi-Instance Point Cloud Registration | Liyuan Zhang et.al. | 2411.07740v1 | null |
2024-11-04 | Mining and Transferring Feature-Geometry Coherence for Unsupervised Point Cloud Registration | Kezheng Xiong et.al. | 2411.01870v1 | link |
2024-10-30 | UniRiT: Towards Few-Shot Non-Rigid Point Cloud Registration | Geng Li et.al. | 2410.22909v1 | null |
2024-10-29 | Micro-Structures Graph-Based Point Cloud Registration for Balancing Efficiency and Accuracy | Rongling Zhang et.al. | 2410.21857v1 | null |
2024-10-29 | Memory-Efficient Point Cloud Registration via Overlapping Region Sampling | Tomoyasu Shimada et.al. | 2410.21753v1 | null |
2024-10-21 | RANSAC Back to SOTA: A Two-stage Consensus Filtering for Real-time 3D Registration | Pengcheng Shi et.al. | 2410.15682v1 | link |
2024-10-14 | A Consistency-Aware Spot-Guided Transformer for Versatile and Hierarchical Point Cloud Registration | Renlang Huang et.al. | 2410.10295v1 | link |
2024-10-14 | Kinematic-ICP: Enhancing LiDAR Odometry with Kinematic Constraints for Wheeled Mobile Robots Moving on Planar Surfaces | Tiziano Guadagnino et.al. | 2410.10277v1 | null |
2024-10-10 | LiPO: LiDAR Inertial Odometry for ICP Comparison | Darwin Mick et.al. | 2410.08097v1 | null |
2024-10-08 | Equi-GSPR: Equivariant SE(3) Graph Network Model for Sparse Point Cloud Registration | Xueyang Kang et.al. | 2410.05729v1 | link |
2024-10-07 | Enhanced Multi-Robot SLAM System with Cross-Validation Matching and Exponential Threshold Keyframe Selection | Ang He et.al. | 2410.05017v1 | null |
2024-10-03 | LoGDesc: Local geometric features aggregation for robust point cloud registration | Karim Slimani et.al. | 2410.02420v1 | link |
2024-10-01 | GERA: Geometric Embedding for Efficient Point Registration Analysis | Geng Li et.al. | 2410.00589v1 | null |
2024-10-01 | TFCT-I2P: Three stream fusion network with color aware transformer for image-to-point cloud registration | Muyao Peng et.al. | 2410.00360v1 | link |
2024-10-06 | KISS-Matcher: Fast and Robust Point Cloud Registration Revisited | Hyungtae Lim et.al. | 2409.15615v2 | link |
2024-09-23 | MATCH POLICY: A Simple Pipeline from Point Cloud Registration to Manipulation Policies | Haojie Huang et.al. | 2409.15517v1 | null |
2024-09-22 | SynBench: A Synthetic Benchmark for Non-rigid 3D Point Cloud Registration | Sara Monji-Azad et.al. | 2409.14474v1 | null |
2024-09-27 | FracGM: A Fast Fractional Programming Technique for Geman-McClure Robust Estimator | Bang-Shien Chen et.al. | 2409.13978v2 | link |
2024-09-17 | Enhancing the Reliability of LiDAR Point Cloud Sampling: A Colorization and Super-Resolution Approach Based on LiDAR-Generated Images | Sier Ha et.al. | 2409.11532v1 | null |
2024-09-14 | Registration between Point Cloud Streams and Sequential Bounding Boxes via Gradient Descent | Xuesong Li et.al. | 2409.09312v1 | null |
2024-09-11 | Unsupervised Point Cloud Registration with Self-Distillation | Christian Löwens et.al. | 2409.07558v1 | link |
2024-09-10 | Mahalanobis k-NN: A Statistical Lens for Robust Point-Cloud Registrations | Tejas Anvekar et.al. | 2409.06267v1 | link |
2024-09-09 | From Words to Poses: Enhancing Novel Object Pose Estimation with Vision Language Models | Tessa Pulli et.al. | 2409.05413v1 | null |
2024-09-08 | Sight View Constraint for Robust Point Cloud Registration | Yaojie Zhang et.al. | 2409.05065v1 | null |
2024-08-23 | UMERegRobust - Universal Manifold Embedding Compatible Features for Robust Point Cloud Registration | Yuval Haitman et.al. | 2408.12380v2 | link |
2024-08-21 | Informed, Constrained, Aligned: A Field Analysis on Degeneracy-aware Point Cloud Registration in the Wild | Turcan Tuna et.al. | 2408.11809v1 | null |
2024-08-20 | LoopSplat: Loop Closure by Registering 3D Gaussian Splats | Liyuan Zhu et.al. | 2408.10154v2 | link |
2024-08-05 | CMR-Agent: Learning a Cross-Modal Agent for Iterative Image-to-Point Cloud Registration | Gongxin Yao et.al. | 2408.02394v1 | null |
2024-08-05 | MaFreeI2P: A Matching-Free Image-to-Point Cloud Registration Paradigm with Active Camera Pose Retrieval | Gongxin Yao et.al. | 2408.02392v1 | null |
2024-07-29 | Correspondence-Free SE(3) Point Cloud Registration in RKHS via Unsupervised Equivariant Learning | Ray Zhang et.al. | 2407.20223v1 | null |
2024-07-24 | Robust Point Cloud Registration in Robotic Inspection with Locally Consistent Gaussian Mixture Model | Lingjie Su et.al. | 2407.17183v1 | null |
2024-07-23 | SE3ET: SE(3)-Equivariant Transformer for Low-Overlap Point Cloud Registration | Chien Erh Lin et.al. | 2407.16823v1 | link |
2024-07-19 | PointRegGPT: Boosting 3D Point Cloud Registration using Generative Point-Cloud Pairs for Training | Suyi Chen et.al. | 2407.14054v1 | link |
2024-07-19 | GlobalPointer: Large-Scale Plane Adjustment with Bi-Convex Relaxation | Bangyan Liao et.al. | 2407.13537v2 | link |
2024-07-22 | Snail-Radar: A large-scale diverse dataset for the evaluation of 4D-radar-based SLAM systems | Jianzhu Huai et.al. | 2407.11705v2 | null |
2024-07-14 | PARE-Net: Position-Aware Rotation-Equivariant Networks for Robust Point Cloud Registration | Runzhao Yao et.al. | 2407.10142v1 | link |
2024-07-13 | ML-SemReg: Boosting Point Cloud Registration with Multi-level Semantic Consistency | Shaocheng Yan et.al. | 2407.09862v1 | link |
2024-07-11 | BiEquiFormer: Bi-Equivariant Representations for Global Point Cloud Registration | Stefanos Pertigkiozoglou et.al. | 2407.08729v1 | null |
2024-07-10 | Incremental Multiview Point Cloud Registration with Two-stage Candidate Retrieval | Shiqi Li et.al. | 2407.07525v1 | null |
2024-07-08 | SGOR: Outlier Removal by Leveraging Semantic and Geometric Information for Robust Point Cloud Registration | Guiyu Zhao et.al. | 2407.06297v1 | link |
2024-07-08 | GeoNLF: Geometry guided Pose-Free Neural LiDAR Fields | Weiyi Xue et.al. | 2407.05597v1 | null |
2024-07-07 | GaussReg: Fast 3D Registration with Gaussian Splatting | Jiahao Chang et.al. | 2407.05254v1 | null |
2024-07-06 | Incremental Multiview Point Cloud Registration | Xiaoya Cheng et.al. | 2407.05021v1 | link |
2024-06-25 | Point Tree Transformer for Point Cloud Registration | Meiling Wang et.al. | 2406.17530v1 | null |
2024-06-17 | Correspondence Free Multivector Cloud Registration using Conformal Geometric Algebra | Francisco Xavier Vasconcelos et.al. | 2406.11732v1 | link |
2024-06-05 | L-PR: Exploiting LiDAR Fiducial Marker for Unordered Low Overlap Multiview Point Cloud Registration | Yibo Liu et.al. | 2406.03298v1 | link |
2024-05-25 | Deep-PE: A Learning-Based Pose Evaluator for Point Cloud Registration | Junjie Gao et.al. | 2405.16085v1 | null |
2024-05-26 | NV-LIO: LiDAR-Inertial Odometry using Normal Vectors Towards Robust SLAM in Multifloor Environments | Dongha Chung et.al. | 2405.12563v2 | link |
2024-05-13 | RGBD-Glue: General Feature Combination for Robust RGB-D Point Cloud Registration | Congjia Chen et.al. | 2405.07594v1 | null |
2024-05-10 | Benchmarking Classical and Learning-Based Multibeam Point Cloud Registration | Li Ling et.al. | 2405.06279v1 | link |
2024-05-09 | Rotation Initialization and Stepwise Refinement for Universal LiDAR Calibration | Yifan Duan et.al. | 2405.05589v1 | null |
2024-05-07 | Speak the Same Language: Global LiDAR Registration on BIM Using Pose Hough Transform | Zhijian Qiao et.al. | 2405.03969v1 | null |
2024-05-06 | Deep Learning-based Point Cloud Registration for Augmented Reality-guided Surgery | Maximilian Weber et.al. | 2405.03314v1 | null |
2024-04-27 | FRAME: A Modular Framework for Autonomous Map-merging: Advancements in the Field | Nikolaos Stathoulopoulos et.al. | 2404.18006v1 | null |
2024-04-22 | PointDifformer: Robust Point Cloud Registration With Neural Diffusion and Transformer | Rui She et.al. | 2404.14034v1 | null |
2024-04-22 | A Comprehensive Survey and Taxonomy on Point Cloud Registration Based on Deep Learning | Yu-Xin Zhang et.al. | 2404.13830v1 | link |
2024-04-09 | Efficient and Robust Point Cloud Registration via Heuristics-guided Parameter Search | Tianyu Huang et.al. | 2404.06155v1 | link |
2024-04-08 | Rendering-Enhanced Automatic Image-to-Point Cloud Registration for Roadside Scenes | Yu Sheng et.al. | 2404.05164v1 | null |
2024-04-06 | Learning Instance-Aware Correspondences for Robust Multi-Instance Point Cloud Registration in Cluttered Scenes | Zhiyuan Yu et.al. | 2404.04557v1 | link |
2024-04-05 | A Ground Mobile Robot for Autonomous Terrestrial Laser Scanning-Based Field Phenotyping | Javier Rodriguez-Sanchez et.al. | 2404.04404v1 | null |
2024-04-01 | FPGA-Accelerated Correspondence-free Point Cloud Registration with PointNet Features | Keisuke Sugiura et.al. | 2404.01237v1 | null |
2024-03-28 | SG-PGM: Partial Graph Matching Network with Semantic Geometric Fusion for 3D Scene Graph Alignment and Its Downstream Tasks | Yaxu Xie et.al. | 2403.19474v1 | link |
2024-03-26 | Global Point Cloud Registration Network for Large Transformations | Hanz Cuevas-Velasquez et.al. | 2403.18040v1 | null |
2024-03-28 | Exploring Accurate 3D Phenotyping in Greenhouse through Neural Radiance Fields | Junhong Zhao et.al. | 2403.15981v2 | null |
2024-03-15 | VRHCF: Cross-Source Point Cloud Registration via Voxel Representation and Hierarchical Correspondence Filtering | Guiyu Zhao et.al. | 2403.10085v1 | link |
2024-03-15 | MEDPNet: Achieving High-Precision Adaptive Registration for Complex Die Castings | Yu Du et.al. | 2403.09996v1 | null |
2024-03-15 | CLOSURE: Fast Quantification of Pose Uncertainty Sets | Yihuai Gao et.al. | 2403.09990v1 | null |
2024-03-13 | FastMAC: Stochastic Spectral Sampling of Correspondence Graph | Yifei Zhang et.al. | 2403.08770v1 | link |
2024-03-13 | NeRF-Supervised Feature Point Detection and Description | Ali Youssef et.al. | 2403.08156v1 | link |
2024-03-10 | PSS-BA: LiDAR Bundle Adjustment with Progressive Spatial Smoothing | Jianping Li et.al. | 2403.06124v1 | null |
2024-03-27 | Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance Extension | Quan Liu et.al. | 2403.03532v2 | link |
2024-03-15 | RELEAD: Resilient Localization with Enhanced LiDAR Odometry in Adverse Environments | Zhiqiang Chen et.al. | 2402.18934v2 | null |
2024-02-28 | PCR-99: A Practical Method for Point Cloud Registration with 99% Outliers | Seong Hun Lee et.al. | 2402.16598v2 | link |
2024-02-23 | CLIPPER+: A Fast Maximal Clique Algorithm for Robust Global Registration | Kaveh Fathian et.al. | 2402.15464v1 | link |
2024-02-11 | CLIPPER: Robust Data Association without an Initial Guess | Parker C. Lusk et.al. | 2402.07284v1 | null |
2024-02-08 | Tightly Coupled Range Inertial Localization on a 3D Prior Map Based on Sliding Window Factor Graph Optimization | Kenji Koide et.al. | 2402.05540v1 | null |
2024-01-16 | Registration of algebraic varieties using Riemannian optimization | Florentin Goyens et.al. | 2401.08562v1 | link |
2024-01-09 | Iterative Feedback Network for Unsupervised Point Cloud Registration | Yifan Xie et.al. | 2401.04357v1 | link |
2024-01-06 | PosDiffNet: Positional Neural Diffusion for Point Cloud Registration in a Large Field of View with Perturbations | Rui She et.al. | 2401.03167v1 | null |
2024-01-04 | OptFlow: Fast Optimization-based Scene Flow Estimation without Supervision | Rahul Ahuja et.al. | 2401.02550v1 | null |
2024-01-17 | Diff-PCR: Diffusion-Based Correspondence Searching in Doubly Stochastic Matrix Space for Point Cloud Registration | Qianliang Wu et.al. | 2401.00436v4 | null |
2023-12-22 | On Partial Optimal Transport: Revising the Infeasibility of Sinkhorn and Efficient Gradient Methods | Anh Duc Nguyen et.al. | 2312.13970v2 | link |
2023-12-20 | D3Former: Jointly Learning Repeatable Dense Detectors and Feature-enhanced Descriptors via Saliency-guided Transformer | Junjie Gao et.al. | 2312.12970v1 | null |
2023-12-14 | SPEAL: Skeletal Prior Embedded Attention Learning for Cross-Source Point Cloud Registration | Kezheng Xiong et.al. | 2312.08664v1 | null |
2023-12-11 | PCRDiffusion: Diffusion Probabilistic Models for Point Cloud Registration | Yue Wu et.al. | 2312.06063v1 | null |
2023-12-05 | DiffusionPCR: Diffusion Models for Robust Multi-Step Point Cloud Registration | Zhi Chen et.al. | 2312.03053v1 | null |
2023-12-08 | Zero-Shot Point Cloud Registration | Weijie Wang et.al. | 2312.03032v2 | null |
2023-12-05 | A Dynamic Network for Efficient Point Cloud Registration | Yang Ai et.al. | 2312.02877v1 | null |
2023-12-05 | 6D Assembly Pose Estimation by Point Cloud Registration for Robot Manipulation | K. Samarawickrama et.al. | 2312.02593v1 | link |
2023-12-04 | Rotation-Invariant Rapid TRISO-Fueled Pebble Identification Based on Feature Matching and Point Cloud Registration | Ming Fang et.al. | 2312.02006v1 | null |
2023-12-27 | E2PNet: Event to Point Cloud Registration with Spatio-Temporal Representation Learning | Xiuhong Lin et.al. | 2311.18433v2 | link |
2023-11-15 | Nothing Stands Still: A Spatiotemporal Benchmark on 3D Point Cloud Registration Under Large Geometric and Temporal Change | Tao Sun et.al. | 2311.09346v1 | null |
2023-11-02 | Transformation Decoupling Strategy based on Screw Theory for Deterministic Point Cloud Registration with Gravity Prior | Xinyi Li et.al. | 2311.01432v1 | null |
2023-11-02 | Cross-Modal Information-Guided Network using Contrastive Learning for Point Cloud Registration | Yifan Xie et.al. | 2311.01202v1 | link |
2023-10-29 | HDMNet: A Hierarchical Matching Network with Double Attention for Large-scale Outdoor LiDAR Point Cloud Registration | Weiyi Xue et.al. | 2310.18874v1 | null |
2023-10-27 | Do we need scan-matching in radar odometry? | Vladimír Kubelka et.al. | 2310.18117v1 | link |
2023-10-26 | SE(3) Diffusion Model-based Point Cloud Registration for Robust 6D Object Pose Estimation | Haobo Jiang et.al. | 2310.17359v1 | null |
2023-10-18 | DBDNet: Partial-to-Partial Point Cloud Registration with Dual Branches Decoupling | Shiqi Li et.al. | 2310.11733v1 | null |
2023-10-15 | OAAFormer: Robust and Efficient Point Cloud Registration Through Overlapping-Aware Attention in Transformer | Junjie Gao et.al. | 2310.09817v1 | null |
2023-10-09 | FeatSense -- A Feature-based Registration Algorithm with GPU-accelerated TSDF-Mapping Backend for NVIDIA Jetson Boards | Julian Gaal et.al. | 2310.05766v1 | link |
2023-10-09 | Colmap-PCD: An Open-source Tool for Fine Image-to-point cloud Registration | Chunge Bai et.al. | 2310.05504v1 | link |
2023-10-06 | Light-LOAM: A Lightweight LiDAR Odometry and Mapping based on Graph-Matching | Shiquan Yi et.al. | 2310.04162v1 | link |
2023-10-05 | FreeReg: Image-to-Point Cloud Registration Leveraging Pretrained Diffusion Models and Monocular Depth Estimators | Haiping Wang et.al. | 2310.03420v1 | link |
2023-10-02 | COIN-LIO: Complementary Intensity-Augmented LiDAR Inertial Odometry | Patrick Pfreundschuh et.al. | 2310.01235v1 | link |
2023-09-27 | Q-REG: End-to-End Trainable Point Cloud Registration with Surface Curvature | Shengze Jin et.al. | 2309.16023v1 | null |
2023-09-27 | Partial Transport for Point-Cloud Registration | Yikun Bai et.al. | 2309.15787v1 | null |
2023-09-27 | KDD-LOAM: Jointly Learned Keypoint Detector and Descriptors Assisted LiDAR Odometry and Mapping | Renlang Huang et.al. | 2309.15394v1 | null |
2023-09-26 | CoFiI2P: Coarse-to-Fine Correspondences for Image-to-Point Cloud Registration | Shuhao Kang et.al. | 2309.14660v1 | null |
2023-09-20 | AutoSynth: Learning to Generate 3D Training Data for Object Point Cloud Registration | Zheng Dang et.al. | 2309.11170v1 | null |
2023-09-19 | LiDAR-Generated Images Derived Keypoints Assisted Point Cloud Registration Scheme in Odometry Estimation | Haizhou Zhang et.al. | 2309.10436v1 | link |
2023-09-17 | Hamiltonian Dynamics Learning from Point Cloud Observations for Nonholonomic Mobile Robot Control | Abdullah Altawaitan et.al. | 2309.09163v1 | link |
2023-09-16 | FF-LOGO: Cross-Modality Point Cloud Registration with Feature Filtering and Local to Global Optimization | Nan Ma et.al. | 2309.08966v1 | null |
2023-09-16 | Outram: One-shot Global Localization via Triangulated Scene Graph and Global Outlier Pruning | Pengyu Yin et.al. | 2309.08914v1 | link |
2023-09-15 | A Ground Segmentation Method Based on Point Cloud Map for Unstructured Roads | Zixuan Li et.al. | 2309.08164v1 | null |
2023-09-15 | Fast and Accurate Deep Loop Closing and Relocalization for Reliable LiDAR SLAM | Chenghao Shi et.al. | 2309.08086v1 | null |
2023-09-14 | EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization | Minjung Kim et.al. | 2309.07471v1 | link |
2023-09-12 | SGFeat: Salient Geometric Feature for Point Cloud Registration | Qianliang Wu et.al. | 2309.06207v1 | null |
2023-09-01 | Point-TTA: Test-Time Adaptation for Point Cloud Registration Using Multitask Meta-Auxiliary Learning | Ahmed Hatem et.al. | 2308.16481v2 | null |
2023-08-21 | In-Rack Test Tube Pose Estimation Using RGB-D Data | Hao Chen et.al. | 2308.10411v1 | null |
2023-08-18 | DReg-NeRF: Deep Registration for Neural Radiance Fields | Yu Chen et.al. | 2308.09386v1 | link |
2023-08-18 | Overlap Bias Matching is Necessary for Point Cloud Registration | Pengcheng Shi et.al. | 2308.09364v1 | null |
2023-08-10 | Deep Semantic Graph Matching for Large-scale Outdoor Point Clouds Registration | Shaocong Liu et.al. | 2308.05314v1 | null |
2023-08-09 | PointMBF: A Multi-scale Bidirectional Fusion Network for Unsupervised RGB-D Point Cloud Registration | Mingzhi Yuan et.al. | 2308.04782v1 | link |
2023-07-25 | GeoTransformer: Fast and Robust Point Cloud Registration with Geometric Transformer | Zheng Qin et.al. | 2308.03768v1 | link |
2023-07-26 | One-Nearest Neighborhood Guides Inlier Estimation for Unsupervised Point Cloud Registration | Yongzhe Yuan et.al. | 2307.14019v1 | null |
2023-07-22 | Pyramid Semantic Graph-based Global Point Cloud Registration with Low Overlap | Zhijian Qiao et.al. | 2307.12116v1 | link |
2023-09-12 | ELiOT: End-to-end Lidar Odometry using Transformer Framework | Daegyu Lee et.al. | 2307.11998v4 | null |
2023-08-08 | Density-invariant Features for Distant Point Cloud Registration | Quan Liu et.al. | 2307.09788v2 | link |
2023-07-18 | SphereNet: Learning a Noise-Robust and General Descriptor for Point Cloud Registration | Guiyu Zhao et.al. | 2307.09351v1 | null |
2023-07-14 | CFI2P: Coarse-to-Fine Cross-Modal Correspondence Learning for Image-to-Point Cloud Registration | Gongxin Yao et.al. | 2307.07142v1 | null |
2023-07-11 | Exact Point Cloud Downsampling for Fast and Accurate Global Trajectory Optimization | Kenji Koide et.al. | 2307.02948v2 | link |
2023-07-03 | Direct Superpoints Matching for Fast and Robust Point Cloud Registration | Aniket Gupta et.al. | 2307.01362v1 | link |
2023-07-04 | A denoised Mean Teacher for domain adaptive point cloud registration | Alexander Bigalke et.al. | 2306.14749v2 | link |
2023-06-20 | End-to-end 2D-3D Registration between Image and LiDAR Point Cloud for Vehicle Localization | Guangming Wang et.al. | 2306.11346v1 | null |
2023-06-14 | ICET Online Accuracy Characterization for Geometry-Based Laser Scan Matching | Matthew McDermott et.al. | 2306.08690v1 | link |
2023-06-12 | Volume-DROID: A Real-Time Implementation of Volumetric Mapping with DROID-SLAM | Peter Stratton et.al. | 2306.06850v1 | link |
2023-06-11 | PWR-Align: Leveraging Part-Whole Relationships for Part-wise Rigid Point Cloud Registration in Mixed Reality Applications | Manorama Jha et.al. | 2306.06717v1 | null |
2023-06-07 | Robust-DefReg: A Robust Deformable Point Cloud Registration Method based on Graph Convolutional Neural Networks | Sara Monji-Azad et.al. | 2306.04701v1 | null |
2023-05-23 | Cross-source Point Cloud Registration: Challenges, Progress and Prospects | Xiaoshui Huang et.al. | 2305.13570v1 | null |
2023-05-19 | Efficient and Deterministic Search Strategy Based on Residual Projections for Point Cloud Registration | Xinyi Li et.al. | 2305.11716v1 | null |
2023-05-18 | 3D Registration with Maximal Cliques | Xiyu Zhang et.al. | 2305.10854v1 | link |
2023-05-05 | HD2Reg: Hierarchical Descriptors and Detectors for Point Cloud Registration | Canhui Tang et.al. | 2305.03487v1 | link |
2023-05-08 | APR: Online Distant Point Cloud Registration Through Aggregated Point Cloud Reconstruction | Quan Liu et.al. | 2305.02893v2 | link |
2023-04-27 | RegHEC: Hand-Eye Calibration via Simultaneous Multi-view Point Clouds Registration of Arbitrary Object | Shiyu Xing et.al. | 2304.14092v1 | link |
2023-04-26 | Non-rigid Point Cloud Registration for Middle Ear Diagnostics with Endoscopic Optical Coherence Tomography | Peng Liu et.al. | 2304.13618v1 | link |
2023-04-25 | BO-ICP: Initialization of Iterative Closest Point Based on Bayesian Optimization | Harel Biggie et.al. | 2304.13114v1 | link |
2023-04-18 | SDFReg: Learning Signed Distance Functions for Point Cloud Registration | Leida Zhang et.al. | 2304.08929v1 | null |
2023-04-12 | SiLK -- Simple Learned Keypoints | Pierre Gleize et.al. | 2304.06194v1 | link |
2023-04-11 | TT-SDF2PC: Registration of Point Cloud and Compressed SDF Directly in the Memory-Efficient Tensor Train Domain | Alexey I. Boyko et.al. | 2304.05342v1 | null |
2023-04-10 | HybridFusion: LiDAR and Vision Cross-Source Point Cloud Fusion | Yu Wang et.al. | 2304.04508v1 | null |
2023-04-09 | Self-Supervised Learning of Object Segmentation from Unlabeled RGB-D Videos | Shiyang Lu et.al. | 2304.04325v1 | null |
2023-04-09 | DSMNet: Deep High-precision 3D Surface Modeling from Sparse Point Cloud Frames | Changjie Qiu et.al. | 2304.04200v1 | null |
2023-04-02 | Robust Multiview Point Cloud Registration with Reliable Pose Graph Initialization and History Reweighting | Haiping Wang et.al. | 2304.00467v1 | link |
2023-03-31 | kNN-Res: Residual Neural Network with kNN-Graph coherence for point cloud registration | Muhammad S. Battikh et.al. | 2304.00050v1 | link |
2023-03-31 | RDMNet: Reliable Dense Matching Based Point Cloud Registration for Autonomous Driving | Chenghao Shi et.al. | 2303.18084v1 | null |
2023-04-23 | HybridPoint: Point Cloud Registration Based on Hybrid Point Sampling and Matching | Yiheng Li et.al. | 2303.16526v2 | link |
2023-03-27 | Learnable Graph Matching: A Practical Paradigm for Data Association | Jiawei He et.al. | 2303.15414v1 | link |
2023-03-23 | Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration | Guofeng Mei et.al. | 2303.13290v1 | link |
2023-03-22 | RegFormer: An Efficient Projection-Aware Transformer Network for Large-Scale Point Cloud Registration | Jiuming Liu et.al. | 2303.12384v1 | link |
2023-03-17 | Deep Graph-based Spatial Consistency for Robust Non-rigid Point Cloud Registration | Zheng Qin et.al. | 2303.09950v1 | link |
2023-03-14 | RoCNet: 3D Robust Registration of Point-Clouds using Deep Learning | Karim Slimani et.al. | 2303.07963v1 | null |
2023-03-07 | GMCR: Graph-based Maximum Consensus Estimation for Point Cloud Registration | Michael Gentner et.al. | 2303.04032v1 | null |
2023-03-02 | Neural Intrinsic Embedding for Non-rigid Point Cloud Matching | Puhua Jiang et.al. | 2303.01038v1 | null |
2023-03-14 | A Unified BEV Model for Joint Learning of 3D Local Features and Overlap Estimation | Lin Li et.al. | 2302.14511v2 | link |
2023-02-28 | PCR-CG: Point Cloud Registration via Deep Color and Geometry | Yu Zhang et.al. | 2302.14418v1 | link |
2023-02-28 | Efficient Implicit Neural Reconstruction Using LiDAR | Dongyu Yan et.al. | 2302.14363v1 | link |
2023-02-25 | Accurate Gaussian Process Distance Fields with applications to Echolocation and Mapping | Cedric Le Gentil et.al. | 2302.13005v1 | null |
2023-02-14 | Point Cloud Registration for LiDAR and Photogrammetric Data: a Critical Synthesis and Performance Analysis on Classic and Deep Learning Algorithms | Ningli Xu et.al. | 2302.07184v1 | link |
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2024-12-02 | The Bare Necessities: Designing Simple, Effective Open-Vocabulary Scene Graphs | Christina Kassab et.al. | 2412.01539v1 | null |
2024-11-30 | Density-aware Global-Local Attention Network for Point Cloud Segmentation | Chade Li et.al. | 2412.00489v1 | null |
2024-11-28 | Textured As-Is BIM via GIS-informed Point Cloud Segmentation | Mohamed S. H. Alabassy et.al. | 2411.18898v1 | null |
2024-11-27 | Towards Cross-device and Training-free Robotic Grasping in 3D Open World | Weiguang Zhao et.al. | 2411.18133v1 | null |
2024-11-20 | BelHouse3D: A Benchmark Dataset for Assessing Occlusion Robustness in 3D Point Cloud Semantic Segmentation | Umamaheswaran Raman Kumar et.al. | 2411.13251v1 | null |
2024-11-13 | Biomass phenotyping of oilseed rape through UAV multi-view oblique imaging with 3DGS and SAM model | Yutao Shen et.al. | 2411.08453v1 | null |
2024-11-13 | Multiscale Graph Construction Using Non-local Cluster Features | Reina Kaneko et.al. | 2411.08371v1 | null |
2024-10-30 | Automated Image-Based Identification and Consistent Classification of Fire Patterns with Quantitative Shape Analysis and Spatial Location Identification | Pengkun Liu et.al. | 2410.23105v1 | null |
2024-11-03 | Multimodality Helps Few-Shot 3D Point Cloud Semantic Segmentation | Zhaochong An et.al. | 2410.22489v2 | null |
2024-10-28 | Exploring contextual modeling with linear complexity for point cloud segmentation | Yong Xien Chng et.al. | 2410.21211v1 | null |
2024-10-14 | Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies | Yanjie Ze et.al. | 2410.10803v1 | link |
2024-10-09 | Evaluating the Impact of Point Cloud Colorization on Semantic Segmentation Accuracy | Qinfeng Zhu et.al. | 2410.06725v1 | null |
2024-09-24 | Underground Mapping and Localization Based on Ground-Penetrating Radar | Jinchang Zhang et.al. | 2409.16446v1 | null |
2024-09-22 | Lidar Panoptic Segmentation in an Open World | Anirudh S Chakravarthy et.al. | 2409.14273v1 | link |
2024-09-03 | When 3D Partial Points Meets SAM: Tooth Point Cloud Segmentation with Sparse Labels | Yifan Liu et.al. | 2409.01691v1 | null |
2024-09-03 | Efficiently Expanding Receptive Fields: Local Split Attention and Parallel Aggregation for Enhanced Large-scale Point Cloud Semantic Segmentation | Haodong Wang et.al. | 2409.01662v1 | null |
2024-08-29 | Towards Modality-agnostic Label-efficient Segmentation with Entropy-Regularized Distribution Alignment | Liyao Tang et.al. | 2408.16520v1 | link |
2024-08-21 | GSTran: Joint Geometric and Semantic Coherence for Point Cloud Segmentation | Abiao Li et.al. | 2408.11558v1 | link |
2024-08-02 | Trainable Pointwise Decoder Module for Point Cloud Segmentation | Bike Chen et.al. | 2408.01548v1 | null |
2024-07-31 | Fine-grained Metrics for Point Cloud Semantic Segmentation | Zhuheng Lu et.al. | 2407.21289v1 | null |
2024-07-19 | Scale Disparity of Instances in Interactive Point Cloud Segmentation | Chenrui Han et.al. | 2407.14009v1 | null |
2024-07-18 | SegPoint: Segment Any Point Cloud via Large Language Model | Shuting He et.al. | 2407.13761v1 | null |
2024-07-17 | Dual-level Adaptive Self-Labeling for Novel Class Discovery in Point Cloud Segmentation | Ruijie Xu et.al. | 2407.12489v1 | link |
2024-07-17 | HGL: Hierarchical Geometry Learning for Test-time Adaptation in 3D Point Cloud Segmentation | Tianpei Zou et.al. | 2407.12387v1 | link |
2024-07-17 | Serialized Point Mamba: A Serialized Point Cloud Mamba Segmentation Model | Tao Wang et.al. | 2407.12319v1 | null |
2024-07-12 | Uplifting Range-View-based 3D Semantic Segmentation in Real-Time with Multi-Sensor Fusion | Shiqi Tan et.al. | 2407.09697v1 | null |
2024-07-01 | fVDB: A Deep-Learning Framework for Sparse, Large-Scale, and High-Performance Spatial Intelligence | Francis Williams et.al. | 2407.01781v1 | null |
2024-06-25 | Mamba24/8D: Enhancing Global Interaction in Point Clouds via State Space Model | Zhuoyuan Li et.al. | 2406.17442v1 | null |
2024-08-04 | Twin Deformable Point Convolutions for Point Cloud Semantic Segmentation in Remote Sensing Scenes | Yong-Qiang Mao et.al. | 2405.19735v2 | null |
2024-05-24 | 3D Unsupervised Learning by Distilling 2D Open-Vocabulary Segmentation Models for Autonomous Driving | Boyi Sun et.al. | 2405.15286v1 | link |
2024-05-25 | Filling Missing Values Matters for Range Image-Based Point Cloud Segmentation | Bike Chen et.al. | 2405.10175v2 | null |
2024-04-16 | ECLAIR: A High-Fidelity Aerial LiDAR Dataset for Semantic Segmentation | Iaroslav Melekhov et.al. | 2404.10699v1 | link |
2024-04-04 | OpenNeRF: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views | Francis Engelmann et.al. | 2404.03650v1 | null |
2024-03-28 | RiEMann: Near Real-Time SE(3)-Equivariant Robot Manipulation without Point Cloud Segmentation | Chongkai Gao et.al. | 2403.19460v1 | null |
2024-05-30 | CurbNet: Curb Detection Framework Based on LiDAR Point Cloud Segmentation | Guoyang Zhao et.al. | 2403.16794v2 | link |
2024-03-18 | EffiPerception: an Efficient Framework for Various Perception Tasks | Xinhao Xiang et.al. | 2403.12317v1 | null |
2024-03-11 | 3DRef: 3D Dataset and Benchmark for Reflection Detection in RGB and Lidar Data | Xiting Zhao et.al. | 2403.06538v1 | null |
2024-03-11 | Refining Segmentation On-the-Fly: An Interactive Framework for Point Cloud Semantic Segmentation | Peng Zhang et.al. | 2403.06401v1 | null |
2024-03-03 | Region-Transformer: Self-Attention Region Based Class-Agnostic Point Cloud Segmentation | Dipesh Gyawali et.al. | 2403.01407v1 | null |
2024-01-29 | Dynamic Prototype Adaptation with Distillation for Few-shot Point Cloud Segmentation | Jie Liu et.al. | 2401.16051v1 | link |
2024-01-19 | Symbol as Points: Panoptic Symbol Spotting via Point-based Representation | Wenlong Liu et.al. | 2401.10556v1 | link |
2023-12-29 | Multi-modality Affinity Inference for Weakly Supervised 3D Semantic Segmentation | Xiawei Li et.al. | 2312.16578v2 | link |
2023-12-19 | Point Cloud Segmentation Using Transfer Learning with RandLA-Net: A Case Study on Urban Areas | Alperen Enes Bayar et.al. | 2312.11880v1 | null |
2023-12-15 | T-MAE: Temporal Masked Autoencoders for Point Cloud Representation Learning | Weijie Wei et.al. | 2312.10217v1 | link |
2023-12-14 | FAPP: Fast and Adaptive Perception and Planning for UAVs in Dynamic Cluttered Environments | Minghao Lu et.al. | 2312.08743v1 | null |
2023-12-12 | Transferring CLIP's Knowledge into Zero-Shot Point Cloud Semantic Segmentation | Yuanbin Wang et.al. | 2312.07221v1 | null |
2023-12-11 | Densify Your Labels: Unsupervised Clustering with Bipartite Matching for Weakly Supervised Point Cloud Segmentation | Shaobo Xia et.al. | 2312.06799v1 | null |
2024-01-15 | Provable Adversarial Robustness for Group Equivariant Tasks: Graphs, Point Clouds, Molecules, and More | Jan Schuchardt et.al. | 2312.02708v2 | null |
2023-11-24 | OneFormer3D: One Transformer for Unified Point Cloud Segmentation | Maxim Kolodiazhnyi et.al. | 2311.14405v1 | null |
2023-11-18 | DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields | Yu Chi et.al. | 2311.12063v1 | link |
2023-11-10 | U3DS | Jiaxu Liu et.al. | 2311.06018v1 | null |
2023-11-06 | Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation | Shichao Dong et.al. | 2311.01989v2 | null |
2023-10-19 | 2D-3D Interlaced Transformer for Point Cloud Segmentation with Scene-Level Supervision | Cheng-Kun Yang et.al. | 2310.12817v1 | null |
2023-10-11 | PointHR: Exploring High-Resolution Architectures for 3D Point Cloud Segmentation | Haibo Qiu et.al. | 2310.07743v1 | link |
2023-09-26 | Addressing Data Misalignment in Image-LiDAR Fusion on Point Cloud Segmentation | Wei Jong Yang et.al. | 2309.14932v1 | null |
2023-09-20 | Towards Robust Few-shot Point Cloud Semantic Segmentation | Yating Xu et.al. | 2309.11228v1 | link |
2023-09-20 | Generalized Few-Shot Point Cloud Segmentation Via Geometric Words | Yating Xu et.al. | 2309.11222v1 | link |
2023-08-29 | Compositional Semantic Mix for Domain Adaptation in Point Cloud Segmentation | Cristiano Saltori et.al. | 2308.14619v2 | link |
2023-08-22 | Hierarchical Point-based Active Learning for Semi-supervised Point Cloud Semantic Segmentation | Zongyi Xu et.al. | 2308.11166v1 | link |
2023-08-14 | Autonomous Point Cloud Segmentation for Power Lines Inspection in Smart Grid | Alexander Kyuroson et.al. | 2308.07283v1 | null |
2023-08-08 | Boosting Few-shot 3D Point Cloud Segmentation via Query-Guided Enhancement | Zhenhua Ning et.al. | 2308.03177v2 | link |
2023-07-31 | pCTFusion: Point Convolution-Transformer Fusion with Semantic Aware Loss for Outdoor LiDAR Point Cloud Segmentation | Abhishek Kuriyal et.al. | 2307.14777v2 | link |
2023-07-27 | Clustering based Point Cloud Representation Learning for 3D Analysis | Tuo Feng et.al. | 2307.14605v1 | link |
2023-07-20 | See More and Know More: Zero-shot Point Cloud Segmentation via Multi-modal Visual Data | Yuhang Lu et.al. | 2307.10782v1 | null |
2023-07-14 | Achelous: A Fast Unified Water-surface Panoptic Perception Framework based on Fusion of Monocular Camera and 4D mmWave Radar | Runwei Guan et.al. | 2307.07102v1 | link |
2023-07-08 | BPNet: Bézier Primitive Segmentation on 3D Point Clouds | Rao Fu et.al. | 2307.04013v1 | link |
2023-06-28 | Point2Point : A Framework for Efficient Deep Learning on Hilbert sorted Point Clouds with applications in Spatio-Temporal Occupancy Prediction | Athrva Atul Pandhare et.al. | 2306.16306v1 | null |
2023-05-30 | Dynamic Clustering Transformer Network for Point Cloud Segmentation | Dening Lu et.al. | 2306.08073v1 | null |
2023-05-23 | Prototype Adaption and Projection for Few- and Zero-shot 3D Point Cloud Semantic Segmentation | Shuting He et.al. | 2305.14335v1 | link |
2023-05-22 | Contrastive Predictive Autoencoders for Dynamic Point Cloud Self-Supervised Learning | Xiaoxiao Sheng et.al. | 2305.12959v1 | null |
2023-05-17 | Tinto: Multisensor Benchmark for 3D Hyperspectral Point Cloud Segmentation in the Geosciences | Ahmed J. Afifi et.al. | 2305.09928v1 | null |
2023-05-08 | OctFormer: Octree-based Transformers for 3D Point Clouds | Peng-Shuai Wang et.al. | 2305.03045v2 | link |
2023-05-22 | Urban GeoBIM construction by integrating semantic LiDAR point clouds with as-designed BIM models | Jie Shao et.al. | 2304.11719v2 | null |
2023-04-22 | Knowledge Distillation from 3D to Bird's-Eye-View for LiDAR Semantic Segmentation | Feng Jiang et.al. | 2304.11393v1 | link |
2023-06-02 | Transformer-Based Visual Segmentation: A Survey | Xiangtai Li et.al. | 2304.09854v2 | link |
2023-04-11 | Feature-assisted interactive geometry reconstruction in 3D point clouds using incremental region growing | Attila Szabo et.al. | 2304.05109v1 | null |
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-01-03 | GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models | Zhangyang Qi et.al. | 2501.01428v2 | null |
2025-01-02 | VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control | Yuanpeng Tu et.al. | 2501.01427v1 | null |
2025-01-02 | Unifying Specialized Visual Encoders for Video Language Models | Jihoon Chung et.al. | 2501.01426v1 | null |
2025-01-03 | AdaptVC: High Quality Voice Conversion with Adaptive Learning | Jaehun Kim et.al. | 2501.01347v2 | null |
2025-01-02 | Digital Guardians: Can GPT-4, Perspective API, and Moderation API reliably detect hate speech in reader comments of German online newspapers? | Manuel Weber et.al. | 2501.01256v1 | null |
2025-01-02 | Automated Self-Refinement and Self-Correction for LLM-based Product Attribute Value Extraction | Alexander Brinkmann et.al. | 2501.01237v1 | null |
2025-01-02 | Symmetries-enhanced Multi-Agent Reinforcement Learning | Nikolaos Bousias et.al. | 2501.01136v1 | null |
2025-01-03 | MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization | Haina Zhu et.al. | 2501.01108v2 | null |
2025-01-02 | Are LLMs effective psychological assessors? Leveraging adaptive RAG for interpretable mental health screening through psychometric practice | Federico Ravenda et.al. | 2501.00982v1 | null |
2025-01-01 | Text2Earth: Unlocking Text-driven Remote Sensing Image Generation with a Global-Scale Dataset and a Foundation Model | Chenyang Liu et.al. | 2501.00895v1 | null |
2024-12-30 | QuantumLLMInstruct: A 500k LLM Instruction-Tuning Dataset with Problem-Solution Pairs for Quantum Computing | Shlomo Kashani et.al. | 2412.20956v1 | null |
2024-12-30 | Navigating Chemical-Linguistic Sharing Space with Heterogeneous Molecular Encoding | Liuzhenghao Lv et.al. | 2412.20888v1 | link |
2024-12-30 | TimeRAF: Retrieval-Augmented Foundation model for Zero-shot Time Series Forecasting | Huanyu Zhang et.al. | 2412.20810v1 | null |
2024-12-30 | Learning to Rank Pre-trained Vision-Language Models for Downstream Tasks | Yuhe Ding et.al. | 2412.20682v1 | null |
2024-12-29 | Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond) | Tomer Garber et.al. | 2412.20596v1 | null |
2024-12-27 | Zero-shot Hazard Identification in Autonomous Driving: A Case Study on the COOOL Benchmark | Lukas Picek et.al. | 2412.19944v1 | null |
2024-12-27 | EEG-Reptile: An Automatized Reptile-Based Meta-Learning Library for BCIs | Daniil A. Berdyshev et.al. | 2412.19725v1 | link |
2024-12-30 | VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models | Tao Wu et.al. | 2412.19645v2 | null |
2024-12-27 | MINIMA: Modality Invariant Image Matching | Xingyu Jiang et.al. | 2412.19412v1 | link |
2024-12-26 | Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment | Ziang Yan et.al. | 2412.19326v1 | link |
2024-12-26 | RecLM: Recommendation Instruction Tuning | Yangqin Jiang et.al. | 2412.19302v1 | null |
2024-12-26 | Time Series Foundational Models: Their Role in Anomaly Detection and Prediction | Chathurangi Shyalika et.al. | 2412.19286v1 | link |
2024-12-26 | Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval | Yang Du et.al. | 2412.19178v1 | link |
2024-12-26 | CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting | Siyu Jiao et.al. | 2412.19142v1 | null |
2024-12-26 | Semantic Residual for Multimodal Unified Discrete Representation | Hai Huang et.al. | 2412.19128v1 | null |
2024-12-26 | Advanced Knowledge Transfer: Refined Feature Distillation for Zero-Shot Quantization in Edge Computing | Inpyo Hong et.al. | 2412.19125v1 | link |
2024-12-24 | Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models | Zehan Wang et.al. | 2412.18605v1 | null |
2024-12-24 | ZeroHSI: Zero-Shot 4D Human-Scene Interaction by Video Generation | Hongjie Li et.al. | 2412.18600v1 | null |
2024-12-24 | Distilling Fine-grained Sentiment Understanding from Large Language Models | Yice Zhang et.al. | 2412.18552v1 | link |
2024-12-24 | The Key of Understanding Vision Tasks: Explanatory Instructions | Yang Shen et.al. | 2412.18525v1 | link |
2024-12-24 | Multilingual Mathematical Reasoning: Advancing Open-Source LLMs in Hindi and English | Avinash Anand et.al. | 2412.18415v1 | link |
2024-12-24 | Extract Free Dense Misalignment from CLIP | JeongYeon Nam et.al. | 2412.18404v1 | null |
2024-12-24 | A Zero-Shot Physics-Informed Dictionary Learning Approach for Sound Field Reconstruction | Stefano Damiano et.al. | 2412.18348v1 | link |
2024-12-24 | Efficient and Context-Aware Label Propagation for Zero-/Few-Shot Training-Free Adaptation of Vision-Language Model | Yushu Li et.al. | 2412.18303v1 | null |
2024-12-24 | Quo Vadis, Anomaly Detection? LLMs and VLMs in the Spotlight | Xi Ding et.al. | 2412.18298v1 | link |
2024-12-24 | Improved Feature Generating Framework for Transductive Zero-shot Learning | Zihan Ye et.al. | 2412.18282v1 | null |
2024-12-23 | CiteBART: Learning to Generate Citations for Local Citation Recommendation | Ege Yiğit Çelik et.al. | 2412.17534v1 | link |
2024-12-23 | Multiple Consistency-guided Test-Time Adaptation for Contrastive Audio-Language Models with Unlabeled Audio | Gongyu Chen et.al. | 2412.17306v1 | null |
2024-12-23 | Discriminative Image Generation with Diffusion Models for Zero-Shot Learning | Dingjie Fu et.al. | 2412.17219v1 | null |
2024-12-22 | Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis | Ye-Xin Lu et.al. | 2412.16977v1 | null |
2024-12-22 | Self-Corrected Flow Distillation for Consistent One-Step and Few-Step Text-to-Image Generation | Quan Dao et.al. | 2412.16906v1 | null |
2024-12-22 | Autoregressive Speech Synthesis with Next-Distribution Prediction | Xinfa Zhu et.al. | 2412.16846v1 | null |
2024-12-21 | RoomPainter: View-Integrated Diffusion for Consistent Indoor Scene Texturing | Zhipeng Huang et.al. | 2412.16778v1 | null |
2024-12-21 | HyperCLIP: Adapting Vision-Language models with Hypernetworks | Victor Akinwande et.al. | 2412.16777v1 | null |
2024-12-21 | Large Language Model Can Be a Foundation for Hidden Rationale-Based Retrieval | Luo Ji et.al. | 2412.16615v1 | null |
2024-12-21 | Open-Vocabulary Mobile Manipulation Based on Double Relaxed Contrastive Learning with Dense Labeling | Daichi Yashima et.al. | 2412.16576v1 | link |
2024-12-20 | Deciphering the Underserved: Benchmarking LLM OCR for Low-Resource Scripts | Muhammad Abdullah Sohail et.al. | 2412.16119v1 | link |
2024-12-20 | CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up | Songhua Liu et.al. | 2412.16112v1 | link |
2024-12-20 | Interleaved Speech-Text Language Models are Simple Streaming Text to Speech Synthesizers | Yifan Yang et.al. | 2412.16102v1 | null |
2024-12-20 | Fearful Falcons and Angry Llamas: Emotion Category Annotations of Arguments by Humans and LLMs | Lynn Greschner et.al. | 2412.15993v1 | null |
2024-12-20 | Watertox: The Art of Simplicity in Universal Attacks A Cross-Model Framework for Robust Adversarial Generation | Zhenghao Gao et.al. | 2412.15924v1 | null |
2024-12-20 | On the Suitability of pre-trained foundational LLMs for Analysis in German Legal Education | Lorenz Wendlinger et.al. | 2412.15902v1 | null |
2024-12-20 | AutoLife: Automatic Life Journaling with Smartphones and LLMs | Huatao Xu et.al. | 2412.15714v1 | null |
2024-12-20 | Cracking the Code: Evaluating Zero-Shot Prompting Methods for Providing Programming Feedback | Niklas Ippisch et.al. | 2412.15702v1 | null |
2024-12-20 | SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training | Wenxi Chen et.al. | 2412.15649v1 | null |
2024-12-20 | A New Method to Capturing Compositional Knowledge in Linguistic Space | Jiahe Wan et.al. | 2412.15632v1 | null |
2024-12-19 | Face the Facts! Evaluating RAG-based Fact-checking Pipelines in Realistic Settings | Daniel Russo et.al. | 2412.15189v1 | link |
2024-12-19 | STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning | Marius Memmel et.al. | 2412.15182v1 | null |
2024-12-19 | Adaptive Pruning for Large Language Models with Structural Importance Awareness | Haotian Zheng et.al. | 2412.15127v1 | null |
2024-12-19 | Scale This, Not That: Investigating Key Dataset Attributes for Efficient Speech Enhancement Scaling | Leying Zhang et.al. | 2412.14890v1 | null |
2024-12-19 | Zero-Shot Artifact2Artifact: Self-incentive artifact removal for photoacoustic imaging without any data | Shuang Li et.al. | 2412.14873v1 | link |
2024-12-19 | Extending TWIG: Zero-Shot Predictive Hyperparameter Selection for KGEs based on Graph Structure | Jeffrey Sardina et.al. | 2412.14801v1 | null |
2024-12-19 | Beyond Guilt: Legal Judgment Prediction with Trichotomous Reasoning | Kepu Zhang et.al. | 2412.14588v1 | null |
2024-12-19 | MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval | Junjie Zhou et.al. | 2412.14475v1 | null |
2024-12-19 | WildSAT: Learning Satellite Image Representations from Wildlife Observations | Rangel Daroya et.al. | 2412.14428v1 | null |
2024-12-18 | I0T: Embedding Standardization Method Towards Zero Modality Gap | Na Min An et.al. | 2412.14384v1 | link |
2024-12-18 | Autoregressive Video Generation without Vector Quantization | Haoge Deng et.al. | 2412.14169v1 | link |
2024-12-18 | Incorporating Feature Pyramid Tokenization and Open Vocabulary Semantic Segmentation | Jianyu Zhang et.al. | 2412.14145v1 | null |
2024-12-18 | Foundation Models Meet Low-Cost Sensors: Test-Time Adaptation for Rescaling Disparity for Zero-Shot Metric Depth Estimation | Rémi Marsal et.al. | 2412.14103v1 | null |
2024-12-18 | FarExStance: Explainable Stance Detection for Farsi | Majid Zarharan et.al. | 2412.14008v1 | link |
2024-12-18 | Real Classification by Description: Extending CLIP's Limits of Part Attributes Recognition | Ethan Baron et.al. | 2412.13947v1 | null |
2024-12-18 | Memorizing SAM: 3D Medical Segment Anything Model with Memorizing Transformer | Xinyuan Shao et.al. | 2412.13908v1 | link |
2024-12-18 | Zero-Shot Prompting and Few-Shot Fine-Tuning: Revisiting Document Image Classification Using Large Language Models | Anna Scius-Bertrand et.al. | 2412.13859v1 | null |
2024-12-18 | SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor | Chenyu Yang et.al. | 2412.13786v1 | null |
2024-12-18 | G-VEval: A Versatile Metric for Evaluating Image and Video Captions Using GPT-4o | Tony Cheng Tong et.al. | 2412.13647v1 | link |
2024-12-18 | Reverse Region-to-Entity Annotation for Pixel-Level Visual Entity Linking | Zhengfei Xu et.al. | 2412.13614v1 | null |
2024-12-17 | GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding | Haoyi Jiang et.al. | 2412.13193v1 | link |
2024-12-17 | A Knowledge-enhanced Pathology Vision-language Foundation Model for Cancer Diagnosis | Xiao Zhou et.al. | 2412.13126v1 | null |
2024-12-17 | Enabling Low-Resource Language Retrieval: Establishing Baselines for Urdu MS MARCO | Umer Butt et.al. | 2412.12997v1 | null |
2024-12-17 | An Agentic Approach to Automatic Creation of P&ID Diagrams from Natural Language Descriptions | Shreeyash Gowaikar et.al. | 2412.12898v1 | null |
2024-12-17 | Question: How do Large Language Models perform on the Question Answering tasks? Answer: | Kevin Fischer et.al. | 2412.12893v1 | null |
2024-12-17 | MIVE: New Design and Benchmark for Multi-Instance Video Editing | Samuel Teodoro et.al. | 2412.12877v1 | null |
2024-12-17 | Comparative Analysis of Zero-Shot Capability of Time-Series Foundation Models in Short-Term Load Prediction | Nan Lin et.al. | 2412.12834v1 | null |
2024-12-17 | FocusChat: Text-guided Long Video Understanding via Spatiotemporal Information Filtering | Zheng Cheng et.al. | 2412.12833v1 | null |
2024-12-17 | Cross-Dialect Information Retrieval: Information Access in Low-Resource and High-Variance Languages | Robert Litschko et.al. | 2412.12806v1 | null |
2024-12-17 | ZoRI: Towards Discriminative Zero-Shot Remote Sensing Instance Segmentation | Shiqi Huang et.al. | 2412.12798v1 | link |
2024-12-16 | Causal Diffusion Transformers for Generative Modeling | Chaorui Deng et.al. | 2412.12095v1 | link |
2024-12-16 | CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology | Yuxuan Sun et.al. | 2412.12077v1 | null |
2024-12-16 | A LoRA is Worth a Thousand Pictures | Chenxi Liu et.al. | 2412.12048v1 | null |
2024-12-16 | Learning to Navigate in Mazes with Novel Layouts using Abstract Top-down Maps | Linfeng Zhao et.al. | 2412.12024v1 | null |
2024-12-16 | Cost-Effective Label-free Node Classification with LLMs | Taiyan Zhang et.al. | 2412.11983v1 | null |
2024-12-16 | Advancing Comprehensive Aesthetic Insight with Multi-Scale Text-Guided Self-Supervised Learning | Yuti Liu et.al. | 2412.11952v1 | null |
2024-12-16 | Stepwise Reasoning Error Disruption Attack of LLMs | Jingyu Peng et.al. | 2412.11934v1 | null |
2024-12-16 | PICLe: Pseudo-Annotations for In-Context Learning in Low-Resource Named Entity Detection | Sepideh Mamooler et.al. | 2412.11923v1 | null |
2024-12-16 | Improved Models for Media Bias Detection and Subcategorization | Tim Menzner et.al. | 2412.11835v1 | null |
2024-12-16 | A Distributed Collaborative Retrieval Framework Excelling in All Queries and Corpora based on Zero-shot Rank-Oriented Automatic Evaluation | Tian-Yi Che et.al. | 2412.11832v1 | null |
2024-12-13 | UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities | Muhammad Uzair Khattak et.al. | 2412.10372v1 | link |
2024-12-13 | Reasoner Outperforms: Generative Stance Detection with Rationalization for Social Media | Jiaqing Yuan et.al. | 2412.10266v1 | null |
2024-12-13 | Efficient Generative Modeling with Residual Vector Quantization-Based Tokens | Jaehyeon Kim et.al. | 2412.10208v1 | null |
2024-12-13 | Constraint-Aware Zero-Shot Vision-Language Navigation in Continuous Environments | Kehan Chen et.al. | 2412.10137v1 | null |
2024-12-13 | Familiarity: Better Evaluation of Zero-Shot Named Entity Recognition by Quantifying Label Shifts in Synthetic Training Data | Jonas Golde et.al. | 2412.10121v1 | null |
2024-12-13 | Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP | Yating Yu et.al. | 2412.09895v1 | link |
2024-12-13 | CP-DETR: Concept Prompt Guide DETR Toward Stronger Universal Object Detection | Qibo Chen et.al. | 2412.09799v1 | null |
2024-12-12 | Toward Foundation Model for Multivariate Wearable Sensing of Physiological Signals | Yunfei Luo et.al. | 2412.09758v1 | link |
2024-12-12 | Should We Learn Contact-Rich Manipulation Policies from Sampling-Based Planners? | Huaijiang Zhu et.al. | 2412.09743v1 | null |
2024-12-12 | TransferLight: Zero-Shot Traffic Signal Control on any Road-Network | Johann Schmidt et.al. | 2412.09719v1 | null |
2024-12-12 | EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM | Zhuofan Zong et.al. | 2412.09618v1 | null |
2024-12-12 | Learning to Adapt: Bio-Inspired Gait Strategies for Versatile Quadruped Locomotion | Joseph Humphreys et.al. | 2412.09440v1 | null |
2024-12-12 | Distribution free uncertainty quantification in neuroscience-inspired deep operators | Shailesh Garg et.al. | 2412.09369v1 | null |
2024-12-12 | Towards Open-Vocabulary Video Semantic Segmentation | Xinhao Li et.al. | 2412.09329v1 | link |
2024-12-12 | T-SVG: Text-Driven Stereoscopic Video Generation | Qiao Jin et.al. | 2412.09323v1 | null |
2024-12-12 | Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine | Xiaoshuang Huang et.al. | 2412.09278v1 | link |
2024-12-12 | Pinpoint Counterfactuals: Reducing social bias in foundation models via localized counterfactual generation | Kirill Sirotkin et.al. | 2412.09160v1 | null |
2024-12-12 | Evaluating Pixel Language Models on Non-Standardized Languages | Alberto Muñoz-Ortiz et.al. | 2412.09084v1 | null |
2024-12-12 | Cross-View Completion Models are Zero-shot Correspondence Estimators | Honggyu An et.al. | 2412.09072v1 | null |
2024-12-13 | An Efficient Framework for Enhancing Discriminative Models via Diffusion Techniques | Chunxiao Li et.al. | 2412.09063v2 | null |
2024-12-11 | RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation | Mingfei Han et.al. | 2412.08591v1 | null |
2024-12-11 | SenCLIP: Enhancing zero-shot land-use mapping for Sentinel-2 with ground-level prompting | Pallavi Jain et.al. | 2412.08536v1 | link |
2024-12-11 | SAM-Mamba: Mamba Guided SAM Architecture for Generalized Zero-Shot Polyp Segmentation | Tapas Kumar Dutta et.al. | 2412.08482v1 | null |
2024-12-11 | Assessing Personalized AI Mentoring with Large Language Models in the Computing Field | Xiao Luo et.al. | 2412.08430v1 | null |
2024-12-11 | Zero-Shot Mono-to-Binaural Speech Synthesis | Alon Levkovitch et.al. | 2412.08356v1 | null |
2024-12-11 | BEIR-NL: Zero-shot Information Retrieval Benchmark for the Dutch Language | Nikolay Banar et.al. | 2412.08329v1 | null |
2024-12-11 | Lightweight Method for Interactive 3D Medical Image Segmentation with Multi-Round Result Fusion | Bingzhi Shen et.al. | 2412.08315v1 | null |
2024-12-11 | 2M-BELEBELE: Highly Multilingual Speech and American Sign Language Comprehension Dataset | Marta R. Costa-jussà et.al. | 2412.08274v1 | null |
2024-12-11 | Large Language Models for Scholarly Ontology Generation: An Extensive Analysis in the Engineering Field | Tanay Aggarwal et.al. | 2412.08258v1 | link |
2024-12-11 | Can Graph Neural Networks Learn Language with Extremely Weak Text Supervision? | Zihao Li et.al. | 2412.08174v1 | null |
2024-12-10 | Video Motion Transfer with Diffusion Transformers | Alexander Pondaven et.al. | 2412.07776v1 | link |
2024-12-10 | From Slow Bidirectional to Fast Causal Video Generators | Tianwei Yin et.al. | 2412.07772v1 | null |
2024-12-11 | Test-time Correction with Human Feedback: An Online 3D Detection System via Visual Prompting | Zetong Yang et.al. | 2412.07768v2 | null |
2024-12-10 | SAT: Spatial Aptitude Training for Multimodal Language Models | Arijit Ray et.al. | 2412.07755v1 | null |
2024-12-10 | Zero-Shot ATC Coding with Large Language Models for Clinical Assessments | Zijian Chen et.al. | 2412.07743v1 | null |
2024-12-10 | DriveMM: All-in-One Large Multimodal Model for Autonomous Driving | Zhijian Huang et.al. | 2412.07689v1 | link |
2024-12-10 | Privacy-Preserving Customer Support: A Framework for Secure and Scalable Interactions | Anant Prakash Awasthi et.al. | 2412.07687v1 | null |
2024-12-10 | FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing | Yingying Deng et.al. | 2412.07517v1 | link |
2024-12-10 | ConfigX: Modular Configuration for Evolutionary Algorithms via Multitask Reinforcement Learning | Hongshu Guo et.al. | 2412.07507v1 | null |
2024-12-10 | Bilingual BSARD: Extending Statutory Article Retrieval to Dutch | Ehsan Lotfi et.al. | 2412.07462v1 | null |
2024-12-09 | Visual Lexicon: Rich Image Features in Language Space | XuDong Wang et.al. | 2412.06774v1 | null |
2024-12-09 | JAPAGEN: Efficient Few/Zero-shot Learning via Japanese Training Dataset Generation with LLM | Takuro Fujii et.al. | 2412.06738v1 | link |
2024-12-09 | You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale | Baorui Ma et.al. | 2412.06699v1 | link |
2024-12-09 | Knowledge Transfer and Domain Adaptation for Fine-Grained Remote Sensing Image Segmentation | Shun Zhang et.al. | 2412.06664v1 | null |
2024-12-09 | LLM-BIP: Structured Pruning for Large Language Models with Block-Wise Forward Importance Propagation | Haihang Wu et.al. | 2412.06419v1 | null |
2024-12-09 | Continual Learning for Segment Anything Model Adaptation | Jinglong Yang et.al. | 2412.06418v1 | link |
2024-12-09 | ZeroKey: Point-Level Reasoning and Zero-Shot 3D Keypoint Detection from Large Language Models | Bingchen Gong et.al. | 2412.06292v1 | null |
2024-12-09 | No Annotations for Object Detection in Art through Stable Diffusion | Patrick Ramos et.al. | 2412.06286v1 | link |
2024-12-09 | DenseVLM: A Retrieval and Decoupled Alignment Framework for Open-Vocabulary Dense Prediction | Yunheng Li et.al. | 2412.06244v1 | null |
2024-12-09 | Evaluating and Mitigating Social Bias for Large Language Models in Open-ended Settings | Zhao Liu et.al. | 2412.06134v1 | null |
2024-12-06 | DenseMatcher: Learning 3D Semantic Correspondence for Category-Level Manipulation from a Single Demo | Junzhe Zhu et.al. | 2412.05268v1 | null |
2024-12-06 | Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization | Luca Masserano et.al. | 2412.05244v1 | null |
2024-12-06 | Towards Understanding the Role of Sharpness-Aware Minimization Algorithms for Out-of-Distribution Generalization | Samuel Schapiro et.al. | 2412.05169v1 | null |
2024-12-06 | A Practical Examination of AI-Generated Text Detectors for Large Language Models | Brian Tufts et.al. | 2412.05139v1 | null |
2024-12-06 | Can Large Language Models Serve as Effective Classifiers for Hierarchical Multi-Label Classification of Scientific Documents at Industrial Scale? | Seyed Amin Tabatabaei et.al. | 2412.05137v1 | null |
2024-12-06 | The Silent Prompt: Initial Noise as Implicit Guidance for Goal-Driven Image Generation | Ruoyu Wang et.al. | 2412.05101v1 | null |
2024-12-06 | HOLa: HoloLens Object Labeling | Michael Schwimmbeck et.al. | 2412.04945v1 | link |
2024-12-06 | | Xiaojie Yin et.al. | 2412.04925v1 | null |
2024-12-06 | StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching | Jixun Yao et.al. | 2412.04724v1 | null |
2024-12-06 | LLM-Align: Utilizing Large Language Models for Entity Alignment in Knowledge Graphs | Xuan Chen et.al. | 2412.04690v1 | null |
2024-12-05 | Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail | Luca Bartolomei et.al. | 2412.04472v1 | link |
2024-12-05 | Grounding Descriptions in Images informs Zero-Shot Visual Recognition | Shaunak Halbe et.al. | 2412.04429v1 | link |
2024-12-05 | SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding | Rong Li et.al. | 2412.04383v1 | null |
2024-12-05 | Finer Behavioral Foundation Models via Auto-Regressive Features and Advantage Weighting | Edoardo Cetin et.al. | 2412.04368v1 | null |
2024-12-05 | Towards Zero-shot 3D Anomaly Localization | Yizhou Wang et.al. | 2412.04304v1 | null |
2024-12-05 | 3D Part Segmentation via Geometric Aggregation of 2D Visual Features | Marco Garosi et.al. | 2412.04247v1 | null |
2024-12-05 | Quantifying the Limits of Segment Anything Model: Analyzing Challenges in Segmenting Tree-Like and Low-Contrast Structures | Yixin Zhang et.al. | 2412.04243v1 | link |
2024-12-05 | Hipandas: Hyperspectral Image Joint Denoising and Super-Resolution by Image Fusion with the Panchromatic Image | Shuang Xu et.al. | 2412.04201v1 | null |
2024-12-05 | Unified Framework for Open-World Compositional Zero-shot Learning | Hirunima Jayasekara et.al. | 2412.04083v1 | link |
2024-12-05 | Towards Generalizable Autonomous Penetration Testing via Domain Randomization and Meta-Reinforcement Learning | Shicheng Zhou et.al. | 2412.04078v1 | link |
2024-12-04 | The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control | Ruili Feng et.al. | 2412.03568v1 | null |
2024-12-04 | FLAIR: VLM with Fine-grained Language-informed Image Representations | Rui Xiao et.al. | 2412.03561v1 | link |
2024-12-04 | Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression | Junjie Wen et.al. | 2412.03293v1 | null |
2024-12-04 | Expanding Event Modality Applications through a Robust CLIP-Based Encoder | Sungheon Jeong et.al. | 2412.03093v1 | null |
2024-12-04 | ASR-EC Benchmark: Evaluating Large Language Models on Chinese ASR Error Correction | Victor Junqiu Wei et.al. | 2412.03075v1 | null |
2024-12-04 | UTSD: Unified Time Series Diffusion Model | Xiangkai Ma et.al. | 2412.03068v1 | null |
2024-12-03 | A Novel Compact LLM Framework for Local, High-Privacy EHR Data Applications | Yixiang Qu et.al. | 2412.02868v1 | null |
2024-12-03 | Is Large-Scale Pretraining the Secret to Good Domain Generalization? | Piotr Teterwak et.al. | 2412.02856v1 | null |
2024-12-03 | Enhancing Robustness of CLIP to Common Corruptions through Bimodal Test-Time Adaptation | Sarthak Kumar Maharana et.al. | 2412.02837v1 | null |
2024-12-03 | Gaussian Splatting Under Attack: Investigating Adversarial Noise in 3D Objects | Abdurrahman Zeybey et.al. | 2412.02803v1 | null |
2024-12-03 | FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation | Kefan Chen et.al. | 2412.02690v1 | null |
2024-12-03 | Multimodal Remote Sensing Scene Classification Using VLMs and Dual-Cross Attention Networks | Jinjin Cai et.al. | 2412.02531v1 | null |
2024-12-03 | LoRA Diffusion: Zero-Shot LoRA Synthesis for Diffusion Model Personalization | Ethan Smith et.al. | 2412.02352v1 | null |
2024-12-03 | Improving Language Transfer Capability of Decoder-only Architecture in Multilingual Neural Machine Translation | Zhi Qu et.al. | 2412.02101v1 | link |
2024-12-03 | Gaussian Object Carver: Object-Compositional Gaussian Splatting with surfaces completion | Liu Liu et.al. | 2412.02075v1 | link |
2024-12-02 | PKRD-CoT: A Unified Chain-of-thought Prompting for Multi-Modal Large Language Models in Autonomous Driving | Xuewen Luo et.al. | 2412.02025v1 | null |
2024-12-04 | The use of large language models to enhance cancer clinical trial educational materials | Mingye Gao et.al. | 2412.01955v2 | null |
2024-12-02 | RandAR: Decoder-only Autoregressive Visual Generation in Random Orders | Ziqi Pang et.al. | 2412.01827v1 | null |
2024-12-02 | COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training | Sanghwan Kim et.al. | 2412.01814v1 | link |
2024-12-02 | Hard Constraint Guided Flow Matching for Gradient-Free Generation of PDE Solutions | Chaoran Cheng et.al. | 2412.01786v1 | null |
2024-12-02 | T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs | Shukang Yin et.al. | 2411.19951v2 | link |
2024-11-29 | Reverse Thinking Makes LLMs Stronger Reasoners | Justin Chih-Yao Chen et.al. | 2411.19865v1 | null |
2024-11-29 | Zero-shot Musical Stem Retrieval with Joint-Embedding Predictive Architectures | Alain Riou et.al. | 2411.19806v1 | null |
2024-11-29 | Dual Risk Minimization: Towards Next-Level Robustness in Fine-tuning Zero-Shot Models | Kaican Li et.al. | 2411.19757v1 | link |
2024-11-29 | Multimodal Whole Slide Foundation Model for Pathology | Tong Ding et.al. | 2411.19666v1 | link |
2024-11-29 | LLM Teacher-Student Framework for Text Classification With No Manually Annotated Data: A Case Study in IPTC News Topic Classification | Taja Kuzman et.al. | 2411.19638v1 | link |
2024-11-29 | Diorama: Unleashing Zero-shot Single-view 3D Scene Modeling | Qirui Wu et.al. | 2411.19492v1 | null |
2024-11-29 | Proto Successor Measure: Representing the Space of All Possible Solutions of Reinforcement Learning | Siddhant Agarwal et.al. | 2411.19418v1 | null |
2024-11-28 | CLIP meets DINO for Tuning Zero-Shot Classifier using Unlabeled Image Collections | Mohamed Fazli Imam et.al. | 2411.19346v1 | link |
2024-11-28 | OMNI-DC: Highly Robust Depth Completion with Multiresolution Depth Integration | Yiming Zuo et.al. | 2411.19278v1 | link |
2024-11-27 | Diffusion Self-Distillation for Zero-Shot Customized Image Generation | Shengqu Cai et.al. | 2411.18616v1 | null |
2024-11-27 | Isolating authorship from content with semantic embeddings and contrastive learning | Javier Huertas-Tato et.al. | 2411.18472v1 | null |
2024-11-27 | SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation | Duc-Hai Pham et.al. | 2411.18229v1 | null |
2024-11-27 | DRS: Deep Question Reformulation With Structured Output | Zhecheng Li et.al. | 2411.17993v1 | link |
2024-11-26 | Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient | Zigeng Chen et.al. | 2411.17787v1 | link |
2024-11-26 | MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation | Harsh Singh et.al. | 2411.17636v1 | null |
2024-11-26 | ShowUI: One Vision-Language-Action Model for GUI Visual Agent | Kevin Qinghong Lin et.al. | 2411.17465v1 | link |
2024-11-26 | FLEX-CLIP: Feature-Level GEneration Network Enhanced CLIP for X-shot Cross-modal Retrieval | Jingyou Xie et.al. | 2411.17454v1 | null |
2024-11-26 | PEFTGuard: Detecting Backdoor Attacks Against Parameter-Efficient Fine-Tuning | Zhen Sun et.al. | 2411.17453v1 | null |
2024-11-26 | CoA: Chain-of-Action for Generative Semantic Labels | Meng Wei et.al. | 2411.17406v1 | link |
2024-11-26 | vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation | Bastian Wittmann et.al. | 2411.17386v1 | null |
2024-11-26 | 2D Matryoshka Training for Information Retrieval | Shuai Wang et.al. | 2411.17299v1 | link |
2024-11-26 | APT: Architectural Planning and Text-to-Blueprint Construction Using Large Language Models for Open-World Agents | Jun Yu Chen et.al. | 2411.17255v1 | link |
2024-11-26 | Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors | Zhengfei Kuang et.al. | 2411.17249v1 | null |
2024-11-26 | Boost 3D Reconstruction using Diffusion-based Monocular Camera Calibration | Junyuan Deng et.al. | 2411.17240v1 | link |
2024-11-25 | Diffusion Features for Zero-Shot 6DoF Object Pose Estimation | Bernd Von Gimborn et.al. | 2411.16668v1 | null |
2024-11-25 | Generating Out-Of-Distribution Scenarios Using Language Models | Erfan Aasi et.al. | 2411.16554v1 | null |
2024-11-25 | TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation | Linqing Zhong et.al. | 2411.16425v1 | null |
2024-11-25 | Poster: Could Large Language Models Perform Network Management? | Zine el abidine Kherroubi et.al. | 2411.16232v1 | null |
2024-11-25 | SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context | Jungang Li et.al. | 2411.16213v1 | null |
2024-11-25 | Learn from Foundation Model: Fruit Detection Model without Manual Annotation | Yanan Wang et.al. | 2411.16196v1 | link |
2024-11-25 | Language Driven Occupancy Prediction | Zhu Yu et.al. | 2411.16072v1 | link |
2024-11-25 | Style-Pro: Style-Guided Prompt Learning for Generalizable Vision-Language Models | Niloufar Alipour Talemi et.al. | 2411.16018v1 | null |
2024-11-24 | PIANIST: Learning Partially Observable World Models with LLMs for Multi-Agent Decision Making | Jonathan Light et.al. | 2411.15998v1 | null |
2024-11-24 | Segment to Recognize Robustly -- Enhancing Recognition by Image Decomposition | Klara Janouskova et.al. | 2411.15933v1 | null |
2024-11-22 | Context-Aware Multimodal Pretraining | Karsten Roth et.al. | 2411.15099v1 | null |
2024-11-22 | Task-Aware Robotic Grasping by evaluating Quality Diversity Solutions through Foundation Models | Aurel X. Appius et.al. | 2411.14917v1 | null |
2024-11-22 | Enhancing Exploration with Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile Manipulation | Huy Le et.al. | 2411.14913v1 | null |
2024-11-22 | Leveraging Hierarchical Prototypes as the Verbalizer for Implicit Discourse Relation Recognition | Wanqiu Long et.al. | 2411.14880v1 | null |
2024-11-22 | VisGraphVar: A Benchmark Generator for Assessing Variability in Graph Analysis Using Large Vision-Language Models | Camilo Chacón Sartori et.al. | 2411.14832v1 | null |
2024-11-22 | De-biased Multimodal Electrocardiogram Analysis | Haitao Li et.al. | 2411.14795v1 | null |
2024-11-22 | Simplifying CLIP: Unleashing the Power of Large-Scale Models on Consumer-level Computers | Hongbo Liu et.al. | 2411.14789v1 | null |
2024-11-21 | Solving Zero-Shot 3D Visual Grounding as Constraint Satisfaction Problems | Qihao Yuan et.al. | 2411.14594v1 | link |
2024-11-21 | Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding | Yiming Zhang et.al. | 2411.14401v1 | null |
2024-11-21 | DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding | Tianhe Ren et.al. | 2411.14347v1 | link |
2024-11-21 | StereoCrafter-Zero: Zero-Shot Stereo Video Generation with Noisy Restart | Jian Shi et.al. | 2411.14295v1 | null |
2024-11-21 | Efficient Aspect-Based Summarization of Climate Change Reports with Small Language Models | Iacopo Ghinassi et.al. | 2411.14272v1 | link |
2024-11-21 | Generalizing End-To-End Autonomous Driving In Real-World Environments Using Zero-Shot LLMs | Zeyu Dong et.al. | 2411.14256v1 | null |
2024-11-21 | Evaluating the Robustness of Analogical Reasoning in Large Language Models | Martha Lewis et.al. | 2411.14215v1 | link |
2024-11-21 | Stereo Anything: Unifying Stereo Matching with Large-Scale Mixed Data | Xianda Guo et.al. | 2411.14053v1 | link |
2024-11-21 | Zero-Shot Low-Light Image Enhancement via Joint Frequency Domain Priors Guided Diffusion | Jinhong He et.al. | 2411.13961v1 | link |
2024-11-21 | Learning to Cooperate with Humans using Generative Agents | Yancheng Liang et.al. | 2411.13934v1 | link |
2024-11-21 | CLIPer: Hierarchically Improving Spatial Representation of CLIP for Open-Vocabulary Semantic Segmentation | Lin Sun et.al. | 2411.13836v1 | link |
2024-11-20 | Find Any Part in 3D | Ziqi Ma et.al. | 2411.13550v1 | null |
2024-11-20 | BIPro: Zero-shot Chinese Poem Generation via Block Inverse Prompting Constrained Generation Framework | Xu Zou et.al. | 2411.13237v1 | null |
2024-11-20 | Unlocking Historical Clinical Trial Data with ALIGN: A Compositional Large Language Model System for Medical Coding | Nabeel Seedat et.al. | 2411.13163v1 | null |
2024-11-20 | Hard-Synth: Synthesizing Diverse Hard Samples for ASR using Zero-Shot TTS and LLM | Jiawei Yu et.al. | 2411.13159v1 | null |
2024-11-20 | Learning Time-Optimal and Speed-Adjustable Tactile In-Hand Manipulation | Johannes Pitz et.al. | 2411.13148v1 | null |
2024-11-20 | TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models | Xin Wang et.al. | 2411.13136v1 | null |
2024-11-20 | Training Physics-Driven Deep Learning Reconstruction without Raw Data Access for Equitable Fast MRI | Yaşar Utku Alçalar et.al. | 2411.13022v1 | null |
2024-11-20 | Evaluating LLMs Capabilities Towards Understanding Social Dynamics | Anique Tahir et.al. | 2411.13008v1 | null |
2024-11-19 | Improving Controllability and Editability for Pretrained Text-to-Music Generation Models | Yixiao Zhang et.al. | 2411.12641v1 | null |
2024-11-19 | Instant Policy: In-Context Imitation Learning via Graph Diffusion | Vitalis Vosylius et.al. | 2411.12633v1 | null |
2024-11-19 | SAM Carries the Burden: A Semi-Supervised Approach Refining Pseudo Labels for Medical Segmentation | Ron Keuth et.al. | 2411.12602v1 | link |
2024-11-19 | Probe-Me-Not: Protecting Pre-trained Encoders from Malicious Probing | Ruyi Ding et.al. | 2411.12508v1 | null |
2024-11-19 | Predicting User Intents and Musical Attributes from Music Discovery Conversations | Daeyong Kwon et.al. | 2411.12254v1 | link |
2024-11-19 | Zero-Shot Crate Digging: DJ Tool Retrieval Using Speech Activity, Music Structure And CLAP Embeddings | Iroro Orife et.al. | 2411.12209v1 | link |
2024-11-19 | A More Advanced Group Polarization Measurement Approach Based on LLM-Based Agents and Graphs | Zixin Liu et.al. | 2411.12196v1 | null |
2024-11-19 | UrbanDiT: A Foundation Model for Open-World Urban Spatio-Temporal Learning | Yuan Yuan et.al. | 2411.12164v1 | link |
2024-11-19 | HEIGHT: Heterogeneous Interaction Graph Transformer for Robot Navigation in Crowded and Constrained Environments | Shuijing Liu et.al. | 2411.12150v1 | null |
2024-11-18 | VLN-Game: Vision-Language Equilibrium Search for Zero-Shot Semantic Navigation | Bangguo Yu et.al. | 2411.11609v1 | null |
2024-11-18 | Unveiling the Inflexibility of Adaptive Embedding in Traffic Forecasting | Hongjun Wang et.al. | 2411.11448v1 | link |
2024-11-18 | Scalable Autoregressive Monocular Depth Estimation | Jinhong Wang et.al. | 2411.11361v1 | null |
2024-11-18 | Text-guided Zero-Shot Object Localization | Jingjing Wang et.al. | 2411.11357v1 | null |
2024-11-18 | Visual-Semantic Graph Matching Net for Zero-Shot Learning | Bowen Duan et.al. | 2411.11351v1 | link |
2024-11-18 | Zero-Shot Load Forecasting with Large Language Models | Wenlong Liao et.al. | 2411.11350v1 | null |
2024-11-18 | Transcending Language Boundaries: Harnessing LLMs for Low-Resource Language Translation | Peng Shu et.al. | 2411.11295v1 | null |
2024-11-18 | Neuron: Learning Context-Aware Evolving Representations for Zero-Shot Skeleton Action Recognition | Yang Chen et.al. | 2411.11288v1 | null |
2024-11-18 | Zero-Shot Automatic Annotation and Instance Segmentation using LLM-Generated Datasets: Eliminating Field Imaging and Manual Annotation for Deep Learning Model Development | Ranjan Sapkota et.al. | 2411.11285v1 | null |
2024-11-18 | ZeFaV: Boosting Large Language Models for Zero-shot Fact Verification | Son T. Luu et.al. | 2411.11247v1 | link |
2024-11-15 | Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting | Ziqi Xie et.al. | 2411.10309v1 | link |
2024-11-15 | CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation | Dengke Zhang et.al. | 2411.10086v1 | null |
2024-11-15 | 'What did the Robot do in my Absence?' Video Foundation Models to Enhance Intermittent Supervision | Kavindie Katuwandeniya et.al. | 2411.10016v1 | null |
2024-11-15 | Zero-shot Voice Conversion with Diffusion Transformers | Songting Liu et.al. | 2411.09943v1 | link |
2024-11-14 | LLM Hallucination Reasoning with Zero-shot Knowledge Test | Seongmin Lee et.al. | 2411.09689v1 | null |
2024-11-14 | Script-centric behavior understanding for assisted autism spectrum disorder diagnosis | Wenxing Liu et.al. | 2411.09413v1 | null |
2024-11-14 | Less is More: Unseen Domain Fake News Detection via Causal Propagation Substructures | Shuzhi Gong et.al. | 2411.09389v1 | null |
2024-11-14 | Exploring Zero-Shot Anomaly Detection with CLIP in Medical Imaging: Are We There Yet? | Aldo Marzullo et.al. | 2411.09310v1 | null |
2024-11-14 | Mono2Stereo: Monocular Knowledge Transfer for Enhanced Stereo Matching | Yuran Wang et.al. | 2411.09151v1 | null |
2024-11-15 | UniHOI: Learning Fast, Dense and Generalizable 4D Reconstruction for Egocentric Hand Object Interaction Videos | Chengbo Yuan et.al. | 2411.09145v2 | null |
2024-11-13 | Zero-shot Cross-lingual Transfer Learning with Multiple Source and Target Languages for Information Extraction: Language Selection and Adversarial Training | Nghia Trung Ngo et.al. | 2411.08785v1 | null |
2024-11-13 | Measuring similarity between embedding spaces using induced neighborhood graphs | Tiago F. Tavares et.al. | 2411.08687v1 | null |
2024-11-13 | Zero-shot capability of SAM-family models for bone segmentation in CT scans | Caroline Magg et.al. | 2411.08629v1 | null |
2024-11-13 | Grammarization-Based Grasping with Deep Multi-Autoencoder Latent Space Exploration by Reinforcement Learning Agent | Leonidas Askianakis et.al. | 2411.08566v1 | null |
2024-11-13 | CorrSynth -- A Correlated Sampling Method for Diverse Dataset Generation from LLMs | Suhas S Kowshik et.al. | 2411.08553v1 | null |
2024-11-13 | An Information Theoretic Approach to Operationalize Right to Data Protection | Abhinav Java et.al. | 2411.08506v1 | null |
2024-11-13 | Enhancing Multimodal Query Representation via Visual Dialogues for End-to-End Knowledge Retrieval | Yeong-Joon Ju et.al. | 2411.08334v1 | link |
2024-11-12 | Retrieval Augmented Time Series Forecasting | Kutay Tire et.al. | 2411.08249v1 | link |
2024-11-12 | Latent Space Disentanglement in Diffusion Transformers Enables Precise Zero-shot Semantic Editing | Zitao Shuai et.al. | 2411.08196v1 | null |
2024-11-12 | LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models | Anoop Cherian et.al. | 2411.08027v1 | null |
2024-11-12 | Semantic Sleuth: Identifying Ponzi Contracts via Large Language Models | Cong Wu et.al. | 2411.07498v1 | null |
2024-11-11 | Untangling Hate Speech Definitions: A Semantic Componential Analysis Across Cultures and Domains | Katerina Korre et.al. | 2411.07417v1 | null |
2024-11-11 | Warmstarting for Scaling Language Models | Neeratyoy Mallik et.al. | 2411.07340v1 | null |
2024-11-11 | DeepONet as a Multi-Operator Extrapolation Model: Distributed Pretraining with Physics-Informed Fine-Tuning | Zecheng Zhang et.al. | 2411.07239v1 | null |
2024-11-11 | The Super Weight in Large Language Models | Mengxia Yu et.al. | 2411.07191v1 | link |
2024-11-11 | NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics | David Robinson et.al. | 2411.07186v1 | null |
2024-11-11 | SAMPart3D: Segment Any Part in 3D Objects | Yunhan Yang et.al. | 2411.07184v1 | link |
2024-11-11 | Decoding Visual Experience and Mapping Semantics through Whole-Brain Analysis Using fMRI Foundation Models | Yanchen Wang et.al. | 2411.07121v1 | link |
2024-11-11 | Transformer verbatim in-context retrieval across time and scale | Kristijan Armeni et.al. | 2411.07075v1 | link |
2024-11-11 | MapSAM: Adapting Segment Anything Model for Automated Feature Detection in Historical Maps | Xue Xia et.al. | 2411.06971v1 | null |
2024-11-11 | Robust Fine-tuning of Zero-shot Models via Variance Reduction | Beier Zhu et.al. | 2411.06966v1 | link |
2024-11-11 | UMFC: Unsupervised Multi-Domain Feature Calibration for Vision-Language Models | Jiachen Liang et.al. | 2411.06921v1 | null |
2024-11-11 | Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning | Hongsheng Zhang et.al. | 2411.06764v1 | null |
2024-11-08 | End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering | Dylan Goetting et.al. | 2411.05755v1 | link |
2024-11-08 | Asterisk: Keep it Simple* | Andrew Semenov et.al. | 2411.05691v1 | null |
2024-11-08 | Assessing Open-Source Large Language Models on Argumentation Mining Subtasks | Mohammad Yeghaneh Abkenar et.al. | 2411.05639v1 | null |
2024-11-08 | An Early FIRST Reproduction and Improvements to Single-Token Decoding for Fast Listwise Reranking | Zijian Chen et.al. | 2411.05508v1 | null |
2024-11-08 | WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models | Shengda Fan et.al. | 2411.05451v1 | link |
2024-11-08 | Enhancing Visual Classification using Comparative Descriptors | Hankyeol Lee et.al. | 2411.05357v1 | link |
2024-11-08 | ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving | Tao Ma et.al. | 2411.05311v1 | null |
2024-11-07 | Abstract2Appendix: Academic Reviews Enhance LLM Long-Context Capabilities | Shengzhi Li et.al. | 2411.05232v1 | link |
2024-11-07 | Audiobox TTA-RAG: Improving Zero-Shot and Few-Shot Text-To-Audio with Retrieval-Augmented Generation | Mu Yang et.al. | 2411.05141v1 | null |
2024-11-07 | SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation | Koichi Namekata et.al. | 2411.04989v1 | null |
2024-11-07 | DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning | Gaoyue Zhou et.al. | 2411.04983v1 | null |
2024-11-07 | Noisy Zero-Shot Coordination: Breaking The Common Knowledge Assumption In Zero-Shot Coordination Games | Usman Anwar et.al. | 2411.04976v1 | link |
2024-11-07 | In the Era of Prompt Learning with Vision-Language Models | Ankit Jha et.al. | 2411.04892v1 | null |
2024-11-07 | Zero-Shot Temporal Resolution Domain Adaptation for Spiking Neural Networks | Sanja Karilanova et.al. | 2411.04760v1 | null |
2024-11-07 | Vision Language Models are In-Context Value Learners | Yecheng Jason Ma et.al. | 2411.04549v1 | null |
2024-11-07 | Best Practices for Distilling Large Language Models into BERT for Web Search Ranking | Dezhi Ye et.al. | 2411.04539v1 | null |
2024-11-07 | Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models | Xinyu Zhang et.al. | 2411.04530v1 | null |
2024-11-07 | Enabling Adaptive Agent Training in Open-Ended Simulators by Targeting Diversity | Robby Costales et.al. | 2411.04466v1 | null |
2024-11-07 | AutoProteinEngine: A Large Language Model Driven Agent Framework for Multimodal AutoML in Protein Engineering | Yungeng Liu et.al. | 2411.04440v1 | link |
2024-11-06 | RaVL: Discovering and Mitigating Spurious Correlations in Fine-Tuned Vision-Language Models | Maya Varma et.al. | 2411.04097v1 | link |
2024-11-06 | Multi3Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision-Language Models | Minh Duc Bui et.al. | 2411.03888v1 | link |
2024-11-06 | SA3DIP: Segment Any 3D Instance with Potential 3D Priors | Xi Yang et.al. | 2411.03819v1 | link |
2024-11-06 | No Culture Left Behind: ArtELingo-28, a Benchmark of WikiArt with Captions in 28 Languages | Youssef Mohamed et.al. | 2411.03769v1 | link |
2024-11-06 | Zero-shot Dynamic MRI Reconstruction with Global-to-local Diffusion Model | Yu Guan et.al. | 2411.03723v1 | null |
2024-11-06 | Fine-Tuning Vision-Language Model for Automated Engineering Drawing Information Extraction | Muhammad Tayyab Khan et.al. | 2411.03707v1 | null |
2024-11-06 | 3DGS-CD: 3D Gaussian Splatting-based Change Detection for Physical Object Rearrangement | Ziqi Lu et.al. | 2411.03706v1 | link |
2024-11-06 | Towards Scalable Automated Grading: Leveraging Large Language Models for Conceptual Question Evaluation in Engineering | Rujun Gao et.al. | 2411.03659v1 | null |
2024-11-05 | Exploring the Benefits of Domain-Pretraining of Generative Large Language Models for Chemistry | Anurag Acharya et.al. | 2411.03542v1 | null |
2024-11-05 | A Mamba Foundation Model for Time Series Forecasting | Haoyu Ma et.al. | 2411.02941v1 | null |
2024-11-05 | DDFAV: Remote Sensing Large Vision Language Models Dataset and Evaluation Benchmark | Haodong Li et.al. | 2411.02733v1 | link |
2024-11-04 | EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector | Deok-Hyeon Cho et.al. | 2411.02625v1 | link |
2024-11-04 | MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs | Sheng-Chieh Lin et.al. | 2411.02571v1 | null |
2024-11-04 | TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives | Maitreya Patel et.al. | 2411.02545v1 | null |
2024-11-04 | A Comparative Analysis of Instruction Fine-Tuning LLMs for Financial Text Classification | Sorouralsadat Fatemi et.al. | 2411.02476v1 | null |
2024-11-04 | Do Advanced Language Models Eliminate the Need for Prompt Engineering in Software Engineering? | Guoqing Wang et.al. | 2411.02093v1 | null |
2024-11-04 | CTEFM-VC: Zero-Shot Voice Conversion Based on Content-Aware Timbre Ensemble Modeling and Flow Matching | Yu Pan et.al. | 2411.02026v1 | null |
2024-11-04 | Exploiting Contextual Uncertainty of Visual Data for Efficient Training of Deep Models | Sharat Agarwal et.al. | 2411.01925v1 | null |
2024-11-04 | ManiBox: Enhancing Spatial Grasping Generalization via Scalable Simulation Data Generation | Hengkai Tan et.al. | 2411.01850v1 | null |
2024-11-04 | DiffuMask-Editor: A Novel Paradigm of Integration Between the Segmentation Diffusion Model and Image Editing to Improve Segmentation Ability | Bo Gao et.al. | 2411.01819v1 | null |
2024-11-03 | Investigating Large Language Models for Complex Word Identification in Multilingual and Multidomain Setups | Răzvan-Alexandru Smădu et.al. | 2411.01706v1 | link |
2024-11-03 | Object segmentation from common fate: Motion energy processing enables human-like zero-shot generalization to random dot stimuli | Matthias Tangemann et.al. | 2411.01505v1 | link |
2024-11-02 | Task-Oriented Hierarchical Object Decomposition for Visuomotor Control | Jianing Qian et.al. | 2411.01284v1 | null |
2024-11-02 | MonoPlane: Exploiting Monocular Geometric Cues for Generalizable 3D Plane Reconstruction | Wang Zhao et.al. | 2411.01226v1 | link |
2024-11-02 | Transfer Learning for Finetuning Large Language Models | Tobias Strangmann et.al. | 2411.01195v1 | null |
2024-10-31 | DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models | Heng-Jui Chang et.al. | 2410.24177v1 | null |
2024-11-02 | | Kevin Black et.al. | 2410.24164v2 | null |
2024-10-31 | Scaling Concept With Text-Guided Diffusion Models | Chao Huang et.al. | 2410.24151v1 | null |
2024-10-31 | Matchmaker: Self-Improving Large Language Model Programs for Schema Matching | Nabeel Seedat et.al. | 2410.24105v1 | null |
2024-10-31 | In-Context Fine-Tuning for Time-Series Foundation Models | Abhimanyu Das et.al. | 2410.24087v1 | null |
2024-10-31 | GAMap: Zero-Shot Object Goal Navigation with Multi-Scale Geometric-Affordance Guidance | Shuaihang Yuan et.al. | 2410.23978v1 | null |
2024-10-31 | Text-DiFuse: An Interactive Multi-Modal Image Fusion Framework based on Text-modulated Diffusion Model | Hao Zhang et.al. | 2410.23905v1 | link |
2024-10-31 | EZ-HOI: VLM Adaptation via Guided Prompt Learning for Zero-Shot HOI Detection | Qinqian Lei et.al. | 2410.23904v1 | link |
2024-10-31 | The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge | Dake Guo et.al. | 2410.23815v1 | null |
2024-10-31 | RealMind: Zero-Shot EEG-Based Visual Decoding and Captioning Using Multi-Modal Models | Dongyang Li et.al. | 2410.23754v1 | null |
2024-10-30 | Multi-student Diffusion Distillation for Better One-step Generators | Yanke Song et.al. | 2410.23274v1 | null |
2024-10-30 | Partial Channel Dependence with Channel Masks for Time Series Foundation Models | Seunghan Lee et.al. | 2410.23222v1 | null |
2024-10-30 | Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks | Michael Matthews et.al. | 2410.23208v1 | link |
2024-10-30 | FlexTSF: A Universal Forecasting Model for Time Series with Variable Regularities | Jingge Xiao et.al. | 2410.23160v1 | link |
2024-10-30 | DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes | Jialiang Zhang et.al. | 2410.23004v1 | null |
2024-10-30 | SimpsonsVQA: Enhancing Inquiry-Based Learning with a Tailored Dataset | Ngoc Dung Huynh et.al. | 2410.22648v1 | null |
2024-10-30 | SleepNetZero: Zero-Burden Zero-Shot Reliable Sleep Staging With Neural Networks Based on Ballistocardiograms | Shuzhen Li et.al. | 2410.22646v1 | null |
2024-10-29 | RealCQA-V2 : Visual Premise Proving | Saleem Ahmed et.al. | 2410.22492v1 | null |
2024-10-29 | Local Policies Enable Zero-shot Long-horizon Manipulation | Murtaza Dalal et.al. | 2410.22332v1 | null |
2024-10-29 | Are Decoder-Only Large Language Models the Silver Bullet for Code Search? | Yuxuan Chen et.al. | 2410.22240v1 | link |
2024-10-29 | Active Learning for Vision-Language Models | Bardia Safaei et.al. | 2410.22187v1 | null |
2024-10-29 | Data Generation for Hardware-Friendly Post-Training Quantization | Lior Dikstein et.al. | 2410.22110v1 | link |
2024-10-29 | PACA: Perspective-Aware Cross-Attention Representation for Zero-Shot Scene Rearrangement | Shutong Jin et.al. | 2410.22059v1 | null |
2024-10-29 | Reliable Semantic Understanding for Real World Zero-shot Object Goal Navigation | Halil Utku Unlu et.al. | 2410.21926v1 | null |
2024-10-30 | Text-Guided Attention is All You Need for Zero-Shot Robustness in Vision-Language Models | Lu Yu et.al. | 2410.21802v2 | link |
2024-10-29 | Pushing the Limits of All-Atom Geometric Graph Neural Networks: Pre-Training, Scaling and Zero-Shot Transfer | Zihan Pengmei et.al. | 2410.21683v1 | null |
2024-10-28 | SandboxAQ's submission to MRL 2024 Shared Task on Multi-lingual Multi-task Information Retrieval | Isidora Chara Tourni et.al. | 2410.21501v1 | null |
2024-10-28 | SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization | Wanhua Li et.al. | 2410.21411v1 | link |
2024-10-28 | Zero-Shot Dense Retrieval with Embeddings from Relevance Feedback | Nour Jedidi et.al. | 2410.21242v1 | null |
2024-10-28 | Towards Unifying Evaluation of Counterfactual Explanations: Leveraging Large Language Models for Human-Centric Assessments | Marharyta Domnich et.al. | 2410.21131v1 | link |
2024-10-28 | Retrieval-Enhanced Mutation Mastery: Augmenting Zero-Shot Prediction of Protein Language Model | Yang Tan et.al. | 2410.21127v1 | link |
2024-10-28 | Zero-Shot Action Recognition in Surveillance Videos | Joao Pereira et.al. | 2410.21113v1 | null |
2024-10-28 | Exploring the Reliability of Foundation Model-Based Frontier Selection in Zero-Shot Object Goal Navigation | Shuaihang Yuan et.al. | 2410.21037v1 | null |
2024-10-28 | Reference-Free Formula Drift with Reinforcement Learning: From Driving Data to Tire Energy-Inspired, Real-World Policies | Franck Djeumou et.al. | 2410.20990v1 | null |
2024-10-28 | DeTeCtive: Detecting AI-generated Text via Multi-Level Contrastive Learning | Xun Guo et.al. | 2410.20964v1 | link |
2024-10-28 | MrT5: Dynamic Token Merging for Efficient Byte-level Language Models | Julie Kallini et.al. | 2410.20771v1 | link |
2024-10-28 | Face-MLLM: A Large Face Perception Model | Haomiao Sun et.al. | 2410.20717v1 | null |
2024-10-28 | Reprogramming Pretrained Target-Specific Diffusion Models for Dual-Target Drug Design | Xiangxin Zhou et.al. | 2410.20688v1 | link |
2024-10-25 | Adversarial Environment Design via Regret-Guided Diffusion Models | Hojun Chung et.al. | 2410.19715v1 | null |
2024-10-25 | TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | Xiangyu Zeng et.al. | 2410.19702v1 | null |
2024-10-25 | IPPON: Common Sense Guided Informative Path Planning for Object Goal Navigation | Kaixian Qu et.al. | 2410.19697v1 | null |
2024-10-25 | Context-Based Visual-Language Place Recognition | Soojin Woo et.al. | 2410.19341v1 | link |
2024-10-25 | Enhancing Zero-Shot Vision Models by Label-Free Prompt Distribution Learning and Bias Correcting | Xingyu Zhu et.al. | 2410.19294v1 | null |
2024-10-24 | Label Set Optimization via Activation Distribution Kurtosis for Zero-shot Classification with Generative Models | Yue Li et.al. | 2410.19195v1 | null |
2024-10-24 | AlignCap: Aligning Speech Emotion Captioning to Human Preferences | Ziqi Liang et.al. | 2410.19134v1 | null |
2024-10-24 | ConceptDrift: Uncovering Biases through the Lens of Foundational Models | Cristian Daniel Păduraru et.al. | 2410.18970v1 | null |
2024-10-24 | BioMistral-NLU: Towards More Generalizable Medical Language Understanding through Instruction Tuning | Yujuan Velvin Fu et.al. | 2410.18955v1 | null |
2024-10-24 | SkillMimicGen: Automated Demonstration Generation for Efficient Skill Learning and Deployment | Caelan Garrett et.al. | 2410.18907v1 | null |
2024-10-24 | Probabilistic Language-Image Pre-Training | Sanghyuk Chun et.al. | 2410.18857v1 | link |
2024-10-24 | Task Calibration: Calibrating Large Language Models on Inference Tasks | Yingjie Li et.al. | 2410.18764v1 | null |
2024-10-24 | Data Scaling Laws in Imitation Learning for Robotic Manipulation | Fanqi Lin et.al. | 2410.18647v1 | null |
2024-10-24 | Knowledge Distillation Using Frontier Open-source LLMs: Generalizability and the Role of Synthetic Data | Anup Shirgaonkar et.al. | 2410.18588v1 | null |
2024-10-24 | Zero-shot Object Navigation with Vision-Language Models Reasoning | Congcong Wen et.al. | 2410.18570v1 | null |
2024-10-24 | Beyond Color and Lines: Zero-Shot Style-Specific Image Variations with Coordinated Semantics | Jinghao Hu et.al. | 2410.18537v1 | null |
2024-10-24 | Scaling up Masked Diffusion Models on Text | Shen Nie et.al. | 2410.18514v1 | link |
2024-10-23 | Key Algorithms for Keyphrase Generation: Instruction-Based LLMs for Russian Scientific Keyphrases | Anna Glazkova et.al. | 2410.18040v1 | null |
2024-10-23 | Scaling Robot Policy Learning via Zero-Shot Labeling with Foundation Models | Nils Blank et.al. | 2410.17772v1 | null |
2024-10-23 | Learning Versatile Skills with Curriculum Masking | Yao Tang et.al. | 2410.17744v1 | link |
2024-10-23 | Entity-based Reinforcement Learning for Autonomous Cyber Defence | Isaac Symes Thompson et.al. | 2410.17647v1 | link |
2024-10-23 | Incremental Learning of Affordances using Markov Logic Networks | George Potter et.al. | 2410.17624v1 | null |
2024-10-23 | Graphusion: A RAG Framework for Knowledge Graph Construction with a Global Perspective | Rui Yang et.al. | 2410.17600v1 | null |
2024-10-23 | Multimodal Information Bottleneck for Deep Reinforcement Learning with Multiple Sensors | Bang You et.al. | 2410.17551v1 | null |
2024-10-23 | Generalizable Motion Planning via Operator Learning | Sharath Matada et.al. | 2410.17547v1 | null |
2024-10-23 | X-MOBILITY: End-To-End Generalizable Navigation via World Modeling | Wei Liu et.al. | 2410.17491v1 | null |
2024-10-22 | Denoise-I2W: Mapping Images to Denoising Words for Accurate Zero-Shot Composed Image Retrieval | Yuanmin Tang et.al. | 2410.17393v1 | null |
2024-10-22 | Altogether: Image Captioning via Re-aligning Alt-text | Hu Xu et.al. | 2410.17251v1 | link |
2024-10-22 | LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias | Haian Jin et.al. | 2410.17242v1 | null |
2024-10-22 | Are Visual-Language Models Effective in Action Recognition? A Comparative Study | Mahmoud Ali et.al. | 2410.17149v1 | null |
2024-10-22 | LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging | Ke Wang et.al. | 2410.17146v1 | link |
2024-10-22 | SG-FSM: A Self-Guiding Zero-Shot Prompting Paradigm for Multi-Hop Question Answering Based on Finite State Machine | Xiaochen Wang et.al. | 2410.17021v1 | null |
2024-10-22 | Towards Real Zero-Shot Camouflaged Object Segmentation without Camouflaged Annotations | Cheng Lei et.al. | 2410.16953v1 | null |
2024-10-22 | DNAHLM -- DNA sequence and Human Language mixed large language Model | Wang Liang et.al. | 2410.16917v1 | link |
2024-10-22 | AttriPrompter: Auto-Prompting with Attribute Semantics for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models | Yongjian Wu et.al. | 2410.16820v1 | link |
2024-10-22 | PLDR-LLM: Large Language Model from Power Law Decoder Representations | Burc Gokden et.al. | 2410.16703v1 | link |
2024-10-22 | GE2E-KWS: Generalized End-to-End Training and Evaluation for Zero-shot Keyword Spotting | Pai Zhu et.al. | 2410.16647v1 | null |
2024-10-21 | MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report | Samrajya Thapa et.al. | 2410.16239v1 | link |
2024-10-21 | IBGP: Imperfect Byzantine Generals Problem for Zero-Shot Robustness in Communicative Multi-Agent Systems | Yihuan Mao et.al. | 2410.16237v1 | null |
2024-10-21 | Continuous Speech Synthesis using per-token Latent Diffusion | Arnon Turetzky et.al. | 2410.16048v1 | null |
2024-10-21 | Few-shot target-driven instance detection based on open-vocabulary object detection models | Ben Crulis et.al. | 2410.16028v1 | null |
2024-10-21 | Zero-Shot Scene Reconstruction from Single Images with Deep Prior Assembly | Junsheng Zhou et.al. | 2410.15971v1 | null |
2024-10-21 | Mitigating Object Hallucination via Concentric Causal Attention | Yun Xing et.al. | 2410.15926v1 | link |
2024-10-21 | MI-VisionShot: Few-shot adaptation of vision-language models for slide-level classification of histopathological images | Pablo Meseguer et.al. | 2410.15881v1 | null |
2024-10-21 | Triplane Grasping: Efficient 6-DoF Grasping with Single RGB Images | Yiming Li et.al. | 2410.15879v1 | null |
2024-10-21 | FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL | Woosung Koh et.al. | 2410.15876v1 | null |
2024-10-21 | Unleashing the Potential of Vision-Language Pre-Training for 3D Zero-Shot Lesion Segmentation via Mask-Attribute Alignment | Yankai Jiang et.al. | 2410.15744v1 | null |
2024-10-18 | BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities | Shaozhe Hao et.al. | 2410.14672v1 | link |
2024-10-18 | Dialetto, ma Quanto Dialetto? Transcribing and Evaluating Dialects on a Continuum | Ryan Soh-Eun Shim et.al. | 2410.14589v1 | null |
2024-10-18 | SylloBio-NLI: Evaluating Large Language Models on Biomedical Syllogistic Reasoning | Magdalena Wysocka et.al. | 2410.14399v1 | null |
2024-10-18 | AnomalyNCD: Towards Novel Anomaly Class Discovery in Industrial Scenarios | Ziming Huang et.al. | 2410.14379v1 | link |
2024-10-18 | Zero-shot Action Localization via the Confidence of Large Vision-Language Models | Josiah Aklilu et.al. | 2410.14340v1 | null |
2024-10-18 | Storyboard guided Alignment for Fine-grained Video Action Recognition | Enqi Liu et.al. | 2410.14238v1 | null |
2024-10-18 | Assessing Open-world Forgetting in Generative Image Model Customization | Héctor Laria et.al. | 2410.14159v1 | null |
2024-10-17 | Measuring and Modifying the Readability of English Texts with GPT-4 | Sean Trott et.al. | 2410.14028v1 | link |
2024-10-17 | Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens | Lijie Fan et.al. | 2410.13863v1 | null |
2024-10-17 | VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding | Runsen Xu et.al. | 2410.13860v1 | link |
2024-10-17 | DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control | Yujie Wei et.al. | 2410.13830v1 | null |
2024-10-17 | AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents | Ke Yang et.al. | 2410.13825v1 | null |
2024-10-17 | Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers | Yuchen Liang et.al. | 2410.13746v1 | null |
2024-10-17 | ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense Concepts about Actions | Shailaja Keyur Sampat et.al. | 2410.13662v1 | link |
2024-10-17 | Help Me Identify: Is an LLM+VQA System All We Need to Identify Visual Concepts? | Shailaja Keyur Sampat et.al. | 2410.13651v1 | link |
2024-10-18 | Enhanced Prompt-leveraged Weakly Supervised Cancer Segmentation based on Segment Anything | Joonhyeon Song et.al. | 2410.13621v2 | link |
2024-10-17 | Large Language Models as Narrative-Driven Recommenders | Lukas Eberhard et.al. | 2410.13604v1 | null |
2024-10-17 | Representing Model Weights with Language using Tree Experts | Eliahu Horwitz et.al. | 2410.13569v1 | null |
2024-10-16 | In-Context Learning Enables Robot Action Prediction in LLMs | Yida Yin et.al. | 2410.12782v1 | null |
2024-10-16 | Towards Zero-Shot Camera Trap Image Categorization | Jiří Vyskočil et.al. | 2410.12769v1 | null |
2024-10-16 | Towards Graph Foundation Models: The Perspective of Zero-shot Reasoning on Knowledge Graphs | Kai Wang et.al. | 2410.12609v1 | null |
2024-10-16 | A Claim Decomposition Benchmark for Long-form Answer Verification | Zhihao Zhang et.al. | 2410.12558v1 | link |
2024-10-16 | SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling | Loris Gaven et.al. | 2410.12481v1 | null |
2024-10-16 | SF-Speech: Straightened Flow for Zero-Shot Voice Clone on Small-Scale Dataset | Xuyuan Li et.al. | 2410.12399v1 | null |
2024-10-16 | ERVQ: Enhanced Residual Vector Quantization with Intra-and-Inter-Codebook Optimization for Neural Audio Codecs | Rui-Chen Zheng et.al. | 2410.12359v1 | null |
2024-10-16 | MAX: Masked Autoencoder for X-ray Fluorescence in Geological Investigation | An-Sheng Lee et.al. | 2410.12330v1 | link |
2024-10-16 | Evaluating Cascaded Methods of Vision-Language Models for Zero-Shot Detection and Association of Hardhats for Increased Construction Safety | Lucas Choi et.al. | 2410.12225v1 | null |
2024-10-15 | Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming | Yilun Hao et.al. | 2410.12112v1 | null |
2024-10-15 | FoundTS: Comprehensive and Unified Benchmarking of Foundation Models for Time Series Forecasting | Zhe Li et.al. | 2410.11802v1 | null |
2024-10-15 | Time-Series Foundation Model for Value-at-Risk | Anubha Goel et.al. | 2410.11773v1 | link |
2024-10-15 | Zero-shot Model-based Reinforcement Learning using Large Language Models | Abdelhakim Benechehab et.al. | 2410.11711v1 | link |
2024-10-15 | PSVMA+: Exploring Multi-granularity Semantic-visual Adaption for Generalized Zero-shot Learning | Man Liu et.al. | 2410.11560v1 | null |
2024-10-15 | AGENTiGraph: An Interactive Knowledge Graph Platform for LLM-based Chatbots Utilizing Private Data | Xinjie Zhao et.al. | 2410.11531v1 | null |
2024-10-15 | Leveraging LLM Embeddings for Cross Dataset Label Alignment and Zero Shot Music Emotion Prediction | Renhang Liu et.al. | 2410.11522v1 | link |
2024-10-15 | Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement | Zhi Wang et.al. | 2410.11448v1 | link |
2024-10-15 | DRACO: A Denoising-Reconstruction Autoencoder for Cryo-EM | Yingjun Shen et.al. | 2410.11373v1 | null |
2024-10-15 | Enhance Graph Alignment for Large Language Models | Haitong Luo et.al. | 2410.11370v1 | null |
2024-10-15 | In-Context Learning for Long-Context Sentiment Analysis on Infrastructure Project Opinions | Alireza Shamshiri et.al. | 2410.11265v1 | null |
2024-10-14 | Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models | Jingzhi Bao et.al. | 2410.10821v1 | link |
2024-10-14 | Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations | Litu Rout et.al. | 2410.10792v1 | null |
2024-10-14 | SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators | Rasoul Shafipour et.al. | 2410.10714v1 | null |
2024-10-14 | MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer | Minghao Zhu et.al. | 2410.10589v1 | link |
2024-10-14 | Recipe for Zero-shot POS Tagging: Is It Useful in Realistic Scenarios? | Zeno Vandenbulcke et.al. | 2410.10576v1 | null |
2024-10-14 | Continual Learning Improves Zero-Shot Action Recognition | Shreyank N Gowda et.al. | 2410.10497v1 | null |
2024-10-14 | Learning to Ground VLMs without Forgetting | Aritra Bhowmik et.al. | 2410.10491v1 | null |
2024-10-14 | Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts | Xu Liu et.al. | 2410.10469v1 | null |
2024-10-14 | 4DStyleGaussian: Zero-shot 4D Style Transfer with Gaussian Splatting | Wanlin Liang et.al. | 2410.10412v1 | null |
2024-10-14 | GIFT-Eval: A Benchmark For General Time Series Forecasting Model Evaluation | Taha Aksu et.al. | 2410.10393v1 | link |
2024-10-11 | Extra Global Attention Designation Using Keyword Detection in Sparse Transformer Architectures | Evan Lucas et.al. | 2410.08971v1 | null |
2024-10-11 | NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models | Zheng Yi Ho et.al. | 2410.08970v1 | null |
2024-10-11 | Zero-Shot Pupil Segmentation with SAM 2: A Case Study of Over 14 Million Images | Virmarie Maquiling et.al. | 2410.08926v1 | null |
2024-10-11 | SegGrasp: Zero-Shot Task-Oriented Grasping via Semantic and Geometric Guided Segmentation | Haosheng Li et.al. | 2410.08901v1 | null |
2024-10-11 | A Benchmark for Cross-Domain Argumentative Stance Classification on Social Media | Jiaqing Yuan et.al. | 2410.08900v1 | null |
2024-10-11 | RoRA-VLM: Robust Retrieval-Augmented Vision Language Models | Jingyuan Qi et.al. | 2410.08876v1 | null |
2024-10-11 | One-shot Generative Domain Adaptation in 3D GANs | Ziqiang Li et.al. | 2410.08824v1 | link |
2024-10-11 | Zero-Shot Offline Imitation Learning via Optimal Transport | Thomas Rupf et.al. | 2410.08751v1 | link |
2024-10-11 | Chain-of-Restoration: Multi-Task Image Restoration Models are Zero-Shot Step-by-Step Universal Image Restorers | Jin Cao et.al. | 2410.08688v1 | link |
2024-10-11 | Boosting Open-Vocabulary Object Detection by Handling Background Samples | Ruizhe Zeng et.al. | 2410.08645v1 | null |
2024-10-10 | LatteCLIP: Unsupervised CLIP Fine-Tuning via LMM-Synthetic Texts | Anh-Quan Cao et.al. | 2410.08211v1 | null |
2024-10-10 | SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation | Hang Yin et.al. | 2410.08189v1 | null |
2024-10-10 | On the Evaluation of Generative Robotic Simulations | Feng Chen et.al. | 2410.08172v1 | null |
2024-10-10 | ZeroComp: Zero-shot Object Compositing from Image Intrinsics via Diffusion | Zitian Zhang et.al. | 2410.08168v1 | null |
2024-10-10 | Constrained Skill Discovery: Quadruped Locomotion with Unsupervised Reinforcement Learning | Vassil Atanassov et.al. | 2410.07877v1 | null |
2024-10-10 | RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation | Songming Liu et.al. | 2410.07864v1 | null |
2024-10-10 | Rewriting Conversational Utterances with Instructed Large Language Models | Elnara Galimzhanova et.al. | 2410.07797v1 | null |
2024-10-10 | The Power of Input: Benchmarking Zero-Shot Sim-To-Real Transfer of Reinforcement Learning Control Policies for Quadrotor Control | Alberto Dionigi et.al. | 2410.07686v1 | null |
2024-10-10 | Parallel Digital Twin-driven Deep Reinforcement Learning for User Association and Load Balancing in Dynamic Wireless Networks | Zhenyu Tao et.al. | 2410.07611v1 | null |
2024-10-10 | CSA: Data-efficient Mapping of Unimodal Features to Multimodal Features | Po-han Li et.al. | 2410.07610v1 | null |
2024-10-09 | AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation | Yukang Cao et.al. | 2410.07164v1 | null |
2024-10-09 | Exploring the Readiness of Prominent Small Language Models for the Democratization of Financial Literacy | Tagore Rao Kosireddy et.al. | 2410.07118v1 | link |
2024-10-09 | Collusion Detection with Graph Neural Networks | Lucas Gomes et.al. | 2410.07091v1 | null |
2024-10-09 | Stanceformer: Target-Aware Transformer for Stance Detection | Krishna Garg et.al. | 2410.07083v1 | link |
2024-10-09 | Compositional Entailment Learning for Hyperbolic Vision-Language Models | Avik Pal et.al. | 2410.06912v1 | null |
2024-10-09 | F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching | Yushen Chen et.al. | 2410.06885v1 | link |
2024-10-09 | K-SAM: A Prompting Method Using Pretrained U-Net to Improve Zero Shot Performance of SAM on Lung Segmentation in CXR Images | Mohamed Deriche et.al. | 2410.06825v1 | null |
2024-10-09 | Toward Physics-guided Time Series Embedding | Jiaxi Hu et.al. | 2410.06651v1 | null |
2024-10-09 | Open-RGBT: Open-vocabulary RGB-T Zero-shot Semantic Segmentation in Open-world Environments | Meng Yu et.al. | 2410.06626v1 | null |
2024-10-09 | DCP: Learning Accelerator Dataflow for Neural Network via Propagation | Peng Xu et.al. | 2410.06553v1 | null |
2024-10-07 | Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality | Youngtaek Oh et.al. | 2410.05210v1 | link |
2024-10-07 | ZEBRA: Zero-Shot Example-Based Retrieval Augmentation for Commonsense Question Answering | Francesco Maria Molfese et.al. | 2410.05077v1 | link |
2024-10-07 | PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing | Feng Tian et.al. | 2410.04844v1 | null |
2024-10-07 | LPZero: Language Model Zero-cost Proxy Search from Zero | Peijie Dong et.al. | 2410.04808v1 | null |
2024-10-07 | Building Damage Assessment in Conflict Zones: A Deep Learning Approach Using Geospatial Sub-Meter Resolution Data | Matteo Risso et.al. | 2410.04802v1 | null |
2024-10-07 | Improving Image Clustering with Artifacts Attenuation via Inference-Time Attention Engineering | Kazumoto Nakamura et.al. | 2410.04801v1 | null |
2024-10-07 | Document-level Causal Relation Extraction with Knowledge-guided Binary Question Answering | Zimu Wang et.al. | 2410.04752v1 | null |
2024-10-07 | ACDC: Autoregressive Coherent Multimodal Generation using Diffusion Correction | Hyungjin Chung et.al. | 2410.04721v1 | null |
2024-10-07 | Demo of Zero-Shot Guitar Amplifier Modelling: Enhancing Modeling with Hyper Neural Networks | Yu-Hua Chen et.al. | 2410.04702v1 | null |
2024-10-07 | SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech | Minchan Kim et.al. | 2410.04690v1 | null |
2024-10-04 | GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs | Pu Hua et.al. | 2410.03645v1 | null |
2024-10-04 | What Matters for Model Merging at Scale? | Prateek Yadav et.al. | 2410.03617v1 | null |
2024-10-04 | Table Question Answering for Low-resourced Indic Languages | Vaishali Pal et.al. | 2410.03576v1 | link |
2024-10-04 | STREAMS: An Assistive Multimodal AI Framework for Empowering Biosignal Based Robotic Controls | Ali Rabiee et.al. | 2410.03486v1 | null |
2024-10-04 | Zero-Shot Fact Verification via Natural Logic and Large Language Models | Marek Strong et.al. | 2410.03341v1 | link |
2024-10-04 | Selective Test-Time Adaptation for Unsupervised Anomaly Detection using Neural Implicit Representations | Sameer Ambekar et.al. | 2410.03306v1 | link |
2024-10-04 | Comparing zero-shot self-explanations with human rationales in multilingual text classification | Stephanie Brandl et.al. | 2410.03296v1 | null |
2024-10-04 | Enhanced Transformer architecture for in-context learning of dynamical systems | Matteo Rufolo et.al. | 2410.03291v1 | null |
2024-10-04 | What do Large Language Models Need for Machine Translation Evaluation? | Shenbin Qian et.al. | 2410.03278v1 | link |
2024-10-04 | PersoBench: Benchmarking Personalized Response Generation in Large Language Models | Saleh Afzoon et.al. | 2410.03198v1 | null |
2024-10-03 | Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations | Nick Jiang et.al. | 2410.02762v1 | link |
2024-10-03 | Training Language Models on Synthetic Edit Sequences Improves Code Synthesis | Ulyana Piterbarg et.al. | 2410.02749v1 | link |
2024-10-03 | Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers | Shijie Chen et.al. | 2410.02642v1 | null |
2024-10-03 | Plots Unlock Time-Series Understanding in Multimodal Models | Mayank Daswani et.al. | 2410.02637v1 | null |
2024-10-03 | LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model | Duy M. H. Nguyen et.al. | 2410.02615v1 | null |
2024-10-03 | Dog-IQA: Standard-guided Zero-shot MLLM for Mix-grained Image Quality Assessment | Kai Liu et.al. | 2410.02505v1 | link |
2024-10-03 | Cross-Embodiment Dexterous Grasping with Reinforcement Learning | Haoqi Yuan et.al. | 2410.02479v1 | null |
2024-10-03 | Learning Diverse Bimanual Dexterous Manipulation Skills from Human Demonstrations | Bohan Zhou et.al. | 2410.02477v1 | null |
2024-10-03 | Unsupervised Meta-Learning via Dynamic Head and Heterogeneous Task Construction for Few-Shot Classification | Yunchuan Guan et.al. | 2410.02267v1 | link |
2024-10-03 | Visual Prompting in LLMs for Enhancing Emotion Recognition | Qixuan Zhang et.al. | 2410.02244v1 | null |
2024-10-02 | An Exploration of Self-Supervised Mutual Information Alignment for Multi-Task Settings | Soham Govande et.al. | 2410.01704v1 | link |
2024-10-02 | Saliency-Guided DETR for Moment Retrieval and Highlight Detection | Aleksandr Gordeev et.al. | 2410.01615v1 | link |
2024-10-02 | Coordinate-Based Neural Representation Enabling Zero-Shot Learning for 3D Multiparametric Quantitative MRI | Guoyan Lao et.al. | 2410.01577v1 | null |
2024-10-03 | EUFCC-CIR: a Composed Image Retrieval Dataset for GLAM Collections | Francesc Net et.al. | 2410.01536v2 | link |
2024-10-02 | Toward a Holistic Evaluation of Robustness in CLIP Models | Weijie Tu et.al. | 2410.01534v1 | null |
2024-10-02 | SinkSAM: A Monocular Depth-Guided SAM Framework for Automatic Sinkhole Segmentation | Osher Rafaeli et.al. | 2410.01473v1 | link |
2024-10-02 | The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs | Hong Li et.al. | 2410.01417v1 | null |
2024-10-02 | AgriCLIP: Adapting CLIP for Agriculture and Livestock via Domain-Specialized Cross-Model Alignment | Umair Nawaz et.al. | 2410.01407v1 | link |
2024-10-02 | Toward Zero-Shot Learning for Visual Dehazing of Urological Surgical Robots | Renkai Wu et.al. | 2410.01395v1 | link |
2024-10-02 | Takin-VC: Zero-shot Voice Conversion via Jointly Hybrid Content and Memory-Augmented Context-Aware Timbre Modeling | Yuguang Yang et.al. | 2410.01350v1 | null |
2024-09-30 | Uni | Yubin Wang et.al. | 2409.20558v1 | null |
2024-09-30 | Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos | Md Mohaiminul Islam et.al. | 2409.20557v1 | null |
2024-09-30 | Robi Butler: Remote Multimodal Interactions with Household Robot Assistant | Anxing Xiao et.al. | 2409.20548v1 | null |
2024-09-30 | FreeMask: Rethinking the Importance of Attention Masks for Zero-Shot Video Editing | Lingling Cai et.al. | 2409.20500v1 | null |
2024-10-01 | Instance-adaptive Zero-shot Chain-of-Thought Prompting | Xiaosong Yuan et.al. | 2409.20441v2 | null |
2024-09-30 | VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs | Ruotong Liao et.al. | 2409.20365v1 | link |
2024-09-30 | CableInspect-AD: An Expert-Annotated Anomaly Detection Dataset | Akshatha Arodi et.al. | 2409.20353v1 | link |
2024-09-30 | RL-GSBridge: 3D Gaussian Splatting Based Real2Sim2Real Method for Robotic Manipulation Learning | Yuxuan Wu et.al. | 2409.20291v1 | null |
2024-09-30 | Analysing Zero-Shot Readability-Controlled Sentence Simplification | Abdullah Barayan et.al. | 2409.20246v1 | null |
2024-09-30 | VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection | Huilin Deng et.al. | 2409.20146v1 | null |
2024-09-27 | Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs | Yanyuan Qiao et.al. | 2409.18794v1 | null |
2024-09-27 | When SAM2 Meets Video Camouflaged Object Segmentation: A Comprehensive Evaluation and Adaptation | Yuli Zhou et.al. | 2409.18653v1 | link |
2024-09-27 | Do LLMs suffer from Multi-Party Hangover? A Diagnostic Approach to Addressee Recognition and Response Selection in Conversations | Nicolò Penzo et.al. | 2409.18602v1 | link |
2024-09-27 | "Oh LLM, I'm Asking Thee, Please Give Me a Decision Tree": Zero-Shot Decision Tree Induction and Embedding with Large Language Models | Ricardo Knauer et.al. | 2409.18594v1 | null |
2024-09-27 | EmoPro: A Prompt Selection Strategy for Emotional Expression in LM-based Speech Synthesis | Haoyu Wang et.al. | 2409.18512v1 | null |
2024-09-27 | Exploring Language Model Generalization in Low-Resource Extractive QA | Saptarshi Sengupta et.al. | 2409.18446v1 | link |
2024-09-26 | AER-LLM: Ambiguity-aware Emotion Recognition Leveraging Large Language Models | Xin Hong et.al. | 2409.18339v1 | null |
2024-09-26 | Learning to Drive via Asymmetric Self-Play | Chris Zhang et.al. | 2409.18218v1 | null |
2024-09-26 | Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction | Jing He et.al. | 2409.18124v1 | null |
2024-09-26 | GSON: A Group-based Social Navigation Framework with Large Multimodal Model | Shangyi Luo et.al. | 2409.18084v1 | null |
2024-09-26 | FreeEdit: Mask-free Reference-based Image Editing with Multi-modal Instruction | Runze He et.al. | 2409.18071v1 | null |
2024-09-26 | DualAD: Dual-Layer Planning for Reasoning in Autonomous Driving | Dingrui Wang et.al. | 2409.18053v1 | link |
2024-09-26 | IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning | Soeun Lee et.al. | 2409.18046v1 | link |
2024-09-26 | Learning to Love Edge Cases in Formative Math Assessment: Using the AMMORE Dataset and Chain-of-Thought Prompting to Improve Grading Accuracy | Owen Henkel et.al. | 2409.17904v1 | null |
2024-09-26 | Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models | Hui-Po Wang et.al. | 2409.17836v1 | link |
2024-09-27 | Few-shot Pairwise Rank Prompting: An Effective Non-Parametric Retrieval Model | Nilanjan Sinhababu et.al. | 2409.17745v2 | null |
2024-09-26 | AnyLogo: Symbiotic Subject-Driven Diffusion System with Gemini Status | Jinghao Zhang et.al. | 2409.17740v1 | null |
2024-09-26 | Robust Ladder Climbing with a Quadrupedal Robot | Dylan Vogel et.al. | 2409.17731v1 | null |
2024-09-25 | Can Vision Language Models Learn from Visual Demonstrations of Ambiguous Spatial Reasoning? | Bowen Zhao et.al. | 2409.17080v1 | link |
2024-09-25 | ControlCity: A Multimodal Diffusion Model Based Approach for Accurate Geospatial Data Generation and Urban Morphology Analysis | Fangshuo Zhou et.al. | 2409.17049v1 | link |
2024-09-25 | Detecting Temporal Ambiguity in Questions | Bhawna Piryani et.al. | 2409.17046v1 | link |
2024-09-25 | Zero-Shot Detection of LLM-Generated Text using Token Cohesiveness | Shixuan Ma et.al. | 2409.16914v1 | link |
2024-09-25 | Pruning Multilingual Large Language Models for Multilingual Inference | Hwichan Kim et.al. | 2409.16911v1 | link |
2024-09-25 | Multi-objective Evolution of Heuristic Using Large Language Model | Shunyu Yao et.al. | 2409.16867v1 | null |
2024-09-25 | Towards General Text-guided Image Synthesis for Customized Multimodal Brain MRI Generation | Yulin Wang et.al. | 2409.16818v1 | link |
2024-09-25 | Vision-Language Model Fine-Tuning via Simple Parameter-Efficient Modification | Ming Li et.al. | 2409.16718v1 | link |
2024-09-24 | Unsupervised Text Representation Learning via Instruction-Tuning for Zero-Shot Dense Retrieval | Qiuhai Zeng et.al. | 2409.16497v1 | null |
2024-09-24 | BehAV: Behavioral Rule Guided Autonomy Using VLMs for Robot Navigation in Outdoor Scenes | Kasun Weerakoon et.al. | 2409.16484v1 | null |
2024-09-24 | Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation | Homanga Bharadhwaj et.al. | 2409.16283v1 | null |
2024-09-24 | Fields of The World: A Machine Learning Benchmark Dataset For Global Agricultural Field Boundary Segmentation | Hannah Kerner et.al. | 2409.16252v1 | link |
2024-09-24 | Facial Expression-Enhanced TTS: Combining Face Representation and Emotion Intensity for Adaptive Speech | Yunji Chu et.al. | 2409.16203v1 | null |
2024-09-24 | HA-FGOVD: Highlighting Fine-grained Attributes via Explicit Linear Composition for Open-Vocabulary Object Detection | Yuqi Ma et.al. | 2409.16136v1 | null |
2024-09-24 | Evaluation of state-of-the-art ASR Models in Child-Adult Interactions | Aditya Ashvin et.al. | 2409.16135v1 | null |
2024-09-24 | Bridging Environments and Language with Rendering Functions and Vision-Language Models | Theo Cachet et.al. | 2409.16024v1 | null |
2024-09-24 | Finetuning LLMs for Comparative Assessment Tasks | Vatsal Raina et.al. | 2409.15979v1 | null |
2024-09-24 | StyleSinger 2: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control | Yu Zhang et.al. | 2409.15977v1 | link |
2024-09-24 | SLIMER-IT: Zero-Shot NER on Italian Language | Andrew Zamai et.al. | 2409.15933v1 | link |
2024-09-24 | Zero-Shot Detection of AI-Generated Images | Davide Cozzolino et.al. | 2409.15875v1 | null |
2024-09-24 | Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models | Sijing Chen et.al. | 2409.12139v3 | null |
2024-09-18 | IMRL: Integrating Visual, Physical, Temporal, and Geometric Representations for Enhanced Food Acquisition | Rui Liu et.al. | 2409.12092v1 | null |
2024-09-18 | Efficacy of Synthetic Data as a Benchmark | Gaurav Maheshwari et.al. | 2409.11968v1 | null |
2024-09-18 | GauTOAO: Gaussian-based Task-Oriented Affordance of Objects | Jiawen Wang et.al. | 2409.11941v1 | null |
2024-09-18 | LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Foundation Models | Amaia Cardiel et.al. | 2409.11919v1 | null |
2024-09-18 | ABHINAW: A method for Automatic Evaluation of Typography within AI-Generated Images | Abhinaw Jagtap et.al. | 2409.11874v1 | null |
2024-09-18 | One Map to Find Them All: Real-time Open-Vocabulary Mapping for Zero-shot Multi-Object Navigation | Finn Lukas Busch et.al. | 2409.11764v1 | null |
2024-09-18 | Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation | Haohan Guo et.al. | 2409.11630v1 | null |
2024-09-17 | Good Grasps Only: A data engine for self-supervised fine-tuning of pose estimation using grasp poses for verification | Frederik Hagelskjær et.al. | 2409.11512v1 | null |
2024-09-17 | Enriching Datasets with Demographics through Large Language Models: What's in a Name? | Khaled AlNuaimi et.al. | 2409.11491v1 | null |
2024-09-17 | Says Who? Effective Zero-Shot Annotation of Focalization | Rebecca M. M. Hicke et.al. | 2409.11390v1 | null |
2024-09-17 | Towards Time Series Reasoning with LLMs | Winnie Chow et.al. | 2409.11376v1 | null |
2024-09-17 | Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think | Gonzalo Martin Garcia et.al. | 2409.11355v1 | link |
2024-09-17 | Zero Shot Text to Speech Augmentation for Automatic Speech Recognition on Low-Resource Accented Speech Corpora | Francesco Nespoli et.al. | 2409.11107v1 | null |
2024-09-17 | TacDiffusion: Force-domain Diffusion Policy for Precise Tactile Manipulation | Yansong Wu et.al. | 2409.11047v1 | null |
2024-09-18 | GEIC: Universal and Multilingual Named Entity Recognition with Large Language Models | Hanjun Luo et.al. | 2409.11022v2 | link |
2024-09-17 | Relative Representations: Topological and Geometric Perspectives | Alejandro García-Castellanos et.al. | 2409.10967v1 | link |
2024-09-17 | Multi-Floor Zero-Shot Object Navigation Policy | Lingfeng Zhang et.al. | 2409.10906v1 | null |
2024-09-17 | Implicit Reasoning in Deep Time Series Forecasting | Willa Potosnak et.al. | 2409.10840v1 | null |
2024-09-18 | Context-Dependent Interactable Graphical User Interface Element Detection for Spatial Computing Applications | Shuqing Li et.al. | 2409.10811v2 | null |
2024-09-16 | Do Pre-trained Vision-Language Models Encode Object States? | Kaleb Newman et.al. | 2409.10488v1 | null |
2024-09-16 | Prompt-and-Transfer: Dynamic Class-aware Enhancement for Few-shot Segmentation | Hanbo Bi et.al. | 2409.10389v1 | null |
2024-09-16 | beeFormer: Bridging the Gap Between Semantic and Interaction Similarity in Recommender Systems | Vojtěch Vančura et.al. | 2409.10309v1 | link |
2024-09-16 | SteeredMarigold: Steering Diffusion Towards Depth Completion of Largely Incomplete Depth Maps | Jakub Gregorek et.al. | 2409.10202v1 | null |
2024-09-16 | SplatSim: Zero-Shot Sim2Real Transfer of RGB Manipulation Policies Using Gaussian Splatting | Mohammad Nomaan Qureshi et.al. | 2409.10161v1 | null |
2024-09-16 | StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion | Yinghao Aaron Li et.al. | 2409.10058v1 | null |
2024-09-16 | A Study on Zero-shot Non-intrusive Speech Assessment using Large Language Models | Ryandhimas E. Zezario et.al. | 2409.09914v1 | null |
2024-09-15 | GRIN: Zero-Shot Metric Depth with Pixel-Level Diffusion | Vitor Guizilini et.al. | 2409.09896v1 | null |
2024-09-15 | PROSE-FD: A Multimodal PDE Foundation Model for Learning Multiple Operators for Forecasting Fluid Dynamics | Yuxuan Liu et.al. | 2409.09811v1 | null |
2024-09-15 | Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models | Yuan-Hong Liao et.al. | 2409.09788v1 | null |
2024-09-13 | Data Efficient Child-Adult Speaker Diarization with Simulated Conversations | Anfeng Xu et.al. | 2409.08881v1 | link |
2024-09-13 | A RAG Approach for Generating Competency Questions in Ontology Engineering | Xueli Pan et.al. | 2409.08820v1 | null |
2024-09-13 | Electrocardiogram Report Generation and Question Answering via Retrieval-Augmented Self-Supervised Modeling | Jialu Tang et.al. | 2409.08788v1 | null |
2024-09-13 | HOLA-Drone: Hypergraphic Open-ended Learning for Zero-Shot Multi-Drone Cooperative Pursuit | Yang Li et.al. | 2409.08767v1 | null |
2024-09-13 | DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset | Jiawei Du et.al. | 2409.08731v1 | link |
2024-09-13 | Eir: Thai Medical Large Language Models | Yutthakorn Thiprak et.al. | 2409.08523v1 | null |
2024-09-13 | GroundingBooth: Grounding Text-to-Image Customization | Zhexiao Xiong et.al. | 2409.08520v1 | null |
2024-09-13 | Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection | Haoxuan Wang et.al. | 2409.08513v1 | link |
2024-09-12 | SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer | Helin Wang et.al. | 2409.08425v1 | link |
2024-09-12 | Sequential Discrete Action Selection via Blocking Conditions and Resolutions | Liam Merz Hoffmeister et.al. | 2409.08410v1 | null |
2024-09-12 | DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors | Thomas Hanwen Zhu et.al. | 2409.08278v1 | null |
2024-09-12 | AnySkin: Plug-and-play Skin Sensing for Robotic Touch | Raunaq Bhirangi et.al. | 2409.08276v1 | null |
2024-09-12 | Fine-tuning Large Language Models for Entity Matching | Aaron Steiner et.al. | 2409.08185v1 | link |
2024-09-12 | The CLC-UKET Dataset: Benchmarking Case Outcome Prediction for the UK Employment Tribunal | Huiyuan Xie et.al. | 2409.08098v1 | null |
2024-09-12 | EZIGen: Enhancing zero-shot subject-driven image generation with precise subject encoding and decoupled guidance | Zicheng Duan et.al. | 2409.08091v1 | link |
2024-09-12 | Zero-Shot Sing Voice Conversion: built upon clustering-based phoneme representations | Wangjin Zhou et.al. | 2409.08039v1 | null |
2024-09-12 | From Explanations to Action: A Zero-Shot, Theory-Driven LLM Framework for Student Performance Feedback | Vinitra Swamy et.al. | 2409.08027v1 | null |
2024-09-11 | Zero-Shot Machine-Generated Text Detection Using Mixture of Large Language Models | Matthieu Dubois et.al. | 2409.07615v1 | null |
2024-09-11 | Minimizing Embedding Distortion for Robust Out-of-Distribution Performance | Tom Shaked et.al. | 2409.07582v1 | null |
2024-09-11 | SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis | Helin Wang et.al. | 2409.07556v1 | link |
2024-09-11 | Online Decision MetaMorphFormer: A Casual Transformer-Based Reinforcement Learning Framework of Universal Embodied Intelligence | Luo Ji et.al. | 2409.07341v1 | null |
2024-09-11 | Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering | Weixi Weng et.al. | 2409.07331v1 | null |
2024-09-11 | PaveSAM Segment Anything for Pavement Distress | Neema Jakisa Owor et.al. | 2409.07295v1 | null |
2024-09-11 | A Fine-grained Sentiment Analysis of App Reviews using Large Language Models: An Evaluation Study | Faiz Ali Shah et.al. | 2409.07162v1 | link |
2024-09-11 | Zero-Shot Text-to-Speech as Golden Speech Generator: A Systematic Framework and its Applicability in Automatic Pronunciation Assessment | Tien-Hong Lo et.al. | 2409.07151v1 | null |
2024-09-11 | Pushing the Limits of Vision-Language Models in Remote Sensing without Human Annotations | Keumgang Cha et.al. | 2409.07048v1 | null |
2024-09-10 | ExIQA: Explainable Image Quality Assessment Using Distortion Attributes | Sepehr Kazemi Ranjbar et.al. | 2409.06853v1 | null |
2024-09-10 | Translating Step-by-Step: Decomposing the Translation Process for Improved Translation Quality of Long-Form Texts | Eleftheria Briakou et.al. | 2409.06790v1 | null |
2024-09-11 | EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis | Danli Shi et.al. | 2409.06644v2 | null |
2024-09-10 | DemoStart: Demonstration-led auto-curriculum applied to sim-to-real with multi-fingered robots | Maria Bauza et.al. | 2409.06613v1 | null |
2024-09-10 | An Effective Context-Balanced Adaptation Approach for Long-Tailed Speech Recognition | Yi-Cheng Wang et.al. | 2409.06468v1 | null |
2024-09-10 | SpeechTaxi: On Multilingual Semantic Speech Classification | Lennart Keller et.al. | 2409.06372v1 | null |
2024-09-10 | MAGDA: Multi-agent guideline-driven diagnostic assistance | David Bani-Harouni et.al. | 2409.06351v1 | null |
2024-09-10 | PharmacoMatch: Efficient 3D Pharmacophore Screening through Neural Subgraph Matching | Daniel Rose et.al. | 2409.06316v1 | null |
2024-09-10 | Advancing Topic Segmentation of Broadcasted Speech with Multilingual Semantic Embeddings | Sakshi Deo Shukla et.al. | 2409.06222v1 | link |
2024-09-10 | Revisiting Prompt Pretraining of Vision-Language Models | Zhenyuan Chen et.al. | 2409.06166v1 | null |
2024-09-09 | Differentiable programming across the PDE and Machine Learning barrier | Nacime Bouziani et.al. | 2409.06085v1 | null |
2024-09-09 | FairHome: A Fair Housing and Fair Lending Dataset | Anusha Bagalkotkar et.al. | 2409.05990v1 | null |
2024-09-09 | Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments | Haritheja Etukuru et.al. | 2409.05865v1 | link |
2024-09-10 | Evaluating Multiview Object Consistency in Humans and Image Models | Tyler Bonnen et.al. | 2409.05862v2 | link |
2024-09-09 | A Flexible Framework for Universal Computational Aberration Correction via Automatic Lens Library Generation and Domain Adaptation | Qi Jiang et.al. | 2409.05809v1 | null |
2024-09-09 | AnomalyCD: A benchmark for Earth anomaly change detection with high-resolution and time-series observations | Jingtao Li et.al. | 2409.05679v1 | null |
2024-09-09 | Zero-shot Outlier Detection via Prior-data Fitted Networks: Model Selection Bygone! | Yuchen Shen et.al. | 2409.05672v1 | null |
2024-09-09 | CauseJudger: Identifying the Cause with LLMs for Abductive Logical Reasoning | Jinwei He et.al. | 2409.05559v1 | null |
2024-09-09 | EndoOmni: Zero-Shot Cross-Dataset Depth Estimation in Endoscopy by Robust Self-Learning from Noisy Labels | Qingyao Tian et.al. | 2409.05442v1 | link |
2024-09-09 | From Words to Poses: Enhancing Novel Object Pose Estimation with Vision Language Models | Tessa Pulli et.al. | 2409.05413v1 | null |
2024-09-09 | NLLB-E5: A Scalable Multilingual Retrieval Model | Arkadeep Acharya et.al. | 2409.05401v1 | null |
2024-09-09 | IndicVoices-R: Unlocking a Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS | Ashwin Sankar et.al. | 2409.05356v1 | link |
2024-09-06 | FS-MedSAM2: Exploring the Potential of SAM2 for Few-Shot Medical Image Segmentation without Fine-tuning | Yunhao Bai et.al. | 2409.04298v1 | link |
2024-09-06 | Prompt-based Personality Profiling: Reinforcement Learning for Relevance Filtering | Jan Hofmann et.al. | 2409.04122v1 | null |
2024-09-06 | UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity | Yicheng Fu et.al. | 2409.04081v1 | null |
2024-09-06 | AnyMatch -- Efficient Zero-Shot Entity Matching with a Small Language Model | Zeyu Zhang et.al. | 2409.04073v1 | link |
2024-09-06 | Refining Wikidata Taxonomy using Large Language Models | Yiwen Peng et.al. | 2409.04056v1 | link |
2024-09-05 | Deep Clustering of Remote Sensing Scenes through Heterogeneous Transfer Learning | Isaac Ray et.al. | 2409.03938v1 | null |
2024-09-05 | A deep learning approach to wall-shear stress quantification: From numerical training to zero-shot experimental application | Esther Lagemann et.al. | 2409.03933v1 | null |
2024-09-05 | Few-shot Adaptation of Medical Vision-Language Models | Fereshteh Shakeri et.al. | 2409.03868v1 | link |
2024-09-05 | View-Invariant Policy Learning via Zero-Shot Novel View Synthesis | Stephen Tian et.al. | 2409.03685v1 | null |
2024-09-05 | Text-Guided Mixup Towards Long-Tailed Image Categorization | Richard Franklin et.al. | 2409.03583v1 | link |
2024-09-05 | FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation | Xi Chen et.al. | 2409.03525v1 | null |
2024-09-05 | Have Large Vision-Language Models Mastered Art History? | Ombretta Strafforello et.al. | 2409.03521v1 | null |
2024-09-05 | RoVi-Aug: Robot and Viewpoint Augmentation for Cross-Embodiment Robot Learning | Lawrence Yunliang Chen et.al. | 2409.03403v1 | null |
2024-09-05 | Bringing the RT-1-X Foundation Model to a SCARA robot | Jonathan Salzer et.al. | 2409.03299v1 | null |
2024-09-05 | LLM Detectors Still Fall Short of Real World: Case of LLM-Generated Short News-Like Posts | Henrique Da Silva Gameiro et.al. | 2409.03291v1 | link |
2024-09-05 | iText2KG: Incremental Knowledge Graphs Construction Using Large Language Models | Yassir Lairgi et.al. | 2409.03284v1 | link |
2024-09-05 | FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications | Hao-Han Guo et.al. | 2409.03283v1 | null |
2024-09-04 | Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection | Kaiqing Lin et.al. | 2409.02664v1 | null |
2024-09-04 | Evaluation Study on SAM 2 for Class-agnostic Instance-level Segmentation | Tiantian Zhang et.al. | 2409.02567v1 | link |
2024-09-04 | StyleTokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models | Wen Li et.al. | 2409.02543v1 | link |
2024-09-04 | Language is Scary when Over-Analyzed: Unpacking Implied Misogynistic Reasoning with Argumentation Theory-Driven Prompts | Arianna Muti et.al. | 2409.02519v1 | null |
2024-09-04 | Dispelling Four Challenges in Inertial Motion Tracking with One Recurrent Inertial Graph-based Estimator (RING) | Simon Bachhuber et.al. | 2409.02502v1 | null |
2024-09-04 | Boosting Generalizability towards Zero-Shot Cross-Dataset Single-Image Indoor Depth by Meta-Initialization | Cho-Ying Wu et.al. | 2409.02486v1 | null |
2024-09-04 | Large Language Models as Efficient Reward Function Searchers for Custom-Environment Multi-Objective Reinforcement Learning | Guanwen Xie et.al. | 2409.02428v1 | null |
2024-09-03 | Evaluation and Comparison of Visual Language Models for Transportation Engineering Problems | Sanjita Prajapati et.al. | 2409.02278v1 | null |
2024-09-05 | LinFusion: 1 GPU, 1 Minute, 16K Image | Songhua Liu et.al. | 2409.02097v2 | link |
2024-09-03 | DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos | Wenbo Hu et.al. | 2409.02095v1 | link |
2024-08-30 | Bridging Episodes and Semantics: A Novel Framework for Long-Form Video Understanding | Gueter Josmy Faure et.al. | 2408.17443v1 | link |
2024-08-30 | VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters | Mouxiang Chen et.al. | 2408.17253v1 | link |
2024-08-30 | Reasoning AI Performance Degradation in 6G Networks with Large Language Models | Liming Huang et.al. | 2408.17097v1 | null |
2024-08-30 | Focus-Consistent Multi-Level Aggregation for Compositional Zero-Shot Learning | Fengyuan Dai et.al. | 2408.17083v1 | null |
2024-08-29 | Exploring Multiple Strategies to Improve Multilingual Coreference Resolution in CorefUD | Ondřej Pražák et.al. | 2408.16893v1 | link |
2024-08-29 | Fluent and Accurate Image Captioning with a Self-Trained Reward Model | Nicholas Moratelli et.al. | 2408.16827v1 | null |
2024-08-29 | PromptSmooth: Certifying Robustness of Medical Vision-Language Models via Prompt Learning | Noor Hussein et.al. | 2408.16769v1 | link |
2024-08-29 | SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners | Ziyu Guo et.al. | 2408.16768v1 | link |
2024-08-29 | Assessing Large Language Models for Online Extremism Research: Identification, Explanation, and New Knowledge | Beidi Dong et.al. | 2408.16749v1 | null |
2024-08-29 | LLMs generate structurally realistic social networks but overestimate political homophily | Serina Chang et.al. | 2408.16629v1 | link |
2024-08-29 | Adapting Vision-Language Models to Open Classes via Test-Time Prompt Tuning | Zhengqing Gao et.al. | 2408.16486v1 | link |
2024-08-29 | WHISMA: A Speech-LLM to Perform Zero-shot Spoken Language Understanding | Mohan Li et.al. | 2408.16423v1 | null |
2024-08-29 | Text-Enhanced Zero-Shot Action Recognition: A training-free approach | Massimo Bosetti et.al. | 2408.16412v1 | null |
2024-08-29 | Bootstrap Segmentation Foundation Model under Distribution Shift via Object-Centric Learning | Luyao Tang et.al. | 2408.16310v1 | link |
2024-08-29 | Training-free Video Temporal Grounding using Large-scale Pre-trained Models | Minghang Zheng et.al. | 2408.16219v1 | link |
2024-08-28 | CardBench: A Benchmark for Learned Cardinality Estimation in Relational Databases | Yannis Chronis et.al. | 2408.16170v1 | null |
2024-08-29 | Spatio-Temporal Context Prompting for Zero-Shot Action Detection | Wei-Jhe Huang et.al. | 2408.15996v2 | null |
2024-08-28 | Multi-modal Adversarial Training for Zero-Shot Voice Cloning | John Janiczek et.al. | 2408.15916v1 | null |
2024-08-28 | Visual Prompt Engineering for Medical Vision Language Models in Radiology | Stefan Denner et.al. | 2408.15802v1 | null |
2024-08-28 | Interactive Agents: Simulating Counselor-Client Psychological Counseling via Role-Playing LLM-to-LLM Interactions | Huachuan Qiu et.al. | 2408.15787v1 | link |
2024-08-28 | LM-PUB-QUIZ: A Comprehensive Framework for Zero-Shot Evaluation of Relational Knowledge in Language Models | Max Ploner et.al. | 2408.15729v1 | null |
2024-08-28 | Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas | Fabio Quattrini et.al. | 2408.15660v1 | link |
2024-08-28 | Learning dynamics models for velocity estimation in autonomous racing | Jan Węgrzynowski et.al. | 2408.15610v1 | null |
2024-08-28 | Drop the beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation | Ziqian Ning et.al. | 2408.15474v1 | null |
2024-08-28 | Adapting Segment Anything Model to Multi-modal Salient Object Detection with Semantic Feature Fusion Guidance | Kunpeng Wang et.al. | 2408.15063v2 | link |
2024-08-26 | MONAS: Efficient Zero-Shot Neural Architecture Search for MCUs | Ye Qiao et.al. | 2408.15034v1 | null |
2024-08-27 | Cross-Modal Learning for Chemistry Property Prediction: Large Language Models Meet Graph Machine Learning | Sakhinana Sagar Srinivas et.al. | 2408.14964v1 | null |
2024-08-27 | ZeroMamba: Exploring Visual State Space Model for Zero-Shot Learning | Wenjin Hou et.al. | 2408.14868v1 | null |
2024-08-27 | Points2Plans: From Point Clouds to Long-Horizon Plans with Composable Relational Dynamics | Yixuan Huang et.al. | 2408.14769v1 | null |
2024-08-26 | Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning | Xinyang Gu et.al. | 2408.14472v1 | link |
2024-08-28 | Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study | Liuchang Xu et.al. | 2408.14438v2 | null |
2024-08-26 | Uncertainties of Latent Representations in Computer Vision | Michael Kirchhof et.al. | 2408.14281v1 | null |
2024-08-26 | Self-supervised Speech Representations Still Struggle with African American Vernacular English | Kalvin Chang et.al. | 2408.14262v1 | link |
2024-08-26 | AgentMove: Predicting Human Mobility Anywhere Using Large Language Model based Agentic Framework | Jie Feng et.al. | 2408.13986v1 | link |
2024-08-25 | OpenNav: Efficient Open Vocabulary 3D Object Detection for Smart Wheelchair Navigation | Muhammad Rameez ur Rahman et.al. | 2408.13936v1 | link |
2024-08-25 | Infrared Domain Adaptation with Zero-Shot Quantization | Burak Sevsay et.al. | 2408.13925v1 | null |
2024-08-25 | LLMs are Superior Feedback Providers: Bootstrapping Reasoning for Lie Detection with Self-Generated Feedback | Tanushree Banerjee et.al. | 2408.13915v1 | null |
2024-08-25 | Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs | Brandon Smart et.al. | 2408.13912v1 | null |
2024-08-25 | Towards Completeness: A Generalizable Action Proposal Generator for Zero-Shot Temporal Action Localization | Jia-Run Du et.al. | 2408.13777v1 | link |
2024-08-23 | On Class Separability Pitfalls In Audio-Text Contrastive Zero-Shot Learning | Tiago Tavares et.al. | 2408.13068v1 | null |
2024-08-23 | WildFusion: Individual Animal Identification with Calibrated Similarity Fusion | Vojtěch Cermak et.al. | 2408.12934v1 | link |
2024-08-23 | Unleashing the Potential of SAM2 for Biomedical Images and Videos: A Survey | Yichi Zhang et.al. | 2408.12889v1 | link |
2024-08-23 | Predicting Affective States from Screen Text Sentiment | Songyan Teng et.al. | 2408.12844v1 | null |
2024-08-23 | Examining the Commitments and Difficulties Inherent in Multimodal Foundation Models for Street View Imagery | Zhenyuan Yang et.al. | 2408.12821v1 | null |
2024-08-23 | VALE: A Multimodal Visual and Language Explanation Framework for Image Classifiers using eXplainable AI and Language Models | Purushothaman Natarajan et.al. | 2408.12808v1 | link |
2024-08-23 | Cap2Sum: Learning to Summarize Videos by Generating Captions | Cairong Zhao et.al. | 2408.12800v1 | null |
2024-08-22 | Segment Anything Model for Grain Characterization in Hard Drive Design | Kai Nichols et.al. | 2408.12732v1 | null |
2024-08-22 | Cell-ontology guided transcriptome foundation model | Xinyu Yuan et.al. | 2408.12373v1 | null |
2024-08-22 | SAM-SP: Self-Prompting Makes SAM Great Again | Chunpeng Zhou et.al. | 2408.12364v1 | null |
2024-08-22 | Adapt CLIP as Aggregation Instructor for Image Dehazing | Xiaozhe Zhang et.al. | 2408.12317v1 | null |
2024-08-22 | Large Language Models Are Self-Taught Reasoners: Enhancing LLM Applications via Tailored Problem-Solving Demonstrations | Kai Tzu-iunn Ong et.al. | 2408.12315v1 | null |
2024-08-23 | Tactile-Morph Skills: Energy-Based Control Meets Data-Driven Learning | Anran Zhang et.al. | 2408.12285v2 | null |
2024-08-22 | Epsilon: Exploring Comprehensive Visual-Semantic Projection for Multi-Label Zero-Shot Learning | Ziming Liu et.al. | 2408.12253v1 | null |
2024-08-22 | LLMs are not Zero-Shot Reasoners for Biomedical Information Extraction | Aishik Nagar et.al. | 2408.12249v1 | null |
2024-08-22 | PRG: Prompt-Based Distillation Without Annotation via Proxy Relational Graph | Yijin Xu et.al. | 2408.12248v1 | null |
2024-08-22 | OVA-DETR: Open Vocabulary Aerial Object Detection Using Image-Text Alignment and Fusion | Guoting Wei et.al. | 2408.12246v1 | link |
2024-08-23 | Large Language Models as Foundations for Next-Gen Dense Retrieval: A Comprehensive Empirical Assessment | Kun Luo et.al. | 2408.12194v2 | null |
2024-08-21 | Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction | Anthony GX-Chen et.al. | 2408.11816v1 | null |
2024-08-21 | EmbodiedSAM: Online Segment Any 3D Thing in Real Time | Xiuwei Xu et.al. | 2408.11811v1 | null |
2024-08-21 | Iterative Object Count Optimization for Text-to-image Diffusion Models | Oz Zafar et.al. | 2408.11721v1 | null |
2024-08-21 | Memorization In In-Context Learning | Shahriar Golchin et.al. | 2408.11546v1 | null |
2024-08-21 | Improvement Speaker Similarity for Zero-Shot Any-to-Any Voice Conversion of Whispered and Regular Speech | Anastasia Avdeeva et.al. | 2408.11528v1 | null |
2024-08-21 | XDT-CXR: Investigating Cross-Disease Transferability in Zero-Shot Binary Classification of Chest X-Rays | Umaima Rahman et.al. | 2408.11493v1 | link |
2024-08-21 | Enabling Small Models for Zero-Shot Classification through Model Label Learning | Jia Zhang et.al. | 2408.11449v1 | null |
2024-08-21 | EMO-LLaMA: Enhancing Facial Emotion Understanding with Instruction Tuning | Bohao Xing et.al. | 2408.11424v1 | link |
2024-08-21 | Plug, Play, and Fuse: Zero-Shot Joint Decoding via Word-Level Re-ranking Across Diverse Vocabularies | Sai Koneru et.al. | 2408.11327v1 | null |
2024-08-21 | Towards Evaluating Large Language Models on Sarcasm Understanding | Yazhou Zhang et.al. | 2408.11319v1 | null |
2024-08-21 | CrossFi: A Cross Domain Wi-Fi Sensing Framework Based on Siamese Network | Zijian Zhao et.al. | 2408.10919v2 | null |
2024-08-20 | ViLReF: A Chinese Vision-Language Retinal Foundation Model | Shengzhu Yang et.al. | 2408.10894v1 | link |
2024-08-20 | Open 3D World in Autonomous Driving | Xinlong Cheng et.al. | 2408.10880v1 | null |
2024-08-20 | SSL-TTS: Leveraging Self-Supervised Embeddings and kNN Retrieval for Zero-Shot Multi-speaker TTS | Karl El Hajal et.al. | 2408.10771v1 | null |
2024-08-20 | Crafting Tomorrow's Headlines: Neural News Generation and Detection in English, Turkish, Hungarian, and Persian | Cem Üyük et.al. | 2408.10724v1 | null |
2024-08-20 | AnyGraph: Graph Foundation Model in the Wild | Lianghao Xia et.al. | 2408.10700v1 | link |
2024-08-20 | Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches | Yanjie Dong et.al. | 2408.10691v1 | null |
2024-08-20 | A Review of Human-Object Interaction Detection | Yuxiao Wang et.al. | 2408.10641v1 | null |
2024-08-20 | LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models | Yupeng Su et.al. | 2408.10631v1 | link |
2024-08-20 | Generalizable Facial Expression Recognition | Yuhang Zhang et.al. | 2408.10614v1 | link |
2024-08-19 | SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models | Anke Tang et.al. | 2408.10174v1 | link |
2024-08-19 | Video Object Segmentation via SAM 2: The 4th Solution for LSVOS Challenge VOS Track | Feiyu Pan et.al. | 2408.10125v1 | null |
2024-08-19 | GLIMMER: Incorporating Graph and Lexical Features in Unsupervised Multi-Document Summarization | Ran Liu et.al. | 2408.10115v1 | link |
2024-08-19 | Convert and Speak: Zero-shot Accent Conversion with Minimum Supervision | Zhijun Jia et.al. | 2408.10096v1 | null |
2024-08-19 | CLIPCleaner: Cleaning Noisy Labels with CLIP | Chen Feng et.al. | 2408.10012v1 | link |
2024-08-19 | Boosting Open-Domain Continual Learning via Leveraging Intra-domain Category-aware Prototype | Yadong Lu et.al. | 2408.09984v1 | null |
2024-08-19 | Caption-Driven Explorations: Aligning Image and Text Embeddings through Human-Inspired Foveated Vision | Dario Zanca et.al. | 2408.09948v1 | null |
2024-08-19 | DiscoNeRF: Class-Agnostic Object Field for 3D Object Discovery | Corentin Dumery et.al. | 2408.09928v1 | null |
2024-08-19 | SAM-UNet: Enhancing Zero-Shot Segmentation of SAM for Universal Medical Images | Sihan Yang et.al. | 2408.09886v1 | link |
2024-08-19 | Segment-Anything Models Achieve Zero-shot Robustness in Autonomous Driving | Jun Yan et.al. | 2408.09839v1 | link |
2024-08-16 | DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models | Eman Ali et.al. | 2408.08855v1 | null |
2024-08-16 | ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis | Yubao Zhao et.al. | 2408.08849v1 | link |
2024-08-16 | EasyRec: Simple yet Effective Language Models for Recommendation | Xubin Ren et.al. | 2408.08821v1 | link |
2024-08-16 | ChatZero: Zero-shot Cross-Lingual Dialogue Generation via Pseudo-Target Language | Yongkang Liu et.al. | 2408.08724v1 | null |
2024-08-16 | TsCA: On the Semantic Consistency Alignment via Conditional Transport for Compositional Zero-Shot Learning | Miaoge Li et.al. | 2408.08703v1 | null |
2024-08-16 | A Mean Field Ansatz for Zero-Shot Weight Transfer | Xingyuan Chen et.al. | 2408.08681v1 | null |
2024-08-16 | GAPS: A Large and Diverse Classical Guitar Dataset and Benchmark Transcription Model | Xavier Riley et.al. | 2408.08653v1 | null |
2024-08-16 | Persona is a Double-edged Sword: Enhancing the Zero-shot Reasoning by Ensembling the Role-playing and Neutral Prompts | Junseok Kim et.al. | 2408.08631v1 | null |
2024-08-16 | Zero-Shot Dual-Path Integration Framework for Open-Vocabulary 3D Instance Segmentation | Tri Ton et.al. | 2408.08591v1 | null |
2024-08-16 | CommunityKG-RAG: Leveraging Community Structures in Knowledge Graphs for Advanced Retrieval-Augmented Generation in Fact-Checking | Rong-Ching Chang et.al. | 2408.08535v1 | null |
2024-08-15 | ScalingFilter: Assessing Data Quality through Inverse Utilization of Scaling Laws | Ruihang Li et.al. | 2408.08310v1 | null |
2024-08-16 | Covert Bias: The Severity of Social Views' Unalignment in Language Models Towards Implicit and Explicit Opinion | Abeer Aldayel et.al. | 2408.08212v2 | null |
2024-08-15 | Navigating Data Scarcity using Foundation Models: A Benchmark of Few-Shot and Zero-Shot Learning Approaches in Medical Imaging | Stefano Woerner et.al. | 2408.08058v1 | link |
2024-08-15 | LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning | Jiajie Li et.al. | 2408.07981v1 | null |
2024-08-15 | Advancing Multi-grained Alignment for Contrastive Language-Audio Pre-training | Yiming Li et.al. | 2408.07919v1 | link |
2024-08-15 | DM2RM: Dual-Mode Multimodal Ranking for Target Objects and Receptacles Based on Open-Vocabulary Instructions | Ryosuke Korekata et.al. | 2408.07910v1 | null |
2024-08-15 | A Spitting Image: Modular Superpixel Tokenization in Vision Transformers | Marius Aasan et.al. | 2408.07680v2 | link |
2024-08-14 | Enhanced Detection of Conversational Mental Manipulation Through Advanced Prompting Techniques | Ivory Yang et.al. | 2408.07676v1 | null |
2024-08-14 | SigmaRL: A Sample-Efficient and Generalizable Multi-Agent Reinforcement Learning Framework for Motion Planning | Jianye Xu et.al. | 2408.07644v1 | link |
2024-08-14 | Exploring Large-Scale Language Models to Evaluate EEG-Based Multimodal Data for Mental Health | Yongquan Hu et.al. | 2408.07313v1 | null |
2024-08-14 | MultiSurf-GPT: Facilitating Context-Aware Reasoning with Large-Scale Language Models for Multimodal Surface Sensing | Yongquan Hu et.al. | 2408.07311v1 | null |
2024-08-14 | GQE: Generalized Query Expansion for Enhanced Text-Video Retrieval | Zechen Bai et.al. | 2408.07249v1 | null |
2024-08-13 | Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents | Pranav Putta et.al. | 2408.07199v1 | null |
2024-08-13 | PSM: Learning Probabilistic Embeddings for Multi-scale Zero-Shot Soundscape Mapping | Subash Khanal et.al. | 2408.07050v1 | link |
2024-08-15 | Prompt-Based Segmentation at Multiple Resolutions and Lighting Conditions using Segment Anything Model 2 | Osher Rafaeli et.al. | 2408.06970v2 | null |
2024-08-13 | How Aligned are Human Chart Takeaways and LLM Predictions? A Case Study on Bar Charts with Varying Layouts | Huichen Will Wang et.al. | 2408.06837v1 | null |
2024-08-13 | PRESENT: Zero-Shot Text-to-Prosody Control | Perry Lam et.al. | 2408.06827v1 | link |
2024-08-13 | Visual Neural Decoding via Improved Visual-EEG Semantic Consistency | Hongzhou Chen et.al. | 2408.06788v1 | null |
2024-08-13 | Do Vision-Language Foundational models show Robust Visual Perception? | Shivam Chandhok et.al. | 2408.06781v1 | link |
2024-08-13 | DC3DO: Diffusion Classifier for 3D Objects | Nursena Koprucu et.al. | 2408.06693v1 | link |
2024-08-13 | CROME: Cross-Modal Adapters for Efficient Multimodal LLM | Sayna Ebrahimi et.al. | 2408.06610v1 | null |
2024-08-12 | UniT: Unified Tactile Representation for Robot Learning | Zhengtong Xu et.al. | 2408.06481v1 | link |
2024-08-12 | From SAM to SAM 2: Exploring Improvements in Meta's Segment Anything Model | Athulya Sundaresan Geetha et.al. | 2408.06305v1 | null |
2024-08-12 | Synthetic Patient-Physician Dialogue Generation from Clinical Notes Using LLM | Trisha Das et.al. | 2408.06285v1 | null |
2024-08-12 | A RAG-Based Question-Answering Solution for Cyber-Attack Investigation and Attribution | Sampath Rajapaksha et.al. | 2408.06272v1 | null |
2024-08-12 | 3D Reconstruction of Protein Structures from Multi-view AFM Images using Neural Radiance Fields (NeRFs) | Jaydeep Rade et.al. | 2408.06244v1 | null |
2024-08-12 | Zero-shot 3D Segmentation of Abdominal Organs in CT Scans Using Segment Anything Model 2: Adapting Video Tracking Capabilities for 3D Medical Imaging | Yosuke Yamagishi et.al. | 2408.06170v1 | null |
2024-08-12 | OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning | Mushui Liu et.al. | 2408.06158v1 | link |
2024-08-12 | Text2Interaction: Establishing Safe and Preferable Human-Robot Interaction | Jakob Thumm et.al. | 2408.06105v1 | link |
2024-08-12 | Towards Robust Monocular Depth Estimation in Non-Lambertian Surfaces | Junrui Zhang et.al. | 2408.06083v1 | null |
2024-08-12 | Perceptual Similarity for Measuring Decision-Making Style and Policy Diversity in Games | Chiu-Chou Lin et.al. | 2408.06051v1 | link |
2024-08-12 | Unseen No More: Unlocking the Potential of CLIP for Generative Zero-shot HOI Detection | Yixin Guo et.al. | 2408.05974v1 | link |
2024-08-09 | Enhancing the Code Debugging Ability of LLMs via Communicative Agent Based Data Refinement | Weiqing Yang et.al. | 2408.05006v1 | null |
2024-08-09 | SEA-SQL: Semantic-Enhanced Text-to-SQL with Adaptive Refinement | Chaofan Li et.al. | 2408.04919v1 | null |
2024-08-09 | Towards a Generative Approach for Emotion Detection and Reasoning | Ankita Bhaumik et.al. | 2408.04906v1 | null |
2024-08-09 | ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation | Mengcheng Lan et.al. | 2408.04883v1 | link |
2024-08-09 | On the Element-Wise Representation and Reasoning in Zero-Shot Image Recognition: A Systematic Survey | Jingcai Guo et.al. | 2408.04879v1 | link |
2024-08-09 | ChatGPT Meets Iris Biometrics | Parisa Farmanifard et.al. | 2408.04868v1 | null |
2024-08-09 | An Evaluation of Standard Statistical Models and LLMs on Time Series Forecasting | Rui Cao et.al. | 2408.04867v1 | link |
2024-08-09 | One Shot is Enough for Sequential Infrared Small Target Segmentation | Bingbing Dan et.al. | 2408.04823v1 | link |
2024-08-09 | FUSE-ing Language Models: Zero-Shot Adapter Discovery for Prompt Optimization Across Tokenizers | Joshua Nathaniel Williams et.al. | 2408.04816v1 | link |
2024-08-08 | Novel adaptation of video segmentation to 3D MRI: efficient zero-shot knee segmentation with SAM2 | Andrew Seohwan Yu et.al. | 2408.04762v1 | null |
2024-08-08 | Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics | Ruining Li et.al. | 2408.04631v1 | null |
2024-08-08 | SAM 2 in Robotic Surgery: An Empirical Evaluation for Robustness and Generalization in Surgical Video Segmentation | Jieming Yu et.al. | 2408.04593v1 | null |
2024-08-08 | SCENE: Evaluating Explainable AI Techniques Using Soft Counterfactuals | Haoran Zheng et.al. | 2408.04575v1 | null |
2024-08-08 | Conversational Prompt Engineering | Liat Ein-Dor et.al. | 2408.04560v1 | null |
2024-08-08 | Depth Any Canopy: Leveraging Depth Foundation Models for Canopy Height Estimation | Daniele Rege Cambrin et.al. | 2408.04523v1 | link |
2024-08-08 | Model-Based Transfer Learning for Contextual Reinforcement Learning | Jung-Hoon Cho et.al. | 2408.04498v1 | link |
2024-08-08 | Towards Synergistic Deep Learning Models for Volumetric Cirrhotic Liver Segmentation in MRIs | Vandan Gorade et.al. | 2408.04491v1 | null |
2024-08-08 | KnowPC: Knowledge-Driven Programmatic Reinforcement Learning for Zero-shot Coordination | Yin Gu et.al. | 2408.04336v1 | null |
2024-08-08 | Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP | François Remy et.al. | 2408.04303v1 | link |
2024-08-08 | Learning to Rewrite: Generalized LLM-Generated Text Detection | Wei Hao et.al. | 2408.04237v1 | null |
2024-08-07 | Achieving Human Level Competitive Robot Table Tennis | David B. D'Ambrosio et.al. | 2408.03906v1 | null |
2024-08-07 | Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond | Beomseok Lee et.al. | 2408.03900v1 | link |
2024-08-07 | Unlocking Exocentric Video-Language Data for Egocentric Video Representation Learning | Zi-Yi Dou et.al. | 2408.03567v1 | null |
2024-08-07 | Leveraging LLMs for Enhanced Open-Vocabulary 3D Scene Understanding in Autonomous Driving | Amirhosein Chahe et.al. | 2408.03516v1 | null |
2024-08-07 | Accuracy and Consistency of LLMs in the Registered Dietitian Exam: The Impact of Prompt Engineering and Knowledge Retrieval | Iman Azimi et.al. | 2408.02964v2 | link |
2024-08-06 | Few-shot Scooping Under Domain Shift via Simulated Maximal Deployment Gaps | Yifan Zhu et.al. | 2408.02949v1 | null |
2024-08-05 | Interactive 3D Medical Image Segmentation with SAM 2 | Chuyun Shen et.al. | 2408.02635v1 | link |
2024-08-05 | Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection | Ting Lei et.al. | 2408.02484v1 | link |
2024-08-07 | TGS: Trajectory Generation and Selection using Vision Language Models in Mapless Outdoor Environments | Daeun Song et.al. | 2408.02454v2 | null |
2024-08-05 | Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages | Carlos Mullov et.al. | 2408.02290v1 | null |
2024-08-05 | Geometric Algebra Meets Large Language Models: Instruction-Based Transformations of Separate Meshes in 3D, Interactive and Controllable Scenes | Dimitris Angelis et.al. | 2408.02275v1 | null |
2024-08-05 | Explain via Any Concept: Concept Bottleneck Model with Open Vocabulary Concepts | Andong Tan et.al. | 2408.02265v1 | null |
2024-08-05 | Evaluating Vision-Language Models for Zero-Shot Detection, Classification, and Association of Motorcycles, Passengers, and Helmets | Lucas Choi et.al. | 2408.02244v1 | null |
2024-08-05 | Do Large Language Models Speak All Languages Equally? A Comparative Study in Low-Resource Settings | Md. Arid Hasan et.al. | 2408.02237v1 | null |
2024-08-05 | ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning | Yuxuan Wang et.al. | 2408.02210v1 | null |
2024-08-05 | Large-scale Deployment of Vision-based Tactile Sensors on Multi-fingered Grippers | Meng Wang et.al. | 2408.02206v1 | null |
2024-08-02 | Improving Multilingual Neural Machine Translation by Utilizing Semantic and Linguistic Features | Mengyu Bu et.al. | 2408.01394v1 | link |
2024-08-02 | Toward Automatic Relevance Judgment using Vision-Language Models for Image-Text Retrieval Evaluation | Jheng-Hong Yang et.al. | 2408.01363v1 | null |
2024-08-02 | Prompt Refinement or Fine-tuning? Best Practices for using LLMs in Computational Social Science Tasks | Anders Giovanni Møller et.al. | 2408.01346v1 | null |
2024-08-02 | Out-Of-Distribution Detection for Audio-visual Generalized Zero-Shot Learning: A General Framework | Liuyuan Wen et.al. | 2408.01284v1 | link |
2024-08-02 | HeteroMorpheus: Universal Control Based on Morphological Heterogeneity Modeling | YiFan Hao et.al. | 2408.01230v1 | link |
2024-08-05 | Agentic LLM Workflows for Generating Patient-Friendly Medical Reports | Malavikha Sudarshan et.al. | 2408.01112v2 | link |
2024-08-02 | An Encoding-Searching Separation Perspective on Bi-Encoder Neural Search | Hung-Nghiep Tran et.al. | 2408.01094v1 | null |
2024-08-02 | UNER: A Unified Prediction Head for Named Entity Recognition in Visually-rich Documents | Yi Tu et.al. | 2408.01038v1 | null |
2024-08-01 | Towards Zero-Shot Annotation of the Built Environment with Vision-Language Models (Vision Paper) | Bin Han et.al. | 2408.00932v1 | null |
2024-08-01 | Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation | Siyu Jiao et.al. | 2408.00744v1 | link |
2024-08-01 | Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions | Guangzhi Xiong et.al. | 2408.00727v1 | link |
2024-08-01 | SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data | Yichen Lu et.al. | 2408.00624v1 | link |
2024-08-01 | A new approach for encoding code and assisting code understanding | Mengdan Fan et.al. | 2408.00521v1 | null |
2024-08-01 | GalleryGPT: Analyzing Paintings with Large Multimodal Models | Yi Bin et.al. | 2408.00491v1 | link |
2024-08-01 | SF-TIM: A Simple Framework for Enhancing Quadrupedal Robot Jumping Agility by Combining Terrain Imagination and Measurement | Ze Wang et.al. | 2408.00486v1 | null |
2024-08-01 | Few-shot Defect Image Generation based on Consistency Modeling | Qingfeng Shi et.al. | 2408.00372v1 | link |
2024-08-01 | IN-Sight: Interactive Navigation through Sight | Philipp Schoch et.al. | 2408.00343v1 | null |
2024-07-31 | Open-Vocabulary Audio-Visual Semantic Segmentation | Ruohao Guo et.al. | 2407.21721v1 | null |
2024-07-31 | Zero-Shot Cross-Domain Dialogue State Tracking via Dual Low-Rank Adaptation | Xiang Luo et.al. | 2407.21633v1 | link |
2024-07-31 | EZSR: Event-based Zero-Shot Recognition | Yan Yang et.al. | 2407.21616v1 | null |
2024-07-31 | Fine-gained Zero-shot Video Sampling | Dengsheng Chen et.al. | 2407.21475v1 | null |
2024-07-31 | Generalized Tampered Scene Text Detection in the era of Generative AI | Chenfan Qu et.al. | 2407.21422v1 | null |
2024-07-31 | Tree-of-Traversals: A Zero-Shot Reasoning Algorithm for Augmenting Black-box Language Models with Knowledge Graphs | Elan Markowitz et.al. | 2407.21358v1 | link |
2024-07-31 | DEF-oriCORN: efficient 3D scene understanding for robust language-directed manipulation without demonstrations | Dongwon Son et.al. | 2407.21267v1 | null |
2024-07-30 | Learning Stable Robot Grasping with Transformer-based Tactile Control Policies | En Yen Puang et.al. | 2407.21172v1 | link |
2024-07-30 | Zero Shot Health Trajectory Prediction Using Transformer | Pawel Renc et.al. | 2407.21124v1 | link |
2024-07-30 | Prompting Encoder Models for Zero-Shot Classification: A Cross-Domain Study in Italian | Serena Auriemma et.al. | 2407.20654v1 | null |
2024-07-30 | Pruning Large Language Models with Semi-Structural Adaptive Sparse Training | Weiyu Huang et.al. | 2407.20584v1 | link |
2024-07-29 | Evaluating Large Language Models for automatic analysis of teacher simulations | David de-Fitero-Dominguez et.al. | 2407.20360v1 | null |
2024-07-29 | Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing | Ekaterina Iakovleva et.al. | 2407.20232v1 | null |
2024-07-29 | QAEA-DR: A Unified Text Augmentation Framework for Dense Retrieval | Hongming Tan et.al. | 2407.20207v1 | null |
2024-07-29 | Diffusion Feedback Helps CLIP See Better | Wenxuan Wang et.al. | 2407.20171v1 | link |
2024-07-29 | Generative Diffusion Model Bootstraps Zero-shot Classification of Fetal Ultrasound Images In Underrepresented African Populations | Fangyijie Wang et.al. | 2407.20072v1 | link |
2024-07-29 | Leveraging Foundation Models for Zero-Shot IoT Sensing | Dinghao Xue et.al. | 2407.19893v1 | link |
2024-07-29 | Map2Traj: Street Map Piloted Zero-shot Trajectory Generation with Diffusion Model | Zhenyu Tao et.al. | 2407.19765v1 | null |
2024-07-29 | Enhancing Code Translation in Language Models with Few-Shot Learning via Retrieval-Augmented Generation | Manish Bhattarai et.al. | 2407.19619v1 | null |
2024-07-29 | AgEval: A Benchmark for Zero-Shot and Few-Shot Plant Stress Phenotyping with Multimodal LLMs | Muhammad Arbab Arshad et.al. | 2407.19617v1 | null |
2024-07-28 | XLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training | Biao Wu et.al. | 2407.19546v1 | link |
2024-07-28 | Motamot: A Dataset for Revealing the Supremacy of Large Language Models over Transformer Models in Bengali Political Sentiment Analysis | Fatema Tuj Johora Faria et.al. | 2407.19528v1 | link |
2024-07-26 | Automatic Detection of Moral Values in Music Lyrics | Vjosa Preniqi et.al. | 2407.18787v1 | link |
2024-07-26 | Adversarial Robustification via Text-to-Image Diffusion Models | Daewon Choi et.al. | 2407.18658v1 | link |
2024-07-29 | Speech Bandwidth Expansion Via High Fidelity Generative Adversarial Networks | Mahmoud Salhab et.al. | 2407.18571v2 | null |
2024-07-26 | Is larger always better? Evaluating and prompting large language models for non-generative medical tasks | Yinghao Zhu et.al. | 2407.18525v1 | link |
2024-07-26 | Lensless fiber endomicroscopic phase imaging with speckle-conditioned diffusion model | Zhaoqing Chen et.al. | 2407.18456v1 | null |
2024-07-26 | HybridDepth: Robust Depth Fusion for Mobile AR by Leveraging Depth from Focus and Single-Image Priors | Ashkan Ganj et.al. | 2407.18443v1 | link |
2024-07-25 | HDL-GPT: High-Quality HDL is All You Need | Bhuvnesh Kumar et.al. | 2407.18423v1 | null |
2024-07-25 | Adapting Mouse Pathological Model to Human Glomerular Lesion Segmentation | Lining Yu et.al. | 2407.18390v1 | null |
2024-07-25 | Robust Claim Verification Through Fact Detection | Nazanin Jafari et.al. | 2407.18367v1 | link |
2024-07-25 | SSTD: Stripe-Like Space Target Detection using Single-Point Supervision | Zijian Zhu et.al. | 2407.18097v1 | null |
2024-07-25 | Audio Entailment: Assessing Deductive Reasoning for Audio Understanding | Soham Deshmukh et.al. | 2407.18062v1 | link |
2024-07-25 | Difficulty Estimation and Simplification of French Text Using LLMs | Henri Jamet et.al. | 2407.18061v1 | null |
2024-07-25 | I can listen but cannot read: An evaluation of two-tower multimodal systems for instrument recognition | Yannis Vasilakis et.al. | 2407.18058v1 | link |
2024-07-25 | Amortized Active Learning for Nonparametric Functions | Cen-You Li et.al. | 2407.17992v1 | null |
2024-07-25 | BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation | Xiang Zhang et.al. | 2407.17952v1 | null |
2024-07-25 | DAM: Towards A Foundation Model for Time Series Forecasting | Luke Darlow et.al. | 2407.17880v1 | null |
2024-07-25 | Exploring Description-Augmented Dataless Intent Classification | Ruoyu Hu et.al. | 2407.17862v1 | link |
2024-07-25 | Scaling A Simple Approach to Zero-Shot Speech Recognition | Jinming Zhao et.al. | 2407.17852v1 | link |
2024-07-24 | Large Language Models for Anomaly Detection in Computational Workflows: from Supervised Fine-Tuning to In-Context Learning | Hongwei Jin et.al. | 2407.17545v1 | link |
2024-07-24 | 3D Question Answering for City Scene Understanding | Penglei Sun et.al. | 2407.17398v1 | null |
2024-07-24 | Label Alignment and Reassignment with Generalist Large Language Model for Enhanced Cross-Domain Named Entity Recognition | Ke Bao et.al. | 2407.17344v1 | null |
2024-07-24 | Multi-label Cluster Discrimination for Visual Representation Learning | Xiang An et.al. | 2407.17331v1 | link |
2024-07-24 | DarSwin-Unet: Distortion Aware Encoder-Decoder Architecture | Akshaya Athwale et.al. | 2407.17328v1 | null |
2024-07-24 | Graph Neural Networks: A suitable Alternative to MLPs in Latent 3D Medical Image Classification? | Johannes Kiechle et.al. | 2407.17219v1 | link |
2024-07-24 | Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech SpeechT5 Model | Jan Lehečka et.al. | 2407.17167v1 | null |
2024-07-23 | PlantTrack: Task-Driven Plant Keypoint Tracking with Zero-Shot Sim2Real Transfer | Samhita Marri et.al. | 2407.16829v1 | null |
2024-07-23 | Fusion and Cross-Modal Transfer for Zero-Shot Human Action Recognition | Abhi Kamboj et.al. | 2407.16803v1 | null |
2024-07-23 | Category-Extensible Out-of-Distribution Detection via Hierarchical Context Descriptions | Kai Liu et.al. | 2407.16725v1 | link |
2024-07-23 | Lawma: The Power of Specialization for Legal Tasks | Ricardo Dominguez-Olmedo et.al. | 2407.16615v1 | null |
2024-07-23 | Multimodal Unlearnable Examples: Protecting Data against Multimodal Contrastive Learning | Xinwei Liu et.al. | 2407.16307v1 | link |
2024-07-23 | PreAlign: Boosting Cross-Lingual Transfer by Early Establishment of Multilingual Alignment | Jiahuan Li et.al. | 2407.16222v1 | link |
2024-07-23 | No Re-Train, More Gain: Upgrading Backbones with Diffusion Model for Few-Shot Segmentation | Shuai Chen et.al. | 2407.16182v1 | null |
2024-07-23 | Improved Few-Shot Image Classification Through Multiple-Choice Questions | Dipika Khullar et.al. | 2407.16145v1 | null |
2024-07-22 | Test-Time Low Rank Adaptation via Confidence Maximization for Zero-Shot Generalization of Vision-Language Models | Raza Imam et.al. | 2407.15913v1 | link |
2024-07-22 | AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description | Junyu Xie et.al. | 2407.15850v1 | link |
2024-07-22 | Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget | Vikash Sehwag et.al. | 2407.15811v1 | null |
2024-07-22 | AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection | Yunkang Cao et.al. | 2407.15795v1 | link |
2024-07-22 | CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning | Emanuele Frascaroli et.al. | 2407.15793v1 | link |
2024-07-22 | Zero-Shot Embeddings Inform Learning and Forgetting with Vision-Language Encoders | Laura Niss et.al. | 2407.15731v1 | null |
2024-07-23 | Multi-Modality Co-Learning for Efficient Skeleton-based Action Recognition | Jinfu Liu et.al. | 2407.15706v2 | link |
2024-07-22 | SLVideo: A Sign Language Video Moment Retrieval Framework | Gonçalo Vinagre Martins et.al. | 2407.15668v1 | null |
2024-07-23 | Visual-Semantic Decomposition and Partial Alignment for Document-based Zero-Shot Learning | Xiangyan Qu et.al. | 2407.15613v2 | link |
2024-07-22 | High-flexibility reconstruction of small-scale motions in wall turbulence using a generalized zero-shot learning | Haokai Wu et.al. | 2407.15604v1 | null |
2024-07-22 | X-Recon: Learning-based Patient-specific High-Resolution CT Reconstruction from Orthogonal X-Ray Images | Yunpeng Wang et.al. | 2407.15356v1 | link |
2024-07-19 | Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models | Xuenan Xu et.al. | 2407.14355v1 | link |
2024-07-19 | Multimodal Misinformation Detection using Large Vision-Language Models | Sahar Tahmasebi et.al. | 2407.14321v1 | null |
2024-07-19 | Foundation Models for Autonomous Robots in Unstructured Environments | Hossein Naderi et.al. | 2407.14296v1 | null |
2024-07-19 | OpenSU3D: Open World 3D Scene Understanding using Foundation Models | Rafay Mohiuddin et.al. | 2407.14279v1 | null |
2024-07-19 | ESP-MedSAM: Efficient Self-Prompting SAM for Universal Domain-Generalized Medical Image Segmentation | Qing Xu et.al. | 2407.14153v1 | link |
2024-07-19 | Zero-Shot Underwater Gesture Recognition | Sandipan Sarma et.al. | 2407.14103v1 | link |
2024-07-19 | Multi-modal Relation Distillation for Unified 3D Representation Learning | Huiqun Wang et.al. | 2407.14007v1 | null |
2024-07-19 | Enhancing Data-Limited Graph Neural Networks by Actively Distilling Knowledge from Large Language Models | Quan Li et.al. | 2407.13989v1 | null |
2024-07-18 | Attention Based Simple Primitives for Open World Compositional Zero-Shot Learning | Ans Munir et.al. | 2407.13715v1 | link |
2024-07-18 | MeshSegmenter: Zero-Shot Mesh Semantic Segmentation via Texture Synthesis | Ziming Zhong et.al. | 2407.13675v1 | link |
2024-07-18 | Robust Calibration of Large Vision-Language Adapters | Balamurali Murugesan et.al. | 2407.13588v1 | link |
2024-07-18 | Towards Zero-Shot Multimodal Machine Translation | Matthieu Futeral et.al. | 2407.13579v1 | link |
2024-07-18 | Pushing the Limits of Reactive Planning: Learning to Escape Local Minima | Isar Meijer et.al. | 2407.13530v1 | null |
2024-07-18 | INDIC QA BENCHMARK: A Multilingual Benchmark to Evaluate Question Answering capability of LLMs for Indic Languages | Abhishek Kumar Singh et.al. | 2407.13522v1 | null |
2024-07-18 | Can Open-Source LLMs Compete with Commercial Models? Exploring the Few-Shot Performance of Current GPT Models in Biomedical Tasks | Samy Ateia et.al. | 2407.13511v1 | link |
2024-07-18 | SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders | Sheng-Wei Li et.al. | 2407.13460v1 | link |
2024-07-18 | BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models | Moon Ye-Bin et.al. | 2407.13442v1 | null |
2024-07-18 | Open-World Visual Reasoning by a Neuro-Symbolic Program of Zero-Shot Symbols | Gertjan Burghouts et.al. | 2407.13382v1 | null |
2024-07-17 | Zero-shot Text-guided Infinite Image Synthesis with LLM guidance | Soyeong Kwon et.al. | 2407.12642v1 | null |
2024-07-17 | Evaluating the transferability potential of deep learning models for climate downscaling | Ayush Prasad et.al. | 2407.12517v1 | null |
2024-07-17 | Evaluating Linguistic Capabilities of Multimodal LLMs in the Lens of Few-Shot Learning | Mustafa Dogan et.al. | 2407.12498v1 | null |
2024-07-17 | TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish | Arda Yüksel et.al. | 2407.12402v1 | link |
2024-07-17 | Exploring Deeper! Segment Anything Model with Depth Perception for Camouflaged Object Detection | Zhenni Yu et.al. | 2407.12339v1 | link |
2024-07-17 | ModalChorus: Visual Probing and Alignment of Multi-modal Embeddings via Modal Fusion Map | Yilin Ye et.al. | 2407.12315v1 | link |
2024-07-17 | VCP-CLIP: A visual context prompting model for zero-shot anomaly segmentation | Zhen Qu et.al. | 2407.12276v1 | link |
2024-07-17 | Compound Expression Recognition via Multi Model Ensemble for the ABAW7 Challenge | Xuxiong Liu et.al. | 2407.12257v1 | null |
2024-07-17 | Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech | Haibin Wu et.al. | 2407.12229v1 | link |
2024-07-16 | Scaling Sign Language Translation | Biao Zhang et.al. | 2407.11855v1 | null |
2024-07-16 | Zero-shot Cross-Lingual Transfer for Synthetic Data Generation in Grammatical Error Detection | Gaetan Lopez Latouche et.al. | 2407.11854v1 | null |
2024-07-16 | Mask-guided cross-image attention for zero-shot in-silico histopathologic image generation with a diffusion model | Dominik Winter et.al. | 2407.11664v1 | null |
2024-07-16 | A Comprehensive Evaluation of Large Language Models on Temporal Event Forecasting | He Chang et.al. | 2407.11638v1 | null |
2024-07-16 | DiNO-Diffusion. Scaling Medical Diffusion via Self-Supervised Pre-Training | Guillermo Jimenez-Perez et.al. | 2407.11594v1 | null |
2024-07-16 | Bootstrapped Pre-training with Dynamic Identifier Prediction for Generative Retrieval | Yubao Tang et.al. | 2407.11504v1 | null |
2024-07-16 | Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes | Zhi Cai et.al. | 2407.11464v1 | link |
2024-07-16 | InvAgent: A Large Language Model based Multi-Agent System for Inventory Management in Supply Chains | Yinzhu Quan et.al. | 2407.11384v1 | link |
2024-07-16 | Large Vision-Language Models as Emotion Recognizers in Context Awareness | Yuxuan Lei et.al. | 2407.11300v1 | null |
2024-07-16 | Zero-Shot Adaptation for Approximate Posterior Sampling of Diffusion Models in Inverse Problems | Yaşar Utku Alçalar et.al. | 2407.11288v1 | null |
2024-07-15 | Motion-prior Contrast Maximization for Dense Continuous-Time Motion Estimation | Friedhelm Hamann et.al. | 2407.10802v1 | link |
2024-07-15 | Graphusion: Leveraging Large Language Models for Scientific Knowledge Graph Fusion and Construction in NLP Education | Rui Yang et.al. | 2407.10794v1 | link |
2024-07-15 | Codebook LLMs: Adapting Political Science Codebooks for LLM Use and Adapting LLMs to Follow Codebooks | Andrew Halterman et.al. | 2407.10747v1 | null |
2024-07-15 | Anticipating Future Object Compositions without Forgetting | Youssef Zahran et.al. | 2407.10723v1 | null |
2024-07-16 | Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning | Yulong Wang et.al. | 2407.10718v2 | link |
2024-07-15 | | Fengyu Cai et.al. | 2407.10691v1 | link |
2024-07-15 | OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer | Yu Wang et.al. | 2407.10655v1 | link |
2024-07-16 | Back to Newton's Laws: Learning Vision-based Agile Flight via Differentiable Physics | Yuang Zhang et.al. | 2407.10648v2 | null |
2024-07-15 | Towards zero-shot amplifier modeling: One-to-many amplifier modeling via tone embedding control | Yu-Hua Chen et.al. | 2407.10646v1 | null |
2024-07-15 | Boosting Zero-Shot Crosslingual Performance using LLM-Based Augmentations with Effective Data Selection | Barah Fazili et.al. | 2407.10582v1 | link |
2024-07-12 | Adaptive Prediction Ensemble: Improving Out-of-Distribution Generalization of Motion Forecasting | Jinning Li et.al. | 2407.09475v1 | null |
2024-07-12 | From Easy to Hard: Learning Curricular Shape-aware Features for Robust Panoptic Scene Graph Generation | Hanrong Shi et.al. | 2407.09191v1 | null |
2024-07-12 | STD-LLM: Understanding Both Spatial and Temporal Properties of Spatial-Temporal Data with LLMs | Yiheng Huang et.al. | 2407.09096v1 | null |
2024-07-12 | OVExp: Open Vocabulary Exploration for Object-Oriented Navigation | Meng Wei et.al. | 2407.09016v1 | null |
2024-07-15 | Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation | Biqing Qi et.al. | 2407.08940v2 | link |
2024-07-11 | DegustaBot: Zero-Shot Visual Preference Estimation for Personalized Multi-Object Rearrangement | Benjamin A. Newman et.al. | 2407.08876v1 | null |
2024-07-11 | Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification | Wenshuo Peng et.al. | 2407.08787v1 | null |
2024-07-11 | Real-Time Anomaly Detection and Reactive Planning with Large Language Models | Rohan Sinha et.al. | 2407.08735v1 | null |
2024-07-11 | Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data | Cherie Ho et.al. | 2407.08726v1 | null |
2024-07-11 | HACMan++: Spatially-Grounded Motion Primitives for Manipulation | Bowen Jiang et.al. | 2407.08585v1 | null |
2024-07-11 | Tactics, Techniques, and Procedures (TTPs) in Interpreted Malware: A Zero-Shot Generation with Large Language Models | Ying Zhang et.al. | 2407.08532v1 | null |
2024-07-11 | Emergent Visual-Semantic Hierarchies in Image-Text Representations | Morris Alper et.al. | 2407.08521v1 | link |
2024-07-11 | Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Cross-Regularization | Jinlong Li et.al. | 2407.08374v1 | null |
2024-07-11 | Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation | Tong Shao et.al. | 2407.08268v1 | link |
2024-07-11 | Adaptive Compressed Sensing with Diffusion-Based Posterior Sampling | Noam Elata et.al. | 2407.08256v1 | null |
2024-07-11 | Leveraging LLMs to Predict Affective States via Smartphone Sensor Features | Tianyi Zhang et.al. | 2407.08240v1 | null |
2024-07-11 | Spiking Tucker Fusion Transformer for Audio-Visual Zero-Shot Learning | Wenrui Li et.al. | 2407.08130v1 | null |
2024-07-10 | Learning In-Hand Translation Using Tactile Skin With Shear and Normal Force Sensing | Jessica Yin et.al. | 2407.07885v1 | null |
2024-07-11 | Toto: Time Series Optimized Transformer for Observability | Ben Cohen et.al. | 2407.07874v2 | null |
2024-07-10 | OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion | Hao Wang et.al. | 2407.07844v1 | link |
2024-07-10 | Systematic Evaluation of Neural Retrieval Models on the Touché 2020 Argument Retrieval Subset of BEIR | Nandan Thakur et.al. | 2407.07790v1 | link |
2024-07-11 | SaMoye: Zero-shot Singing Voice Conversion Based on Feature Disentanglement and Synthesis | Zihao Wang et.al. | 2407.07728v2 | link |
2024-07-10 | Scaling Law in Neural Data: Non-Invasive Speech Decoding with 175 Hours of EEG Data | Motoshige Sato et.al. | 2407.07595v1 | null |
2024-07-10 | Let Occ Flow: Self-Supervised 3D Occupancy Flow Prediction | Yili Liu et.al. | 2407.07587v1 | null |
2024-07-11 | InstructLayout: Instruction-Driven 2D and 3D Layout Synthesis with Semantic Graph Prior | Chenguo Lin et.al. | 2407.07580v2 | null |
2024-07-10 | Swiss DINO: Efficient and Versatile Vision Framework for On-device Personal Object Search | Kirill Paramonov et.al. | 2407.07541v1 | link |
2024-07-10 | IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection | Mingjin Zhang et.al. | 2407.07520v1 | link |
2024-07-09 | Induction Heads as an Essential Mechanism for Pattern Matching in In-context Learning | J. Crosbie et.al. | 2407.07011v1 | null |
2024-07-09 | Measuring Sustainability Intention of ESG Fund Disclosure using Few-Shot Learning | Mayank Singh et.al. | 2407.06893v1 | null |
2024-07-09 | Rethinking Image-to-Video Adaptation: An Object-centric Perspective | Rui Qian et.al. | 2407.06871v1 | null |
2024-07-09 | PDEformer-1: A Foundation Model for One-Dimensional Partial Differential Equations | Zhanhong Ye et.al. | 2407.06664v1 | null |
2024-07-09 | Variational Zero-shot Multispectral Pansharpening | Xiangyu Rui et.al. | 2407.06633v1 | link |
2024-07-09 | CEIA: CLIP-Based Event-Image Alignment for Open-World Event-Based Understanding | Wenhao Xu et.al. | 2407.06611v1 | null |
2024-07-09 | VQA-Diff: Exploiting VQA and Diffusion for Zero-Shot Image-to-3D Vehicle Asset Generation in Autonomous Driving | Yibo Liu et.al. | 2407.06516v1 | null |
2024-07-08 | CodeCSE: A Simple Multilingual Model for Code and Comment Sentence Embeddings | Anthony Varkey et.al. | 2407.06360v1 | link |
2024-07-08 | CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation | Xinying Guo et.al. | 2407.06188v1 | null |
2024-07-08 | C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition | Rongchang Li et.al. | 2407.06113v1 | link |
2024-07-08 | Pseudo-triplet Guided Few-shot Composed Image Retrieval | Bohan Hou et.al. | 2407.06001v1 | null |
2024-07-08 | Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation | Jiaqi Chen et.al. | 2407.05890v1 | null |
2024-07-08 | HyCIR: Boosting Zero-Shot Composed Image Retrieval with Synthetic Labels | Yingying Jiang et.al. | 2407.05795v1 | null |
2024-07-08 | When is the consistent prediction likely to be a correct prediction? | Alex Nguyen et.al. | 2407.05778v1 | null |
2024-07-08 | Is GPT-4 Alone Sufficient for Automated Essay Scoring?: A Comparative Judgment Approach Based on Rater Cognition | Seungju Kim et.al. | 2407.05733v1 | null |
2024-07-08 | Learning to Adapt Category Consistent Meta-Feature of CLIP for Few-Shot Classification | Jiaying Shi et.al. | 2407.05647v1 | null |
2024-07-08 | GenFollower: Enhancing Car-Following Prediction with Large Language Models | Xianda Chen et.al. | 2407.05611v1 | null |
2024-07-08 | Open-world Multi-label Text Classification with Extremely Weak Supervision | Xintong Li et.al. | 2407.05609v1 | link |
2024-07-05 | LaRa: Efficient Large-Baseline Radiance Fields | Anpei Chen et.al. | 2407.04699v1 | null |
2024-07-05 | ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models | Yuzhe Gu et.al. | 2407.04693v1 | link |
2024-07-05 | RAM: Retrieval-Based Affordance Transfer for Generalizable Zero-Shot Robotic Manipulation | Yuxuan Kuang et.al. | 2407.04689v1 | link |
2024-07-05 | Entity Decomposition with Filtering: A Zero-Shot Clinical Named Entity Recognition Framework | Reza Averly et.al. | 2407.04629v1 | null |
2024-07-05 | AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation | Yuhan Zhu et.al. | 2407.04603v1 | link |
2024-07-05 | GPT vs RETRO: Exploring the Intersection of Retrieval and Parameter-Efficient Fine-Tuning | Aleksander Ficek et.al. | 2407.04528v1 | null |
2024-07-05 | AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents | Petr Anokhin et.al. | 2407.04363v1 | link |
2024-07-05 | Elevating All Zero-Shot Sketch-Based Image Retrieval Through Multimodal Prompt Learning | Mainak Singha et.al. | 2407.04207v1 | link |
2024-07-04 | Query-Guided Self-Supervised Summarization of Nursing Notes | Ya Gao et.al. | 2407.04125v1 | null |
2024-07-04 | FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs | Tongyi SpeechTeam et.al. | 2407.04051v1 | link |
2024-07-03 | Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation | Marco Mistretta et.al. | 2407.03056v1 | link |
2024-07-03 | SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning | Bac Nguyen et.al. | 2407.03036v1 | null |
2024-07-03 | FSM: A Finite State Machine Based Zero-Shot Prompting Paradigm for Multi-Hop Question Answering | Xiaochen Wang et.al. | 2407.02964v1 | null |
2024-07-03 | LANE: Logic Alignment of Non-tuning Large Language Models and Online Recommendation Systems for Explainable Reason Generation | Hongke Zhao et.al. | 2407.02833v1 | null |
2024-07-03 | ZEAL: Surgical Skill Assessment with Zero-shot Tool Inference Using Unified Foundation Model | Satoshi Kondo et.al. | 2407.02738v1 | null |
2024-07-02 | LLM-Select: Feature Selection with Large Language Models | Daniel P. Jeong et.al. | 2407.02694v1 | null |
2024-07-02 | Open Panoramic Segmentation | Junwei Zheng et.al. | 2407.02685v1 | link |
2024-07-02 | Lung-CADex: Fully automatic Zero-Shot Detection and Classification of Lung Nodules in Thoracic CT Images | Furqan Shaukat et.al. | 2407.02625v1 | null |
2024-07-02 | Open Scene Graphs for Open World Object-Goal Navigation | Joel Loo et.al. | 2407.02473v1 | null |
2024-07-02 | SafaRi: Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation | Sayan Nag et.al. | 2407.02389v1 | null |
2024-07-02 | Exploring the Role of Transliteration in In-Context Learning for Low-resource Languages Written in Non-Latin Scripts | Chunlan Ma et.al. | 2407.02320v1 | null |
2024-07-02 | Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization | Yuchen Hu et.al. | 2407.02243v1 | null |
2024-07-02 | FineCLIPER: Multi-modal Fine-grained CLIP for Dynamic Facial Expression Recognition with AdaptERs | Haodong Chen et.al. | 2407.02157v1 | null |
2024-07-02 | Zero-shot Video Restoration and Enhancement Using Pre-Trained Image Diffusion Model | Cong Cao et.al. | 2407.01960v1 | null |
2024-07-02 | Text-Aware Diffusion for Policy Learning | Calvin Luo et.al. | 2407.01903v1 | null |
2024-07-01 | DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models | Chang-Han Yeh et.al. | 2407.01519v1 | link |
2024-07-01 | Semantic Compositions Enhance Vision-Language Contrastive Learning | Maxwell Aladago et.al. | 2407.01408v1 | null |
2024-07-01 | PanopticRecon: Leverage Open-vocabulary Instance Segmentation for Zero-shot Panoptic Reconstruction | Xuan Yu et.al. | 2407.01349v1 | null |
2024-06-28 | STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical | Guohao Sun et.al. | 2406.19973v1 | link |
2024-06-28 | Perception Stitching: Zero-Shot Perception Encoder Transfer for Visuomotor Robot Policies | Pingcheng Jian et.al. | 2406.19971v1 | null |
2024-06-28 | Untangling the Unrestricted Web: Automatic Identification of Multilingual Registers | Erik Henriksson et.al. | 2406.19892v1 | link |
2024-06-28 | Detecting Subtle Differences between Human and Model Languages Using Spectrum of Relative Likelihood | Yang Xu et.al. | 2406.19874v1 | link |
2024-06-27 | Leveraging Machine-Generated Rationales to Facilitate Social Meaning Detection in Conversations | Ritam Dutt et.al. | 2406.19545v1 | link |
2024-06-27 | The Model Arena for Cross-lingual Sentiment Analysis: A Comparative Study in the Era of Large Language Models | Xiliang Zhu et.al. | 2406.19358v1 | null |
2024-06-27 | IndoToxic2024: A Demographically-Enriched Dataset of Hate Speech and Toxicity Types for Indonesian Language | Lucky Susanto et.al. | 2406.19349v1 | null |
2024-06-27 | Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment | Hao Fei et.al. | 2406.19255v1 | null |
2024-06-30 | Segment Anything Model for automated image data annotation: empirical studies using text prompts from Grounding DINO | Fuseini Mumuni et.al. | 2406.19057v2 | null |
2024-06-27 | Zero-shot domain adaptation based on dual-level mix and contrast | Yu Zhe et.al. | 2406.18996v1 | null |
2024-06-28 | Manipulate-Anything: Automating Real-World Robots using Vision-Language Models | Jiafei Duan et.al. | 2406.18915v2 | null |
2024-06-27 | DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment | Ke-Han Lu et.al. | 2406.18871v1 | null |
2024-06-27 | Advancing Cross-domain Discriminability in Continual Learning of Vision-Language Models | Yicheng Xu et.al. | 2406.18868v1 | link |
2024-06-27 | Dense Monocular Motion Segmentation Using Optical Flow and Pseudo Depth Map: A Zero-Shot Approach | Yuxiang Huang et.al. | 2406.18837v1 | null |
2024-06-27 | Zero-shot Composed Image Retrieval Considering Query-target Relationship Leveraging Masked Image-text Pairs | Huaying Zhang et.al. | 2406.18836v1 | null |
2024-06-26 | Role-Play Zero-Shot Prompting with Large Language Models for Open-Domain Human-Machine Conversation | Ahmed Njifenjou et.al. | 2406.18460v1 | null |
2024-06-26 | Zero-shot prompt-based classification: topic labeling in times of foundation models in German Tweets | Simon Münker et.al. | 2406.18239v1 | null |
2024-06-26 | Open-vocabulary Mobile Manipulation in Unseen Dynamic Environments with 3D Semantic Maps | Dicong Qiu et.al. | 2406.18115v1 | null |
2024-06-26 | Boosting Soft Q-Learning by Bounding | Jacob Adamczyk et.al. | 2406.18033v1 | link |
2024-06-26 | E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS | Sefik Emre Eskimez et.al. | 2406.18009v1 | link |
2024-06-26 | Changen2: Multi-Temporal Remote Sensing Generative Change Foundation Model | Zhuo Zheng et.al. | 2406.17998v1 | link |
2024-06-25 | Evaluating Fairness in Large Vision-Language Models Across Diverse Demographic Attributes and Prompts | Xuyang Wu et.al. | 2406.17974v1 | link |
2024-06-25 | Efficient Document Ranking with Learnable Late Interactions | Ziwei Ji et.al. | 2406.17968v1 | null |
2024-06-25 | The Overcooked Generalisation Challenge | Constantin Ruhdorfer et.al. | 2406.17949v1 | null |
2024-06-25 | CTBench: A Comprehensive Benchmark for Evaluating Language Model Capabilities in Clinical Trial Design | Nafis Neehal et.al. | 2406.17888v1 | link |
2024-06-25 | Arboretum: A Large Multimodal Dataset Enabling AI for Biodiversity | Chih-Hsuan Yang et.al. | 2406.17720v1 | link |
2024-06-25 | LaTable: Towards Large Tabular Models | Boris van Breugel et.al. | 2406.17673v1 | null |
2024-06-26 | SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond | Marco Comunità et.al. | 2406.17672v2 | null |
2024-06-26 | Mitigate the Gap: Investigating Approaches for Improving Cross-Modal Alignment in CLIP | Sedigheh Eslami et.al. | 2406.17639v2 | link |
2024-06-25 | Advancing Cell Detection in Anterior Segment Optical Coherence Tomography Images | Boyu Chen et.al. | 2406.17577v1 | link |
2024-06-25 | High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model | Joun Yeop Lee et.al. | 2406.17310v1 | null |
2024-06-25 | Zero-Shot Long-Form Video Understanding through Screenplay | Yongliang Wu et.al. | 2406.17309v1 | null |
2024-06-24 | CLERC: A Dataset for Legal Case Retrieval and Retrieval-Augmented Analysis Generation | Abe Bohan Hou et.al. | 2406.17186v1 | link |
2024-06-24 | Multi-LogiEval: Towards Evaluating Multi-Step Logical Reasoning Ability of Large Language Models | Nisarg Patel et.al. | 2406.17169v1 | link |
2024-06-24 | Vastextures: Vast repository of textures and PBR materials extracted from real-world images using unsupervised methods | Sagi Eppel et.al. | 2406.17146v1 | null |
2024-06-24 | Can Quantum Computers Do Nothing? | Alexander Nico-Katz et.al. | 2406.16861v1 | null |
2024-06-24 | USDC: A Dataset of $\underline{U}$ser $\underline{S}$tance and $\underline{D}$ogmatism in Long | Mounika Marreddy et.al. | 2406.16833v1 | null |
2024-06-25 | Towards Zero-Shot Text-To-Speech for Arabic Dialects | Khai Duy Doan et.al. | 2406.16751v2 | null |
2024-06-24 | Evaluation of Language Models in the Medical Context Under Resource-Constrained Settings | Andrea Posada et.al. | 2406.16611v1 | link |
2024-06-24 | eagerlearners at SemEval2024 Task 5: The Legal Argument Reasoning Task in Civil Procedure | Hoorieh Sabzevari et.al. | 2406.16490v1 | link |
2024-06-24 | UniPSDA: Unsupervised Pseudo Semantic Data Augmentation for Zero-Shot Cross-Lingual Natural Language Understanding | Dongyang Li et.al. | 2406.16372v1 | link |
2024-06-24 | EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records | Yeonsu Kwon et.al. | 2406.16341v1 | link |
2024-06-24 | DemoRank: Selecting Effective Demonstrations for Large Language Models in Ranking Task | Wenhan Liu et.al. | 2406.16332v1 | link |
2024-06-24 | Anomaly Detection of Tabular Data Using LLMs | Aodong Li et.al. | 2406.16308v1 | null |
2024-06-24 | LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments | Zixia Jia et.al. | 2406.16294v1 | link |
2024-06-21 | Masked Extended Attention for Zero-Shot Virtual Try-On In The Wild | Nadav Orzech et.al. | 2406.15331v1 | null |
2024-06-21 | LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs | Ziyan Jiang et.al. | 2406.15319v1 | null |
2024-06-21 | Retrieval Augmented Zero-Shot Text Classification | Tassallah Abdullahi et.al. | 2406.15241v1 | link |
2024-06-21 | A LLM-Based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation | Irune Zubiaga et.al. | 2406.15227v1 | link |
2024-06-21 | How Effective is GPT-4 Turbo in Generating School-Level Questions from Textbooks Based on Bloom's Revised Taxonomy? | Subhankar Maity et.al. | 2406.15211v1 | null |
2024-06-21 | Prompting Whisper for QA-driven Zero-shot End-to-end Spoken Language Understanding | Mohan Li et.al. | 2406.15209v1 | null |
2024-06-21 | Latent Space Translation via Inverse Relative Projection | Valentino Maiorca et.al. | 2406.15057v1 | null |
2024-06-21 | Behaviour Distillation | Andrei Lupu et.al. | 2406.15042v1 | link |
2024-06-21 | Contextual Interaction via Primitive-based Adversarial Training For Compositional Zero-shot Learning | Suyi Li et.al. | 2406.14962v1 | link |
2024-06-21 | Sports Intelligence: Assessing the Sports Understanding Capabilities of Language Models through Question Answering from Text to Video | Zhengbang Yang et.al. | 2406.14877v1 | null |
2024-06-20 | Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps | Nikita Starodubcev et.al. | 2406.14539v1 | null |
2024-06-20 | APEER: Automatic Prompt Engineering Enhances Large Language Model Reranking | Can Jin et.al. | 2406.14449v1 | null |
2024-06-20 | Transferable Boltzmann Generators | Leon Klein et.al. | 2406.14426v1 | null |
2024-06-20 | Zero-Shot Image Denoising for High-Resolution Electron Microscopy | Xuanyu Tian et.al. | 2406.14264v1 | link |
2024-06-20 | SeCoKD: Aligning Large Language Models for In-Context Learning with Fewer Shots | Weixing Wang et.al. | 2406.14208v1 | null |
2024-06-20 | A Data-Driven Guided Decoding Mechanism for Diagnostic Captioning | Panagiotis Kaliosis et.al. | 2406.14164v1 | link |
2024-06-20 | One Fling to Goal: Environment-aware Dynamics for Goal-conditioned Fabric Flinging | Linhan Yang et.al. | 2406.14136v1 | null |
2024-06-20 | An Investigation of Prompt Variations for Zero-shot LLM-based Rankers | Shuoqi Sun et.al. | 2406.14117v1 | link |
2024-06-20 | Understanding Different Design Choices in Training Large Time Series Models | Yu-Neng Chuang et.al. | 2406.14045v1 | null |
2024-06-20 | Taxonomy-Guided Zero-Shot Recommendations with LLMs | Yueqing Liang et.al. | 2406.14043v1 | link |
2024-06-18 | Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation | Ning-Hsu Wang et.al. | 2406.12849v1 | null |
2024-06-18 | Generating Educational Materials with Different Levels of Readability using LLMs | Chieh-Yang Huang et.al. | 2406.12787v1 | null |
2024-06-18 | MAC: A Benchmark for Multiple Attributes Compositional Zero-Shot Learning | Shuo Xu et.al. | 2406.12757v1 | null |
2024-06-19 | Rationale-based Ensemble of Multiple QA Strategies for Zero-shot Knowledge-based VQA | Miaoyu Li et.al. | 2406.12746v2 | link |
2024-06-18 | Large Language Model as a Universal Clinical Multi-task Decoder | Yujiang Wu et.al. | 2406.12738v1 | null |
2024-06-18 | BIOSCAN-5M: A Multimodal Dataset for Insect Biodiversity | Zahra Gharaee et.al. | 2406.12723v1 | link |
2024-06-18 | GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models | Yongtao Ge et.al. | 2406.12671v1 | link |
2024-06-18 | Efficient and Long-Tailed Generalization for Pre-trained Vision-Language Model | Jiang-Xin Shi et.al. | 2406.12638v1 | link |
2024-06-18 | News Without Borders: Domain Adaptation of Multilingual Sentence Embeddings for Cross-lingual News Recommendation | Andreea Iana et.al. | 2406.12634v1 | link |
2024-06-18 | SeTAR: Out-of-Distribution Detection with Selective Low-Rank Approximation | Yixia Li et.al. | 2406.12629v1 | link |
2024-06-17 | Zero-Shot Generalization during Instruction Tuning: Insights from Similarity and Granularity | Bingxiang He et.al. | 2406.11721v1 | link |
2024-06-17 | TourRank: Utilizing Large Language Models for Documents Ranking with a Tournament-Inspired Strategy | Yiqun Chen et.al. | 2406.11678v1 | link |
2024-06-17 | A Two-dimensional Zero-shot Dialogue State Tracking Evaluation Method using GPT-4 | Ming Gu et.al. | 2406.11651v1 | link |
2024-06-17 | AnyMaker: Zero-shot General Object Customization via Decoupled Dual-Level ID Injection | Lingjie Kong et.al. | 2406.11643v1 | link |
2024-06-17 | Can Many-Shot In-Context Learning Help Long-Context LLM Judges? See More, Judge Better! | Mingyang Song et.al. | 2406.11629v1 | link |
2024-06-17 | Analysing zero-shot temporal relation extraction on clinical notes using temporal consistency | Vasiliki Kougia et.al. | 2406.11486v1 | link |
2024-06-17 | How Far Can In-Context Alignment Go? Exploring the State of In-Context Alignment | Heyan Huang et.al. | 2406.11474v1 | null |
2024-06-17 | Fusion Makes Perfection: An Efficient Multi-Grained Matching Approach for Zero-Shot Relation Extraction | Shilong Li et.al. | 2406.11429v1 | link |
2024-06-17 | DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer | Keon Lee et.al. | 2406.11427v1 | null |
2024-06-17 | BAMBINO-LM: (Bilingual-)Human-Inspired Continual Pretraining of BabyLM | Zhewen Shen et.al. | 2406.11418v1 | null |
2024-06-14 | Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation | Nameer Hirschkind et.al. | 2406.10223v1 | null |
2024-06-14 | Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models | Carson Denison et.al. | 2406.10162v1 | link |
2024-06-14 | Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition | Guinan Li et.al. | 2406.10152v1 | null |
2024-06-14 | Shelf-Supervised Multi-Modal Pre-Training for 3D Object Detection | Mehar Khurana et.al. | 2406.10115v1 | link |
2024-06-14 | dGrasp: NeRF-Informed Implicit Grasp Policies with Supervised Optimization Slopes | Gergely Sóti et.al. | 2406.09939v1 | null |
2024-06-14 | POWN: Prototypical Open-World Node Classification | Marcel Hoffmann et.al. | 2406.09926v1 | link |
2024-06-14 | CliBench: Multifaceted Evaluation of Large Language Models in Clinical Decisions on Diagnoses, Procedures, Lab Tests Orders and Prescriptions | Mingyu Derek Ma et.al. | 2406.09923v1 | link |
2024-06-14 | Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy | Linhan Ma et.al. | 2406.09844v1 | null |
2024-06-14 | Language-Guided Manipulation with Diffusion Policies and Constrained Inpainting | Ce Hao et.al. | 2406.09767v1 | null |
2024-06-14 | Learning Language Structures through Grounding | Freda Shi et.al. | 2406.09662v1 | null |
2024-06-13 | VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | Muhammad Maaz et.al. | 2406.09418v1 | link |
2024-06-13 | Exploring the Spectrum of Visio-Linguistic Compositionality and Recognition | Youngtaek Oh et.al. | 2406.09388v1 | link |
2024-06-13 | Scale-Invariant Monocular Depth Estimation via SSI Depth | S. Mahdi H. Miangoleh et.al. | 2406.09374v1 | null |
2024-06-13 | **Learning from Nat |