Updated on 2025.12.26
2023-7
Pose Estimation
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2023-08-03 | Sim-to-Real Vision-depth Fusion CNNs for Robust Pose Estimation Aboard Autonomous Nano-quadcopter | Luca Crupi et.al. | 2308.01833 | null |
| 2023-08-03 | Active Acoustic Sensing for Robot Manipulation | Shihan Lu et.al. | 2308.01600 | null |
| 2023-08-02 | HANDAL: A Dataset of Real-World Manipulable Object Categories with Pose Annotations, Affordances, and Reconstructions | Andrew Guo et.al. | 2308.01477 | null |
| 2023-08-01 | Human-M3: A Multi-view Multi-modal Dataset for 3D Human Pose Estimation in Outdoor Scenes | Bohao Fan et.al. | 2308.00628 | link |
| 2023-08-01 | Markerless human pose estimation for biomedical applications: a survey | Andrea Avogaro et.al. | 2308.00519 | null |
| 2023-08-01 | Kidnapping Deep Learning-based Multirotors using Optimized Flying Adversarial Patches | Pia Hanfeld et.al. | 2308.00344 | null |
| 2023-08-01 | Fine-Grained Sports, Yoga, and Dance Postures Recognition: A Benchmark Analysis | Asish Bera et.al. | 2308.00323 | null |
| 2023-08-01 | Robust Single-view Cone-beam X-ray Pose Estimation with Neural Tuned Tomography (NeTT) and Masked Neural Radiance Fields (mNeRF) | Chaochao Zhou et.al. | 2308.00214 | null |
| 2023-07-31 | Lightweight Super-Resolution Head for Human Pose Estimation | Haonan Wang et.al. | 2307.16765 | link |
| 2023-07-31 | DiffPose: SpatioTemporal Diffusion Model for Video-Based Human Pose Estimation | Runyang Feng et.al. | 2307.16687 | null |
| 2023-07-30 | Touch if it’s transparent! ACTOR: Active Tactile-based Category-Level Transparent Object Reconstruction | Prajval Kumar Murali et.al. | 2307.16254 | null |
| 2023-07-30 | Successive Pose Estimation and Beam Tracking for mmWave Vehicular Communication Systems | Cen Liu et.al. | 2307.16117 | null |
| 2023-07-29 | Iterative Graph Filtering Network for 3D Human Pose Estimation | Zaedul Islam et.al. | 2307.16074 | link |
| 2023-07-29 | HandMIM: Pose-Aware Self-Supervised Learning for 3D Hand Mesh Estimation | Zuyan Liu et.al. | 2307.16061 | null |
| 2023-07-29 | Effective Whole-body Pose Estimation with Two-stages Distillation | Zhendong Yang et.al. | 2307.15880 | link |
| 2023-07-28 | Revisiting Fully Convolutional Geometric Features for Object 6D Pose Estimation | Jaime Corsetti et.al. | 2307.15514 | null |
| 2023-07-28 | Robust Visual Sim-to-Real Transfer for Robotic Manipulation | Ricardo Garcia et.al. | 2307.15320 | null |
| 2023-07-27 | Weakly Supervised Multi-Modal 3D Human Body Pose Estimation for Autonomous Driving | Peter Bauer et.al. | 2307.14889 | null |
| 2023-07-26 | Attention of Robot Touch: Tactile Saliency Prediction for Robust Sim-to-Real Tactile Control | Yijiong Lin et.al. | 2307.14510 | null |
| 2023-07-28 | CBGL: Fast Monte Carlo Passive Global Localisation of 2D LIDAR Sensor | Alexandros Filotheou et.al. | 2307.14247 | link |
| 2023-07-26 | Deep Robust Multi-Robot Re-localisation in Natural Environments | Milad Ramezani et.al. | 2307.13950 | null |
| 2023-07-25 | Of Mice and Pose: 2D Mouse Pose Estimation from Unlabelled Data and Synthetic Prior | Jose Sosa et.al. | 2307.13361 | null |
| 2023-07-23 | TransNet: Transparent Object Manipulation Through Category-Level Pose Estimation | Huijie Zhang et.al. | 2307.12400 | null |
| 2023-07-25 | FDCT: Fast Depth Completion for Transparent Objects | Tianan Li et.al. | 2307.12274 | link |
| 2023-07-22 | Challenges for Monocular 6D Object Pose Estimation in Robotics | Stefan Thalhammer et.al. | 2307.12172 | null |
| 2023-07-22 | Pyramid Semantic Graph-based Global Point Cloud Registration with Low Overlap | Zhijian Qiao et.al. | 2307.12116 | link |
| 2023-07-22 | Robot Structure Prior Guided Temporal Attention for Camera-to-Robot Pose Estimation from Image Sequence | Yang Tian et.al. | 2307.12106 | null |
Visual Localization
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2023-08-14 | Global Features are All You Need for Image Retrieval and Reranking | Shihao Shao et.al. | 2308.06954 | link |
| 2023-08-14 | MixBCT: Towards Self-Adapting Backward-Compatible Training | Yu Liang et.al. | 2308.06948 | link |
| 2023-08-10 | KS-APR: Keyframe Selection for Robust Absolute Pose Regression | Changkun Liu et.al. | 2308.05459 | null |
| 2023-08-09 | AspectMMKG: A Multi-modal Knowledge Graph with Aspect-aware Entities | Jingdan Zhang et.al. | 2308.04992 | link |
| 2023-08-08 | Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval | Yi Bin et.al. | 2308.04343 | link |
| 2023-08-08 | Coarse-to-Fine: Learning Compact Discriminative Representation for Single-Stage Image Retrieval | Yunquan Zhu et.al. | 2308.04008 | link |
| 2023-08-05 | A Comprehensive Analysis of Real-World Image Captioning and Scene Identification | Sai Suprabhanu Nallapaneni et.al. | 2308.02833 | null |
| 2023-08-03 | Similar image retrieval using Autoencoder. I. Automatic morphology classification of galaxies | Eunsuk Seo et.al. | 2308.01871 | null |
| 2023-08-01 | AnyLoc: Towards Universal Visual Place Recognition | Nikhil Keetha et.al. | 2308.00688 | link |
| 2023-07-31 | Guiding Image Captioning Models Toward More Specific Captions | Simon Kornblith et.al. | 2307.16686 | null |
| 2023-07-31 | Bridging the Gap: Exploring the Capabilities of Bridge-Architectures for Complex Visual Reasoning Tasks | Kousik Rajesh et.al. | 2307.16395 | null |
| 2023-07-28 | D2S: Representing local descriptors and global scene coordinates for camera relocalization | Bach-Thuan Bui et.al. | 2307.15250 | null |
| 2023-07-26 | Neural-based Cross-modal Search and Retrieval of Artwork | Yan Gong et.al. | 2307.14244 | null |
| 2023-07-26 | Boon: A Neural Search Engine for Cross-Modal Information Retrieval | Yan Gong et.al. | 2307.14240 | null |
| 2023-07-25 | Conditional Cross Attention Network for Multi-Space Embedding without Entanglement in Only a SINGLE Network | Chull Hwan Song et.al. | 2307.13254 | null |
| 2023-07-28 | SACReg: Scene-Agnostic Coordinate Regression for Visual Localization | Jerome Revaud et.al. | 2307.11702 | null |
| 2023-07-19 | Lazy Visual Localization via Motion Averaging | Siyan Dong et.al. | 2307.09981 | null |
| 2023-07-19 | Quantum Optics based Algorithm for Measuring the Similarity between Images | Vivek Mehta et.al. | 2307.09789 | null |
| 2023-07-18 | Jean-Luc Picard at Touché 2023: Comparing Image Generation, Stance Detection and Feature Matching for Image Retrieval for Arguments | Max Moebius et.al. | 2307.09172 | null |
| 2023-07-19 | Similarity Min-Max: Zero-Shot Day-Night Domain Adaptation | Rundong Luo et.al. | 2307.08779 | null |
| 2023-07-17 | Divide&Classify: Fine-Grained Classification for City-Wide Visual Place Recognition | Gabriele Trivigno et.al. | 2307.08417 | null |
| 2023-07-17 | Bridging the Gap: Multi-Level Cross-Modality Joint Alignment for Visible-Infrared Person Re-Identification | Tengfei Liang et.al. | 2307.08316 | null |
Keypoint Detection
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2023-08-22 | LDP-Feat: Image Features with Local Differential Privacy | Francesco Pittaluga et.al. | 2308.11223 | null |
| 2023-08-20 | Neural Interactive Keypoint Detection | Jie Yang et.al. | 2308.10174 | link |
| 2023-08-19 | ClothesNet: An Information-Rich 3D Garment Model Repository with Simulated Clothes Environment | Bingyang Zhou et.al. | 2308.09987 | null |
| 2023-08-16 | DeDoDe: Detect, Don’t Describe – Describe, Don’t Detect for Local Feature Matching | Johan Edstedt et.al. | 2308.08479 | link |
| 2023-08-15 | CoDeF: Content Deformation Fields for Temporally Consistent Video Processing | Hao Ouyang et.al. | 2308.07926 | link |
| 2023-08-15 | ChartDETR: A Multi-shape Detection Network for Visual Chart Recognition | Wenyuan Xue et.al. | 2308.07743 | null |
| 2023-08-14 | DELO: Deep Evidential LiDAR Odometry using Partial Optimal Transport | Sk Aziz Ali et.al. | 2308.07153 | null |
| 2023-08-14 | 2D3D-MATR: 2D-3D Matching Transformer for Detection-free Registration between Images and Point Clouds | Minhao Li et.al. | 2308.05667 | null |
| 2023-08-02 | Automated Hit-frame Detection for Badminton Match Analysis | Yu-Hang Chien et.al. | 2307.16000 | link |
| 2023-07-25 | Mini-PointNetPlus: a local feature descriptor in deep learning model for 3d environment perception | Chuanyu Luo et.al. | 2307.13300 | null |
| 2023-07-21 | Reverse Knowledge Distillation: Training a Large Model using a Small One for Retinal Image Matching on Limited Data | Sahar Almahfouz Nasser et.al. | 2307.10698 | link |
| 2023-07-19 | SAMConvex: Fast Discrete Optimization for CT Registration using Self-supervised Anatomical Embedding and Correlation Pyramid | Zi Li et.al. | 2307.09727 | null |
| 2023-07-01 | SyMFM6D: Symmetry-aware Multi-directional Fusion for Multi-View 6D Object Pose Estimation | Fabian Duffhauss et.al. | 2307.00306 | link |
| 2023-06-27 | Detector-Free Structure from Motion | Xingyi He et.al. | 2306.15669 | link |
| 2023-06-26 | CLERA: A Unified Model for Joint Cognitive Load and Eye Region Analysis in the Wild | Li Ding et.al. | 2306.15073 | null |
| 2023-06-28 | Topology Repairing of Disconnected Pulmonary Airways and Vessels: Baselines and a Dataset | Ziqiao Weng et.al. | 2306.07089 | link |
| 2023-06-07 | Learning Probabilistic Coordinate Fields for Robust Correspondences | Weiyue Zhao et.al. | 2306.04231 | null |
| 2023-06-03 | LDEB – Label Digitization with Emotion Binarization and Machine Learning for Emotion Recognition in Conversational Dialogues | Amitabha Dey et.al. | 2306.02193 | null |
| 2023-06-02 | Self-supervised Interest Point Detection and Description for Fisheye and Perspective Images | Marcela Mera-Trujillo et.al. | 2306.01938 | null |
2023-6
Keypoint Detection
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2023-08-15 | CoDeF: Content Deformation Fields for Temporally Consistent Video Processing | Hao Ouyang et.al. | 2308.07926 | link |
| 2023-08-15 | ChartDETR: A Multi-shape Detection Network for Visual Chart Recognition | Wenyuan Xue et.al. | 2308.07743 | null |
| 2023-08-14 | DELO: Deep Evidential LiDAR Odometry using Partial Optimal Transport | Sk Aziz Ali et.al. | 2308.07153 | null |
| 2023-08-14 | 2D3D-MATR: 2D-3D Matching Transformer for Detection-free Registration between Images and Point Clouds | Minhao Li et.al. | 2308.05667 | null |
| 2023-08-02 | Automated Hit-frame Detection for Badminton Match Analysis | Yu-Hang Chien et.al. | 2307.16000 | link |
| 2023-07-25 | Mini-PointNetPlus: a local feature descriptor in deep learning model for 3d environment perception | Chuanyu Luo et.al. | 2307.13300 | null |
| 2023-07-21 | Reverse Knowledge Distillation: Training a Large Model using a Small One for Retinal Image Matching on Limited Data | Sahar Almahfouz Nasser et.al. | 2307.10698 | link |
| 2023-07-19 | SAMConvex: Fast Discrete Optimization for CT Registration using Self-supervised Anatomical Embedding and Correlation Pyramid | Zi Li et.al. | 2307.09727 | null |
| 2023-07-01 | SyMFM6D: Symmetry-aware Multi-directional Fusion for Multi-View 6D Object Pose Estimation | Fabian Duffhauss et.al. | 2307.00306 | link |
| 2023-06-27 | Detector-Free Structure from Motion | Xingyi He et.al. | 2306.15669 | link |
| 2023-06-26 | CLERA: A Unified Model for Joint Cognitive Load and Eye Region Analysis in the Wild | Li Ding et.al. | 2306.15073 | null |
| 2023-06-28 | Topology Repairing of Disconnected Pulmonary Airways and Vessels: Baselines and a Dataset | Ziqiao Weng et.al. | 2306.07089 | link |
| 2023-06-07 | Learning Probabilistic Coordinate Fields for Robust Correspondences | Weiyue Zhao et.al. | 2306.04231 | null |
| 2023-06-03 | LDEB – Label Digitization with Emotion Binarization and Machine Learning for Emotion Recognition in Conversational Dialogues | Amitabha Dey et.al. | 2306.02193 | null |
| 2023-06-02 | Self-supervised Interest Point Detection and Description for Fisheye and Perspective Images | Marcela Mera-Trujillo et.al. | 2306.01938 | null |
2023-8
Pose Estimation
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2023-09-05 | A Robust Localization Solution for an Uncrewed Ground Vehicle in Unstructured Outdoor GNSS-Denied Environments | W. Jacob Wagner et.al. | 2309.02569 | null |
| 2023-09-05 | GO-SLAM: Global Optimization for Consistent 3D Instant Reconstruction | Youmin Zhang et.al. | 2309.02436 | null |
| 2023-09-05 | DR-Pose: A Two-stage Deformation-and-Registration Pipeline for Category-level 6D Object Pose Estimation | Lei Zhou et.al. | 2309.01925 | null |
| 2023-09-04 | On the Query Strategies for Efficient Online Active Distillation | Michele Boldo et.al. | 2309.01612 | null |
| 2023-09-04 | DiffHPE: Robust, Coherent 3D Human Pose Lifting with Diffusion | Cédric Rommel et.al. | 2309.01575 | null |
| 2023-09-06 | Refined Temporal Pyramidal Compression-and-Amplification Transformer for 3D Human Pose Estimation | Hanbing Liu et.al. | 2309.01365 | null |
| 2023-09-04 | SKoPe3D: A Synthetic Dataset for Vehicle Keypoint Perception in 3D from Traffic Monitoring Cameras | Himanshu Pahadia et.al. | 2309.01324 | null |
| 2023-09-02 | Mitigating Motion Blur for Robust 3D Baseball Player Pose Modeling for Pitch Analysis | Jerrin Bright et.al. | 2309.01010 | null |
| 2023-09-01 | Fusing Monocular Images and Sparse IMU Signals for Real-time Human Motion Capture | Shaohua Pan et.al. | 2309.00310 | link |
| 2023-08-31 | EMDB: The Electromagnetic Database of Global 3D Human Pose and Shape in the Wild | Manuel Kaufmann et.al. | 2308.16894 | link |
| 2023-08-31 | SA6D: Self-Adaptive Few-Shot 6D Pose Estimator for Novel and Occluded Objects | Ning Gao et.al. | 2308.16528 | null |
| 2023-08-30 | Two-Stage Violence Detection Using ViTPose and Classification Models at Smart Airports | İrem Üstek et.al. | 2308.16325 | link |
| 2023-08-30 | SignDiff: Learning Diffusion Models for American Sign Language Production | Sen Fang et.al. | 2308.16082 | null |
| 2023-08-30 | Learning Structure-from-Motion with Graph Attention Networks | Lucas Brynte et.al. | 2308.15984 | null |
| 2023-08-30 | Reconstructing Groups of People with Hypergraph Relational Reasoning | Buzhen Huang et.al. | 2308.15844 | null |
| 2023-08-29 | 3D-MuPPET: 3D Multi-Pigeon Pose Estimation and Tracking | Urs Waldmann et.al. | 2308.15316 | null |
| 2023-08-29 | Spatio-temporal MLP-graph network for 3D human pose estimation | Tanvir Hassan et.al. | 2308.15313 | link |
| 2023-08-29 | Pose-Free Neural Radiance Fields via Implicit Pose Regularization | Jiahui Zhang et.al. | 2308.15049 | null |
| 2023-08-28 | R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras | Aron Schmied et.al. | 2308.14713 | null |
| 2023-08-28 | Video-Based Hand Pose Estimation for Remote Assessment of Bradykinesia in Parkinson’s Disease | Gabriela T. Acevedo Trebbau et.al. | 2308.14679 | null |
| 2023-08-28 | Active Pose Refinement for Textureless Shiny Objects using the Structured Light Camera | Jun Yang et.al. | 2308.14665 | null |
| 2023-08-28 | CPFES: Physical Fitness Evaluation Based on Canadian Agility and Movement Skill Assessment | Pengcheng Dong et.al. | 2308.14324 | null |
| 2023-08-27 | LDL: Line Distance Functions for Panoramic Localization | Junho Kim et.al. | 2308.13989 | null |
| 2023-08-26 | Prior-guided Source-free Domain Adaptation for Human Pose Estimation | Dripta S. Raychaudhuri et.al. | 2308.13954 | null |
| 2023-08-26 | Vision-Based Human Pose Estimation via Deep Learning: A Survey | Gongjin Lan et.al. | 2308.13872 | null |
| 2023-08-24 | POCO: 3D Pose and Shape Estimation with Confidence | Sai Kumar Dwivedi et.al. | 2308.12965 | null |
| 2023-08-24 | Robot Pose Nowcasting: Forecast the Future to Improve the Present | Alessandro Simoni et.al. | 2308.12914 | null |
| 2023-08-23 | Certifiably Optimal Rotation and Pose Estimation Based on the Cayley Map | Timothy D Barfoot et.al. | 2308.12418 | null |
| 2023-08-22 | Animal3D: A Comprehensive Dataset of 3D Animal Pose and Shape | Jiacong Xu et.al. | 2308.11737 | null |
| 2023-08-22 | TrackFlow: Multi-Object Tracking with Normalizing Flows | Gianluca Mancusi et.al. | 2308.11513 | null |
| 2023-08-22 | A LiDAR-Inertial SLAM Tightly-Coupled with Dropout-Tolerant GNSS Fusion for Autonomous Mine Service Vehicles | Yusheng Wang et.al. | 2308.11492 | null |
| 2023-08-22 | PoseGraphNet++: Enriching 3D Human Pose with Orientation Estimation | Soubarna Banik et.al. | 2308.11440 | null |
| 2023-08-22 | Novel-view Synthesis and Pose Estimation for Hand-Object Interaction from Sparse Views | Wentian Qu et.al. | 2308.11198 | null |
| 2023-08-21 | Spectral Graphormer: Spectral Graph-based Transformer for Egocentric Two-Hand Reconstruction using Multi-View Color Images | Tze Ho Elden Tse et.al. | 2308.11015 | null |
| 2023-08-21 | Polarimetric Information for Multi-Modal 6D Pose Estimation of Photometrically Challenging Objects with Limited Data | Patrick Ruhkamp et.al. | 2308.10627 | null |
| 2023-08-21 | GaitPT: Skeletons Are All You Need For Gait Recognition | Andy Catruna et.al. | 2308.10623 | null |
| 2023-08-21 | Approximately Equivariant Graph Networks | Ningyuan Huang et.al. | 2308.10436 | link |
| 2023-08-21 | In-Rack Test Tube Pose Estimation Using RGB-D Data | Hao Chen et.al. | 2308.10411 | null |
| 2023-08-20 | Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video | Yingxuan You et.al. | 2308.10305 | link |
| 2023-08-20 | OCHID-Fi: Occlusion-Robust Hand Pose Estimation in 3D via RF-Vision | Shujie Zhang et.al. | 2308.10146 | null |
| 2023-08-19 | 3D-Aware Neural Body Fitting for Occlusion Robust 3D Human Pose Estimation | Yi Zhang et.al. | 2308.10123 | link |
| 2023-08-19 | Pseudo Flow Consistency for Self-Supervised 6D Object Pose Estimation | Yang Hai et.al. | 2308.10016 | link |
| 2023-08-19 | UniAP: Towards Universal Animal Perception in Vision via Few-shot Learning | Meiqi Sun et.al. | 2308.09953 | null |
| 2023-08-22 | Scene-Aware Feature Matching | Xiaoyong Lu et.al. | 2308.09949 | null |
| 2023-08-18 | PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation | Hanbing Liu et.al. | 2308.09678 | link |
| 2023-08-18 | Improving 3D Pose Estimation for Sign Language | Maksym Ivashechkin et.al. | 2308.09525 | null |
| 2023-08-18 | Denoising Diffusion for 3D Hand Pose Estimation from Images | Maksym Ivashechkin et.al. | 2308.09523 | null |
| 2023-08-18 | ResQ: Residual Quantization for Video Perception | Davide Abati et.al. | 2308.09511 | null |
| 2023-08-17 | MovePose: A High-performance Human Pose Estimation Algorithm on Mobile and Edge Devices | Dongyang Yu et.al. | 2308.09084 | null |
| 2023-08-17 | Pedestrian Environment Model for Automated Driving | Adrian Holzbock et.al. | 2308.09080 | null |
| 2023-08-17 | Exploiting Point-Wise Attention in 6D Object Pose Estimation Based on Bidirectional Prediction | Yuhao Yang et.al. | 2308.08518 | null |
| 2023-08-16 | View Consistent Purification for Accurate Cross-View Localization | Shan Wang et.al. | 2308.08110 | null |
| 2023-08-15 | Learning Better Keypoints for Multi-Object 6DoF Pose Estimation | Yangzheng Wu et.al. | 2308.07827 | null |
| 2023-08-14 | Group Pose: A Simple Baseline for End-to-End Multi-person Pose Estimation | Huan Liu et.al. | 2308.07313 | link |
| 2023-08-12 | 4DRVO-Net: Deep 4D Radar-Visual Odometry Using Multi-Modal and Multi-Scale Adaptive Fusion | Guirong Zhuo et.al. | 2308.06573 | null |
| 2023-08-17 | EgoPoser: Robust Real-Time Ego-Body Pose Estimation in Large Scenes | Jiaxi Jiang et.al. | 2308.06493 | null |
| 2023-08-11 | Aggressive Aerial Grasping using a Soft Drone with Onboard Perception | Samuel Ubellacker et.al. | 2308.06351 | null |
| 2023-08-11 | VERF: Runtime Monitoring of Pose Estimation with Neural Radiance Fields | Dominic Maggio et.al. | 2308.05939 | null |
| 2023-08-10 | Toward Globally Optimal State Estimation Using Automatically Tightened Semidefinite Relaxations | Frederike Dümbgen et.al. | 2308.05783 | null |
| 2023-08-10 | KS-APR: Keyframe Selection for Robust Absolute Pose Regression | Changkun Liu et.al. | 2308.05459 | null |
| 2023-08-10 | How-to Augmented Lagrangian on Factor Graphs | Barbara Bazzana et.al. | 2308.05444 | null |
| 2023-08-10 | Deep Fusion Transformer Network with Weighted Vector-Wise Keypoints Voting for Robust 6D Object Pose Estimation | Jun Zhou et.al. | 2308.05438 | link |
| 2023-08-10 | Robust Localization with Visual-Inertial Odometry Constraints for Markerless Mobile AR | Changkun Liu et.al. | 2308.05394 | null |
| 2023-08-10 | Double-chain Constraints for 3D Human Pose Estimation in Images and Videos | Hongbo Kang et.al. | 2308.05298 | link |
| 2023-08-09 | ACE-HetEM for ab initio Heterogenous Cryo-EM 3D Reconstruction | Weijie Chen et.al. | 2308.04956 | null |
| 2023-08-07 | SEM-GAT: Explainable Semantic Pose Estimation using Learned Graph Attention | Efimia Panagiotaki et.al. | 2308.03718 | null |
| 2023-08-07 | A Horse with no Labels: Self-Supervised Horse Pose Estimation from Unlabelled Images and Synthetic Prior | Jose Sosa et.al. | 2308.03411 | null |
| 2023-08-06 | Source-free Domain Adaptive Human Pose Estimation | Qucheng Peng et.al. | 2308.03202 | null |
| 2023-08-04 | Diffusion-Augmented Depth Prediction with Sparse Annotations | Jiaqi Li et.al. | 2308.02283 | null |
| 2023-08-04 | DTF-Net: Category-Level Pose Estimation and Shape Reconstruction via Deformable Template Field | Haowen Wang et.al. | 2308.02239 | null |
| 2023-08-07 | Robust Self-Supervised Extrinsic Self-Calibration | Takayuki Kanai et.al. | 2308.02153 | null |
| 2023-08-03 | Sim-to-Real Vision-depth Fusion CNNs for Robust Pose Estimation Aboard Autonomous Nano-quadcopter | Luca Crupi et.al. | 2308.01833 | null |
| 2023-08-03 | Active Acoustic Sensing for Robot Manipulation | Shihan Lu et.al. | 2308.01600 | null |
| 2023-08-02 | HANDAL: A Dataset of Real-World Manipulable Object Categories with Pose Annotations, Affordances, and Reconstructions | Andrew Guo et.al. | 2308.01477 | null |
| 2023-08-06 | Human-M3: A Multi-view Multi-modal Dataset for 3D Human Pose Estimation in Outdoor Scenes | Bohao Fan et.al. | 2308.00628 | link |
| 2023-08-01 | Markerless human pose estimation for biomedical applications: a survey | Andrea Avogaro et.al. | 2308.00519 | null |
| 2023-08-01 | Kidnapping Deep Learning-based Multirotors using Optimized Flying Adversarial Patches | Pia Hanfeld et.al. | 2308.00344 | null |
| 2023-08-01 | Fine-Grained Sports, Yoga, and Dance Postures Recognition: A Benchmark Analysis | Asish Bera et.al. | 2308.00323 | null |
| 2023-08-01 | Robust Single-view Cone-beam X-ray Pose Estimation with Neural Tuned Tomography (NeTT) and Masked Neural Radiance Fields (mNeRF) | Chaochao Zhou et.al. | 2308.00214 | null |
| 2023-07-31 | Lightweight Super-Resolution Head for Human Pose Estimation | Haonan Wang et.al. | 2307.16765 | link |
| 2023-07-31 | DiffPose: SpatioTemporal Diffusion Model for Video-Based Human Pose Estimation | Runyang Feng et.al. | 2307.16687 | null |
| 2023-07-30 | Touch if it’s transparent! ACTOR: Active Tactile-based Category-Level Transparent Object Reconstruction | Prajval Kumar Murali et.al. | 2307.16254 | null |
| 2023-07-30 | Successive Pose Estimation and Beam Tracking for mmWave Vehicular Communication Systems | Cen Liu et.al. | 2307.16117 | null |
| 2023-07-29 | Iterative Graph Filtering Network for 3D Human Pose Estimation | Zaedul Islam et.al. | 2307.16074 | link |
Visual Localization
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2023-09-14 | EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization | Minjung Kim et.al. | 2309.07471 | link |
| 2023-09-13 | RadarLCD: Learnable Radar-based Loop Closure Detection Pipeline | Mirko Usuelli et.al. | 2309.07094 | null |
| 2023-09-11 | Towards Content-based Pixel Retrieval in Revisited Oxford and Paris | Guoyuan An et.al. | 2309.05438 | link |
| 2023-09-08 | Representation Synthesis by Probabilistic Many-Valued Logic Operation in Self-Supervised Learning | Hiroki Nakamura et.al. | 2309.04148 | null |
| 2023-09-05 | Dual Relation Alignment for Composed Image Retrieval | Xintong Jiang et.al. | 2309.02169 | null |
| 2023-09-04 | NLLB-CLIP – train performant multilingual image retrieval model on a budget | Alexander Visheratin et.al. | 2309.01859 | null |
| 2023-09-04 | Target-Guided Composed Image Retrieval | Haokun Wen et.al. | 2309.01366 | null |
| 2023-09-02 | Deep supervised hashing for fast retrieval of radio image cubes | Steven Ndung’u et.al. | 2309.00932 | null |
| 2023-08-31 | Learning with Multi-modal Gradient Attention for Explainable Composed Image Retrieval | Prateksha Udhayanan et.al. | 2308.16649 | null |
| 2023-08-28 | Extending Cross-Modal Retrieval with Interactive Learning to Improve Image Retrieval Performance in Forensics | Nils Böhne et.al. | 2308.14786 | null |
| 2023-08-28 | CoVR: Learning Composed Video Retrieval from Web Video Captions | Lucas Ventura et.al. | 2308.14746 | link |
| 2023-08-27 | Deep Learning for Visual Localization and Mapping: A Survey | Changhao Chen et.al. | 2308.14039 | null |
| 2023-08-26 | Learning Efficient Representations for Image-Based Patent Retrieval | Hongsong Wang et.al. | 2308.13749 | null |
| 2023-08-25 | Enhancing Landmark Detection in Cluttered Real-World Scenarios with Vision Transformers | Mohammad Javad Rajabi et.al. | 2308.13671 | null |
| 2023-08-24 | Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities | Jinze Bai et.al. | 2308.12966 | link |
| 2023-08-23 | Progressive Feature Mining and External Knowledge-Assisted Text-Pedestrian Image Retrieval | Huafeng Li et.al. | 2308.11994 | null |
| 2023-08-23 | OFVL-MS: Once for Visual Localization across Multiple Indoor Scenes | Tao Xie et.al. | 2308.11928 | link |
| 2023-08-22 | Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features | Alberto Baldrati et.al. | 2308.11485 | link |
| 2023-08-22 | GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training | Xinchi Deng et.al. | 2308.11331 | null |
| 2023-08-22 | LDP-Feat: Image Features with Local Differential Privacy | Francesco Pittaluga et.al. | 2308.11223 | null |
| 2023-08-21 | EigenPlaces: Training Viewpoint Robust Models for Visual Place Recognition | Gabriele Berton et.al. | 2308.10832 | link |
| 2023-08-20 | FashionNTM: Multi-turn Fashion Image Retrieval via Cascaded Memory | Anwesan Pal et.al. | 2308.10170 | null |
| 2023-08-18 | 3D Model-free Visual localization System from Essential Matrix under Local Planar Motion | Yanmei Jiao et.al. | 2308.09566 | null |
| 2023-08-17 | FashionLOGO: Prompting Multimodal Large Language Models for Fashion Logo Embeddings | Yulin Su et.al. | 2308.09012 | link |
| 2023-08-16 | Integrating Visual and Semantic Similarity Using Hierarchies for Image Retrieval | Aishwarya Venkataramanan et.al. | 2308.08431 | link |
| 2023-08-16 | Ranking-aware Uncertainty for Text-guided Image Retrieval | Junyang Chen et.al. | 2308.08131 | null |
| 2023-08-19 | Global Features are All You Need for Image Retrieval and Reranking | Shihao Shao et.al. | 2308.06954 | link |
| 2023-08-14 | MixBCT: Towards Self-Adapting Backward-Compatible Training | Yu Liang et.al. | 2308.06948 | link |
| 2023-08-10 | KS-APR: Keyframe Selection for Robust Absolute Pose Regression | Changkun Liu et.al. | 2308.05459 | null |
| 2023-08-09 | AspectMMKG: A Multi-modal Knowledge Graph with Aspect-aware Entities | Jingdan Zhang et.al. | 2308.04992 | link |
| 2023-08-08 | Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval | Yi Bin et.al. | 2308.04343 | link |
| 2023-08-08 | Coarse-to-Fine: Learning Compact Discriminative Representation for Single-Stage Image Retrieval | Yunquan Zhu et.al. | 2308.04008 | link |
| 2023-08-05 | A Comprehensive Analysis of Real-World Image Captioning and Scene Identification | Sai Suprabhanu Nallapaneni et.al. | 2308.02833 | null |
| 2023-08-03 | Similar image retrieval using Autoencoder. I. Automatic morphology classification of galaxies | Eunsuk Seo et.al. | 2308.01871 | null |
| 2023-08-01 | AnyLoc: Towards Universal Visual Place Recognition | Nikhil Keetha et.al. | 2308.00688 | link |
| 2023-07-31 | Guiding Image Captioning Models Toward More Specific Captions | Simon Kornblith et.al. | 2307.16686 | null |
| 2023-07-31 | Bridging the Gap: Exploring the Capabilities of Bridge-Architectures for Complex Visual Reasoning Tasks | Kousik Rajesh et.al. | 2307.16395 | null |
| 2023-07-28 | D2S: Representing local descriptors and global scene coordinates for camera relocalization | Bach-Thuan Bui et.al. | 2307.15250 | null |
| 2023-07-26 | Neural-based Cross-modal Search and Retrieval of Artwork | Yan Gong et.al. | 2307.14244 | null |
| 2023-07-26 | Boon: A Neural Search Engine for Cross-Modal Information Retrieval | Yan Gong et.al. | 2307.14240 | null |
| 2023-07-25 | Conditional Cross Attention Network for Multi-Space Embedding without Entanglement in Only a SINGLE Network | Chull Hwan Song et.al. | 2307.13254 | null |
| 2023-07-28 | SACReg: Scene-Agnostic Coordinate Regression for Visual Localization | Jerome Revaud et.al. | 2307.11702 | null |
| 2023-07-19 | Lazy Visual Localization via Motion Averaging | Siyan Dong et.al. | 2307.09981 | null |
| 2023-07-19 | Quantum Optics based Algorithm for Measuring the Similarity between Images | Vivek Mehta et.al. | 2307.09789 | null |
Keypoint Detection
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2023-09-26 | ObVi-SLAM: Long-Term Object-Visual SLAM | Amanda Adkins et.al. | 2309.15268 | null |
| 2023-09-19 | LiDAR-Generated Images Derived Keypoints Assisted Point Cloud Registration Scheme in Odometry Estimation | Haizhou Zhang et.al. | 2309.10436 | link |
| 2023-09-18 | RIDE: Self-Supervised Learning of Rotation-Equivariant Keypoint Detection and Invariant Description for Endoscopy | Mert Asim Karaoglu et.al. | 2309.09563 | null |
| 2023-09-17 | CryoAlign: feature-based method for global and local 3D alignment of EM density maps | Bintao He et.al. | 2309.09217 | null |
| 2023-09-14 | EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization | Minjung Kim et.al. | 2309.07471 | link |
| 2023-09-09 | Mirror-Aware Neural Humans | Daniel Ajisafe et.al. | 2309.04750 | null |
| 2023-09-07 | InstructDiffusion: A Generalist Modeling Interface for Vision Tasks | Zigang Geng et.al. | 2309.03895 | null |
| 2023-09-04 | SKoPe3D: A Synthetic Dataset for Vehicle Keypoint Perception in 3D from Traffic Monitoring Cameras | Himanshu Pahadia et.al. | 2309.01324 | null |
| 2023-09-12 | Improving the matching of deformable objects by learning to detect keypoints | Felipe Cadar et.al. | 2309.00434 | link |
| 2023-08-31 | SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation | Jiaben Chen et.al. | 2308.16876 | null |
| 2023-08-30 | Learning Structure-from-Motion with Graph Attention Networks | Lucas Brynte et.al. | 2308.15984 | null |
| 2023-08-29 | A lightweight 3D dense facial landmark estimation model from position map data | Shubhajit Basak et.al. | 2308.15170 | null |
| 2023-08-27 | Automatic coarse co-registration of point clouds from diverse scan geometries: a test of detectors and descriptors | Francesco Pirotti et.al. | 2308.14047 | null |
| 2023-08-24 | VNI-Net: Vector Neurons-based Rotation-Invariant Descriptor for LiDAR Place Recognition | Gengxuan Tian et.al. | 2308.12870 | null |
| 2023-08-22 | LDP-Feat: Image Features with Local Differential Privacy | Francesco Pittaluga et.al. | 2308.11223 | null |
| 2023-08-20 | Neural Interactive Keypoint Detection | Jie Yang et.al. | 2308.10174 | link |
| 2023-08-19 | ClothesNet: An Information-Rich 3D Garment Model Repository with Simulated Clothes Environment | Bingyang Zhou et.al. | 2308.09987 | null |
| 2023-09-03 | DeDoDe: Detect, Don’t Describe – Describe, Don’t Detect for Local Feature Matching | Johan Edstedt et.al. | 2308.08479 | link |
| 2023-08-15 | CoDeF: Content Deformation Fields for Temporally Consistent Video Processing | Hao Ouyang et.al. | 2308.07926 | link |
| 2023-08-15 | ChartDETR: A Multi-shape Detection Network for Visual Chart Recognition | Wenyuan Xue et.al. | 2308.07743 | null |
| 2023-08-14 | DELO: Deep Evidential LiDAR Odometry using Partial Optimal Transport | Sk Aziz Ali et.al. | 2308.07153 | null |
| 2023-08-14 | 2D3D-MATR: 2D-3D Matching Transformer for Detection-free Registration between Images and Point Clouds | Minhao Li et.al. | 2308.05667 | null |
| 2023-08-02 | Automated Hit-frame Detection for Badminton Match Analysis | Yu-Hang Chien et.al. | 2307.16000 | link |
| 2023-07-25 | Mini-PointNetPlus: a local feature descriptor in deep learning model for 3d environment perception | Chuanyu Luo et.al. | 2307.13300 | null |
| 2023-07-21 | Reverse Knowledge Distillation: Training a Large Model using a Small One for Retinal Image Matching on Limited Data | Sahar Almahfouz Nasser et.al. | 2307.10698 | link |
| 2023-07-19 | SAMConvex: Fast Discrete Optimization for CT Registration using Self-supervised Anatomical Embedding and Correlation Pyramid | Zi Li et.al. | 2307.09727 | null |
| 2023-07-01 | SyMFM6D: Symmetry-aware Multi-directional Fusion for Multi-View 6D Object Pose Estimation | Fabian Duffhauss et.al. | 2307.00306 | link |
| 2023-06-27 | Detector-Free Structure from Motion | Xingyi He et.al. | 2306.15669 | link |
| 2023-06-26 | CLERA: A Unified Model for Joint Cognitive Load and Eye Region Analysis in the Wild | Li Ding et.al. | 2306.15073 | null |
| 2023-06-28 | Topology Repairing of Disconnected Pulmonary Airways and Vessels: Baselines and a Dataset | Ziqiao Weng et.al. | 2306.07089 | link |
| 2023-06-07 | Learning Probabilistic Coordinate Fields for Robust Correspondences | Weiyue Zhao et.al. | 2306.04231 | null |
| 2023-06-03 | LDEB – Label Digitization with Emotion Binarization and Machine Learning for Emotion Recognition in Conversational Dialogues | Amitabha Dey et.al. | 2306.02193 | null |
2023-9
Pose Estimation
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2023-10-04 | Condition numbers in multiview geometry, instability in relative pose estimation, and RANSAC | Hongyi Fan et.al. | 2310.02719 | null |
| 2023-10-04 | USB-NeRF: Unrolling Shutter Bundle Adjusted Neural Radiance Fields | Moyang Li et.al. | 2310.02687 | null |
| 2023-10-03 | Beyond the Benchmark: Detecting Diverse Anomalies in Videos | Yoav Arad et.al. | 2310.01904 | link |
| 2023-10-03 | MFOS: Model-Free & One-Shot Object Pose Estimation | JongMin Lee et.al. | 2310.01897 | null |
| 2023-10-02 | LEAP: Liberate Sparse-view 3D Modeling from Camera Poses | Hanwen Jiang et.al. | 2310.01410 | null |
| 2023-10-02 | H-InDex: Visual Reinforcement Learning with Hand-Informed Representations for Dexterous Manipulation | Yanjie Ze et.al. | 2310.01404 | null |
| 2023-10-04 | Self-supervised Learning of Contextualized Local Visual Embeddings | Thalles Santos Silva et.al. | 2310.00527 | link |
| 2023-09-30 | Diff-DOPE: Differentiable Deep Object Pose Estimation | Jonathan Tremblay et.al. | 2310.00463 | null |
| 2023-09-29 | Diver Identification Using Anthropometric Data Ratios for Underwater Multi-Human-Robot Collaboration | Jungseok Hong et.al. | 2310.00146 | null |
| 2023-09-29 | Denoising and Selecting Pseudo-Heatmaps for Semi-Supervised Human Pose Estimation | Zhuoran Yu et.al. | 2310.00099 | null |
| 2023-09-29 | Revisiting Cephalometric Landmark Detection from the view of Human Pose Estimation with Lightweight Super-Resolution Head | Qian Wu et.al. | 2309.17143 | link |
| 2023-09-29 | AdaPose: Towards Cross-Site Device-Free Human Pose Estimation with Commodity WiFi | Yunjiao Zhou et.al. | 2309.16964 | null |
| 2023-09-28 | End-to-End (Instance)-Image Goal Navigation through Correspondence as an Emergent Phenomenon | Guillaume Bono et.al. | 2309.16634 | null |
| 2023-09-28 | Off-the-shelf bin picking workcell with visual pose estimation: A case study on the world robot summit 2018 kitting task | Frederik Hagelskjær et.al. | 2309.16221 | null |
| 2023-09-28 | Cloth2Body: Generating 3D Human Body Mesh from 2D Clothing | Lu Dai et.al. | 2309.16189 | null |
| 2023-09-28 | Laboratory Automation: Precision Insertion with Adaptive Fingers utilizing Contact through Sliding with Tactile-based Pose Estimation | Sameer Pai et.al. | 2309.16170 | null |
| 2023-09-28 | CLIP-Hand3D: Exploiting 3D Hand Pose Estimation via Context-Aware Prompting | Shaoxiang Guo et.al. | 2309.16140 | null |
| 2023-09-28 | A Modular Bio-inspired Robotic Hand with High Sensitivity | Chao Liu et.al. | 2309.16081 | null |
| 2023-09-27 | Handbook on Leveraging Lines for Two-View Relative Pose Estimation | Petr Hruby et.al. | 2309.16040 | null |
| 2023-09-27 | Q-REG: End-to-End Trainable Point Cloud Registration with Surface Curvature | Shengze Jin et.al. | 2309.16023 | null |
| 2023-09-27 | Analysis on Multi-robot Relative 6-DOF Pose Estimation Error Based on UWB Range | Xinran Li et.al. | 2309.15367 | null |
| 2023-09-26 | Unsupervised Reconstruction of 3D Human Pose Interactions From 2D Poses Alone | Peter Hardy et.al. | 2309.14865 | null |
| 2023-09-26 | Learning Vision-Based Bipedal Locomotion for Challenging Terrain | Helei Duan et.al. | 2309.14594 | null |
| 2023-09-25 | Spring-IMU Fusion Based Proprioception for Feedback Control of Soft Manipulators | Yinan Meng et.al. | 2309.14279 | null |
| 2023-09-25 | Industrial Application of 6D Pose Estimation for Robotic Manipulation in Automotive Internal Logistics | Philipp Quentin et.al. | 2309.14265 | null |
| 2023-09-25 | BoIR: Box-Supervised Instance Representation for Multi-Person Pose Estimation | Uyoung Jeong et.al. | 2309.14072 | link |
| 2023-09-24 | Towards Subcentimeter Accuracy Digital-Twin Tracking via An RGBD-based Transformer Model and A Comprehensive Mobile Dataset | Zixun Huang et.al. | 2309.13570 | null |
| 2023-09-21 | ORTexME: Occlusion-Robust Human Shape and Pose via Temporal Average Texture and Mesh Encoding | Yu Cheng et.al. | 2309.12183 | null |
| 2023-09-21 | ZS6D: Zero-shot 6D Object Pose Estimation using Vision Transformers | Philipp Ausserlechner et.al. | 2309.11986 | null |
| 2023-09-21 | Ego3DPose: Capturing 3D Cues from Binocular Egocentric Views | Taeho Kang et.al. | 2309.11962 | null |
| 2023-09-21 | A Real-Time Multi-Task Learning System for Joint Detection of Face, Facial Landmark and Head Pose | Qingtian Wu et.al. | 2309.11773 | null |
| 2023-09-20 | Understanding Pose and Appearance Disentanglement in 3D Human Pose Estimation | Krishna Kanth Nakka et.al. | 2309.11667 | null |
| 2023-09-20 | Online Supervised Training of Spaceborne Vision during Proximity Operations using Adaptive Kalman Filtering | Tae Ha Park et.al. | 2309.11645 | null |
| 2023-09-20 | OCC-VO: Dense Mapping via 3D Occupancy-Based Visual Odometry for Autonomous Driving | Heng Li et.al. | 2309.11011 | null |
| 2023-09-19 | Language-Conditioned Affordance-Pose Detection in 3D Point Clouds | Toan Nguyen et.al. | 2309.10911 | null |
| 2023-09-19 | MAGIC-TBR: Multiview Attention Fusion for Transformer-based Bodily Behavior Recognition in Group Settings | Surbhi Madan et.al. | 2309.10765 | link |
| 2023-09-19 | SHOWMe: Benchmarking Object-agnostic Hand-Object 3D Reconstruction | Anilkumar Swamy et.al. | 2309.10748 | null |
| 2023-09-20 | GloPro: Globally-Consistent Uncertainty-Aware 3D Human Pose Estimation & Tracking in the Wild | Simon Schaefer et.al. | 2309.10369 | null |
| 2023-09-19 | RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery | Jiaxin Wei et.al. | 2309.10255 | null |
| 2023-09-18 | Hierarchical Attention and Graph Neural Networks: Toward Drift-Free Pose Estimation | Kathia Melbouci et.al. | 2309.09934 | null |
| 2023-09-18 | Application-driven Validation of Posteriors in Inverse Problems | Tim J. Adler et.al. | 2309.09764 | null |
| 2023-09-18 | RIDE: Self-Supervised Learning of Rotation-Equivariant Keypoint Detection and Invariant Description for Endoscopy | Mert Asim Karaoglu et.al. | 2309.09563 | null |
| 2023-09-18 | Sparse and Privacy-enhanced Representation for Human Pose Estimation | Ting-Ying Lin et.al. | 2309.09515 | null |
| 2023-09-19 | RenderIH: A Large-scale Synthetic Dataset for 3D Interacting Hand Pose Estimation | Lijun Li et.al. | 2309.09301 | link |
| 2023-09-16 | Optimal Initialization Strategies for Range-Only Trajectory Estimation | Abhishek Goudar et.al. | 2309.09011 | null |
| 2023-09-16 | DynaMoN: Motion-Aware Fast And Robust Camera Localization for Dynamic NeRF | Mert Asim Karaoglu et.al. | 2309.08927 | null |
| 2023-09-16 | Outram: One-shot Global Localization via Triangulated Scene Graph and Global Outlier Pruning | Pengyu Yin et.al. | 2309.08914 | null |
| 2023-09-15 | Towards Robust and Smooth 3D Multi-Person Pose Estimation from Monocular Videos in the Wild | Sungchan Park et.al. | 2309.08644 | null |
| 2023-09-15 | YCB-Ev: Event-vision dataset for 6DoF object pose estimation | Pavel Rojtberg et.al. | 2309.08482 | link |
| 2023-09-15 | Fast and Accurate Deep Loop Closing and Relocalization for Reliable LiDAR SLAM | Chenghao Shi et.al. | 2309.08086 | null |
| 2023-09-14 | Gradient based Grasp Pose Optimization on a NeRF that Approximates Grasp Success | Gergely Sóti et.al. | 2309.08040 | null |
| 2023-09-14 | TEMPO: Efficient Multi-View Pose Estimation, Tracking, and Forecasting | Rohan Choudhury et.al. | 2309.07910 | null |
| 2023-09-14 | Towards Robust and Unconstrained Full Range of Rotation Head Pose Estimation | Thorsten Hempel et.al. | 2309.07654 | link |
| 2023-09-14 | EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization | Minjung Kim et.al. | 2309.07471 | link |
| 2023-09-14 | Unleashing the Power of Depth and Pose Estimation Neural Networks by Designing Compatible Endoscopic Images | Junyang Wu et.al. | 2309.07390 | null |
| 2023-09-13 | LInKs “Lifting Independent Keypoints” – Partial Pose Lifting for Occlusion Handling with Improved Accuracy in 2D-3D Human Pose Estimation | Peter Hardy et.al. | 2309.07243 | null |
| 2023-09-13 | 3D Active Metric-Semantic SLAM | Yuezhan Tao et.al. | 2309.06950 | null |
| 2023-09-11 | ViHOPE: Visuotactile In-Hand Object 6D Pose Estimation with Shape Completion | Hongyu Li et.al. | 2309.05662 | null |
| 2023-09-11 | Towards Intuitive HMI for UAV Control | Filip Zoric et.al. | 2309.05460 | null |
| 2023-09-12 | FreeMan: Towards Benchmarking 3D Human Pose Estimation in the Wild | Jiong Wang et.al. | 2309.05073 | link |
| 2023-09-09 | Probabilistic Triangulation for Uncalibrated Multi-View 3D Human Pose Estimation | Boyuan Jiang et.al. | 2309.04756 | link |
| 2023-09-09 | Mirror-Aware Neural Humans | Daniel Ajisafe et.al. | 2309.04750 | null |
| 2023-09-08 | Robot Localization and Mapping Final Report – Sequential Adversarial Learning for Self-Supervised Deep Visual Odometry | Akankshya Kar et.al. | 2309.04147 | null |
| 2023-09-07 | ArtiGrasp: Physically Plausible Synthesis of Bi-Manual Dexterous Grasping and Articulation | Hui Zhang et.al. | 2309.03891 | null |
| 2023-09-05 | An automated, high-resolution phenotypic assay for adult Brugia malayi and microfilaria | Upender Kalwa et.al. | 2309.03235 | null |
| 2023-09-05 | A Robust Localization Solution for an Uncrewed Ground Vehicle in Unstructured Outdoor GNSS-Denied Environments | W. Jacob Wagner et.al. | 2309.02569 | null |
| 2023-09-05 | GO-SLAM: Global Optimization for Consistent 3D Instant Reconstruction | Youmin Zhang et.al. | 2309.02436 | link |
| 2023-09-05 | DR-Pose: A Two-stage Deformation-and-Registration Pipeline for Category-level 6D Object Pose Estimation | Lei Zhou et.al. | 2309.01925 | link |
| 2023-09-04 | On the Query Strategies for Efficient Online Active Distillation | Michele Boldo et.al. | 2309.01612 | null |
| 2023-09-04 | DiffHPE: Robust, Coherent 3D Human Pose Lifting with Diffusion | Cédric Rommel et.al. | 2309.01575 | null |
| 2023-09-06 | Refined Temporal Pyramidal Compression-and-Amplification Transformer for 3D Human Pose Estimation | Hanbing Liu et.al. | 2309.01365 | link |
| 2023-09-04 | SKoPe3D: A Synthetic Dataset for Vehicle Keypoint Perception in 3D from Traffic Monitoring Cameras | Himanshu Pahadia et.al. | 2309.01324 | null |
| 2023-09-02 | Mitigating Motion Blur for Robust 3D Baseball Player Pose Modeling for Pitch Analysis | Jerrin Bright et.al. | 2309.01010 | null |
| 2023-09-01 | Fusing Monocular Images and Sparse IMU Signals for Real-time Human Motion Capture | Shaohua Pan et.al. | 2309.00310 | link |
| 2023-08-31 | EMDB: The Electromagnetic Database of Global 3D Human Pose and Shape in the Wild | Manuel Kaufmann et.al. | 2308.16894 | link |
| 2023-08-31 | SA6D: Self-Adaptive Few-Shot 6D Pose Estimator for Novel and Occluded Objects | Ning Gao et.al. | 2308.16528 | null |
| 2023-08-30 | Two-Stage Violence Detection Using ViTPose and Classification Models at Smart Airports | İrem Üstek et.al. | 2308.16325 | link |
| 2023-08-30 | SignDiff: Learning Diffusion Models for American Sign Language Production | Sen Fang et.al. | 2308.16082 | null |
| 2023-08-30 | Learning Structure-from-Motion with Graph Attention Networks | Lucas Brynte et.al. | 2308.15984 | null |
| 2023-08-30 | Reconstructing Groups of People with Hypergraph Relational Reasoning | Buzhen Huang et.al. | 2308.15844 | null |
| 2023-08-29 | 3D-MuPPET: 3D Multi-Pigeon Pose Estimation and Tracking | Urs Waldmann et.al. | 2308.15316 | null |
| 2023-08-29 | Spatio-temporal MLP-graph network for 3D human pose estimation | Tanvir Hassan et.al. | 2308.15313 | link |
| 2023-08-29 | Pose-Free Neural Radiance Fields via Implicit Pose Regularization | Jiahui Zhang et.al. | 2308.15049 | null |
Keypoint Detection
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2023-11-06 | TAMPAR: Visual Tampering Detection for Parcel Logistics in Postal Supply Chains | Alexander Naumann et.al. | 2311.03124 | null |
| 2023-11-06 | An invariant feature extraction for multi-modal images matching | Chenzhong Gao et.al. | 2311.02842 | null |
| 2023-10-20 | Feature Selection and Hyperparameter Fine-tuning in Artificial Neural Networks for Wood Quality Classification | Mateus Roder et.al. | 2310.13490 | null |
| 2023-10-12 | UniPose: Detecting Any Keypoints | Jie Yang et.al. | 2310.08530 | link |
| 2023-10-10 | l-dyno: framework to learn consistent visual features using robot’s motion | Kartikeya Singh et.al. | 2310.06249 | null |
| 2023-10-10 | Language-driven Open-Vocabulary Keypoint Detection for Animal Body and Face | Hao Zhang et.al. | 2310.05056 | null |
| 2023-10-13 | H-InDex: Visual Reinforcement Learning with Hand-Informed Representations for Dexterous Manipulation | Yanjie Ze et.al. | 2310.01404 | link |
| 2023-10-04 | Self-supervised Learning of Contextualized Local Visual Embeddings | Thalles Santos Silva et.al. | 2310.00527 | link |
| 2023-10-22 | ObVi-SLAM: Long-Term Object-Visual SLAM | Amanda Adkins et.al. | 2309.15268 | link |
| 2023-09-19 | LiDAR-Generated Images Derived Keypoints Assisted Point Cloud Registration Scheme in Odometry Estimation | Haizhou Zhang et.al. | 2309.10436 | link |
| 2023-09-18 | RIDE: Self-Supervised Learning of Rotation-Equivariant Keypoint Detection and Invariant Description for Endoscopy | Mert Asim Karaoglu et.al. | 2309.09563 | null |
| 2023-09-17 | CryoAlign: feature-based method for global and local 3D alignment of EM density maps | Bintao He et.al. | 2309.09217 | null |
| 2023-09-14 | EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization | Minjung Kim et.al. | 2309.07471 | link |
| 2023-09-09 | Mirror-Aware Neural Humans | Daniel Ajisafe et.al. | 2309.04750 | null |
| 2023-09-07 | InstructDiffusion: A Generalist Modeling Interface for Vision Tasks | Zigang Geng et.al. | 2309.03895 | null |
| 2023-09-04 | SKoPe3D: A Synthetic Dataset for Vehicle Keypoint Perception in 3D from Traffic Monitoring Cameras | Himanshu Pahadia et.al. | 2309.01324 | null |
| 2023-09-12 | Improving the matching of deformable objects by learning to detect keypoints | Felipe Cadar et.al. | 2309.00434 | link |
| 2023-08-31 | SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation | Jiaben Chen et.al. | 2308.16876 | null |
| 2023-08-30 | Learning Structure-from-Motion with Graph Attention Networks | Lucas Brynte et.al. | 2308.15984 | null |
| 2023-08-29 | A lightweight 3D dense facial landmark estimation model from position map data | Shubhajit Basak et.al. | 2308.15170 | null |
| 2023-08-27 | Automatic coarse co-registration of point clouds from diverse scan geometries: a test of detectors and descriptors | Francesco Pirotti et.al. | 2308.14047 | null |
| 2023-08-24 | VNI-Net: Vector Neurons-based Rotation-Invariant Descriptor for LiDAR Place Recognition | Gengxuan Tian et.al. | 2308.12870 | null |
| 2023-08-22 | LDP-Feat: Image Features with Local Differential Privacy | Francesco Pittaluga et.al. | 2308.11223 | null |
| 2023-08-20 | Neural Interactive Keypoint Detection | Jie Yang et.al. | 2308.10174 | link |
| 2023-08-19 | ClothesNet: An Information-Rich 3D Garment Model Repository with Simulated Clothes Environment | Bingyang Zhou et.al. | 2308.09987 | null |
| 2023-09-03 | DeDoDe: Detect, Don’t Describe – Describe, Don’t Detect for Local Feature Matching | Johan Edstedt et.al. | 2308.08479 | link |
Visual Localization
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2023-10-06 | ClusVPR: Efficient Visual Place Recognition with Clustering-based Weighted Transformer | Yifan Xu et.al. | 2310.04099 | null |
| 2023-10-06 | Sub-token ViT Embedding via Stochastic Resonance Transformers | Dong Lao et.al. | 2310.03967 | null |
| 2023-10-04 | Active Visual Localization for Multi-Agent Collaboration: A Data-Driven Approach | Matthew Hanlon et.al. | 2310.02650 | null |
| 2023-10-02 | NEUCORE: Neural Concept Reasoning for Composed Image Retrieval | Shu Zhao et.al. | 2310.01358 | null |
| 2023-10-02 | Leveraging Cutting Edge Deep Learning Based Image Matching for Reconstructing a Large Scene from Sparse Images | Georg Bökman et.al. | 2310.01092 | null |
| 2023-10-05 | PlaceNav: Topological Navigation through Place Recognition | Lauri Suomela et.al. | 2309.17260 | null |
| 2023-09-29 | Segment Anything Model is a Good Teacher for Local Feature Learning | Jingqian Wu et.al. | 2309.16992 | link |
| 2023-09-28 | Dark Side Augmentation: Generating Diverse Night Examples for Metric Learning | Albert Mohwald et.al. | 2309.16351 | link |
| 2023-09-28 | FORB: A Flat Object Retrieval Benchmark for Universal Image Embedding | Pengxiang Wu et.al. | 2309.16249 | link |
| 2023-09-28 | Context-I2W: Mapping Images to Context-dependent Words for Accurate Zero-Shot Composed Image Retrieval | Yuanmin Tang et.al. | 2309.16137 | null |
| 2023-09-27 | GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization | Vicente Vivanco Cepeda et.al. | 2309.16020 | null |
| 2023-09-27 | Learning Dense Flow Field for Highly-accurate Cross-view Camera Localization | Zhenbo Song et.al. | 2309.15556 | null |
| 2023-09-26 | Object-Centric Open-Vocabulary Image-Retrieval with Aggregated Features | Hila Levi et.al. | 2309.14999 | null |
| 2023-09-23 | Resolving References in Visually-Grounded Dialogue via Text Generation | Bram Willemsen et.al. | 2309.13430 | link |
| 2023-09-21 | Face Identity-Aware Disentanglement in StyleGAN | Adrian Suwała et.al. | 2309.12033 | null |
| 2023-09-21 | On-the-Fly SfM: What you capture is What you get | Zongqian Zhan et.al. | 2309.11883 | null |
| 2023-09-20 | 2D-3D Pose Tracking with Multi-View Constraints | Huai Yu et.al. | 2309.11335 | null |
| 2023-09-19 | VPRTempo: A Fast Temporally Encoded Spiking Neural Network for Visual Place Recognition | Adam D. Hines et.al. | 2309.10225 | link |
| 2023-09-11 | Introspective Deep Metric Learning | Chengkun Wang et.al. | 2309.09982 | null |
| 2023-09-18 | Decompose Semantic Shifts for Composed Image Retrieval | Xingyu Yang et.al. | 2309.09531 | null |
| 2023-09-16 | Efficient Object Rearrangement via Multi-view Fusion | Dehao Huang et.al. | 2309.08994 | null |
| 2023-09-16 | DynaMoN: Motion-Aware Fast And Robust Camera Localization for Dynamic NeRF | Mert Asim Karaoglu et.al. | 2309.08927 | null |
| 2023-09-15 | Active Learning for Fine-Grained Sketch-Based Image Retrieval | Himanshu Thakur et.al. | 2309.08743 | null |
| 2023-09-15 | Optimization of Rank Losses for Image Retrieval | Elias Ramzi et.al. | 2309.08250 | link |
| 2023-09-18 | Prompting Segmentation with Sound is Generalizable Audio-Visual Source Localizer | Yaoting Wang et.al. | 2309.07929 | null |
| 2023-09-14 | EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization | Minjung Kim et.al. | 2309.07471 | link |
| 2023-09-13 | RadarLCD: Learnable Radar-based Loop Closure Detection Pipeline | Mirko Usuelli et.al. | 2309.07094 | null |
| 2023-09-11 | Towards Content-based Pixel Retrieval in Revisited Oxford and Paris | Guoyuan An et.al. | 2309.05438 | link |
| 2023-09-08 | Representation Synthesis by Probabilistic Many-Valued Logic Operation in Self-Supervised Learning | Hiroki Nakamura et.al. | 2309.04148 | null |
| 2023-09-05 | Dual Relation Alignment for Composed Image Retrieval | Xintong Jiang et.al. | 2309.02169 | null |
| 2023-09-04 | NLLB-CLIP – train performant multilingual image retrieval model on a budget | Alexander Visheratin et.al. | 2309.01859 | null |
| 2023-09-04 | Target-Guided Composed Image Retrieval | Haokun Wen et.al. | 2309.01366 | null |
| 2023-09-02 | Deep supervised hashing for fast retrieval of radio image cubes | Steven Ndung’u et.al. | 2309.00932 | null |
| 2023-08-31 | Learning with Multi-modal Gradient Attention for Explainable Composed Image Retrieval | Prateksha Udhayanan et.al. | 2308.16649 | null |
| 2023-08-28 | Extending Cross-Modal Retrieval with Interactive Learning to Improve Image Retrieval Performance in Forensics | Nils Böhne et.al. | 2308.14786 | null |
| 2023-08-28 | CoVR: Learning Composed Video Retrieval from Web Video Captions | Lucas Ventura et.al. | 2308.14746 | link |
| 2023-08-27 | Deep Learning for Visual Localization and Mapping: A Survey | Changhao Chen et.al. | 2308.14039 | null |
| 2023-08-26 | Learning Efficient Representations for Image-Based Patent Retrieval | Hongsong Wang et.al. | 2308.13749 | null |
| 2023-08-25 | Enhancing Landmark Detection in Cluttered Real-World Scenarios with Vision Transformers | Mohammad Javad Rajabi et.al. | 2308.13671 | null |
2023-10
Visual Localization
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2023-11-13 | Pretrain like You Inference: Masked Tuning Improves Zero-Shot Composed Image Retrieval | Junyang Chen et.al. | 2311.07622 | null |
| 2023-11-13 | VGSG: Vision-Guided Semantic-Group Network for Text-based Person Search | Shuting He et.al. | 2311.07514 | null |
| 2023-11-10 | Attributes Grouping and Mining Hashing for Fine-Grained Image Retrieval | Xin Lu et.al. | 2311.06067 | null |
| 2023-11-08 | Energy-efficient Wireless Image Retrieval for IoT Devices by Transmitting a TinyML Model | Junya Shiraishi et.al. | 2311.04788 | null |
| 2023-11-08 | Training CLIP models on Data from Scientific Papers | Calvin Metzger et.al. | 2311.04711 | link |
| 2023-11-07 | DeepPatent2: A Large-Scale Benchmarking Corpus for Technical Drawing Understanding | Kehinde Ajayi et.al. | 2311.04098 | link |
| 2023-11-06 | Long-Term Invariant Local Features via Implicit Cross-Domain Correspondences | Zador Pataki et.al. | 2311.03345 | null |
| 2023-11-06 | FocusTune: Tuning Visual Localization through Focus-Guided Sampling | Son Tung Nguyen et.al. | 2311.02872 | link |
| 2023-11-01 | DINO-Mix: Enhancing Visual Place Recognition with Foundational Vision Model and Feature Mixing | Gaoshuang Huang et.al. | 2311.00230 | null |
| 2023-10-29 | Identifiable Contrastive Learning with Automatic Feature Importance Discovery | Qi Zhang et.al. | 2310.18904 | link |
| 2023-10-27 | LipSim: A Provably Robust Perceptual Similarity Metric | Sara Ghazanfari et.al. | 2310.18274 | link |
| 2023-10-27 | Split Covariance Intersection Filter Based Visual Localization With Accurate AprilTag Map For Warehouse Robot Navigation | Susu Fang et.al. | 2310.17879 | null |
| 2023-10-25 | FoundLoc: Vision-based Onboard Aerial Localization in the Wild | Yao He et.al. | 2310.16299 | null |
| 2023-10-24 | Cross-view Self-localization from Synthesized Scene-graphs | Ryogo Yamamoto et.al. | 2310.15504 | null |
| 2023-10-23 | Semantic-Aware Adversarial Training for Reliable Deep Hashing Retrieval | Xu Yuan et.al. | 2310.14637 | link |
| 2023-10-21 | Large Language Models and Multimodal Retrieval for Visual Word Sense Disambiguation | Anastasia Kritharoula et.al. | 2310.14025 | link |
| 2023-10-20 | FMRT: Learning Accurate Feature Matching with Reconciliatory Transformer | Xinyu Zhang et.al. | 2310.13605 | null |
| 2023-10-20 | CylinderTag: An Accurate and Flexible Marker for Cylinder-Shape Objects Pose Estimation Based on Projective Invariants | Shaoan Wang et.al. | 2310.13320 | link |
| 2023-10-27 | Representation Learning via Consistent Assignment of Views over Random Partitions | Thalles Silva et.al. | 2310.12692 | link |
| 2023-10-18 | Evaluating the Fairness of Discriminative Foundation Models in Computer Vision | Junaid Ali et.al. | 2310.11867 | null |
| 2023-10-17 | Learning Comprehensive Representations with Richer Self for Text-to-Image Person Re-Identification | Shuanglin Yan et.al. | 2310.11210 | null |
| 2023-10-16 | Autonomous Mapping and Navigation using Fiducial Markers and Pan-Tilt Camera for Assisting Indoor Mobility of Blind and Visually Impaired People | Dharmateja Adapa et.al. | 2310.10290 | null |
| 2023-10-16 | EfficientOCR: An Extensible, Open-Source Package for Efficiently Digitizing World Knowledge | Tom Bryan et.al. | 2310.10050 | null |
| 2023-10-15 | CAPro: Webly Supervised Learning with Cross-Modality Aligned Prototypes | Yulei Qin et.al. | 2310.09761 | link |
| 2023-10-13 | Pairwise Similarity Learning is SimPLE | Yandong Wen et.al. | 2310.09449 | null |
| 2023-10-13 | Vision-by-Language for Training-Free Compositional Image Retrieval | Shyamgopal Karthik et.al. | 2310.09291 | null |
| 2023-10-12 | Hyp-UML: Hyperbolic Image Retrieval with Uncertainty-aware Metric Learning | Shiyang Yan et.al. | 2310.08390 | null |
| 2023-10-12 | Jointly Optimized Global-Local Visual Localization of UAVs | Haoling Li et.al. | 2310.08082 | null |
| 2023-10-10 | Leveraging Neural Radiance Fields for Uncertainty-Aware Visual Localization | Le Chen et.al. | 2310.06984 | null |
| 2023-10-10 | Distillation Improves Visual Place Recognition for Low-Quality Queries | Anbang Yang et.al. | 2310.06906 | null |
| 2023-10-10 | Efficient Retrieval of Images with Irregular Patterns using Morphological Image Analysis: Applications to Industrial and Healthcare datasets | Jiajun Zhang et.al. | 2310.06566 | null |
| 2023-10-10 | Topological RANSAC for instance verification and retrieval without fine-tuning | Guoyuan An et.al. | 2310.06486 | null |
| 2023-10-10 | 3DS-SLAM: A 3D Object Detection based Semantic SLAM towards Dynamic Indoor Environments | Ghanta Sai Krishna et.al. | 2310.06385 | null |
| 2023-10-09 | Collaborative Visual Place Recognition | Yiming Li et.al. | 2310.05541 | null |
| 2023-10-09 | Sentence-level Prompts Benefit Composed Image Retrieval | Yang Bai et.al. | 2310.05473 | link |
| 2023-10-08 | AANet: Aggregation and Alignment Network with Semi-hard Positive Sample Mining for Hierarchical Place Recognition | Feng Lu et.al. | 2310.05184 | link |
| 2023-10-08 | LocoNeRF: A NeRF-based Approach for Local Structure from Motion for Precise Localization | Artem Nenashev et.al. | 2310.05134 | null |
| 2023-10-12 | ClusVPR: Efficient Visual Place Recognition with Clustering-based Weighted Transformer | Yifan Xu et.al. | 2310.04099 | null |
| 2023-10-06 | Sub-token ViT Embedding via Stochastic Resonance Transformers | Dong Lao et.al. | 2310.03967 | null |
| 2023-10-04 | Active Visual Localization for Multi-Agent Collaboration: A Data-Driven Approach | Matthew Hanlon et.al. | 2310.02650 | null |
| 2023-10-02 | NEUCORE: Neural Concept Reasoning for Composed Image Retrieval | Shu Zhao et.al. | 2310.01358 | null |
| 2023-10-02 | Leveraging Cutting Edge Deep Learning Based Image Matching for Reconstructing a Large Scene from Sparse Images | Georg Bökman et.al. | 2310.01092 | null |
| 2023-10-05 | PlaceNav: Topological Navigation through Place Recognition | Lauri Suomela et.al. | 2309.17260 | null |
| 2023-09-29 | Segment Anything Model is a Good Teacher for Local Feature Learning | Jingqian Wu et.al. | 2309.16992 | link |
| 2023-09-28 | Dark Side Augmentation: Generating Diverse Night Examples for Metric Learning | Albert Mohwald et.al. | 2309.16351 | link |
| 2023-09-28 | FORB: A Flat Object Retrieval Benchmark for Universal Image Embedding | Pengxiang Wu et.al. | 2309.16249 | link |
| 2023-09-28 | Context-I2W: Mapping Images to Context-dependent Words for Accurate Zero-Shot Composed Image Retrieval | Yuanmin Tang et.al. | 2309.16137 | null |
| 2023-09-27 | GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization | Vicente Vivanco Cepeda et.al. | 2309.16020 | null |
| 2023-09-27 | Learning Dense Flow Field for Highly-accurate Cross-view Camera Localization | Zhenbo Song et.al. | 2309.15556 | null |
| 2023-09-26 | Object-Centric Open-Vocabulary Image-Retrieval with Aggregated Features | Hila Levi et.al. | 2309.14999 | null |
| 2023-09-23 | Resolving References in Visually-Grounded Dialogue via Text Generation | Bram Willemsen et.al. | 2309.13430 | link |
| 2023-09-21 | Face Identity-Aware Disentanglement in StyleGAN | Adrian Suwała et.al. | 2309.12033 | null |
Pose Estimation
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2023-11-06 | A Single 2D Pose with Context is Worth Hundreds for 3D Human Pose Estimation | Qitao Zhao et.al. | 2311.03312 | null |
| 2023-11-06 | Enabling In-Situ Resources Utilisation by leveraging collaborative robotics and astronaut-robot interaction | Silvia Romero-Azpitarte et.al. | 2311.03146 | null |
| 2023-11-06 | Simultaneous Time Synchronization and Mutual Localization for Multi-robot System | Xiangyong Wen et.al. | 2311.02948 | null |
| 2023-11-06 | Initialisation of Autonomous Aircraft Visual Inspection Systems via CNN-Based Camera Pose Estimation | Xueyan Oh et.al. | 2311.02900 | null |
| 2023-11-06 | Efficient, Self-Supervised Human Pose Estimation with Inductive Prior Tuning | Nobline Yoo et.al. | 2311.02815 | link |
| 2023-11-03 | Generating Unbiased Pseudo-labels via a Theoretically Guaranteed Chebyshev Constraint to Unify Semi-supervised Classification and Regression | Jiaqi Wu et.al. | 2311.01782 | link |
| 2023-11-03 | Modeling the Uncertainty with Maximum Discrepant Students for Semi-supervised 2D Pose Estimation | Jiaqi Wu et.al. | 2311.01770 | null |
| 2023-11-02 | Sim2Real Bilevel Adaptation for Object Surface Classification using Vision-Based Tactile Sensors | Gabriele M. Caddeo et.al. | 2311.01380 | link |
| 2023-11-01 | A Spatial-Temporal Transformer based Framework For Human Pose Assessment And Correction in Education Scenarios | Wenyang Hu et.al. | 2311.00401 | null |
| 2023-10-31 | HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception | Junkun Yuan et.al. | 2310.20695 | link |
| 2023-10-31 | Pose-to-Motion: Cross-Domain Motion Retargeting with Pose Prior | Qingqing Zhao et.al. | 2310.20249 | null |
| 2023-10-30 | FetusMapV2: Enhanced Fetal Pose Estimation in 3D Ultrasound | Chaoyu Chen et.al. | 2310.19293 | null |
| 2023-10-29 | Distributed Nonlinear Filtering using Triangular Transport Maps | Daniel Grange et.al. | 2310.19000 | null |
| 2023-10-29 | TIC-TAC: A Framework To Learn And Evaluate Your Covariance | Megh Shukla et.al. | 2310.18953 | link |
| 2023-10-29 | Improving Multi-Person Pose Tracking with A Confidence Network | Zehua Fu et.al. | 2310.18920 | null |
| 2023-10-29 | HDMNet: A Hierarchical Matching Network with Double Attention for Large-scale Outdoor LiDAR Point Cloud Registration | Weiyi Xue et.al. | 2310.18874 | null |
| 2023-10-27 | ProcNet: Deep Predictive Coding Model for Robust-to-occlusion Visual Segmentation and Pose Estimation | Michael Zechmair et.al. | 2310.18009 | null |
| 2023-10-26 | Learning Extrinsic Dexterity with Parameterized Manipulation Primitives | Shih-Min Yang et.al. | 2310.17785 | null |
| 2023-10-26 | 6-DoF Stability Field via Diffusion Models | Takuma Yoneda et.al. | 2310.17649 | null |
| 2023-10-26 | SE(3) Diffusion Model-based Point Cloud Registration for Robust 6D Object Pose Estimation | Haobo Jiang et.al. | 2310.17359 | null |
| 2023-10-26 | Automatic Edge Error Judgment in Figure Skating Using 3D Pose Estimation from a Monocular Camera and IMUs | Ryota Tanaka et.al. | 2310.17193 | link |
| 2023-10-25 | Real-time 6-DoF Pose Estimation by an Event-based Camera using Active LED Markers | Gerald Ebmer et.al. | 2310.16618 | null |
| 2023-10-25 | ChimpACT: A Longitudinal Dataset for Understanding Chimpanzee Behaviors | Xiaoxuan Ma et.al. | 2310.16447 | link |
| 2023-10-25 | MotionAGFormer: Enhancing 3D Human Pose Estimation with a Transformer-GCNFormer Network | Soroush Mehraban et.al. | 2310.16288 | link |
| 2023-10-25 | TransPose: 6D Object Pose Estimation with Geometry-Aware Transformer | Xiao Lin et.al. | 2310.16279 | null |
| 2023-10-23 | Converting Depth Images and Point Clouds for Feature-based Pose Estimation | Robert Lösch et.al. | 2310.14924 | link |
| 2023-10-23 | Object Pose Estimation Annotation Pipeline for Multi-view Monocular Camera Systems in Industrial Settings | Hazem Youssef et.al. | 2310.14914 | null |
| 2023-10-23 | Player Re-Identification Using Body Part Appearences | Mahesh Bhosale et.al. | 2310.14469 | null |
| 2023-10-20 | LanPose: Language-Instructed 6D Object Pose Estimation for Robotic Assembly | Bowen Fu et.al. | 2310.13819 | null |
| 2023-10-20 | FMRT: Learning Accurate Feature Matching with Reconciliatory Transformer | Xinyu Zhang et.al. | 2310.13605 | null |
| 2023-10-20 | ColAG: A Collaborative Air-Ground Framework for Perception-Limited UGVs’ Navigation | Zhehan Li et.al. | 2310.13324 | link |
| 2023-10-20 | CylinderTag: An Accurate and Flexible Marker for Cylinder-Shape Objects Pose Estimation Based on Projective Invariants | Shaoan Wang et.al. | 2310.13320 | link |
| 2023-10-19 | Human Pose-based Estimation, Tracking and Action Recognition with Deep Learning: A Survey | Lijuan Zhou et.al. | 2310.13039 | null |
| 2023-10-19 | FSD: Fast Self-Supervised Single RGB-D to Categorical 3D Objects | Mayank Lunayach et.al. | 2310.12974 | null |
| 2023-10-18 | Mesh Represented Recycle Learning for 3D Hand Pose and Mesh Estimation | Bosang Kim et.al. | 2310.12189 | null |
| 2023-10-18 | One-Shot Imitation Learning: A Pose Estimation Perspective | Pietro Vitiello et.al. | 2310.12077 | null |
| 2023-10-18 | ShapeGraFormer: GraFormer-Based Network for Hand-Object Reconstruction from a Single Depth Map | Ahmed Tawfik Aboukhadra et.al. | 2310.11811 | null |
| 2023-10-17 | Holistic Parking Slot Detection with Polygon-Shaped Representations | Lihao Wang et.al. | 2310.11629 | null |
| 2023-10-17 | Diver Interest via Pointing in Three Dimensions: 3D Pointing Reconstruction for Diver-AUV Communication | Chelsey Edge et.al. | 2310.11536 | null |
| 2023-10-18 | AP$n$P: A Less-constrained P$n$P Solver for Pose Estimation with Unknown Anisotropic Scaling or Focal Lengths | Jiaxin Wei et.al. | 2310.09982 | link |
| 2023-10-15 | Tabletop Transparent Scene Reconstruction via Epipolar-Guided Optical Flow with Monocular Depth Completion Prior | Xiaotong Chen et.al. | 2310.09956 | null |
| 2023-10-15 | Socially reactive navigation models for mobile robots in dynamic environments | Ricarte Ribeiro et.al. | 2310.09916 | null |
| 2023-10-15 | MoEmo Vision Transformer: Integrating Cross-Attention and Movement Vectors in 3D Pose Estimation for HRI Emotion Detection | David C. Jeong et.al. | 2310.09757 | null |
| 2023-10-16 | IMU Preintegration for Multi-Robot Systems in the Presence of Bias and Communication Constraints | Mohammed Ayman Shalaby et.al. | 2310.08686 | null |
| 2023-10-12 | Towards Design and Development of an ArUco Markers-Based Quantitative Surface Tactile Sensor | Ozdemir Can Kara et.al. | 2310.08398 | null |
| 2023-10-12 | Multimodal Active Measurement for Human Mesh Recovery in Close Proximity | Takahiro Maeda et.al. | 2310.08116 | null |
| 2023-10-12 | X-HRNet: Towards Lightweight Human Pose Estimation with Spatially Unidimensional Self-Attention | Yixuan Zhou et.al. | 2310.08042 | link |
| 2023-10-12 | PoRF: Pose Residual Field for Accurate Neural Surface Reconstruction | Jia-Wang Bian et.al. | 2310.07449 | null |
| 2023-10-11 | SAGE-ICP: Semantic Information-Assisted ICP | Jiaming Cui et.al. | 2310.07237 | null |
| 2023-10-11 | DeepSimHO: Stable Pose Estimation for Hand-Object Interaction via Physics Simulation | Rong Wang et.al. | 2310.07206 | null |
| 2023-10-12 | FABind: Fast and Accurate Protein-Ligand Binding | Qizhi Pei et.al. | 2310.06763 | link |
| 2023-10-10 | EARL: Eye-on-Hand Reinforcement Learner for Dynamic Grasping with Active Pose Estimation | Baichuan Huang et.al. | 2310.06751 | null |
| 2023-10-09 | Augmenting Vision-Based Human Pose Estimation with Rotation Matrix | Milad Vazan et.al. | 2310.06068 | null |
| 2023-10-07 | Federated Self-Supervised Learning of Monocular Depth Estimators for Autonomous Vehicles | Elton F. de S. Soares et.al. | 2310.04837 | null |
| 2023-10-10 | 1st Place Solution of Egocentric 3D Hand Pose Estimation Challenge 2023 Technical Report: A Concise Pipeline for Egocentric Hand Pose Reconstruction | Zhishan Zhou et.al. | 2310.04769 | null |
| 2023-10-06 | SwimXYZ: A large-scale dataset of synthetic swimming motions and videos | Fiche Guénolé et.al. | 2310.04360 | null |
| 2023-10-05 | BID-NeRF: RGB-D image pose estimation with inverted Neural Radiance Fields | Ágoston István Csehi et.al. | 2310.03563 | null |
| 2023-10-05 | 3D-Aware Hypothesis & Verification for Generalizable Relative Object Pose Estimation | Chen Zhao et.al. | 2310.03534 | null |
| 2023-10-05 | RGBManip: Monocular Image-based Robotic Manipulation through Active Object Pose Estimation | Boshi An et.al. | 2310.03478 | null |
| 2023-10-05 | Cyber Physical System Information Collection: Robot Location and Navigation Method Based on QR Code | Hongwei Li et.al. | 2310.03470 | null |
| 2023-10-04 | Condition numbers in multiview geometry, instability in relative pose estimation, and RANSAC | Hongyi Fan et.al. | 2310.02719 | null |
| 2023-10-05 | USB-NeRF: Unrolling Shutter Bundle Adjusted Neural Radiance Fields | Moyang Li et.al. | 2310.02687 | null |
| 2023-10-03 | Beyond the Benchmark: Detecting Diverse Anomalies in Videos | Yoav Arad et.al. | 2310.01904 | link |
| 2023-10-03 | MFOS: Model-Free & One-Shot Object Pose Estimation | JongMin Lee et.al. | 2310.01897 | null |
| 2023-10-02 | LEAP: Liberate Sparse-view 3D Modeling from Camera Poses | Hanwen Jiang et.al. | 2310.01410 | null |
| 2023-10-02 | H-InDex: Visual Reinforcement Learning with Hand-Informed Representations for Dexterous Manipulation | Yanjie Ze et.al. | 2310.01404 | null |
| 2023-10-04 | Self-supervised Learning of Contextualized Local Visual Embeddings | Thalles Santos Silva et.al. | 2310.00527 | link |
| 2023-09-30 | Diff-DOPE: Differentiable Deep Object Pose Estimation | Jonathan Tremblay et.al. | 2310.00463 | null |
| 2023-09-29 | Diver Identification Using Anthropometric Data Ratios for Underwater Multi-Human-Robot Collaboration | Jungseok Hong et.al. | 2310.00146 | null |
| 2023-09-29 | Denoising and Selecting Pseudo-Heatmaps for Semi-Supervised Human Pose Estimation | Zhuoran Yu et.al. | 2310.00099 | null |
| 2023-09-29 | Revisiting Cephalometric Landmark Detection from the view of Human Pose Estimation with Lightweight Super-Resolution Head | Qian Wu et.al. | 2309.17143 | link |
| 2023-09-29 | AdaPose: Towards Cross-Site Device-Free Human Pose Estimation with Commodity WiFi | Yunjiao Zhou et.al. | 2309.16964 | null |
Keypoint Detection
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2023-11-27 | A manometric feature descriptor with linear-SVM to distinguish esophageal contraction vigor | Jialin Liu et.al. | 2311.15609 | null |
| 2023-11-21 | Instance-aware 3D Semantic Segmentation powered by Shape Generators and Classifiers | Bo Sun et.al. | 2311.12291 | null |
| 2023-11-20 | CurriculumLoc: Enhancing Cross-Domain Geolocalization through Multi-Stage Refinement | Boni Hu et.al. | 2311.11604 | link |
| 2023-11-17 | Video-based Sequential Bayesian Homography Estimation for Soccer Field Registration | Paul J. Claasen et.al. | 2311.10361 | null |
| 2023-11-13 | Processing and Segmentation of Human Teeth from 2D Images using Weakly Supervised Learning | Tomáš Kunzo et.al. | 2311.07398 | null |
| 2023-11-11 | CVTHead: One-shot Controllable Head Avatar with Vertex-feature Transformer | Haoyu Ma et.al. | 2311.06443 | null |
| 2023-11-08 | 3D Pose Estimation of Tomato Peduncle Nodes using Deep Keypoint Detection and Point Cloud | Jianchao Ci et.al. | 2311.04699 | null |
| 2023-11-06 | TAMPAR: Visual Tampering Detection for Parcel Logistics in Postal Supply Chains | Alexander Naumann et.al. | 2311.03124 | link |
| 2023-11-06 | An invariant feature extraction for multi-modal images matching | Chenzhong Gao et.al. | 2311.02842 | null |
| 2023-10-20 | Feature Selection and Hyperparameter Fine-tuning in Artificial Neural Networks for Wood Quality Classification | Mateus Roder et.al. | 2310.13490 | null |
| 2023-10-12 | UniPose: Detecting Any Keypoints | Jie Yang et.al. | 2310.08530 | link |
| 2023-10-10 | l-dyno: framework to learn consistent visual features using robot’s motion | Kartikeya Singh et.al. | 2310.06249 | null |
| 2023-10-10 | Language-driven Open-Vocabulary Keypoint Detection for Animal Body and Face | Hao Zhang et.al. | 2310.05056 | null |
| 2023-10-13 | H-InDex: Visual Reinforcement Learning with Hand-Informed Representations for Dexterous Manipulation | Yanjie Ze et.al. | 2310.01404 | link |
| 2023-10-04 | Self-supervised Learning of Contextualized Local Visual Embeddings | Thalles Santos Silva et.al. | 2310.00527 | link |
| 2023-10-22 | ObVi-SLAM: Long-Term Object-Visual SLAM | Amanda Adkins et.al. | 2309.15268 | link |
| 2023-09-19 | LiDAR-Generated Images Derived Keypoints Assisted Point Cloud Registration Scheme in Odometry Estimation | Haizhou Zhang et.al. | 2309.10436 | link |
| 2023-09-18 | RIDE: Self-Supervised Learning of Rotation-Equivariant Keypoint Detection and Invariant Description for Endoscopy | Mert Asim Karaoglu et.al. | 2309.09563 | null |
| 2023-09-17 | CryoAlign: feature-based method for global and local 3D alignment of EM density maps | Bintao He et.al. | 2309.09217 | null |
| 2023-09-14 | EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization | Minjung Kim et.al. | 2309.07471 | link |
| 2023-09-09 | Mirror-Aware Neural Humans | Daniel Ajisafe et.al. | 2309.04750 | null |
| 2023-09-07 | InstructDiffusion: A Generalist Modeling Interface for Vision Tasks | Zigang Geng et.al. | 2309.03895 | null |
| 2023-09-04 | SKoPe3D: A Synthetic Dataset for Vehicle Keypoint Perception in 3D from Traffic Monitoring Cameras | Himanshu Pahadia et.al. | 2309.01324 | null |
2023-11
Pose Estimation
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2023-12-04 | iMatching: Imperative Correspondence Learning | Zitong Zhan et.al. | 2312.02141 | null |
| 2023-12-04 | SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM | Nikhil Keetha et.al. | 2312.02126 | null |
| 2023-12-04 | Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection | Xubin Zhong et.al. | 2312.01713 | null |
| 2023-12-04 | Hulk: A Universal Knowledge Translator for Human-Centric Tasks | Yizhou Wang et.al. | 2312.01697 | link |
| 2023-12-04 | Multi-View Person Matching and 3D Pose Estimation with Arbitrary Uncalibrated Camera Networks | Yan Xu et.al. | 2312.01561 | null |
| 2023-12-01 | Object 6D pose estimation meets zero-shot learning | Andrea Caraffa et.al. | 2312.00947 | null |
| 2023-12-01 | Open-vocabulary object 6D pose estimation | Jaime Corsetti et.al. | 2312.00690 | null |
| 2023-12-01 | Global Localization: Utilizing Relative Spatio-Temporal Geometric Constraints from Adjacent and Distant Cameras | Mohammad Altillawi et.al. | 2312.00500 | null |
| 2023-12-01 | Learning Unorthogonalized Matrices for Rotation Estimation | Kerui Gu et.al. | 2312.00462 | null |
| 2023-11-30 | PoseGPT: Chatting about 3D Human Pose | Yao Feng et.al. | 2311.18836 | null |
| 2023-11-30 | FoundPose: Unseen Object Pose Estimation with Foundation Features | Evin Pınar Örnek et.al. | 2311.18809 | null |
| 2023-11-30 | Pose Estimation and Tracking for ASIST | Ari Goodman et.al. | 2311.18665 | null |
| 2023-11-29 | A Stochastic-Geometrical Framework for Object Pose Estimation based on Mixture Models Avoiding the Correspondence Problem | Wolfgang Hoegele et.al. | 2311.18107 | null |
| 2023-11-29 | Pose Anything: A Graph-Based Approach for Category-Agnostic Pose Estimation | Or Hirschorn et.al. | 2311.17891 | link |
| 2023-11-29 | Cinematic Behavior Transfer via NeRF-based Differentiable Filming | Xuekun Jiang et.al. | 2311.17754 | null |
| 2023-11-29 | PViT-6D: Overclocking Vision Transformers for 6D Pose Estimation with Confidence-Level Prediction and Pose Tokens | Sebastian Stapf et.al. | 2311.17504 | null |
| 2023-11-28 | On the Calibration of Human Pose Estimation | Kerui Gu et.al. | 2311.17105 | null |
| 2023-11-28 | Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence | Junyi Zhang et.al. | 2311.17034 | null |
| 2023-11-28 | HandyPriors: Physically Consistent Perception of Hand-Object Interactions with Differentiable Priors | Shutong Zhang et.al. | 2311.16552 | null |
| 2023-11-28 | Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement | Jian Wang et.al. | 2311.16495 | null |
| 2023-11-24 | UniHPE: Towards Unified Human Pose Estimation via Contrastive Learning | Zhongyu Jiang et.al. | 2311.16477 | null |
| 2023-11-27 | DiffSLVA: Harnessing Diffusion Models for Sign Language Video Anonymization | Zhaoyang Xia et.al. | 2311.16060 | link |
| 2023-11-27 | Uncertainty Quantification of Set-Membership Estimation in Control and Perception: Revisiting the Minimum Enclosing Ellipsoid | Yukai Tang et.al. | 2311.15962 | null |
| 2023-11-27 | Computer Vision for Carriers: PATRIOT | Ari Goodman et.al. | 2311.15914 | null |
| 2023-11-27 | SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation | Jiehong Lin et.al. | 2311.15707 | link |
| 2023-11-24 | RSB-Pose: Robust Short-Baseline Binocular 3D Human Pose Estimation with Occlusion Handling | Xiaoyue Wan et.al. | 2311.14242 | null |
| 2023-11-23 | Appearance-based gaze estimation enhanced with synthetic images using deep neural networks | Dmytro Herashchenko et.al. | 2311.14175 | link |
| 2023-11-23 | GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence | Van Nguyen Nguyen et.al. | 2311.14155 | link |
| 2023-11-23 | GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence | Pengyuan Wang et.al. | 2311.13777 | null |
| 2023-11-22 | HEViTPose: High-Efficiency Vision Transformer for Human Pose Estimation | Chengpeng Wu et.al. | 2311.13615 | link |
| 2023-11-24 | Calibration System and Algorithm Design for a Soft Hinged Micro Scanning Mirror with a Triaxial Hall Effect Sensor | Di Wang et.al. | 2311.12778 | null |
| 2023-11-21 | HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation | Yongliang Lin et.al. | 2311.12588 | null |
| 2023-11-21 | CoVOR-SLAM: Cooperative SLAM using Visual Odometry and Ranges for Multi-Robot Systems | Young-Hee Lee et.al. | 2311.12580 | null |
| 2023-11-21 | HCA-Net: Hierarchical Context Attention Network for Intervertebral Disc Semantic Labeling | Afshin Bozorgpour et.al. | 2311.12486 | link |
| 2023-11-21 | Two Views Are Better than One: Monocular 3D Pose Estimation with Multiview Consistency | Christian Keilstrup Ingwersen et.al. | 2311.12421 | null |
| 2023-11-20 | Fingerspelling PoseNet: Enhancing Fingerspelling Translation with Pose-Based Transformer Models | Pooya Fayyazsanavi et.al. | 2311.12128 | link |
| 2023-11-20 | Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation | Wenhao Li et.al. | 2311.12028 | null |
| 2023-11-20 | SniffyArt: The Dataset of Smelling Persons | Mathias Zinnen et.al. | 2311.11888 | null |
| 2023-11-21 | Robot Hand-Eye Calibration using Structure-from-Motion | Nicolas Andreff et.al. | 2311.11808 | null |
| 2023-11-18 | SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation | Yamei Chen et.al. | 2311.11125 | null |
| 2023-11-18 | Synthetic Data Generation for Bridging Sim2Real Gap in a Production Environment | Parth Rawal et.al. | 2311.11039 | null |
| 2023-11-18 | Multiple View Geometry Transformers for 3D Human Pose Estimation | Ziwei Liao et.al. | 2311.10983 | null |
| 2023-11-18 | Jenga Stacking Based on 6D Pose Estimation for Architectural Form Finding Process | Zixun Huang et.al. | 2311.10918 | null |
| 2023-11-17 | BiHRNet: A Binary high-resolution network for Human Pose Estimation | Zhicheng Zhang et.al. | 2311.10296 | null |
| 2023-11-16 | Match and Locate: low-frequency monocular odometry based on deep feature matching | Stepan Konev et.al. | 2311.10034 | null |
| 2023-11-16 | LIO-EKF: High Frequency LiDAR-Inertial Odometry using Extended Kalman Filters | Yibin Wu et.al. | 2311.09887 | null |
| 2023-11-16 | Improved TokenPose with Sparsity | Anning Li et.al. | 2311.09653 | null |
| 2023-11-16 | Pseudo-keypoints RKHS Learning for Self-supervised 6DoF Pose Estimation | Yangzheng Wu et.al. | 2311.09500 | null |
| 2023-11-15 | NormNet: Scale Normalization for 6D Pose Estimation in Stacked Scenarios | En-Te Lin et.al. | 2311.09269 | link |
| 2023-11-15 | Range-Visual-Inertial Sensor Fusion for Micro Aerial Vehicle Localization and Navigation | Abhishek Goudar et.al. | 2311.09056 | link |
| 2023-11-14 | LocaliseBot: Multi-view 3D object localisation with differentiable rendering for robot grasping | Sujal Vijayaraghavan et.al. | 2311.08438 | null |
| 2023-11-13 | SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models | Ziyi Lin et.al. | 2311.07575 | link |
| 2023-11-13 | Bio-Inspired Grasping Controller for Sensorized 2-DoF Grippers | Luca Lach et.al. | 2311.07257 | link |
| 2023-11-10 | CESPED: a new benchmark for supervised particle pose estimation in Cryo-EM | Ruben Sanchez-Garcia et.al. | 2311.06194 | null |
| 2023-11-10 | 2D Image head pose estimation via latent space regression under occlusion settings | José Celestino et.al. | 2311.06038 | link |
| 2023-11-10 | Robust Adversarial Attacks Detection for Deep Learning based Relative Pose Estimation for Space Rendezvous | Ziwei Wang et.al. | 2311.05992 | null |
| 2023-11-10 | A Practical Guide to Implementing Off-Axis Stereo Projection Using Existing Ray Tracing Libraries | Stefan Zellmann et.al. | 2311.05887 | null |
| 2023-11-09 | Visually Guided Model Predictive Robot Control via 6D Object Pose Localization and Tracking | Mederic Fourmy et.al. | 2311.05344 | null |
| 2023-11-09 | Spatial Attention-based Distribution Integration Network for Human Pose Estimation | Sihan Gao et.al. | 2311.05323 | null |
| 2023-11-09 | SPADES: A Realistic Spacecraft Pose Estimation Dataset using Event Sensing | Arunkumar Rathinam et.al. | 2311.05310 | null |
| 2023-11-09 | Differentiable Cloth Parameter Identification and State Estimation in Manipulation | Dongzhe Zheng et.al. | 2311.05141 | null |
| 2023-11-09 | POISE: Pose Guided Human Silhouette Extraction under Occlusions | Arindam Dutta et.al. | 2311.05077 | link |
| 2023-11-08 | Active Transfer Learning for Efficient Video-Specific Human Pose Estimation | Hiromu Taketsugu et.al. | 2311.05041 | link |
| 2023-11-08 | 3D Pose Estimation of Tomato Peduncle Nodes using Deep Keypoint Detection and Point Cloud | Jianchao Ci et.al. | 2311.04699 | null |
| 2023-11-09 | Rethinking Human Pose Estimation for Autonomous Driving with 3D Event Representations | Xiaoting Yin et.al. | 2311.04591 | link |
| 2023-11-08 | Learning Robust Multi-Scale Representation for Neural Radiance Fields from Unposed Images | Nishant Jain et.al. | 2311.04521 | null |
| 2023-11-08 | PLV-IEKF: Consistent Visual-Inertial Odometry using Points, Lines, and Vanishing Points | Tong Hua et.al. | 2311.04477 | null |
| 2023-11-08 | UP-NeRF: Unconstrained Pose-Prior-Free Neural Radiance Fields | Injae Kim et.al. | 2311.03784 | null |
| 2023-11-06 | A Single 2D Pose with Context is Worth Hundreds for 3D Human Pose Estimation | Qitao Zhao et.al. | 2311.03312 | null |
| 2023-11-06 | Enabling In-Situ Resources Utilisation by leveraging collaborative robotics and astronaut-robot interaction | Silvia Romero-Azpitarte et.al. | 2311.03146 | null |
| 2023-11-06 | Simultaneous Time Synchronization and Mutual Localization for Multi-robot System | Xiangyong Wen et.al. | 2311.02948 | null |
| 2023-11-06 | Initialisation of Autonomous Aircraft Visual Inspection Systems via CNN-Based Camera Pose Estimation | Xueyan Oh et.al. | 2311.02900 | null |
| 2023-11-06 | Efficient, Self-Supervised Human Pose Estimation with Inductive Prior Tuning | Nobline Yoo et.al. | 2311.02815 | link |
| 2023-11-03 | Generating Unbiased Pseudo-labels via a Theoretically Guaranteed Chebyshev Constraint to Unify Semi-supervised Classification and Regression | Jiaqi Wu et.al. | 2311.01782 | link |
| 2023-11-03 | Modeling the Uncertainty with Maximum Discrepant Students for Semi-supervised 2D Pose Estimation | Jiaqi Wu et.al. | 2311.01770 | null |
| 2023-11-02 | Sim2Real Bilevel Adaptation for Object Surface Classification using Vision-Based Tactile Sensors | Gabriele M. Caddeo et.al. | 2311.01380 | link |
| 2023-11-01 | A Spatial-Temporal Transformer based Framework For Human Pose Assessment And Correction in Education Scenarios | Wenyang Hu et.al. | 2311.00401 | null |
| 2023-10-31 | HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception | Junkun Yuan et.al. | 2310.20695 | link |
| 2023-10-31 | Pose-to-Motion: Cross-Domain Motion Retargeting with Pose Prior | Qingqing Zhao et.al. | 2310.20249 | null |
| 2023-10-30 | FetusMapV2: Enhanced Fetal Pose Estimation in 3D Ultrasound | Chaoyu Chen et.al. | 2310.19293 | null |
| 2023-10-29 | Distributed Nonlinear Filtering using Triangular Transport Maps | Daniel Grange et.al. | 2310.19000 | null |
| 2023-10-29 | TIC-TAC: A Framework To Learn And Evaluate Your Covariance | Megh Shukla et.al. | 2310.18953 | link |
| 2023-10-29 | Improving Multi-Person Pose Tracking with A Confidence Network | Zehua Fu et.al. | 2310.18920 | null |
| 2023-10-29 | HDMNet: A Hierarchical Matching Network with Double Attention for Large-scale Outdoor LiDAR Point Cloud Registration | Weiyi Xue et.al. | 2310.18874 | null |
| 2023-10-27 | ProcNet: Deep Predictive Coding Model for Robust-to-occlusion Visual Segmentation and Pose Estimation | Michael Zechmair et.al. | 2310.18009 | null |
| 2023-10-26 | Learning Extrinsic Dexterity with Parameterized Manipulation Primitives | Shih-Min Yang et.al. | 2310.17785 | null |
Visual Localization
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2023-12-11 | Dynamic Weighted Combiner for Mixed-Modal Image Retrieval | Fuxiang Huang et.al. | 2312.06179 | null |
| 2023-12-06 | Lite-Mind: Towards Efficient and Versatile Brain Representation Network | Zixuan Gong et.al. | 2312.03781 | null |
| 2023-12-08 | FreestyleRet: Retrieving Images from Style-Diversified Queries | Hao Li et.al. | 2312.02428 | link |
| 2023-12-04 | Implicit Learning of Scene Geometry from Poses for Global Localization | Mohammad Altillawi et.al. | 2312.02029 | null |
| 2023-12-04 | Language-only Efficient Training of Zero-shot Composed Image Retrieval | Geonmo Gu et.al. | 2312.01998 | link |
| 2023-12-03 | G2D: From Global to Dense Radiography Representation Learning via Vision-Language Pre-training | Che Liu et.al. | 2312.01522 | null |
| 2023-12-01 | Improve Supervised Representation Learning with Masked Image Modeling | Kaifeng Chen et.al. | 2312.00950 | null |
| 2023-12-05 | Grounding Everything: Emerging Localization Properties in Vision-Language Transformers | Walid Bousselham et.al. | 2312.00878 | link |
| 2023-12-01 | Global Localization: Utilizing Relative Spatio-Temporal Geometric Constraints from Adjacent and Distant Cameras | Mohammad Altillawi et.al. | 2312.00500 | null |
| 2023-11-30 | HKUST at SemEval-2023 Task 1: Visual Word Sense Disambiguation with Context Augmentation and Visual Assistance | Zhuohao Yin et.al. | 2311.18273 | link |
| 2023-11-30 | Label-efficient Training of Small Task-specific Models by Leveraging Vision Foundation Models | Raviteja Vemulapalli et.al. | 2311.18237 | null |
| 2023-11-29 | Transformer-empowered Multi-modal Item Embedding for Enhanced Image Search in E-Commerce | Chang Liu et.al. | 2311.17954 | null |
| 2023-11-28 | Scene Summarization: Clustering Scene Videos into Spatially Diverse Frames | Chao Chen et.al. | 2311.17940 | null |
| 2023-11-29 | 360Loc: A Dataset and Benchmark for Omnidirectional Visual Localization with Cross-device Queries | Huajian Huang et.al. | 2311.17389 | null |
| 2023-11-27 | Removing NSFW Concepts from Vision-and-Language Models for Text-to-Image Retrieval and Generation | Samuele Poppi et.al. | 2311.16254 | link |
| 2023-11-27 | Optimal Transport Aggregation for Visual Place Recognition | Sergio Izquierdo et.al. | 2311.15937 | link |
| 2023-11-27 | AI-Generated Images Introduce Invisible Relevance Bias to Text-Image Retrieval | Shicheng Xu et.al. | 2311.14084 | null |
| 2023-11-23 | 3D-MIR: A Benchmark and Empirical Study on 3D Medical Image Retrieval in Radiology | Asma Ben Abacha et.al. | 2311.13752 | null |
| 2023-11-22 | Medical Image Retrieval Using Pretrained Embeddings | Farnaz Khun Jush et.al. | 2311.13547 | null |
| 2023-11-22 | Applications of Spiking Neural Networks in Visual Place Recognition | Somayeh Hussaini et.al. | 2311.13186 | link |
| 2023-11-21 | Attribute-Aware Deep Hashing with Self-Consistency for Large-Scale Fine-Grained Image Retrieval | Xiu-Shen Wei et.al. | 2311.12894 | null |
| 2023-11-19 | From Categories to Classifier: Name-Only Continual Learning by Exploring the Web | Ameya Prabhu et.al. | 2311.11293 | null |
| 2023-11-18 | Lesion Search with Self-supervised Learning | Kristin Qi et.al. | 2311.11014 | null |
| 2023-11-15 | Flow reconstruction and particle characterization from inertial Lagrangian tracks | Ke Zhou et.al. | 2311.09076 | null |
| 2023-11-15 | Pretrain like Your Inference: Masked Tuning Improves Zero-Shot Composed Image Retrieval | Junyang Chen et.al. | 2311.07622 | null |
| 2023-11-13 | VGSG: Vision-Guided Semantic-Group Network for Text-based Person Search | Shuting He et.al. | 2311.07514 | null |
| 2023-11-10 | Attributes Grouping and Mining Hashing for Fine-Grained Image Retrieval | Xin Lu et.al. | 2311.06067 | null |
| 2023-11-08 | Energy-efficient Wireless Image Retrieval for IoT Devices by Transmitting a TinyML Model | Junya Shiraishi et.al. | 2311.04788 | null |
| 2023-11-08 | Training CLIP models on Data from Scientific Papers | Calvin Metzger et.al. | 2311.04711 | link |
| 2023-11-07 | DeepPatent2: A Large-Scale Benchmarking Corpus for Technical Drawing Understanding | Kehinde Ajayi et.al. | 2311.04098 | link |
| 2023-11-06 | Long-Term Invariant Local Features via Implicit Cross-Domain Correspondences | Zador Pataki et.al. | 2311.03345 | null |
| 2023-11-06 | FocusTune: Tuning Visual Localization through Focus-Guided Sampling | Son Tung Nguyen et.al. | 2311.02872 | link |
| 2023-11-01 | DINO-Mix: Enhancing Visual Place Recognition with Foundational Vision Model and Feature Mixing | Gaoshuang Huang et.al. | 2311.00230 | null |
| 2023-10-29 | Identifiable Contrastive Learning with Automatic Feature Importance Discovery | Qi Zhang et.al. | 2310.18904 | link |
| 2023-10-27 | LipSim: A Provably Robust Perceptual Similarity Metric | Sara Ghazanfari et.al. | 2310.18274 | link |
| 2023-10-27 | Split Covariance Intersection Filter Based Visual Localization With Accurate AprilTag Map For Warehouse Robot Navigation | Susu Fang et.al. | 2310.17879 | null |
| 2023-10-25 | FoundLoc: Vision-based Onboard Aerial Localization in the Wild | Yao He et.al. | 2310.16299 | null |
| 2023-10-24 | Cross-view Self-localization from Synthesized Scene-graphs | Ryogo Yamamoto et.al. | 2310.15504 | null |
| 2023-10-23 | Semantic-Aware Adversarial Training for Reliable Deep Hashing Retrieval | Xu Yuan et.al. | 2310.14637 | link |
| 2023-10-21 | Large Language Models and Multimodal Retrieval for Visual Word Sense Disambiguation | Anastasia Kritharoula et.al. | 2310.14025 | link |
| 2023-10-20 | FMRT: Learning Accurate Feature Matching with Reconciliatory Transformer | Xinyu Zhang et.al. | 2310.13605 | null |
| 2023-10-20 | CylinderTag: An Accurate and Flexible Marker for Cylinder-Shape Objects Pose Estimation Based on Projective Invariants | Shaoan Wang et.al. | 2310.13320 | link |
Keypoint Detection
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-01-02 | 6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation | Li Xu et.al. | 2401.00029 | null |
| 2023-12-27 | Bezier-based Regression Feature Descriptor for Deformable Linear Objects | Fangqing Chen et.al. | 2312.16502 | null |
| 2023-12-24 | Residual Learning for Image Point Descriptors | Rashik Shrestha et.al. | 2312.15471 | null |
| 2023-12-22 | BonnBeetClouds3D: A Dataset Towards Point Cloud-based Organ-level Phenotyping of Sugar Beet Plants under Field Conditions | Elias Marks et.al. | 2312.14706 | null |
| 2023-12-19 | Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation | Jiaming Liu et.al. | 2312.12480 | null |
| 2023-12-19 | An effective image copy-move forgery detection using entropy image | Zhaowei Lu et.al. | 2312.11793 | null |
| 2023-12-11 | VoxelKP: A Voxel-based Network Architecture for Human Keypoint Estimation in LiDAR Data | Jian Shi et.al. | 2312.08871 | link |
| 2023-12-11 | Keypoint-based Stereophotoclinometry for Characterizing and Navigating Small Bodies: A Factor Graph Approach | Travis Driver et.al. | 2312.06865 | null |
| 2023-12-01 | Tracking Object Positions in Reinforcement Learning: A Metric for Keypoint Detection (extended version) | Emma Cramer et.al. | 2312.00592 | null |
| 2023-11-30 | Utilizing Radiomic Feature Analysis For Automated MRI Keypoint Detection: Enhancing Graph Applications | Sahar Almahfouz Nasser et.al. | 2311.18281 | null |
| 2023-11-29 | Back to 3D: Few-Shot 3D Keypoint Detection with Back-Projected 2D Features | Thomas Wimmer et.al. | 2311.18113 | null |
| 2023-11-28 | Diffusion 3D Features (Diff3F): Decorating Untextured Shapes with Distilled Semantic Features | Niladri Shekhar Dutt et.al. | 2311.17024 | null |
| 2023-11-28 | Riemannian Self-Attention Mechanism for SPD Networks | Rui Wang et.al. | 2311.16738 | null |
| 2023-11-27 | A manometric feature descriptor with linear-SVM to distinguish esophageal contraction vigor | Jialin Liu et.al. | 2311.15609 | null |
| 2023-11-21 | Instance-aware 3D Semantic Segmentation powered by Shape Generators and Classifiers | Bo Sun et.al. | 2311.12291 | null |
| 2023-11-20 | CurriculumLoc: Enhancing Cross-Domain Geolocalization through Multi-Stage Refinement | Boni Hu et.al. | 2311.11604 | link |
| 2023-11-17 | Video-based Sequential Bayesian Homography Estimation for Soccer Field Registration | Paul J. Claasen et.al. | 2311.10361 | null |
| 2023-11-13 | Processing and Segmentation of Human Teeth from 2D Images using Weakly Supervised Learning | Tomáš Kunzo et.al. | 2311.07398 | null |
| 2023-11-11 | CVTHead: One-shot Controllable Head Avatar with Vertex-feature Transformer | Haoyu Ma et.al. | 2311.06443 | null |
| 2023-11-08 | 3D Pose Estimation of Tomato Peduncle Nodes using Deep Keypoint Detection and Point Cloud | Jianchao Ci et.al. | 2311.04699 | null |
| 2023-11-06 | TAMPAR: Visual Tampering Detection for Parcel Logistics in Postal Supply Chains | Alexander Naumann et.al. | 2311.03124 | link |
| 2023-11-06 | An invariant feature extraction for multi-modal images matching | Chenzhong Gao et.al. | 2311.02842 | null |
| 2023-10-20 | Feature Selection and Hyperparameter Fine-tuning in Artificial Neural Networks for Wood Quality Classification | Mateus Roder et.al. | 2310.13490 | null |
| 2023-10-12 | UniPose: Detecting Any Keypoints | Jie Yang et.al. | 2310.08530 | link |
| 2023-10-10 | l-dyno: framework to learn consistent visual features using robot’s motion | Kartikeya Singh et.al. | 2310.06249 | null |
| 2023-10-10 | Language-driven Open-Vocabulary Keypoint Detection for Animal Body and Face | Hao Zhang et.al. | 2310.05056 | null |
| 2023-10-13 | H-InDex: Visual Reinforcement Learning with Hand-Informed Representations for Dexterous Manipulation | Yanjie Ze et.al. | 2310.01404 | link |
| 2023-10-04 | Self-supervised Learning of Contextualized Local Visual Embeddings | Thalles Santos Silva et.al. | 2310.00527 | link |
| 2023-10-22 | ObVi-SLAM: Long-Term Object-Visual SLAM | Amanda Adkins et.al. | 2309.15268 | link |
| 2023-09-19 | LiDAR-Generated Images Derived Keypoints Assisted Point Cloud Registration Scheme in Odometry Estimation | Haizhou Zhang et.al. | 2309.10436 | link |
2023-12
Pose Estimation
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-01-07 | RHOBIN Challenge: Reconstruction of Human Object Interaction | Xianghui Xie et.al. | 2401.04143 | null |
| 2024-01-08 | D3PRefiner: A Diffusion-based Denoise Method for 3D Human Pose Refinement | Danqi Yan et.al. | 2401.03914 | null |
| 2024-01-07 | Big Data and Deep Learning in Smart Cities: A Comprehensive Dataset for AI-Driven Traffic Accident Detection and Computer Vision Systems | Victor Adewopo et.al. | 2401.03587 | null |
| 2024-01-04 | Survey of 3D Human Body Pose and Shape Estimation Methods for Contemporary Dance Applications | Darshan Venkatrayappa et.al. | 2401.02383 | null |
| 2024-01-04 | Fit-NGP: Fitting Object Models to Neural Graphics Primitives | Marwan Taher et.al. | 2401.02357 | null |
| 2024-01-04 | PEGASUS: Physically Enhanced Gaussian Splatting Simulation System for 6DOF Object Pose Dataset Generation | Lukas Meyer et.al. | 2401.02281 | null |
| 2024-01-03 | Real-Time Human Fall Detection using a Lightweight Pose Estimation Technique | Ekram Alam et.al. | 2401.01587 | null |
| 2024-01-05 | PLE-SLAM: A Visual-Inertial SLAM Based on Point-Line Features and Efficient IMU Initialization | Jiaming He et.al. | 2401.01081 | link |
| 2023-12-30 | 3D Human Pose Perception from Egocentric Stereo Videos | Hiroyasu Akada et.al. | 2401.00889 | null |
| 2024-01-01 | Geometry Depth Consistency in RGBD Relative Pose Estimation | Sourav Kumar et.al. | 2401.00639 | null |
| 2023-12-30 | A comprehensive framework for occluded human pose estimation | Linhao Xu et.al. | 2401.00155 | null |
| 2024-01-02 | 6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation | Li Xu et.al. | 2401.00029 | null |
| 2023-12-29 | MURP: Multi-Agent Ultra-Wideband Relative Pose Estimation with Constrained Communications in 3D Environments | Andrew Fishberg et.al. | 2312.17731 | null |
| 2023-12-28 | iFusion: Inverting Diffusion for Pose-Free Reconstruction from Sparse Views | Chin-Hsuan Wu et.al. | 2312.17250 | link |
| 2023-12-28 | EvPlug: Learn a Plug-and-Play Module for Event and Image Fusion | Jianping Jiang et.al. | 2312.16933 | null |
| 2023-12-28 | SR-LIVO: LiDAR-Inertial-Visual Odometry and Mapping with Sweep Reconstruction | Zikang Yuan et.al. | 2312.16800 | link |
| 2023-12-28 | L-LO: Enhancing Pose Estimation Precision via a Landmark-Based LiDAR Odometry | Feiya Li et.al. | 2312.16787 | null |
| 2023-12-27 | HMP: Hand Motion Priors for Pose and Shape Estimation from Video | Enes Duran et.al. | 2312.16737 | null |
| 2023-12-27 | Camera calibration for the surround-view system: a benchmark and dataset | L Qin et.al. | 2312.16499 | null |
| 2023-12-24 | TEMP3D: Temporally Continuous 3D Human Pose Estimation Under Occlusions | Rohit Lal et.al. | 2312.16221 | null |
| 2023-12-26 | Graph Context Transformation Learning for Progressive Correspondence Pruning | Junwen Guo et.al. | 2312.15971 | null |
| 2023-12-25 | Lifting by Image – Leveraging Image Cues for Accurate 3D Human Pose Estimation | Feng Zhou et.al. | 2312.15636 | null |
| 2023-12-25 | APTv2: Benchmarking Animal Pose Estimation and Tracking with a Large-scale Dataset and Beyond | Yuxiang Yang et.al. | 2312.15612 | null |
| 2023-12-23 | PACE: Pose Annotations in Cluttered Environments | Yang You et.al. | 2312.15130 | link |
| 2023-12-22 | PoseGen: Learning to Generate 3D Human Pose Dataset with NeRF | Mohsen Gholami et.al. | 2312.14915 | link |
| 2023-12-22 | Harnessing Diffusion Models for Visual Perception with Meta Prompts | Qiang Wan et.al. | 2312.14733 | link |
| 2023-12-22 | Pola4All: survey of polarimetric applications and an open-source toolkit to analyze polarization | Joaquin Rodriguez et.al. | 2312.14697 | null |
| 2023-12-22 | PoseViNet: Distracted Driver Action Recognition Framework Using Multi-View Pose Estimation and Vision Transformer | Neha Sengar et.al. | 2312.14577 | null |
| 2023-12-22 | Scalable 3D Reconstruction From Single Particle X-Ray Diffraction Images Based on Online Machine Learning | Jay Shenoy et.al. | 2312.14432 | null |
| 2023-12-21 | 3D Pose Estimation of Two Interacting Hands from a Monocular Event Camera | Christen Millerdurai et.al. | 2312.14157 | null |
| 2023-12-21 | DUSt3R: Geometric 3D Vision Made Easy | Shuzhe Wang et.al. | 2312.14132 | null |
| 2023-12-20 | NeRF-VO: Real-Time Sparse Visual Odometry with Neural Radiance Fields | Jens Naumann et.al. | 2312.13471 | null |
| 2023-12-20 | Brain-Inspired Visual Odometry: Balancing Speed and Interpretability through a System of Systems Approach | Habib Boloorchi Tabrizi et.al. | 2312.13162 | null |
| 2023-12-18 | Unified framework for diffusion generative models in SO(3): applications in computer vision and astrophysics | Yesukhei Jagvaral et.al. | 2312.11707 | null |
| 2023-12-18 | Underwater Robot Pose Estimation Using Acoustic Methods and Intermittent Position Measurements at the Surface | Vicu-Mihalis Maer et.al. | 2312.11401 | null |
| 2023-12-17 | SHaRPose: Sparse High-Resolution Representation for Human Pose Estimation | Xiaoqi An et.al. | 2312.10758 | link |
| 2023-12-17 | PNeRFLoc: Visual Localization with Point-based Neural Radiance Fields | Boming Zhao et.al. | 2312.10649 | null |
| 2023-12-15 | SoloPose: One-Shot Kinematic 3D Human Pose Estimation with Video Data Augmentation | David C. Jeong et.al. | 2312.10195 | null |
| 2023-12-14 | iComMa: Inverting 3D Gaussians Splatting for Camera Pose Estimation via Comparing and Matching | Yuan Sun et.al. | 2312.09031 | null |
| 2023-12-14 | Scene 3-D Reconstruction System in Scattering Medium | Zhuoyifan Zhang et.al. | 2312.09005 | null |
| 2023-12-14 | CattleEyeView: A Multi-task Top-down View Cattle Dataset for Smarter Precision Livestock Farming | Kian Eng Ong et.al. | 2312.08764 | link |
| 2023-12-20 | PnP for Two-Dimensional Pose Estimation | Joshua Wang et.al. | 2312.08488 | null |
| 2023-12-13 | Pose and shear-based tactile servoing | John Lloyd et.al. | 2312.08411 | null |
| 2023-12-13 | FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects | Bowen Wen et.al. | 2312.08344 | null |
| 2023-12-13 | Efficient Multi-Object Pose Estimation using Multi-Resolution Deformable Attention and Query Aggregation | Arul Selvam Periyasamy et.al. | 2312.08268 | null |
| 2023-12-13 | CenterGrasp: Object-Aware Implicit Representation Learning for Simultaneous Shape Reconstruction and 6-DoF Grasp Estimation | Eugenio Chisari et.al. | 2312.08240 | null |
| 2023-12-13 | C-BEV: Contrastive Bird’s Eye View Training for Cross-View Image Retrieval and 3-DoF Pose Estimation | Florian Fervers et.al. | 2312.08060 | null |
| 2023-12-13 | Three-Filters-to-Normal+: Revisiting Discontinuity Discrimination in Depth-to-Normal Translation | Jingwei Yang et.al. | 2312.07964 | null |
| 2023-12-13 | Diffusion Models Enable Zero-Shot Pose Estimation for Lower-Limb Prosthetic Users | Tianxun Zhou et.al. | 2312.07854 | null |
| 2023-12-12 | RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation | Peng Lu et.al. | 2312.07526 | link |
| 2023-12-12 | COLMAP-Free 3D Gaussian Splatting | Yang Fu et.al. | 2312.07504 | null |
| 2023-12-12 | RMS: Redundancy-Minimizing Point Cloud Sampling for Real-Time Pose Estimation in Degenerated Environments | Pavel Petracek et.al. | 2312.07337 | link |
| 2023-12-12 | Unifying Correspondence, Pose and NeRF for Pose-Free Novel View Synthesis from Stereo Pairs | Sunghwan Hong et.al. | 2312.07246 | link |
| 2023-12-12 | Mask as Supervision: Leveraging Unified Mask Information for Unsupervised 3D Pose Estimation | Yuchen Yang et.al. | 2312.07051 | null |
| 2023-12-12 | Towards Enhanced Human Activity Recognition through Natural Language Generation and Pose Estimation | Nikhil Kashyap et.al. | 2312.06965 | null |
| 2023-12-12 | Exploring Novel Object Recognition and Spontaneous Location Recognition Machine Learning Analysis Techniques in Alzheimer’s Mice | Soham Bafana et.al. | 2312.06914 | link |
| 2023-12-11 | Keypoint-based Stereophotoclinometry for Characterizing and Navigating Small Bodies: A Factor Graph Approach | Travis Driver et.al. | 2312.06865 | null |
| 2023-12-11 | Improving the Robustness of 3D Human Pose Estimation: A Benchmark and Learning from Noisy Input | Trung-Hieu Hoang et.al. | 2312.06797 | null |
| 2023-12-11 | 3D Hand Pose Estimation in Egocentric Images in the Wild | Aditya Prakash et.al. | 2312.06583 | null |
| 2023-12-11 | PointVoxel: A Simple and Effective Pipeline for Multi-View Multi-Modal 3D Human Pose Estimation | Zhiyu Pan et.al. | 2312.06409 | null |
| 2023-12-11 | ManiPose: Manifold-Constrained Multi-Hypothesis 3D Human Pose Estimation | Cédric Rommel et.al. | 2312.06386 | null |
| 2023-12-10 | From Correspondences to Pose: Non-minimal Certifiably Optimal Relative Pose without Disambiguation | Javier Tirado-Garín et.al. | 2312.05995 | link |
| 2023-12-09 | You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception | Sheng Jin et.al. | 2312.05525 | null |
| 2023-12-07 | Image and AIS Data Fusion Technique for Maritime Computer Vision Applications | Emre Gülsoylu et.al. | 2312.05270 | null |
| 2023-12-07 | Correspondences of the Third Kind: Camera Pose Estimation from Object Reflection | Kohei Yamashita et.al. | 2312.04527 | null |
| 2023-12-07 | Detecting and Restoring Non-Standard Hands in Stable Diffusion Generated Images | Yiqun Zhang et.al. | 2312.04236 | null |
| 2023-12-06 | Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning | Xinshun Wang et.al. | 2312.03703 | link |
| 2023-12-06 | Cooperative Probabilistic Trajectory Forecasting under Occlusion | Anshul Nayak et.al. | 2312.03296 | null |
| 2023-12-05 | A Unified Simulation Framework for Visual and Behavioral Fidelity in Crowd Analysis | Niccolò Bisagno et.al. | 2312.02613 | null |
| 2023-12-05 | 6D Assembly Pose Estimation by Point Cloud Registration for Robot Manipulation | K. Samarawickrama et.al. | 2312.02593 | link |
| 2023-12-05 | PolyFit: A Peg-in-hole Assembly Framework for Unseen Polygon Shapes via Sim-to-real Adaptation | Geonhyup Lee et.al. | 2312.02531 | null |
| 2023-12-04 | GenEM: Physics-Informed Generative Cryo-Electron Microscopy | Jiakai Zhang et.al. | 2312.02235 | null |
| 2023-12-02 | Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors | Yu Zhang et.al. | 2312.02196 | null |
| 2023-12-04 | iMatching: Imperative Correspondence Learning | Zitong Zhan et.al. | 2312.02141 | null |
| 2023-12-04 | SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM | Nikhil Keetha et.al. | 2312.02126 | null |
| 2023-12-04 | Disentangled Interaction Representation for One-Stage Human-Object Interaction Detection | Xubin Zhong et.al. | 2312.01713 | null |
| 2023-12-05 | Hulk: A Universal Knowledge Translator for Human-Centric Tasks | Yizhou Wang et.al. | 2312.01697 | link |
| 2023-12-04 | Multi-View Person Matching and 3D Pose Estimation with Arbitrary Uncalibrated Camera Networks | Yan Xu et.al. | 2312.01561 | null |
| 2023-12-01 | Object 6D pose estimation meets zero-shot learning | Andrea Caraffa et.al. | 2312.00947 | null |
| 2023-12-01 | Open-vocabulary object 6D pose estimation | Jaime Corsetti et.al. | 2312.00690 | null |
| 2023-12-01 | Global Localization: Utilizing Relative Spatio-Temporal Geometric Constraints from Adjacent and Distant Cameras | Mohammad Altillawi et.al. | 2312.00500 | null |
| 2023-12-01 | Learning Unorthogonalized Matrices for Rotation Estimation | Kerui Gu et.al. | 2312.00462 | null |
| 2023-11-30 | PoseGPT: Chatting about 3D Human Pose | Yao Feng et.al. | 2311.18836 | null |
| 2023-11-30 | FoundPose: Unseen Object Pose Estimation with Foundation Features | Evin Pınar Örnek et.al. | 2311.18809 | null |
| 2023-11-30 | Pose Estimation and Tracking for ASIST | Ari Goodman et.al. | 2311.18665 | null |
| 2023-11-29 | A Stochastic-Geometrical Framework for Object Pose Estimation based on Mixture Models Avoiding the Correspondence Problem | Wolfgang Hoegele et.al. | 2311.18107 | null |
| 2023-11-29 | Pose Anything: A Graph-Based Approach for Category-Agnostic Pose Estimation | Or Hirschorn et.al. | 2311.17891 | link |
| 2023-11-29 | Cinematic Behavior Transfer via NeRF-based Differentiable Filming | Xuekun Jiang et.al. | 2311.17754 | null |
| 2023-11-29 | PViT-6D: Overclocking Vision Transformers for 6D Pose Estimation with Confidence-Level Prediction and Pose Tokens | Sebastian Stapf et.al. | 2311.17504 | null |
Visual Localization
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-01-18 | Cross-Modality Perturbation Synergy Attack for Person Re-identification | Yunpeng Gong et.al. | 2401.10090 | null |
| 2024-01-16 | Siamese Content-based Search Engine for a More Transparent Skin and Breast Cancer Diagnosis through Histological Imaging | Zahra Tabatabaei et.al. | 2401.08272 | null |
| 2024-01-16 | Multi-Technique Sequential Information Consistency For Dynamic Visual Place Recognition In Changing Environments | Bruno Arcanjo et.al. | 2401.08263 | null |
| 2024-01-15 | Exploring Masked Autoencoders for Sensor-Agnostic Image Retrieval in Remote Sensing | Jakob Hackstein et.al. | 2401.07782 | link |
| 2024-01-14 | HiHPQ: Hierarchical Hyperbolic Product Quantization for Unsupervised Image Retrieval | Zexuan Qiu et.al. | 2401.07212 | null |
| 2024-01-11 | UAVD4L: A Large-Scale Dataset for UAV 6-DoF Localization | Rouwan Wu et.al. | 2401.05971 | null |
| 2024-01-10 | Modality-Aware Representation Learning for Zero-shot Sketch-based Image Retrieval | Eunyi Lyou et.al. | 2401.04860 | null |
| 2024-01-05 | Benchmarking PathCLIP for Pathology Image Analysis | Sunyi Zheng et.al. | 2401.02651 | null |
| 2024-01-02 | BEV-CLIP: Multi-modal BEV Retrieval Methodology for Complex Scene in Autonomous Driving | Dafeng Wei et.al. | 2401.01065 | null |
| 2023-12-31 | Multi-Granularity Representation Learning for Sketch-based Dynamic Face Image Retrieval | Liang Wang et.al. | 2401.00371 | link |
| 2023-12-29 | Bayesian Recursive Information Optical Imaging: A Ghost Imaging Scheme Based on Bayesian Filtering | Long-Kun Du et.al. | 2401.00032 | null |
| 2023-12-27 | LIP-Loc: LiDAR Image Pretraining for Cross-Modal Localization | Sai Shubodh Puligilla et.al. | 2312.16648 | null |
| 2023-12-26 | Recursive Distillation for Open-Set Distributed Robot Localization | Kenta Tsukahara et.al. | 2312.15897 | null |
| 2023-12-24 | Residual Learning for Image Point Descriptors | Rashik Shrestha et.al. | 2312.15471 | null |
| 2023-12-23 | CaLDiff: Camera Localization in NeRF via Pose Diffusion | Rashik Shrestha et.al. | 2312.15242 | null |
| 2023-12-20 | Aggregating Multiple Bio-Inspired Image Region Classifiers For Effective And Lightweight Visual Place Recognition | Bruno Arcanjo et.al. | 2312.12995 | null |
| 2023-12-19 | VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering | Chun-Mei Feng et.al. | 2312.12273 | link |
| 2023-12-18 | Advancing Image Retrieval with Few-Shot Learning and Relevance Feedback | Boaz Lerner et.al. | 2312.11078 | link |
| 2023-12-17 | PNeRFLoc: Visual Localization with Point-based Neural Radiance Fields | Boming Zhao et.al. | 2312.10649 | null |
| 2023-12-17 | DistilVPR: Cross-Modal Knowledge Distillation for Visual Place Recognition | Sijie Wang et.al. | 2312.10616 | link |
| 2023-12-16 | Symmetrical Bidirectional Knowledge Alignment for Zero-Shot Sketch-Based Image Retrieval | Decheng Liu et.al. | 2312.10320 | link |
| 2023-12-15 | Data-Efficient Multimodal Fusion on a Single GPU | Noël Vouitsis et.al. | 2312.10144 | link |
| 2023-12-13 | Advancements in Content-Based Image Retrieval: A Comprehensive Survey of Relevance Feedback Techniques | Hamed Qazanfari et.al. | 2312.10089 | null |
| 2023-12-15 | Let All be Whitened: Multi-teacher Distillation for Efficient Visual Retrieval | Zhe Ma et.al. | 2312.09716 | link |
| 2023-12-14 | Design Space Exploration of Low-Bit Quantized Neural Networks for Visual Place Recognition | Oliver Grainge et.al. | 2312.09028 | null |
| 2023-12-14 | Training-free Zero-shot Composed Image Retrieval with Local Concept Reranking | Shitong Sun et.al. | 2312.08924 | null |
| 2023-12-13 | C-BEV: Contrastive Bird’s Eye View Training for Cross-View Image Retrieval and 3-DoF Pose Estimation | Florian Fervers et.al. | 2312.08060 | null |
| 2023-12-12 | Contextually Affinitive Neighborhood Refinery for Deep Clustering | Chunlin Yu et.al. | 2312.07806 | link |
| 2023-12-12 | Collapse-Oriented Adversarial Training with Triplet Decoupling for Robust Image Retrieval | Qiwei Tian et.al. | 2312.07364 | null |
| 2023-12-11 | Dynamic Weighted Combiner for Mixed-Modal Image Retrieval | Fuxiang Huang et.al. | 2312.06179 | null |
| 2023-12-06 | Lite-Mind: Towards Efficient and Versatile Brain Representation Network | Zixuan Gong et.al. | 2312.03781 | null |
| 2023-12-08 | FreestyleRet: Retrieving Images from Style-Diversified Queries | Hao Li et.al. | 2312.02428 | link |
| 2023-12-04 | Implicit Learning of Scene Geometry from Poses for Global Localization | Mohammad Altillawi et.al. | 2312.02029 | null |
| 2023-12-04 | Language-only Efficient Training of Zero-shot Composed Image Retrieval | Geonmo Gu et.al. | 2312.01998 | link |
| 2023-12-03 | G2D: From Global to Dense Radiography Representation Learning via Vision-Language Pre-training | Che Liu et.al. | 2312.01522 | null |
| 2023-12-01 | Improve Supervised Representation Learning with Masked Image Modeling | Kaifeng Chen et.al. | 2312.00950 | null |
| 2023-12-05 | Grounding Everything: Emerging Localization Properties in Vision-Language Transformers | Walid Bousselham et.al. | 2312.00878 | link |
| 2023-12-01 | Global Localization: Utilizing Relative Spatio-Temporal Geometric Constraints from Adjacent and Distant Cameras | Mohammad Altillawi et.al. | 2312.00500 | null |
| 2023-11-30 | HKUST at SemEval-2023 Task 1: Visual Word Sense Disambiguation with Context Augmentation and Visual Assistance | Zhuohao Yin et.al. | 2311.18273 | link |
| 2023-11-30 | Label-efficient Training of Small Task-specific Models by Leveraging Vision Foundation Models | Raviteja Vemulapalli et.al. | 2311.18237 | null |
| 2023-11-29 | Transformer-empowered Multi-modal Item Embedding for Enhanced Image Search in E-Commerce | Chang Liu et.al. | 2311.17954 | null |
| 2023-11-28 | Scene Summarization: Clustering Scene Videos into Spatially Diverse Frames | Chao Chen et.al. | 2311.17940 | null |
| 2023-11-29 | 360Loc: A Dataset and Benchmark for Omnidirectional Visual Localization with Cross-device Queries | Huajian Huang et.al. | 2311.17389 | null |
| 2023-11-27 | Removing NSFW Concepts from Vision-and-Language Models for Text-to-Image Retrieval and Generation | Samuele Poppi et.al. | 2311.16254 | link |
| 2023-11-27 | Optimal Transport Aggregation for Visual Place Recognition | Sergio Izquierdo et.al. | 2311.15937 | link |
| 2023-11-27 | AI-Generated Images Introduce Invisible Relevance Bias to Text-Image Retrieval | Shicheng Xu et.al. | 2311.14084 | null |
| 2023-11-23 | 3D-MIR: A Benchmark and Empirical Study on 3D Medical Image Retrieval in Radiology | Asma Ben Abacha et.al. | 2311.13752 | null |
Keypoint Detection
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-03-05 | Self-supervised 3D Patient Modeling with Multi-modal Attentive Fusion | Meng Zheng et.al. | 2403.03217 | null |
| 2024-02-22 | A Self-supervised Pressure Map human keypoint Detection Approch: Optimizing Generalization and Computational Efficiency Across Datasets | Chengzhang Yu et.al. | 2402.14241 | null |
| 2024-02-25 | A Feature Matching Method Based on Multi-Level Refinement Strategy | Shaojie Zhang et.al. | 2402.13488 | null |
| 2024-03-05 | 3D Kinematics Estimation from Video with a Biomechanical Model and Synthetic Training Data | Zhi-Yi Lin et.al. | 2402.13172 | null |
| 2024-02-25 | Region Feature Descriptor Adapted to High Affine Transformations | Shaojie Zhang et.al. | 2402.09724 | null |
| 2024-01-29 | Reconstructing Close Human Interactions from Multiple Views | Qing Shuai et.al. | 2401.16173 | link |
| 2024-01-17 | To deform or not: treatment-aware longitudinal registration for breast DCE-MRI during neoadjuvant chemotherapy via unsupervised keypoints detection | Luyi Han et.al. | 2401.09336 | link |
| 2024-01-08 | Flowmind2Digital: The First Comprehensive Flowmind Recognition and Conversion Approach | Huanyu Liu et.al. | 2401.03742 | null |
| 2024-01-02 | 6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation | Li Xu et.al. | 2401.00029 | null |
| 2023-12-27 | Bezier-based Regression Feature Descriptor for Deformable Linear Objects | Fangqing Chen et.al. | 2312.16502 | null |
| 2023-12-24 | Residual Learning for Image Point Descriptors | Rashik Shrestha et.al. | 2312.15471 | null |
| 2023-12-22 | BonnBeetClouds3D: A Dataset Towards Point Cloud-based Organ-level Phenotyping of Sugar Beet Plants under Field Conditions | Elias Marks et.al. | 2312.14706 | null |
| 2023-12-19 | Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation | Jiaming Liu et.al. | 2312.12480 | null |
| 2023-12-19 | An effective image copy-move forgery detection using entropy image | Zhaowei Lu et.al. | 2312.11793 | null |
| 2023-12-11 | VoxelKP: A Voxel-based Network Architecture for Human Keypoint Estimation in LiDAR Data | Jian Shi et.al. | 2312.08871 | link |
| 2023-12-11 | Keypoint-based Stereophotoclinometry for Characterizing and Navigating Small Bodies: A Factor Graph Approach | Travis Driver et.al. | 2312.06865 | null |
| 2023-12-01 | Tracking Object Positions in Reinforcement Learning: A Metric for Keypoint Detection (extended version) | Emma Cramer et.al. | 2312.00592 | null |
| 2023-11-30 | Utilizing Radiomic Feature Analysis For Automated MRI Keypoint Detection: Enhancing Graph Applications | Sahar Almahfouz Nasser et.al. | 2311.18281 | null |
| 2023-11-29 | Back to 3D: Few-Shot 3D Keypoint Detection with Back-Projected 2D Features | Thomas Wimmer et.al. | 2311.18113 | null |
| 2023-11-28 | Diffusion 3D Features (Diff3F): Decorating Untextured Shapes with Distilled Semantic Features | Niladri Shekhar Dutt et.al. | 2311.17024 | null |
| 2023-11-28 | Riemannian Self-Attention Mechanism for SPD Networks | Rui Wang et.al. | 2311.16738 | null |
| 2023-11-27 | A manometric feature descriptor with linear-SVM to distinguish esophageal contraction vigor | Jialin Liu et.al. | 2311.15609 | null |
| 2023-11-21 | Instance-aware 3D Semantic Segmentation powered by Shape Generators and Classifiers | Bo Sun et.al. | 2311.12291 | null |
| 2023-11-20 | CurriculumLoc: Enhancing Cross-Domain Geolocalization through Multi-Stage Refinement | Boni Hu et.al. | 2311.11604 | link |
| 2023-11-17 | Video-based Sequential Bayesian Homography Estimation for Soccer Field Registration | Paul J. Claasen et.al. | 2311.10361 | null |
| 2023-11-13 | Processing and Segmentation of Human Teeth from 2D Images using Weakly Supervised Learning | Tomáš Kunzo et.al. | 2311.07398 | null |
2024-1
Pose Estimation
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-02-05 | A Computer Vision Based Approach for Stalking Detection Using a CNN-LSTM-MLP Hybrid Fusion Model | Murad Hasan et.al. | 2402.03417 | null |
| 2024-02-05 | SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM | Mingrui Li et.al. | 2402.03246 | null |
| 2024-02-05 | Extreme Two-View Geometry From Object Poses with Diffusion Models | Yujing Sun et.al. | 2402.02800 | link |
| 2024-02-04 | Uncertainty-Aware Testing-Time Optimization for 3D Human Pose Estimation | Ti Wang et.al. | 2402.02339 | null |
| 2024-02-01 | mmID: High-Resolution mmWave Imaging for Human Identification | Sakila S. Jayaweera et.al. | 2402.00996 | null |
| 2024-02-01 | In-Bed Pose Estimation: A Review | Ziya Ata Yazıcı et.al. | 2402.00700 | null |
| 2024-02-01 | WayFASTER: a Self-Supervised Traversability Prediction for Increased Navigation Awareness | Mateus Valverde Gasparino et.al. | 2402.00683 | null |
| 2024-02-02 | CMRNext: Camera to LiDAR Matching in the Wild for Localization and Extrinsic Calibration | Daniele Cattaneo et.al. | 2402.00129 | null |
| 2024-01-31 | Improved Scene Landmark Detection for Camera Localization | Tien Do et.al. | 2401.18083 | link |
| 2024-01-30 | Navigating the Unknown: Uncertainty-Aware Compute-in-Memory Autonomy of Edge Robotics | Nastaran Darabi et.al. | 2401.17481 | null |
| 2024-01-30 | MESA: Matching Everything by Segmenting Anything | Yesheng Zhang et.al. | 2401.16741 | null |
| 2024-01-30 | Towards Precise 3D Human Pose Estimation with Multi-Perspective Spatial-Temporal Relational Transformers | Jianbin Jiao et.al. | 2401.16700 | null |
| 2024-01-29 | Leveraging Positional Encoding for Robust Multi-Reference-Based Object 6D Pose Estimation | Jaewoo Park et.al. | 2401.16284 | null |
| 2024-01-29 | Reconstructing Close Human Interactions from Multiple Views | Qing Shuai et.al. | 2401.16173 | link |
| 2024-01-28 | Multi-Person 3D Pose Estimation from Multi-View Uncalibrated Depth Cameras | Yu-Jhe Li et.al. | 2401.15616 | null |
| 2024-01-30 | Multi-Robot Relative Pose Estimation in SE(2) with Observability Analysis: A Comparison of Extended Kalman Filtering and Robust Pose Graph Optimization | Kihoon Shin et.al. | 2401.15313 | null |
| 2024-01-26 | Adaptive Deep Learning for Efficient Visual Pose Estimation aboard Ultra-low-power Nano-drones | Beatrice Alessandra Motetti et.al. | 2401.15236 | null |
| 2024-01-26 | SimpleEgo: Predicting Probabilistic Body Pose from Egocentric Cameras | Hanz Cuevas-Velasquez et.al. | 2401.14785 | null |
| 2024-01-24 | Synthetic data enables faster annotation and robust segmentation for multi-object grasping in clutter | Dongmyoung Lee et.al. | 2401.13405 | null |
| 2024-01-24 | Linear Relative Pose Estimation Founded on Pose-only Imaging Geometry | Qi Cai et.al. | 2401.13357 | null |
| 2024-01-23 | SemanticSLAM: Learning based Semantic Map Construction and Robust Camera Localization | Mingyang Li et.al. | 2401.13076 | link |
| 2024-01-24 | RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos | Hongchi Xia et.al. | 2401.12592 | null |
| 2024-01-26 | MobileARLoc: On-device Robust Absolute Localisation for Pervasive Markerless Mobile AR | Changkun Liu et.al. | 2401.11511 | null |
| 2024-01-19 | SCENES: Subpixel Correspondence Estimation With Epipolar Supervision | Dominik A. Kloepfer et.al. | 2401.10886 | null |
| 2024-01-19 | Source-Free and Image-Only Unsupervised Domain Adaptation for Category Level Object Pose Estimation | Prakhar Kaushik et.al. | 2401.10848 | null |
| 2024-01-22 | TEXterity: Tactile Extrinsic deXterity | Antonia Bronars et.al. | 2401.10230 | null |
| 2024-01-18 | Exploring Latent Cross-Channel Embedding for Accurate 3D Human Pose Reconstruction in a Diffusion Framework | Junkun Jiang et.al. | 2401.09836 | null |
| 2024-01-17 | DK-SLAM: Monocular Visual SLAM with Deep Keypoints Adaptive Learning, Tracking and Loop-Closing | Hao Qu et.al. | 2401.09160 | null |
| 2024-01-17 | PIN-SLAM: LiDAR SLAM Using a Point-Based Implicit Neural Representation for Achieving Global Map Consistency | Yue Pan et.al. | 2401.09101 | link |
| 2024-01-16 | AdaSem: Adaptive Goal-Oriented Semantic Communications for End-to-End Camera Relocalization | Qi Liao et.al. | 2401.08360 | null |
| 2024-01-16 | S3M: Semantic Segmentation Sparse Mapping for UAVs with RGB-D Camera | Thanh Nguyen Canh et.al. | 2401.08134 | null |
| 2024-01-15 | Collaboratively Self-supervised Video Representation Learning for Action Recognition | Jie Zhang et.al. | 2401.07584 | null |
| 2024-01-14 | 3D Landmark Detection on Human Point Clouds: A Benchmark and A Dual Cascade Point Transformer Framework | Fan Zhang et.al. | 2401.07251 | null |
| 2024-01-11 | On the representation and methodology for wide and short range head pose estimation | Alejandro Cobo et.al. | 2401.05807 | link |
| 2024-01-10 | Structure from Duplicates: Neural Inverse Graphics from a Pile of Objects | Tianhang Cheng et.al. | 2401.05236 | link |
| 2024-01-10 | Video-based Automatic Lameness Detection of Dairy Cows using Pose Estimation and Multiple Locomotion Traits | Helena Russello et.al. | 2401.05202 | null |
| 2024-01-10 | Diffusion-based Pose Refinement and Muti-hypothesis Generation for 3D Human Pose Estimaiton | Hongbo Kang et.al. | 2401.04921 | null |
| 2024-01-07 | RHOBIN Challenge: Reconstruction of Human Object Interaction | Xianghui Xie et.al. | 2401.04143 | null |
| 2024-01-08 | D3PRefiner: A Diffusion-based Denoise Method for 3D Human Pose Refinement | Danqi Yan et.al. | 2401.03914 | null |
| 2024-01-07 | Big Data and Deep Learning in Smart Cities: A Comprehensive Dataset for AI-Driven Traffic Accident Detection and Computer Vision Systems | Victor Adewopo et.al. | 2401.03587 | null |
| 2024-01-04 | Survey of 3D Human Body Pose and Shape Estimation Methods for Contemporary Dance Applications | Darshan Venkatrayappa et.al. | 2401.02383 | null |
| 2024-01-04 | Fit-NGP: Fitting Object Models to Neural Graphics Primitives | Marwan Taher et.al. | 2401.02357 | null |
| 2024-01-04 | PEGASUS: Physically Enhanced Gaussian Splatting Simulation System for 6DOF Object Pose Dataset Generation | Lukas Meyer et.al. | 2401.02281 | null |
| 2024-01-03 | Real-Time Human Fall Detection using a Lightweight Pose Estimation Technique | Ekram Alam et.al. | 2401.01587 | null |
| 2024-01-05 | PLE-SLAM: A Visual-Inertial SLAM Based on Point-Line Features and Efficient IMU Initialization | Jiaming He et.al. | 2401.01081 | link |
| 2023-12-30 | 3D Human Pose Perception from Egocentric Stereo Videos | Hiroyasu Akada et.al. | 2401.00889 | null |
| 2024-01-01 | Geometry Depth Consistency in RGBD Relative Pose Estimation | Sourav Kumar et.al. | 2401.00639 | null |
| 2023-12-30 | A comprehensive framework for occluded human pose estimation | Linhao Xu et.al. | 2401.00155 | null |
| 2024-01-02 | 6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation | Li Xu et.al. | 2401.00029 | null |
| 2023-12-29 | MURP: Multi-Agent Ultra-Wideband Relative Pose Estimation with Constrained Communications in 3D Environments | Andrew Fishberg et.al. | 2312.17731 | null |
| 2023-12-28 | iFusion: Inverting Diffusion for Pose-Free Reconstruction from Sparse Views | Chin-Hsuan Wu et.al. | 2312.17250 | link |
| 2023-12-28 | EvPlug: Learn a Plug-and-Play Module for Event and Image Fusion | Jianping Jiang et.al. | 2312.16933 | null |
| 2023-12-28 | SR-LIVO: LiDAR-Inertial-Visual Odometry and Mapping with Sweep Reconstruction | Zikang Yuan et.al. | 2312.16800 | link |
| 2023-12-28 | L-LO: Enhancing Pose Estimation Precision via a Landmark-Based LiDAR Odometry | Feiya Li et.al. | 2312.16787 | null |
Visual Localization
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-02-14 | Weatherproofing Retrieval for Localization with Generative AI and Geometric Consistency | Yannis Kalantidis et.al. | 2402.09237 | null |
| 2024-02-13 | Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast | Xiangming Gu et.al. | 2402.08567 | link |
| 2024-02-13 | Learning to Produce Semi-dense Correspondences for Visual Localization | Khang Truong Giang et.al. | 2402.08359 | link |
| 2024-02-10 | Semantic Object-level Modeling for Robust Visual Camera Relocalization | Yifan Zhu et.al. | 2402.06951 | null |
| 2024-02-09 | Large Language Models for Captioning and Retrieving Remote Sensing Images | João Daniel Silva et.al. | 2402.06475 | null |
| 2024-02-09 | PAS-SLAM: A Visual SLAM System for Planar Ambiguous Scenes | Xinggang Hu et.al. | 2402.06131 | null |
| 2024-02-04 | Region-Based Representations Revisited | Michal Shlapentokh-Rothman et.al. | 2402.02352 | null |
| 2024-02-03 | Zero-shot sketch-based remote sensing image retrieval based on multi-level and attention-guided tokenization | Bo Yang et.al. | 2402.02141 | null |
| 2024-02-01 | Night-Rider: Nocturnal Vision-aided Localization in Streetlight Maps Using Invariant Extended Kalman Filtering | Tianxiao Gao et.al. | 2402.00330 | link |
| 2024-01-31 | Improved Scene Landmark Detection for Camera Localization | Tien Do et.al. | 2401.18083 | link |
| 2024-01-31 | Local Feature Matching Using Deep Learning: A Survey | Shibiao Xu et.al. | 2401.17592 | null |
| 2024-01-29 | Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors | Shiyin Dong et.al. | 2401.16459 | null |
| 2024-01-29 | Cross-Modal Coordination Across a Diverse Set of Input Modalities | Jorge Sánchez et.al. | 2401.16347 | null |
| 2024-01-29 | Regressing Transformers for Data-efficient Visual Place Recognition | María Leyva-Vallina et.al. | 2401.16304 | null |
| 2024-01-27 | Transformer-based Clipped Contrastive Quantization Learning for Unsupervised Image Retrieval | Ayush Dubey et.al. | 2401.15362 | null |
| 2024-01-24 | Enhancing Image Retrieval : A Comprehensive Study on Photo Search using the CLIP Mode | Naresh Kumar Lahajal et.al. | 2401.13613 | null |
| 2024-01-23 | PlaceFormer: Transformer-based Visual Place Recognition using Multi-Scale Patch Selection and Fusion | Shyam Sundar Kannan et.al. | 2401.13082 | null |
| 2024-01-23 | SemanticSLAM: Learning based Semantic Map Construction and Robust Camera Localization | Mingyang Li et.al. | 2401.13076 | link |
| 2024-01-25 | CBVS: A Large-Scale Chinese Image-Text Benchmark for Real-World Short Video Search Scenarios | Xiangshuo Qiao et.al. | 2401.10475 | link |
| 2024-01-19 | PhotoScout: Synthesis-Powered Multi-Modal Image Search | Celeste Barnaby et.al. | 2401.10464 | null |
| 2024-01-19 | Cross-Modality Perturbation Synergy Attack for Person Re-identification | Yunpeng Gong et.al. | 2401.10090 | null |
| 2024-01-16 | Siamese Content-based Search Engine for a More Transparent Skin and Breast Cancer Diagnosis through Histological Imaging | Zahra Tabatabaei et.al. | 2401.08272 | null |
| 2024-01-16 | Multi-Technique Sequential Information Consistency For Dynamic Visual Place Recognition In Changing Environments | Bruno Arcanjo et.al. | 2401.08263 | null |
| 2024-01-15 | Exploring Masked Autoencoders for Sensor-Agnostic Image Retrieval in Remote Sensing | Jakob Hackstein et.al. | 2401.07782 | link |
| 2024-01-14 | HiHPQ: Hierarchical Hyperbolic Product Quantization for Unsupervised Image Retrieval | Zexuan Qiu et.al. | 2401.07212 | null |
| 2024-01-11 | UAVD4L: A Large-Scale Dataset for UAV 6-DoF Localization | Rouwan Wu et.al. | 2401.05971 | null |
| 2024-01-10 | Modality-Aware Representation Learning for Zero-shot Sketch-based Image Retrieval | Eunyi Lyou et.al. | 2401.04860 | null |
| 2024-01-05 | Benchmarking PathCLIP for Pathology Image Analysis | Sunyi Zheng et.al. | 2401.02651 | null |
| 2024-01-02 | BEV-CLIP: Multi-modal BEV Retrieval Methodology for Complex Scene in Autonomous Driving | Dafeng Wei et.al. | 2401.01065 | null |
| 2023-12-31 | Multi-Granularity Representation Learning for Sketch-based Dynamic Face Image Retrieval | Liang Wang et.al. | 2401.00371 | link |
| 2023-12-29 | Bayesian Recursive Information Optical Imaging: A Ghost Imaging Scheme Based on Bayesian Filtering | Long-Kun Du et.al. | 2401.00032 | null |
| 2023-12-27 | LIP-Loc: LiDAR Image Pretraining for Cross-Modal Localization | Sai Shubodh Puligilla et.al. | 2312.16648 | null |
| 2023-12-26 | Recursive Distillation for Open-Set Distributed Robot Localization | Kenta Tsukahara et.al. | 2312.15897 | null |
| 2023-12-24 | Residual Learning for Image Point Descriptors | Rashik Shrestha et.al. | 2312.15471 | null |
| 2023-12-23 | CaLDiff: Camera Localization in NeRF via Pose Diffusion | Rashik Shrestha et.al. | 2312.15242 | null |
| 2023-12-20 | Aggregating Multiple Bio-Inspired Image Region Classifiers For Effective And Lightweight Visual Place Recognition | Bruno Arcanjo et.al. | 2312.12995 | null |
| 2023-12-19 | VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering | Chun-Mei Feng et.al. | 2312.12273 | link |
| 2023-12-18 | Advancing Image Retrieval with Few-Shot Learning and Relevance Feedback | Boaz Lerner et.al. | 2312.11078 | link |
Keypoint Detection
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-03-28 | Towards Long Term SLAM on Thermal Imagery | Colin Keil et.al. | 2403.19885 | link |
| 2024-03-28 | Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation | Xiao Lin et.al. | 2403.19527 | link |
| 2024-03-27 | RoboKeyGen: Robot Pose and Joint Angles Estimation via Diffusion-based 3D Keypoint Generation | Yang Tian et.al. | 2403.18259 | null |
| 2024-03-18 | FE-DeTr: Keypoint Detection and Tracking in Low-quality Image Frames with Events | Xiangyuan Wang et.al. | 2403.11662 | link |
| 2024-03-05 | Self-supervised 3D Patient Modeling with Multi-modal Attentive Fusion | Meng Zheng et.al. | 2403.03217 | null |
| 2024-02-22 | A Self-supervised Pressure Map human keypoint Detection Approch: Optimizing Generalization and Computational Efficiency Across Datasets | Chengzhang Yu et.al. | 2402.14241 | null |
| 2024-02-25 | A Feature Matching Method Based on Multi-Level Refinement Strategy | Shaojie Zhang et.al. | 2402.13488 | null |
| 2024-03-05 | 3D Kinematics Estimation from Video with a Biomechanical Model and Synthetic Training Data | Zhi-Yi Lin et.al. | 2402.13172 | null |
| 2024-02-25 | Region Feature Descriptor Adapted to High Affine Transformations | Shaojie Zhang et.al. | 2402.09724 | null |
| 2024-01-29 | Reconstructing Close Human Interactions from Multiple Views | Qing Shuai et.al. | 2401.16173 | link |
| 2024-01-17 | To deform or not: treatment-aware longitudinal registration for breast DCE-MRI during neoadjuvant chemotherapy via unsupervised keypoints detection | Luyi Han et.al. | 2401.09336 | link |
| 2024-01-08 | Flowmind2Digital: The First Comprehensive Flowmind Recognition and Conversion Approach | Huanyu Liu et.al. | 2401.03742 | null |
| 2024-03-22 | 6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation | Li Xu et.al. | 2401.00029 | null |
| 2023-12-27 | Bezier-based Regression Feature Descriptor for Deformable Linear Objects | Fangqing Chen et.al. | 2312.16502 | null |
| 2023-12-24 | Residual Learning for Image Point Descriptors | Rashik Shrestha et.al. | 2312.15471 | null |
| 2023-12-22 | BonnBeetClouds3D: A Dataset Towards Point Cloud-based Organ-level Phenotyping of Sugar Beet Plants under Field Conditions | Elias Marks et.al. | 2312.14706 | null |
| 2023-12-19 | Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation | Jiaming Liu et.al. | 2312.12480 | null |
| 2023-12-19 | An effective image copy-move forgery detection using entropy image | Zhaowei Lu et.al. | 2312.11793 | null |
| 2023-12-11 | VoxelKP: A Voxel-based Network Architecture for Human Keypoint Estimation in LiDAR Data | Jian Shi et.al. | 2312.08871 | link |
| 2023-12-11 | Keypoint-based Stereophotoclinometry for Characterizing and Navigating Small Bodies: A Factor Graph Approach | Travis Driver et.al. | 2312.06865 | null |
| 2023-12-01 | Tracking Object Positions in Reinforcement Learning: A Metric for Keypoint Detection (extended version) | Emma Cramer et.al. | 2312.00592 | null |
| 2023-11-30 | Utilizing Radiomic Feature Analysis For Automated MRI Keypoint Detection: Enhancing Graph Applications | Sahar Almahfouz Nasser et.al. | 2311.18281 | null |
2024-2
Pose Estimation
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-03-02 | Single-image camera calibration with model-free distortion correction | Katia Genovese et.al. | 2403.01263 | null |
| 2024-03-02 | Grid-based Fast and Structural Visual Odometry | Zhang Zhihe et.al. | 2403.01110 | null |
| 2024-03-01 | Optimal Robot Formations: Balancing Range-Based Observability and User-Defined Configurations | Syed Shabbir Ahmed et.al. | 2403.00988 | null |
| 2024-03-04 | TEXterity – Tactile Extrinsic deXterity: Simultaneous Tactile Estimation and Control for Extrinsic Dexterity | Sangwoon Kim et.al. | 2403.00049 | null |
| 2024-03-01 | Graph Convolutional Neural Networks for Automated Echocardiography View Recognition: A Holistic Approach | Sarina Thomas et.al. | 2402.19062 | null |
| 2024-02-29 | Deep Learning for 3D Human Pose Estimation and Mesh Recovery: A Survey | Yang Liu et.al. | 2402.18844 | link |
| 2024-02-28 | Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting | Taeho Kang et.al. | 2402.18330 | link |
| 2024-02-28 | Location-guided Head Pose Estimation for Fisheye Image | Bing Li et.al. | 2402.18320 | null |
| 2024-02-28 | NToP: NeRF-Powered Large-scale Dataset Generation for 2D and 3D Human Pose Estimation in Top-View Fisheye Images | Jingrui Yu et.al. | 2402.18196 | null |
| 2024-02-28 | Six-Point Method for Multi-Camera Systems with Reduced Solution Space | Banglei Guan et.al. | 2402.18066 | null |
| 2024-02-27 | Real-Time Estimation of Relative Pose for UAVs Using a Dual-Channel Feature Association | Zhaoying Wang et.al. | 2402.17504 | null |
| 2024-02-26 | HOISDF: Constraining 3D Hand-Object Pose Estimation with Global Signed Distance Fields | Haozhe Qi et.al. | 2402.17062 | link |
| 2024-02-26 | DRSI-Net: Dual-Residual Spatial Interaction Network for Multi-Person Pose Estimation | Shang Wu et.al. | 2402.16640 | null |
| 2024-02-26 | GEA: Reconstructing Expressive 3D Gaussian Avatar from Monocular Video | Xinqi Liu et.al. | 2402.16607 | null |
| 2024-02-26 | DreamUp3D: Object-Centric Generative Models for Single-View 3D Scene Understanding and Real-to-Sim Transfer | Yizhe Wu et.al. | 2402.16308 | null |
| 2024-02-25 | XAI-based gait analysis of patients walking with Knee-Ankle-Foot orthosis using video cameras | Arnav Mishra et.al. | 2402.16175 | null |
| 2024-02-25 | VOLoc: Visual Place Recognition by Querying Compressed Lidar Map | Xudong Cai et.al. | 2402.15961 | link |
| 2024-02-24 | CLIPose: Category-Level Object Pose Estimation with Pre-trained Vision-Language Knowledge | Xiao Lin et.al. | 2402.15726 | null |
| 2024-02-23 | Optimized Deployment of Deep Neural Networks for Visual Pose Estimation on Nano-drones | Matteo Risso et.al. | 2402.15273 | null |
| 2024-02-22 | Cameras as Rays: Pose Estimation via Ray Diffusion | Jason Y. Zhang et.al. | 2402.14817 | null |
| 2024-02-22 | S^2Former-OR: Single-Stage Bimodal Transformer for Scene Graph Generation in OR | Jialun Pei et.al. | 2402.14461 | null |
| 2024-02-22 | VLPose: Bridging the Domain Gap in Pose Estimation with Language-Vision Tuning | Jingyao Li et.al. | 2402.14456 | null |
| 2024-02-22 | Modeling 3D Infant Kinetics Using Adaptive Graph Convolutional Networks | Daniel Holmberg et.al. | 2402.14400 | link |
| 2024-02-22 | Secure Navigation using Landmark-based Localization in a GPS-denied Environment | Ganesh Sapkota et.al. | 2402.14280 | null |
| 2024-02-21 | SecurePose: Automated Face Blurring and Human Movement Kinematics Extraction from Videos Recorded in Clinical Settings | Rishabh Bajpai et.al. | 2402.14143 | null |
| 2024-02-21 | High-throughput Visual Nano-drone to Nano-drone Relative Localization using Onboard Fully Convolutional Networks | Luca Crupi et.al. | 2402.13756 | null |
| 2024-02-21 | EffLoc: Lightweight Vision Transformer for Efficient 6-DOF Camera Relocalization | Zhendong Xiao et.al. | 2402.13537 | null |
| 2024-02-20 | DiffusionNOCS: Managing Symmetry and Uncertainty in Sim2Real Multi-Modal Category-level Pose Estimation | Takuya Ikeda et.al. | 2402.12647 | null |
| 2024-02-19 | Landmark-based Localization using Stereo Vision and Deep Learning in GPS-Denied Battlefield Environment | Ganesh Sapkota et.al. | 2402.12551 | null |
| 2024-02-18 | Boosting Semi-Supervised 2D Human Pose Estimation by Revisiting Data Augmentation and Consistency Training | Huayi Zhou et.al. | 2402.11566 | link |
| 2024-02-17 | Enhancing Surgical Performance in Cardiothoracic Surgery with Innovations from Computer Vision and Artificial Intelligence: A Narrative Review | Merryn D. Constable et.al. | 2402.11288 | null |
| 2024-02-17 | Dense Matchers for Dense Tracking | Tomáš Jelínek et.al. | 2402.11287 | null |
| 2024-02-16 | Occlusion Resilient 3D Human Pose Estimation | Soumava Kumar Roy et.al. | 2402.11036 | null |
| 2024-02-16 | 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations | Tsung-Wei Ke et.al. | 2402.10885 | null |
| 2024-02-15 | Lester: rotoscope animation through video object segmentation and tracking | Ruben Tous et.al. | 2402.09883 | link |
| 2024-02-15 | Foul prediction with estimated poses from soccer broadcast video | Jiale Fang et.al. | 2402.09650 | null |
| 2024-02-16 | IMUOptimize: A Data-Driven Approach to Optimal IMU Placement for Human Pose Estimation with Transformer Architecture | Varun Ramani et.al. | 2402.08923 | null |
| 2024-02-13 | Are Semi-Dense Detector-Free Methods Good at Matching Local Features? | Matthieu Vilain et.al. | 2402.08671 | null |
| 2024-02-13 | Gaussian-Sum Filter for Range-based 3D Relative Pose Estimation in the Presence of Ambiguities | Syed S. Ahmed et.al. | 2402.08566 | null |
| 2024-02-13 | Learning to Produce Semi-dense Correspondences for Visual Localization | Khang Truong Giang et.al. | 2402.08359 | link |
| 2024-02-12 | Extending 3D body pose estimation for robotic-assistive therapies of autistic children | Laura Santos et.al. | 2402.08006 | null |
| 2024-02-12 | GBOT: Graph-Based 3D Object Tracking for Augmented Reality-Assisted Assembly Guidance | Shiyu Li et.al. | 2402.07677 | null |
| 2024-02-12 | UAV-assisted Visual SLAM Generating Reconstructed 3D Scene Graphs in GPS-denied Environments | Ahmed Radwan et.al. | 2402.07537 | null |
| 2024-02-09 | Improving 2D-3D Dense Correspondences with Diffusion Models for 6D Object Pose Estimation | Peter Hönig et.al. | 2402.06436 | null |
| 2024-02-08 | Real-time Holistic Robot Pose Estimation with Unknown States | Shikun Ban et.al. | 2402.05655 | link |
| 2024-02-08 | Extending 6D Object Pose Estimators for Stereo Vision | Thomas Pöllabauer et.al. | 2402.05610 | null |
| 2024-02-09 | NCRF: Neural Contact Radiance Fields for Free-Viewpoint Rendering of Hand-Object Interaction | Zhongqun Zhang et.al. | 2402.05532 | null |
| 2024-02-07 | Detection and Pose Estimation of flat, Texture-less Industry Objects on HoloLens using synthetic Training | Thomas Pöllabauer et.al. | 2402.04979 | null |
| 2024-02-07 | 4-Dimensional deformation part model for pose estimation using Kalman filter constraints | Enrique Martinez-Berti et.al. | 2402.04953 | null |
| 2024-02-07 | STAR: Shape-focused Texture Agnostic Representations for Improved Object Detection and 6D Pose Estimation | Peter Hönig et.al. | 2402.04878 | null |
| 2024-02-05 | A Computer Vision Based Approach for Stalking Detection Using a CNN-LSTM-MLP Hybrid Fusion Model | Murad Hasan et.al. | 2402.03417 | null |
| 2024-02-05 | SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM | Mingrui Li et.al. | 2402.03246 | null |
| 2024-02-05 | Extreme Two-View Geometry From Object Poses with Diffusion Models | Yujing Sun et.al. | 2402.02800 | link |
| 2024-02-04 | Uncertainty-Aware Testing-Time Optimization for 3D Human Pose Estimation | Ti Wang et.al. | 2402.02339 | null |
| 2024-02-01 | mmID: High-Resolution mmWave Imaging for Human Identification | Sakila S. Jayaweera et.al. | 2402.00996 | null |
| 2024-02-01 | In-Bed Pose Estimation: A Review | Ziya Ata Yazıcı et.al. | 2402.00700 | null |
| 2024-02-01 | WayFASTER: a Self-Supervised Traversability Prediction for Increased Navigation Awareness | Mateus Valverde Gasparino et.al. | 2402.00683 | null |
| 2024-02-02 | CMRNext: Camera to LiDAR Matching in the Wild for Localization and Extrinsic Calibration | Daniele Cattaneo et.al. | 2402.00129 | null |
| 2024-01-31 | Improved Scene Landmark Detection for Camera Localization | Tien Do et.al. | 2401.18083 | link |
| 2024-01-30 | Navigating the Unknown: Uncertainty-Aware Compute-in-Memory Autonomy of Edge Robotics | Nastaran Darabi et.al. | 2401.17481 | null |
| 2024-01-30 | MESA: Matching Everything by Segmenting Anything | Yesheng Zhang et.al. | 2401.16741 | null |
| 2024-01-30 | Towards Precise 3D Human Pose Estimation with Multi-Perspective Spatial-Temporal Relational Transformers | Jianbin Jiao et.al. | 2401.16700 | null |
| 2024-01-29 | Leveraging Positional Encoding for Robust Multi-Reference-Based Object 6D Pose Estimation | Jaewoo Park et.al. | 2401.16284 | null |
| 2024-01-29 | Reconstructing Close Human Interactions from Multiple Views | Qing Shuai et.al. | 2401.16173 | link |
| 2024-01-28 | Multi-Person 3D Pose Estimation from Multi-View Uncalibrated Depth Cameras | Yu-Jhe Li et.al. | 2401.15616 | null |
Visual Localization
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-03-08 | LHMap-loc: Cross-Modal Monocular Localization Using LiDAR Point Cloud Heat Map | Xinrui Wu et.al. | 2403.05002 | null |
| 2024-03-07 | Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed | Yifan Wang et.al. | 2403.04765 | null |
| 2024-03-06 | Self-supervised Photographic Image Layout Representation Learning | Zhaoran Zhao et.al. | 2403.03740 | link |
| 2024-03-04 | Multi-Spectral Remote Sensing Image Retrieval Using Geospatial Foundation Models | Benedikt Blumenstiel et.al. | 2403.02059 | link |
| 2024-03-03 | Image2Sentence based Asymmetrical Zero-shot Composed Image Retrieval | Yongchao Du et.al. | 2403.01431 | null |
| 2024-03-01 | Asymmetric Feature Fusion for Image Retrieval | Hui Wu et.al. | 2403.00671 | null |
| 2024-03-01 | Structure Similarity Preservation Learning for Asymmetric Image Retrieval | Hui Wu et.al. | 2403.00648 | link |
| 2024-02-29 | CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition | Feng Lu et.al. | 2402.19231 | link |
| 2024-02-28 | Unsupervised Cross-Domain Image Retrieval via Prototypical Optimal Transport | Bin Li et.al. | 2402.18411 | link |
| 2024-02-28 | Balanced Similarity with Auxiliary Prompts: Towards Alleviating Text-to-Image Retrieval Bias for CLIP in Zero-shot Learning | Hanyao Wang et.al. | 2402.18400 | null |
| 2024-02-28 | Representing 3D sparse map points and lines for camera relocalization | Bach-Thuan Bui et.al. | 2402.18011 | link |
| 2024-02-27 | Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control | Thong Nguyen et.al. | 2402.17535 | link |
| 2024-02-29 | Active propulsion noise shaping for multi-rotor aircraft localization | Gabriele Serussi et.al. | 2402.17289 | link |
| 2024-02-27 | NocPlace: Nocturnal Visual Place Recognition Using Generative and Inherited Knowledge Transfer | Bingxi Liu et.al. | 2402.17159 | null |
| 2024-02-25 | Deep Homography Estimation for Visual Place Recognition | Feng Lu et.al. | 2402.16086 | link |
| 2024-02-25 | VOLoc: Visual Place Recognition by Querying Compressed Lidar Map | Xudong Cai et.al. | 2402.15961 | link |
| 2024-02-28 | Text2Pic Swift: Enhancing Long-Text to Image Retrieval for Large-Scale Libraries | Zijun Long et.al. | 2402.15276 | null |
| 2024-02-23 | Fine-tuning CLIP Text Encoders with Two-step Paraphrasing | Hyunjae Kim et.al. | 2402.15120 | null |
| 2024-02-22 | Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition | Feng Lu et.al. | 2402.14505 | link |
| 2024-02-16 | Spike-EVPR: Deep Spiking Residual Network with Cross-Representation Aggregation for Event-Based Visual Place Recognition | Chenming Hu et.al. | 2402.10476 | null |
| 2024-02-15 | Self-Supervised Learning of Visual Robot Localization Using LED State Prediction as a Pretext Task | Mirko Nava et.al. | 2402.09886 | link |
| 2024-02-14 | Weatherproofing Retrieval for Localization with Generative AI and Geometric Consistency | Yannis Kalantidis et.al. | 2402.09237 | null |
| 2024-02-13 | Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast | Xiangming Gu et.al. | 2402.08567 | link |
| 2024-02-13 | Learning to Produce Semi-dense Correspondences for Visual Localization | Khang Truong Giang et.al. | 2402.08359 | link |
| 2024-02-10 | Semantic Object-level Modeling for Robust Visual Camera Relocalization | Yifan Zhu et.al. | 2402.06951 | null |
| 2024-02-09 | Large Language Models for Captioning and Retrieving Remote Sensing Images | João Daniel Silva et.al. | 2402.06475 | null |
| 2024-02-09 | PAS-SLAM: A Visual SLAM System for Planar Ambiguous Scenes | Xinggang Hu et.al. | 2402.06131 | null |
| 2024-02-04 | Region-Based Representations Revisited | Michal Shlapentokh-Rothman et.al. | 2402.02352 | null |
| 2024-02-03 | Zero-shot sketch-based remote sensing image retrieval based on multi-level and attention-guided tokenization | Bo Yang et.al. | 2402.02141 | null |
| 2024-02-01 | Night-Rider: Nocturnal Vision-aided Localization in Streetlight Maps Using Invariant Extended Kalman Filtering | Tianxiao Gao et.al. | 2402.00330 | link |
| 2024-01-31 | Improved Scene Landmark Detection for Camera Localization | Tien Do et.al. | 2401.18083 | link |
| 2024-01-31 | Local Feature Matching Using Deep Learning: A Survey | Shibiao Xu et.al. | 2401.17592 | null |
| 2024-01-29 | Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors | Shiyin Dong et.al. | 2401.16459 | null |
| 2024-01-29 | Cross-Modal Coordination Across a Diverse Set of Input Modalities | Jorge Sánchez et.al. | 2401.16347 | null |
| 2024-01-29 | Regressing Transformers for Data-efficient Visual Place Recognition | María Leyva-Vallina et.al. | 2401.16304 | null |
| 2024-01-27 | Transformer-based Clipped Contrastive Quantization Learning for Unsupervised Image Retrieval | Ayush Dubey et.al. | 2401.15362 | null |
| 2024-01-24 | Enhancing Image Retrieval : A Comprehensive Study on Photo Search using the CLIP Mode | Naresh Kumar Lahajal et.al. | 2401.13613 | null |
| 2024-01-23 | PlaceFormer: Transformer-based Visual Place Recognition using Multi-Scale Patch Selection and Fusion | Shyam Sundar Kannan et.al. | 2401.13082 | null |
| 2024-01-23 | SemanticSLAM: Learning based Semantic Map Construction and Robust Camera Localization | Mingyang Li et.al. | 2401.13076 | link |
Keypoint Detection
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-04-30 | A Light-weight Transformer-based Self-supervised Matching Network for Heterogeneous Images | Wang Zhang et.al. | 2404.19311 | null |
| 2024-04-25 | Adaptive Local Binary Pattern: A Novel Feature Descriptor for Enhanced Analysis of Kidney Abnormalities in CT Scan Images using ensemble based Machine Learning Approach | Tahmim Hossain et.al. | 2404.14560 | null |
| 2024-04-19 | SkelFormer: Markerless 3D Pose and Shape Estimation using Skeletal Transformers | Vandad Davoodnia et.al. | 2404.12625 | null |
| 2024-04-17 | Pixel-Wise Symbol Spotting via Progressive Points Location for Parsing CAD Images | Junbiao Pang et.al. | 2404.10985 | null |
| 2024-03-28 | Towards Long Term SLAM on Thermal Imagery | Colin Keil et.al. | 2403.19885 | link |
| 2024-03-28 | Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation | Xiao Lin et.al. | 2403.19527 | link |
| 2024-03-27 | RoboKeyGen: Robot Pose and Joint Angles Estimation via Diffusion-based 3D Keypoint Generation | Yang Tian et.al. | 2403.18259 | null |
| 2024-03-18 | FE-DeTr: Keypoint Detection and Tracking in Low-quality Image Frames with Events | Xiangyuan Wang et.al. | 2403.11662 | link |
| 2024-03-05 | Self-supervised 3D Patient Modeling with Multi-modal Attentive Fusion | Meng Zheng et.al. | 2403.03217 | null |
| 2024-02-22 | A Self-supervised Pressure Map human keypoint Detection Approch: Optimizing Generalization and Computational Efficiency Across Datasets | Chengzhang Yu et.al. | 2402.14241 | null |
| 2024-02-25 | A Feature Matching Method Based on Multi-Level Refinement Strategy | Shaojie Zhang et.al. | 2402.13488 | null |
| 2024-03-05 | 3D Kinematics Estimation from Video with a Biomechanical Model and Synthetic Training Data | Zhi-Yi Lin et.al. | 2402.13172 | null |
| 2024-02-25 | Region Feature Descriptor Adapted to High Affine Transformations | Shaojie Zhang et.al. | 2402.09724 | null |
| 2024-01-29 | Reconstructing Close Human Interactions from Multiple Views | Qing Shuai et.al. | 2401.16173 | link |
| 2024-01-17 | To deform or not: treatment-aware longitudinal registration for breast DCE-MRI during neoadjuvant chemotherapy via unsupervised keypoints detection | Luyi Han et.al. | 2401.09336 | link |
| 2024-01-08 | Flowmind2Digital: The First Comprehensive Flowmind Recognition and Conversion Approach | Huanyu Liu et.al. | 2401.03742 | null |
| 2024-03-22 | 6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation | Li Xu et.al. | 2401.00029 | null |
| 2023-12-27 | Bezier-based Regression Feature Descriptor for Deformable Linear Objects | Fangqing Chen et.al. | 2312.16502 | null |
| 2023-12-24 | Residual Learning for Image Point Descriptors | Rashik Shrestha et.al. | 2312.15471 | null |
| 2023-12-22 | BonnBeetClouds3D: A Dataset Towards Point Cloud-based Organ-level Phenotyping of Sugar Beet Plants under Field Conditions | Elias Marks et.al. | 2312.14706 | null |
| 2023-12-19 | Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation | Jiaming Liu et.al. | 2312.12480 | null |
| 2023-12-19 | An effective image copy-move forgery detection using entropy image | Zhaowei Lu et.al. | 2312.11793 | null |
2024-3
Pose Estimation
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-04-05 | ToolEENet: Tool Affordance 6D Pose Estimation | Yunlong Wang et.al. | 2404.04193 | null |
| 2024-04-04 | SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation | Sichen Chen et.al. | 2404.03518 | link |
| 2024-04-04 | Multi Positive Contrastive Learning with Pose-Consistent Generated Images | Sho Inayoshi et.al. | 2404.03256 | null |
| 2024-04-04 | HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud | Wencan Cheng et.al. | 2404.03159 | link |
| 2024-04-03 | Fusing Multi-sensor Input with State Information on TinyML Brains for Autonomous Nano-drones | Luca Crupi et.al. | 2404.02567 | null |
| 2024-04-03 | Semi-Supervised Unconstrained Head Pose Estimation in the Wild | Huayi Zhou et.al. | 2404.02544 | link |
| 2024-04-02 | 3D Congealing: 3D-Aware Image Alignment in the Wild | Yunzhi Zhang et.al. | 2404.02125 | null |
| 2024-04-02 | SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation | Vinkle Srivastav et.al. | 2404.02041 | null |
| 2024-04-01 | Marrying NeRF with Feature Matching for One-step Pose Estimation | Ronghan Chen et.al. | 2404.00891 | null |
| 2024-03-31 | Graph-Based vs. Error State Kalman Filter-Based Fusion Of 5G And Inertial Data For MAV Indoor Pose Estimation | Meisam Kabiri et.al. | 2404.00691 | null |
| 2024-03-31 | OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos | Dongyoung Choi et.al. | 2404.00676 | null |
| 2024-04-02 | KTPFormer: Kinematics and Trajectory Prior Knowledge-Enhanced Transformer for 3D Human Pose Estimation | Jihua Peng et.al. | 2404.00658 | link |
| 2024-03-29 | FetalDiffusion: Pose-Controllable 3D Fetal MRI Synthesis with Conditional Diffusion Model | Molin Zhang et.al. | 2404.00132 | null |
| 2024-03-29 | Latent Embedding Clustering for Occlusion Robust Head Pose Estimation | José Celestino et.al. | 2403.20251 | null |
| 2024-03-29 | A Unified Framework for Human-centric Point Cloud Video Understanding | Yiteng Xu et.al. | 2403.20031 | null |
| 2024-04-01 | Video-Based Human Pose Regression via Decoupled Space-Time Aggregation | Jijie He et.al. | 2403.19926 | link |
| 2024-03-28 | Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation | Xiao Lin et.al. | 2403.19527 | link |
| 2024-03-27 | Object Pose Estimation via the Aggregation of Diffusion Features | Tianfu Wang et.al. | 2403.18791 | link |
| 2024-03-27 | RoboKeyGen: Robot Pose and Joint Angles Estimation via Diffusion-based 3D Keypoint Generation | Yang Tian et.al. | 2403.18259 | null |
| 2024-03-26 | Mathematical Foundation and Corrections for Full Range Head Pose Estimation | Huei-Chung Hu et.al. | 2403.18104 | null |
| 2024-03-26 | EgoPoseFormer: A Simple Baseline for Egocentric 3D Human Pose Estimation | Chenhongyi Yang et.al. | 2403.18080 | null |
| 2024-03-26 | A Survey on 3D Egocentric Human Pose Estimation | Md Mushfiqur Azam et.al. | 2403.17893 | null |
| 2024-03-26 | GTA-HDR: A Large-Scale Synthetic Dataset for HDR Image Reconstruction | Hrishav Bakul Barua et.al. | 2403.17837 | link |
| 2024-03-26 | DiffH2O: Diffusion-Based Synthesis of Hand-Object Interactions from Textual Descriptions | Sammy Christen et.al. | 2403.17827 | null |
| 2024-03-26 | System Calibration of a Field Phenotyping Robot with Multiple High-Precision Profile Laser Scanners | Felix Esser et.al. | 2403.17788 | null |
| 2024-03-25 | Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos | Remy Sabathier et.al. | 2403.17103 | null |
| 2024-03-25 | Characterisation of the Intel RealSense D415 Stereo Depth Camera for Motion-Corrected CT Perfusion Imaging | Mahdieh Dashtbani Moghari et.al. | 2403.16490 | null |
| 2024-03-25 | Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects | Zicong Fan et.al. | 2403.16428 | null |
| 2024-03-25 | A Geometric Perspective on Fusing Gaussian Distributions on Lie Groups | Yixiao Ge et.al. | 2403.16411 | null |
| 2024-03-25 | ASDF: Assembly State Detection Utilizing Late Fusion by Integrating 6D Pose Estimation | Hannah Schieber et.al. | 2403.16400 | null |
| 2024-03-24 | KITchen: A Real-World Benchmark and Dataset for 6D Object Pose Estimation in Kitchen Environments | Abdelrahman Younes et.al. | 2403.16238 | null |
| 2024-03-24 | Diffusion Model is a Good Pose Estimator from 3D RF-Vision | Junqiao Fan et.al. | 2403.16198 | null |
| 2024-03-23 | UPNeRF: A Unified Framework for Monocular 3D Object Reconstruction and Pose Estimation | Yuliang Guo et.al. | 2403.15705 | null |
| 2024-03-22 | InterFusion: Text-Driven Generation of 3D Human-Object Interaction | Sisi Dai et.al. | 2403.15612 | null |
| 2024-03-22 | Augmented Reality Warnings in Roadway Work Zones: Evaluating the Effect of Modality on Worker Reaction Times | Sepehr Sabeti et.al. | 2403.15571 | null |
| 2024-03-22 | Gesture-Controlled Aerial Robot Formation for Human-Swarm Interaction in Safety Monitoring Applications | Vít Krátký et.al. | 2403.15333 | null |
| 2024-03-22 | WSCLoc: Weakly-Supervised Sparse-View Camera Relocalization | Jialu Wang et.al. | 2403.15272 | null |
| 2024-03-22 | DITTO: Demonstration Imitation by Trajectory Transformation | Nick Heppert et.al. | 2403.15203 | null |
| 2024-03-22 | Cartoon Hallucinations Detection: Pose-aware In Context Visual Learning | Bumsoo Kim et.al. | 2403.15048 | null |
| 2024-03-22 | Trajectory Regularization Enhances Self-Supervised Geometric Representation | Jiayun Wang et.al. | 2403.14973 | null |
| 2024-03-21 | VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding | Ahmad Mahmood et.al. | 2403.14743 | null |
| 2024-03-21 | Visibility-Aware Keypoint Localization for 6DoF Object Pose Estimation | Ruyi Lian et.al. | 2403.14559 | null |
| 2024-03-23 | Exploring 3D Human Pose Estimation and Forecasting from the Robot’s Perspective: The HARPER Dataset | Andrea Avogaro et.al. | 2403.14447 | null |
| 2024-03-21 | Evaluation and Deployment of LiDAR-based Place Recognition in Dense Forests | Haedam Oh et.al. | 2403.14326 | null |
| 2024-03-21 | Zero123-6D: Zero-shot Novel View Synthesis for RGB Category-level 6D Pose Estimation | Francesco Di Felice et.al. | 2403.14279 | null |
| 2024-03-20 | DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses | Chen Zhao et.al. | 2403.13683 | link |
| 2024-03-20 | Meta-Point Learning and Refining for Category-Agnostic Pose Estimation | Junjie Chen et.al. | 2403.13647 | link |
| 2024-03-20 | Advancing 6D Pose Estimation in Augmented Reality – Overcoming Projection Ambiguity with Uncontrolled Imagery | Mayura Manawadu et.al. | 2403.13434 | null |
| 2024-03-20 | DOR3D-Net: Dense Ordinal Regression Network for 3D Hand Pose Estimation | Yamin Mao et.al. | 2403.13405 | null |
| 2024-03-20 | ManiPose: A Comprehensive Benchmark for Pose-aware Object Manipulation in Robotics | Qiaojun Yu et.al. | 2403.13365 | null |
| 2024-03-20 | MULAN-WC: Multi-Robot Localization Uncertainty-aware Active NeRF with Wireless Coordination | Weiying Wang et.al. | 2403.13348 | null |
| 2024-03-19 | FaceXFormer: A Unified Transformer for Facial Analysis | Kartik Narayan et.al. | 2403.12960 | null |
| 2024-03-19 | WHAC: World-grounded Humans and Cameras | Wanqi Yin et.al. | 2403.12959 | null |
| 2024-03-19 | Diffusion-Driven Self-Supervised Learning for Shape Reconstruction and Pose Estimation | Jingtao Sun et.al. | 2403.12728 | link |
| 2024-03-19 | IFFNeRF: Initialisation Free and Fast 6DoF pose estimation from a single image and a NeRF model | Matteo Bortolon et.al. | 2403.12682 | null |
| 2024-03-19 | In-Hand Following of Deformable Linear Objects Using Dexterous Fingers with Tactile Sensing | Mingrui Yu et.al. | 2403.12676 | null |
| 2024-03-19 | Self-learning Canonical Space for Multi-view 3D Human Pose Estimation | Xiaoben Li et.al. | 2403.12440 | null |
| 2024-03-20 | Human Mesh Recovery from Arbitrary Multi-view Images | Xiaoben Li et.al. | 2403.12434 | null |
| 2024-03-19 | XPose: eXplainable Human Pose Estimation | Luyu Qiu et.al. | 2403.12370 | null |
| 2024-03-18 | HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data | Mengqi Zhang et.al. | 2403.12011 | null |
| 2024-03-18 | Normalized Validity Scores for DNNs in Regression based Eye Feature Extraction | Wolfgang Fuhl et.al. | 2403.11665 | null |
| 2024-03-18 | An Accurate and Real-time Relative Pose Estimation from Triple Point-line Images by Decoupling Rotation and Translation | Zewen Xu et.al. | 2403.11639 | null |
| 2024-03-18 | LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models | Yang Yang et.al. | 2403.11627 | link |
| 2024-03-18 | GenFlow: Generalizable Recurrent Flow for 6D Pose Refinement of Novel Objects | Sungphill Moon et.al. | 2403.11510 | null |
| 2024-03-17 | A Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation | Qucheng Peng et.al. | 2403.11310 | null |
| 2024-03-17 | Compact 3D Gaussian Splatting For Dense Visual SLAM | Tianchen Deng et.al. | 2403.11247 | null |
| 2024-03-16 | Robotic Task Success Evaluation Under Multi-modal Non-Parametric Object Pose Uncertainty | Lakshadeep Naik et.al. | 2403.10874 | null |
| 2024-03-16 | DPPE: Dense Pose Estimation in a Plenoxels Environment using Gradient Approximation | Christopher Kolios et.al. | 2403.10773 | null |
| 2024-03-15 | GS-Pose: Cascaded Framework for Generalizable Segmentation-based 6D Object Pose Estimation | Dingding Cai et.al. | 2403.10683 | null |
| 2024-03-15 | CLOSURE: Fast Quantification of Pose Uncertainty Sets | Yihuai Gao et.al. | 2403.09990 | null |
| 2024-03-14 | ThermoHands: A Benchmark for 3D Hand Pose Estimation from Egocentric Thermal Image | Fangqiang Ding et.al. | 2403.09871 | null |
| 2024-03-14 | BOP Challenge 2023 on Detection, Segmentation and Pose Estimation of Seen and Unseen Rigid Objects | Tomas Hodan et.al. | 2403.09799 | null |
| 2024-03-14 | Scalable Autonomous Drone Flight in the Forest with Visual-Inertial SLAM and Dense Submaps Built without LiDAR | Sebastián Barbas Laina et.al. | 2403.09596 | null |
| 2024-03-14 | Improving Real-Time Omnidirectional 3D Multi-Person Human Pose Estimation with People Matching and Unsupervised 2D-3D Lifting | Pawel Knap et.al. | 2403.09437 | null |
| 2024-03-14 | LM2D: Lyrics- and Music-Driven Dance Synthesis | Wenjie Yin et.al. | 2403.09407 | null |
| 2024-03-14 | SD-Net: Symmetric-Aware Keypoint Prediction and Domain Adaptation for 6D Pose Estimation In Bin-picking Scenarios | Ding-Tao Huang et.al. | 2403.09317 | link |
| 2024-03-14 | MOTPose: Multi-object 6D Pose Estimation for Dynamic Video Sequences using Attention-based Temporal Fusion | Arul Selvam Periyasamy et.al. | 2403.09309 | null |
| 2024-03-13 | Data Augmentation in Human-Centric Vision | Wentao Jiang et.al. | 2403.08650 | null |
| 2024-03-15 | PRAGO: Differentiable Multi-View Pose Optimization From Objectness Detections | Matteo Taiana et.al. | 2403.08586 | null |
| 2024-03-13 | NeRF-Supervised Feature Point Detection and Description | Ali Youssef et.al. | 2403.08156 | null |
| 2024-03-12 | Q-SLAM: Quadric Representations for Monocular SLAM | Chensheng Peng et.al. | 2403.08125 | null |
| 2024-03-12 | MRC-Net: 6-DoF Pose Estimation with MultiScale Residual Correlation | Yuelong Li et.al. | 2403.08019 | null |
| 2024-03-12 | Uncertainty Quantification with Deep Ensembles for 6D Object Pose Estimation | Kira Wursthorn et.al. | 2403.07741 | null |
| 2024-03-12 | Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving | JunDa Cheng et.al. | 2403.07535 | null |
| 2024-03-12 | Category-Agnostic Pose Estimation for Point Clouds | Bowen Liu et.al. | 2403.07437 | null |
| 2024-03-12 | Monocular Microscope to CT Registration using Pose Estimation of the Incus for Augmented Reality Cochlear Implant Surgery | Yike Zhang et.al. | 2403.07219 | null |
| 2024-03-11 | Real-Time Simulated Avatar from Head-Mounted Sensors | Zhengyi Luo et.al. | 2403.06862 | null |
| 2024-03-11 | Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition | Erkut Akdag et.al. | 2403.06577 | null |
| 2024-03-10 | Platypose: Calibrated Zero-Shot Multi-Hypothesis 3D Human Motion Estimation | Paweł A. Pierzchlewicz et.al. | 2403.06164 | link |
| 2024-03-10 | Diffusion Models Trained with Large Data Are Transferable Visual Models | Guangkai Xu et.al. | 2403.06090 | null |
| 2024-03-08 | Prepared for the Worst: A Learning-Based Adversarial Attack for Resilience Analysis of the ICP Algorithm | Ziyu Zhang et.al. | 2403.05666 | null |
| 2024-03-11 | Exploiting polar symmetry in designing equivariant observers for vision-based motion estimation | Tarek Bouazza et.al. | 2403.05450 | null |
| 2024-03-07 | Real-Time Planning Under Uncertainty for AUVs Using Virtual Maps | Ivana Collado-Gonzalez et.al. | 2403.04936 | null |
| 2024-03-07 | That’s My Point: Compact Object-centric LiDAR Pose Estimation for Large-scale Outdoor Localisation | Georgi Pramatarov et.al. | 2403.04755 | null |
| 2024-03-07 | Disentangled Diffusion-Based 3D Human Pose Estimation with Hierarchical Spatial and Temporal Denoiser | Qingyuan Cai et.al. | 2403.04444 | null |
| 2024-03-09 | Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation | Ruicong Liu et.al. | 2403.04381 | null |
| 2024-03-05 | FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation | Chris Rockwell et.al. | 2403.03221 | null |
| 2024-03-05 | NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors | Yannan He et.al. | 2403.03122 | null |
| 2024-03-05 | Improved LiDAR Odometry and Mapping using Deep Semantic Segmentation and Novel Outliers Detection | Mohamed Afifi et.al. | 2403.03111 | null |
| 2024-03-05 | Splat-Nav: Safe Real-Time Robot Navigation in Gaussian Splatting Maps | Timothy Chen et.al. | 2403.02751 | null |
| 2024-03-04 | PowerSkel: A Device-Free Framework Using CSI Signal for Human Skeleton Estimation in Power Station | Cunyi Yin et.al. | 2403.01913 | link |
| 2024-03-04 | A Simple Baseline for Efficient Hand Mesh Reconstruction | Zhishan Zhou et.al. | 2403.01813 | null |
| 2024-03-03 | MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images | Junwen Huang et.al. | 2403.01517 | null |
| 2024-03-02 | Single-image camera calibration with model-free distortion correction | Katia Genovese et.al. | 2403.01263 | null |
| 2024-03-02 | Grid-based Fast and Structural Visual Odometry | Zhang Zhihe et.al. | 2403.01110 | null |
| 2024-03-01 | Optimal Robot Formations: Balancing Range-Based Observability and User-Defined Configurations | Syed Shabbir Ahmed et.al. | 2403.00988 | null |
| 2024-03-04 | TEXterity – Tactile Extrinsic deXterity: Simultaneous Tactile Estimation and Control for Extrinsic Dexterity | Sangwoon Kim et.al. | 2403.00049 | null |
| 2024-03-01 | Graph Convolutional Neural Networks for Automated Echocardiography View Recognition: A Holistic Approach | Sarina Thomas et.al. | 2402.19062 | null |
| 2024-02-29 | Deep Learning for 3D Human Pose Estimation and Mesh Recovery: A Survey | Yang Liu et.al. | 2402.18844 | link |
| 2024-02-28 | Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting | Taeho Kang et.al. | 2402.18330 | link |
| 2024-02-28 | Location-guided Head Pose Estimation for Fisheye Image | Bing Li et.al. | 2402.18320 | null |
| 2024-02-28 | NToP: NeRF-Powered Large-scale Dataset Generation for 2D and 3D Human Pose Estimation in Top-View Fisheye Images | Jingrui Yu et.al. | 2402.18196 | null |
| 2024-02-28 | Six-Point Method for Multi-Camera Systems with Reduced Solution Space | Banglei Guan et.al. | 2402.18066 | null |
| 2024-02-27 | Real-Time Estimation of Relative Pose for UAVs Using a Dual-Channel Feature Association | Zhaoying Wang et.al. | 2402.17504 | null |
| 2024-02-26 | HOISDF: Constraining 3D Hand-Object Pose Estimation with Global Signed Distance Fields | Haozhe Qi et.al. | 2402.17062 | link |
| 2024-02-26 | DRSI-Net: Dual-Residual Spatial Interaction Network for Multi-Person Pose Estimation | Shang Wu et.al. | 2402.16640 | null |
| 2024-02-26 | GEA: Reconstructing Expressive 3D Gaussian Avatar from Monocular Video | Xinqi Liu et.al. | 2402.16607 | null |
Visual Localization
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-04-11 | PRAM: Place Recognition Anywhere Model for Efficient Visual Localization | Fei Xue et.al. | 2404.07785 | null |
| 2024-04-11 | Semantically-correlated memories in a dense associative model | Thomas F Burns et.al. | 2404.07123 | link |
| 2024-04-09 | Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation | Luca Barsellotti et.al. | 2404.06542 | null |
| 2024-04-09 | Learning Embeddings with Centroid Triplet Loss for Object Identification in Robotic Grasping | Anas Gouda et.al. | 2404.06277 | null |
| 2024-04-07 | Weakly Supervised Deep Hyperspherical Quantization for Image Retrieval | Jinpeng Wang et.al. | 2404.04998 | link |
| 2024-04-06 | Soft-Prompting with Graph-of-Thought for Multi-modal Representation Learning | Juncheng Yang et.al. | 2404.04538 | null |
| 2024-04-02 | TSCM: A Teacher-Student Model for Vision Place Recognition Using Cross-Metric Knowledge Distillation | Yehui Shen et.al. | 2404.01587 | link |
| 2024-04-01 | On Train-Test Class Overlap and Detection for Image Retrieval | Chull Hwan Song et.al. | 2404.01524 | link |
| 2024-04-01 | NVINS: Robust Visual Inertial Navigation Fused with NeRF-augmented Camera Pose Regressor and Uncertainty Quantification | Juyeop Han et.al. | 2404.01400 | null |
| 2024-03-31 | On the Estimation of Image-matching Uncertainty in Visual Place Recognition | Mubariz Zaffar et.al. | 2404.00546 | null |
| 2024-03-31 | NYC-Indoor-VPR: A Long-Term Indoor Visual Place Recognition Dataset with Semi-Automatic Annotation | Diwei Sheng et.al. | 2404.00504 | null |
| 2024-03-30 | SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs | Yang Miao et.al. | 2404.00469 | null |
| 2024-03-30 | Do Vision-Language Models Understand Compound Nouns? | Sonal Kumar et.al. | 2404.00419 | null |
| 2024-04-05 | FairRAG: Fair Human Generation via Fair Retrieval Augmentation | Robik Shrestha et.al. | 2403.19964 | null |
| 2024-03-28 | JIST: Joint Image and Sequence Training for Sequential Visual Place Recognition | Gabriele Berton et.al. | 2403.19787 | link |
| 2024-03-28 | MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions | Kai Zhang et.al. | 2403.19651 | null |
| 2024-03-27 | AIR-HLoc: Adaptive Image Retrieval for Efficient Visual Localisation | Changkun Liu et.al. | 2403.18281 | null |
| 2024-03-26 | Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge | Dongjin Kim et.al. | 2403.17420 | link |
| 2024-03-25 | Enhancing Visual Place Recognition via Fast and Slow Adaptive Biasing in Event Cameras | Gokul B. Nair et.al. | 2403.16425 | null |
| 2024-03-24 | Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval | Yucheng Suo et.al. | 2403.16005 | null |
| 2024-03-24 | BIMCV-R: A Landmark Dataset for 3D CT Text-Image Retrieval | Yinda Chen et.al. | 2403.15992 | null |
| 2024-03-22 | Long-CLIP: Unlocking the Long-Text Capability of CLIP | Beichen Zhang et.al. | 2403.15378 | link |
| 2024-03-22 | A Multimodal Approach for Cross-Domain Image Retrieval | Lucas Iijima et.al. | 2403.15152 | null |
| 2024-03-22 | Piecewise-Linear Manifolds for Deep Metric Learning | Shubhang Bhatnagar et.al. | 2403.14977 | null |
| 2024-03-21 | Enhancing Historical Image Retrieval with Compositional Cues | Tingyu Lin et.al. | 2403.14287 | link |
| 2024-03-20 | Leveraging High-Resolution Features for Improved Deep Hashing-based Image Retrieval | Aymene Berriche et.al. | 2403.13747 | null |
| 2024-03-20 | Flickr30K-CFQ: A Compact and Fragmented Query Dataset for Text-image Retrieval | Haoyu Liu et.al. | 2403.13317 | null |
| 2024-03-19 | Learning Neural Volumetric Pose Features for Camera Localization | Jingyu Lin et.al. | 2403.12800 | null |
| 2024-03-19 | Quantixar: High-performance Vector Data Management System | Gulshan Yadav et.al. | 2403.12583 | null |
| 2024-03-17 | 3DGS-ReLoc: 3D Gaussian Splatting for Map Representation and Visual ReLocalization | Peng Jiang et.al. | 2403.11367 | null |
| 2024-03-17 | MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data | Paul S. Scotti et.al. | 2403.11207 | link |
| 2024-03-16 | Refining Knowledge Transfer on Audio-Image Temporal Agreement for Audio-Text Cross Retrieval | Shunsuke Tsubaki et.al. | 2403.10756 | null |
| 2024-03-16 | Vector search with small radiuses | Gergely Szilvasy et.al. | 2403.10746 | null |
| 2024-03-13 | Training Self-localization Models for Unseen Unfamiliar Places via Teacher-to-Student Data-Free Knowledge Transfer | Kenta Tsukahara et.al. | 2403.10552 | null |
| 2024-03-20 | Leveraging Neural Radiance Field in Descriptor Synthesis for Keypoints Scene Coordinate Regression | Huy-Hoang Bui et.al. | 2403.10297 | link |
| 2024-03-15 | Local positional graphs and attentive local features for a data and runtime-efficient hierarchical place recognition pipeline | Fangming Yuan et.al. | 2403.10283 | null |
| 2024-03-14 | The NeRFect Match: Exploring NeRF Features for Visual Localization | Qunjie Zhou et.al. | 2403.09577 | null |
| 2024-03-14 | VDNA-PR: Using General Dataset Representations for Robust Sequential Visual Place Recognition | Benjamin Ramtoula et.al. | 2403.09025 | null |
| 2024-03-13 | PAPERCLIP: Associating Astronomical Observations and Natural Language with Multi-Modal Models | Siddharth Mishra-Sharma et.al. | 2403.08851 | link |
| 2024-03-13 | NeRF-Supervised Feature Point Detection and Description | Ali Youssef et.al. | 2403.08156 | null |
| 2024-03-12 | It’s All About Your Sketch: Democratising Sketch Control in Diffusion Models | Subhadeep Koley et.al. | 2403.07234 | link |
| 2024-03-12 | You’ll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval | Subhadeep Koley et.al. | 2403.07222 | null |
| 2024-03-12 | Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers | Subhadeep Koley et.al. | 2403.07214 | null |
| 2024-03-11 | How to Handle Sketch-Abstraction in Sketch-Based Image Retrieval? | Subhadeep Koley et.al. | 2403.07203 | null |
| 2024-03-11 | EarthLoc: Astronaut Photography Localization by Indexing Earth from Space | Gabriele Berton et.al. | 2403.06758 | link |
| 2024-03-11 | BEV2PR: BEV-Enhanced Visual Place Recognition with Structural Cues | Fudong Ge et.al. | 2403.06600 | null |
| 2024-03-11 | Leveraging Foundation Models for Content-Based Medical Image Retrieval in Radiology | Stefan Denner et.al. | 2403.06567 | null |
| 2024-03-10 | Texture image retrieval using a classification and contourlet-based features | Asal Rouhafzay et.al. | 2403.06048 | null |
| 2024-03-11 | LHMap-loc: Cross-Modal Monocular Localization Using LiDAR Point Cloud Heat Map | Xinrui Wu et.al. | 2403.05002 | link |
| 2024-03-11 | Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed | Yifan Wang et.al. | 2403.04765 | null |
| 2024-03-06 | Self-supervised Photographic Image Layout Representation Learning | Zhaoran Zhao et.al. | 2403.03740 | link |
| 2024-03-04 | Multi-Spectral Remote Sensing Image Retrieval Using Geospatial Foundation Models | Benedikt Blumenstiel et.al. | 2403.02059 | link |
| 2024-03-03 | Image2Sentence based Asymmetrical Zero-shot Composed Image Retrieval | Yongchao Du et.al. | 2403.01431 | null |
| 2024-03-01 | Asymmetric Feature Fusion for Image Retrieval | Hui Wu et.al. | 2403.00671 | null |
| 2024-03-01 | Structure Similarity Preservation Learning for Asymmetric Image Retrieval | Hui Wu et.al. | 2403.00648 | link |
| 2024-02-29 | CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition | Feng Lu et.al. | 2402.19231 | link |
| 2024-02-28 | Unsupervised Cross-Domain Image Retrieval via Prototypical Optimal Transport | Bin Li et.al. | 2402.18411 | link |
| 2024-02-28 | Balanced Similarity with Auxiliary Prompts: Towards Alleviating Text-to-Image Retrieval Bias for CLIP in Zero-shot Learning | Hanyao Wang et.al. | 2402.18400 | null |
| 2024-02-28 | Representing 3D sparse map points and lines for camera relocalization | Bach-Thuan Bui et.al. | 2402.18011 | link |
| 2024-02-27 | Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control | Thong Nguyen et.al. | 2402.17535 | link |
| 2024-02-29 | Active propulsion noise shaping for multi-rotor aircraft localization | Gabriele Serussi et.al. | 2402.17289 | link |
| 2024-02-27 | NocPlace: Nocturnal Visual Place Recognition Using Generative and Inherited Knowledge Transfer | Bingxi Liu et.al. | 2402.17159 | null |
| 2024-02-25 | Deep Homography Estimation for Visual Place Recognition | Feng Lu et.al. | 2402.16086 | link |
Keypoint Detection
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-05-14 | TP3M: Transformer-based Pseudo 3D Image Matching with Reference | Liming Han et.al. | 2405.08434 | null |
| 2024-05-15 | Vector-Symbolic Architecture for Event-Based Optical Flow | Hongzhi You et.al. | 2405.08300 | null |
| 2024-05-13 | RGBD-Glue: General Feature Combination for Robust RGB-D Point Cloud Registration | Congjia Chen et.al. | 2405.07594 | null |
| 2024-05-08 | Unsupervised Skin Feature Tracking with Deep Neural Networks | Jose Chang et.al. | 2405.04943 | null |
| 2024-05-07 | A Self-Supervised Method for Body Part Segmentation and Keypoint Detection of Rat Images | László Kopácsi et.al. | 2405.04650 | null |
| 2024-04-30 | A Light-weight Transformer-based Self-supervised Matching Network for Heterogeneous Images | Wang Zhang et.al. | 2404.19311 | null |
| 2024-04-25 | Adaptive Local Binary Pattern: A Novel Feature Descriptor for Enhanced Analysis of Kidney Abnormalities in CT Scan Images using ensemble based Machine Learning Approach | Tahmim Hossain et.al. | 2404.14560 | null |
| 2024-04-19 | SkelFormer: Markerless 3D Pose and Shape Estimation using Skeletal Transformers | Vandad Davoodnia et.al. | 2404.12625 | null |
| 2024-04-17 | Pixel-Wise Symbol Spotting via Progressive Points Location for Parsing CAD Images | Junbiao Pang et.al. | 2404.10985 | null |
| 2024-03-28 | Towards Long Term SLAM on Thermal Imagery | Colin Keil et.al. | 2403.19885 | link |
| 2024-03-28 | Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation | Xiao Lin et.al. | 2403.19527 | link |
| 2024-03-27 | RoboKeyGen: Robot Pose and Joint Angles Estimation via Diffusion-based 3D Keypoint Generation | Yang Tian et.al. | 2403.18259 | null |
| 2024-03-18 | FE-DeTr: Keypoint Detection and Tracking in Low-quality Image Frames with Events | Xiangyuan Wang et.al. | 2403.11662 | link |
| 2024-03-05 | Self-supervised 3D Patient Modeling with Multi-modal Attentive Fusion | Meng Zheng et.al. | 2403.03217 | null |
| 2024-02-22 | A Self-supervised Pressure Map human keypoint Detection Approch: Optimizing Generalization and Computational Efficiency Across Datasets | Chengzhang Yu et.al. | 2402.14241 | null |
| 2024-02-25 | A Feature Matching Method Based on Multi-Level Refinement Strategy | Shaojie Zhang et.al. | 2402.13488 | null |
| 2024-03-05 | 3D Kinematics Estimation from Video with a Biomechanical Model and Synthetic Training Data | Zhi-Yi Lin et.al. | 2402.13172 | null |
| 2024-02-25 | Region Feature Descriptor Adapted to High Affine Transformations | Shaojie Zhang et.al. | 2402.09724 | null |
| 2024-01-29 | Reconstructing Close Human Interactions from Multiple Views | Qing Shuai et.al. | 2401.16173 | link |
| 2024-01-17 | To deform or not: treatment-aware longitudinal registration for breast DCE-MRI during neoadjuvant chemotherapy via unsupervised keypoints detection | Luyi Han et.al. | 2401.09336 | link |
| 2024-01-08 | Flowmind2Digital: The First Comprehensive Flowmind Recognition and Conversion Approach | Huanyu Liu et.al. | 2401.03742 | null |
| 2024-03-22 | 6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation | Li Xu et.al. | 2401.00029 | null |
| 2023-12-27 | Bezier-based Regression Feature Descriptor for Deformable Linear Objects | Fangqing Chen et.al. | 2312.16502 | null |
| 2023-12-24 | Residual Learning for Image Point Descriptors | Rashik Shrestha et.al. | 2312.15471 | null |
2024-4
Pose Estimation
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-05-02 | IntervenGen: Interventional Data Generation for Robust and Data-Efficient Robot Imitation Learning | Ryan Hoque et.al. | 2405.01472 | null |
| 2024-05-02 | Behavior Imitation for Manipulator Control and Grasping with Deep Reinforcement Learning | Liu Qiyuan et.al. | 2405.01284 | null |
| 2024-05-02 | Sports Analysis and VR Viewing System Based on Player Tracking and Pose Estimation with Multimodal and Multiview Sensors | Wenxuan Guo et.al. | 2405.01112 | null |
| 2024-05-02 | CoViS-Net: A Cooperative Visual Spatial Foundation Model for Multi-Robot Applications | Jan Blumenkamp et.al. | 2405.01107 | null |
| 2024-05-02 | HandSSCA: 3D Hand Mesh Reconstruction with State Space Channel Attention from RGB images | Zixun Jiao et.al. | 2405.01066 | null |
| 2024-05-01 | Radar-Based Localization For Autonomous Ground Vehicles In Suburban Neighborhoods | Andrew J. Kramer et.al. | 2405.00600 | null |
| 2024-04-30 | Ultra Inertial Poser: Scalable Motion Capture and Tracking from Sparse Inertial Sensors and Ultra-Wideband Ranging | Rayan Armani et.al. | 2404.19541 | link |
| 2024-04-30 | UniFS: Universal Few-shot Instance Perception with Point Representations | Sheng Jin et.al. | 2404.19401 | null |
| 2024-04-30 | Quater-GCN: Enhancing 3D Human Pose Estimation with Orientation and Semi-supervised Training | Xingyu Song et.al. | 2404.19279 | null |
| 2024-04-30 | XFeat: Accelerated Features for Lightweight Image Matching | Guilherme Potje et.al. | 2404.19174 | null |
| 2024-04-29 | Self-Avatar Animation in Virtual Reality: Impact of Motion Signals Artifacts on the Full-Body Pose Reconstruction | Antoine Maiorca et.al. | 2404.18628 | null |
| 2024-04-29 | Mesh-based Photorealistic and Real-time 3D Mapping for Robust Visual Perception of Autonomous Underwater Vehicle | Jungwoo Lee et.al. | 2404.18395 | null |
| 2024-04-29 | Reconstructing Satellites in 3D from Amateur Telescope Images | Zhiming Chang et.al. | 2404.18394 | null |
| 2024-04-27 | Hybrid 3D Human Pose Estimation with Monocular Video and Sparse IMUs | Yiming Bao et.al. | 2404.17837 | null |
| 2024-04-26 | Localization Through Particle Filter Powered Neural Network Estimated Monocular Camera Poses | Yi Shen et.al. | 2404.17685 | null |
| 2024-04-26 | SLAM for Indoor Mapping of Wide Area Construction Environments | Vincent Ress et.al. | 2404.17215 | null |
| 2024-04-25 | WheelPose: Data Synthesis Techniques to Improve Pose Estimation Performance on Wheelchair Users | William Huang et.al. | 2404.17063 | link |
| 2024-04-25 | Transformer-Based Local Feature Matching for Multimodal Image Registration | Remi Delaunay et.al. | 2404.16802 | null |
| 2024-04-25 | DeepKalPose: An Enhanced Deep-Learning Kalman Filter for Temporally Consistent Monocular Vehicle Pose Estimation | Leandro Di Bella et.al. | 2404.16558 | null |
| 2024-04-25 | Efficient Solution of Point-Line Absolute Pose | Petr Hruby et.al. | 2404.16552 | link |
| 2024-04-25 | COBRA – COnfidence score Based on shape Regression Analysis for method-independent quality assessment of object pose estimation from single images | Panagiotis Sapoutzoglou et.al. | 2404.16471 | link |
| 2024-04-25 | MegaParticles: Range-based 6-DoF Monte Carlo Localization with GPU-Accelerated Stein Particle Filter | Kenji Koide et.al. | 2404.16370 | null |
| 2024-04-24 | 3D Human Pose Estimation with Occlusions: Introducing BlendMimic3D Dataset and GCN Refinement | Filipa Lino et.al. | 2404.16136 | null |
| 2024-04-23 | SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation | Xiangyu Xu et.al. | 2404.15276 | link |
| 2024-04-25 | Domain adaptive pose estimation via multi-level alignment | Yugan Chen et.al. | 2404.14885 | link |
| 2024-04-23 | Semi-supervised 2D Human Pose Estimation via Adaptive Keypoint Masking | Kexin Meng et.al. | 2404.14835 | null |
| 2024-04-23 | UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues | Vandad Davoodnia et.al. | 2404.14634 | null |
| 2024-04-22 | DHRNet: A Dual-Path Hierarchical Relation Network for Multi-Person Pose Estimation | Yonghao Dang et.al. | 2404.14025 | null |
| 2024-04-23 | CT-NeRF: Incremental Optimizing Neural Radiance Field and Poses with Complex Trajectory | Yunlong Ran et.al. | 2404.13896 | null |
| 2024-04-21 | Resampling-free Particle Filters in High-dimensions | Akhilan Boopathy et.al. | 2404.13698 | null |
| 2024-04-20 | EC-SLAM: Real-time Dense Neural RGB-D SLAM System with Effectively Constrained Global Bundle Adjustment | Guanghao Li et.al. | 2404.13346 | link |
| 2024-04-18 | Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds | Oliver Lemke et.al. | 2404.12440 | null |
| 2024-04-18 | Gait Recognition from Highly Compressed Videos | Andrei Niculae et.al. | 2404.12183 | null |
| 2024-04-17 | Mushroom Segmentation and 3D Pose Estimation from Point Clouds using Fully Convolutional Geometric Features and Implicit Pose Encoding | George Retsinas et.al. | 2404.12144 | link |
| 2024-04-17 | Kathakali Hand Gesture Recognition With Minimal Data | Kavitha Raju et.al. | 2404.11205 | null |
| 2024-04-17 | GeoReF: Geometric Alignment Across Shape Variation for Category-level Object Pose Refinement | Linfang Zheng et.al. | 2404.11139 | null |
| 2024-04-17 | CorrNet+: Sign Language Recognition and Translation via Spatial-Temporal Correlation | Lianyu Hu et.al. | 2404.11111 | link |
| 2024-04-16 | HumMUSS: Human Motion Understanding using State Space Models | Arnab Kumar Mondal et.al. | 2404.10880 | null |
| 2024-04-16 | Invariant Kalman Filtering with Noise-Free Pseudo-Measurements | Sven Goffin et.al. | 2404.10687 | null |
| 2024-04-16 | The Unreasonable Effectiveness of Pre-Trained Features for Camera Pose Refinement | Gabriele Trivigno et.al. | 2404.10438 | null |
| 2024-04-16 | GaitPoint+: A Gait Recognition Network Incorporating Point Cloud Analysis and Recycling | Huantao Ren et.al. | 2404.10213 | null |
| 2024-04-16 | LWIRPOSE: A novel LWIR Thermal Image Dataset and Benchmark | Avinash Upadhyay et.al. | 2404.10212 | link |
| 2024-04-15 | LetsGo: Large-Scale Garage Modeling and Rendering via LiDAR-Assisted Gaussian Primitives | Jiadi Cui et.al. | 2404.09748 | null |
| 2024-04-14 | In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition | Wiktor Mucha et.al. | 2404.09308 | null |
| 2024-04-13 | DeDoDe v2: Analyzing and Improving the DeDoDe Keypoint Detector | Johan Edstedt et.al. | 2404.08928 | link |
| 2024-04-16 | 3D Human Scan With A Moving Event Camera | Kai Kohyama et.al. | 2404.08504 | null |
| 2024-04-11 | Separated Attention: An Improved Cycle GAN Based Under Water Image Enhancement Method | Tashmoy Ghosh et.al. | 2404.07649 | null |
| 2024-04-11 | GLID: Pre-training a Generalist Encoder-Decoder Vision Model | Jihao Liu et.al. | 2404.07603 | null |
| 2024-04-10 | Measuring proximity to standard planes during fetal brain ultrasound scanning | Chiara Di Vece et.al. | 2404.07124 | null |
| 2024-04-10 | MoCap-to-Visual Domain Adaptation for Efficient Human Mesh Estimation from 2D Keypoints | Bedirhan Uguz et.al. | 2404.07094 | null |
| 2024-04-10 | Gaussian-LIC: Photo-realistic LiDAR-Inertial-Camera SLAM with 3D Gaussian Splatting | Xiaolei Lang et.al. | 2404.06926 | null |
| 2024-04-09 | Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences | Axel Barroso-Laguna et.al. | 2404.06337 | link |
| 2024-04-09 | Incremental Joint Learning of Depth, Pose and Implicit Scene Representation on Monocular Camera in Large-scale Scenes | Tianchen Deng et.al. | 2404.06050 | null |
| 2024-04-08 | Learning 3D-Aware GANs from Unposed Images with Template Feature Field | Xinya Chen et.al. | 2404.05705 | null |
| 2024-04-08 | Learning a Category-level Object Pose Estimator without Pose Annotations | Fengrui Tian et.al. | 2404.05626 | null |
| 2024-04-08 | DepthMOT: Depth Cues Lead to a Strong Multi-Object Tracker | Jiapeng Wu et.al. | 2404.05518 | link |
| 2024-04-08 | Two Hands Are Better Than One: Resolving Hand to Hand Intersections via Occupancy Networks | Maksym Ivashechkin et.al. | 2404.05414 | null |
| 2024-04-08 | STITCH: Augmented Dexterity for Suture Throws Including Thread Coordination and Handoffs | Kush Hari et.al. | 2404.05151 | null |
| 2024-04-05 | ToolEENet: Tool Affordance 6D Pose Estimation | Yunlong Wang et.al. | 2404.04193 | null |
| 2024-04-04 | SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation | Sichen Chen et.al. | 2404.03518 | link |
| 2024-04-04 | Multi Positive Contrastive Learning with Pose-Consistent Generated Images | Sho Inayoshi et.al. | 2404.03256 | null |
| 2024-04-04 | HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud | Wencan Cheng et.al. | 2404.03159 | link |
| 2024-04-03 | Fusing Multi-sensor Input with State Information on TinyML Brains for Autonomous Nano-drones | Luca Crupi et.al. | 2404.02567 | null |
| 2024-04-03 | Semi-Supervised Unconstrained Head Pose Estimation in the Wild | Huayi Zhou et.al. | 2404.02544 | link |
| 2024-04-02 | 3D Congealing: 3D-Aware Image Alignment in the Wild | Yunzhi Zhang et.al. | 2404.02125 | null |
| 2024-04-02 | SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation | Vinkle Srivastav et.al. | 2404.02041 | null |
| 2024-04-01 | Marrying NeRF with Feature Matching for One-step Pose Estimation | Ronghan Chen et.al. | 2404.00891 | null |
| 2024-03-31 | Graph-Based vs. Error State Kalman Filter-Based Fusion Of 5G And Inertial Data For MAV Indoor Pose Estimation | Meisam Kabiri et.al. | 2404.00691 | null |
| 2024-03-31 | OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos | Dongyoung Choi et.al. | 2404.00676 | null |
| 2024-04-02 | KTPFormer: Kinematics and Trajectory Prior Knowledge-Enhanced Transformer for 3D Human Pose Estimation | Jihua Peng et.al. | 2404.00658 | link |
| 2024-03-29 | FetalDiffusion: Pose-Controllable 3D Fetal MRI Synthesis with Conditional Diffusion Model | Molin Zhang et.al. | 2404.00132 | null |
| 2024-03-29 | Latent Embedding Clustering for Occlusion Robust Head Pose Estimation | José Celestino et.al. | 2403.20251 | null |
| 2024-03-29 | A Unified Framework for Human-centric Point Cloud Video Understanding | Yiteng Xu et.al. | 2403.20031 | null |
| 2024-04-01 | Video-Based Human Pose Regression via Decoupled Space-Time Aggregation | Jijie He et.al. | 2403.19926 | link |
| 2024-03-28 | Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation | Xiao Lin et.al. | 2403.19527 | link |
| 2024-03-27 | Object Pose Estimation via the Aggregation of Diffusion Features | Tianfu Wang et.al. | 2403.18791 | link |
| 2024-03-27 | RoboKeyGen: Robot Pose and Joint Angles Estimation via Diffusion-based 3D Keypoint Generation | Yang Tian et.al. | 2403.18259 | null |
| 2024-03-26 | Mathematical Foundation and Corrections for Full Range Head Pose Estimation | Huei-Chung Hu et.al. | 2403.18104 | null |
| 2024-03-26 | EgoPoseFormer: A Simple Baseline for Egocentric 3D Human Pose Estimation | Chenhongyi Yang et.al. | 2403.18080 | null |
| 2024-03-26 | A Survey on 3D Egocentric Human Pose Estimation | Md Mushfiqur Azam et.al. | 2403.17893 | null |
| 2024-03-26 | GTA-HDR: A Large-Scale Synthetic Dataset for HDR Image Reconstruction | Hrishav Bakul Barua et.al. | 2403.17837 | link |
Visual Localization
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-05-14 | HybridHash: Hybrid Convolutional and Self-Attention Deep Hashing for Image Retrieval | Chao He et.al. | 2405.07524 | link |
| 2024-05-13 | JointLoc: A Real-time Visual Localization Framework for Planetary UAVs Based on Joint Relative and Absolute Pose Estimation | Xubo Luo et.al. | 2405.07429 | null |
| 2024-05-12 | BoQ: A Place is Worth a Bag of Learnable Queries | Amar Ali-bey et.al. | 2405.07364 | link |
| 2024-05-07 | Breast Histopathology Image Retrieval by Attention-based Adversarially Regularized Variational Graph Autoencoder with Contrastive Learning-Based Feature Extraction | Nematollah Saeidi et.al. | 2405.04211 | null |
| 2024-05-06 | A New Robust Partial $p$-Wasserstein-Based Metric for Comparing Distributions | Sharath Raghvendra et.al. | 2405.03664 | null |
| 2024-05-06 | Knowledge-aware Text-Image Retrieval for Remote Sensing Images | Li Mi et.al. | 2405.03373 | null |
| 2024-05-06 | Adapting Dual-encoder Vision-language Models for Paraphrased Retrieval | Jiacheng Cheng et.al. | 2405.03190 | null |
| 2024-05-05 | iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval | Lorenzo Agnolucci et.al. | 2405.02951 | link |
| 2024-05-01 | Spherical Linear Interpolation and Text-Anchoring for Zero-shot Composed Image Retrieval | Young Kyun Jang et.al. | 2405.00571 | null |
| 2024-04-30 | Large Language Model Informed Patent Image Retrieval | Hao-Cheng Lo et.al. | 2404.19360 | null |
| 2024-04-30 | XFeat: Accelerated Features for Lightweight Image Matching | Guilherme Potje et.al. | 2404.19174 | null |
| 2024-04-29 | Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models | Hongyi Zhu et.al. | 2404.18746 | null |
| 2024-04-29 | Dual-Modal Prompting for Sketch-Based Image Retrieval | Liying Gao et.al. | 2404.18695 | null |
| 2024-05-01 | Semantic Line Combination Detector | Jinwon Ko et.al. | 2404.18399 | link |
| 2024-04-26 | Learning text-to-video retrieval from image captioning | Lucas Ventura et.al. | 2404.17498 | null |
| 2024-04-25 | CriSp: Leveraging Tread Depth Maps for Enhanced Crime-Scene Shoeprint Matching | Samia Shafique et.al. | 2404.16972 | null |
| 2024-04-29 | Revisiting Relevance Feedback for CLIP-based Interactive Image Retrieval | Ryoya Nara et.al. | 2404.16398 | null |
| 2024-04-24 | Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image Retrieval | Haokun Wen et.al. | 2404.15875 | link |
| 2024-04-24 | DVF: Advancing Robust and Accurate Fine-Grained Image Retrieval with Retrieval Guidelines | Xin Jiang et.al. | 2404.15771 | null |
| 2024-04-23 | Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval | Young Kyun Jang et.al. | 2404.15516 | null |
| 2024-04-22 | EcoPull: Sustainable IoT Image Retrieval Empowered by TinyML Models | Mathias Thorsager et.al. | 2404.14236 | null |
| 2024-04-22 | Hierarchical localization with panoramic views and triplet loss functions | Marcos Alfaro et.al. | 2404.14117 | link |
| 2024-04-20 | High-fidelity Endoscopic Image Synthesis by Utilizing Depth-guided Neural Surfaces | Baoru Huang et.al. | 2404.13437 | null |
| 2024-04-20 | Collaborative Visual Place Recognition through Federated Learning | Mattia Dutto et.al. | 2404.13324 | null |
| 2024-04-18 | SPOT: Point Cloud Based Stereo Visual Place Recognition for Similar and Opposing Viewpoints | Spencer Carmichael et.al. | 2404.12339 | null |
| 2024-04-17 | Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives | Zhangchi Feng et.al. | 2404.11317 | null |
| 2024-04-17 | Spatial-Aware Image Retrieval: A Hyperdimensional Computing Approach for Efficient Similarity Hashing | Sanggeon Yun et.al. | 2404.11025 | null |
| 2024-04-16 | SPVLoc: Semantic Panoramic Viewport Matching for 6D Camera Localization in Unseen Environments | Niklas Gard et.al. | 2404.10527 | link |
| 2024-04-20 | CREST: Cross-modal Resonance through Evidential Deep Learning for Enhanced Zero-Shot Learning | Haojian Huang et.al. | 2404.09640 | link |
| 2024-04-11 | PRAM: Place Recognition Anywhere Model for Efficient Visual Localization | Fei Xue et.al. | 2404.07785 | null |
| 2024-04-11 | Semantically-correlated memories in a dense associative model | Thomas F Burns et.al. | 2404.07123 | link |
| 2024-04-09 | Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation | Luca Barsellotti et.al. | 2404.06542 | null |
| 2024-04-09 | Learning Embeddings with Centroid Triplet Loss for Object Identification in Robotic Grasping | Anas Gouda et.al. | 2404.06277 | null |
| 2024-04-07 | Weakly Supervised Deep Hyperspherical Quantization for Image Retrieval | Jinpeng Wang et.al. | 2404.04998 | link |
| 2024-04-06 | Soft-Prompting with Graph-of-Thought for Multi-modal Representation Learning | Juncheng Yang et.al. | 2404.04538 | null |
| 2024-04-02 | TSCM: A Teacher-Student Model for Vision Place Recognition Using Cross-Metric Knowledge Distillation | Yehui Shen et.al. | 2404.01587 | link |
| 2024-04-01 | On Train-Test Class Overlap and Detection for Image Retrieval | Chull Hwan Song et.al. | 2404.01524 | link |
| 2024-04-01 | NVINS: Robust Visual Inertial Navigation Fused with NeRF-augmented Camera Pose Regressor and Uncertainty Quantification | Juyeop Han et.al. | 2404.01400 | null |
| 2024-03-31 | On the Estimation of Image-matching Uncertainty in Visual Place Recognition | Mubariz Zaffar et.al. | 2404.00546 | null |
| 2024-03-31 | NYC-Indoor-VPR: A Long-Term Indoor Visual Place Recognition Dataset with Semi-Automatic Annotation | Diwei Sheng et.al. | 2404.00504 | null |
| 2024-03-30 | SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs | Yang Miao et.al. | 2404.00469 | null |
| 2024-03-30 | Do Vision-Language Models Understand Compound Nouns? | Sonal Kumar et.al. | 2404.00419 | null |
| 2024-04-05 | FairRAG: Fair Human Generation via Fair Retrieval Augmentation | Robik Shrestha et.al. | 2403.19964 | null |
| 2024-03-28 | JIST: Joint Image and Sequence Training for Sequential Visual Place Recognition | Gabriele Berton et.al. | 2403.19787 | link |
| 2024-03-28 | MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions | Kai Zhang et.al. | 2403.19651 | null |
| 2024-03-27 | AIR-HLoc: Adaptive Image Retrieval for Efficient Visual Localisation | Changkun Liu et.al. | 2403.18281 | null |
| 2024-03-26 | Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge | Dongjin Kim et.al. | 2403.17420 | link |
| 2024-03-25 | Enhancing Visual Place Recognition via Fast and Slow Adaptive Biasing in Event Cameras | Gokul B. Nair et.al. | 2403.16425 | null |
| 2024-03-24 | Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval | Yucheng Suo et.al. | 2403.16005 | null |
| 2024-03-24 | BIMCV-R: A Landmark Dataset for 3D CT Text-Image Retrieval | Yinda Chen et.al. | 2403.15992 | null |
| 2024-03-22 | Long-CLIP: Unlocking the Long-Text Capability of CLIP | Beichen Zhang et.al. | 2403.15378 | link |
| 2024-03-22 | A Multimodal Approach for Cross-Domain Image Retrieval | Lucas Iijima et.al. | 2403.15152 | null |
Keypoint Detection
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-06-03 | Scale-Free Image Keypoints Using Differentiable Persistent Homology | Giovanni Barbarani et.al. | 2406.01315 | link |
| 2024-06-23 | W-Net: A Facial Feature-Guided Face Super-Resolution Network | Hao Liu et.al. | 2406.00676 | null |
| 2024-05-25 | Deep-PE: A Learning-Based Pose Evaluator for Point Cloud Registration | Junjie Gao et.al. | 2405.16085 | null |
| 2024-06-01 | Benchmarking Fish Dataset and Evaluation Metric in Keypoint Detection – Towards Precise Fish Morphological Assessment in Aquaculture Breeding | Weizhen Liu et.al. | 2405.12476 | link |
| 2024-05-14 | TP3M: Transformer-based Pseudo 3D Image Matching with Reference | Liming Han et.al. | 2405.08434 | null |
| 2024-05-15 | Vector-Symbolic Architecture for Event-Based Optical Flow | Hongzhi You et.al. | 2405.08300 | null |
| 2024-05-13 | RGBD-Glue: General Feature Combination for Robust RGB-D Point Cloud Registration | Congjia Chen et.al. | 2405.07594 | null |
| 2024-05-08 | Unsupervised Skin Feature Tracking with Deep Neural Networks | Jose Chang et.al. | 2405.04943 | null |
| 2024-05-07 | A Self-Supervised Method for Body Part Segmentation and Keypoint Detection of Rat Images | László Kopácsi et.al. | 2405.04650 | null |
| 2024-04-30 | A Light-weight Transformer-based Self-supervised Matching Network for Heterogeneous Images | Wang Zhang et.al. | 2404.19311 | null |
| 2024-04-25 | Adaptive Local Binary Pattern: A Novel Feature Descriptor for Enhanced Analysis of Kidney Abnormalities in CT Scan Images using ensemble based Machine Learning Approach | Tahmim Hossain et.al. | 2404.14560 | null |
| 2024-04-19 | SkelFormer: Markerless 3D Pose and Shape Estimation using Skeletal Transformers | Vandad Davoodnia et.al. | 2404.12625 | null |
| 2024-04-17 | Pixel-Wise Symbol Spotting via Progressive Points Location for Parsing CAD Images | Junbiao Pang et.al. | 2404.10985 | null |
| 2024-03-28 | Towards Long Term SLAM on Thermal Imagery | Colin Keil et.al. | 2403.19885 | link |
| 2024-03-28 | Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation | Xiao Lin et.al. | 2403.19527 | link |
| 2024-03-27 | RoboKeyGen: Robot Pose and Joint Angles Estimation via Diffusion-based 3D Keypoint Generation | Yang Tian et.al. | 2403.18259 | null |
| 2024-03-18 | FE-DeTr: Keypoint Detection and Tracking in Low-quality Image Frames with Events | Xiangyuan Wang et.al. | 2403.11662 | link |
| 2024-03-05 | Self-supervised 3D Patient Modeling with Multi-modal Attentive Fusion | Meng Zheng et.al. | 2403.03217 | null |
| 2024-02-22 | A Self-supervised Pressure Map human keypoint Detection Approch: Optimizing Generalization and Computational Efficiency Across Datasets | Chengzhang Yu et.al. | 2402.14241 | null |
| 2024-02-25 | A Feature Matching Method Based on Multi-Level Refinement Strategy | Shaojie Zhang et.al. | 2402.13488 | null |
| 2024-03-05 | 3D Kinematics Estimation from Video with a Biomechanical Model and Synthetic Training Data | Zhi-Yi Lin et.al. | 2402.13172 | null |
| 2024-02-25 | Region Feature Descriptor Adapted to High Affine Transformations | Shaojie Zhang et.al. | 2402.09724 | null |
2024-5
Visual Localization
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-06-06 | GLACE: Global Local Accelerated Coordinate Encoding | Fangjinhua Wang et.al. | 2406.04340 | link |
| 2024-06-06 | Monocular Localization with Semantics Map for Autonomous Vehicles | Jixiang Wan et.al. | 2406.03835 | null |
| 2024-06-05 | Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach | Saehyung Lee et.al. | 2406.03411 | link |
| 2024-06-04 | MeshVPR: Citywide Visual Place Recognition Using 3D Meshes | Gabriele Berton et.al. | 2406.02776 | null |
| 2024-06-04 | Can CLIP help CLIP in learning 3D? | Cristian Sbrolli et.al. | 2406.02202 | null |
| 2024-06-03 | Decomposing and Interpreting Image Representations via Text in ViTs Beyond CLIP | Sriram Balasubramanian et.al. | 2406.01583 | null |
| 2024-06-03 | Scale-Free Image Keypoints Using Differentiable Persistent Homology | Giovanni Barbarani et.al. | 2406.01315 | link |
| 2024-06-02 | Visual place recognition for aerial imagery: A survey | Ivan Moskalenko et.al. | 2406.00885 | link |
| 2024-06-01 | NuRF: Nudging the Particle Filter in Radiance Fields for Robot Visual Localization | Wugang Meng et.al. | 2406.00312 | null |
| 2024-05-31 | DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models | Linli Yao et.al. | 2405.20985 | null |
| 2024-05-29 | Multi-Modal Generative Embedding Model | Feipeng Ma et.al. | 2405.19333 | null |
| 2024-05-29 | ContextBLIP: Doubly Contextual Alignment for Contrastive Image Retrieval from Linguistically Complex Descriptions | Honglin Lin et.al. | 2405.19226 | null |
| 2024-05-30 | CaLa: Complementary Association Learning for Augmenting Composed Image Retrieval | Xintong Jiang et.al. | 2405.19149 | null |
| 2024-05-29 | SketchTriplet: Self-Supervised Scenarized Sketch-Text-Image Triplet Generation | Zhenbei Wu et.al. | 2405.18801 | null |
| 2024-05-29 | Reverse Image Retrieval Cues Parametric Memory in Multimodal LLMs | Jialiang Xu et.al. | 2405.18740 | link |
| 2024-05-28 | EffoVPR: Effective Foundation Model Utilization for Visual Place Recognition | Issar Tzachor et.al. | 2405.18065 | null |
| 2024-05-28 | AdapNet: Adaptive Noise-Based Network for Low-Quality Image Retrieval | Sihe Zhang et.al. | 2405.17718 | null |
| 2024-05-26 | MCGMapper: Light-Weight Incremental Structure from Motion and Visual Localization With Planar Markers and Camera Groups | Yusen Xie et.al. | 2405.16599 | null |
| 2024-05-29 | Composed Image Retrieval for Remote Sensing | Bill Psomas et.al. | 2405.15587 | link |
| 2024-05-24 | Self-distilled Dynamic Fusion Network for Language-based Fashion Retrieval | Yiming Wu et.al. | 2405.15451 | null |
| 2024-05-20 | UAV-VisLoc: A Large-scale Dataset for UAV Visual Localization | Wenjia Xu et.al. | 2405.11936 | link |
| 2024-05-19 | Register assisted aggregation for Visual Place Recognition | Xuan Yu et.al. | 2405.11526 | null |
| 2024-05-16 | FFF: Fixing Flawed Foundations in contrastive pre-training results in very strong Vision-Language models | Adrian Bulat et.al. | 2405.10286 | null |
| 2024-05-15 | Content-Based Image Retrieval for Multi-Class Volumetric Radiology Images: A Benchmark Study | Farnaz Khun Jush et.al. | 2405.09334 | null |
| 2024-05-14 | BEVRender: Vision-based Cross-view Vehicle Registration in Off-road GNSS-denied Environment | Lihong Jin et.al. | 2405.09001 | null |
| 2024-05-14 | TP3M: Transformer-based Pseudo 3D Image Matching with Reference | Liming Han et.al. | 2405.08434 | null |
| 2024-05-14 | HybridHash: Hybrid Convolutional and Self-Attention Deep Hashing for Image Retrieval | Chao He et.al. | 2405.07524 | link |
| 2024-05-13 | JointLoc: A Real-time Visual Localization Framework for Planetary UAVs Based on Joint Relative and Absolute Pose Estimation | Xubo Luo et.al. | 2405.07429 | link |
| 2024-05-12 | BoQ: A Place is Worth a Bag of Learnable Queries | Amar Ali-bey et.al. | 2405.07364 | link |
| 2024-05-07 | Breast Histopathology Image Retrieval by Attention-based Adversarially Regularized Variational Graph Autoencoder with Contrastive Learning-Based Feature Extraction | Nematollah Saeidi et.al. | 2405.04211 | null |
| 2024-05-06 | A New Robust Partial $p$-Wasserstein-Based Metric for Comparing Distributions | Sharath Raghvendra et.al. | 2405.03664 | null |
| 2024-05-06 | Knowledge-aware Text-Image Retrieval for Remote Sensing Images | Li Mi et.al. | 2405.03373 | null |
| 2024-05-06 | Adapting Dual-encoder Vision-language Models for Paraphrased Retrieval | Jiacheng Cheng et.al. | 2405.03190 | null |
| 2024-05-05 | iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval | Lorenzo Agnolucci et.al. | 2405.02951 | link |
| 2024-05-01 | Spherical Linear Interpolation and Text-Anchoring for Zero-shot Composed Image Retrieval | Young Kyun Jang et.al. | 2405.00571 | null |
| 2024-04-30 | Large Language Model Informed Patent Image Retrieval | Hao-Cheng Lo et.al. | 2404.19360 | null |
| 2024-04-30 | XFeat: Accelerated Features for Lightweight Image Matching | Guilherme Potje et.al. | 2404.19174 | null |
| 2024-04-29 | Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models | Hongyi Zhu et.al. | 2404.18746 | null |
| 2024-04-29 | Dual-Modal Prompting for Sketch-Based Image Retrieval | Liying Gao et.al. | 2404.18695 | null |
| 2024-05-01 | Semantic Line Combination Detector | Jinwon Ko et.al. | 2404.18399 | link |
| 2024-04-26 | Learning text-to-video retrieval from image captioning | Lucas Ventura et.al. | 2404.17498 | null |
| 2024-04-25 | CriSp: Leveraging Tread Depth Maps for Enhanced Crime-Scene Shoeprint Matching | Samia Shafique et.al. | 2404.16972 | null |
| 2024-04-29 | Revisiting Relevance Feedback for CLIP-based Interactive Image Retrieval | Ryoya Nara et.al. | 2404.16398 | null |
| 2024-04-24 | Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image Retrieval | Haokun Wen et.al. | 2404.15875 | link |
| 2024-04-24 | DVF: Advancing Robust and Accurate Fine-Grained Image Retrieval with Retrieval Guidelines | Xin Jiang et.al. | 2404.15771 | null |
Pose Estimation
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-06-05 | Sparse Color-Code Net: Real-Time RGB-Based 6D Object Pose Estimation on Edge Devices | Xingjian Yang et.al. | 2406.02977 | null |
| 2024-06-04 | CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation | Dejia Xu et.al. | 2406.02509 | null |
| 2024-06-04 | HPE-CogVLM: New Head Pose Grounding Task Exploration on Vision Language Model | Yu Tian et.al. | 2406.01914 | null |
| 2024-06-03 | A Robust Filter for Marker-less Multi-person Tracking in Human-Robot Interaction Scenarios | Enrico Martini et.al. | 2406.01832 | link |
| 2024-06-01 | Equivariant amortized inference of poses for cryo-EM | Larissa de Ruijter et.al. | 2406.01630 | null |
| 2024-06-03 | 3D WholeBody Pose Estimation based on Semantic Graph Attention Network and Distance Information | Sihan Wen et.al. | 2406.01196 | null |
| 2024-06-01 | CapeX: Category-Agnostic Pose Estimation from Textual Point Explanation | Matan Rusanovsky et.al. | 2406.00384 | link |
| 2024-05-30 | Estimating Human Poses Across Datasets: A Unified Skeleton and Multi-Teacher Distillation Approach | Muhammad Saif Ullah Khan et.al. | 2405.20084 | null |
| 2024-05-30 | TAMBRIDGE: Bridging Frame-Centered Tracking and 3D Gaussian Splatting for Enhanced SLAM | Peifeng Jiang et.al. | 2405.19614 | null |
| 2024-05-29 | Real-Time Dynamic Robot-Assisted Hand-Object Interaction via Motion Primitives | Mingqi Yuan et.al. | 2405.19531 | null |
| 2024-05-29 | Exploring AI-based Anonymization of Industrial Image and Video Data in the Context of Feature Preservation | Sabrina Cynthia Triess et.al. | 2405.19173 | null |
| 2024-05-28 | World Models for General Surgical Grasping | Hongbin Lin et.al. | 2405.17940 | null |
| 2024-05-27 | MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds | Jiahui Lei et.al. | 2405.17421 | null |
| 2024-05-27 | Occlusion Handling in 3D Human Pose Estimation with Perturbed Positional Encoding | Niloofar Azizi et.al. | 2405.17397 | null |
| 2024-05-27 | $\text{Di}^2\text{Pose}$: Discrete Diffusion Model for Occluded 3D Human Pose Estimation | Weiquan Wang et.al. | 2405.17016 | null |
| 2024-05-27 | Clustering-based Learning for UAV Tracking and Pose Estimation | Jiaping Xiao et.al. | 2405.16867 | null |
| 2024-05-26 | Multi-Modal UAV Detection, Classification and Tracking Algorithm – Technical Report for CVPR 2024 UG2 Challenge | Tianchen Deng et.al. | 2405.16464 | link |
| 2024-05-25 | Intensity and Texture Correction of Omnidirectional Image Using Camera Images for Indirect Augmented Reality | Hakim Ikebayashi et.al. | 2405.16008 | null |
| 2024-05-23 | CoPeD-Advancing Multi-Robot Collaborative Perception: A Comprehensive Dataset in Real-World Environments | Yang Zhou et.al. | 2405.14731 | link |
| 2024-05-23 | Segformer++: Efficient Token-Merging Strategies for High-Resolution Semantic Segmentation | Daniel Kienzle et.al. | 2405.14467 | null |
| 2024-05-21 | Geometric Transformation Uncertainty for Improving 3D Fetal Brain Pose Prediction from Freehand 2D Ultrasound Videos | Jayroop Ramesh et.al. | 2405.13235 | null |
| 2024-05-21 | Leveraging Neural Radiance Fields for Pose Estimation of an Unknown Space Object during Proximity Operations | Antoine Legrand et.al. | 2405.12728 | null |
| 2024-05-21 | PoseGravity: Pose Estimation from Points and Lines with Axis Prior | Akshay Chandrasekhar et.al. | 2405.12646 | link |
| 2024-05-19 | Focus on Low-Resolution Information: Multi-Granular Information-Lossless Model for Low-Resolution Human Pose Estimation | Zejun Gu et.al. | 2405.12247 | null |
| 2024-05-20 | AutoSoccerPose: Automated 3D posture Analysis of Soccer Shot Movements | Calvin Yeung et.al. | 2405.12070 | link |
| 2024-05-19 | Advancing 6-DoF Instrument Pose Estimation in Variable X-Ray Imaging Geometries | Christiaan G. A. Viviers et.al. | 2405.11677 | link |
| 2024-05-19 | Cross-Domain Knowledge Distillation for Low-Resolution Human Pose Estimation | Zejun Gu et.al. | 2405.11448 | null |
| 2024-05-18 | PS6D: Point Cloud Based Symmetry-Aware 6D Object Pose Estimation in Robot Bin-Picking | Yifan Yang et.al. | 2405.11257 | null |
| 2024-05-18 | MotionGS: Compact Gaussian Splatting SLAM by Motion Filter | Xinli Guo et.al. | 2405.11129 | link |
| 2024-05-17 | Resolving Symmetry Ambiguity in Correspondence-based Methods for Instance-level Object Pose Estimation | Yongliang Lin et.al. | 2405.10557 | null |
| 2024-05-16 | Diversity-Aware Sign Language Production through a Pose Encoding Variational Autoencoder | Mohamed Ilyes Lakhal et.al. | 2405.10423 | null |
| 2024-05-17 | Toon3D: Seeing Cartoons from a New Perspective | Ethan Weber et.al. | 2405.10320 | null |
| 2024-05-15 | Task-adaptive Q-Face | Haomiao Sun et.al. | 2405.09059 | null |
| 2024-05-14 | RDPN6D: Residual-based Dense Point-wise Network for 6Dof Object Pose Estimation Based on RGB-D Images | Zong-Wei Hong et.al. | 2405.08483 | link |
| 2024-05-14 | TP3M: Transformer-based Pseudo 3D Image Matching with Reference | Liming Han et.al. | 2405.08434 | null |
| 2024-05-13 | Deep Learning-Based Object Pose Estimation: A Comprehensive Survey | Jian Liu et.al. | 2405.07801 | link |
| 2024-05-13 | JointLoc: A Real-time Visual Localization Framework for Planetary UAVs Based on Joint Relative and Absolute Pose Estimation | Xubo Luo et.al. | 2405.07429 | link |
| 2024-05-11 | TD-NeRF: Novel Truncated Depth Prior for Joint Camera Pose and Neural Radiance Field Optimization | Zhen Tan et.al. | 2405.07027 | null |
| 2024-05-11 | AHPPEBot: Autonomous Robot for Tomato Harvesting based on Phenotyping and Pose Estimation | Xingxu Li et.al. | 2405.06959 | null |
| 2024-05-10 | CasCalib: Cascaded Calibration for Motion Capture from Sparse Unsynchronized Cameras | James Tang et.al. | 2405.06845 | link |
| 2024-05-10 | MGS-SLAM: Monocular Sparse Tracking and Gaussian Mapping with Depth Smooth Regularization | Pengcheng Zhu et.al. | 2405.06241 | null |
| 2024-05-10 | Free-Moving Object Reconstruction and Pose Estimation with Virtual Camera | Haixin Shi et.al. | 2405.05858 | null |
| 2024-05-09 | Semi-Autonomous Laparoscopic Robot Docking with Learned Hand-Eye Information Fusion | Huanyu Tian et.al. | 2405.05817 | null |
| 2024-05-09 | NeuRSS: Enhancing AUV Localization and Bathymetric Mapping with Neural Rendering for Sidescan SLAM | Yiping Xie et.al. | 2405.05807 | null |
| 2024-05-09 | Benchmarking Neural Radiance Fields for Autonomous Robots: An Overview | Yuhang Ming et.al. | 2405.05526 | null |
| 2024-05-08 | Adversary-Guided Motion Retargeting for Skeleton Anonymization | Thomas Carr et.al. | 2405.05428 | null |
| 2024-05-08 | FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models | Jinglin Xu et.al. | 2405.05216 | link |
| 2024-05-08 | ProbRadarM3F: mmWave Radar based Human Skeletal Pose Estimation with Probability Map Guided Multi-Format Feature Fusion | Bing Zhu et.al. | 2405.05164 | null |
| 2024-05-08 | GISR: Geometric Initialization and Silhouette-based Refinement for Single-View Robot Pose and Configuration Estimation | Ivan Bilić et.al. | 2405.04890 | null |
| 2024-05-07 | Learning Distributional Demonstration Spaces for Task-Specific Cross-Pose Estimation | Jenny Wang et.al. | 2405.04609 | null |
| 2024-05-07 | Speak the Same Language: Global LiDAR Registration on BIM Using Pose Hough Transform | Zhijian Qiao et.al. | 2405.03969 | null |
| 2024-05-07 | Joint Estimation of Identity Verification and Relative Pose for Partial Fingerprints | Xiongjun Guan et.al. | 2405.03959 | null |
| 2024-05-06 | Pose Priors from Language Models | Sanjay Subramanian et.al. | 2405.03689 | null |
| 2024-05-06 | Optimizing Hand Region Detection in MediaPipe Holistic Full-Body Pose Estimation to Improve Accuracy and Avoid Downstream Errors | Amit Moryossef et.al. | 2405.03545 | link |
| 2024-05-05 | Multi-hop graph transformer network for 3D human pose estimation | Zaedul Islam et.al. | 2405.03055 | null |
| 2024-05-05 | Blending Distributed NeRFs with Tri-stage Robust Pose Optimization | Baijun Ye et.al. | 2405.02880 | null |
| 2024-05-03 | WeightedPose: Generalizable Cross-Pose Estimation via Weighted SVD | Xuxin Cheng et.al. | 2405.02241 | null |
| 2024-05-03 | Probablistic Restoration with Adaptive Noise Sampling for 3D Human Pose Estimation | Xianzhou Zeng et.al. | 2405.02114 | link |
| 2024-05-03 | An Onboard Framework for Staircases Modeling Based on Point Clouds | Chun Qing et.al. | 2405.01918 | null |
| 2024-05-06 | ShadowNav: Autonomous Global Localization for Lunar Navigation in Darkness | Deegan Atha et.al. | 2405.01673 | null |
| 2024-05-02 | IntervenGen: Interventional Data Generation for Robust and Data-Efficient Robot Imitation Learning | Ryan Hoque et.al. | 2405.01472 | null |
| 2024-05-02 | Behavior Imitation for Manipulator Control and Grasping with Deep Reinforcement Learning | Liu Qiyuan et.al. | 2405.01284 | null |
| 2024-05-02 | Sports Analysis and VR Viewing System Based on Player Tracking and Pose Estimation with Multimodal and Multiview Sensors | Wenxuan Guo et.al. | 2405.01112 | null |
| 2024-05-02 | CoViS-Net: A Cooperative Visual Spatial Foundation Model for Multi-Robot Applications | Jan Blumenkamp et.al. | 2405.01107 | null |
| 2024-05-04 | HandSSCA: 3D Hand Mesh Reconstruction with State Space Channel Attention from RGB images | Zixun Jiao et.al. | 2405.01066 | null |
| 2024-05-01 | Radar-Based Localization For Autonomous Ground Vehicles In Suburban Neighborhoods | Andrew J. Kramer et.al. | 2405.00600 | null |
| 2024-04-30 | Ultra Inertial Poser: Scalable Motion Capture and Tracking from Sparse Inertial Sensors and Ultra-Wideband Ranging | Rayan Armani et.al. | 2404.19541 | link |
| 2024-04-30 | UniFS: Universal Few-shot Instance Perception with Point Representations | Sheng Jin et.al. | 2404.19401 | null |
| 2024-04-30 | Quater-GCN: Enhancing 3D Human Pose Estimation with Orientation and Semi-supervised Training | Xingyu Song et.al. | 2404.19279 | null |
| 2024-04-30 | XFeat: Accelerated Features for Lightweight Image Matching | Guilherme Potje et.al. | 2404.19174 | null |
| 2024-04-29 | Self-Avatar Animation in Virtual Reality: Impact of Motion Signals Artifacts on the Full-Body Pose Reconstruction | Antoine Maiorca et.al. | 2404.18628 | null |
| 2024-04-29 | Mesh-based Photorealistic and Real-time 3D Mapping for Robust Visual Perception of Autonomous Underwater Vehicle | Jungwoo Lee et.al. | 2404.18395 | null |
| 2024-04-29 | Reconstructing Satellites in 3D from Amateur Telescope Images | Zhiming Chang et.al. | 2404.18394 | null |
| 2024-04-27 | Hybrid 3D Human Pose Estimation with Monocular Video and Sparse IMUs | Yiming Bao et.al. | 2404.17837 | null |
| 2024-04-26 | Localization Through Particle Filter Powered Neural Network Estimated Monocular Camera Poses | Yi Shen et.al. | 2404.17685 | null |
Keypoint Detection
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-07-22 | RADA: Robust and Accurate Feature Learning with Domain Adaptation | Jingtai He et.al. | 2407.15791 | null |
| 2024-07-09 | LVLM-empowered Multi-modal Representation Learning for Visual Place Recognition | Teng Wang et.al. | 2407.06730 | null |
| 2024-07-04 | PFGS: High Fidelity Point Cloud Rendering via Feature Splatting | Jiaxu Wang et.al. | 2407.03857 | link |
| 2024-07-03 | A Radiometric Correction based Optical Modeling Approach to Removing Reflection Noise in TLS Point Clouds of Urban Scenes | Li Fang et.al. | 2407.02830 | link |
| 2024-07-02 | Multi-Grained Contrast for Data-Efficient Unsupervised Representation Learning | Chengchao Shen et.al. | 2407.02014 | link |
| 2024-06-28 | Beyond First-Order: A Multi-Scale Approach to Finger Knuckle Print Biometrics | Chengrui Gao et.al. | 2406.19672 | null |
| 2024-07-23 | A Certifiable Algorithm for Simultaneous Shape Estimation and Object Tracking | Lorenzo Shaikewitz et.al. | 2406.16837 | link |
| 2024-06-03 | Scale-Free Image Keypoints Using Differentiable Persistent Homology | Giovanni Barbarani et.al. | 2406.01315 | link |
| 2024-06-23 | W-Net: A Facial Feature-Guided Face Super-Resolution Network | Hao Liu et.al. | 2406.00676 | null |
| 2024-05-25 | Deep-PE: A Learning-Based Pose Evaluator for Point Cloud Registration | Junjie Gao et.al. | 2405.16085 | null |
| 2024-06-01 | Benchmarking Fish Dataset and Evaluation Metric in Keypoint Detection – Towards Precise Fish Morphological Assessment in Aquaculture Breeding | Weizhen Liu et.al. | 2405.12476 | link |
| 2024-05-14 | TP3M: Transformer-based Pseudo 3D Image Matching with Reference | Liming Han et.al. | 2405.08434 | null |
| 2024-05-15 | Vector-Symbolic Architecture for Event-Based Optical Flow | Hongzhi You et.al. | 2405.08300 | null |
| 2024-05-13 | RGBD-Glue: General Feature Combination for Robust RGB-D Point Cloud Registration | Congjia Chen et.al. | 2405.07594 | null |
| 2024-05-08 | Unsupervised Skin Feature Tracking with Deep Neural Networks | Jose Chang et.al. | 2405.04943 | null |
| 2024-05-07 | A Self-Supervised Method for Body Part Segmentation and Keypoint Detection of Rat Images | László Kopácsi et.al. | 2405.04650 | null |
| 2024-04-30 | A Light-weight Transformer-based Self-supervised Matching Network for Heterogeneous Images | Wang Zhang et.al. | 2404.19311 | null |
| 2024-04-25 | Adaptive Local Binary Pattern: A Novel Feature Descriptor for Enhanced Analysis of Kidney Abnormalities in CT Scan Images using ensemble based Machine Learning Approach | Tahmim Hossain et.al. | 2404.14560 | null |
| 2024-04-19 | SkelFormer: Markerless 3D Pose and Shape Estimation using Skeletal Transformers | Vandad Davoodnia et.al. | 2404.12625 | null |
| 2024-04-17 | Pixel-Wise Symbol Spotting via Progressive Points Location for Parsing CAD Images | Junbiao Pang et.al. | 2404.10985 | null |
| 2024-03-28 | Towards Long Term SLAM on Thermal Imagery | Colin Keil et.al. | 2403.19885 | link |
| 2024-03-28 | Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation | Xiao Lin et.al. | 2403.19527 | link |
| 2024-03-27 | RoboKeyGen: Robot Pose and Joint Angles Estimation via Diffusion-based 3D Keypoint Generation | Yang Tian et.al. | 2403.18259 | null |
| 2024-03-18 | FE-DeTr: Keypoint Detection and Tracking in Low-quality Image Frames with Events | Xiangyuan Wang et.al. | 2403.11662 | link |
2024-6
Keypoint Detection
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-08-15 | Towards Practical Human Motion Prediction with LiDAR Point Clouds | Xiao Han et.al. | 2408.08202 | null |
| 2024-07-31 | Certifying Robustness of Learning-Based Keypoint Detection and Pose Estimation Methods | Xusheng Luo et.al. | 2408.00117 | null |
| 2024-07-26 | SHIC: Shape-Image Correspondences with no Keypoint Supervision | Aleksandar Shtedritski et.al. | 2407.18907 | null |
| 2024-07-25 | LION: Linear Group RNN for 3D Object Detection in Point Clouds | Zhe Liu et.al. | 2407.18232 | link |
| 2024-07-22 | RADA: Robust and Accurate Feature Learning with Domain Adaptation | Jingtai He et.al. | 2407.15791 | null |
| 2024-07-09 | LVLM-empowered Multi-modal Representation Learning for Visual Place Recognition | Teng Wang et.al. | 2407.06730 | null |
| 2024-07-04 | PFGS: High Fidelity Point Cloud Rendering via Feature Splatting | Jiaxu Wang et.al. | 2407.03857 | link |
| 2024-07-03 | A Radiometric Correction based Optical Modeling Approach to Removing Reflection Noise in TLS Point Clouds of Urban Scenes | Li Fang et.al. | 2407.02830 | link |
| 2024-07-02 | Multi-Grained Contrast for Data-Efficient Unsupervised Representation Learning | Chengchao Shen et.al. | 2407.02014 | link |
| 2024-06-28 | Beyond First-Order: A Multi-Scale Approach to Finger Knuckle Print Biometrics | Chengrui Gao et.al. | 2406.19672 | null |
| 2024-07-23 | A Certifiable Algorithm for Simultaneous Shape Estimation and Object Tracking | Lorenzo Shaikewitz et.al. | 2406.16837 | link |
| 2024-06-03 | Scale-Free Image Keypoints Using Differentiable Persistent Homology | Giovanni Barbarani et.al. | 2406.01315 | link |
| 2024-06-23 | W-Net: A Facial Feature-Guided Face Super-Resolution Network | Hao Liu et.al. | 2406.00676 | null |
| 2024-05-25 | Deep-PE: A Learning-Based Pose Evaluator for Point Cloud Registration | Junjie Gao et.al. | 2405.16085 | null |
| 2024-06-01 | Benchmarking Fish Dataset and Evaluation Metric in Keypoint Detection – Towards Precise Fish Morphological Assessment in Aquaculture Breeding | Weizhen Liu et.al. | 2405.12476 | link |
| 2024-05-14 | TP3M: Transformer-based Pseudo 3D Image Matching with Reference | Liming Han et.al. | 2405.08434 | null |
| 2024-05-15 | Vector-Symbolic Architecture for Event-Based Optical Flow | Hongzhi You et.al. | 2405.08300 | null |
| 2024-05-13 | RGBD-Glue: General Feature Combination for Robust RGB-D Point Cloud Registration | Congjia Chen et.al. | 2405.07594 | null |
| 2024-05-08 | Unsupervised Skin Feature Tracking with Deep Neural Networks | Jose Chang et.al. | 2405.04943 | null |
| 2024-05-07 | A Self-Supervised Method for Body Part Segmentation and Keypoint Detection of Rat Images | László Kopácsi et.al. | 2405.04650 | null |
| 2024-04-30 | A Light-weight Transformer-based Self-supervised Matching Network for Heterogeneous Images | Wang Zhang et.al. | 2404.19311 | null |
| 2024-04-25 | Adaptive Local Binary Pattern: A Novel Feature Descriptor for Enhanced Analysis of Kidney Abnormalities in CT Scan Images using ensemble based Machine Learning Approach | Tahmim Hossain et.al. | 2404.14560 | null |
| 2024-04-19 | SkelFormer: Markerless 3D Pose and Shape Estimation using Skeletal Transformers | Vandad Davoodnia et.al. | 2404.12625 | null |
Pose Estimation
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-07-03 | Graph and Skipped Transformer: Exploiting Spatial and Temporal Modeling Capacities for Efficient 3D Human Pose Estimation | Mengmeng Cui et.al. | 2407.02990 | null |
| 2024-07-03 | Free-SurGS: SfM-Free 3D Gaussian Splatting for Surgical Scene Reconstruction | Jiaxin Guo et.al. | 2407.02918 | link |
| 2024-07-02 | SUPER: Seated Upper Body Pose Estimation using mmWave Radars | Bo Zhang et.al. | 2407.02455 | null |
| 2024-07-02 | ReliaAvatar: A Robust Real-Time Avatar Animator with Integrated Motion Prediction | Bo Qian et.al. | 2407.02129 | null |
| 2024-07-02 | Joint-Dataset Learning and Cross-Consistent Regularization for Text-to-Motion Retrieval | Nicola Messina et.al. | 2407.02104 | null |
| 2024-07-01 | Active Human Pose Estimation via an Autonomous UAV Agent | Jingxi Chen et.al. | 2407.01811 | null |
| 2024-07-01 | RoDyn-SLAM: Robust Dynamic Dense RGB-D SLAM with Neural Radiance Fields | Haochen Jiang et.al. | 2407.01303 | null |
| 2024-07-01 | Collaborative Graph Exploration with Reduced Pose-SLAM Uncertainty via Submodular Optimization | Ruofei Bai et.al. | 2407.01013 | null |
| 2024-06-30 | Ego-to-Exo: Interfacing Third Person Visuals from Egocentric Views in Real-time for Improved ROV Teleoperation | Adnan Abdullah et.al. | 2407.00848 | null |
| 2024-06-29 | When Robots Get Chatty: Grounding Multimodal Human-Robot Conversation and Collaboration | Philipp Allgeuer et.al. | 2407.00518 | null |
| 2024-06-28 | Assistive Image Annotation Systems with Deep Learning and Natural Language Capabilities: A Review | Moseli Mots’oehli et.al. | 2407.00252 | null |
| 2024-06-28 | EPOCH: Jointly Estimating the 3D Pose of Cameras and Humans | Nicola Garau et.al. | 2406.19726 | null |
| 2024-06-28 | CLOi-Mapper: Consistent, Lightweight, Robust, and Incremental Mapper With Embedded Systems for Commercial Robot Services | DongKi Noh et.al. | 2406.19634 | null |
| 2024-06-27 | Multimodal Visual-haptic pose estimation in the presence of transient occlusion | Michael Zechmair et.al. | 2406.19323 | null |
| 2024-06-27 | Human Modelling and Pose Estimation Overview | Pawel Knap et.al. | 2406.19290 | null |
| 2024-06-26 | Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference | Yuan Gao et.al. | 2406.18453 | link |
| 2024-06-27 | Automatic infant 2D pose estimation from videos: comparing seven deep neural network methods | Filipe Gama et.al. | 2406.17382 | null |
| 2024-06-24 | High-resolution open-vocabulary object 6D pose estimation | Jaime Corsetti et.al. | 2406.16384 | null |
| 2024-06-23 | Breaking the Frame: Image Retrieval by Visual Overlap Prediction | Tong Wei et.al. | 2406.16204 | link |
| 2024-06-21 | Efficient Human Pose Estimation: Leveraging Advanced Techniques with MediaPipe | Sandeep Singh Sengar et.al. | 2406.15649 | link |
| 2024-06-24 | Investigating the impact of 2D gesture representation on co-speech gesture generation | Teo Guichoux et.al. | 2406.15111 | null |
| 2024-06-20 | Benchmarking Monocular 3D Dog Pose Estimation Using In-The-Wild Motion Capture Data | Moira Shooter et.al. | 2406.14412 | null |
| 2024-06-20 | PoseBench: Benchmarking the Robustness of Pose Estimation Models under Corruptions | Sihan Ma et.al. | 2406.14367 | null |
| 2024-06-19 | NeRF-Feat: 6D Object Pose Estimation using Feature Rendering | Shishir Reddy Vutukur et.al. | 2406.13796 | null |
| 2024-06-19 | CNN Based Flank Predictor for Quadruped Animal Species | Vanessa Suessle et.al. | 2406.13588 | null |
| 2024-06-19 | MVSBoost: An Efficient Point Cloud-based 3D Reconstruction | Umair Haroon et.al. | 2406.13515 | null |
| 2024-06-19 | An Efficient yet High-Performance Method for Precise Radar-Based Imaging of Human Hand Poses | Johanna Bräunig et.al. | 2406.13464 | null |
| 2024-06-18 | Head Pose Estimation and 3D Neural Surface Reconstruction via Monocular Camera in situ for Navigation and Safe Insertion into Natural Openings | Ruijie Tang et.al. | 2406.13048 | null |
| 2024-06-17 | Matching Query Image Against Selected NeRF Feature for Efficient and Scalable Localization | Huaiji Zhou et.al. | 2406.11766 | null |
| 2024-06-17 | Domain Generalization for In-Orbit 6D Pose Estimation | Antoine Legrand et.al. | 2406.11743 | null |
| 2024-06-17 | SeamPose: Repurposing Seams as Capacitive Sensors in a Shirt for Upper-Body Pose Tracking | Tianhong Catherine Yu et.al. | 2406.11645 | null |
| 2024-06-14 | Galibr: Targetless LiDAR-Camera Extrinsic Calibration Method via Ground Plane Initialization | Wonho Song et.al. | 2406.11599 | null |
| 2024-06-15 | MMVR: Millimeter-wave Multi-View Radar Dataset and Benchmark for Indoor Perception | M. Mahbubur Rahman et.al. | 2406.10708 | null |
| 2024-06-15 | Improving Ab-Initio Cryo-EM Reconstruction with Semi-Amortized Pose Inference | Shayan Shekarforoush et.al. | 2406.10455 | null |
| 2024-06-14 | The BabyView dataset: High-resolution egocentric videos of infants’ and young children’s everyday experiences | Bria Long et.al. | 2406.10447 | null |
| 2024-06-14 | OpenCapBench: A Benchmark to Bridge Pose Estimation and Biomechanics | Yoni Gozlan et.al. | 2406.09788 | null |
| 2024-06-13 | ImageNet3D: Towards General-Purpose Object-Level 3D Understanding | Wufei Ma et.al. | 2406.09613 | link |
| 2024-06-13 | Deep Transformer Network for Monocular Pose Estimation of Ship-Based UAV | Maneesha Wickramasuriya et.al. | 2406.09260 | link |
| 2024-06-14 | Language-Driven Closed-Loop Grasping with Model-Predictive Trajectory Replanning | Huy Hoang Nguyen et.al. | 2406.09039 | null |
| 2024-06-14 | VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks | Jiannan Wu et.al. | 2406.08394 | link |
| 2024-06-12 | Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization | Jiaxin Deng et.al. | 2406.08001 | null |
| 2024-06-12 | IFTD: Image Feature Triangle Descriptor for Loop Detection in Driving Scenes | Fengtian Lang et.al. | 2406.07937 | link |
| 2024-06-12 | From Variance to Veracity: Unbundling and Mitigating Gradient Variance in Differentiable Bundle Adjustment Layers | Swaminathan Gurumurthy et.al. | 2406.07785 | link |
| 2024-06-12 | SPIN: Spacecraft Imagery for Navigation | Javier Montalvo et.al. | 2406.07500 | link |
| 2024-06-11 | Realistic Data Generation for 6D Pose Estimation of Surgical Instruments | Juan Antonio Barragan et.al. | 2406.07328 | link |
| 2024-06-11 | SignMusketeers: An Efficient Multi-Stream Approach for Sign Language Translation at Scale | Shester Gueuwou et.al. | 2406.06907 | null |
| 2024-06-10 | Multicam-SLAM: Non-overlapping Multi-camera SLAM for Indirect Visual Localization and Navigation | Shenghao Li et.al. | 2406.06374 | link |
| 2024-06-08 | A preprocessing-based planning framework for utilizing contacts in high-precision insertion tasks | Muhammad Suhail Saleem et.al. | 2406.05522 | null |
| 2024-06-06 | GLACE: Global Local Accelerated Coordinate Encoding | Fangjinhua Wang et.al. | 2406.04340 | link |
| 2024-06-06 | Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking | Jiyao Zhang et.al. | 2406.04316 | null |
| 2024-06-05 | Hi5: 2D Hand Pose Estimation with Zero Human Annotation | Masum Hasan et.al. | 2406.03599 | null |
| 2024-06-05 | Sparse Color-Code Net: Real-Time RGB-Based 6D Object Pose Estimation on Edge Devices | Xingjian Yang et.al. | 2406.02977 | null |
| 2024-06-04 | CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation | Dejia Xu et.al. | 2406.02509 | null |
| 2024-06-04 | HPE-CogVLM: New Head Pose Grounding Task Exploration on Vision Language Model | Yu Tian et.al. | 2406.01914 | null |
| 2024-06-03 | A Robust Filter for Marker-less Multi-person Tracking in Human-Robot Interaction Scenarios | Enrico Martini et.al. | 2406.01832 | link |
| 2024-06-01 | Equivariant amortized inference of poses for cryo-EM | Larissa de Ruijter et.al. | 2406.01630 | null |
| 2024-06-03 | 3D WholeBody Pose Estimation based on Semantic Graph Attention Network and Distance Information | Sihan Wen et.al. | 2406.01196 | null |
| 2024-06-01 | CapeX: Category-Agnostic Pose Estimation from Textual Point Explanation | Matan Rusanovsky et.al. | 2406.00384 | link |
| 2024-05-30 | Estimating Human Poses Across Datasets: A Unified Skeleton and Multi-Teacher Distillation Approach | Muhammad Saif Ullah Khan et.al. | 2405.20084 | null |
| 2024-05-30 | TAMBRIDGE: Bridging Frame-Centered Tracking and 3D Gaussian Splatting for Enhanced SLAM | Peifeng Jiang et.al. | 2405.19614 | null |
| 2024-05-29 | Real-Time Dynamic Robot-Assisted Hand-Object Interaction via Motion Primitives | Mingqi Yuan et.al. | 2405.19531 | null |
| 2024-05-29 | Exploring AI-based Anonymization of Industrial Image and Video Data in the Context of Feature Preservation | Sabrina Cynthia Triess et.al. | 2405.19173 | null |
Visual Localization
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-07-08 | Pseudo-triplet Guided Few-shot Composed Image Retrieval | Bohan Hou et.al. | 2407.06001 | null |
| 2024-07-09 | HyCIR: Boosting Zero-Shot Composed Image Retrieval with Synthetic Labels | Yingying Jiang et.al. | 2407.05795 | null |
| 2024-07-05 | Elevating All Zero-Shot Sketch-Based Image Retrieval Through Multimodal Prompt Learning | Mainak Singha et.al. | 2407.04207 | null |
| 2024-07-04 | Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models | Chang-Sheng Kao et.al. | 2407.03615 | link |
| 2024-07-03 | Celeb-FBI: A Benchmark Dataset on Human Full Body Images and Age, Gender, Height and Weight Estimation using Deep Learning Approach | Pronay Debnath et.al. | 2407.03486 | null |
| 2024-07-02 | Close, But Not There: Boosting Geographic Distance Sensitivity in Visual Place Recognition | Sergio Izquierdo et.al. | 2407.02422 | link |
| 2024-07-01 | Freeview Sketching: View-Aware Fine-Grained Sketch-Based Image Retrieval | Aneeshan Sain et.al. | 2407.01810 | null |
| 2024-07-01 | Cross-Modal Attention Alignment Network with Auxiliary Text Description for zero-shot sketch-based image retrieval | Hanwen Su et.al. | 2407.00979 | null |
| 2024-07-01 | Dynamically Modulating Visual Place Recognition Sequence Length For Minimum Acceptable Performance Scenarios | Connor Malone et.al. | 2407.00863 | null |
| 2024-06-27 | PathAlign: A vision-language model for whole slide images in histopathology | Faruk Ahmed et.al. | 2406.19578 | null |
| 2024-07-05 | 360 in the Wild: Dataset for Depth Prediction and View Synthesis | Kibaek Park et.al. | 2406.18898 | null |
| 2024-06-27 | Zero-shot Composed Image Retrieval Considering Query-target Relationship Leveraging Masked Image-text Pairs | Huaying Zhang et.al. | 2406.18836 | null |
| 2024-06-26 | WV-Net: A foundation model for SAR WV-mode satellite imagery trained using contrastive self-supervised learning on 10 million images | Yannik Glaser et.al. | 2406.18765 | null |
| 2024-06-26 | View-Invariant Pixelwise Anomaly Detection in Multi-object Scenes with Adaptive View Synthesis | Subin Varghese et.al. | 2406.18012 | null |
| 2024-06-25 | Tell Me Where You Are: Multimodal LLMs Meet Place Recognition | Zonglin Lyu et.al. | 2406.17520 | null |
| 2024-06-23 | Breaking the Frame: Image Retrieval by Visual Overlap Prediction | Tong Wei et.al. | 2406.16204 | link |
| 2024-06-19 | Towards a multimodal framework for remote sensing image change retrieval and captioning | Roger Ferrod et.al. | 2406.13424 | null |
| 2024-06-19 | CLIP-Branches: Interactive Fine-Tuning for Text-Image Retrieval | Christian Lülf et.al. | 2406.13322 | link |
| 2024-06-17 | Matching Query Image Against Selected NeRF Feature for Efficient and Scalable Localization | Huaiji Zhou et.al. | 2406.11766 | null |
| 2024-06-22 | Simple Yet Efficient: Towards Self-Supervised FG-SBIR with Unified Sample Feature Alignment | Jianan Jiang et.al. | 2406.11551 | link |
| 2024-06-17 | They’re All Doctors: Synthesizing Diverse Counterfactuals to Mitigate Associative Bias | Salma Abdel Magid et.al. | 2406.11331 | null |
| 2024-06-17 | Accurate and Fast Pixel Retrieval with Spatial and Uncertainty Aware Hypergraph Diffusion | Guoyuan An et.al. | 2406.11242 | null |
| 2024-06-14 | Annotation Cost-Efficient Active Learning for Deep Metric Learning Driven Remote Sensing Image Retrieval | Genc Hoxha et.al. | 2406.10107 | null |
| 2024-06-14 | BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval | Imanol Miranda et.al. | 2406.09952 | link |
| 2024-06-13 | Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases | Meng Wang et.al. | 2406.09317 | null |
| 2024-06-13 | Reducing Task Discrepancy of Text Encoders for Zero-Shot Composed Image Retrieval | Jaeseok Byun et.al. | 2406.09188 | null |
| 2024-06-13 | DenoiseReID: Denoising Model for Representation Learning of Person Re-Identification | Zhengrui Xu et.al. | 2406.08773 | null |
| 2024-06-12 | Self-supervised Learning of Neural Implicit Feature Fields for Camera Pose Refinement | Maxime Pietrantoni et.al. | 2406.08463 | null |
| 2024-06-12 | ConceptHash: Interpretable Fine-Grained Hashing via Concept Discovery | Kam Woh Ng et.al. | 2406.08457 | link |
| 2024-06-11 | Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions | Renjie Pi et.al. | 2406.07502 | link |
| 2024-06-11 | Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning | Shuvendu Roy et.al. | 2406.07450 | link |
| 2024-06-11 | Fetch-A-Set: A Large-Scale OCR-Free Benchmark for Historical Document Retrieval | Adrià Molina et.al. | 2406.07315 | null |
| 2024-06-10 | Multicam-SLAM: Non-overlapping Multi-camera SLAM for Indirect Visual Localization and Navigation | Shenghao Li et.al. | 2406.06374 | link |
| 2024-06-09 | Unified Text-to-Image Generation and Retrieval | Leigang Qu et.al. | 2406.05814 | null |
| 2024-06-07 | The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better | Scott Geng et.al. | 2406.05184 | link |
| 2024-06-07 | PQPP: A Joint Benchmark for Text-to-Image Prompt and Query Performance Prediction | Eduard Poesina et.al. | 2406.04746 | link |
| 2024-06-06 | GLACE: Global Local Accelerated Coordinate Encoding | Fangjinhua Wang et.al. | 2406.04340 | link |
| 2024-06-06 | Monocular Localization with Semantics Map for Autonomous Vehicles | Jixiang Wan et.al. | 2406.03835 | null |
| 2024-06-05 | Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach | Saehyung Lee et.al. | 2406.03411 | link |
| 2024-06-04 | MeshVPR: Citywide Visual Place Recognition Using 3D Meshes | Gabriele Berton et.al. | 2406.02776 | null |
| 2024-06-04 | Can CLIP help CLIP in learning 3D? | Cristian Sbrolli et.al. | 2406.02202 | null |
| 2024-06-03 | Decomposing and Interpreting Image Representations via Text in ViTs Beyond CLIP | Sriram Balasubramanian et.al. | 2406.01583 | null |
| 2024-06-03 | Scale-Free Image Keypoints Using Differentiable Persistent Homology | Giovanni Barbarani et.al. | 2406.01315 | link |
| 2024-06-02 | Visual place recognition for aerial imagery: A survey | Ivan Moskalenko et.al. | 2406.00885 | link |
| 2024-06-01 | NuRF: Nudging the Particle Filter in Radiance Fields for Robot Visual Localization | Wugang Meng et.al. | 2406.00312 | null |
| 2024-05-31 | DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models | Linli Yao et.al. | 2405.20985 | null |
| 2024-05-29 | Multi-Modal Generative Embedding Model | Feipeng Ma et.al. | 2405.19333 | null |
| 2024-05-29 | ContextBLIP: Doubly Contextual Alignment for Contrastive Image Retrieval from Linguistically Complex Descriptions | Honglin Lin et.al. | 2405.19226 | null |
| 2024-05-30 | CaLa: Complementary Association Learning for Augmenting Composed Image Retrieval | Xintong Jiang et.al. | 2405.19149 | null |
| 2024-05-29 | SketchTriplet: Self-Supervised Scenarized Sketch-Text-Image Triplet Generation | Zhenbei Wu et.al. | 2405.18801 | null |
2024-7
Pose Estimation
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-08-05 | Joint-Motion Mutual Learning for Pose Estimation in Videos | Sifan Wu et.al. | 2408.02285 | null |
| 2024-08-04 | AvatarPose: Avatar-guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos | Feichi Lu et.al. | 2408.02110 | null |
| 2024-08-04 | Generalized Maximum Likelihood Estimation for Perspective-n-Point Problem | Tian Zhan et.al. | 2408.01945 | null |
| 2024-08-03 | MotionTrace: IMU-based Field of View Prediction for Smartphone AR Interactions | Rahul Islam et.al. | 2408.01850 | null |
| 2024-08-03 | BEVPlace++: Fast, Robust, and Lightweight LiDAR Global Localization for Unmanned Ground Vehicles | Lun Luo et.al. | 2408.01841 | null |
| 2024-08-03 | E$^3$NeRF: Efficient Event-Enhanced Neural Radiance Fields from Blurry Images | Yunshan Qi et.al. | 2408.01840 | null |
| 2024-08-03 | Survey on Emotion Recognition through Posture Detection and the possibility of its application in Virtual Reality | Leina Elansary et.al. | 2408.01728 | null |
| 2024-08-03 | Stimulating Imagination: Towards General-purpose Object Rearrangement | Jianyang Wu et.al. | 2408.01655 | null |
| 2024-08-02 | Full-range Head Pose Geometric Data Augmentations | Huei-Chung Hu et.al. | 2408.01566 | null |
| 2024-07-31 | Adapting Skills to Novel Grasps: A Self-Supervised Approach | Georgios Papagiannis et.al. | 2408.00178 | null |
| 2024-07-31 | Certifying Robustness of Learning-Based Keypoint Detection and Pose Estimation Methods | Xusheng Luo et.al. | 2408.00117 | null |
| 2024-07-30 | HandDAGT: A Denoising Adaptive Graph Transformer for 3D Hand Pose Estimation | Wencan Cheng et.al. | 2407.20542 | link |
| 2024-07-30 | Markers Identification for Relative Pose Estimation of an Uncooperative Target | Batu Candan et.al. | 2407.20515 | null |
| 2024-07-29 | BaseBoostDepth: Exploiting Larger Baselines For Self-supervised Monocular Depth Estimation | Kieran Saunders et.al. | 2407.20437 | null |
| 2024-07-28 | Skeleton-based Group Activity Recognition via Spatial-Temporal Panoramic Graph | Zhengcen Li et.al. | 2407.19497 | null |
| 2024-07-26 | Flexible graph convolutional network for 3D human pose estimation | Abu Taib Mohammed Shahjahan et.al. | 2407.19077 | null |
| 2024-07-26 | From 2D to 3D: AISG-SLA Visual Localization Challenge | Jialin Gao et.al. | 2407.18590 | null |
| 2024-07-28 | HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation | Zhenzhi Wang et.al. | 2407.17438 | link |
| 2024-07-24 | Active Loop Closure for OSM-guided Robotic Mapping in Large-Scale Urban Environments | Wei Gao et.al. | 2407.17078 | null |
| 2024-07-30 | DreamCar: Leveraging Car-specific Prior for in-the-wild 3D Car Reconstruction | Xiaobiao Du et.al. | 2407.16988 | link |
| 2024-07-24 | Pose Estimation from Camera Images for Underwater Inspection | Luyuan Peng et.al. | 2407.16961 | null |
| 2024-07-23 | COALA: A Practical and Vision-Centric Federated Learning Platform | Weiming Zhuang et.al. | 2407.16560 | link |
| 2024-07-23 | Probabilistic Parameter Estimators and Calibration Metrics for Pose Estimation from Image Features | Romeo Valentin et.al. | 2407.16223 | null |
| 2024-07-23 | Optimal camera-robot pose estimation in linear time from points and lines | Guangyang Zeng et.al. | 2407.16151 | null |
| 2024-07-23 | 3D-UGCN: A Unified Graph Convolutional Network for Robust 3D Human Pose Estimation from Monocular RGB Images | Jie Zhao et.al. | 2407.16137 | null |
| 2024-07-21 | CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models | Zheng Chong et.al. | 2407.15886 | link |
| 2024-07-22 | RADA: Robust and Accurate Feature Learning with Domain Adaptation | Jingtai He et.al. | 2407.15791 | null |
| 2024-07-22 | Local Occupancy-Enhanced Object Grasping with Multiple Triplanar Projection | Kangqi Ma et.al. | 2407.15771 | null |
| 2024-07-22 | 6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model | Matteo Bortolon et.al. | 2407.15484 | null |
| 2024-07-23 | Domain-Adaptive 2D Human Pose Estimation via Dual Teachers in Extremely Low-Light Conditions | Yihao Ai et.al. | 2407.15451 | null |
| 2024-07-22 | avaTTAR: Table Tennis Stroke Training with On-body and Detached Visualization in Augmented Reality | Dizhi Ma et.al. | 2407.15373 | null |
| 2024-07-20 | From Underground Mines to Offices: A Versatile and Robust Framework for Range-Inertial SLAM | Lorenzo Montano-Oliván et.al. | 2407.14797 | null |
| 2024-07-19 | ESCAPE: Energy-based Selective Adaptive Correction for Out-of-distribution 3D Human Pose Estimation | Luke Bidulka et.al. | 2407.14605 | null |
| 2024-07-19 | 6DoF Head Pose Estimation through Explicit Bidirectional Interaction with Face Geometry | Sungho Chun et.al. | 2407.14136 | link |
| 2024-07-18 | RT-Pose: A 4D Radar Tensor-based 3D Human Pose Estimation and Localization Benchmark | Yuan-Hao Ho et.al. | 2407.13930 | null |
| 2024-07-19 | GlobalPointer: Large-Scale Plane Adjustment with Bi-Convex Relaxation | Bangyan Liao et.al. | 2407.13537 | null |
| 2024-07-18 | SCAPE: A Simple and Strong Category-Agnostic Pose Estimator | Yujia Liang et.al. | 2407.13483 | link |
| 2024-07-17 | SG-NeRF: Neural Surface Reconstruction with Scene Graph Optimization | Yiyang Chen et.al. | 2407.12667 | link |
| 2024-07-17 | Invertible Neural Warp for NeRF | Shin-Fang Chng et.al. | 2407.12354 | null |
| 2024-07-16 | NeuSurfEmb: A Complete Pipeline for Dense Correspondence-based 6D Object Pose Estimation without CAD Models | Francesco Milano et.al. | 2407.12207 | link |
| 2024-07-16 | Monocular pose estimation of articulated surgical instruments in open surgery | Robert Spektor et.al. | 2407.12138 | null |
| 2024-07-17 | GV-Bench: Benchmarking Local Feature Matching for Geometric Verification of Long-term Loop Closure Detection | Jingwen Yu et.al. | 2407.11736 | link |
| 2024-07-16 | TCFormer: Visual Recognition via Token Clustering Transformer | Wang Zeng et.al. | 2407.11321 | link |
| 2024-07-15 | A BlueROV2-based platform for underwater mapping experiments | Tudor Alinei-Poiana et.al. | 2407.10901 | null |
| 2024-07-15 | LVCP: LiDAR-Vision Tightly Coupled Collaborative Real-time Relative Positioning | Zhuozhu Jian et.al. | 2407.10782 | null |
| 2024-07-15 | Domain Generalization for 6D Pose Estimation Through NeRF-based Image Synthesis | Antoine Legrand et.al. | 2407.10762 | null |
| 2024-07-16 | GTPT: Group-based Token Pruning Transformer for Efficient Human Pose Estimation | Haonan Wang et.al. | 2407.10756 | null |
| 2024-07-15 | Learning to Estimate the Pose of a Peer Robot in a Camera Image by Predicting the States of its LEDs | Nicholas Carlotti et.al. | 2407.10661 | null |
| 2024-07-15 | Deep-Learning-Based Markerless Pose Estimation Systems in Gait Analysis: DeepLabCut Custom Training and the Refinement Function | Giulia Panconi et.al. | 2407.10590 | null |
| 2024-07-14 | 3D Foundation Models Enable Simultaneous Geometry and Pose Estimation of Grasped Objects | Weiming Zhi et.al. | 2407.10331 | null |
| 2024-07-16 | psifx – Psychological and Social Interactions Feature Extraction Package | Guillaume Rochette et.al. | 2407.10266 | null |
| 2024-07-14 | PAFUSE: Part-based Diffusion for 3D Whole-Body Pose Estimation | Nermin Samet et.al. | 2407.10220 | null |
| 2024-07-14 | 3DEgo: 3D Editing on the Go! | Umar Khalid et.al. | 2407.10102 | null |
| 2024-07-12 | iNeMo: Incremental Neural Mesh Models for Robust Class-Incremental Learning | Tom Fischer et.al. | 2407.09271 | null |
| 2024-07-12 | HUP-3D: A 3D multi-view synthetic dataset for assisted-egocentric hand-ultrasound pose estimation | Manuel Birlo et.al. | 2407.09215 | null |
| 2024-07-12 | KGpose: Keypoint-Graph Driven End-to-End Multi-Object 6D Pose Estimation via Point-Wise Pose Voting | Andrew Jeong et.al. | 2407.08909 | null |
| 2024-07-11 | RTMW: Real-Time Multi-Person 2D and 3D Whole-body Pose Estimation | Tao Jiang et.al. | 2407.08634 | link |
| 2024-07-11 | SRPose: Two-view Relative Pose Estimation with Sparse Keypoints | Rui Yin et.al. | 2407.08199 | link |
| 2024-07-11 | SGLC: Semantic Graph-Guided Coarse-Fine-Refine Full Loop Closing for LiDAR SLAM | Neng Wang et.al. | 2407.08106 | null |
| 2024-07-10 | RoCap: A Robotic Data Collection Pipeline for the Pose Estimation of Appearance-Changing Objects | Jiahao Nick Li et.al. | 2407.08081 | null |
| 2024-07-10 | Hybrid Structure-from-Motion and Camera Relocalization for Enhanced Egocentric Localization | Jinjie Mai et.al. | 2407.08023 | link |
| 2024-07-10 | Greit-HRNet: Grouped Lightweight High-Resolution Network for Human Pose Estimation | Junjia Han et.al. | 2407.07389 | null |
| 2024-07-09 | Category-level Object Detection, Pose Estimation and Reconstruction from Stereo Images | Chuanrui Zhang et.al. | 2407.06984 | null |
| 2024-07-09 | Computer vision tasks for intelligent aerospace missions: An overview | Huilin Chen et.al. | 2407.06513 | null |
| 2024-07-08 | GeoNLF: Geometry guided Pose-Free Neural LiDAR Fields | Weiyi Xue et.al. | 2407.05597 | null |
| 2024-07-10 | On the power of data augmentation for head pose estimation | Michael Welter et.al. | 2407.05357 | null |
| 2024-07-07 | SCIPaD: Incorporating Spatial Clues into Unsupervised Pose-Depth Joint Learning | Yi Feng et.al. | 2407.05283 | link |
| 2024-07-05 | Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos | Leonhard Sommer et.al. | 2407.04384 | link |
| 2024-07-04 | Towards Cross-View-Consistent Self-Supervised Surround Depth Estimation | Laiyan Ding et.al. | 2407.04041 | null |
| 2024-07-04 | Markerless Multi-view 3D Human Pose Estimation: a survey | Ana Filipa Rodrigues Nogueira et.al. | 2407.03817 | null |
| 2024-07-04 | A Fast Dynamic Point Detection Method for LiDAR-Inertial Odometry in Driving Scenarios | Zikang Yuan et.al. | 2407.03590 | null |
| 2024-07-03 | Graph and Skipped Transformer: Exploiting Spatial and Temporal Modeling Capacities for Efficient 3D Human Pose Estimation | Mengmeng Cui et.al. | 2407.02990 | null |
| 2024-07-03 | Free-SurGS: SfM-Free 3D Gaussian Splatting for Surgical Scene Reconstruction | Jiaxin Guo et.al. | 2407.02918 | link |
| 2024-07-02 | SUPER: Seated Upper Body Pose Estimation using mmWave Radars | Bo Zhang et.al. | 2407.02455 | null |
| 2024-07-02 | ReliaAvatar: A Robust Real-Time Avatar Animator with Integrated Motion Prediction | Bo Qian et.al. | 2407.02129 | null |
| 2024-07-02 | Joint-Dataset Learning and Cross-Consistent Regularization for Text-to-Motion Retrieval | Nicola Messina et.al. | 2407.02104 | null |
| 2024-07-01 | Active Human Pose Estimation via an Autonomous UAV Agent | Jingxi Chen et.al. | 2407.01811 | null |
| 2024-07-01 | RoDyn-SLAM: Robust Dynamic Dense RGB-D SLAM with Neural Radiance Fields | Haochen Jiang et.al. | 2407.01303 | null |
| 2024-07-01 | Collaborative Graph Exploration with Reduced Pose-SLAM Uncertainty via Submodular Optimization | Ruofei Bai et.al. | 2407.01013 | null |
| 2024-06-30 | Ego-to-Exo: Interfacing Third Person Visuals from Egocentric Views in Real-time for Improved ROV Teleoperation | Adnan Abdullah et.al. | 2407.00848 | null |
| 2024-06-29 | When Robots Get Chatty: Grounding Multimodal Human-Robot Conversation and Collaboration | Philipp Allgeuer et.al. | 2407.00518 | null |
| 2024-06-28 | Assistive Image Annotation Systems with Deep Learning and Natural Language Capabilities: A Review | Moseli Mots’oehli et.al. | 2407.00252 | null |
| 2024-06-28 | EPOCH: Jointly Estimating the 3D Pose of Cameras and Humans | Nicola Garau et.al. | 2406.19726 | null |
Visual Localization
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-08-19 | Fashion Image-to-Image Translation for Complementary Item Retrieval | Matteo Attimonelli et.al. | 2408.09847 | null |
| 2024-08-20 | MambaLoc: Efficient Camera Localisation via State Space Model | Jialu Wang et.al. | 2408.09680 | null |
| 2024-08-15 | DM2RM: Dual-Mode Multimodal Ranking for Target Objects and Receptacles Based on Open-Vocabulary Instructions | Ryosuke Korekata et.al. | 2408.07910 | null |
| 2024-08-13 | A Miniature Vision-Based Localization System for Indoor Blimps | Shicong Ma et.al. | 2408.06648 | null |
| 2024-08-10 | Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network | Junyan Ye et.al. | 2408.05475 | link |
| 2024-08-09 | Spherical World-Locking for Audio-Visual Localization in Egocentric Videos | Heeseung Yun et.al. | 2408.05364 | null |
| 2024-08-06 | AMES: Asymmetric and Memory-Efficient Similarity Estimation for Instance-level Retrieval | Pavel Suma et.al. | 2408.03282 | null |
| 2024-08-05 | CMR-Agent: Learning a Cross-Modal Agent for Iterative Image-to-Point Cloud Registration | Gongxin Yao et.al. | 2408.02394 | null |
| 2024-08-02 | On Validation of Search & Retrieval of Tissue Images in Digital Pathology | H. R. Tizhoosh et.al. | 2408.01570 | null |
| 2024-07-31 | VIPeR: Visual Incremental Place Recognition with Adaptive Mining and Lifelong Learning | Yuhang Ming et.al. | 2407.21416 | null |
| 2024-07-30 | Re-localization acceleration with Medoid Silhouette Clustering | Hongyi Zhang et.al. | 2407.20749 | null |
| 2024-07-26 | From 2D to 3D: AISG-SLA Visual Localization Challenge | Jialin Gao et.al. | 2407.18590 | null |
| 2024-07-24 | Revolutionizing Text-to-Image Retrieval as Autoregressive Token-to-Voken Generation | Yongqi Li et.al. | 2407.17274 | null |
| 2024-07-24 | Pose Estimation from Camera Images for Underwater Inspection | Luyuan Peng et.al. | 2407.16961 | null |
| 2024-07-22 | RADA: Robust and Accurate Feature Learning with Domain Adaptation | Jingtai He et.al. | 2407.15791 | null |
| 2024-07-19 | Double-Layer Soft Data Fusion for Indoor Robot WiFi-Visual Localization | Yuehua Ding et.al. | 2407.14643 | null |
| 2024-07-18 | Visual Haystacks: Answering Harder Questions About Sets of Images | Tsung-Han Wu et.al. | 2407.13766 | link |
| 2024-07-17 | Towards Revisiting Visual Place Recognition for Joining Submaps in Multimap SLAM | Markus Weißflog et.al. | 2407.12408 | null |
| 2024-07-17 | GV-Bench: Benchmarking Local Feature Matching for Geometric Verification of Long-term Loop Closure Detection | Jingwen Yu et.al. | 2407.11736 | link |
| 2024-07-16 | EndoFinder: Online Image Retrieval for Explainable Colorectal Polyp Diagnosis | Ruijie Yang et.al. | 2407.11401 | null |
| 2024-07-15 | No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations | Walter Simoncini et.al. | 2407.10964 | link |
| 2024-07-15 | DINO Pre-training for Vision-based End-to-end Autonomous Driving | Shubham Juneja et.al. | 2407.10803 | null |
| 2024-07-15 | Addressing Image Hallucination in Text-to-Image Generation through Factual Image Retrieval | Youngsun Lim et.al. | 2407.10683 | null |
| 2024-07-15 | An evaluation of CNN models and data augmentation techniques in hierarchical localization of mobile robots | J. J. Cabrera et.al. | 2407.10596 | link |
| 2024-07-15 | An experimental evaluation of Siamese Neural Networks for robot localization using omnidirectional imaging in indoor environments | J. J. Cabrera et.al. | 2407.10536 | null |
| 2024-07-12 | Are They the Same Picture? Adapting Concept Bottleneck Models for Human-AI Collaboration in Image Retrieval | Vaibhav Balloli et.al. | 2407.08908 | link |
| 2024-07-11 | Improving Visual Place Recognition Based Robot Navigation Through Verification of Localization Estimates | Owen Claxton et.al. | 2407.08162 | link |
| 2024-07-12 | Lifelong Histopathology Whole Slide Image Retrieval via Distance Consistency Rehearsal | Xinyu Zhu et.al. | 2407.08153 | null |
| 2024-07-09 | LVLM-empowered Multi-modal Representation Learning for Visual Place Recognition | Teng Wang et.al. | 2407.06730 | null |
| 2024-07-09 | CEIA: CLIP-Based Event-Image Alignment for Open-World Event-Based Understanding | Wenhao Xu et.al. | 2407.06611 | null |
| 2024-07-08 | Pseudo-triplet Guided Few-shot Composed Image Retrieval | Bohan Hou et.al. | 2407.06001 | null |
| 2024-07-09 | HyCIR: Boosting Zero-Shot Composed Image Retrieval with Synthetic Labels | Yingying Jiang et.al. | 2407.05795 | null |
| 2024-07-05 | Elevating All Zero-Shot Sketch-Based Image Retrieval Through Multimodal Prompt Learning | Mainak Singha et.al. | 2407.04207 | link |
| 2024-07-04 | Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models | Chang-Sheng Kao et.al. | 2407.03615 | link |
| 2024-07-03 | Celeb-FBI: A Benchmark Dataset on Human Full Body Images and Age, Gender, Height and Weight Estimation using Deep Learning Approach | Pronay Debnath et.al. | 2407.03486 | null |
| 2024-07-02 | Close, But Not There: Boosting Geographic Distance Sensitivity in Visual Place Recognition | Sergio Izquierdo et.al. | 2407.02422 | link |
| 2024-07-01 | Freeview Sketching: View-Aware Fine-Grained Sketch-Based Image Retrieval | Aneeshan Sain et.al. | 2407.01810 | null |
| 2024-07-01 | Cross-Modal Attention Alignment Network with Auxiliary Text Description for zero-shot sketch-based image retrieval | Hanwen Su et.al. | 2407.00979 | null |
| 2024-07-01 | Dynamically Modulating Visual Place Recognition Sequence Length For Minimum Acceptable Performance Scenarios | Connor Malone et.al. | 2407.00863 | null |
| 2024-06-27 | PathAlign: A vision-language model for whole slide images in histopathology | Faruk Ahmed et.al. | 2406.19578 | null |
| 2024-07-05 | 360 in the Wild: Dataset for Depth Prediction and View Synthesis | Kibaek Park et.al. | 2406.18898 | null |
| 2024-06-27 | Zero-shot Composed Image Retrieval Considering Query-target Relationship Leveraging Masked Image-text Pairs | Huaying Zhang et.al. | 2406.18836 | null |
| 2024-06-26 | WV-Net: A foundation model for SAR WV-mode satellite imagery trained using contrastive self-supervised learning on 10 million images | Yannik Glaser et.al. | 2406.18765 | null |
| 2024-06-26 | View-Invariant Pixelwise Anomaly Detection in Multi-object Scenes with Adaptive View Synthesis | Subin Varghese et.al. | 2406.18012 | null |
| 2024-06-25 | Tell Me Where You Are: Multimodal LLMs Meet Place Recognition | Zonglin Lyu et.al. | 2406.17520 | null |
Keypoint Detection
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-10-03 | Key-Grid: Unsupervised 3D Keypoints Detection using Grid Heatmap Features | Chengkai Hou et.al. | 2410.02237 | null |
| 2024-10-02 | Gaussian-Det: Learning Closed-Surface Gaussians for 3D Object Detection | Hongru Yan et.al. | 2410.01404 | null |
| 2024-09-30 | OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection | Changsheng Lu et.al. | 2409.19899 | null |
| 2024-10-07 | SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation | Xin Li et.al. | 2409.18082 | null |
| 2024-09-24 | GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization | Gennady Sidorov et.al. | 2409.16502 | link |
| 2024-09-20 | Keypoint Detection Technique for Image-Based Visual Servoing of Manipulators | Niloufar Amiri et.al. | 2409.13668 | null |
| 2024-09-25 | Precision Aquaculture: An Integrated Computer Vision and IoT Approach for Optimized Tilapia Feeding | Rania Hossam et.al. | 2409.08695 | link |
| 2024-09-06 | D4: Text-guided diffusion model-based domain adaptive data augmentation for vineyard shoot detection | Kentaro Hirahara et.al. | 2409.04060 | null |
| 2024-10-01 | Towards Practical Human Motion Prediction with LiDAR Point Clouds | Xiao Han et.al. | 2408.08202 | null |
| 2024-07-31 | Certifying Robustness of Learning-Based Keypoint Detection and Pose Estimation Methods | Xusheng Luo et.al. | 2408.00117 | null |
| 2024-07-26 | SHIC: Shape-Image Correspondences with no Keypoint Supervision | Aleksandar Shtedritski et.al. | 2407.18907 | null |
| 2024-07-25 | LION: Linear Group RNN for 3D Object Detection in Point Clouds | Zhe Liu et.al. | 2407.18232 | link |
| 2024-07-22 | RADA: Robust and Accurate Feature Learning with Domain Adaptation | Jingtai He et.al. | 2407.15791 | null |
| 2024-07-09 | LVLM-empowered Multi-modal Representation Learning for Visual Place Recognition | Teng Wang et.al. | 2407.06730 | null |
| 2024-07-04 | PFGS: High Fidelity Point Cloud Rendering via Feature Splatting | Jiaxu Wang et.al. | 2407.03857 | link |
| 2024-07-03 | A Radiometric Correction based Optical Modeling Approach to Removing Reflection Noise in TLS Point Clouds of Urban Scenes | Li Fang et.al. | 2407.02830 | link |
| 2024-07-02 | Multi-Grained Contrast for Data-Efficient Unsupervised Representation Learning | Chengchao Shen et.al. | 2407.02014 | link |
| 2024-06-28 | Beyond First-Order: A Multi-Scale Approach to Finger Knuckle Print Biometrics | Chengrui Gao et.al. | 2406.19672 | null |
| 2024-07-23 | A Certifiable Algorithm for Simultaneous Shape Estimation and Object Tracking | Lorenzo Shaikewitz et.al. | 2406.16837 | link |
| 2024-06-03 | Scale-Free Image Keypoints Using Differentiable Persistent Homology | Giovanni Barbarani et.al. | 2406.01315 | link |
| 2024-06-23 | W-Net: A Facial Feature-Guided Face Super-Resolution Network | Hao Liu et.al. | 2406.00676 | null |
| 2024-05-25 | Deep-PE: A Learning-Based Pose Evaluator for Point Cloud Registration | Junjie Gao et.al. | 2405.16085 | null |
| 2024-06-01 | Benchmarking Fish Dataset and Evaluation Metric in Keypoint Detection – Towards Precise Fish Morphological Assessment in Aquaculture Breeding | Weizhen Liu et.al. | 2405.12476 | link |
| 2024-05-14 | TP3M: Transformer-based Pseudo 3D Image Matching with Reference | Liming Han et.al. | 2405.08434 | null |
| 2024-05-15 | Vector-Symbolic Architecture for Event-Based Optical Flow | Hongzhi You et.al. | 2405.08300 | null |
| 2024-05-13 | RGBD-Glue: General Feature Combination for Robust RGB-D Point Cloud Registration | Congjia Chen et.al. | 2405.07594 | null |
2024-8
Pose Estimation
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-09-01 | Recoverable Anonymization for Pose Estimation: A Privacy-Enhancing Approach | Wenjun Huang et.al. | 2409.02715 | null |
| 2024-09-04 | Object Gaussian for Monocular 6D Pose Estimation from Sparse Views | Luqing Luo et.al. | 2409.02581 | null |
| 2024-09-03 | EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision | Yiming Zhao et.al. | 2409.02224 | null |
| 2024-09-03 | Deep learning for objective estimation of Parkinsonian tremor severity | Felipe Duque-Quiceno et.al. | 2409.02011 | null |
| 2024-09-03 | SPiKE: 3D Human Pose from Point Cloud Sequences | Irene Ballester et.al. | 2409.01879 | link |
| 2024-09-02 | Kalman Filtering for Precise Indoor Position and Orientation Estimation Using IMU and Acoustics on Riemannian Manifolds | Mohammed H. AlSharif et.al. | 2409.01002 | null |
| 2024-09-01 | Detection, Recognition and Pose Estimation of Tabletop Objects | Sanjuksha Nirgude et.al. | 2409.00869 | null |
| 2024-09-01 | DSLO: Deep Sequence LiDAR Odometry Based on Inconsistent Spatio-temporal Propagation | Huixin Zhang et.al. | 2409.00744 | link |
| 2024-09-01 | MoManifold: Learning to Measure 3D Human Motion via Decoupled Joint Acceleration Manifolds | Ziqiang Dang et.al. | 2409.00736 | null |
| 2024-08-31 | ActionPose: Pretraining 3D Human Pose Estimation with the Dark Knowledge of Action | Longyun Liao et.al. | 2409.00449 | null |
| 2024-09-04 | Augmented Reality without Borders: Achieving Precise Localization Without Maps | Albert Gassol Puigjaner et.al. | 2408.17373 | null |
| 2024-08-30 | BOP-D: Revisiting 6D Pose Estimation Benchmark for Better Evaluation under Visual Ambiguities | Boris Meden et.al. | 2408.17297 | null |
| 2024-08-30 | EMHI: A Multimodal Egocentric Human Motion Dataset with HMD and Body-Worn IMUs | Zhen Fan et.al. | 2408.17168 | null |
| 2024-09-01 | Generic Objects as Pose Probes for Few-Shot View Synthesis | Zhirui Gao et.al. | 2408.16690 | null |
| 2024-08-29 | OP-Align: Object-level and Part-level Alignment for Self-supervised Category-level Articulated Object Pose Estimation | Yuchen Che et.al. | 2408.16547 | link |
| 2024-08-29 | GRPose: Learning Graph Relations for Human Image Generation with Pose Priors | Xiangchen Yin et.al. | 2408.16540 | null |
| 2024-08-28 | Are Pose Estimators Ready for the Open World? STAGE: Synthetic Data Generation Toolkit for Auditing 3D Human Pose Estimators | Nikita Kister et.al. | 2408.16536 | null |
| 2024-08-28 | Multi-view Pose Fusion for Occlusion-Aware 3D Human Pose Estimation | Laura Bragagnolo et.al. | 2408.15810 | link |
| 2024-08-30 | Addressing the challenges of loop detection in agricultural environments | Nicolás Soncini et.al. | 2408.15761 | link |
| 2024-08-28 | Str-L Pose: Integrating Point and Structured Line for Relative Pose Estimation in Dual-Graph | Zherong Zhang et.al. | 2408.15750 | null |
| 2024-08-28 | Benchmarking ML Approaches to UWB-Based Range-Only Posture Recognition for Human Robot-Interaction | Salma Salimi et.al. | 2408.15717 | null |
| 2024-08-26 | Bengali Sign Language Recognition through Hand Pose Estimation using Multi-Branch Spatial-Temporal Attention Model | Abu Saleh Musa Miah et.al. | 2408.14111 | null |
| 2024-08-25 | InterTrack: Tracking Human Object Interaction without Object Templates | Xianghui Xie et.al. | 2408.13953 | null |
| 2024-08-24 | Temporally-consistent 3D Reconstruction of Birds | Johannes Hägerlind et.al. | 2408.13629 | null |
| 2024-08-24 | Explainable Convolutional Networks for Crater Detection and Lunar Landing Navigation | Jianing Song et.al. | 2408.13587 | null |
| 2024-08-27 | Sapiens: Foundation for Human Vision Models | Rawal Khirodkar et.al. | 2408.12569 | null |
| 2024-08-20 | GSLoc: Efficient Camera Pose Refinement via 3D Gaussian Splatting | Changkun Liu et.al. | 2408.11085 | null |
| 2024-08-20 | ZebraPose: Zebra Detection and Pose Estimation using only Synthetic Data | Elia Bonetto et.al. | 2408.10831 | null |
| 2024-08-20 | MPL: Lifting 3D Human Pose from Multi-view 2D Poses | Seyed Abolfazl Ghasemzadeh et.al. | 2408.10805 | link |
| 2024-08-19 | RUMI: Rummaging Using Mutual Information | Sheng Zhong et.al. | 2408.10450 | null |
| 2024-08-19 | SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views | Chao Xu et.al. | 2408.10195 | null |
| 2024-08-19 | SHARP: Segmentation of Hands and Arms by Range using Pseudo-Depth for Enhanced Egocentric 3D Hand Pose Estimation and Action Recognition | Wiktor Mucha et.al. | 2408.10037 | link |
| 2024-08-19 | Pose-GuideNet: Automatic Scanning Guidance for Fetal Head Ultrasound from Pose Estimation | Qianhui Men et.al. | 2408.09931 | null |
| 2024-08-18 | OPPH: A Vision-Based Operator for Measuring Body Movements for Personal Healthcare | Chen Long-fei et.al. | 2408.09409 | null |
| 2024-08-17 | An Open-Source American Sign Language Fingerspell Recognition and Semantic Pose Retrieval Interface | Kevin Jose Thomas et.al. | 2408.09311 | link |
| 2024-08-16 | ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation | Hao Tang et.al. | 2408.09042 | null |
| 2024-08-16 | Correspondence-Guided SfM-Free 3D Gaussian Splatting for NVS | Wei Sun et.al. | 2408.08723 | null |
| 2024-08-16 | SketchRef: A Benchmark Dataset and Evaluation Metrics for Automated Sketch Synthesis | Xingyue Lin et.al. | 2408.08623 | null |
| 2024-08-15 | HyperTaxel: Hyper-Resolution for Taxel-Based Tactile Signals Through Contrastive Learning | Hongyu Li et.al. | 2408.08312 | null |
| 2024-08-15 | Comparative Evaluation of 3D Reconstruction Methods for Object Pose Estimation | Varun Burde et.al. | 2408.08234 | link |
| 2024-08-15 | Towards Practical Human Motion Prediction with LiDAR Point Clouds | Xiao Han et.al. | 2408.08202 | null |
| 2024-08-15 | Your Turn: Real-World Turning Angle Estimation for Parkinson’s Disease Severity Assessment | Qiushuo Cheng et.al. | 2408.08182 | null |
| 2024-08-15 | Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models | Tianyu Wang et.al. | 2408.07975 | null |
| 2024-08-15 | GOReloc: Graph-based Object-Level Relocalization for Visual SLAM | Yutong Wang et.al. | 2408.07917 | link |
| 2024-08-13 | A Miniature Vision-Based Localization System for Indoor Blimps | Shicong Ma et.al. | 2408.06648 | null |
| 2024-08-12 | UniT: Unified Tactile Representation for Robot Learning | Zhengtong Xu et.al. | 2408.06481 | link |
| 2024-08-12 | Moo-ving Beyond Tradition: Revolutionizing Cattle Behavioural Phenotyping with Pose Estimation Techniques | Navid Ghassemi et.al. | 2408.06336 | null |
| 2024-08-12 | CAD-Mesher: A Convenient, Accurate, Dense Mesh-based Mapping Module in SLAM for Dynamic Environments | Yanpeng Jia et.al. | 2408.05981 | null |
| 2024-08-12 | PAFormer: Part Aware Transformer for Person Re-identification | Hyeono Jung et.al. | 2408.05918 | null |
| 2024-08-11 | SABER-6D: Shape Representation Based Implicit Object Pose Estimation | Shishir Reddy Vutukur et.al. | 2408.05867 | null |
| 2024-08-10 | Visual SLAM with 3D Gaussian Primitives and Depth Priors Enabling Novel View Synthesis | Zhongche Qu et.al. | 2408.05635 | null |
| 2024-08-10 | Anticipation through Head Pose Estimation: a preliminary study | Federico Figari Tomenotti et.al. | 2408.05516 | null |
| 2024-08-09 | Mesh-based Object Tracking for Dynamic Semantic 3D Scene Graphs via Ray Tracing | Lennart Niecksch et.al. | 2408.04979 | null |
| 2024-08-07 | PoseMamba: Monocular 3D Human Pose Estimation with Bidirectional Global-Local Spatio-Temporal State Space Model | Yunlong Huang et.al. | 2408.03540 | null |
| 2024-08-06 | Line-based 6-DoF Object Pose Estimation and Tracking With an Event Camera | Zibin Liu et.al. | 2408.03225 | link |
| 2024-08-06 | Training on the Fly: On-device Self-supervised Learning aboard Nano-drones within 20 mW | Elia Cereda et.al. | 2408.03168 | null |
| 2024-08-06 | BodySLAM: A Generalized Monocular Visual SLAM Framework for Surgical Applications | G. Manni et.al. | 2408.03078 | link |
| 2024-08-07 | Pose Magic: Efficient and Temporally Consistent Human Pose Estimation with a Hybrid Mamba-GCN Network | Xinyi Zhang et.al. | 2408.02922 | null |
| 2024-08-05 | Analyzing Data Efficiency and Performance of Machine Learning Algorithms for Assessing Low Back Pain Physical Rehabilitation Exercises | Aleksa Marusic et.al. | 2408.02855 | null |
| 2024-08-05 | Joint-Motion Mutual Learning for Pose Estimation in Videos | Sifan Wu et.al. | 2408.02285 | null |
| 2024-08-04 | AvatarPose: Avatar-guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos | Feichi Lu et.al. | 2408.02110 | null |
| 2024-08-04 | Generalized Maximum Likelihood Estimation for Perspective-n-Point Problem | Tian Zhan et.al. | 2408.01945 | null |
| 2024-08-03 | MotionTrace: IMU-based Field of View Prediction for Smartphone AR Interactions | Rahul Islam et.al. | 2408.01850 | null |
| 2024-08-03 | BEVPlace++: Fast, Robust, and Lightweight LiDAR Global Localization for Unmanned Ground Vehicles | Lun Luo et.al. | 2408.01841 | null |
| 2024-08-03 | E$^3$NeRF: Efficient Event-Enhanced Neural Radiance Fields from Blurry Images | Yunshan Qi et.al. | 2408.01840 | null |
| 2024-08-03 | Survey on Emotion Recognition through Posture Detection and the possibility of its application in Virtual Reality | Leina Elansary et.al. | 2408.01728 | null |
| 2024-08-03 | Stimulating Imagination: Towards General-purpose Object Rearrangement | Jianyang Wu et.al. | 2408.01655 | null |
| 2024-08-02 | Full-range Head Pose Geometric Data Augmentations | Huei-Chung Hu et.al. | 2408.01566 | null |
| 2024-07-31 | Adapting Skills to Novel Grasps: A Self-Supervised Approach | Georgios Papagiannis et.al. | 2408.00178 | null |
Visual Localization
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-09-04 | Design and Evaluation of Camera-Centric Mobile Crowdsourcing Applications | Abby Stylianou et.al. | 2409.03012 | null |
| 2024-09-04 | NUDGE: Lightweight Non-Parametric Fine-Tuning of Embeddings for Retrieval | Sepanta Zeighami et.al. | 2409.02343 | link |
| 2024-09-03 | Optimizing CLIP Models for Image Retrieval with Maintained Joint-Embedding Alignment | Konstantin Schall et.al. | 2409.01936 | link |
| 2024-09-02 | A Review of Image Retrieval Techniques: Data Augmentation and Adversarial Learning Approaches | Kim Jinwoo et.al. | 2409.01219 | null |
| 2024-09-02 | Evidential Transformers for Improved Image Retrieval | Danilo Dordevic et.al. | 2409.01082 | null |
| 2024-09-05 | EgoHDM: An Online Egocentric-Inertial Human Motion Capture, Localization, and Dense Mapping System | Bonan Liu et.al. | 2409.00343 | null |
| 2024-09-04 | Augmented Reality without Borders: Achieving Precise Localization Without Maps | Albert Gassol Puigjaner et.al. | 2408.17373 | null |
| 2024-09-02 | RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance | Avideep Mukherjee et.al. | 2408.17095 | null |
| 2024-08-29 | A compact neuromorphic system for ultra energy-efficient, on-device robot localization | Adam D. Hines et.al. | 2408.16754 | link |
| 2024-08-29 | Rethinking Sparse Lexical Representations for Image Retrieval in the Age of Rising Multi-Modal Large Language Models | Kengo Nakata et.al. | 2408.16296 | null |
| 2024-08-28 | Temporal Attention for Cross-View Sequential Image Localization | Dong Yuan et.al. | 2408.15569 | null |
| 2024-08-27 | Snap and Diagnose: An Advanced Multimodal Retrieval System for Identifying Plant Diseases in the Wild | Tianqi Wei et.al. | 2408.14723 | null |
| 2024-08-25 | LowCLIP: Adapting the CLIP Model Architecture for Low-Resource Languages in Multimodal Image Retrieval Task | Ali Asgarov et.al. | 2408.13909 | link |
| 2024-08-15 | Cross-Modal Denoising: A Novel Training Paradigm for Enhancing Speech-Image Retrieval | Lifeng Zhou et.al. | 2408.13705 | null |
| 2024-08-15 | Coarse-to-fine Alignment Makes Better Speech-image Retrieval | Lifeng Zhou et.al. | 2408.13119 | null |
| 2024-08-21 | FUSELOC: Fusing Global and Local Descriptors to Disambiguate 2D-3D Matching in Visual Localization | Son Tung Nguyen et.al. | 2408.12037 | link |
| 2024-08-21 | Visual Localization in 3D Maps: Comparing Point Cloud, Mesh, and NeRF Representations | Lintong Zhang et.al. | 2408.11966 | null |
| 2024-08-21 | UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation | Xiangyu Zhao et.al. | 2408.11305 | link |
| 2024-08-20 | GSLoc: Efficient Camera Pose Refinement via 3D Gaussian Splatting | Changkun Liu et.al. | 2408.11085 | null |
| 2024-08-19 | BrewCLIP: A Bifurcated Representation Learning Framework for Audio-Visual Retrieval | Zhenyu Lu et.al. | 2408.10383 | null |
| 2024-08-23 | Fashion Image-to-Image Translation for Complementary Item Retrieval | Matteo Attimonelli et.al. | 2408.09847 | null |
| 2024-08-20 | MambaLoc: Efficient Camera Localisation via State Space Model | Jialu Wang et.al. | 2408.09680 | null |
| 2024-08-15 | DM2RM: Dual-Mode Multimodal Ranking for Target Objects and Receptacles Based on Open-Vocabulary Instructions | Ryosuke Korekata et.al. | 2408.07910 | null |
| 2024-08-13 | A Miniature Vision-Based Localization System for Indoor Blimps | Shicong Ma et.al. | 2408.06648 | null |
| 2024-08-10 | Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network | Junyan Ye et.al. | 2408.05475 | link |
| 2024-08-09 | Spherical World-Locking for Audio-Visual Localization in Egocentric Videos | Heeseung Yun et.al. | 2408.05364 | null |
| 2024-08-06 | AMES: Asymmetric and Memory-Efficient Similarity Estimation for Instance-level Retrieval | Pavel Suma et.al. | 2408.03282 | null |
| 2024-08-05 | CMR-Agent: Learning a Cross-Modal Agent for Iterative Image-to-Point Cloud Registration | Gongxin Yao et.al. | 2408.02394 | null |
| 2024-08-02 | On Validation of Search & Retrieval of Tissue Images in Digital Pathology | H. R. Tizhoosh et.al. | 2408.01570 | null |
| 2024-07-31 | VIPeR: Visual Incremental Place Recognition with Adaptive Mining and Lifelong Learning | Yuhang Ming et.al. | 2407.21416 | null |
| 2024-07-30 | Re-localization acceleration with Medoid Silhouette Clustering | Hongyi Zhang et.al. | 2407.20749 | null |
| 2024-07-26 | From 2D to 3D: AISG-SLA Visual Localization Challenge | Jialin Gao et.al. | 2407.18590 | null |
| 2024-07-24 | Revolutionizing Text-to-Image Retrieval as Autoregressive Token-to-Voken Generation | Yongqi Li et.al. | 2407.17274 | null |
| 2024-07-24 | Pose Estimation from Camera Images for Underwater Inspection | Luyuan Peng et.al. | 2407.16961 | null |
| 2024-07-22 | RADA: Robust and Accurate Feature Learning with Domain Adaptation | Jingtai He et.al. | 2407.15791 | null |
| 2024-07-19 | Double-Layer Soft Data Fusion for Indoor Robot WiFi-Visual Localization | Yuehua Ding et.al. | 2407.14643 | null |
| 2024-07-18 | Visual Haystacks: Answering Harder Questions About Sets of Images | Tsung-Han Wu et.al. | 2407.13766 | link |
Keypoint Detection
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-09-30 | OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection | Changsheng Lu et.al. | 2409.19899 | null |
| 2024-09-26 | SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation | Xin Li et.al. | 2409.18082 | null |
| 2024-09-24 | GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization | Gennady Sidorov et.al. | 2409.16502 | link |
| 2024-09-20 | Keypoint Detection Technique for Image-Based Visual Servoing of Manipulators | Niloufar Amiri et.al. | 2409.13668 | null |
| 2024-09-25 | Precision Aquaculture: An Integrated Computer Vision and IoT Approach for Optimized Tilapia Feeding | Rania Hossam et.al. | 2409.08695 | link |
| 2024-09-06 | D4: Text-guided diffusion model-based domain adaptive data augmentation for vineyard shoot detection | Kentaro Hirahara et.al. | 2409.04060 | null |
| 2024-08-15 | Towards Practical Human Motion Prediction with LiDAR Point Clouds | Xiao Han et.al. | 2408.08202 | null |
| 2024-07-31 | Certifying Robustness of Learning-Based Keypoint Detection and Pose Estimation Methods | Xusheng Luo et.al. | 2408.00117 | null |
| 2024-07-26 | SHIC: Shape-Image Correspondences with no Keypoint Supervision | Aleksandar Shtedritski et.al. | 2407.18907 | null |
| 2024-07-25 | LION: Linear Group RNN for 3D Object Detection in Point Clouds | Zhe Liu et.al. | 2407.18232 | link |
| 2024-07-22 | RADA: Robust and Accurate Feature Learning with Domain Adaptation | Jingtai He et.al. | 2407.15791 | null |
| 2024-07-09 | LVLM-empowered Multi-modal Representation Learning for Visual Place Recognition | Teng Wang et.al. | 2407.06730 | null |
| 2024-07-04 | PFGS: High Fidelity Point Cloud Rendering via Feature Splatting | Jiaxu Wang et.al. | 2407.03857 | link |
| 2024-07-03 | A Radiometric Correction based Optical Modeling Approach to Removing Reflection Noise in TLS Point Clouds of Urban Scenes | Li Fang et.al. | 2407.02830 | link |
| 2024-07-02 | Multi-Grained Contrast for Data-Efficient Unsupervised Representation Learning | Chengchao Shen et.al. | 2407.02014 | link |
| 2024-06-28 | Beyond First-Order: A Multi-Scale Approach to Finger Knuckle Print Biometrics | Chengrui Gao et.al. | 2406.19672 | null |
2024-9
Pose Estimation
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-10-03 | Why Sample Space Matters: Keyframe Sampling Optimization for LiDAR-based Place Recognition | Nikolaos Stathoulopoulos et.al. | 2410.02643 | null |
| 2024-10-03 | Key-Grid: Unsupervised 3D Keypoints Detection using Grid Heatmap Features | Chengkai Hou et.al. | 2410.02237 | null |
| 2024-10-02 | SGBA: Semantic Gaussian Mixture Model-Based LiDAR Bundle Adjustment | Xingyu Ji et.al. | 2410.01618 | null |
| 2024-10-02 | SurgeoNet: Realtime 3D Pose Estimation of Articulated Surgical Instruments from Stereo Images using a Synthetically-trained Network | Ahmed Tawfik Aboukhadra et.al. | 2410.01293 | null |
| 2024-10-01 | Pose Estimation of Buried Deep-Sea Objects using 3D Vision Deep Learning Models | Jerry Yan et.al. | 2410.01061 | null |
| 2024-10-01 | RAD: A Dataset and Benchmark for Real-Life Anomaly Detection with Robotic Observations | Kaichen Zhou et.al. | 2410.00713 | link |
| 2024-10-01 | GERA: Geometric Embedding for Efficient Point Registration Analysis | Geng Li et.al. | 2410.00589 | null |
| 2024-09-30 | Continual Human Pose Estimation for Incremental Integration of Keypoints and Pose Variations | Muhammad Saif Ullah Khan et.al. | 2409.20469 | null |
| 2024-09-30 | Classroom-Inspired Multi-Mentor Distillation with Adaptive Learning Strategies | Shalini Sarode et.al. | 2409.20237 | null |
| 2024-09-30 | PuzzleBoard: A New Camera Calibration Pattern with Position Encoding | Peer Stelldinger et.al. | 2409.20127 | link |
| 2024-09-30 | Robust Gaussian Splatting SLAM by Leveraging Loop Closure | Zunjie Zhu et.al. | 2409.20111 | null |
| 2024-09-30 | GearTrack: Automating 6D Pose Estimation | Yu Deng et.al. | 2409.19986 | null |
| 2024-09-29 | PPLNs: Parametric Piecewise Linear Networks for Event-Based Temporal Modeling and Beyond | Chen Song et.al. | 2409.19772 | null |
| 2024-09-29 | GelSlim 4.0: Focusing on Touch and Reproducibility | Andrea Sipos et.al. | 2409.19770 | null |
| 2024-09-27 | Robust Proximity Operations using Probabilistic Markov Models | Deep Parikh et.al. | 2409.19062 | null |
| 2024-09-27 | Exploiting Motion Prior for Accurate Pose Estimation of Dashboard Cameras | Yipeng Lu et.al. | 2409.18673 | null |
| 2024-09-27 | DynaWeightPnP: Toward global real-time 3D-2D solver in PnP without correspondences | Jingwei Song et.al. | 2409.18457 | null |
| 2024-09-26 | Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation | Mengchen Zhang et.al. | 2409.18261 | null |
| 2024-09-26 | AI-Powered Augmented Reality for Satellite Assembly, Integration and Test | Alvaro Patricio et.al. | 2409.18101 | null |
| 2024-09-27 | Leveraging Anthropometric Measurements to Improve Human Mesh Estimation and Ensure Consistent Body Shapes | Katja Ludwig et.al. | 2409.17671 | null |
| 2024-09-25 | Safe Leaf Manipulation for Accurate Shape and Pose Estimation of Occluded Fruits | Shaoxiong Yao et.al. | 2409.17389 | null |
| 2024-09-25 | Hierarchical Tri-manual Planning for Vision-assisted Fruit Harvesting with Quadrupedal Robots | Zhichao Liu et.al. | 2409.17116 | null |
| 2024-09-25 | Self-Sensing for Proprioception and Contact Detection in Soft Robots Using Shape Memory Alloy Artificial Muscles | Ran Jing et.al. | 2409.17111 | null |
| 2024-09-25 | Online 6DoF Pose Estimation in Forests using Cross-View Factor Graph Optimisation and Deep Learned Re-localisation | Lucas Carvalho de Lima et.al. | 2409.16680 | null |
| 2024-09-25 | FAFA: Frequency-Aware Flow-Aided Self-Supervision for Underwater Object Pose Estimation | Jingyi Tang et.al. | 2409.16600 | null |
| 2024-09-25 | Robo-Platform: A Robotic System for Recording Sensors and Controlling Robots | Masoud Dayani Najafabadi et.al. | 2409.16595 | null |
| 2024-09-24 | PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings | Sutharsan Mahendren et.al. | 2409.15832 | null |
| 2024-09-24 | LaPose: Laplacian Mixture Shape Modeling for RGB-Based Category-Level Object Pose Estimation | Ruida Zhang et.al. | 2409.15727 | null |
| 2024-09-23 | Framework for Robust Localization of UUVs and Mapping of Net Pens | David Botta et.al. | 2409.15475 | null |
| 2024-09-23 | FisheyeDepth: A Real Scale Self-Supervised Depth Estimation Model for Fisheye Camera | Guoyang Zhao et.al. | 2409.15054 | link |
| 2024-09-23 | BranchPoseNet: Characterizing tree branching with a deep learning-based pose estimation approach | Stefano Puliti et.al. | 2409.14755 | link |
| 2024-09-23 | ERPoT: Effective and Reliable Pose Tracking for Mobile Robots Based on Lightweight and Compact Polygon Maps | Haiming Gao et.al. | 2409.14723 | null |
| 2024-09-22 | Tactile Functasets: Neural Implicit Representations of Tactile Datasets | Sikai Li et.al. | 2409.14592 | null |
| 2024-09-22 | AR Overlay: Training Image Pose Estimation on Curved Surface in a Synthetic Way | Sining Huang et.al. | 2409.14577 | null |
| 2024-09-22 | DROP: Dexterous Reorientation via Online Planning | Albert H. Li et.al. | 2409.14562 | null |
| 2024-09-21 | Combining Absolute and Semi-Generalized Relative Poses for Visual Localization | Vojtech Panek et.al. | 2409.14269 | null |
| 2024-09-18 | SpotLight: Robotic Scene Understanding through Interaction and Affordance Detection | Tim Engelbracht et.al. | 2409.11870 | null |
| 2024-09-18 | End-to-End Probabilistic Geometry-Guided Regression for 6DoF Object Pose Estimation | Thomas Pöllabauer et.al. | 2409.11819 | null |
| 2024-09-18 | Bridging Domain Gap for Flight-Ready Spaceborne Vision | Tae Ha Park et.al. | 2409.11661 | null |
| 2024-09-17 | Good Grasps Only: A data engine for self-supervised fine-tuning of pose estimation using grasp poses for verification | Frederik Hagelskjær et.al. | 2409.11512 | null |
| 2024-09-17 | Training Datasets Generation for Machine Learning: Application to Vision Based Navigation | Jérémy Lebreton et.al. | 2409.11383 | null |
| 2024-09-17 | OmniGen: Unified Image Generation | Shitao Xiao et.al. | 2409.11340 | link |
| 2024-09-17 | ULOC: Learning to Localize in Complex Large-Scale Environments with Ultra-Wideband Ranges | Thien-Minh Nguyen et.al. | 2409.11122 | link |
| 2024-09-17 | Depth-based Privileged Information for Boosting 3D Human Pose Estimation on RGB | Alessandro Simoni et.al. | 2409.11104 | null |
| 2024-09-21 | HGSLoc: 3DGS-based Heuristic Camera Pose Refinement | Zhongyan Niu et.al. | 2409.10925 | null |
| 2024-09-17 | Pose estimation of CubeSats via sensor fusion and Error-State Extended Kalman Filter | Deep Parikh et.al. | 2409.10815 | null |
| 2024-09-16 | CtRNet-X: Camera-to-Robot Pose Estimation in Real-world Conditions Using a Single Camera | Jingpei Lu et.al. | 2409.10441 | null |
| 2024-09-16 | HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models | Vineet Bhat et.al. | 2409.10419 | null |
| 2024-09-16 | 2D or not 2D: How Does the Dimensionality of Gesture Representation Affect 3D Co-Speech Gesture Generation? | Téo Guichoux et.al. | 2409.10357 | null |
| 2024-09-16 | Human Insights Driven Latent Space for Different Driving Perspectives: A Unified Encoder for Efficient Multi-Task Inference | Huy-Dung Nguyen et.al. | 2409.10095 | null |
| 2024-09-15 | Precise Pick-and-Place using Score-Based Diffusion Networks | Shih-Wei Guo et.al. | 2409.09725 | null |
| 2024-09-15 | Pre-Training for 3D Hand Pose Estimation with Contrastive Learning on Large-Scale Hand Images in the Wild | Nie Lin et.al. | 2409.09714 | null |
| 2024-09-15 | Proximity operations of CubeSats via sensor fusion of ultra-wideband range measurements with rate gyroscopes, accelerometers and monocular vision | Deep Parikh et.al. | 2409.09665 | null |
| 2024-09-15 | A Scalable Tabletop Satellite Automation Testbed: Design And Experiments | Deep Parikh et.al. | 2409.09633 | null |
| 2024-09-14 | MAC-VO: Metrics-aware Covariance for Learning-based Stereo Visual Odometry | Yuheng Qiu et.al. | 2409.09479 | null |
| 2024-09-14 | Distributed Invariant Kalman Filter for Object-level Multi-robot Pose SLAM | Haoying Li et.al. | 2409.09410 | null |
| 2024-09-13 | Causal Transformer for Fusion and Pose Estimation in Deep Visual Inertial Odometry | Yunus Bilge Kurt et.al. | 2409.08769 | link |
| 2024-09-13 | WheelPoser: Sparse-IMU Based Body Pose Estimation for Wheelchair Users | Yunzhi Li et.al. | 2409.08494 | null |
| 2024-09-12 | Bayesian Inverse Graphics for Few-Shot Concept Learning | Octavio Arriaga et.al. | 2409.08351 | null |
| 2024-09-12 | Touch2Touch: Cross-Modal Tactile Generation for Object Manipulation | Samanta Rodriguez et.al. | 2409.08269 | null |
| 2024-09-12 | Covariance Intersection-based Invariant Kalman Filtering (DInCIKF) for Distributed Pose Estimation | Haoying Li et.al. | 2409.07933 | null |
| 2024-09-12 | GateAttentionPose: Enhancing Pose Estimation with Agent Attention and Improved Gated Convolutions | Liang Feng et.al. | 2409.07798 | null |
| 2024-09-12 | GatedUniPose: A Novel Approach for Pose Estimation Combining UniRepLKNet and Gated Convolution | Liang Feng et.al. | 2409.07752 | null |
| 2024-09-11 | FaVoR: Features via Voxel Rendering for Camera Relocalization | Vincenzo Polizzi et.al. | 2409.07571 | null |
| 2024-09-11 | Benchmarking 2D Egocentric Hand Pose Datasets | Olga Taran et.al. | 2409.07337 | null |
| 2024-09-11 | iKalibr-RGBD: Partially-Specialized Target-Free Visual-Inertial Spatiotemporal Calibration For RGBDs via Continuous-Time Velocity Estimation | Shuolong Chen et.al. | 2409.07116 | link |
| 2024-09-11 | Equivariant Filter for Tightly Coupled LiDAR-Inertial Odometry | Anbo Tao et.al. | 2409.06948 | null |
| 2024-09-13 | A Bayesian framework for active object recognition, pose estimation and shape transfer learning through touch | Haodong Zheng et.al. | 2409.06912 | null |
| 2024-09-11 | Alignist: CAD-Informed Orientation Distribution Estimation by Fusing Shape and Correspondences | Shishir Reddy Vutukur et.al. | 2409.06683 | null |
| 2024-09-10 | PoseEmbroider: Towards a 3D, Visual, Semantic-aware Human Pose Representation | Ginger Delmas et.al. | 2409.06535 | null |
| 2024-09-10 | Test-Time Certifiable Self-Supervision to Bridge the Sim2Real Gap in Event-Based Satellite Pose Estimation | Mohsi Jawaid et.al. | 2409.06240 | null |
| 2024-09-09 | From Words to Poses: Enhancing Novel Object Pose Estimation with Vision Language Models | Tessa Pulli et.al. | 2409.05413 | null |
| 2024-09-08 | HelmetPoser: A Helmet-Mounted IMU Dataset for Data-Driven Estimation of Human Head Motion in Diverse Conditions | Jianping Li et.al. | 2409.05006 | null |
| 2024-09-06 | Casper DPM: Cascaded Perceptual Dynamic Projection Mapping onto Hands | Yotam Erel et.al. | 2409.04397 | null |
| 2024-09-06 | GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers | Lorenza Prospero et.al. | 2409.04196 | null |
| 2024-09-06 | Dense Hand-Object (HO) GraspNet with Full Grasping Taxonomy and Dynamics | Woojin Cho et.al. | 2409.04033 | null |
| 2024-09-06 | Matched Filtering based LiDAR Place Recognition for Urban and Natural Environments | Therese Joseph et.al. | 2409.03998 | null |
| 2024-09-09 | The Influence of Faulty Labels in Data Sets on Human Pose Estimation | Arnold Schwarz et.al. | 2409.03887 | null |
| 2024-09-05 | MaskVal: Simple but Effective Uncertainty Quantification for 6D Pose Estimation | Philipp Quentin et.al. | 2409.03556 | null |
| 2024-09-05 | UAV (Unmanned Aerial Vehicles): Diverse Applications of UAV Datasets in Segmentation, Classification, Detection, and Tracking | Md. Mahfuzur Rahman et.al. | 2409.03245 | null |
| 2024-09-01 | Recoverable Anonymization for Pose Estimation: A Privacy-Enhancing Approach | Wenjun Huang et.al. | 2409.02715 | null |
| 2024-09-04 | Object Gaussian for Monocular 6D Pose Estimation from Sparse Views | Luqing Luo et.al. | 2409.02581 | null |
| 2024-09-03 | EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision | Yiming Zhao et.al. | 2409.02224 | null |
| 2024-09-03 | Deep learning for objective estimation of Parkinsonian tremor severity | Felipe Duque-Quiceno et.al. | 2409.02011 | null |
| 2024-09-03 | SPiKE: 3D Human Pose from Point Cloud Sequences | Irene Ballester et.al. | 2409.01879 | link |
| 2024-09-02 | Kalman Filtering for Precise Indoor Position and Orientation Estimation Using IMU and Acoustics on Riemannian Manifolds | Mohammed H. AlSharif et.al. | 2409.01002 | null |
| 2024-09-01 | Detection, Recognition and Pose Estimation of Tabletop Objects | Sanjuksha Nirgude et.al. | 2409.00869 | null |
| 2024-09-01 | DSLO: Deep Sequence LiDAR Odometry Based on Inconsistent Spatio-temporal Propagation | Huixin Zhang et.al. | 2409.00744 | link |
| 2024-09-01 | MoManifold: Learning to Measure 3D Human Motion via Decoupled Joint Acceleration Manifolds | Ziqiang Dang et.al. | 2409.00736 | null |
| 2024-08-31 | ActionPose: Pretraining 3D Human Pose Estimation with the Dark Knowledge of Action | Longyun Liao et.al. | 2409.00449 | null |
| 2024-09-04 | Augmented Reality without Borders: Achieving Precise Localization Without Maps | Albert Gassol Puigjaner et.al. | 2408.17373 | null |
| 2024-08-30 | BOP-D: Revisiting 6D Pose Estimation Benchmark for Better Evaluation under Visual Ambiguities | Boris Meden et.al. | 2408.17297 | null |
| 2024-08-30 | EMHI: A Multimodal Egocentric Human Motion Dataset with HMD and Body-Worn IMUs | Zhen Fan et.al. | 2408.17168 | null |
| 2024-09-01 | Generic Objects as Pose Probes for Few-Shot View Synthesis | Zhirui Gao et.al. | 2408.16690 | null |
| 2024-08-29 | OP-Align: Object-level and Part-level Alignment for Self-supervised Category-level Articulated Object Pose Estimation | Yuchen Che et.al. | 2408.16547 | link |
| 2024-08-29 | GRPose: Learning Graph Relations for Human Image Generation with Pose Priors | Xiangchen Yin et.al. | 2408.16540 | null |
| 2024-08-28 | Are Pose Estimators Ready for the Open World? STAGE: Synthetic Data Generation Toolkit for Auditing 3D Human Pose Estimators | Nikita Kister et.al. | 2408.16536 | null |
| 2024-08-28 | Multi-view Pose Fusion for Occlusion-Aware 3D Human Pose Estimation | Laura Bragagnolo et.al. | 2408.15810 | link |
| 2024-08-30 | Addressing the challenges of loop detection in agricultural environments | Nicolás Soncini et.al. | 2408.15761 | link |
| 2024-08-28 | Str-L Pose: Integrating Point and Structured Line for Relative Pose Estimation in Dual-Graph | Zherong Zhang et.al. | 2408.15750 | null |
Visual Localization
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-10-07 | LoTLIP: Improving Language-Image Pre-training for Long Text Understanding | Wei Wu et.al. | 2410.05249 | null |
| 2024-10-06 | LiteVLoc: Map-Lite Visual Localization for Image Goal Navigation | Jianhao Jiao et.al. | 2410.04419 | null |
| 2024-10-02 | Boosting Weakly-Supervised Referring Image Segmentation via Progressive Comprehension | Zaiquan Yang et.al. | 2410.01544 | null |
| 2024-10-03 | EUFCC-CIR: a Composed Image Retrieval Dataset for GLAM Collections | Francesc Net et.al. | 2410.01536 | link |
| 2024-10-04 | CSIM: A Copula-based similarity index sensitive to local changes for Image quality assessment | Safouane El Ghazouali et.al. | 2410.01411 | link |
| 2024-09-30 | Class-Agnostic Visio-Temporal Scene Sketch Semantic Segmentation | Aleyna Kütük et.al. | 2410.00266 | null |
| 2024-09-28 | VLAD-BuFF: Burst-aware Fast Feature Aggregation for Visual Place Recognition | Ahmad Khaliq et.al. | 2409.19293 | link |
| 2024-09-27 | MASt3R-SfM: a Fully-Integrated Solution for Unconstrained Structure-from-Motion | Bardienus Duisterhof et.al. | 2409.19152 | null |
| 2024-09-26 | Search and Detect: Training-Free Long Tail Object Detection via Web-Image Retrieval | Mankeerat Sidhu et.al. | 2409.18733 | null |
| 2024-09-26 | Revisit Anything: Visual Place Recognition via Image Segment Retrieval | Kartik Garg et.al. | 2409.18049 | link |
| 2024-09-24 | GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization | Gennady Sidorov et.al. | 2409.16502 | link |
| 2024-09-23 | CamLoPA: A Hidden Wireless Camera Localization Framework via Signal Propagation Path Analysis | Xiang Zhang et.al. | 2409.15169 | null |
| 2024-09-21 | Combining Absolute and Semi-Generalized Relative Poses for Visual Localization | Vojtech Panek et.al. | 2409.14269 | null |
| 2024-09-21 | SplatLoc: 3D Gaussian Splatting-based Visual Localization for Augmented Reality | Hongjia Zhai et.al. | 2409.14067 | null |
| 2024-09-20 | Efficient and Discriminative Image Feature Extraction for Universal Image Retrieval | Morris Florek et.al. | 2409.13513 | link |
| 2024-09-18 | Towards Global Localization using Multi-Modal Object-Instance Re-Identification | Aneesh Chavan et.al. | 2409.12002 | link |
| 2024-09-17 | Obfuscation Based Privacy Preserving Representations are Recoverable Using Neighborhood Information | Kunal Chelani et.al. | 2409.11536 | null |
| 2024-09-17 | Improving the Efficiency of Visually Augmented Language Models | Paula Ontalvilla et.al. | 2409.11148 | null |
| 2024-09-21 | HGSLoc: 3DGS-based Heuristic Camera Pose Refinement | Zhongyan Niu et.al. | 2409.10925 | null |
| 2024-09-16 | SOLVR: Submap Oriented LiDAR-Visual Re-Localisation | Joshua Knights et.al. | 2409.10247 | null |
| 2024-09-16 | Garment Attribute Manipulation with Multi-level Attention | Vittorio Casula et.al. | 2409.10206 | null |
| 2024-09-14 | Evaluating Pre-trained Convolutional Neural Networks and Foundation Models as Feature Extractors for Content-based Medical Image Retrieval | Amirreza Mahbod et.al. | 2409.09430 | link |
| 2024-09-12 | Structured Pruning for Efficient Visual Place Recognition | Oliver Grainge et.al. | 2409.07834 | null |
| 2024-09-10 | GeoCalib: Learning Single-image Calibration with Geometric Optimization | Alexander Veicht et.al. | 2409.06704 | link |
| 2024-09-10 | Weakly-supervised Camera Localization by Ground-to-satellite Image Registration | Yujiao Shi et.al. | 2409.06471 | link |
| 2024-09-10 | A Cross-Font Image Retrieval Network for Recognizing Undeciphered Oracle Bone Inscriptions | Zhicong Wu et.al. | 2409.06381 | null |
| 2024-09-09 | Referring Expression Generation in Visually Grounded Dialogue with Discourse-aware Comprehension Guiding | Bram Willemsen et.al. | 2409.05721 | link |
| 2024-09-09 | Open-World Dynamic Prompt and Continual Visual Representation Learning | Youngeun Kim et.al. | 2409.05312 | null |
| 2024-09-12 | Training-free ZS-CIR via Weighted Modality Fusion and Similarity | Ren-Di Wu et.al. | 2409.04918 | null |
| 2024-09-12 | Zero-Shot Whole Slide Image Retrieval in Histopathology Using Embeddings of Foundation Models | Saghir Alfasly et.al. | 2409.04631 | null |
| 2024-09-06 | Reprojection Errors as Prompts for Efficient Scene Coordinate Regression | Ting-Ru Liu et.al. | 2409.04178 | null |
| 2024-09-06 | Matched Filtering based LiDAR Place Recognition for Urban and Natural Environments | Therese Joseph et.al. | 2409.03998 | null |
| 2024-09-04 | Design and Evaluation of Camera-Centric Mobile Crowdsourcing Applications | Abby Stylianou et.al. | 2409.03012 | null |
| 2024-09-04 | NUDGE: Lightweight Non-Parametric Fine-Tuning of Embeddings for Retrieval | Sepanta Zeighami et.al. | 2409.02343 | link |
| 2024-09-03 | Optimizing CLIP Models for Image Retrieval with Maintained Joint-Embedding Alignment | Konstantin Schall et.al. | 2409.01936 | link |
| 2024-09-02 | A Review of Image Retrieval Techniques: Data Augmentation and Adversarial Learning Approaches | Kim Jinwoo et.al. | 2409.01219 | null |
| 2024-09-02 | Evidential Transformers for Improved Image Retrieval | Danilo Dordevic et.al. | 2409.01082 | null |
| 2024-09-05 | EgoHDM: An Online Egocentric-Inertial Human Motion Capture, Localization, and Dense Mapping System | Bonan Liu et.al. | 2409.00343 | null |
| 2024-09-04 | Augmented Reality without Borders: Achieving Precise Localization Without Maps | Albert Gassol Puigjaner et.al. | 2408.17373 | null |
| 2024-09-02 | RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance | Avideep Mukherjee et.al. | 2408.17095 | null |
| 2024-08-29 | A compact neuromorphic system for ultra energy-efficient, on-device robot localization | Adam D. Hines et.al. | 2408.16754 | link |
| 2024-08-29 | Rethinking Sparse Lexical Representations for Image Retrieval in the Age of Rising Multi-Modal Large Language Models | Kengo Nakata et.al. | 2408.16296 | null |
| 2024-08-28 | Temporal Attention for Cross-View Sequential Image Localization | Dong Yuan et.al. | 2408.15569 | null |
| 2024-08-27 | Snap and Diagnose: An Advanced Multimodal Retrieval System for Identifying Plant Diseases in the Wild | Tianqi Wei et.al. | 2408.14723 | null |
| 2024-08-25 | LowCLIP: Adapting the CLIP Model Architecture for Low-Resource Languages in Multimodal Image Retrieval Task | Ali Asgarov et.al. | 2408.13909 | link |
| 2024-08-21 | FUSELOC: Fusing Global and Local Descriptors to Disambiguate 2D-3D Matching in Visual Localization | Son Tung Nguyen et.al. | 2408.12037 | link |
| 2024-08-21 | Visual Localization in 3D Maps: Comparing Point Cloud, Mesh, and NeRF Representations | Lintong Zhang et.al. | 2408.11966 | null |
| 2024-08-21 | UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation | Xiangyu Zhao et.al. | 2408.11305 | link |
Keypoint Detection
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-10-18 | Sim2real Cattle Joint Estimation in 3D point clouds | Mohammad Okour et.al. | 2410.14419 | null |
| 2024-10-16 | PND-Net: Plant Nutrition Deficiency and Disease Classification using Graph Convolutional Network | Asish Bera et.al. | 2410.12742 | null |
| 2024-10-16 | RAFA-Net: Region Attention Network For Food Items And Agricultural Stress Recognition | Asish Bera et.al. | 2410.12718 | null |
| 2024-10-01 | A Robust Multisource Remote Sensing Image Matching Method Utilizing Attention and Feature Enhancement Against Noise Interference | Yuan Li et.al. | 2410.11848 | null |
| 2024-10-11 | Facial Chick Sexing: An Automated Chick Sexing System From Chick Facial Image | Marta Veganzones Rodriguez et.al. | 2410.09155 | null |
| 2024-10-08 | Unsupervised Model Diagnosis | Yinong Oliver Wang et.al. | 2410.06243 | null |
| 2024-10-08 | Equi-GSPR: Equivariant SE(3) Graph Network Model for Sparse Point Cloud Registration | Xueyang Kang et.al. | 2410.05729 | link |
| 2024-10-16 | Key-Grid: Unsupervised 3D Keypoints Detection using Grid Heatmap Features | Chengkai Hou et.al. | 2410.02237 | null |
| 2024-10-02 | Gaussian-Det: Learning Closed-Surface Gaussians for 3D Object Detection | Hongru Yan et.al. | 2410.01404 | null |
| 2024-09-30 | OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection | Changsheng Lu et.al. | 2409.19899 | null |
| 2024-10-07 | SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation | Xin Li et.al. | 2409.18082 | null |
| 2024-09-24 | GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization | Gennady Sidorov et.al. | 2409.16502 | link |
| 2024-09-20 | Keypoint Detection Technique for Image-Based Visual Servoing of Manipulators | Niloufar Amiri et.al. | 2409.13668 | null |
| 2024-09-25 | Precision Aquaculture: An Integrated Computer Vision and IoT Approach for Optimized Tilapia Feeding | Rania Hossam et.al. | 2409.08695 | link |
| 2024-09-06 | D4: Text-guided diffusion model-based domain adaptive data augmentation for vineyard shoot detection | Kentaro Hirahara et.al. | 2409.04060 | null |
| 2024-10-01 | Towards Practical Human Motion Prediction with LiDAR Point Clouds | Xiao Han et.al. | 2408.08202 | null |
| 2024-07-31 | Certifying Robustness of Learning-Based Keypoint Detection and Pose Estimation Methods | Xusheng Luo et.al. | 2408.00117 | null |
| 2024-07-26 | SHIC: Shape-Image Correspondences with no Keypoint Supervision | Aleksandar Shtedritski et.al. | 2407.18907 | null |
| 2024-07-25 | LION: Linear Group RNN for 3D Object Detection in Point Clouds | Zhe Liu et.al. | 2407.18232 | link |
| 2024-07-22 | RADA: Robust and Accurate Feature Learning with Domain Adaptation | Jingtai He et.al. | 2407.15791 | null |
| 2024-07-09 | LVLM-empowered Multi-modal Representation Learning for Visual Place Recognition | Teng Wang et.al. | 2407.06730 | null |
| 2024-07-04 | PFGS: High Fidelity Point Cloud Rendering via Feature Splatting | Jiaxu Wang et.al. | 2407.03857 | link |
| 2024-07-03 | A Radiometric Correction based Optical Modeling Approach to Removing Reflection Noise in TLS Point Clouds of Urban Scenes | Li Fang et.al. | 2407.02830 | link |
| 2024-07-02 | Multi-Grained Contrast for Data-Efficient Unsupervised Representation Learning | Chengchao Shen et.al. | 2407.02014 | link |
2024-10
Keypoint Detection
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-12-09 | An Efficient Scene Coordinate Encoding and Relocalization Method | Kuan Xu et.al. | 2412.06488 | link |
| 2024-12-09 | ZeroKey: Point-Level Reasoning and Zero-Shot 3D Keypoint Detection from Large Language Models | Bingchen Gong et.al. | 2412.06292 | null |
| 2024-12-07 | Securing Social Media Against Deepfakes using Identity, Behavioral, and Geometric Signatures | Muhammad Umar Farooq et.al. | 2412.05487 | null |
| 2024-12-04 | Measure Anything: Real-time, Multi-stage Vision-based Dimensional Measurement using Segment Anything | Yongkyu Lee et.al. | 2412.03472 | null |
| 2024-12-02 | MamKPD: A Simple Mamba Baseline for Real-Time 2D Keypoint Detection | Yonghao Dang et.al. | 2412.01422 | null |
| 2024-11-23 | OCDet: Object Center Detection via Bounding Box-Aware Heatmap Prediction on Edge Devices with NPUs | Chen Xin et.al. | 2411.15653 | link |
| 2024-11-19 | IoT-Based 3D Pose Estimation and Motion Optimization for Athletes: Application of C3D and OpenPose | Fei Ren et.al. | 2411.12676 | null |
| 2024-11-04 | Silver medal Solution for Image Matching Challenge 2024 | Yian Wang et.al. | 2411.01851 | null |
| 2024-11-04 | KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension | Jie Yang et.al. | 2411.01846 | null |
| 2024-10-31 | From Web Data to Real Fields: Low-Cost Unsupervised Domain Adaptation for Agricultural Robots | Vasileios Tzouras et.al. | 2410.23906 | null |
| 2024-10-04 | Self-Supervised Keypoint Detection with Distilled Depth Keypoint Representation | Aman Anand et.al. | 2410.14700 | null |
| 2024-11-27 | Sim2real Cattle Joint Estimation in 3D point clouds | Mohammad Okour et.al. | 2410.14419 | null |
| 2024-10-16 | PND-Net: Plant Nutrition Deficiency and Disease Classification using Graph Convolutional Network | Asish Bera et.al. | 2410.12742 | null |
| 2024-10-16 | RAFA-Net: Region Attention Network For Food Items And Agricultural Stress Recognition | Asish Bera et.al. | 2410.12718 | null |
| 2024-10-01 | A Robust Multisource Remote Sensing Image Matching Method Utilizing Attention and Feature Enhancement Against Noise Interference | Yuan Li et.al. | 2410.11848 | null |
| 2024-10-11 | Facial Chick Sexing: An Automated Chick Sexing System From Chick Facial Image | Marta Veganzones Rodriguez et.al. | 2410.09155 | null |
| 2024-10-08 | Unsupervised Model Diagnosis | Yinong Oliver Wang et.al. | 2410.06243 | null |
| 2024-10-08 | Equi-GSPR: Equivariant SE(3) Graph Network Model for Sparse Point Cloud Registration | Xueyang Kang et.al. | 2410.05729 | link |
| 2024-10-16 | Key-Grid: Unsupervised 3D Keypoints Detection using Grid Heatmap Features | Chengkai Hou et.al. | 2410.02237 | null |
| 2024-10-02 | Gaussian-Det: Learning Closed-Surface Gaussians for 3D Object Detection | Hongru Yan et.al. | 2410.01404 | null |
| 2024-09-30 | OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection | Changsheng Lu et.al. | 2409.19899 | null |
| 2024-10-07 | SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation | Xin Li et.al. | 2409.18082 | null |
| 2024-09-24 | GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization | Gennady Sidorov et.al. | 2409.16502 | link |
| 2024-09-20 | Keypoint Detection Technique for Image-Based Visual Servoing of Manipulators | Niloufar Amiri et.al. | 2409.13668 | null |
| 2024-09-25 | Precision Aquaculture: An Integrated Computer Vision and IoT Approach for Optimized Tilapia Feeding | Rania Hossam et.al. | 2409.08695 | link |
| 2024-09-06 | D4: Text-guided diffusion model-based domain adaptive data augmentation for vineyard shoot detection | Kentaro Hirahara et.al. | 2409.04060 | null |
| 2024-10-01 | Towards Practical Human Motion Prediction with LiDAR Point Clouds | Xiao Han et.al. | 2408.08202 | null |
| 2024-07-31 | Certifying Robustness of Learning-Based Keypoint Detection and Pose Estimation Methods | Xusheng Luo et.al. | 2408.00117 | null |
| 2024-07-26 | SHIC: Shape-Image Correspondences with no Keypoint Supervision | Aleksandar Shtedritski et.al. | 2407.18907 | null |
| 2024-07-25 | LION: Linear Group RNN for 3D Object Detection in Point Clouds | Zhe Liu et.al. | 2407.18232 | link |
Pose Estimation
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2024-11-06 | GS2Pose: Two-stage 6D Object Pose Estimation Guided by Gaussian Splatting | Jilan Mei et.al. | 2411.03807 | null |
| 2024-11-06 | Estimation of Psychosocial Work Environment Exposures Through Video Object Detection. Proof of Concept Using CCTV Footage | Claus D. Hansen et.al. | 2411.03724 | null |
| 2024-11-05 | Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data | Seunggeun Chi et.al. | 2411.03561 | null |
| 2024-11-05 | HFGaussian: Learning Generalizable Gaussian Human with Integrated Human Features | Arnab Dey et.al. | 2411.03086 | null |
| 2024-11-04 | Semantic Masking and Visual Feature Matching for Robust Localization | Luisa Mao et.al. | 2411.01804 | null |
| 2024-11-03 | Activating Self-Attention for Multi-Scene Absolute Pose Regression | Miso Lee et.al. | 2411.01443 | link |
| 2024-11-04 | 3D Equivariant Pose Regression via Direct Wigner-D Harmonics Prediction | Jongmin Lee et.al. | 2411.00543 | null |
| 2024-10-31 | Whole-Herd Elephant Pose Estimation from Drone Data for Collective Behavior Analysis | Brody McNutt et.al. | 2411.00196 | null |
| 2024-10-31 | No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images | Botao Ye et.al. | 2410.24207 | link |
| 2024-11-06 | SceneComplete: Open-World 3D Scene Completion in Complex Real World Environments for Robot Manipulation | Aditya Agarwal et.al. | 2410.23643 | null |
| 2024-10-30 | SCRREAM: SCan, Register, REnder And Map: A Framework for Annotating Accurate and Dense 3D Indoor Scenes with a Benchmark | HyunJun Jung et.al. | 2410.22715 | null |
| 2024-10-29 | LiVisSfM: Accurate and Robust Structure-from-Motion with LiDAR and Visual Cues | Hanqing Jiang et.al. | 2410.22213 | null |
| 2024-10-29 | PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting | Sunghwan Hong et.al. | 2410.22128 | link |
| 2024-10-29 | HRPVT: High-Resolution Pyramid Vision Transformer for medium and small-scale human pose estimation | Zhoujie Xu et.al. | 2410.22079 | null |
| 2024-10-29 | EI-Nexus: Towards Unmediated and Flexible Inter-Modality Local Feature Extraction and Matching for Event-Image Data | Zhonghua Yi et.al. | 2410.21743 | null |
| 2024-10-28 | Synthetica: Large Scale Synthetic Data for Robot Perception | Ritvik Singh et.al. | 2410.21153 | null |
| 2024-10-29 | BLAPose: Enhancing 3D Human Pose Estimation with Bone Length Adjustment | Chih-Hsiang Hsu et.al. | 2410.20731 | link |
| 2024-11-01 | RopeTP: Global Human Motion Recovery via Integrating Robust Pose Estimation with Diffusion Trajectory Prior | Mingjiang Liang et.al. | 2410.20358 | null |
| 2024-10-27 | Harmony4D: A Video Dataset for In-The-Wild Close Human Interactions | Rawal Khirodkar et.al. | 2410.20294 | null |
| 2024-10-26 | Neural Fields in Robotics: A Survey | Muhammad Zubair Irshad et.al. | 2410.20220 | null |
| 2024-10-25 | DECADE: Towards Designing Efficient-yet-Accurate Distance Estimation Modules for Collision Avoidance in Mobile Advanced Driver Assistance Systems | Muhammad Zaeem Shahzad et.al. | 2410.19336 | null |
| 2024-10-24 | Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction | Junyi Chen et.al. | 2410.18962 | null |
| 2024-10-24 | VoxelKeypointFusion: Generalizable Multi-View Multi-Person Pose Estimation | Daniel Bermuth et.al. | 2410.18723 | null |
| 2024-10-23 | Robust Two-View Geometry Estimation with Implicit Differentiation | Vladislav Pyatov et.al. | 2410.17983 | link |
| 2024-10-23 | YOLOv11: An Overview of the Key Architectural Enhancements | Rahima Khanam et.al. | 2410.17725 | null |
| 2024-10-21 | Assisted Physical Interaction: Autonomous Aerial Robots with Neural Network Detection, Navigation, and Safety Layers | Andrea Berra et.al. | 2410.15802 | null |
| 2024-10-21 | ARTS: Semi-Analytical Regressor using Disentangled Skeletal Representations for Human Mesh Recovery from Videos | Tao Tang et.al. | 2410.15582 | link |
| 2024-10-20 | Neural Active Structure-from-Motion in Dark and Textureless Environment | Kazuto Ichimaru et.al. | 2410.15378 | null |
| 2024-10-20 | POSE: Pose estimation Of virtual Sync Exhibit system | Hao-Tang Tsui et.al. | 2410.15343 | link |
| 2024-10-18 | Graph Optimality-Aware Stochastic LiDAR Bundle Adjustment with Progressive Spatial Smoothing | Jianping Li et.al. | 2410.14565 | null |
| 2024-10-18 | Multi-modal Pose Diffuser: A Multimodal Generative Conditional Pose Prior | Calvin-Khang Ta et.al. | 2410.14540 | null |
| 2024-10-18 | Sim2real Cattle Joint Estimation in 3D point clouds | Mohammad Okour et.al. | 2410.14419 | null |
| 2024-10-18 | Unlabeled Action Quality Assessment Based on Multi-dimensional Adaptive Constrained Dynamic Time Warping | Renguang Chen et.al. | 2410.14161 | null |
| 2024-10-15 | From Real Artifacts to Virtual Reference: A Robust Framework for Translating Endoscopic Images | Junyang Wu et.al. | 2410.13896 | null |
| 2024-10-17 | DualQuat-LOAM: LiDAR Odometry and Mapping parametrized on Dual Quaternions | Edison P. Velasco-Sánchez et.al. | 2410.13541 | null |
| 2024-10-17 | Object Pose Estimation Using Implicit Representation For Transparent Objects | Varun Burde et.al. | 2410.13465 | null |
| 2024-10-16 | Optimizing Multi-Task Learning for Accurate Spacecraft Pose Estimation | Francesco Evangelisti et.al. | 2410.12679 | null |
| 2024-10-15 | Contrastive Touch-to-Touch Pretraining | Samanta Rodriguez et.al. | 2410.11834 | null |
| 2024-10-18 | X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing | Xinyan Chen et.al. | 2410.10167 | null |
| 2024-10-13 | Occluded Human Pose Estimation based on Limb Joint Augmentation | Gangtao Han et.al. | 2410.09885 | null |
| 2024-10-12 | Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors | Hritam Basak et.al. | 2410.09467 | null |
| 2024-10-12 | Towards Multi-Modal Animal Pose Estimation: An In-Depth Analysis | Qianyi Deng et.al. | 2410.09312 | link |
| 2024-10-11 | CVAM-Pose: Conditional Variational Autoencoder for Multi-Object Monocular Pose Estimation | Jianyu Zhao et.al. | 2410.09010 | link |
| 2024-10-11 | Look Gauss, No Pose: Novel View Synthesis using Gaussian Splatting without Accurate Pose Initialization | Christian Schmidt et.al. | 2410.08743 | link |
| 2024-10-10 | Generalizing Stochastic Smoothing for Differentiation and Gradient Estimation | Felix Petersen et.al. | 2410.08125 | null |
| 2024-10-10 | Robotic framework for autonomous manipulation of laboratory equipment with different degrees of transparency via 6D pose estimation | Maria Makarova et.al. | 2410.07801 | null |
| 2024-10-10 | Optimal-State Dynamics Estimation for Physics-based Human Motion Capture from Videos | Cuong Le et.al. | 2410.07795 | link |
| 2024-10-12 | Autonomous Driving in Unstructured Environments: How Far Have We Come? | Chen Min et.al. | 2410.07701 | null |
| 2024-10-10 | Invisibility Cloak: Disappearance under Human Pose Estimation via Backdoor Attacks | Minxing Zhang et.al. | 2410.07670 | null |
| 2024-10-09 | OmniPose6D: Towards Short-Term Object Pose Tracking in Dynamic Scenes from Monocular RGB | Yunzhi Lin et.al. | 2410.06694 | null |
| 2024-10-08 | SpecTrack: Learned Multi-Rotation Tracking via Speckle Imaging | Ziyang Chen et.al. | 2410.06028 | null |
| 2024-10-08 | AIVIO: Closed-loop, Object-relative Navigation of UAVs with AI-aided Visual Inertial Odometry | Thomas Jantos et.al. | 2410.05996 | null |
| 2024-10-08 | Are Minimal Radial Distortion Solvers Necessary for Relative Pose Estimation? | Charalambos Tzamos et.al. | 2410.05984 | link |
| 2024-10-08 | FürElise: Capturing and Physically Synthesizing Hand Motions of Piano Performance | Ruocheng Wang et.al. | 2410.05791 | null |
| 2024-10-07 | Comparison of marker-less 2D image-based methods for infant pose estimation | Lennart Jahn et.al. | 2410.04980 | null |
| 2024-10-06 | Enhancing 3D Human Pose Estimation Amidst Severe Occlusion with Dual Transformer Fusion | Mehwish Ghafoor et.al. | 2410.04574 | link |
| 2024-10-06 | LiteVLoc: Map-Lite Visual Localization for Image Goal Navigation | Jianhao Jiao et.al. | 2410.04419 | null |
| 2024-10-05 | Test-Time Adaptation for Keypoint-Based Spacecraft Pose Estimation Based on Predicted-View Synthesis | Juan Ignacio Bravo Pérez-Villar et.al. | 2410.04298 | link |
| 2024-10-05 | A Framework for Reproducible Benchmarking and Performance Diagnosis of SLAM Systems | Nikola Radulov et.al. | 2410.04242 | link |
| 2024-10-04 | Unsupervised Prior Learning: Discovering Categorical Pose Priors from Videos | Ziyu Wang et.al. | 2410.03858 | null |
| 2024-10-04 | Universal Global State Estimation for Inertial Navigation Systems | Sifeddine Benahmed et.al. | 2410.03846 | null |
| 2024-10-04 | MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion | Junyi Zhang et.al. | 2410.03825 | null |
| 2024-10-04 | Dessie: Disentanglement for Articulated 3D Horse Shape and Pose Estimation from Images | Ci Li et.al. | 2410.03438 | null |
| 2024-10-04 | HRVMamba: High-Resolution Visual State Space Model for Dense Prediction | Hao Zhang et.al. | 2410.03174 | null |
| 2024-10-04 | CLIP-Clique: Graph-based Correspondence Matching Augmented by Vision Language Models for Object-based Global Localization | Shigemichi Matsuzaki et.al. | 2410.03054 | null |
| 2024-10-03 | Why Sample Space Matters: Keyframe Sampling Optimization for LiDAR-based Place Recognition | Nikolaos Stathoulopoulos et.al. | 2410.02643 | null |
| 2024-10-03 | Key-Grid: Unsupervised 3D Keypoints Detection using Grid Heatmap Features | Chengkai Hou et.al. | 2410.02237 | null |
| 2024-10-02 | SGBA: Semantic Gaussian Mixture Model-Based LiDAR Bundle Adjustment | Xingyu Ji et.al. | 2410.01618 | null |
| 2024-10-02 | SurgeoNet: Realtime 3D Pose Estimation of Articulated Surgical Instruments from Stereo Images using a Synthetically-trained Network | Ahmed Tawfik Aboukhadra et.al. | 2410.01293 | null |
| 2024-10-01 | Pose Estimation of Buried Deep-Sea Objects using 3D Vision Deep Learning Models | Jerry Yan et.al. | 2410.01061 | null |
| 2024-10-01 | RAD: A Dataset and Benchmark for Real-Life Anomaly Detection with Robotic Observations | Kaichen Zhou et.al. | 2410.00713 | link |
| 2024-10-01 | GERA: Geometric Embedding for Efficient Point Registration Analysis | Geng Li et.al. | 2410.00589 | null |
| 2024-09-30 | Continual Human Pose Estimation for Incremental Integration of Keypoints and Pose Variations | Muhammad Saif Ullah Khan et.al. | 2409.20469 | null |
| 2024-09-30 | Classroom-Inspired Multi-Mentor Distillation with Adaptive Learning Strategies | Shalini Sarode et.al. | 2409.20237 | null |
| 2024-09-30 | PuzzleBoard: A New Camera Calibration Pattern with Position Encoding | Peer Stelldinger et.al. | 2409.20127 | link |
| 2024-09-30 | Robust Gaussian Splatting SLAM by Leveraging Loop Closure | Zunjie Zhu et.al. | 2409.20111 | null |
| 2024-09-30 | GearTrack: Automating 6D Pose Estimation | Yu Deng et.al. | 2409.19986 | null |
Visual Localization
| Publish Date | Title | Authors | Code | |
|---|---|---|---|---|
| 2024-11-05 | From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing | Xintian Sun et.al. | 2411.05826 | null |
| 2024-11-04 | TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives | Maitreya Patel et.al. | 2411.02545 | null |
| 2024-11-11 | INQUIRE: A Natural World Text-to-Image Retrieval Benchmark | Edward Vendrow et.al. | 2411.02537 | link |
| 2024-11-04 | Exploiting Contextual Uncertainty of Visual Data for Efficient Training of Deep Models | Sharat Agarwal et.al. | 2411.01925 | null |
| 2024-11-04 | Semantic Masking and Visual Feature Matching for Robust Localization | Luisa Mao et.al. | 2411.01804 | null |
| 2024-11-03 | Efficient Medical Image Retrieval Using DenseNet and FAISS for BIRADS Classification | MD Shaikh Rahman et.al. | 2411.01473 | null |
| 2024-11-01 | Identifying Implicit Social Biases in Vision-Language Models | Kimia Hamidieh et.al. | 2411.00997 | null |
| 2024-10-31 | Nearest Neighbor Normalization Improves Multimodal Retrieval | Neil Chowdhury et.al. | 2410.24114 | link |
| 2024-10-31 | MoTaDual: Modality-Task Dual Alignment for Enhanced Zero-shot Composed Image Retrieval | Haiwen Li et.al. | 2410.23736 | null |
| 2024-10-30 | Decoupling Semantic Similarity from Spatial Alignment for Neural Networks | Tassilo Wald et.al. | 2410.23107 | null |
| 2024-10-29 | Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications | Monica Riedler et.al. | 2410.21943 | link |
| 2024-10-28 | NYC-Event-VPR: A Large-Scale High-Resolution Event-Based Visual Place Recognition Dataset in Dense Urban Environments | Taiyi Pan et.al. | 2410.21615 | null |
| 2024-10-25 | Context-Based Visual-Language Place Recognition | Soojin Woo et.al. | 2410.19341 | link |
| 2024-10-24 | ChatSearch: a Dataset and a Generative Retrieval Model for General Conversational Image Retrieval | Zijia Zhao et.al. | 2410.18715 | link |
| 2024-10-25 | On Model-Free Re-ranking for Visual Place Recognition with Deep Learned Local Features | Tomáš Pivoňka et.al. | 2410.18573 | null |
| 2024-10-22 | Denoise-I2W: Mapping Images to Denoising Words for Accurate Zero-Shot Composed Image Retrieval | Yuanmin Tang et.al. | 2410.17393 | null |
| 2024-10-20 | GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning | Haiwen Diao et.al. | 2410.15266 | link |
| 2024-10-19 | Visual Navigation of Digital Libraries: Retrieval and Classification of Images in the National Library of Norway’s Digitised Book Collection | Marie Roald et.al. | 2410.14969 | link |
| 2024-10-16 | Development of Image Collection Method Using YOLO and Siamese Network | Chan Young Shin et.al. | 2410.12561 | null |
| 2024-10-16 | LoD-Loc: Aerial Visual Localization using LoD 3D Map with Neural Wireframe Alignment | Juelin Zhu et.al. | 2410.12269 | link |
| 2024-10-16 | Leveraging Spatial Attention and Edge Context for Optimized Feature Selection in Visual Localization | Nanda Febri Istighfarin et.al. | 2410.12240 | null |
| 2024-10-15 | LoGS: Visual Localization via Gaussian Splatting with Fewer Training Images | Yuzhou Cheng et.al. | 2410.11505 | null |
| 2024-10-15 | Multiview Scene Graph | Juexiao Zhang et.al. | 2410.11187 | null |
| 2024-10-12 | Leveraging Semantic Cues from Foundation Vision Models for Enhanced Local Feature Correspondence | Felipe Cadar et.al. | 2410.09533 | link |
| 2024-10-16 | Semantic Token Reweighting for Interpretable and Controllable Text Embeddings in CLIP | Eunji Kim et.al. | 2410.08469 | null |
| 2024-10-11 | A Unified Deep Semantic Expansion Framework for Domain-Generalized Person Re-identification | Eugene P. W. Ang et.al. | 2410.08456 | null |
| 2024-10-10 | A Unified Debiasing Approach for Vision-Language Models across Modalities and Tasks | Hoin Jung et.al. | 2410.07593 | null |
| 2024-10-09 | Exploiting Distribution Constraints for Scalable and Efficient Image Retrieval | Mohammad Omama et.al. | 2410.07022 | null |
| 2024-10-09 | Pair-VPR: Place-Aware Pre-training and Contrastive Pair Classification for Visual Place Recognition with Vision Transformers | Stephen Hausler et.al. | 2410.06614 | null |
| 2024-10-09 | MedImageInsight: An Open-Source Embedding Model for General Domain Medical Imaging | Noel C. F. Codella et.al. | 2410.06542 | null |
| 2024-10-08 | Temporal Image Caption Retrieval Competition – Description and Results | Jakub Pokrywka et.al. | 2410.06314 | null |
| 2024-10-08 | Monocular Visual Place Recognition in LiDAR Maps via Cross-Modal State Space Model and Multi-View Matching | Gongxin Yao et.al. | 2410.06285 | null |
| 2024-10-08 | GSLoc: Visual Localization with 3D Gaussian Splatting | Kazii Botashev et.al. | 2410.06165 | null |
| 2024-10-08 | Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning | Ayush Singh et.al. | 2410.05928 | null |
| 2024-10-08 | RNR-Nav: A Real-World Visual Navigation System Using Renderable Neural Radiance Maps | Minsoo Kim et.al. | 2410.05621 | null |
| 2024-10-11 | LoTLIP: Improving Language-Image Pre-training for Long Text Understanding | Wei Wu et.al. | 2410.05249 | null |
| 2024-10-06 | LiteVLoc: Map-Lite Visual Localization for Image Goal Navigation | Jianhao Jiao et.al. | 2410.04419 | null |
| 2024-10-02 | Boosting Weakly-Supervised Referring Image Segmentation via Progressive Comprehension | Zaiquan Yang et.al. | 2410.01544 | null |
| 2024-10-03 | EUFCC-CIR: a Composed Image Retrieval Dataset for GLAM Collections | Francesc Net et.al. | 2410.01536 | link |
| 2024-10-04 | CSIM: A Copula-based similarity index sensitive to local changes for Image quality assessment | Safouane El Ghazouali et.al. | 2410.01411 | link |
| 2024-09-30 | Class-Agnostic Visio-Temporal Scene Sketch Semantic Segmentation | Aleyna Kütük et.al. | 2410.00266 | null |
| 2024-09-28 | VLAD-BuFF: Burst-aware Fast Feature Aggregation for Visual Place Recognition | Ahmad Khaliq et.al. | 2409.19293 | link |
| 2024-09-27 | MASt3R-SfM: a Fully-Integrated Solution for Unconstrained Structure-from-Motion | Bardienus Duisterhof et.al. | 2409.19152 | null |
| 2024-09-26 | Search and Detect: Training-Free Long Tail Object Detection via Web-Image Retrieval | Mankeerat Sidhu et.al. | 2409.18733 | null |
| 2024-09-26 | Revisit Anything: Visual Place Recognition via Image Segment Retrieval | Kartik Garg et.al. | 2409.18049 | link |
| 2024-09-24 | GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization | Gennady Sidorov et.al. | 2409.16502 | link |
| 2024-09-23 | CamLoPA: A Hidden Wireless Camera Localization Framework via Signal Propagation Path Analysis | Xiang Zhang et.al. | 2409.15169 | null |
2024-11
Pose Estimation
| Publish Date | Title | Authors | Code | |
|---|---|---|---|---|
| 2024-11-29 | Diorama: Unleashing Zero-shot Single-view 3D Scene Modeling | Qirui Wu et.al. | 2411.19492 | null |
| 2024-11-29 | Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning | Yang You et.al. | 2411.19458 | null |
| 2024-11-28 | GMS-VINS: Multi-category Dynamic Objects Semantic Segmentation for Enhanced Visual-Inertial Odometry Using a Promptable Foundation Model | Rui Zhou et.al. | 2411.19289 | null |
| 2024-11-28 | HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos | Prithviraj Banerjee et.al. | 2411.19167 | null |
| 2024-11-28 | Lost & Found: Updating Dynamic 3D Scene Graphs from Egocentric Observations | Tjark Behrens et.al. | 2411.19162 | null |
| 2024-11-28 | Distributed Dual Quaternion Extended Kalman Filtering for Spacecraft Pose Estimation | Mathias Hudoba de Badyn et.al. | 2411.19033 | null |
| 2024-11-28 | Waterfall Transformer for Multi-person Pose Estimation | Navin Ranjan et.al. | 2411.18944 | null |
| 2024-12-02 | AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers | Sherwin Bahmani et.al. | 2411.18673 | null |
| 2024-11-27 | XR-MBT: Multi-modal Full Body Tracking for XR through Self-Supervision with Learned Depth Point Cloud Registration | Denys Rozumnyi et.al. | 2411.18377 | null |
| 2024-11-26 | Self-supervised Monocular Depth and Pose Estimation for Endoscopy with Generative Latent Priors | Ziang Xu et.al. | 2411.17790 | null |
| 2024-11-26 | Geometric Point Attention Transformer for 3D Shape Reassembly | Jiahan Li et.al. | 2411.17788 | null |
| 2024-11-26 | RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training | Raktim Gautam Goswami et.al. | 2411.17662 | null |
| 2024-11-26 | Communication-Efficient Cooperative SLAMMOT via Determining the Number of Collaboration Vehicles | Susu Fang et.al. | 2411.17432 | null |
| 2024-11-26 | Boost 3D Reconstruction using Diffusion-based Monocular Camera Calibration | Junyuan Deng et.al. | 2411.17240 | link |
| 2024-11-27 | SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting | Gyeongjin Kang et.al. | 2411.17190 | null |
| 2024-11-26 | GMFlow: Global Motion-Guided Recurrent Flow for 6D Object Pose Estimation | Xin Liu et.al. | 2411.17174 | null |
| 2024-11-25 | Diffusion Features for Zero-Shot 6DoF Object Pose Estimation | Bernd Von Gimborn et.al. | 2411.16668 | null |
| 2024-11-25 | Edge Weight Prediction For Category-Agnostic Pose Estimation | Or Hirschorn et.al. | 2411.16665 | link |
| 2024-11-25 | SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis | Hyojun Go et.al. | 2411.16443 | link |
| 2024-11-25 | One Diffusion to Generate Them All | Duong H. Le et.al. | 2411.16318 | link |
| 2024-11-25 | UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image | Xingyu Liu et.al. | 2411.16106 | null |
| 2024-11-24 | Generalizable Single-view Object Pose Estimation by Two-side Generating and Matching | Yujing Sun et.al. | 2411.15860 | link |
| 2024-11-24 | PEnG: Pose-Enhanced Geo-Localisation | Tavis Shore et.al. | 2411.15742 | null |
| 2024-11-22 | Personalization of Wearable Sensor-Based Joint Kinematic Estimation Using Computer Vision for Hip Exoskeleton Applications | Changseob Song et.al. | 2411.15366 | null |
| 2024-11-22 | mmWave Radar for Sit-to-Stand Analysis: A Comparative Study with Wearables and Kinect | Shuting Hu et.al. | 2411.14656 | null |
| 2024-11-21 | DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding | Tianhe Ren et.al. | 2411.14347 | link |
| 2024-11-21 | SEMPose: A Single End-to-end Network for Multi-object Pose Estimation | Xin Liu et.al. | 2411.14002 | null |
| 2024-11-21 | Dehazing-aided Multi-Rate Multi-Modal Pose Estimation Framework for Mitigating Visual Disturbances in Extreme Underwater Domain | Vidya Sudevan et.al. | 2411.13988 | null |
| 2024-11-21 | Hybrid-Neuromorphic Approach for Underwater Robotics Applications: A Conceptual Framework | Vidya Sudevan et.al. | 2411.13962 | null |
| 2024-11-20 | Developing Normative Gait Cycle Parameters for Clinical Analysis Using Human Pose Estimation | Rahm Ranjan et.al. | 2411.13716 | null |
| 2024-11-20 | Robust SG-NeRF: Robust Scene Graph Aided Neural Surface Reconstruction | Yi Gu et.al. | 2411.13620 | null |
| 2024-11-19 | VioPose: Violin Performance 4D Pose Estimation by Hierarchical Audiovisual Inference | Seong Jong Yoo et.al. | 2411.13607 | link |
| 2024-11-20 | DATAP-SfM: Dynamic-Aware Tracking Any Point for Robust Structure from Motion in the Wild | Weicai Ye et.al. | 2411.13291 | null |
| 2024-11-20 | X as Supervision: Contending with Depth Ambiguity in Unsupervised Monocular 3D Pose Estimation | Yuchen Yang et.al. | 2411.13026 | link |
| 2024-11-19 | IoT-Based 3D Pose Estimation and Motion Optimization for Athletes: Application of C3D and OpenPose | Fei Ren et.al. | 2411.12676 | null |
| 2024-11-15 | SPARS3R: Semantic Prior Alignment and Regularization for Sparse 3D Reconstruction | Yutao Tang et.al. | 2411.12592 | link |
| 2024-11-19 | GLOVER: Generalizable Open-Vocabulary Affordance Reasoning for Task-Oriented Grasping | Teli Ma et.al. | 2411.12286 | null |
| 2024-11-18 | IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos | Yunong Liu et.al. | 2411.11409 | link |
| 2024-11-15 | USP-Gaussian: Unifying Spike-based Image Reconstruction, Pose Correction and Gaussian Splatting | Kang Chen et.al. | 2411.10504 | link |
| 2024-11-13 | ReMP: Reusable Motion Prior for Multi-domain 3D Human Pose Estimation and Motion Inbetweening | Hojun Jang et.al. | 2411.09435 | null |
| 2024-11-13 | Generalized Pose Space Embeddings for Training In-the-Wild using Analysis-by-Synthesis | Dominik Borer et.al. | 2411.08603 | null |
| 2024-11-13 | DG-SLAM: Robust Dynamic Gaussian Splatting SLAM with Hybrid Pose Optimization | Yueming Xu et.al. | 2411.08373 | null |
| 2024-11-16 | RINO: Accurate, Robust Radar-Inertial Odometry with Non-Iterative Estimation | Shuocheng Yang et.al. | 2411.07699 | link |
| 2024-11-12 | Human Arm Pose Estimation with a Shoulder-worn Force-Myography Device for Human-Robot Interaction | Rotem Atari et.al. | 2411.07644 | null |
| 2024-11-12 | Towards Seamless Integration of Magnetic Tracking into Fluoroscopy-guided Interventions | Shuwei Xing et.al. | 2411.07495 | null |
| 2024-11-08 | Acoustic-based 3D Human Pose Estimation Robust to Human Position | Yusuke Oumi et.al. | 2411.07165 | null |
| 2024-11-11 | CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models | Junho Kim et.al. | 2411.06869 | null |
| 2024-11-11 | GenZ-ICP: Generalizable and Degeneracy-Robust LiDAR Odometry Using an Adaptive Weighting | Daehan Lee et.al. | 2411.06766 | null |
| 2024-11-11 | GTA-Net: An IoT-Integrated 3D Human Pose Estimation System for Real-Time Adolescent Sports Posture Correction | Shizhe Yuan et.al. | 2411.06725 | null |
| 2024-11-10 | Magnetic Field Aided Vehicle Localization with Acceleration Correction | Mrunmayee Deshpande et.al. | 2411.06543 | null |
| 2024-11-10 | Visuotactile-Based Learning for Insertion with Compliant Hands | Osher Azulay et.al. | 2411.06408 | null |
| 2024-11-08 | Poze: Sports Technique Feedback under Data Constraints | Agamdeep Singh et.al. | 2411.05734 | null |
| 2024-11-08 | DeepArUco++: Improved detection of square fiducial markers in challenging lighting conditions | Rafael Berral-Soler et.al. | 2411.05552 | link |
| 2024-11-08 | Tightly-Coupled, Speed-aided Monocular Visual-Inertial Localization in Topological Map | Chanuk Yang et.al. | 2411.05497 | null |
| 2024-11-08 | Relative Pose Estimation for Nonholonomic Robot Formation with UWB-IO Measurements | Kunrui Ze et.al. | 2411.05481 | null |
| 2024-11-07 | Social EgoMesh Estimation | Luca Scofano et.al. | 2411.04598 | link |
| 2024-11-07 | Pose2Trajectory: Using Transformers on Body Pose to Predict Tennis Player’s Trajectory | Ali K. AlShami et.al. | 2411.04501 | null |
| 2024-11-07 | SuperQ-GRASP: Superquadrics-based Grasp Pose Estimation on Larger Objects for Mobile-Manipulation | Xun Tu et.al. | 2411.04386 | null |
| 2024-11-08 | GS2Pose: Two-stage 6D Object Pose Estimation Guided by Gaussian Splatting | Jilan Mei et.al. | 2411.03807 | null |
| 2024-11-06 | Estimation of Psychosocial Work Environment Exposures Through Video Object Detection. Proof of Concept Using CCTV Footage | Claus D. Hansen et.al. | 2411.03724 | null |
| 2024-11-05 | Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data | Seunggeun Chi et.al. | 2411.03561 | null |
| 2024-11-05 | HFGaussian: Learning Generalizable Gaussian Human with Integrated Human Features | Arnab Dey et.al. | 2411.03086 | null |
| 2024-11-04 | Semantic Masking and Visual Feature Matching for Robust Localization | Luisa Mao et.al. | 2411.01804 | null |
| 2024-11-03 | Activating Self-Attention for Multi-Scene Absolute Pose Regression | Miso Lee et.al. | 2411.01443 | link |
| 2024-11-04 | 3D Equivariant Pose Regression via Direct Wigner-D Harmonics Prediction | Jongmin Lee et.al. | 2411.00543 | null |
| 2024-10-31 | Whole-Herd Elephant Pose Estimation from Drone Data for Collective Behavior Analysis | Brody McNutt et.al. | 2411.00196 | null |
| 2024-10-31 | No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images | Botao Ye et.al. | 2410.24207 | link |
| 2024-11-06 | SceneComplete: Open-World 3D Scene Completion in Complex Real World Environments for Robot Manipulation | Aditya Agarwal et.al. | 2410.23643 | null |
| 2024-10-30 | SCRREAM: SCan, Register, REnder And Map: A Framework for Annotating Accurate and Dense 3D Indoor Scenes with a Benchmark | HyunJun Jung et.al. | 2410.22715 | null |
| 2024-10-29 | LiVisSfM: Accurate and Robust Structure-from-Motion with LiDAR and Visual Cues | Hanqing Jiang et.al. | 2410.22213 | null |
| 2024-10-29 | PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting | Sunghwan Hong et.al. | 2410.22128 | link |
| 2024-10-29 | HRPVT: High-Resolution Pyramid Vision Transformer for medium and small-scale human pose estimation | Zhoujie Xu et.al. | 2410.22079 | null |
| 2024-10-29 | EI-Nexus: Towards Unmediated and Flexible Inter-Modality Local Feature Extraction and Matching for Event-Image Data | Zhonghua Yi et.al. | 2410.21743 | null |
| 2024-10-28 | Synthetica: Large Scale Synthetic Data for Robot Perception | Ritvik Singh et.al. | 2410.21153 | null |
| 2024-10-29 | BLAPose: Enhancing 3D Human Pose Estimation with Bone Length Adjustment | Chih-Hsiang Hsu et.al. | 2410.20731 | link |
| 2024-11-01 | RopeTP: Global Human Motion Recovery via Integrating Robust Pose Estimation with Diffusion Trajectory Prior | Mingjiang Liang et.al. | 2410.20358 | null |
Visual Localization
| Publish Date | Title | Authors | Code | |
|---|---|---|---|---|
| 2024-12-06 | DAug: Diffusion-based Channel Augmentation for Radiology Image Retrieval and Classification | Ying Jin et.al. | 2412.04828 | null |
| 2024-12-04 | Distillation of Diffusion Features for Semantic Correspondence | Frank Fundel et.al. | 2412.03512 | null |
| 2024-12-04 | Composed Image Retrieval for Training-Free Domain Conversion | Nikos Efthymiadis et.al. | 2412.03297 | link |
| 2024-12-03 | A Minimalistic 3D Self-Organized UAV Flocking Approach for Desert Exploration | Thulio Amorim et.al. | 2412.02881 | null |
| 2024-12-03 | Active Learning via Classifier Impact and Greedy Selection for Interactive Image Retrieval | Leah Bar et.al. | 2412.02310 | link |
| 2024-12-02 | Multi-View 3D Reconstruction using Knowledge Distillation | Aditya Dutt et.al. | 2412.02039 | link |
| 2024-12-02 | Optimizing Domain-Specific Image Retrieval: A Benchmark of FAISS and Annoy with Fine-Tuned Features | MD Shaikh Rahman et.al. | 2412.01555 | null |
| 2024-12-02 | Neuron Abandoning Attention Flow: Visual Explanation of Dynamics inside CNN Models | Yi Liao et.al. | 2412.01202 | null |
| 2024-12-01 | EDTformer: An Efficient Decoder Transformer for Visual Place Recognition | Tong Jin et.al. | 2412.00784 | null |
| 2024-11-28 | EFSA: Episodic Few-Shot Adaptation for Text-to-Image Retrieval | Muhammad Huzaifa et.al. | 2412.00139 | null |
| 2024-11-28 | Unleashing the Power of Data Synthesis in Visual Localization | Sihang Li et.al. | 2412.00138 | null |
| 2024-11-28 | Relation-Aware Meta-Learning for Zero-shot Sketch-Based Image Retrieval | Yang Liu et.al. | 2412.00120 | null |
| 2024-11-29 | A Visual-inertial Localization Algorithm using Opportunistic Visual Beacons and Dead-Reckoning for GNSS-Denied Large-scale Applications | Liqiang Zhang et.al. | 2411.19845 | null |
| 2024-11-27 | Optimizing Image Retrieval with an Extended b-Metric Space | Abdelkader Belhenniche et.al. | 2411.18800 | null |
| 2024-11-26 | Learning Visual Hierarchies with Hyperbolic Embeddings | Ziwei Wang et.al. | 2411.17490 | null |
| 2024-12-02 | Imagine and Seek: Improving Composed Image Retrieval with an Imagined Proxy | You Li et.al. | 2411.16752 | null |
| 2024-12-02 | AnySynth: Harnessing the Power of Image Synthetic Data Generation for Generalized Vision-Language Tasks | You Li et.al. | 2411.16749 | null |
| 2024-11-25 | Image Generation Diversity Issues and How to Tame Them | Mischa Dombrowski et.al. | 2411.16171 | link |
| 2024-11-24 | PG-SLAM: Photo-realistic and Geometry-aware RGB-D SLAM in Dynamic Environments | Haoang Li et.al. | 2411.15800 | null |
| 2024-11-22 | Cross-Modal Pre-Aligned Method with Global and Local Information for Remote-Sensing Image and Text Retrieval | Zengbao Sun et.al. | 2411.14704 | null |
| 2024-11-20 | Globally Correlation-Aware Hard Negative Generation | Wenjie Peng et.al. | 2411.13145 | link |
| 2024-11-18 | Exploring Emerging Trends and Research Opportunities in Visual Place Recognition | Antonios Gasteratos et.al. | 2411.11481 | null |
| 2024-11-13 | OSMLoc: Single Image-Based Visual Localization in OpenStreetMap with Geometric and Semantic Guidances | Youqi Liao et.al. | 2411.08665 | link |
| 2024-11-13 | Hopfield-Fenchel-Young Networks: A Unified Framework for Associative Memory Retrieval | Saul Santos et.al. | 2411.08590 | link |
| 2024-11-22 | Saliency Map-based Image Retrieval using Invariant Krawtchouk Moments | Ashkan Nejad et.al. | 2411.08567 | link |
| 2024-11-13 | MBA-SLAM: Motion Blur Aware Dense Visual SLAM with Radiance Fields Representation | Peng Wang et.al. | 2411.08279 | link |
| 2024-11-05 | From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing | Xintian Sun et.al. | 2411.05826 | null |
| 2024-11-04 | TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives | Maitreya Patel et.al. | 2411.02545 | null |
| 2024-11-11 | INQUIRE: A Natural World Text-to-Image Retrieval Benchmark | Edward Vendrow et.al. | 2411.02537 | link |
| 2024-11-20 | Exploiting Contextual Uncertainty of Visual Data for Efficient Training of Deep Models | Sharat Agarwal et.al. | 2411.01925 | null |
| 2024-11-04 | Semantic Masking and Visual Feature Matching for Robust Localization | Luisa Mao et.al. | 2411.01804 | null |
| 2024-11-03 | Efficient Medical Image Retrieval Using DenseNet and FAISS for BIRADS Classification | MD Shaikh Rahman et.al. | 2411.01473 | null |
| 2024-11-01 | Identifying Implicit Social Biases in Vision-Language Models | Kimia Hamidieh et.al. | 2411.00997 | null |
| 2024-10-31 | Nearest Neighbor Normalization Improves Multimodal Retrieval | Neil Chowdhury et.al. | 2410.24114 | link |
| 2024-10-31 | MoTaDual: Modality-Task Dual Alignment for Enhanced Zero-shot Composed Image Retrieval | Haiwen Li et.al. | 2410.23736 | null |
| 2024-10-30 | Decoupling Semantic Similarity from Spatial Alignment for Neural Networks | Tassilo Wald et.al. | 2410.23107 | null |
| 2024-10-29 | Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications | Monica Riedler et.al. | 2410.21943 | link |
| 2024-10-28 | NYC-Event-VPR: A Large-Scale High-Resolution Event-Based Visual Place Recognition Dataset in Dense Urban Environments | Taiyi Pan et.al. | 2410.21615 | null |
| 2024-10-25 | Context-Based Visual-Language Place Recognition | Soojin Woo et.al. | 2410.19341 | link |
Keypoint Detection
| Publish Date | Title | Authors | Code | |
|---|---|---|---|---|
| 2024-12-24 | GIMS: Image Matching System Based on Adaptive Graph Construction and Graph Neural Network | Xianfeng Song et.al. | 2412.18221 | link |
| 2024-12-21 | A Novel Approach to Tomato Harvesting Using a Hybrid Gripper with Semantic Segmentation and Keypoint Detection | Shahid Ansari et.al. | 2412.16755 | null |
| 2024-12-19 | Corn Ear Detection and Orientation Estimation Using Deep Learning | Nathan Sprague et.al. | 2412.14954 | null |
| 2024-12-12 | Agtech Framework for Cranberry-Ripening Analysis Using Vision Foundation Models | Faith Johnson et.al. | 2412.09739 | null |
| 2024-12-09 | An Efficient Scene Coordinate Encoding and Relocalization Method | Kuan Xu et.al. | 2412.06488 | link |
| 2024-12-09 | ZeroKey: Point-Level Reasoning and Zero-Shot 3D Keypoint Detection from Large Language Models | Bingchen Gong et.al. | 2412.06292 | null |
| 2024-12-07 | Securing Social Media Against Deepfakes using Identity, Behavioral, and Geometric Signatures | Muhammad Umar Farooq et.al. | 2412.05487 | null |
| 2024-12-04 | Measure Anything: Real-time, Multi-stage Vision-based Dimensional Measurement using Segment Anything | Yongkyu Lee et.al. | 2412.03472 | link |
| 2024-12-02 | MamKPD: A Simple Mamba Baseline for Real-Time 2D Keypoint Detection | Yonghao Dang et.al. | 2412.01422 | null |
| 2024-11-23 | OCDet: Object Center Detection via Bounding Box-Aware Heatmap Prediction on Edge Devices with NPUs | Chen Xin et.al. | 2411.15653 | link |
| 2024-11-19 | IoT-Based 3D Pose Estimation and Motion Optimization for Athletes: Application of C3D and OpenPose | Fei Ren et.al. | 2411.12676 | null |
| 2024-11-04 | Silver medal Solution for Image Matching Challenge 2024 | Yian Wang et.al. | 2411.01851 | null |
| 2024-11-04 | KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension | Jie Yang et.al. | 2411.01846 | null |
| 2024-10-31 | From Web Data to Real Fields: Low-Cost Unsupervised Domain Adaptation for Agricultural Robots | Vasileios Tzouras et.al. | 2410.23906 | null |
| 2024-10-04 | Self-Supervised Keypoint Detection with Distilled Depth Keypoint Representation | Aman Anand et.al. | 2410.14700 | null |
| 2024-11-27 | Sim2real Cattle Joint Estimation in 3D point clouds | Mohammad Okour et.al. | 2410.14419 | null |
| 2024-10-16 | PND-Net: Plant Nutrition Deficiency and Disease Classification using Graph Convolutional Network | Asish Bera et.al. | 2410.12742 | null |
| 2024-10-16 | RAFA-Net: Region Attention Network For Food Items And Agricultural Stress Recognition | Asish Bera et.al. | 2410.12718 | null |
| 2024-10-11 | Facial Chick Sexing: An Automated Chick Sexing System From Chick Facial Image | Marta Veganzones Rodriguez et.al. | 2410.09155 | null |
| 2024-10-08 | Unsupervised Model Diagnosis | Yinong Oliver Wang et.al. | 2410.06243 | null |
| 2024-10-08 | Equi-GSPR: Equivariant SE(3) Graph Network Model for Sparse Point Cloud Registration | Xueyang Kang et.al. | 2410.05729 | link |
2024-12
Pose Estimation
| Publish Date | Title | Authors | Code | |
|---|---|---|---|---|
| 2025-01-03 | TCPFormer: Learning Temporal Correlation with Implicit Pose Proxy for 3D Human Pose Estimation | Jiajie Liu et.al. | 2501.01770 | null |
| 2025-01-03 | Laparoscopic Scene Analysis for Intraoperative Visualisation of Gamma Probe Signals in Minimally Invasive Cancer Surgery | Baoru Huang et.al. | 2501.01752 | null |
| 2025-01-02 | On Unifying Video Generation and Camera Pose Estimation | Chun-Hao Paul Huang et.al. | 2501.01409 | null |
| 2025-01-02 | L3D-Pose: Lifting Pose for 3D Avatars from a Single Camera in the Wild | Soumyaratna Debnath et.al. | 2501.01174 | null |
| 2024-12-31 | Relative Pose Observability Analysis Using Dual Quaternions | Nicholas B. Andrews et.al. | 2501.00657 | null |
| 2024-12-31 | VinT-6D: A Large-Scale Object-in-hand Dataset from Vision, Touch and Proprioception | Zhaoliang Wan et.al. | 2501.00510 | null |
| 2024-12-30 | Hierarchical Pose Estimation and Mapping with Multi-Scale Neural Feature Fields | Evgenii Kruzhkov et.al. | 2412.20976 | null |
| 2024-12-30 | ReFlow6D: Refraction-Guided Transparent Object 6D Pose Estimation via Intermediate Representation Learning | Hrishikesh Gupta et.al. | 2412.20830 | link |
| 2024-12-30 | Frequency-aware Event Cloud Network | Hongwei Ren et.al. | 2412.20803 | null |
| 2024-12-30 | KeyGS: A Keyframe-Centric Gaussian Splatting Method for Monocular Image Sequences | Keng-Wei Chang et.al. | 2412.20767 | null |
| 2024-12-30 | Towards nation-wide analytical healthcare infrastructures: A privacy-preserving augmented knee rehabilitation case study | Boris Bačić et.al. | 2412.20733 | null |
| 2024-12-29 | Exploiting Aggregation and Segregation of Representations for Domain Adaptive Human Pose Estimation | Qucheng Peng et.al. | 2412.20538 | link |
| 2024-12-28 | MambaVO: Deep Visual Odometry Based on Sequential Matching Refinement and Training Smoothing | Shuo Wang et.al. | 2412.20082 | null |
| 2024-12-28 | GSplatLoc: Ultra-Precise Camera Localization via 3D Gaussian Splatting | Atticus J. Zeller et.al. | 2412.20056 | link |
| 2024-12-27 | Optimizing Local-Global Dependencies for Accurate 3D Human Pose Estimation | Guangsheng Xu et.al. | 2412.19676 | link |
| 2024-12-27 | Dust to Tower: Coarse-to-Fine Photo-Realistic Scene Reconstruction from Sparse Uncalibrated Images | Xudong Cai et.al. | 2412.19518 | null |
| 2024-12-26 | Humans as a Calibration Pattern: Dynamic 3D Scene Reconstruction from Unsynchronized and Uncalibrated Videos | Changwoon Choi et.al. | 2412.19089 | null |
| 2024-12-23 | Reconstructing People, Places, and Cameras | Lea Müller et.al. | 2412.17806 | null |
| 2024-12-22 | Leveraging Consistent Spatio-Temporal Correspondence for Robust Visual Odometry | Zhaoxing Zhang et.al. | 2412.16923 | null |
| 2024-12-21 | EasyVis2: A Real Time Multi-view 3D Visualization for Laparoscopic Surgery Training Enhanced by a Deep Neural Network YOLOv8-Pose | Yung-Hong Sun et.al. | 2412.16742 | null |
| 2024-12-21 | FACTS: Fine-Grained Action Classification for Tactical Sports | Christopher Lai et.al. | 2412.16454 | null |
| 2024-12-20 | Can Generative Video Models Help Pose Estimation? | Ruojin Cai et.al. | 2412.16155 | null |
| 2024-12-20 | Monkey Transfer Learning Can Improve Human Pose Estimation | Bradley Scott et.al. | 2412.15966 | null |
| 2024-12-19 | Scaling 4D Representations | João Carreira et.al. | 2412.15212 | null |
| 2024-12-13 | IMPROVE: Impact of Mobile Phones on Remote Online Virtual Education | Roberto Daza et.al. | 2412.14195 | link |
| 2024-12-18 | Level-Set Parameters: Novel Representation for 3D Shape Analysis | Huan Lei et.al. | 2412.13502 | null |
| 2024-12-18 | Pre-training a Density-Aware Pose Transformer for Robust LiDAR-based 3D Human Pose Estimation | Xiaoqi An et.al. | 2412.13454 | null |
| 2024-12-17 | CondiMen: Conditional Multi-Person Mesh Recovery | Brégier Romain et.al. | 2412.13058 | null |
| 2024-12-17 | ShotVL: Human-Centric Highlight Frame Retrieval via Language Queries | Wangyu Xue et.al. | 2412.12675 | null |
| 2024-12-16 | Category Level 6D Object Pose Estimation from a Single RGB Image using Diffusion | Adam Bethell et.al. | 2412.11420 | null |
| 2024-12-13 | ExeChecker: Where Did I Go Wrong? | Yiwen Gu et.al. | 2412.10573 | null |
| 2024-12-11 | CUPS: Improving Human Pose-Shape Estimators with Conformalized Deep Uncertainty | Harry Zhang et.al. | 2412.10431 | null |
| 2024-12-13 | RP-SLAM: Real-time Photorealistic SLAM with Efficient 3D Gaussian Splatting | Lizhi Bai et.al. | 2412.09868 | null |
| 2024-12-12 | Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos | Linyi Jin et.al. | 2412.09621 | null |
| 2024-12-12 | FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction | Jiale Xu et.al. | 2412.09573 | null |
| 2024-12-11 | BLADE: Single-view Body Mesh Learning through Accurate Depth Estimation | Shengze Wang et.al. | 2412.08640 | null |
| 2024-12-12 | Drift-free Visual SLAM using Digital Twins | Roxane Merat et.al. | 2412.08496 | null |
| 2024-12-11 | Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization | Siyan Dong et.al. | 2412.08376 | null |
| 2024-12-10 | LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation Models | Ziqi Lu et.al. | 2412.07746 | null |
| 2024-12-09 | MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds | Zhenggang Tang et.al. | 2412.06974 | null |
| 2024-12-09 | An Efficient Scene Coordinate Encoding and Relocalization Method | Kuan Xu et.al. | 2412.06488 | link |
| 2024-12-09 | Attention-Enhanced Lightweight Hourglass Network for Human Pose Estimation | Marsha Mariya Kappan et.al. | 2412.06227 | null |
| 2024-12-06 | CCS: Continuous Learning for Customized Incremental Wireless Sensing Services | Qunhang Fu et.al. | 2412.04821 | null |
| 2024-12-05 | ProPLIKS: Probablistic 3D human body pose estimation | Karthik Shetty et.al. | 2412.04665 | null |
| 2024-12-05 | DualPM: Dual Posed-Canonical Point Maps for 3D Shape and Pose Reconstruction | Ben Kaye et.al. | 2412.04464 | null |
| 2024-12-05 | Targeted Hard Sample Synthesis Based on Estimated Pose and Occlusion Error for Improved Object Pose Estimation | Alan Li et.al. | 2412.04279 | null |
| 2024-12-04 | Sparse-view Pose Estimation and Reconstruction via Analysis by Generative Synthesis | Qitao Zhao et.al. | 2412.03570 | null |
| 2024-12-06 | NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images | Lingen Li et.al. | 2412.03517 | null |
| 2024-12-05 | A Bidirectional Siamese Recurrent Neural Network for Accurate Gait Recognition Using Body Landmarks | Proma Hossain Progga et.al. | 2412.03498 | null |
| 2024-12-04 | MCVO: A Generic Visual Odometry for Arbitrarily Arranged Multi-Cameras | Huai Yu et.al. | 2412.03146 | link |
| 2024-12-04 | An indoor DSO-based ceiling-vision odometry system for indoor industrial environments | Abdelhak Bougouffa et.al. | 2412.02950 | null |
| 2024-12-03 | EgoCast: Forecasting Egocentric Human Pose in the Wild | Maria Escobar et.al. | 2412.02903 | null |
| 2024-12-02 | emg2pose: A Large and Diverse Benchmark for Surface Electromyographic Hand Pose Estimation | Sasha Salter et.al. | 2412.02725 | null |
| 2024-12-03 | ProbPose: A Probabilistic Approach to 2D Human Pose Estimation | Miroslav Purkrabek et.al. | 2412.02254 | null |
| 2024-12-03 | Cascaded Multi-Scale Attention for Enhanced Multi-Scale Feature Extraction and Interaction with Low-Resolution Images | Xiangyong Lu et.al. | 2412.02197 | link |
| 2024-12-03 | CLERF: Contrastive LEaRning for Full Range Head Pose Estimation | Ting-Ruen Wei et.al. | 2412.02066 | null |
| 2024-12-02 | Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle | Miroslav Purkrabek et.al. | 2412.01562 | link |
| 2024-12-02 | 6DOPE-GS: Online 6D Object Pose Estimation using Gaussian Splatting | Yufeng Jin et.al. | 2412.01543 | null |
| 2024-12-02 | HandOS: 3D Hand Reconstruction in One Stage | Xingyu Chen et.al. | 2412.01537 | null |
| 2024-12-02 | SF-Loc: A Visual Mapping and Geo-Localization System based on Sparse Visual Structure Frames | Yuxuan Zhou et.al. | 2412.01500 | null |
| 2024-12-02 | MamKPD: A Simple Mamba Baseline for Real-Time 2D Keypoint Detection | Yonghao Dang et.al. | 2412.01422 | null |
| 2024-12-02 | Cross-Modal Visual Relocalization in Prior LiDAR Maps Utilizing Intensity Textures | Qiyuan Shen et.al. | 2412.01299 | null |
| 2024-12-02 | CRISP: Object Pose and Shape Estimation with Test-Time Adaptation | Jingnan Shi et.al. | 2412.01052 | null |
| 2024-11-29 | Diorama: Unleashing Zero-shot Single-view 3D Scene Modeling | Qirui Wu et.al. | 2411.19492 | null |
| 2024-11-29 | Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning | Yang You et.al. | 2411.19458 | null |
| 2024-11-28 | GMS-VINS: Multi-category Dynamic Objects Semantic Segmentation for Enhanced Visual-Inertial Odometry Using a Promptable Foundation Model | Rui Zhou et.al. | 2411.19289 | null |
| 2024-11-28 | HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos | Prithviraj Banerjee et.al. | 2411.19167 | null |
| 2024-11-28 | Lost & Found: Updating Dynamic 3D Scene Graphs from Egocentric Observations | Tjark Behrens et.al. | 2411.19162 | null |
| 2024-11-28 | Distributed Dual Quaternion Extended Kalman Filtering for Spacecraft Pose Estimation | Mathias Hudoba de Badyn et.al. | 2411.19033 | null |
| 2024-11-28 | Waterfall Transformer for Multi-person Pose Estimation | Navin Ranjan et.al. | 2411.18944 | null |
| 2024-12-02 | AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers | Sherwin Bahmani et.al. | 2411.18673 | null |
| 2024-11-27 | XR-MBT: Multi-modal Full Body Tracking for XR through Self-Supervision with Learned Depth Point Cloud Registration | Denys Rozumnyi et.al. | 2411.18377 | null |
| 2024-11-26 | RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training | Raktim Gautam Goswami et.al. | 2411.17662 | null |
Visual Localization
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-01-17 | FLORA: Formal Language Model Enables Robust Training-free Zero-shot Object Referring Analysis | Zhe Chen et.al. | 2501.09887 | null |
| 2025-01-15 | Vision Foundation Models for Computed Tomography | Suraj Pai et.al. | 2501.09001 | null |
| 2025-01-12 | SCOT: Self-Supervised Contrastive Pretraining For Zero-Shot Compositional Retrieval | Bhavin Jawade et.al. | 2501.08347 | null |
| 2025-01-12 | Static Segmentation by Tracking: A Frustratingly Label-Efficient Approach to Fine-Grained Segmentation | Zhenyang Feng et.al. | 2501.06749 | null |
| 2025-01-06 | Integrating Language-Image Prior into EEG Decoding for Cross-Task Zero-Calibration RSVP-BCI | Xujin Li et.al. | 2501.02841 | null |
| 2025-01-03 | iCBIR-Sli: Interpretable Content-Based Image Retrieval with 2D Slice Embeddings | Shuhei Tomoshige et.al. | 2501.01642 | null |
| 2025-01-02 | R-SCoRe: Revisiting Scene Coordinate Regression for Robust Large-Scale Visual Localization | Xudong Jiang et.al. | 2501.01421 | null |
| 2025-01-02 | Training Medical Large Vision-Language Models with Abnormal-Aware Feedback | Yucheng Zhou et.al. | 2501.01377 | null |
| 2025-01-02 | Domain-invariant feature learning in brain MR imaging for content-based image retrieval | Shuya Tobari et.al. | 2501.01326 | null |
| 2024-12-28 | GSplatLoc: Ultra-Precise Camera Localization via 3D Gaussian Splatting | Atticus J. Zeller et.al. | 2412.20056 | link |
| 2024-12-25 | FOR: Finetuning for Object Level Open Vocabulary Image Retrieval | Hila Levi et.al. | 2412.18806 | null |
| 2024-12-24 | ERVD: An Efficient and Robust ViT-Based Distillation Framework for Remote Sensing Image Retrieval | Le Dong et.al. | 2412.18136 | link |
| 2024-12-22 | Where am I? Cross-View Geo-localization with Natural Language Descriptions | Junyan Ye et.al. | 2412.17007 | null |
| 2024-12-24 | Open-Vocabulary Mobile Manipulation Based on Double Relaxed Contrastive Learning with Dense Labeling | Daichi Yashima et.al. | 2412.16576 | link |
| 2024-12-20 | A New Method to Capturing Compositional Knowledge in Linguistic Space | Jiahe Wan et.al. | 2412.15632 | null |
| 2024-12-20 | Stabilizing Laplacian Inversion in Fokker-Planck Image Retrieval using the Transport-of-Intensity Equation | Samantha J Alloo et.al. | 2412.15513 | null |
| 2024-12-19 | Learning Visual Composition through Improved Semantic Guidance | Austin Stone et.al. | 2412.15396 | null |
| 2024-12-19 | MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval | Junjie Zhou et.al. | 2412.14475 | null |
| 2024-12-18 | Adversarial Hubness in Multi-Modal Retrieval | Tingwei Zhang et.al. | 2412.14113 | link |
| 2024-12-18 | Maybe you are looking for CroQS: Cross-modal Query Suggestion for Text-to-Image Retrieval | Giacomo Pacini et.al. | 2412.13834 | null |
| 2024-12-18 | ConDo: Continual Domain Expansion for Absolute Pose Regression | Zijun Li et.al. | 2412.13452 | link |
| 2024-12-17 | Three Things to Know about Deep Metric Learning | Yash Patel et.al. | 2412.12432 | null |
| 2024-12-15 | Leveraging Large Vision-Language Model as User Intent-aware Encoder for Composed Image Retrieval | Zelong Sun et.al. | 2412.11087 | null |
| 2024-12-20 | Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval | Yuanmin Tang et.al. | 2412.11077 | null |
| 2024-12-13 | MVC-VPR: Mutual Learning of Viewpoint Classification and Visual Place Recognition | Qiwen Gu et.al. | 2412.09199 | null |
| 2024-12-12 | A Flexible Plug-and-Play Module for Generating Variable-Length | Liyang He et.al. | 2412.08922 | link |
| 2024-12-11 | Image Retrieval Methods in the Dissimilarity Space | Madhu Kiran et.al. | 2412.08618 | null |
| 2024-12-11 | Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization | Siyan Dong et.al. | 2412.08376 | null |
| 2024-12-11 | Intelligent Control of Robotic X-ray Devices using a Language-promptable Digital Twin | Benjamin D. Killeen et.al. | 2412.08020 | null |
| 2024-12-10 | On Motion Blur and Deblurring in Visual Place Recognition | Timur Ismagilov et.al. | 2412.07751 | null |
| 2024-12-10 | Image Retrieval with Intra-Sweep Representation Learning for Neck Ultrasound Scanning Guidance | Wanwen Chen et.al. | 2412.07741 | null |
| 2024-12-09 | An Efficient Scene Coordinate Encoding and Relocalization Method | Kuan Xu et.al. | 2412.06488 | link |
| 2024-12-09 | A Hyperdimensional One Place Signature to Represent Them All: Stackable Descriptors For Visual Place Recognition | Connor Malone et.al. | 2412.06153 | null |
| 2024-12-07 | Compositional Image Retrieval via Instruction-Aware Contrastive Learning | Wenliang Zhong et.al. | 2412.05756 | null |
| 2024-12-06 | DAug: Diffusion-based Channel Augmentation for Radiology Image Retrieval and Classification | Ying Jin et.al. | 2412.04828 | null |
| 2024-12-04 | Distillation of Diffusion Features for Semantic Correspondence | Frank Fundel et.al. | 2412.03512 | null |
| 2024-12-04 | Composed Image Retrieval for Training-Free Domain Conversion | Nikos Efthymiadis et.al. | 2412.03297 | link |
| 2024-12-03 | A Minimalistic 3D Self-Organized UAV Flocking Approach for Desert Exploration | Thulio Amorim et.al. | 2412.02881 | null |
| 2024-12-03 | Active Learning via Classifier Impact and Greedy Selection for Interactive Image Retrieval | Leah Bar et.al. | 2412.02310 | link |
| 2024-12-02 | Mutli-View 3D Reconstruction using Knowledge Distillation | Aditya Dutt et.al. | 2412.02039 | link |
| 2024-12-02 | Optimizing Domain-Specific Image Retrieval: A Benchmark of FAISS and Annoy with Fine-Tuned Features | MD Shaikh Rahman et.al. | 2412.01555 | null |
| 2024-12-02 | Neuron Abandoning Attention Flow: Visual Explanation of Dynamics inside CNN Models | Yi Liao et.al. | 2412.01202 | null |
| 2024-12-01 | EDTformer: An Efficient Decoder Transformer for Visual Place Recognition | Tong Jin et.al. | 2412.00784 | null |
| 2024-11-28 | EFSA: Episodic Few-Shot Adaptation for Text-to-Image Retrieval | Muhammad Huzaifa et.al. | 2412.00139 | null |
| 2024-11-28 | Unleashing the Power of Data Synthesis in Visual Localization | Sihang Li et.al. | 2412.00138 | null |
| 2024-11-28 | Relation-Aware Meta-Learning for Zero-shot Sketch-Based Image Retrieval | Yang Liu et.al. | 2412.00120 | null |
| 2024-11-29 | A Visual-inertial Localization Algorithm using Opportunistic Visual Beacons and Dead-Reckoning for GNSS-Denied Large-scale Applications | Liqiang Zhang et.al. | 2411.19845 | null |
| 2024-11-27 | Optimizing Image Retrieval with an Extended b-Metric Space | Abdelkader Belhenniche et.al. | 2411.18800 | null |
| 2024-11-26 | Learning Visual Hierarchies with Hyperbolic Embeddings | Ziwei Wang et.al. | 2411.17490 | null |
| 2024-12-02 | Imagine and Seek: Improving Composed Image Retrieval with an Imagined Proxy | You Li et.al. | 2411.16752 | null |
| 2024-12-02 | AnySynth: Harnessing the Power of Image Synthetic Data Generation for Generalized Vision-Language Tasks | You Li et.al. | 2411.16749 | null |
| 2024-11-25 | Image Generation Diversity Issues and How to Tame Them | Mischa Dombrowski et.al. | 2411.16171 | link |
| 2024-11-24 | PG-SLAM: Photo-realistic and Geometry-aware RGB-D SLAM in Dynamic Environments | Haoang Li et.al. | 2411.15800 | null |
| 2024-11-22 | Cross-Modal Pre-Aligned Method with Global and Local Information for Remote-Sensing Image and Text Retrieval | Zengbao Sun et.al. | 2411.14704 | null |
| 2024-11-20 | Globally Correlation-Aware Hard Negative Generation | Wenjie Peng et.al. | 2411.13145 | link |
| 2024-11-18 | Exploring Emerging Trends and Research Opportunities in Visual Place Recognition | Antonios Gasteratos et.al. | 2411.11481 | null |
Keypoint Detection
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-02-23 | Rewards-based image analysis in microscopy | Kamyar Barakati et.al. | 2502.18522 | null |
| 2025-02-19 | 2.5D U-Net with Depth Reduction for 3D CryoET Object Identification | Yusuke Uchida et.al. | 2502.13484 | link |
| 2025-01-30 | Transfer Learning for Keypoint Detection in Low-Resolution Thermal TUG Test Images | Wei-Lun Chen et.al. | 2501.18453 | null |
| 2025-01-30 | Video-based Surgical Tool-tip and Keypoint Tracking using Multi-frame Context-driven Deep Learning Models | Bhargav Ghanekar et.al. | 2501.18361 | null |
| 2025-01-30 | Lifelong 3D Mapping Framework for Hand-held & Robot-mounted LiDAR Mapping Systems | Liudi Yang et.al. | 2501.18110 | null |
| 2025-01-21 | Keypoint Detection Empowered Near-Field User Localization and Channel Reconstruction | Mengyuan Li et.al. | 2501.11844 | null |
| 2025-01-20 | MIFNet: Learning Modality-Invariant Features for Generalizable Multimodal Image Matching | Yepeng Liu et.al. | 2501.11299 | null |
| 2025-01-19 | Refinement Module based on Parse Graph of Feature Map for Human Pose Estimation | Shibang Liu et.al. | 2501.11069 | null |
| 2025-01-13 | Empirical Comparison of Four Stereoscopic Depth Sensing Cameras for Robotics Applications | Lukas Rustler et.al. | 2501.07421 | null |
| 2025-01-13 | Efficiently Closing Loops in LiDAR-Based SLAM Using Point Cloud Density Maps | Saurabh Gupta et.al. | 2501.07399 | null |
| 2024-12-24 | GIMS: Image Matching System Based on Adaptive Graph Construction and Graph Neural Network | Xianfeng Song et.al. | 2412.18221 | link |
| 2024-12-21 | A Novel Approach to Tomato Harvesting Using a Hybrid Gripper with Semantic Segmentation and Keypoint Detection | Shahid Ansari et.al. | 2412.16755 | null |
| 2024-12-19 | Corn Ear Detection and Orientation Estimation Using Deep Learning | Nathan Sprague et.al. | 2412.14954 | null |
| 2024-12-12 | Agtech Framework for Cranberry-Ripening Analysis Using Vision Foundation Models | Faith Johnson et.al. | 2412.09739 | null |
| 2024-12-09 | An Efficient Scene Coordinate Encoding and Relocalization Method | Kuan Xu et.al. | 2412.06488 | link |
| 2024-12-09 | ZeroKey: Point-Level Reasoning and Zero-Shot 3D Keypoint Detection from Large Language Models | Bingchen Gong et.al. | 2412.06292 | null |
| 2024-12-07 | Securing Social Media Against Deepfakes using Identity, Behavioral, and Geometric Signatures | Muhammad Umar Farooq et.al. | 2412.05487 | null |
| 2024-12-04 | Measure Anything: Real-time, Multi-stage Vision-based Dimensional Measurement using Segment Anything | Yongkyu Lee et.al. | 2412.03472 | link |
| 2024-12-02 | MamKPD: A Simple Mamba Baseline for Real-Time 2D Keypoint Detection | Yonghao Dang et.al. | 2412.01422 | null |
| 2024-11-23 | OCDet: Object Center Detection via Bounding Box-Aware Heatmap Prediction on Edge Devices with NPUs | Chen Xin et.al. | 2411.15653 | link |
| 2024-11-19 | IoT-Based 3D Pose Estimation and Motion Optimization for Athletes: Application of C3D and OpenPose | Fei Ren et.al. | 2411.12676 | null |
| 2024-11-04 | Silver medal Solution for Image Matching Challenge 2024 | Yian Wang et.al. | 2411.01851 | null |
| 2024-11-04 | KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension | Jie Yang et.al. | 2411.01846 | null |
| 2024-10-31 | From Web Data to Real Fields: Low-Cost Unsupervised Domain Adaptation for Agricultural Robots | Vasileios Tzouras et.al. | 2410.23906 | null |
| 2024-11-27 | Sim2real Cattle Joint Estimation in 3D point clouds | Mohammad Okour et.al. | 2410.14419 | null |
| 2024-10-16 | PND-Net: Plant Nutrition Deficiency and Disease Classification using Graph Convolutional Network | Asish Bera et.al. | 2410.12742 | null |
| 2024-10-16 | RAFA-Net: Region Attention Network For Food Items And Agricultural Stress Recognition | Asish Bera et.al. | 2410.12718 | null |
| 2024-10-11 | Facial Chick Sexing: An Automated Chick Sexing System From Chick Facial Image | Marta Veganzones Rodriguez et.al. | 2410.09155 | null |
2025-1
Pose Estimation
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-02-08 | Vision-in-the-loop Simulation for Deep Monocular Pose Estimation of UAV in Ocean Environment | Maneesha Wickramasuriya et.al. | 2502.05409 | null |
| 2025-02-06 | Measuring Physical Plausibility of 3D Human Poses Using Physics Simulation | Nathan Louis et.al. | 2502.04483 | null |
| 2025-02-06 | GCE-Pose: Global Context Enhancement for Category-level Object Pose Estimation | Weihang Li et.al. | 2502.04293 | null |
| 2025-02-06 | Advanced Object Detection and Pose Estimation with Hybrid Task Cascade and High-Resolution Networks | Yuhui Jin et.al. | 2502.03877 | null |
| 2025-02-05 | Mapping and Localization Using LiDAR Fiducial Markers | Yibo Liu et.al. | 2502.03510 | null |
| 2025-02-04 | Diff9D: Diffusion-Based Domain-Generalized Category-Level 9-DoF Object Pose Estimation | Jian Liu et.al. | 2502.02525 | link |
| 2025-02-03 | CleanPose: Category-Level Object Pose Estimation via Causal Learning and Knowledge Distillation | Xiao Lin et.al. | 2502.01312 | null |
| 2025-02-03 | Enhancing Feature Tracking Reliability for Visual Navigation using Real-Time Safety Filter | Dabin Kim et.al. | 2502.01092 | null |
| 2025-02-03 | ZeroBP: Learning Position-Aware Correspondence for Zero-shot 6D Pose Estimation in Bin-Picking | Jianqiu Chen et.al. | 2502.01004 | null |
| 2025-01-31 | A Direct Semi-Exhaustive Search Method for Robust, Partial-to-Full Point Cloud Registration | Richard Cheng et.al. | 2502.00115 | null |
| 2025-01-31 | XRF V2: A Dataset for Action Summarization with Wi-Fi Signals, and IMUs in Phones, Watches, Earbuds, and Glasses | Bo Lan et.al. | 2501.19034 | link |
| 2025-01-30 | SimpleDepthPose: Fast and Reliable Human Pose Estimation with RGBD-Images | Daniel Bermuth et.al. | 2501.18478 | null |
| 2025-01-29 | Online Trajectory Replanner for Dynamically Grasping Irregular Objects | Minh Nhat Vu et.al. | 2501.17968 | null |
| 2025-01-28 | DebugAgent: Efficient and Interpretable Error Slice Discovery for Comprehensive Model Debugging | Muxi Chen et.al. | 2501.16751 | null |
| 2025-01-27 | Toward Efficient Generalization in 3D Human Pose Estimation via a Canonical Domain Approach | Hoosang Lee et.al. | 2501.16146 | null |
| 2025-01-27 | NanoHTNet: Nano Human Topology Network for Efficient 3D Human Pose Estimation | Jialun Cai et.al. | 2501.15763 | null |
| 2025-01-25 | Towards Better Robustness: Progressively Joint Pose-3DGS Learning for Arbitrarily Long Videos | Zhen-Hui Dong et.al. | 2501.15096 | null |
| 2025-01-25 | SpatioTemporal Learning for Human Pose Estimation in Sparsely-Labeled Videos | Yingying Jiao et.al. | 2501.15073 | null |
| 2025-01-24 | 3D/2D Registration of Angiograms using Silhouette-based Differentiable Rendering | Taewoong Lee et.al. | 2501.14918 | link |
| 2025-01-24 | Light3R-SfM: Towards Feed-forward Structure-from-Motion | Sven Elflein et.al. | 2501.14914 | null |
| 2025-01-24 | Glissando-Net: Deep sinGLe vIew category level poSe eStimation ANd 3D recOnstruction | Bo Sun et.al. | 2501.14896 | null |
| 2025-01-24 | Optimizing Grasping Precision for Industrial Pick-and-Place Tasks Through a Novel Visual Servoing Approach | Khairidine Benali et.al. | 2501.14557 | null |
| 2025-01-24 | LiDAR-Based Vehicle Detection and Tracking for Autonomous Racing | Marcello Cellina et.al. | 2501.14502 | null |
| 2025-01-24 | Optimizing Human Pose Estimation Through Focused Human and Joint Regions | Yingying Jiao et.al. | 2501.14439 | null |
| 2025-01-24 | Causal-Inspired Multitask Learning for Video-Based Human Pose Estimation | Haipeng Chen et.al. | 2501.14356 | null |
| 2025-01-24 | HAMMER: Heterogeneous, Multi-Robot Semantic Gaussian Splatting | Javier Yu et.al. | 2501.14147 | null |
| 2025-01-23 | Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass | Jianing Yang et.al. | 2501.13928 | null |
| 2025-01-23 | EgoHand: Ego-centric Hand Pose Estimation and Gesture Recognition with Head-mounted Millimeter-wave Radar and IMUs | Yizhe Lv et.al. | 2501.13805 | link |
| 2025-01-23 | VIGS SLAM: IMU-based Large-Scale 3D Gaussian Splatting SLAM | Gyuhyeon Pak et.al. | 2501.13402 | null |
| 2025-01-22 | Deep Learning-Based Image Recovery and Pose Estimation for Resident Space Objects | Louis Aberdeen et.al. | 2501.13009 | null |
| 2025-01-21 | BlanketGen2-Fit3D: Synthetic Blanket Augmentation Towards Improving Real-World In-Bed Blanket Occluded Human Pose Estimation | Tamás Karácsony et.al. | 2501.12318 | null |
| 2025-01-19 | Refinement Module based on Parse Graph of Feature Map for Human Pose Estimation | Shibang Liu et.al. | 2501.11069 | null |
| 2025-01-17 | landmarker: a Toolkit for Anatomical Landmark Localization in 2D/3D Images | Jef Jonkers et.al. | 2501.10098 | link |
| 2025-01-16 | A New Teacher-Reviewer-Student Framework for Semi-supervised 2D Human Pose Estimation | Wulian Yun et.al. | 2501.09565 | null |
| 2025-01-21 | Towards Robust and Realistic Human Pose Estimation via WiFi Signals | Yang Chen et.al. | 2501.09411 | link |
| 2025-01-16 | RoboReflect: Robotic Reflective Reasoning for Grasping Ambiguous-Condition Objects | Zhen Luo et.al. | 2501.09307 | null |
| 2025-01-16 | BRIGHT-VO: Brightness-Guided Hybrid Transformer for Visual Odometry with Multi-modality Refinement Module | Dongzhihan Wang et.al. | 2501.08659 | null |
| 2025-01-14 | Poseidon: A ViT-based Architecture for Multi-Frame Pose Estimation with Adaptive Frame Weighting and Multi-Scale Feature Fusion | Cesare Davide Pace et.al. | 2501.08446 | link |
| 2025-01-14 | Leveraging 2D Masked Reconstruction for Domain Adaptation of 3D Pose Estimation | Hansoo Park et.al. | 2501.08408 | null |
| 2025-01-14 | Predicting 4D Hand Trajectory from Monocular Videos | Yufei Ye et.al. | 2501.08329 | null |
| 2025-01-14 | A Critical Synthesis of Uncertainty Quantification and Foundation Models in Monocular Depth Estimation | Steven Landgraf et.al. | 2501.08188 | null |
| 2025-01-14 | AgentPose: Progressive Distribution Alignment via Feature Agent for Human Pose Distillation | Feng Zhang et.al. | 2501.08088 | null |
| 2025-01-14 | Robust Low-Light Human Pose Estimation through Illumination-Texture Modulation | Feng Zhang et.al. | 2501.08038 | null |
| 2025-01-14 | BioPose: Biomechanically-accurate 3D Pose Estimation from Monocular Videos | Farnoosh Koleini et.al. | 2501.07800 | null |
| 2025-01-13 | Fixing the Scale and Shift in Monocular Depth For Camera Pose Estimation | Yaqing Ding et.al. | 2501.07742 | link |
| 2025-01-13 | Efficiently Closing Loops in LiDAR-Based SLAM Using Point Cloud Density Maps | Saurabh Gupta et.al. | 2501.07399 | null |
| 2025-01-13 | Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics | Tze Ho Elden Tse et.al. | 2501.07100 | null |
| 2025-01-10 | eKalibr: Dynamic Intrinsic Calibration for Event Cameras From First Principles of Events | Shuolong Chen et.al. | 2501.05688 | null |
| 2025-01-09 | Relative Pose Estimation through Affine Corrections of Monocular Depth Priors | Yifan Yu et.al. | 2501.05446 | link |
| 2025-01-09 | From Simple to Complex Skills: The Case of In-Hand Object Reorientation | Haozhi Qi et.al. | 2501.05439 | null |
| 2025-01-11 | Towards Balanced Continual Multi-Modal Learning in Human Pose Estimation | Jiaxuan Peng et.al. | 2501.05264 | null |
| 2025-01-08 | KN-LIO: Geometric Kinematics and Neural Field Coupled LiDAR-Inertial Odometry | Zhong Wang et.al. | 2501.04263 | null |
| 2025-01-10 | MC-VTON: Minimal Control Virtual Try-On Diffusion Transformer | Junsheng Luan et.al. | 2501.03630 | null |
| 2025-01-07 | TexHOI: Reconstructing Textures of 3D Unknown Objects in Monocular Hand-Object Interaction Scenes | Alakh Aggarwal et.al. | 2501.03525 | link |
| 2025-01-06 | Mobile Augmented Reality Framework with Fusional Localization and Pose Estimation | Songlin Hou et.al. | 2501.03336 | null |
| 2025-01-06 | SurgRIPE challenge: Benchmark of Surgical Robot Instrument Pose Estimation | Haozheng Xu et.al. | 2501.02990 | null |
| 2025-01-06 | HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos | Jinglei Zhang et.al. | 2501.02973 | null |
| 2025-01-06 | Spiking monocular event based 6D pose estimation for space application | Jonathan Courtois et.al. | 2501.02916 | null |
| 2025-01-06 | Universal Features Guided Zero-Shot Category-Level Object Pose Estimation | Wentian Qu et.al. | 2501.02831 | null |
| 2025-01-06 | Unsupervised Domain Adaptation for Occlusion Resilient Human Pose Estimation | Arindam Dutta et.al. | 2501.02773 | null |
| 2025-01-06 | WorldPose: A World Cup Dataset for Global 3D Human Pose Estimation | Tianjian Jiang et.al. | 2501.02771 | null |
| 2025-01-05 | LP-ICP: General Localizability-Aware Point Cloud Registration for Robust Localization in Extreme Unstructured Environments | Haosong Yue et.al. | 2501.02580 | null |
| 2025-01-04 | ROLO-SLAM: Rotation-Optimized LiDAR-Only SLAM in Uneven Terrain with Ground Vehicle | Yinchuan Wang et.al. | 2501.02166 | link |
| 2025-01-03 | TCPFormer: Learning Temporal Correlation with Implicit Pose Proxy for 3D Human Pose Estimation | Jiajie Liu et.al. | 2501.01770 | null |
| 2025-01-03 | Laparoscopic Scene Analysis for Intraoperative Visualisation of Gamma Probe Signals in Minimally Invasive Cancer Surgery | Baoru Huang et.al. | 2501.01752 | null |
| 2025-01-02 | On Unifying Video Generation and Camera Pose Estimation | Chun-Hao Paul Huang et.al. | 2501.01409 | null |
| 2025-01-02 | L3D-Pose: Lifting Pose for 3D Avatars from a Single Camera in the Wild | Soumyaratna Debnath et.al. | 2501.01174 | null |
| 2024-12-31 | Relative Pose Observability Analysis Using Dual Quaternions | Nicholas B. Andrews et.al. | 2501.00657 | null |
| 2024-12-31 | VinT-6D: A Large-Scale Object-in-hand Dataset from Vision, Touch and Proprioception | Zhaoliang Wan et.al. | 2501.00510 | null |
| 2024-12-30 | Hierarchical Pose Estimation and Mapping with Multi-Scale Neural Feature Fields | Evgenii Kruzhkov et.al. | 2412.20976 | null |
| 2024-12-30 | ReFlow6D: Refraction-Guided Transparent Object 6D Pose Estimation via Intermediate Representation Learning | Hrishikesh Gupta et.al. | 2412.20830 | link |
| 2024-12-30 | Frequency-aware Event Cloud Network | Hongwei Ren et.al. | 2412.20803 | null |
| 2024-12-30 | KeyGS: A Keyframe-Centric Gaussian Splatting Method for Monocular Image Sequences | Keng-Wei Chang et.al. | 2412.20767 | null |
| 2024-12-30 | Towards nation-wide analytical healthcare infrastructures: A privacy-preserving augmented knee rehabilitation case study | Boris Bačić et.al. | 2412.20733 | null |
| 2024-12-29 | Exploiting Aggregation and Segregation of Representations for Domain Adaptive Human Pose Estimation | Qucheng Peng et.al. | 2412.20538 | link |
Visual Localization
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-02-11 | Ultrafast 4D scanning transmission electron microscopy for imaging of localized optical fields | Petr Koutenský et.al. | 2502.07338 | null |
| 2025-02-11 | Generative Ghost: Investigating Ranking Bias Hidden in AI-Generated Videos | Haowen Gao et.al. | 2502.07327 | null |
| 2025-02-11 | PDV: Prompt Directional Vectors for Zero-shot Composed Image Retrieval | Osman Tursun et.al. | 2502.07215 | null |
| 2025-02-10 | AstroLoc: Robust Space to Ground Image Localizer | Gabriele Berton et.al. | 2502.07003 | null |
| 2025-02-09 | Uni-Retrieval: A Multi-Style Retrieval Framework for STEM’s Education | Yanhao Jia et.al. | 2502.05863 | null |
| 2025-02-07 | Learning Street View Representations with Spatiotemporal Contrast | Yong Li et.al. | 2502.04638 | null |
| 2025-02-06 | Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion | Marco Mistretta et.al. | 2502.04263 | null |
| 2025-02-05 | Human-Aligned Image Models Improve Visual Decoding from the Brain | Nona Rajabi et.al. | 2502.03081 | null |
| 2025-02-03 | ConceptVAE: Self-Supervised Fine-Grained Concept Disentanglement from 2D Echocardiographies | Costin F. Ciusdel et.al. | 2502.01335 | null |
| 2025-01-27 | Freestyle Sketch-in-the-Loop Image Segmentation | Subhadeep Koley et.al. | 2501.16022 | null |
| 2025-01-26 | Zero-Shot Interactive Text-to-Image Retrieval via Diffusion-Augmented Representations | Zijun Long et.al. | 2501.15379 | null |
| 2025-01-24 | Visual Localization via Semantic Structures in Autonomous Photovoltaic Power Plant Inspection | Viktor Kozák et.al. | 2501.14587 | null |
| 2025-01-23 | Revisiting CLIP: Efficient Alignment of 3D MRI and Tabular Data using Domain-Specific Foundation Models | Jakob Krogh Petersen et.al. | 2501.14051 | link |
| 2025-01-22 | Triplet Synthesis For Enhancing Composed Image Retrieval via Counterfactual Image Generation | Kenta Uesugi et.al. | 2501.13968 | null |
| 2025-01-19 | Enhancing Sample Utilization in Noise-Robust Deep Metric Learning With Subgroup-Based Positive-Pair Selection | Zhipeng Yu et.al. | 2501.11063 | link |
| 2025-01-18 | A Resource-Efficient Training Framework for Remote Sensing Text–Image Retrieval | Weihang Zhang et.al. | 2501.10638 | null |
| 2025-01-17 | FLORA: Formal Language Model Enables Robust Training-free Zero-shot Object Referring Analysis | Zhe Chen et.al. | 2501.09887 | null |
| 2025-01-15 | Vision Foundation Models for Computed Tomography | Suraj Pai et.al. | 2501.09001 | null |
| 2025-01-12 | SCOT: Self-Supervised Contrastive Pretraining For Zero-Shot Compositional Retrieval | Bhavin Jawade et.al. | 2501.08347 | null |
| 2025-01-12 | Static Segmentation by Tracking: A Frustratingly Label-Efficient Approach to Fine-Grained Segmentation | Zhenyang Feng et.al. | 2501.06749 | null |
| 2025-01-06 | Integrating Language-Image Prior into EEG Decoding for Cross-Task Zero-Calibration RSVP-BCI | Xujin Li et.al. | 2501.02841 | null |
| 2025-01-03 | iCBIR-Sli: Interpretable Content-Based Image Retrieval with 2D Slice Embeddings | Shuhei Tomoshige et.al. | 2501.01642 | null |
| 2025-01-02 | R-SCoRe: Revisiting Scene Coordinate Regression for Robust Large-Scale Visual Localization | Xudong Jiang et.al. | 2501.01421 | null |
| 2025-01-02 | Training Medical Large Vision-Language Models with Abnormal-Aware Feedback | Yucheng Zhou et.al. | 2501.01377 | null |
| 2025-01-02 | Domain-invariant feature learning in brain MR imaging for content-based image retrieval | Shuya Tobari et.al. | 2501.01326 | null |
| 2024-12-28 | GSplatLoc: Ultra-Precise Camera Localization via 3D Gaussian Splatting | Atticus J. Zeller et.al. | 2412.20056 | link |
| 2024-12-25 | FOR: Finetuning for Object Level Open Vocabulary Image Retrieval | Hila Levi et.al. | 2412.18806 | null |
| 2024-12-24 | ERVD: An Efficient and Robust ViT-Based Distillation Framework for Remote Sensing Image Retrieval | Le Dong et.al. | 2412.18136 | link |
| 2024-12-22 | Where am I? Cross-View Geo-localization with Natural Language Descriptions | Junyan Ye et.al. | 2412.17007 | null |
| 2024-12-24 | Open-Vocabulary Mobile Manipulation Based on Double Relaxed Contrastive Learning with Dense Labeling | Daichi Yashima et.al. | 2412.16576 | link |
| 2024-12-20 | A New Method to Capturing Compositional Knowledge in Linguistic Space | Jiahe Wan et.al. | 2412.15632 | null |
| 2024-12-20 | Stabilizing Laplacian Inversion in Fokker-Planck Image Retrieval using the Transport-of-Intensity Equation | Samantha J Alloo et.al. | 2412.15513 | null |
Keypoint Detection
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-03-07 | Automatic determination of quasicrystalline patterns from microscopy images | Tano Kim Kender et.al. | 2503.05472 | null |
| 2025-03-07 | Spatial regularisation for improved accuracy and interpretability in keypoint-based registration | Benjamin Billot et.al. | 2503.04499 | null |
| 2025-03-04 | A Novel Streamline-based diffusion MRI Tractography Registration Method with Probabilistic Keypoint Detection | Junyi Wang et.al. | 2503.02481 | null |
| 2025-03-01 | Autonomous Dissection in Robotic Cholecystectomy | Ki-Hwan Oh et.al. | 2503.00666 | null |
| 2025-02-28 | CNSv2: Probabilistic Correspondence Encoded Neural Image Servo | Anzhe Chen et.al. | 2503.00132 | null |
| 2025-02-27 | Automatic Temporal Segmentation for Post-Stroke Rehabilitation: A Keypoint Detection and Temporal Segmentation Approach for Small Datasets | Jisoo Lee et.al. | 2502.19766 | null |
| 2025-02-23 | Rewards-based image analysis in microscopy | Kamyar Barakati et.al. | 2502.18522 | null |
| 2025-02-19 | 2.5D U-Net with Depth Reduction for 3D CryoET Object Identification | Yusuke Uchida et.al. | 2502.13484 | link |
| 2025-01-30 | Transfer Learning for Keypoint Detection in Low-Resolution Thermal TUG Test Images | Wei-Lun Chen et.al. | 2501.18453 | null |
| 2025-01-30 | Video-based Surgical Tool-tip and Keypoint Tracking using Multi-frame Context-driven Deep Learning Models | Bhargav Ghanekar et.al. | 2501.18361 | null |
| 2025-01-30 | Lifelong 3D Mapping Framework for Hand-held & Robot-mounted LiDAR Mapping Systems | Liudi Yang et.al. | 2501.18110 | null |
| 2025-01-21 | Keypoint Detection Empowered Near-Field User Localization and Channel Reconstruction | Mengyuan Li et.al. | 2501.11844 | null |
| 2025-01-20 | MIFNet: Learning Modality-Invariant Features for Generalizable Multimodal Image Matching | Yepeng Liu et.al. | 2501.11299 | null |
| 2025-01-19 | Refinement Module based on Parse Graph of Feature Map for Human Pose Estimation | Shibang Liu et.al. | 2501.11069 | null |
| 2025-01-13 | Empirical Comparison of Four Stereoscopic Depth Sensing Cameras for Robotics Applications | Lukas Rustler et.al. | 2501.07421 | null |
| 2025-01-13 | Efficiently Closing Loops in LiDAR-Based SLAM Using Point Cloud Density Maps | Saurabh Gupta et.al. | 2501.07399 | null |
| 2024-12-24 | GIMS: Image Matching System Based on Adaptive Graph Construction and Graph Neural Network | Xianfeng Song et.al. | 2412.18221 | link |
| 2024-12-21 | A Novel Approach to Tomato Harvesting Using a Hybrid Gripper with Semantic Segmentation and Keypoint Detection | Shahid Ansari et.al. | 2412.16755 | null |
| 2024-12-19 | Corn Ear Detection and Orientation Estimation Using Deep Learning | Nathan Sprague et.al. | 2412.14954 | null |
| 2024-12-12 | Agtech Framework for Cranberry-Ripening Analysis Using Vision Foundation Models | Faith Johnson et.al. | 2412.09739 | null |
| 2024-12-09 | An Efficient Scene Coordinate Encoding and Relocalization Method | Kuan Xu et.al. | 2412.06488 | link |
| 2024-12-09 | ZeroKey: Point-Level Reasoning and Zero-Shot 3D Keypoint Detection from Large Language Models | Bingchen Gong et.al. | 2412.06292 | null |
| 2024-12-07 | Securing Social Media Against Deepfakes using Identity, Behavioral, and Geometric Signatures | Muhammad Umar Farooq et.al. | 2412.05487 | null |
| 2024-12-04 | Measure Anything: Real-time, Multi-stage Vision-based Dimensional Measurement using Segment Anything | Yongkyu Lee et.al. | 2412.03472 | link |
2025-2
Pose Estimation
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-02-28 | BST: Badminton Stroke-type Transformer for Skeleton-based Action Recognition in Racket Sports | Jing-Yuan Chang et.al. | 2502.21085 | null |
| 2025-02-28 | Two-Stream Spatial-Temporal Transformer Framework for Person Identification via Natural Conversational Keypoints | Masoumeh Chapariniya et.al. | 2502.20803 | null |
| 2025-02-27 | Cutting-edge 3D reconstruction solutions for underwater coral reef images: A review and comparison | Jiageng Zhong et.al. | 2502.20154 | null |
| 2025-02-27 | BEV-DWPVO: BEV-based Differentiable Weighted Procrustes for Low Scale-drift Monocular Visual Odometry on Ground | Yufei Wei et.al. | 2502.20078 | null |
| 2025-02-28 | SegLocNet: Multimodal Localization Network for Autonomous Driving via Bird’s-Eye-View Segmentation | Zijie Zhou et.al. | 2502.20077 | link |
| 2025-02-27 | RUBIK: A Structured Benchmark for Image Matching across Geometric Challenges | Thibaut Loiseau et.al. | 2502.19955 | null |
| 2025-02-27 | QORT-Former: Query-optimized Real-time Transformer for Understanding Two Hands Manipulating Objects | Elkhan Ismayilzada et.al. | 2502.19769 | null |
| 2025-02-27 | Accurate Pose Estimation for Flight Platforms based on Divergent Multi-Aperture Imaging System | Shunkun Liang et.al. | 2502.19708 | null |
| 2025-02-26 | Increasing the Task Flexibility of Heavy-Duty Manipulators Using Visual 6D Pose Estimation of Objects | Petri Mäkinen et.al. | 2502.19169 | null |
| 2025-02-25 | EgoSim: An Egocentric Multi-view Simulator and Real Dataset for Body-worn Cameras during Motion and Activity | Dominik Hollidt et.al. | 2502.18373 | null |
| 2025-02-25 | Learning Structure-Supporting Dependencies via Keypoint Interactive Transformer for General Mammal Pose Estimation | Tianyang Xu et.al. | 2502.18214 | link |
| 2025-02-24 | V-HOP: Visuo-Haptic 6D Object Pose Tracking | Hongyu Li et.al. | 2502.17434 | null |
| 2025-02-23 | Orchestrating Joint Offloading and Scheduling for Low-Latency Edge SLAM | Yao Zhang et.al. | 2502.16495 | null |
| 2025-02-23 | DeProPose: Deficiency-Proof 3D Human Pose Estimation via Adaptive Multi-View Fusion | Jianbin Jiao et.al. | 2502.16419 | link |
| 2025-02-21 | RGB-Only Gaussian Splatting SLAM for Unbounded Outdoor Scenes | Sicheng Yu et.al. | 2502.15633 | null |
| 2025-02-21 | SiMHand: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training | Nie Lin et.al. | 2502.15251 | null |
| 2025-02-21 | Nonlinear Dynamical Systems for Automatic Face Annotation in Head Tracking and Pose Estimation | Thoa Thieu et.al. | 2502.15179 | null |
| 2025-02-20 | Design of a Visual Pose Estimation Algorithm for Moon Landing | Atakan Süslü et.al. | 2502.14942 | null |
| 2025-02-20 | Hier-SLAM++: Neuro-Symbolic Semantic SLAM with a Hierarchically Categorical Gaussian Splatting | Boying Li et.al. | 2502.14931 | null |
| 2025-02-19 | EfficientPose 6D: Scalable and Efficient 6D Object Pose Estimation | Zixuan Fang et.al. | 2502.14061 | null |
| 2025-02-19 | Active Illumination for Visual Ego-Motion Estimation in the Dark | Francesco Crocetti et.al. | 2502.13708 | null |
| 2025-02-19 | Object-Pose Estimation With Neural Population Codes | Heiko Hoffmann et.al. | 2502.13403 | null |
| 2025-02-18 | Spatiotemporal Multi-Camera Calibration using Freely Moving People | Sang-Eun Lee et.al. | 2502.12546 | null |
| 2025-02-18 | Learning Transformation-Isomorphic Latent Space for Accurate Hand Pose Estimation | Kaiwen Ren et.al. | 2502.12535 | null |
| 2025-02-19 | FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views | Shangzhan Zhang et.al. | 2502.12138 | null |
| 2025-02-17 | Enhancing Transparent Object Pose Estimation: A Fusion of GDR-Net and Edge Detection | Tessa Pulli et.al. | 2502.12027 | null |
| 2025-02-17 | SurgPose: a Dataset for Articulated Robotic Surgical Tool Pose Estimation and Tracking | Zijian Wu et.al. | 2502.11534 | null |
| 2025-02-18 | VarGes: Improving Variation in Co-Speech 3D Gesture Generation via StyleCLIPS | Ming Meng et.al. | 2502.10729 | link |
| 2025-02-15 | Semantics-aware Test-time Adaptation for 3D Human Pose Estimation | Qiuxia Lin et.al. | 2502.10724 | null |
| 2025-02-15 | Learning semantical dynamics and spatiotemporal collaboration for human pose estimation in video | Runyang Feng et.al. | 2502.10616 | null |
| 2025-02-14 | HIPPo: Harnessing Image-to-3D Priors for Model-free Zero-shot 6D Pose Estimation | Yibo Liu et.al. | 2502.10606 | null |
| 2025-02-14 | Manual2Skill: Learning to Read Manuals and Acquire Robotic Skills for Furniture Assembly Using Vision-Language Models | Chenrui Tie et.al. | 2502.10090 | null |
| 2025-02-13 | Metamorphic Testing for Pose Estimation Systems | Matias Duran et.al. | 2502.09460 | null |
| 2025-02-13 | BevSplat: Resolving Height Ambiguity via Feature-Based Gaussian Primitives for Weakly-Supervised Cross-View Localization | Qiwei Wang et.al. | 2502.09080 | null |
| 2025-02-14 | Siren Song: Manipulating Pose Estimation in XR Headsets Using Acoustic Attacks | Zijian Huang et.al. | 2502.08865 | null |
| 2025-02-12 | LIR-LIVO: A Lightweight, Robust LiDAR/Vision/Inertial Odometry with Illumination-Resilient Deep Features | Shujie Zhou et.al. | 2502.08676 | link |
| 2025-02-12 | CordViP: Correspondence-based Visuomotor Policy for Dexterous Manipulation in Real-World | Yankai Fu et.al. | 2502.08449 | null |
| 2025-02-11 | GaRLIO: Gravity enhanced Radar-LiDAR-Inertial Odometry | Chiyun Noh et.al. | 2502.07703 | link |
| 2025-02-11 | Matrix3D: Large Photogrammetry Model All-in-One | Yuanxun Lu et.al. | 2502.07685 | null |
| 2025-02-08 | Vision-in-the-loop Simulation for Deep Monocular Pose Estimation of UAV in Ocean Environment | Maneesha Wickramasuriya et.al. | 2502.05409 | null |
| 2025-02-06 | Measuring Physical Plausibility of 3D Human Poses Using Physics Simulation | Nathan Louis et.al. | 2502.04483 | link |
| 2025-02-06 | GCE-Pose: Global Context Enhancement for Category-level Object Pose Estimation | Weihang Li et.al. | 2502.04293 | null |
| 2025-02-06 | Advanced Object Detection and Pose Estimation with Hybrid Task Cascade and High-Resolution Networks | Yuhui Jin et.al. | 2502.03877 | null |
| 2025-02-05 | Mapping and Localization Using LiDAR Fiducial Markers | Yibo Liu et.al. | 2502.03510 | null |
| 2025-02-04 | Diff9D: Diffusion-Based Domain-Generalized Category-Level 9-DoF Object Pose Estimation | Jian Liu et.al. | 2502.02525 | link |
| 2025-02-03 | CleanPose: Category-Level Object Pose Estimation via Causal Learning and Knowledge Distillation | Xiao Lin et.al. | 2502.01312 | null |
| 2025-02-03 | Enhancing Feature Tracking Reliability for Visual Navigation using Real-Time Safety Filter | Dabin Kim et.al. | 2502.01092 | null |
| 2025-02-03 | ZeroBP: Learning Position-Aware Correspondence for Zero-shot 6D Pose Estimation in Bin-Picking | Jianqiu Chen et.al. | 2502.01004 | null |
| 2025-01-31 | A Direct Semi-Exhaustive Search Method for Robust, Partial-to-Full Point Cloud Registration | Richard Cheng et.al. | 2502.00115 | null |
| 2025-01-31 | XRF V2: A Dataset for Action Summarization with Wi-Fi Signals, and IMUs in Phones, Watches, Earbuds, and Glasses | Bo Lan et.al. | 2501.19034 | link |
| 2025-01-30 | SimpleDepthPose: Fast and Reliable Human Pose Estimation with RGBD-Images | Daniel Bermuth et.al. | 2501.18478 | null |
| 2025-01-29 | Online Trajectory Replanner for Dynamically Grasping Irregular Objects | Minh Nhat Vu et.al. | 2501.17968 | null |
| 2025-01-28 | DebugAgent: Efficient and Interpretable Error Slice Discovery for Comprehensive Model Debugging | Muxi Chen et.al. | 2501.16751 | null |
| 2025-01-27 | Toward Efficient Generalization in 3D Human Pose Estimation via a Canonical Domain Approach | Hoosang Lee et.al. | 2501.16146 | null |
Visual Localization
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-03-04 | TeTRA-VPR: A Ternary Transformer Approach for Compact Visual Place Recognition | Oliver Grainge et.al. | 2503.02511 | null |
| 2025-03-04 | Continual Multi-Robot Learning from Black-Box Visual Place Recognition Models | Kenta Tsukahara et.al. | 2503.02256 | null |
| 2025-03-03 | Composed Multi-modal Retrieval: A Survey of Approaches and Applications | Kun Zhang et.al. | 2503.01334 | link |
| 2025-03-03 | AirRoom: Objects Matter in Room Reidentification | Runmao Yao et.al. | 2503.01130 | null |
| 2025-03-02 | Efficient End-to-end Visual Localization for Autonomous Driving with Decoupled BEV Neural Matching | Jinyu Miao et.al. | 2503.00862 | null |
| 2025-03-01 | Class-Independent Increment: An Efficient Approach for Multi-label Class-Incremental Learning | Songlin Dong et.al. | 2503.00515 | null |
| 2025-02-28 | EVLoc: Event-based Visual Localization in LiDAR Maps via Event-Depth Registration | Kuangyi Chen et.al. | 2503.00167 | null |
| 2025-02-28 | CoTMR: Chain-of-Thought Multi-Scale Reasoning for Training-Free Zero-Shot Composed Image Retrieval | Zelong Sun et.al. | 2502.20826 | null |
| 2025-02-28 | SciceVPR: Stable Cross-Image Correlation Enhanced Model for Visual Place Recognition | Shanshan Wan et.al. | 2502.20676 | null |
| 2025-02-27 | A2-GNN: Angle-Annular GNN for Visual Descriptor-free Camera Relocalization | Yejun Zhang et.al. | 2502.20036 | link |
| 2025-02-27 | On the Importance of Text Preprocessing for Multimodal Representation Learning and Pathology Report Generation | Ruben T. Lucassen et.al. | 2502.19285 | null |
| 2025-02-19 | A Comprehensive Survey on Composed Image Retrieval | Xuemeng Song et.al. | 2502.18495 | null |
| 2025-02-25 | MegaLoc: One Retrieval to Place Them All | Gabriele Berton et.al. | 2502.17237 | link |
| 2025-02-23 | Visual-RAG: Benchmarking Text-to-Image Retrieval Augmented Generation for Visual Knowledge Intensive Queries | Yin Wu et.al. | 2502.16636 | link |
| 2025-02-23 | SelaVPR++: Towards Seamless Adaptation of Foundation Models for Efficient Place Recognition | Feng Lu et.al. | 2502.16601 | link |
| 2025-02-21 | ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval | Guanqi Zhan et.al. | 2502.15682 | null |
| 2025-02-20 | Bridging Text and Vision: A Multi-View Text-Vision Registration Approach for Cross-Modal Place Recognition | Tianyi Shang et.al. | 2502.14195 | link |
| 2025-02-19 | 3D Gaussian Splatting aided Localization for Large and Complex Indoor-Environments | Vincent Ress et.al. | 2502.13803 | null |
| 2025-02-18 | Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization | Shuo Xing et.al. | 2502.13146 | link |
| 2025-02-19 | IM360: Textured Mesh Reconstruction for Large-scale Indoor Mapping with 360$^\circ$ Cameras | Dongki Jung et.al. | 2502.12545 | null |
| 2025-02-17 | From Gaming to Research: GTA V for Synthetic Data Generation for Robotics and Navigations | Matteo Scucchia et.al. | 2502.12303 | null |
| 2025-02-17 | Descriminative-Generative Custom Tokens for Vision-Language Models | Pramuditha Perera et.al. | 2502.12095 | null |
| 2025-02-17 | ILIAS: Instance-Level Image retrieval At Scale | Giorgos Kordopatis-Zilos et.al. | 2502.11748 | null |
| 2025-02-17 | Range and Bird’s Eye View Fused Cross-Modal Visual Place Recognition | Jianyi Peng et.al. | 2502.11742 | null |
| 2025-02-17 | Adversarially Robust CLIP Models Can Induce Better (Robust) Perceptual Metrics | Francesco Croce et.al. | 2502.11725 | link |
| 2025-02-17 | Precise GPS-Denied UAV Self-Positioning via Context-Enhanced Cross-View Geo-Localization | Yuanze Xu et.al. | 2502.11408 | null |
| 2025-02-12 | E2LVLM: Evidence-Enhanced Large Vision-Language Model for Multimodal Out-of-Context Misinformation Detection | Junjie Wu et.al. | 2502.10455 | null |
| 2025-02-11 | Imit Diff: Semantics Guided Diffusion Transformer with Dual Resolution Fusion for Imitation Learning | Yuhang Dong et.al. | 2502.09649 | null |
| 2025-02-13 | ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation | Rotem Shalev-Arkushin et.al. | 2502.09411 | null |
| 2025-02-12 | SpeechCompass: Enhancing Mobile Captioning with Diarization and Directional Guidance via Multi-Microphone Localization | Artem Dementyev et.al. | 2502.08848 | null |
| 2025-02-12 | Composite Sketch+Text Queries for Retrieving Objects with Elusive Names and Complex Interactions | Prajwal Gatti et.al. | 2502.08438 | null |
| 2025-02-11 | Captured by Captions: On Memorization and its Mitigation in CLIP Models | Wenhao Wang et.al. | 2502.07830 | null |
| 2025-02-11 | Ultrafast 4D scanning transmission electron microscopy for imaging of localized optical fields | Petr Koutenský et.al. | 2502.07338 | null |
| 2025-02-11 | Generative Ghost: Investigating Ranking Bias Hidden in AI-Generated Videos | Haowen Gao et.al. | 2502.07327 | null |
| 2025-02-11 | PDV: Prompt Directional Vectors for Zero-shot Composed Image Retrieval | Osman Tursun et.al. | 2502.07215 | null |
| 2025-02-10 | AstroLoc: Robust Space to Ground Image Localizer | Gabriele Berton et.al. | 2502.07003 | null |
| 2025-02-09 | Uni-Retrieval: A Multi-Style Retrieval Framework for STEM’s Education | Yanhao Jia et.al. | 2502.05863 | null |
| 2025-02-07 | Learning Street View Representations with Spatiotemporal Contrast | Yong Li et.al. | 2502.04638 | null |
| 2025-02-06 | Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion | Marco Mistretta et.al. | 2502.04263 | link |
| 2025-02-05 | Human-Aligned Image Models Improve Visual Decoding from the Brain | Nona Rajabi et.al. | 2502.03081 | null |
| 2025-02-03 | ConceptVAE: Self-Supervised Fine-Grained Concept Disentanglement from 2D Echocardiographies | Costin F. Ciusdel et.al. | 2502.01335 | null |
| 2025-01-27 | Freestyle Sketch-in-the-Loop Image Segmentation | Subhadeep Koley et.al. | 2501.16022 | null |
| 2025-01-26 | Zero-Shot Interactive Text-to-Image Retrieval via Diffusion-Augmented Representations | Zijun Long et.al. | 2501.15379 | null |
| 2025-01-24 | Visual Localization via Semantic Structures in Autonomous Photovoltaic Power Plant Inspection | Viktor Kozák et.al. | 2501.14587 | null |
| 2025-01-23 | Revisiting CLIP: Efficient Alignment of 3D MRI and Tabular Data using Domain-Specific Foundation Models | Jakob Krogh Petersen et.al. | 2501.14051 | link |
| 2025-01-22 | Triplet Synthesis For Enhancing Composed Image Retrieval via Counterfactual Image Generation | Kenta Uesugi et.al. | 2501.13968 | null |
| 2025-01-19 | Enhancing Sample Utilization in Noise-Robust Deep Metric Learning With Subgroup-Based Positive-Pair Selection | Zhipeng Yu et.al. | 2501.11063 | link |
| 2025-01-18 | A Resource-Efficient Training Framework for Remote Sensing Text–Image Retrieval | Weihang Zhang et.al. | 2501.10638 | null |
| 2025-01-17 | FLORA: Formal Language Model Enables Robust Training-free Zero-shot Object Referring Analysis | Zhe Chen et.al. | 2501.09887 | null |
| 2025-01-15 | Vision Foundation Models for Computed Tomography | Suraj Pai et.al. | 2501.09001 | null |
Keypoint Detection
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-03-05 | Periodontal Bone Loss Analysis via Keypoint Detection With Heuristic Post-Processing | Ryan Banks et.al. | 2503.13477 | null |
| 2025-03-16 | Histogram Transporter: Learning Rotation-Equivariant Orientation Histograms for High-Precision Robotic Kitting | Jiadong Zhou et.al. | 2503.12541 | null |
| 2025-03-11 | Keypoint Detection and Description for Raw Bayer Images | Jiakai Lin et.al. | 2503.08673 | null |
| 2025-03-10 | REF-VLM: Triplet-Based Referring Paradigm for Unified Visual Decoding | Yan Tai et.al. | 2503.07413 | link |
| 2025-03-11 | DaD: Distilled Reinforcement Learning for Diverse Keypoint Detection | Johan Edstedt et.al. | 2503.07347 | link |
| 2025-03-07 | Automatic determination of quasicrystalline patterns from microscopy images | Tano Kim Kender et.al. | 2503.05472 | null |
| 2025-03-07 | Spatial regularisation for improved accuracy and interpretability in keypoint-based registration | Benjamin Billot et.al. | 2503.04499 | link |
| 2025-03-04 | A Novel Streamline-based diffusion MRI Tractography Registration Method with Probabilistic Keypoint Detection | Junyi Wang et.al. | 2503.02481 | null |
| 2025-03-01 | Autonomous Dissection in Robotic Cholecystectomy | Ki-Hwan Oh et.al. | 2503.00666 | null |
| 2025-02-28 | CNSv2: Probabilistic Correspondence Encoded Neural Image Servo | Anzhe Chen et.al. | 2503.00132 | null |
| 2025-02-27 | Automatic Temporal Segmentation for Post-Stroke Rehabilitation: A Keypoint Detection and Temporal Segmentation Approach for Small Datasets | Jisoo Lee et.al. | 2502.19766 | null |
| 2025-02-23 | Rewards-based image analysis in microscopy | Kamyar Barakati et.al. | 2502.18522 | null |
| 2025-02-19 | 2.5D U-Net with Depth Reduction for 3D CryoET Object Identification | Yusuke Uchida et.al. | 2502.13484 | link |
| 2025-01-30 | Transfer Learning for Keypoint Detection in Low-Resolution Thermal TUG Test Images | Wei-Lun Chen et.al. | 2501.18453 | null |
| 2025-01-30 | Video-based Surgical Tool-tip and Keypoint Tracking using Multi-frame Context-driven Deep Learning Models | Bhargav Ghanekar et.al. | 2501.18361 | null |
| 2025-01-30 | Lifelong 3D Mapping Framework for Hand-held & Robot-mounted LiDAR Mapping Systems | Liudi Yang et.al. | 2501.18110 | null |
| 2025-01-21 | Keypoint Detection Empowered Near-Field User Localization and Channel Reconstruction | Mengyuan Li et.al. | 2501.11844 | null |
| 2025-01-20 | MIFNet: Learning Modality-Invariant Features for Generalizable Multimodal Image Matching | Yepeng Liu et.al. | 2501.11299 | null |
| 2025-01-13 | Empirical Comparison of Four Stereoscopic Depth Sensing Cameras for Robotics Applications | Lukas Rustler et.al. | 2501.07421 | null |
| 2025-01-13 | Efficiently Closing Loops in LiDAR-Based SLAM Using Point Cloud Density Maps | Saurabh Gupta et.al. | 2501.07399 | null |
| 2024-12-24 | GIMS: Image Matching System Based on Adaptive Graph Construction and Graph Neural Network | Xianfeng Song et.al. | 2412.18221 | link |
| 2024-12-21 | A Novel Approach to Tomato Harvesting Using a Hybrid Gripper with Semantic Segmentation and Keypoint Detection | Shahid Ansari et.al. | 2412.16755 | null |
2025-3
Pose Estimation
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-04-04 | Robust Human Registration with Body Part Segmentation on Noisy Point Clouds | Kai Lascheit et.al. | 2504.03602 | null |
| 2025-04-04 | Endo3R: Unified Online Reconstruction from Dynamic Monocular Endoscopic Video | Jiaxin Guo et.al. | 2504.03198 | null |
| 2025-04-03 | Cooperative Inference for Real-Time 3D Human Pose Estimation in Multi-Device Edge Networks | Hyun-Ho Choi et.al. | 2504.03052 | null |
| 2025-04-03 | BOP Challenge 2024 on Model-Based and Model-Free 6D Object Pose Estimation | Van Nguyen Nguyen et.al. | 2504.02812 | null |
| 2025-04-03 | PicoPose: Progressive Pixel-to-Pixel Correspondence Learning for Novel Object Pose Estimation | Lihua Liu et.al. | 2504.02617 | null |
| 2025-04-02 | Dual-stream Transformer-GCN Model with Contextualized Representations Learning for Monocular 3D Human Pose Estimation | Mingrui Ye et.al. | 2504.01764 | link |
| 2025-04-02 | ForestVO: Enhancing Visual Odometry in Forest Environments through ForestGlue | Thomas Pritchard et.al. | 2504.01261 | link |
| 2025-04-01 | AP-CAP: Advancing High-Quality Data Synthesis for Animal Pose Estimation via a Controllable Image Generation Pipeline | Lei Wang et.al. | 2504.00394 | null |
| 2025-03-31 | Easi3R: Estimating Disentangled Motion from DUSt3R Without Training | Xingyu Chen et.al. | 2503.24391 | link |
| 2025-03-31 | LiM-Loc: Visual Localization with Dense and Accurate 3D Reference Maps Directly Corresponding 2D Keypoints to 3D LiDAR Point Clouds | Masahiko Tsuji et.al. | 2503.23664 | null |
| 2025-03-30 | PhysPose: Refining 6D Object Poses with Physical Constraints | Martin Malenický et.al. | 2503.23587 | null |
| 2025-03-30 | Improving Indoor Localization Accuracy by Using an Efficient Implicit Neural Map Representation | Haofei Kuang et.al. | 2503.23480 | link |
| 2025-03-30 | SparseLoc: Sparse Open-Set Landmark-based Global Localization for Autonomous Navigation | Pranjal Paul et.al. | 2503.23465 | null |
| 2025-03-30 | HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation | Hongwei Zheng et.al. | 2503.23331 | null |
| 2025-03-29 | Incorporating GNSS Information with LIDAR-Inertial Odometry for Accurate Land-Vehicle Localization | Jintao Cheng et.al. | 2503.23199 | null |
| 2025-03-28 | ForcePose: A Deep Learning Approach for Force Calculation Based on Action Recognition Using MediaPipe Pose Estimation Combined with Object Detection | Nandakishor M et.al. | 2503.22363 | null |
| 2025-03-28 | GCRayDiffusion: Pose-Free Surface Reconstruction via Geometric Consistent Ray Diffusion | Li-Heng Chen et.al. | 2503.22349 | null |
| 2025-03-27 | NeRF-based Point Cloud Reconstruction using a Stationary Camera for Agricultural Applications | Kibon Ku et.al. | 2503.21958 | null |
| 2025-03-27 | Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video | David Yifan Yao et.al. | 2503.21761 | link |
| 2025-03-27 | Reconstructing Humans with a Biomechanically Accurate Skeleton | Yan Xia et.al. | 2503.21751 | null |
| 2025-03-27 | OccRobNet: Occlusion Robust Network for Accurate 3D Interacting Hand-Object Pose Estimation | Mallika Garg et.al. | 2503.21723 | null |
| 2025-03-27 | RapidPoseTriangulation: Multi-view Multi-person Whole-body Human Pose Triangulation in a Millisecond | Daniel Bermuth et.al. | 2503.21692 | null |
| 2025-03-27 | STAMICS: Splat, Track And Map with Integrated Consistency and Semantics for Dense RGB-D SLAM | Yongxu Wang et.al. | 2503.21425 | null |
| 2025-03-27 | Lidar-only Odometry based on Multiple Scan-to-Scan Alignments over a Moving Window | Aaron Kurda et.al. | 2503.21293 | null |
| 2025-03-27 | Recurrent Feature Mining and Keypoint Mixup Padding for Category-Agnostic Pose Estimation | Junjie Chen et.al. | 2503.21140 | link |
| 2025-03-26 | DINeMo: Learning Neural Mesh Models with no 3D Annotations | Weijie Guo et.al. | 2503.20220 | null |
| 2025-03-25 | Zero-Shot Human-Object Interaction Synthesis with Multimodal Priors | Yuke Lou et.al. | 2503.20118 | null |
| 2025-03-25 | Vanishing Depth: A Depth Adapter with Positional Depth Encoding for Generalized Image Encoders | Paul Koch et.al. | 2503.19947 | null |
| 2025-03-25 | Visuo-Tactile Object Pose Estimation for a Multi-Finger Robot Hand with Low-Resolution In-Hand Tactile Sensing | Lukas Mack et.al. | 2503.19893 | null |
| 2025-03-25 | Semi-SD: Semi-Supervised Metric Depth Estimation via Surrounding Cameras for Autonomous Driving | Yusen Xie et.al. | 2503.19713 | null |
| 2025-03-25 | DynOPETs: A Versatile Benchmark for Dynamic Object Pose Estimation and Tracking in Moving Camera Scenarios | Xiangting Meng et.al. | 2503.19625 | null |
| 2025-03-25 | Pose-Based Fall Detection System: Efficient Monitoring on Standard CPUs | Vinayak Mali et.al. | 2503.19501 | null |
| 2025-03-25 | Multi-modal 3D Pose and Shape Estimation with Computed Tomography | Mingxiao Tu et.al. | 2503.19405 | null |
| 2025-03-25 | From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from Feature Gaussian Splatting | Zhiwei Huang et.al. | 2503.19358 | null |
| 2025-03-25 | Analyzing the Synthetic-to-Real Domain Gap in 3D Hand Pose Estimation | Zhuoran Zhao et.al. | 2503.19307 | null |
| 2025-03-25 | Any6D: Model-free 6D Pose Estimation of Novel Objects | Taeyeop Lee et.al. | 2503.18673 | null |
| 2025-03-24 | Structure-Aware Correspondence Learning for Relative Pose Estimation | Yihan Chen et.al. | 2503.18671 | null |
| 2025-03-24 | TrackID3x3: A Dataset and Algorithm for Multi-Player Tracking with Identification and Pose Estimation in 3x3 Basketball Full-court Videos | Kazuhiro Yamada et.al. | 2503.18282 | null |
| 2025-03-23 | Selecting and Pruning: A Differentiable Causal Sequentialized State-Space Model for Two-View Correspondence Learning | Xiang Fang et.al. | 2503.17938 | null |
| 2025-03-22 | Co-op: Correspondence-based Novel Object Pose Estimation | Sungphill Moon et.al. | 2503.17731 | null |
| 2025-03-21 | Image as an IMU: Estimating Camera Motion from a Single Motion-Blurred Image | Jerred Chen et.al. | 2503.17358 | null |
| 2025-03-21 | Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors | Wonbong Jang et.al. | 2503.17316 | null |
| 2025-03-20 | ContactFusion: Stochastic Poisson Surface Maps from Visual and Contact Sensing | Aditya Kamireddypalli et.al. | 2503.16592 | null |
| 2025-03-19 | A Comprehensive Survey on Architectural Advances in Deep CNNs: Challenges, Applications, and Emerging Research Directions | Saddam Hussain Khan et.al. | 2503.16546 | null |
| 2025-03-20 | Probabilistic Prompt Distribution Learning for Animal Pose Estimation | Jiyong Rao et.al. | 2503.16120 | link |
| 2025-03-20 | Automating 3D Dataset Generation with Neural Radiance Fields | P. Schulz et.al. | 2503.15997 | link |
| 2025-03-20 | Learning to Efficiently Adapt Foundation Models for Self-Supervised Endoscopic 3D Scene Reconstruction from Any Cameras | Beilei Cui et.al. | 2503.15917 | null |
| 2025-03-19 | EdgeRegNet: Edge Feature-based Multimodal Registration Network between Images and LiDAR Point Clouds | Yuanchao Yue et.al. | 2503.15284 | null |
| 2025-03-20 | GIVEPose: Gradual Intra-class Variation Elimination for RGB-based Category-Level Object Pose Estimation | Zinqin Huang et.al. | 2503.15110 | link |
| 2025-03-20 | Distilling 3D distinctive local descriptors for 6D pose estimation | Amir Hamza et.al. | 2503.15106 | null |
| 2025-03-18 | Validation of Human Pose Estimation and Human Mesh Recovery for Extracting Clinically Relevant Motion Data from Videos | Kai Armstrong et.al. | 2503.14760 | null |
| 2025-03-18 | SIR-DIFF: Sparse Image Sets Restoration with Multi-View Diffusion Model | Yucheng Mao et.al. | 2503.14463 | null |
| 2025-03-18 | SCJD: Sparse Correlation and Joint Distillation for Efficient 3D Human Pose Estimation | Weihong Chen et.al. | 2503.14097 | null |
| 2025-03-18 | Foundation Feature-Driven Online End-Effector Pose Estimation: A Marker-Free and Learning-Free Approach | Tianshu Wu et.al. | 2503.14051 | null |
| 2025-03-19 | Learning Shape-Independent Transformation via Spherical Representations for Category-Level Object Pose Estimation | Huan Ren et.al. | 2503.13926 | null |
| 2025-03-20 | STEP: Simultaneous Tracking and Estimation of Pose for Animals and Humans | Shashikant Verma et.al. | 2503.13344 | null |
| 2025-03-17 | UniHOPE: A Unified Approach for Hand-Only and Hand-Object Pose Estimation | Yinqiao Wang et.al. | 2503.13303 | null |
| 2025-03-17 | Uncertainty-Aware Knowledge Distillation for Compact and Efficient 6DoF Pose Estimation | Nassim Ali Ousalah et.al. | 2503.13053 | null |
| 2025-03-17 | PoseSyn: Synthesizing Diverse 3D Pose Data from In-the-Wild 2D Data | ChangHee Yang et.al. | 2503.13025 | null |
| 2025-03-15 | Gun Detection Using Combined Human Pose and Weapon Appearance | Amulya Reddy Maligireddy et.al. | 2503.12215 | null |
| 2025-03-15 | TACO: Taming Diffusion for in-the-wild Video Amodal Completion | Ruijie Lu et.al. | 2503.12049 | null |
| 2025-03-14 | Bring Your Rear Cameras for Egocentric 3D Human Pose Estimation | Hiroyasu Akada et.al. | 2503.11652 | null |
| 2025-03-14 | Online Test-time Adaptation for 3D Human Pose Estimation: A Practical Perspective with Estimated 2D Poses | Qiuxia Lin et.al. | 2503.11194 | null |
| 2025-03-14 | Fast and Robust Localization for Humanoid Soccer Robot via Iterative Landmark Matching | Ruochen Hou et.al. | 2503.11020 | null |
| 2025-03-13 | Clothes-Changing Person Re-identification Based On Skeleton Dynamics | Asaf Joseph et.al. | 2503.10759 | null |
| 2025-03-13 | Consistent multi-animal pose estimation in cattle using dynamic Kalman filter based tracking | Maarten Perneel et.al. | 2503.10450 | null |
| 2025-03-13 | 6D Object Pose Tracking in Internet Videos for Robotic Manipulation | Georgy Ponimatkin et.al. | 2503.10307 | null |
| 2025-03-13 | VicaSplat: A Single Run is All You Need for 3D Gaussian Splatting and Camera Estimation from Unposed Video Frames | Zhiqi Li et.al. | 2503.10286 | null |
| 2025-03-12 | Physics-Aware Human-Object Rendering from Sparse Views via 3D Gaussian Splatting | Weiquan Wang et.al. | 2503.09640 | null |
| 2025-03-12 | GenHPE: Generative Counterfactuals for 3D Human Pose Estimation with Radio Frequency Signals | Shuokang Huang et.al. | 2503.09537 | null |
| 2025-03-12 | MonoSLAM: Robust Monocular SLAM with Global Structure Optimization | Bingzheng Jiang et.al. | 2503.09296 | null |
| 2025-03-12 | Better Together: Unified Motion Capture and 3D Avatar Reconstruction | Arthur Moreau et.al. | 2503.09293 | null |
| 2025-03-11 | Acoustic Neural 3D Reconstruction Under Pose Drift | Tianxiang Lin et.al. | 2503.08930 | null |
| 2025-03-11 | Keypoint Semantic Integration for Improved Feature Matching in Outdoor Agricultural Environments | Rajitha de Silva et.al. | 2503.08843 | null |
| 2025-03-11 | Keypoint Detection and Description for Raw Bayer Images | Jiakai Lin et.al. | 2503.08673 | null |
| 2025-03-11 | SGNetPose+: Stepwise Goal-Driven Networks with Pose Information for Trajectory Prediction in Autonomous Driving | Akshat Ghiya et.al. | 2503.08016 | null |
| 2025-03-10 | Better Pose Initialization for Fast and Robust 2D/3D Pelvis Registration | Yehyun Suh et.al. | 2503.07767 | null |
| 2025-03-10 | HumanMM: Global Human Motion Recovery from Multi-shot Videos | Yuhong Zhang et.al. | 2503.07597 | null |
| 2025-03-11 | AthletePose3D: A Benchmark Dataset for 3D Human Pose Estimation and Kinematic Validation in Athletic Movements | Calvin Yeung et.al. | 2503.07499 | null |
| 2025-03-10 | Multi-Robot System for Cooperative Exploration in Unknown Environments: A Survey | Chuqi Wang et.al. | 2503.07278 | null |
| 2025-03-12 | Endo-FASt3r: Endoscopic Foundation model Adaptation for Structure from motion | Mona Sheikh Zeinoddin et.al. | 2503.07204 | null |
| 2025-03-10 | Multi-Modal 3D Mesh Reconstruction from Images and Text | Melvin Reka et.al. | 2503.07190 | null |
| 2025-03-11 | PoseLess: Depth-Free Vision-to-Joint Control via Direct Image Mapping with VLM | Alan Dao et.al. | 2503.07111 | null |
| 2025-03-09 | AxisPose: Model-Free Matching-Free Single-Shot 6D Object Pose Estimation via Axis Generation | Yang Zou et.al. | 2503.06660 | null |
| 2025-03-08 | NeuraLoc: Visual Localization in Neural Implicit Map with Dual Complementary Features | Hongjia Zhai et.al. | 2503.06117 | null |
| 2025-03-08 | Fish2Mesh Transformer: 3D Human Mesh Recovery from Egocentric Vision | David C. Jeong et.al. | 2503.06089 | null |
| 2025-03-08 | ReJSHand: Efficient Real-Time Hand Pose Estimation and Mesh Reconstruction Using Refined Joint and Skeleton Features | Shan An et.al. | 2503.05995 | null |
| 2025-03-07 | Differentiable Rendering-based Pose Estimation for Surgical Robotic Instruments | Zekai Liang et.al. | 2503.05953 | null |
| 2025-03-07 | Novel Object 6D Pose Estimation with a Single Reference View | Jian Liu et.al. | 2503.05578 | null |
| 2025-03-07 | Multi-Grained Feature Pruning for Video-Based Human Pose Estimation | Zhigang Wang et.al. | 2503.05365 | null |
| 2025-03-07 | Persistent Object Gaussian Splat (POGS) for Tracking Human and Robot Manipulation of Irregularly Shaped Objects | Justin Yu et.al. | 2503.05189 | null |
| 2025-03-07 | SplatPose: Geometry-Aware 6-DoF Pose Estimation from Single RGB Image via 3D Gaussian Splatting | Linqi Yang et.al. | 2503.05174 | null |
| 2025-03-07 | GaussianCAD: Robust Self-Supervised CAD Reconstruction from Three Orthographic Views Using 3D Gaussian Splatting | Zheng Zhou et.al. | 2503.05161 | null |
| 2025-03-06 | MarsLGPR: Mars Rover Localization with Ground Penetrating Radar | Anja Sheppard et.al. | 2503.04944 | null |
| 2025-03-06 | ReynoldsFlow: Exquisite Flow Estimation via Reynolds Transport Theorem | Yu-Hsi Chen et.al. | 2503.04500 | null |
| 2025-03-05 | Active 6D Pose Estimation for Textureless Objects using Multi-View RGB Frames | Jun Yang et.al. | 2503.03726 | null |
| 2025-03-05 | Machine Learning in Biomechanics: Key Applications and Limitations in Walking, Running, and Sports Movements | Carlo Dindorf et.al. | 2503.03717 | null |
| 2025-03-05 | Improving 6D Object Pose Estimation of metallic Household and Industry Objects | Thomas Pöllabauer et.al. | 2503.03655 | null |
| 2025-03-05 | Tiny Lidars for Manipulator Self-Awareness: Sensor Characterization and Initial Localization Experiments | Giammarco Caroleo et.al. | 2503.03449 | null |
| 2025-03-05 | Direct Sparse Odometry with Continuous 3D Gaussian Maps for Indoor Environments | Jie Deng et.al. | 2503.03373 | null |
| 2025-03-05 | Supervised Visual Docking Network for Unmanned Surface Vehicles Using Auto-labeling in Real-world Water Environments | Yijie Chu et.al. | 2503.03282 | null |
| 2025-03-05 | SCORE: Saturated Consensus Relocalization in Semantic Line Maps | Haodong Jiang et.al. | 2503.03254 | null |
| 2025-03-04 | Monocular Person Localization under Camera Ego-motion | Yu Zhan et.al. | 2503.02916 | null |
| 2025-03-04 | PIDLoc: Cross-View Pose Optimization Network Inspired by PID Controllers | Wooju Lee et.al. | 2503.02388 | null |
| 2025-03-04 | DQO-MAP: Dual Quadrics Multi-Object mapping with Gaussian Splatting | Haoyuan Li et.al. | 2503.02223 | null |
| 2025-03-04 | Zero-Shot Sim-to-Real Visual Quadrotor Control with Hard Constraints | Yan Miao et.al. | 2503.02198 | null |
| 2025-03-03 | Constraint-Based Modeling of Dynamic Entities in 3D Scene Graphs for Robust SLAM | Marco Giberna et.al. | 2503.02050 | null |
| 2025-03-03 | Category-level Meta-learned NeRF Priors for Efficient Object Mapping | Saad Ejaz et.al. | 2503.01582 | null |
| 2025-03-03 | RUSSO: Robust Underwater SLAM with Sonar Optimization against Visual Degradation | Shu Pan et.al. | 2503.01434 | null |
| 2025-03-03 | ecg2o: A Seamless Extension of g2o for Equality-Constrained Factor Graph Optimization | Anas Abdelkarim et.al. | 2503.01311 | null |
| 2025-03-03 | Convex Hull-based Algebraic Constraint for Visual Quadric SLAM | Xiaolong Yu et.al. | 2503.01254 | link |
| 2025-03-04 | Floorplan-SLAM: A Real-Time, High-Accuracy, and Long-Term Multi-Session Point-Plane SLAM for Efficient Floorplan Reconstruction | Haolin Wang et.al. | 2503.00397 | null |
| 2025-03-01 | BGM2Pose: Active 3D Human Pose Estimation with Non-Stationary Sounds | Yuto Shibata et.al. | 2503.00389 | null |
Visual Localization
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-04-09 | Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception | Ruotian Peng et.al. | 2504.06666 | null |
| 2025-04-08 | To Match or Not to Match: Revisiting Image Matching for Reliable Visual Place Recognition | Davide Sferrazza et.al. | 2504.06116 | null |
| 2025-04-06 | NCL-CIR: Noise-aware Contrastive Learning for Composed Image Retrieval | Peng Gao et.al. | 2504.04339 | null |
| 2025-04-04 | REJEPA: A Novel Joint-Embedding Predictive Architecture for Efficient Remote Sensing Image Retrieval | Shabnam Choudhury et.al. | 2504.03169 | null |
| 2025-04-06 | Re-thinking Temporal Search for Long-Form Video Understanding | Jinhui Ye et.al. | 2504.02259 | null |
| 2025-04-02 | Prompt-Guided Attention Head Selection for Focus-Oriented Image Retrieval | Yuji Nozawa et.al. | 2504.01348 | null |
| 2025-04-01 | IDMR: Towards Instance-Driven Precise Visual Correspondence in Multimodal Retrieval | Bangwei Liu et.al. | 2504.00954 | null |
| 2025-04-01 | Scaling Prompt Instructed Zero Shot Composed Image Retrieval with Image-Only Data | Yiqun Duan et.al. | 2504.00812 | null |
| 2025-03-31 | CIBR: Cross-modal Information Bottleneck Regularization for Robust CLIP Generalization | Yingrui Ji et.al. | 2503.24182 | null |
| 2025-03-31 | LiM-Loc: Visual Localization with Dense and Accurate 3D Reference Maps Directly Corresponding 2D Keypoints to 3D LiDAR Point Clouds | Masahiko Tsuji et.al. | 2503.23664 | null |
| 2025-03-30 | Multiview Image-Based Localization | Cameron Fiore et.al. | 2503.23577 | null |
| 2025-03-27 | LOCORE: Image Re-ranking with Long-Context Sequence Modeling | Zilin Xiao et.al. | 2503.21772 | link |
| 2025-03-27 | Fwd2Bot: LVLM Visual Token Compression with Double Forward Bottleneck | Adrian Bulat et.al. | 2503.21757 | null |
| 2025-03-27 | UGNA-VPR: A Novel Training Paradigm for Visual Place Recognition Based on Uncertainty-Guided NeRF Augmentation | Yehui Shen et.al. | 2503.21338 | link |
| 2025-03-27 | FineCIR: Explicit Parsing of Fine-Grained Modification Semantics for Composed Image Retrieval | Zixu Li et.al. | 2503.21309 | link |
| 2025-03-27 | Clean Image May be Dangerous: Data Poisoning Attacks Against Deep Hashing | Shuai Li et.al. | 2503.21236 | null |
| 2025-03-25 | CoLLM: A Large Language Model for Composed Image Retrieval | Chuong Huynh et.al. | 2503.19910 | link |
| 2025-03-25 | Scene-agnostic Pose Regression for Visual Localization | Junwei Zheng et.al. | 2503.19543 | null |
| 2025-03-25 | From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from Feature Gaussian Splatting | Zhiwei Huang et.al. | 2503.19358 | null |
| 2025-03-25 | Fine-grained Textual Inversion Network for Zero-Shot Composed Image Retrieval | Haoqiang Lin et.al. | 2503.19296 | link |
| 2025-03-23 | LocDiffusion: Identifying Locations on Earth by Diffusing in the Hilbert Space | Zhangyu Wang et.al. | 2503.18142 | null |
| 2025-03-23 | Selecting and Pruning: A Differentiable Causal Sequentialized State-Space Model for Two-View Correspondence Learning | Xiang Fang et.al. | 2503.17938 | null |
| 2025-03-23 | What Time Tells Us? An Explorative Study of Time Awareness Learned from Static Images | Dongheng Lin et.al. | 2503.17899 | null |
| 2025-03-22 | good4cir: Generating Detailed Synthetic Captions for Composed Image Retrieval | Pranavi Kolouju et.al. | 2503.17871 | null |
| 2025-03-21 | Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval | Yuanmin Tang et.al. | 2503.17109 | null |
| 2025-03-20 | PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval | Qiang Zou et.al. | 2503.16064 | link |
| 2025-03-20 | Automating 3D Dataset Generation with Neural Radiance Fields | P. Schulz et.al. | 2503.15997 | link |
| 2025-03-18 | 3D Densification for Multi-Map Monocular VSLAM in Endoscopy | X. Anadón et.al. | 2503.14346 | null |
| 2025-03-18 | A-SCoRe: Attention-based Scene Coordinate Regression for wide-ranging scenarios | Huy-Hoang Bui et.al. | 2503.13982 | link |
| 2025-03-17 | Scale Efficient Training for Large Datasets | Qing Zhou et.al. | 2503.13385 | null |
| 2025-03-17 | Multi-Platform Teach-and-Repeat Navigation by Visual Place Recognition Based on Deep-Learned Local Features | Václav Truhlařík et.al. | 2503.13090 | null |
| 2025-03-17 | All You Need to Know About Training Image Retrieval Models | Gabriele Berton et.al. | 2503.13045 | link |
| 2025-03-12 | Exploring the best way for UAV visual localization under Low-altitude Multi-view Observation Condition: a Benchmark | Yibin Ye et.al. | 2503.10692 | link |
| 2025-03-13 | ImageScope: Unifying Language-Guided Image Retrieval via Large Multimodal Model Collective Reasoning | Pengfei Luo et.al. | 2503.10166 | link |
| 2025-03-12 | Revisiting Medical Image Retrieval via Knowledge Consolidation | Yang Nan et.al. | 2503.09370 | null |
| 2025-03-11 | CQVPR: Landmark-aware Contextual Queries for Visual Place Recognition | Dongyue Li et.al. | 2503.08170 | null |
| 2025-03-10 | Find your Needle: Small Object Image Retrieval via Multi-Object Attention Optimization | Michael Green et.al. | 2503.07038 | null |
| 2025-03-10 | Zero-Shot Hashing Based on Reconstruction With Part Alignment | Yan Jiang et.al. | 2503.07037 | null |
| 2025-03-10 | Improving Visual Place Recognition with Sequence-Matching Receptiveness Prediction | Somayeh Hussaini et.al. | 2503.06840 | null |
| 2025-03-09 | RoboDesign1M: A Large-scale Dataset for Robot Design Understanding | Tri Le et.al. | 2503.06796 | null |
| 2025-03-09 | StructVPR++: Distill Structural and Semantic Knowledge with Weighting Samples for Visual Place Recognition | Yanqing Shen et.al. | 2503.06601 | link |
| 2025-03-09 | TextInPlace: Indoor Visual Place Recognition in Repetitive Structures with Scene Text Spotting and Verification | Huaqi Tao et.al. | 2503.06501 | null |
| 2025-03-08 | NeuraLoc: Visual Localization in Neural Implicit Map with Dual Complementary Features | Hongjia Zhai et.al. | 2503.06117 | null |
| 2025-03-07 | Data-Efficient Generalization for Zero-shot Composed Image Retrieval | Zining Chen et.al. | 2503.05204 | null |
| 2025-03-06 | RadIR: A Scalable Framework for Multi-Grained Medical Image Retrieval via Radiology Report Mining | Tengfei Zhang et.al. | 2503.04653 | null |
| 2025-03-06 | Geometry-Constrained Monocular Scale Estimation Using Semantic Segmentation for Dynamic Scenes | Hui Zhang et.al. | 2503.04235 | null |
| 2025-03-06 | Bridging the Vision-Brain Gap with an Uncertainty-Aware Blur Prior | Haitao Wu et.al. | 2503.04207 | null |
| 2025-03-06 | Image-Based Relocalization and Alignment for Long-Term Monitoring of Dynamic Underwater Environments | Beverley Gorry et.al. | 2503.04096 | null |
| 2025-03-04 | TeTRA-VPR: A Ternary Transformer Approach for Compact Visual Place Recognition | Oliver Grainge et.al. | 2503.02511 | null |
| 2025-03-04 | Continual Multi-Robot Learning from Black-Box Visual Place Recognition Models | Kenta Tsukahara et.al. | 2503.02256 | null |
| 2025-03-03 | Composed Multi-modal Retrieval: A Survey of Approaches and Applications | Kun Zhang et.al. | 2503.01334 | link |
| 2025-03-03 | AirRoom: Objects Matter in Room Reidentification | Runmao Yao et.al. | 2503.01130 | null |
| 2025-03-02 | Efficient End-to-end Visual Localization for Autonomous Driving with Decoupled BEV Neural Matching | Jinyu Miao et.al. | 2503.00862 | null |
| 2025-03-01 | Class-Independent Increment: An Efficient Approach for Multi-label Class-Incremental Learning | Songlin Dong et.al. | 2503.00515 | null |
| 2025-02-28 | EVLoc: Event-based Visual Localization in LiDAR Maps via Event-Depth Registration | Kuangyi Chen et.al. | 2503.00167 | null |
| 2025-02-28 | CoTMR: Chain-of-Thought Multi-Scale Reasoning for Training-Free Zero-Shot Composed Image Retrieval | Zelong Sun et.al. | 2502.20826 | null |
| 2025-02-28 | SciceVPR: Stable Cross-Image Correlation Enhanced Model for Visual Place Recognition | Shanshan Wan et.al. | 2502.20676 | null |
| 2025-02-27 | A2-GNN: Angle-Annular GNN for Visual Descriptor-free Camera Relocalization | Yejun Zhang et.al. | 2502.20036 | link |
Keypoint Detection
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-04-29 | Emotion Recognition in Contemporary Dance Performances Using Laban Movement Analysis | Muhammad Turab et.al. | 2504.21154 | null |
| 2025-04-29 | Learning a General Model: Folding Clothing with Topological Dynamics | Yiming Liu et.al. | 2504.20720 | null |
| 2025-04-26 | VISUALCENT: Visual Human Analysis using Dynamic Centroid Representation | Niaz Ahmad et.al. | 2504.19032 | null |
| 2025-04-24 | EdgePoint2: Compact Descriptors for Superior Efficiency and Accuracy | Haodi Yao et.al. | 2504.17280 | null |
| 2025-04-15 | UKDM: Underwater keypoint detection and matching using underwater image enhancement techniques | Pedro Diaz-Garcia et.al. | 2504.11063 | null |
| 2025-04-15 | Acquisition of high-quality images for camera calibration in robotics applications via speech prompts | Timm Linder et.al. | 2504.11031 | null |
| 2025-04-11 | Stereophotoclinometry Revisited | Travis Driver et.al. | 2504.08252 | null |
| 2025-03-31 | SuperEvent: Cross-Modal Learning of Event-based Keypoint Detection | Yannick Burkhardt et.al. | 2504.00139 | null |
| 2025-03-29 | Deep Visual Servoing of an Aerial Robot Using Keypoint Feature Extraction | Shayan Sepahvand et.al. | 2503.23171 | null |
| 2025-03-25 | Multiscale Feature Importance-based Bit Allocation for End-to-End Feature Coding for Machines | Junle Liu et.al. | 2503.19278 | null |
| 2025-03-05 | Periodontal Bone Loss Analysis via Keypoint Detection With Heuristic Post-Processing | Ryan Banks et.al. | 2503.13477 | null |
| 2025-03-16 | Histogram Transporter: Learning Rotation-Equivariant Orientation Histograms for High-Precision Robotic Kitting | Jiadong Zhou et.al. | 2503.12541 | null |
| 2025-04-12 | Keypoint Detection and Description for Raw Bayer Images | Jiakai Lin et.al. | 2503.08673 | null |
| 2025-03-10 | REF-VLM: Triplet-Based Referring Paradigm for Unified Visual Decoding | Yan Tai et.al. | 2503.07413 | link |
| 2025-03-11 | DaD: Distilled Reinforcement Learning for Diverse Keypoint Detection | Johan Edstedt et.al. | 2503.07347 | link |
| 2025-03-07 | Automatic determination of quasicrystalline patterns from microscopy images | Tano Kim Kender et.al. | 2503.05472 | null |
| 2025-03-07 | Spatial regularisation for improved accuracy and interpretability in keypoint-based registration | Benjamin Billot et.al. | 2503.04499 | link |
| 2025-03-04 | A Novel Streamline-based diffusion MRI Tractography Registration Method with Probabilistic Keypoint Detection | Junyi Wang et.al. | 2503.02481 | null |
| 2025-03-01 | Autonomous Dissection in Robotic Cholecystectomy | Ki-Hwan Oh et.al. | 2503.00666 | null |
| 2025-02-28 | CNSv2: Probabilistic Correspondence Encoded Neural Image Servo | Anzhe Chen et.al. | 2503.00132 | null |
| 2025-02-27 | Automatic Temporal Segmentation for Post-Stroke Rehabilitation: A Keypoint Detection and Temporal Segmentation Approach for Small Datasets | Jisoo Lee et.al. | 2502.19766 | null |
| 2025-02-23 | Rewards-based image analysis in microscopy | Kamyar Barakati et.al. | 2502.18522 | null |
| 2025-02-19 | 2.5D U-Net with Depth Reduction for 3D CryoET Object Identification | Yusuke Uchida et.al. | 2502.13484 | link |
| 2025-01-30 | Transfer Learning for Keypoint Detection in Low-Resolution Thermal TUG Test Images | Wei-Lun Chen et.al. | 2501.18453 | null |
| 2025-01-30 | Video-based Surgical Tool-tip and Keypoint Tracking using Multi-frame Context-driven Deep Learning Models | Bhargav Ghanekar et.al. | 2501.18361 | null |
| 2025-01-30 | Lifelong 3D Mapping Framework for Hand-held & Robot-mounted LiDAR Mapping Systems | Liudi Yang et.al. | 2501.18110 | null |
| 2025-01-21 | Keypoint Detection Empowered Near-Field User Localization and Channel Reconstruction | Mengyuan Li et.al. | 2501.11844 | null |
2025-4
Pose Estimation
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-05-02 | T-Graph: Enhancing Sparse-view Camera Pose Estimation by Pairwise Translation Graph | Qingyu Xian et.al. | 2505.01207 | null |
| 2025-05-02 | 3D Human Pose Estimation via Spatial Graph Order Attention and Temporal Body Aware Transformer | Kamel Aouaidjia et.al. | 2505.01003 | null |
| 2025-05-01 | Are Minimal Radial Distortion Solvers Really Necessary for Relative Pose Estimation? | Viktor Kocur et.al. | 2505.00866 | null |
| 2025-05-01 | P2P-Insole: Human Pose Estimation Using Foot Pressure Distribution and Motion Sensors | Atsuya Watanabe et.al. | 2505.00755 | null |
| 2025-05-01 | Dietary Intake Estimation via Continuous 3D Reconstruction of Food | Wallace Lee et.al. | 2505.00606 | null |
| 2025-05-02 | InterLoc: LiDAR-based Intersection Localization using Road Segmentation with Automated Evaluation Method | Nguyen Hoang Khoi Tran et.al. | 2505.00512 | null |
| 2025-04-30 | Self-Supervised Monocular Visual Drone Model Identification through Improved Occlusion Handling | Stavrow A. Bahnam et.al. | 2504.21695 | null |
| 2025-04-29 | Dance Style Recognition Using Laban Movement Analysis | Muhammad Turab et.al. | 2504.21166 | null |
| 2025-04-29 | Adept: Annotation-Denoising Auxiliary Tasks with Discrete Cosine Transform Map and Keypoint for Human-Centric Pretraining | Weizhen He et.al. | 2504.20800 | null |
| 2025-04-29 | A Survey on Event-based Optical Marker Systems | Nafiseh Jabbari Tofighi et.al. | 2504.20736 | null |
| 2025-04-29 | Large-scale visual SLAM for in-the-wild videos | Shuo Sun et.al. | 2504.20496 | null |
| 2025-05-01 | GSFeatLoc: Visual Localization Using Feature Correspondence on 3D Gaussian Splatting | Jongwon Lee et.al. | 2504.20379 | null |
| 2025-05-01 | PRISM-DP: Spatial Pose-based Observations for Diffusion-Policies via Segmentation, Mesh Generation, and Pose Tracking | Xiatao Sun et.al. | 2504.20359 | null |
| 2025-04-28 | Transformation & Translation Occupancy Grid Mapping: 2-Dimensional Deep Learning Refined SLAM | Leon Davies et.al. | 2504.19654 | null |
| 2025-04-28 | GAN-SLAM: Real-Time GAN Aided Floor Plan Creation Through SLAM | Leon Davies et.al. | 2504.19653 | null |
| 2025-04-28 | Category-Level and Open-Set Object Pose Estimation for Robotics | Peter Hönig et.al. | 2504.19572 | null |
| 2025-04-25 | Certifiably-Correct Mapping for Safe Navigation Despite Odometry Drift | Devansh R. Agrawal et.al. | 2504.18713 | null |
| 2025-04-25 | SSD-Poser: Avatar Pose Estimation with State Space Duality from Sparse Observations | Shuting Zhao et.al. | 2504.18332 | null |
| 2025-04-25 | S3MOT: Monocular 3D Object Tracking with Selective State Space Model | Zhuohao Yan et.al. | 2504.18068 | null |
| 2025-04-22 | SmallGS: Gaussian Splatting-based Camera Pose Estimation for Small-Baseline Videos | Yuxin Yao et.al. | 2504.17810 | null |
| 2025-04-24 | Dynamic Camera Poses and Where to Find Them | Chris Rockwell et.al. | 2504.17788 | null |
| 2025-04-24 | A Guide to Structureless Visual Localization | Vojtech Panek et.al. | 2504.17636 | null |
| 2025-04-24 | Object Pose Estimation by Camera Arm Control Based on the Next Viewpoint Estimation | Tomoki Mizuno et.al. | 2504.17424 | null |
| 2025-04-24 | Bias-Eliminated PnP for Stereo Visual Odometry: Provably Consistent and Large-Scale Localization | Guangyang Zeng et.al. | 2504.17410 | null |
| 2025-04-23 | WiFi based Human Fall and Activity Recognition using Transformer based Encoder Decoder and Graph Neural Networks | Younggeol Cho et.al. | 2504.16655 | null |
| 2025-04-23 | Assessing the Feasibility of Internet-Sourced Video for Automatic Cattle Lameness Detection | Md Fahimuzzman Sohan et.al. | 2504.16404 | null |
| 2025-04-22 | SignX: The Foundation Model for Sign Recognition | Sen Fang et.al. | 2504.16315 | null |
| 2025-04-22 | GADS: A Super Lightweight Model for Head Pose Estimation | Menan Velayuthan et.al. | 2504.15751 | null |
| 2025-04-21 | Field Report on Ground Penetrating Radar for Localization at the Mars Desert Research Station | Anja Sheppard et.al. | 2504.15455 | null |
| 2025-04-21 | Vision6D: 3D-to-2D Interactive Visualization and Annotation Tool for 6D Pose Estimation | Yike Zhang et.al. | 2504.15329 | null |
| 2025-04-21 | Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs | Chun-Hsiao Yeh et.al. | 2504.15280 | link |
| 2025-04-21 | Instance-Adaptive Keypoint Learning with Local-to-Global Geometric Aggregation for Category-Level Object Pose Estimation | Xiao Zhang et.al. | 2504.15134 | null |
| 2025-04-20 | Back on Track: Bundle Adjustment for Dynamic Scene Reconstruction | Weirong Chen et.al. | 2504.14516 | null |
| 2025-04-20 | SG-Reg: Generalizable and Efficient Scene Graph Registration | Chuhao Liu et.al. | 2504.14440 | link |
| 2025-04-18 | Imitation Learning with Precisely Labeled Human Demonstrations | Yilong Song et.al. | 2504.13803 | null |
| 2025-04-18 | Mono3R: Exploiting Monocular Cues for Geometric 3D Reconstruction | Wenyu Li et.al. | 2504.13419 | null |
| 2025-04-17 | ViTa-Zero: Zero-shot Visuotactile Object 6D Pose Estimation | Hongyu Li et.al. | 2504.13179 | null |
| 2025-04-18 | ODHSR: Online Dense 3D Reconstruction of Humans and Scenes from Monocular Videos | Zetong Zhang et.al. | 2504.13167 | null |
| 2025-04-17 | Unsupervised Cross-Domain 3D Human Pose Estimation via Pseudo-Label-Guided Global Transforms | Jingjing Liu et.al. | 2504.12699 | null |
| 2025-04-16 | MobilePoser: Real-Time Full-Body Pose Estimation and 3D Human Translation from IMUs in Mobile Consumer Devices | Vasco Xu et.al. | 2504.12492 | link |
| 2025-04-16 | Diffusion Based Robust LiDAR Place Recognition | Benjamin Krummenacher et.al. | 2504.12412 | null |
| 2025-04-16 | Regist3R: Incremental Registration with Stereo Foundation Model | Sidun Liu et.al. | 2504.12356 | null |
| 2025-04-16 | CoMotion: Concurrent Multi-person 3D Motion | Alejandro Newell et.al. | 2504.12186 | link |
| 2025-04-16 | No Fuss, Just Function – A Proposal for Non-Intrusive Full Body Tracking in XR for Meaningful Spatial Interactions | Elisabeth Mayer et.al. | 2504.11987 | null |
| 2025-04-16 | An Online Adaptation Method for Robust Depth Estimation and Visual Odometry in the Open World | Xingwu Ji et.al. | 2504.11698 | link |
| 2025-04-17 | CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image | Jingshun Huang et.al. | 2504.11230 | null |
| 2025-04-15 | DMAGaze: Gaze Estimation Based on Feature Disentanglement and Multi-Scale Attention | Haohan Chen et.al. | 2504.11160 | null |
| 2025-04-14 | MonoDiff9D: Monocular Category-Level 9D Object Pose Estimation via Diffusion Model | Jian Liu et.al. | 2504.10433 | null |
| 2025-04-14 | Benchmarking 3D Human Pose Estimation Models Under Occlusions | Filipa Lino et.al. | 2504.10350 | null |
| 2025-04-15 | Differentially Private 2D Human Pose Estimation | Kaushik Bhargav Sivangi et.al. | 2504.10190 | null |
| 2025-04-14 | TT3D: Table Tennis 3D Reconstruction | Thomas Gossard et.al. | 2504.10035 | null |
| 2025-04-14 | Efficient 2D to Full 3D Human Pose Uplifting including Joint Rotations | Katja Ludwig et.al. | 2504.09953 | null |
| 2025-04-14 | NeRF-Based Transparent Object Grasping Enhanced by Shape Priors | Yi Han et.al. | 2504.09868 | null |
| 2025-04-13 | EasyREG: Easy Depth-Based Markerless Registration and Tracking using Augmented Reality Device for Surgical Guidance | Yue Yang et.al. | 2504.09498 | null |
| 2025-04-12 | SCFlow2: Plug-and-Play Object Pose Refiner with Shape-Constraint Scene Flow | Qingyuan Wang et.al. | 2504.09160 | null |
| 2025-04-12 | A Constrained Optimization Approach for Gaussian Splatting from Coarsely-posed Images and Noisy Lidar Point Clouds | Jizong Peng et.al. | 2504.09129 | null |
| 2025-04-12 | BIGS: Bimanual Category-agnostic Interaction Reconstruction from Monocular Videos via 3D Gaussian Splatting | Jeongwan On et.al. | 2504.09097 | null |
| 2025-04-11 | The Invisible EgoHand: 3D Hand Forecasting through EgoBody Pose Estimation | Masashi Hatano et.al. | 2504.08654 | null |
| 2025-04-11 | MBE-ARI: A Multimodal Dataset Mapping Bi-directional Engagement in Animal-Robot Interaction | Ian Noronha et.al. | 2504.08646 | null |
| 2025-04-11 | Hardware, Algorithms, and Applications of the Neuromorphic Vision Sensor: a Review | Claudio Cimarelli et.al. | 2504.08588 | null |
| 2025-04-11 | Multi-person Physics-based Pose Estimation for Combat Sports | Hossein Feiz et.al. | 2504.08175 | null |
| 2025-04-10 | Towards Unconstrained 2D Pose Estimation of the Human Spine | Muhammad Saif Ullah Khan et.al. | 2504.08110 | null |
| 2025-04-10 | BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation | Yuanhong Yu et.al. | 2504.07955 | null |
| 2025-04-09 | DLTPose: 6DoF Pose Estimation From Accurate Dense Surface Point Estimates | Akash Jadhav et.al. | 2504.07335 | null |
| 2025-04-09 | Two by Two: Learning Multi-Task Pairwise Objects Assembly for Generalizable Robot Manipulation | Yu Qi et.al. | 2504.06961 | null |
| 2025-04-09 | GraspClutter6D: A Large-scale Real-world Dataset for Robust Perception and Grasping in Cluttered Scenes | Seunghyeok Back et.al. | 2504.06866 | link |
| 2025-04-09 | Setup-Invariant Augmented Reality for Teaching by Demonstration with Surgical Robots | Alexandre Banks et.al. | 2504.06677 | link |
| 2025-04-09 | HGMamba: Enhancing 3D Human Pose Estimation with a HyperGCN-Mamba Network | Hu Cui et.al. | 2504.06638 | null |
| 2025-04-08 | Leveraging Synthetic Adult Datasets for Unsupervised Infant Pose Estimation | Sarosij Bose et.al. | 2504.05789 | null |
| 2025-04-08 | SAP-CoPE: Social-Aware Planning using Cooperative Pose Estimation with Infrastructure Sensor Nodes | Minghao Ning et.al. | 2504.05727 | link |
| 2025-04-08 | POMATO: Marrying Pointmap Matching with Temporal Motion for Dynamic 3D Reconstruction | Songyan Zhang et.al. | 2504.05692 | link |
| 2025-04-10 | Learning Affine Correspondences by Integrating Geometric Constraints | Pengju Sun et.al. | 2504.04834 | link |
| 2025-04-10 | A Convex and Global Solution for the P$n$P Problem in 2D Forward-Looking Sonar | Jiayi Su et.al. | 2504.04445 | null |
| 2025-04-05 | 3R-GS: Best Practice in Optimizing Camera Poses Along with 3DGS | Zhisheng Huang et.al. | 2504.04294 | null |
| 2025-04-02 | A Geometric Approach For Pose and Velocity Estimation Using IMU and Inertial/Body-Frame Measurements | Sifeddine Benahmed et.al. | 2504.03764 | null |
| 2025-04-04 | Robust Human Registration with Body Part Segmentation on Noisy Point Clouds | Kai Lascheit et.al. | 2504.03602 | null |
| 2025-04-04 | Endo3R: Unified Online Reconstruction from Dynamic Monocular Endoscopic Video | Jiaxin Guo et.al. | 2504.03198 | null |
| 2025-04-03 | Cooperative Inference for Real-Time 3D Human Pose Estimation in Multi-Device Edge Networks | Hyun-Ho Choi et.al. | 2504.03052 | null |
| 2025-04-03 | BOP Challenge 2024 on Model-Based and Model-Free 6D Object Pose Estimation | Van Nguyen Nguyen et.al. | 2504.02812 | null |
| 2025-04-03 | PicoPose: Progressive Pixel-to-Pixel Correspondence Learning for Novel Object Pose Estimation | Lihua Liu et.al. | 2504.02617 | null |
| 2025-04-02 | Dual-stream Transformer-GCN Model with Contextualized Representations Learning for Monocular 3D Human Pose Estimation | Mingrui Ye et.al. | 2504.01764 | link |
| 2025-04-02 | ForestVO: Enhancing Visual Odometry in Forest Environments through ForestGlue | Thomas Pritchard et.al. | 2504.01261 | link |
| 2025-04-01 | AP-CAP: Advancing High-Quality Data Synthesis for Animal Pose Estimation via a Controllable Image Generation Pipeline | Lei Wang et.al. | 2504.00394 | null |
| 2025-03-31 | Easi3R: Estimating Disentangled Motion from DUSt3R Without Training | Xingyu Chen et.al. | 2503.24391 | link |
| 2025-03-31 | LiM-Loc: Visual Localization with Dense and Accurate 3D Reference Maps Directly Corresponding 2D Keypoints to 3D LiDAR Point Clouds | Masahiko Tsuji et.al. | 2503.23664 | null |
| 2025-03-30 | PhysPose: Refining 6D Object Poses with Physical Constraints | Martin Malenický et.al. | 2503.23587 | null |
| 2025-03-30 | Improving Indoor Localization Accuracy by Using an Efficient Implicit Neural Map Representation | Haofei Kuang et.al. | 2503.23480 | link |
| 2025-03-30 | SparseLoc: Sparse Open-Set Landmark-based Global Localization for Autonomous Navigation | Pranjal Paul et.al. | 2503.23465 | null |
| 2025-03-30 | HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation | Hongwei Zheng et.al. | 2503.23331 | null |
| 2025-03-29 | Incorporating GNSS Information with LIDAR-Inertial Odometry for Accurate Land-Vehicle Localization | Jintao Cheng et.al. | 2503.23199 | null |
Visual Localization
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-05-16 | Redundancy-Aware Pretraining of Vision-Language Foundation Models in Remote Sensing | Mathis Jürgen Adler et.al. | 2505.11121 | null |
| 2025-05-04 | OBD-Finder: Explainable Coarse-to-Fine Text-Centric Oracle Bone Duplicates Discovery | Chongsheng Zhang et.al. | 2505.03836 | link |
| 2025-05-06 | Thermal-LiDAR Fusion for Robust Tunnel Localization in GNSS-Denied and Low-Visibility Conditions | Lukas Schichler et.al. | 2505.03565 | null |
| 2025-05-06 | LiftFeat: 3D Geometry-Aware Local Feature Matching | Yepeng Liu et.al. | 2505.03422 | link |
| 2025-05-06 | Seeing the Abstract: Translating the Abstract Language for Vision Language Models | Davide Talon et.al. | 2505.03242 | link |
| 2025-05-13 | SafeNav: Safe Path Navigation using Landmark Based Localization in a GPS-denied Environment | Ganesh Sapkota et.al. | 2505.01956 | null |
| 2025-05-02 | NeuroLoc: Encoding Navigation Cells for 6-DOF Camera Localization | Xun Li et.al. | 2505.01113 | null |
| 2025-05-01 | GSFeatLoc: Visual Localization Using Feature Correspondence on 3D Gaussian Splatting | Jongwon Lee et.al. | 2504.20379 | null |
| 2025-04-25 | From Mapping to Composing: A Two-Stage Framework for Zero-shot Composed Image Retrieval | Yabing Wang et.al. | 2504.17990 | null |
| 2025-04-24 | A Guide to Structureless Visual Localization | Vojtech Panek et.al. | 2504.17636 | null |
| 2025-04-23 | Rethinking Vision Transformer for Large-Scale Fine-Grained Image Retrieval | Xin Jiang et.al. | 2504.16691 | null |
| 2025-04-22 | Media Content Atlas: A Pipeline to Explore and Investigate Multidimensional Media Space using Multimodal LLMs | Merve Cerit et.al. | 2504.16323 | link |
| 2025-04-19 | A Multimodal Recaptioning Framework to Account for Perceptual Diversity in Multilingual Vision-Language Modeling | Kyle Buettner et.al. | 2504.14359 | null |
| 2025-04-17 | SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs | Haoxuan Li et.al. | 2504.13172 | null |
| 2025-04-16 | Generalized Visual Relation Detection with Diffusion Models | Kaifeng Gao et.al. | 2504.12100 | null |
| 2025-04-15 | Visual Re-Ranking with Non-Visual Side Information | Gustav Hanning et.al. | 2504.11134 | link |
| 2025-04-15 | TMCIR: Token Merge Benefits Composed Image Retrieval | Chaoyang Wang et.al. | 2504.10995 | null |
| 2025-04-14 | Focus on Local: Finding Reliable Discriminative Regions for Visual Place Recognition | Changwei Wang et.al. | 2504.09881 | null |
| 2025-04-12 | Evolved Hierarchical Masking for Self-Supervised Learning | Zhanzhou Feng et.al. | 2504.09155 | null |
| 2025-04-11 | HAL-NeRF: High Accuracy Localization Leveraging Neural Radiance Fields | Asterios Reppas et.al. | 2504.08901 | null |
| 2025-04-11 | Hypergraph Vision Transformers: Images are More than Nodes, More than Edges | Joshua Fixelle et.al. | 2504.08710 | null |
| 2025-04-11 | FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations | Cheng-Yu Hsieh et.al. | 2504.08368 | null |
| 2025-04-10 | Multi-modal Reference Learning for Fine-grained Text-to-Image Retrieval | Zehong Ma et.al. | 2504.07718 | null |
| 2025-04-09 | A Pointcloud Registration Framework for Relocalization in Subterranean Environments | David Akhihiero et.al. | 2504.07231 | null |
| 2025-04-09 | Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception | Ruotian Peng et.al. | 2504.06666 | null |
| 2025-04-08 | To Match or Not to Match: Revisiting Image Matching for Reliable Visual Place Recognition | Davide Sferrazza et.al. | 2504.06116 | null |
| 2025-04-06 | NCL-CIR: Noise-aware Contrastive Learning for Composed Image Retrieval | Peng Gao et.al. | 2504.04339 | null |
| 2025-04-04 | REJEPA: A Novel Joint-Embedding Predictive Architecture for Efficient Remote Sensing Image Retrieval | Shabnam Choudhury et.al. | 2504.03169 | null |
| 2025-04-06 | Re-thinking Temporal Search for Long-Form Video Understanding | Jinhui Ye et.al. | 2504.02259 | link |
| 2025-04-02 | Prompt-Guided Attention Head Selection for Focus-Oriented Image Retrieval | Yuji Nozawa et.al. | 2504.01348 | null |
| 2025-04-01 | IDMR: Towards Instance-Driven Precise Visual Correspondence in Multimodal Retrieval | Bangwei Liu et.al. | 2504.00954 | null |
| 2025-04-01 | Scaling Prompt Instructed Zero Shot Composed Image Retrieval with Image-Only Data | Yiqun Duan et.al. | 2504.00812 | null |
| 2025-03-31 | CIBR: Cross-modal Information Bottleneck Regularization for Robust CLIP Generalization | Yingrui Ji et.al. | 2503.24182 | null |
| 2025-03-31 | LiM-Loc: Visual Localization with Dense and Accurate 3D Reference Maps Directly Corresponding 2D Keypoints to 3D LiDAR Point Clouds | Masahiko Tsuji et.al. | 2503.23664 | null |
| 2025-03-30 | Multiview Image-Based Localization | Cameron Fiore et.al. | 2503.23577 | null |
| 2025-03-27 | LOCORE: Image Re-ranking with Long-Context Sequence Modeling | Zilin Xiao et.al. | 2503.21772 | link |
| 2025-03-27 | Fwd2Bot: LVLM Visual Token Compression with Double Forward Bottleneck | Adrian Bulat et.al. | 2503.21757 | null |
| 2025-03-27 | UGNA-VPR: A Novel Training Paradigm for Visual Place Recognition Based on Uncertainty-Guided NeRF Augmentation | Yehui Shen et.al. | 2503.21338 | link |
| 2025-03-27 | FineCIR: Explicit Parsing of Fine-Grained Modification Semantics for Composed Image Retrieval | Zixu Li et.al. | 2503.21309 | link |
Keypoint Detection
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-05-16 | Deepfake Forensic Analysis: Source Dataset Attribution and Legal Implications of Synthetic Media Manipulation | Massimiliano Cassia et.al. | 2505.11110 | null |
| 2025-05-12 | RDD: Robust Feature Detector and Descriptor using Deformable Transformer | Gonglin Chen et.al. | 2505.08013 | null |
| 2025-05-12 | Enabling Privacy-Aware AI-Based Ergonomic Analysis | Sander De Coninck et.al. | 2505.07306 | null |
| 2025-05-09 | My Emotion on your face: The use of Facial Keypoint Detection to preserve Emotions in Latent Space Editing | Jingrui He et.al. | 2505.06436 | null |
| 2025-05-05 | Unsupervised training of keypoint-agnostic descriptors for flexible retinal image registration | David Rivas-Villar et.al. | 2505.02787 | null |
| 2025-05-05 | Unsupervised Deep Learning-based Keypoint Localization Estimating Descriptor Matching Performance | David Rivas-Villar et.al. | 2505.02779 | null |
| 2025-05-04 | Focus What Matters: Matchability-Based Reweighting for Local Feature Matching | Dongyue Li et.al. | 2505.02161 | null |
| 2025-05-04 | Enhancing Lidar Point Cloud Sampling via Colorization and Super-Resolution of Lidar Imagery | Sier Ha et.al. | 2505.02049 | null |
| 2025-04-29 | Emotion Recognition in Contemporary Dance Performances Using Laban Movement Analysis | Muhammad Turab et.al. | 2504.21154 | null |
| 2025-04-29 | Learning a General Model: Folding Clothing with Topological Dynamics | Yiming Liu et.al. | 2504.20720 | null |
| 2025-04-26 | VISUALCENT: Visual Human Analysis using Dynamic Centroid Representation | Niaz Ahmad et.al. | 2504.19032 | null |
| 2025-04-24 | EdgePoint2: Compact Descriptors for Superior Efficiency and Accuracy | Haodi Yao et.al. | 2504.17280 | null |
| 2025-04-15 | UKDM: Underwater keypoint detection and matching using underwater image enhancement techniques | Pedro Diaz-Garcia et.al. | 2504.11063 | null |
| 2025-04-15 | Acquisition of high-quality images for camera calibration in robotics applications via speech prompts | Timm Linder et.al. | 2504.11031 | null |
| 2025-04-11 | Stereophotoclinometry Revisited | Travis Driver et.al. | 2504.08252 | null |
| 2025-03-31 | SuperEvent: Cross-Modal Learning of Event-based Keypoint Detection | Yannick Burkhardt et.al. | 2504.00139 | null |
| 2025-03-29 | Deep Visual Servoing of an Aerial Robot Using Keypoint Feature Extraction | Shayan Sepahvand et.al. | 2503.23171 | null |
| 2025-03-25 | Multiscale Feature Importance-based Bit Allocation for End-to-End Feature Coding for Machines | Junle Liu et.al. | 2503.19278 | null |
| 2025-03-16 | Histogram Transporter: Learning Rotation-Equivariant Orientation Histograms for High-Precision Robotic Kitting | Jiadong Zhou et.al. | 2503.12541 | null |
| 2025-04-12 | Keypoint Detection and Description for Raw Bayer Images | Jiakai Lin et.al. | 2503.08673 | null |
| 2025-03-10 | REF-VLM: Triplet-Based Referring Paradigm for Unified Visual Decoding | Yan Tai et.al. | 2503.07413 | link |
| 2025-03-11 | DaD: Distilled Reinforcement Learning for Diverse Keypoint Detection | Johan Edstedt et.al. | 2503.07347 | link |
| 2025-03-07 | Automatic determination of quasicrystalline patterns from microscopy images | Tano Kim Kender et.al. | 2503.05472 | null |
| 2025-03-07 | Spatial regularisation for improved accuracy and interpretability in keypoint-based registration | Benjamin Billot et.al. | 2503.04499 | link |
2025-5
Pose Estimation
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-06-04 | Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation | Tianyu Huang et.al. | 2506.04225 | null |
| 2025-06-04 | Accelerating SfM-based Pose Estimation with Dominating Set | Joji Joseph et.al. | 2506.03667 | null |
| 2025-06-03 | Learning Pyramid-structured Long-range Dependencies for 3D Human Pose Estimation | Mingjie Wei et.al. | 2506.02853 | null |
| 2025-06-03 | GeneA-SLAM2: Dynamic SLAM with AutoEncoder-Preprocessed Genetic Keypoints Resampling and Depth Variance-Guided Dynamic Region Removal | Shufan Qing et.al. | 2506.02736 | link |
| 2025-06-02 | Rig3R: Rig-Aware Conditioning for Learned 3D Reconstruction | Samuel Li et.al. | 2506.02265 | null |
| 2025-06-02 | E3D-Bench: A Benchmark for End-to-End 3D Geometric Foundation Models | Wenyan Cong et.al. | 2506.01933 | null |
| 2025-06-02 | SteerPose: Simultaneous Extrinsic Camera Calibration and Matching from Articulation | Sang-Eun Lee et.al. | 2506.01691 | null |
| 2025-06-01 | TIGeR: Text-Instructed Generation and Refinement for Template-Free Hand-Object Interaction | Yiyao Huang et.al. | 2506.00953 | null |
| 2025-05-31 | XYZ-IBD: High-precision Bin-picking Dataset for Object 6D Pose Estimation Capturing Real-world Industrial Complexity | Junwen Huang et.al. | 2506.00599 | null |
| 2025-05-30 | Lazy Heuristic Search for Solving POMDPs with Expensive-to-Compute Belief Transitions | Muhammad Suhail Saleem et.al. | 2506.00285 | null |
| 2025-05-30 | 6D Pose Estimation on Point Cloud Data through Prior Knowledge Integration: A Case Study in Autonomous Disassembly | Chengzhi Wu et.al. | 2505.24669 | null |
| 2025-05-30 | Category-Level 6D Object Pose Estimation in Agricultural Settings Using a Lattice-Deformation Framework and Diffusion-Augmented Synthetic Data | Marios Glytsos et.al. | 2505.24636 | null |
| 2025-05-30 | PCIE_Pose Solution for EgoExo4D Pose and Proficiency Estimation Challenge | Feng Chen et.al. | 2505.24411 | null |
| 2025-05-29 | Pose-free 3D Gaussian splatting via shape-ray estimation | Youngju Na et.al. | 2505.22978 | null |
| 2025-05-28 | TwinTrack: Bridging Vision and Contact Physics for Real-Time Tracking of Unknown Dynamic Objects | Wen Yang et.al. | 2505.22882 | null |
| 2025-05-28 | 4DTAM: Non-Rigid Tracking and Mapping via Dynamic Surface Gaussians | Hidenobu Matsuki et.al. | 2505.22859 | null |
| 2025-05-28 | MultiFormer: A Multi-Person Pose Estimation System Based on CSI and Attention Mechanism | Yanyi Qu et.al. | 2505.22555 | null |
| 2025-05-28 | Event-based Egocentric Human Pose Estimation in Dynamic Environment | Wataru Ikeda et.al. | 2505.22007 | null |
| 2025-05-27 | Spectral Compression Transformer with Line Pose Graph for Monocular 3D Human Pose Estimation | Zenghao Zheng et.al. | 2505.21309 | null |
| 2025-05-29 | ReassembleNet: Learnable Keypoints and Diffusion for 2D Fresco Reconstruction | Adeela Islam et.al. | 2505.21117 | null |
| 2025-05-27 | HS-SLAM: A Fast and Hybrid Strategy-Based SLAM Approach for Low-Speed Autonomous Driving | Bingxiang Kang et.al. | 2505.20906 | null |
| 2025-05-27 | Mamba-Driven Topology Fusion for Monocular 3-D Human Pose Estimation | Zenghao Zheng et.al. | 2505.20611 | null |
| 2025-05-28 | HAND Me the Data: Fast Robot Adaptation via Hand Path Retrieval | Matthew Hong et.al. | 2505.20455 | null |
| 2025-05-25 | Learning the Contact Manifold for Accurate Pose Estimation During Peg-in-Hole Insertion of Complex Geometries | Abhay Negi et.al. | 2505.19215 | null |
| 2025-05-24 | Why Not Replace? Sustaining Long-Term Visual Localization via Handcrafted-Learned Feature Collaboration on CPU | Yicheng Lin et.al. | 2505.18652 | null |
| 2025-05-24 | An Inertial Sequence Learning Framework for Vehicle Speed Estimation via Smartphone IMU | Xuan Xiao et.al. | 2505.18490 | null |
| 2025-05-23 | Pose Splatter: A 3D Gaussian Splatting Model for Quantifying Animal Pose and Appearance | Jack Goffinet et.al. | 2505.18342 | null |
| 2025-05-23 | To Glue or Not to Glue? Classical vs Learned Image Matching for Mobile Mapping Cameras to Textured Semantic 3D Building Models | Simone Gaisbauer et.al. | 2505.17973 | null |
| 2025-05-23 | Towards Dynamic 3D Reconstruction of Hand-Instrument Interaction in Ophthalmic Surgery | Ming Hu et.al. | 2505.17677 | null |
| 2025-05-23 | PoseBH: Prototypical Multi-Dataset Training Beyond Human Pose Estimation | Uyoung Jeong et.al. | 2505.17475 | link |
| 2025-05-22 | Towards Texture- And Shape-Independent 3D Keypoint Estimation in Birds | Valentin Schmuker et.al. | 2505.16633 | null |
| 2025-05-22 | GMatch: Geometry-Constrained Feature Matching for RGB-D Object Pose Estimation | Ming Yang et.al. | 2505.16144 | null |
| 2025-05-21 | Object-Focus Actor for Data-efficient Robot Generalization Dexterous Manipulation | Yihang Li et.al. | 2505.15098 | null |
| 2025-05-20 | UPTor: Unified 3D Human Pose Dynamics and Trajectory Prediction for Human-Robot Interaction | Nisarga Nilavadi et.al. | 2505.14866 | null |
| 2025-05-19 | Recollection from Pensieve: Novel View Synthesis via Learning from Uncalibrated Videos | Ruoyu Wang et.al. | 2505.13440 | link |
| 2025-05-19 | KinTwin: Imitation Learning with Torque and Muscle Driven Biomechanical Models Enables Precise Replication of Able-Bodied and Impaired Movement from Markerless Motion Capture | R. James Cotton et.al. | 2505.13436 | null |
| 2025-05-19 | The Way Up: A Dataset for Hold Usage Detection in Sport Climbing | Anna Maschek et.al. | 2505.12854 | null |
| 2025-05-17 | Keypoints as Dynamic Centroids for Unified Human Pose and Segmentation | Niaz Ahmad et.al. | 2505.12130 | null |
| 2025-05-17 | Black-box Adversaries from Latent Space: Unnoticeable Attacks on Human Pose and Shape Estimation | Zhiying Li et.al. | 2505.12009 | null |
| 2025-05-17 | ElderFallGuard: Real-Time IoT and Computer Vision-Based Fall Detection System for Elderly Safety | Tasrifur Riahi et.al. | 2505.11845 | null |
| 2025-05-16 | SurgPose: Generalisable Surgical Instrument Pose Estimation using Zero-Shot Learning and Stereo Vision | Utsav Rai et.al. | 2505.11439 | null |
| 2025-05-16 | MTevent: A Multi-Task Event Camera Dataset for 6D Pose Estimation and Moving Object Detection | Shrutarv Awasthi et.al. | 2505.11282 | null |
| 2025-05-16 | PoseBench3D: A Cross-Dataset Analysis Framework for 3D Human Pose Estimation | Saad Manzur et.al. | 2505.10888 | link |
| 2025-05-16 | RefPose: Leveraging Reference Geometric Correspondences for Accurate 6D Pose Estimation of Unseen Objects | Jaeguk Kim et.al. | 2505.10841 | null |
| 2025-05-14 | UMotion: Uncertainty-driven Human Motion Estimation from Inertial and Ultra-wideband Units | Huakun Liu et.al. | 2505.09393 | link |
| 2025-05-14 | APR-Transformer: Initial Pose Estimation for Localization in Complex Environments through Absolute Pose Regression | Srinivas Ravuri et.al. | 2505.09356 | link |
| 2025-05-13 | Real-time Capable Learning-based Visual Tool Pose Correction via Differentiable Simulation | Shuyuan Yang et.al. | 2505.08875 | null |
| 2025-05-12 | Sleep Position Classification using Transfer Learning for Bed-based Pressure Sensors | Olivier Papillon et.al. | 2505.08111 | null |
| 2025-05-07 | Pose Estimation for Intra-cardiac Echocardiography Catheter via AI-Based Anatomical Understanding | Jaeyoung Huh et.al. | 2505.07851 | null |
| 2025-05-12 | Enabling Privacy-Aware AI-Based Ergonomic Analysis | Sander De Coninck et.al. | 2505.07306 | null |
| 2025-05-13 | Human Motion Prediction via Test-domain-aware Adaptation with Easily-available Human Motions Estimated from Videos | Katsuki Shimbo et.al. | 2505.07301 | null |
| 2025-05-12 | When Dance Video Archives Challenge Computer Vision | Philippe Colantoni et.al. | 2505.07249 | null |
| 2025-05-10 | CompSLAM: Complementary Hierarchical Multi-Modal Localization and Mapping for Robot Autonomy in Underground Environments | Shehryar Khattak et.al. | 2505.06483 | null |
| 2025-05-09 | Active Perception for Tactile Sensing: A Task-Agnostic Attention-Based Approach | Tim Schneider et.al. | 2505.06182 | null |
| 2025-05-08 | Progressive Inertial Poser: Progressive Real-Time Kinematic Chain Estimation for 3D Full-Body Pose from Three IMU Sensors | Zunjie Zhu et.al. | 2505.05336 | null |
| 2025-05-08 | Improving Global Motion Estimation in Sparse IMU-based Motion Capture with Physics | Xinyu Yi et.al. | 2505.05010 | null |
| 2025-05-08 | An Efficient Method for Accurate Pose Estimation and Error Correction of Cuboidal Objects | Utsav Rai et.al. | 2505.04962 | null |
| 2025-05-07 | Comparison of Visual Trackers for Biomechanical Analysis of Running | Luis F. Gomez et.al. | 2505.04713 | null |
| 2025-05-07 | Do We Still Need to Work on Odometry for Autonomous Driving? | Cedric Le Gentil et.al. | 2505.04438 | null |
| 2025-05-07 | HDiffTG: A Lightweight Hybrid Diffusion-Transformer-GCN Architecture for 3D Human Pose Estimation | Yajie Fu et.al. | 2505.04276 | link |
| 2025-05-07 | One2Any: One-Reference 6D Pose Estimation for Any Object | Mengya Liu et.al. | 2505.04109 | null |
| 2025-05-06 | Polar Coordinate-Based 2D Pose Prior with Neural Distance Field | Qi Gan et.al. | 2505.03445 | null |
| 2025-05-06 | LiftFeat: 3D Geometry-Aware Local Feature Matching | Yepeng Liu et.al. | 2505.03422 | link |
| 2025-05-06 | Artificial Behavior Intelligence: Technology, Challenges, and Future Directions | Kanghyun Jo et.al. | 2505.03315 | null |
| 2025-05-05 | Dance of Fireworks: An Interactive Broadcast Gymnastics Training System Based on Pose Estimation | Haotian Chen et.al. | 2505.02690 | null |
| 2025-05-05 | Corr2Distrib: Making Ambiguous Correspondences an Ally to Predict Reliable 6D Pose Distributions | Asma Brazi et.al. | 2505.02501 | null |
| 2025-05-05 | Finger Pose Estimation for Under-screen Fingerprint Sensor | Xiongjun Guan et.al. | 2505.02481 | link |
| 2025-05-05 | 6D Pose Estimation on Spoons and Hands | Kevin Tan et.al. | 2505.02335 | null |
| 2025-05-04 | Continuous Normalizing Flows for Uncertainty-Aware Human Pose Estimation | Shipeng Liu et.al. | 2505.02287 | null |
| 2025-05-04 | A Birotation Solution for Relative Pose Problems | Hongbo Zhao et.al. | 2505.02025 | null |
| 2025-05-03 | Near-field 5D Pose Estimation using Reconfigurable Intelligent Surfaces | Srikar Sharma Sadhu et.al. | 2505.01829 | null |
| 2025-05-03 | AquaGS: Fast Underwater Scene Reconstruction with SfM-Free Gaussian Splatting | Junhao Shi et.al. | 2505.01799 | null |
| 2025-05-03 | PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth | Bu Jin et.al. | 2505.01729 | null |
| 2025-05-02 | T-Graph: Enhancing Sparse-view Camera Pose Estimation by Pairwise Translation Graph | Qingyu Xian et.al. | 2505.01207 | null |
| 2025-05-02 | 3D Human Pose Estimation via Spatial Graph Order Attention and Temporal Body Aware Transformer | Kamel Aouaidjia et.al. | 2505.01003 | null |
| 2025-05-01 | Are Minimal Radial Distortion Solvers Really Necessary for Relative Pose Estimation? | Viktor Kocur et.al. | 2505.00866 | null |
| 2025-05-01 | P2P-Insole: Human Pose Estimation Using Foot Pressure Distribution and Motion Sensors | Atsuya Watanabe et.al. | 2505.00755 | null |
| 2025-05-01 | Dietary Intake Estimation via Continuous 3D Reconstruction of Food | Wallace Lee et.al. | 2505.00606 | null |
| 2025-05-02 | InterLoc: LiDAR-based Intersection Localization using Road Segmentation with Automated Evaluation Method | Nguyen Hoang Khoi Tran et.al. | 2505.00512 | null |
| 2025-04-30 | Self-Supervised Monocular Visual Drone Model Identification through Improved Occlusion Handling | Stavrow A. Bahnam et.al. | 2504.21695 | null |
| 2025-04-29 | Dance Style Recognition Using Laban Movement Analysis | Muhammad Turab et.al. | 2504.21166 | null |
| 2025-04-29 | Adept: Annotation-Denoising Auxiliary Tasks with Discrete Cosine Transform Map and Keypoint for Human-Centric Pretraining | Weizhen He et.al. | 2504.20800 | null |
| 2025-04-29 | A Survey on Event-based Optical Marker Systems | Nafiseh Jabbari Tofighi et.al. | 2504.20736 | null |
| 2025-04-29 | Large-scale visual SLAM for in-the-wild videos | Shuo Sun et.al. | 2504.20496 | null |
| 2025-05-01 | GSFeatLoc: Visual Localization Using Feature Correspondence on 3D Gaussian Splatting | Jongwon Lee et.al. | 2504.20379 | null |
| 2025-05-01 | PRISM-DP: Spatial Pose-based Observations for Diffusion-Policies via Segmentation, Mesh Generation, and Pose Tracking | Xiatao Sun et.al. | 2504.20359 | null |
| 2025-04-28 | Transformation & Translation Occupancy Grid Mapping: 2-Dimensional Deep Learning Refined SLAM | Leon Davies et.al. | 2504.19654 | null |
| 2025-04-28 | GAN-SLAM: Real-Time GAN Aided Floor Plan Creation Through SLAM | Leon Davies et.al. | 2504.19653 | null |
| 2025-04-28 | Category-Level and Open-Set Object Pose Estimation for Robotics | Peter Hönig et.al. | 2504.19572 | null |
Visual Localization
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-06-10 | Robust Visual Localization via Semantic-Guided Multi-Scale Transformer | Zhongtao Tian et.al. | 2506.08526 | null |
| 2025-06-08 | Interpretable and Reliable Detection of AI-Generated Images via Grounded Reasoning in MLLMs | Yikun Ji et.al. | 2506.07045 | null |
| 2025-06-07 | Zero Shot Composed Image Retrieval | Santhosh Kakarla et.al. | 2506.06602 | null |
| 2025-06-06 | GenIR: Generative Visual Feedback for Mental Image Retrieval | Diji Yang et.al. | 2506.06220 | null |
| 2025-06-06 | Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning | Sheng Chen et.al. | 2506.06205 | null |
| 2025-06-05 | HypeVPR: Exploring Hyperbolic Space for Perspective to Equirectangular Visual Place Recognition | Suhan Woo et.al. | 2506.04764 | null |
| 2025-06-05 | Deep Learning Reforms Image Matching: A Survey and Outlook | Shihua Zhang et.al. | 2506.04619 | null |
| 2025-06-02 | Entity Image and Mixed-Modal Image Retrieval Datasets | Cristian-Ioan Blaga et.al. | 2506.02291 | null |
| 2025-06-01 | Quantization-based Bounds on the Wasserstein Metric | Jonathan Bobrutsky et.al. | 2506.00976 | null |
| 2025-05-30 | SORCE: Small Object Retrieval in Complex Environments | Chunxu Liu et.al. | 2505.24441 | link |
| 2025-05-29 | Sketch Down the FLOPs: Towards Efficient Networks for Human Sketch | Aneeshan Sain et.al. | 2505.23763 | null |
| 2025-05-28 | 4DTAM: Non-Rigid Tracking and Mapping via Dynamic Surface Gaussians | Hidenobu Matsuki et.al. | 2505.22859 | null |
| 2025-05-28 | UAVPairs: A Challenging Benchmark for Match Pair Retrieval of Large-scale UAV Images | Junhuan Liu et.al. | 2505.22098 | null |
| 2025-05-28 | Fast Feature Matching of UAV Images via Matrix Band Reduction-based GPU Data Schedule | San Jiang et.al. | 2505.22089 | null |
| 2025-05-27 | QuARI: Query Adaptive Retrieval Improvement | Eric Xing et.al. | 2505.21647 | null |
| 2025-05-27 | ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval | Eric Xing et.al. | 2505.20764 | null |
| 2025-05-26 | Visualized Text-to-Image Retrieval | Di Wu et.al. | 2505.20291 | link |
| 2025-05-26 | Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval | Rong-Cheng Tu et.al. | 2505.19952 | null |
| 2025-05-26 | Can Visual Encoder Learn to See Arrows? | Naoyuki Terashita et.al. | 2505.19944 | null |
| 2025-05-26 | MLLM-Guided VLM Fine-Tuning with Joint Inference for Zero-Shot Composed Image Retrieval | Rong-Cheng Tu et.al. | 2505.19707 | null |
| 2025-05-24 | Why Not Replace? Sustaining Long-Term Visual Localization via Handcrafted-Learned Feature Collaboration on CPU | Yicheng Lin et.al. | 2505.18652 | null |
| 2025-05-24 | TNG-CLIP: Training-Time Negation Data Generation for Negation Awareness of CLIP | Yuliang Cai et.al. | 2505.18434 | null |
| 2025-05-23 | ImLPR: Image-based LiDAR Place Recognition using Vision Foundation Models | Minwoo Jung et.al. | 2505.18364 | null |
| 2025-05-23 | DART$^3$: Leveraging Distance for Test Time Adaptation in Person Re-Identification | Rajarshi Bhattacharya et.al. | 2505.18337 | null |
| 2025-05-23 | To Glue or Not to Glue? Classical vs Learned Image Matching for Mobile Mapping Cameras to Textured Semantic 3D Building Models | Simone Gaisbauer et.al. | 2505.17973 | null |
| 2025-05-23 | DetailFusion: A Dual-branch Framework with Detail Enhancement for Composed Image Retrieval | Yuxin Yang et.al. | 2505.17796 | null |
| 2025-05-22 | TAT-VPR: Ternary Adaptive Transformer for Dynamic and Efficient Visual Place Recognition | Oliver Grainge et.al. | 2505.16447 | null |
| 2025-05-21 | Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval | Siting Li et.al. | 2505.15877 | null |
| 2025-05-21 | SCENIR: Visual Semantic Clarity through Unsupervised Scene Graph Retrieval | Nikolaos Chaidos et.al. | 2505.15867 | link |
| 2025-05-20 | Multimodal RAG-driven Anomaly Detection and Classification in Laser Powder Bed Fusion using Large Language Models | Kiarash Naghavi Khanghah et.al. | 2505.13828 | null |
| 2025-05-18 | MMS-VPR: Multimodal Street-Level Visual Place Recognition Dataset and Benchmark | Yiwei Ou et.al. | 2505.12254 | null |
| 2025-05-16 | Improved Bag-of-Words Image Retrieval with Geometric Constraints for Ground Texture Localization | Aaron Wilhelm et.al. | 2505.11620 | null |
| 2025-05-16 | Redundancy-Aware Pretraining of Vision-Language Foundation Models in Remote Sensing | Mathis Jürgen Adler et.al. | 2505.11121 | null |
| 2025-05-04 | OBD-Finder: Explainable Coarse-to-Fine Text-Centric Oracle Bone Duplicates Discovery | Chongsheng Zhang et.al. | 2505.03836 | link |
| 2025-05-06 | Thermal-LiDAR Fusion for Robust Tunnel Localization in GNSS-Denied and Low-Visibility Conditions | Lukas Schichler et.al. | 2505.03565 | null |
| 2025-05-06 | LiftFeat: 3D Geometry-Aware Local Feature Matching | Yepeng Liu et.al. | 2505.03422 | link |
| 2025-05-06 | Seeing the Abstract: Translating the Abstract Language for Vision Language Models | Davide Talon et.al. | 2505.03242 | link |
| 2025-05-13 | SafeNav: Safe Path Navigation using Landmark Based Localization in a GPS-denied Environment | Ganesh Sapkota et.al. | 2505.01956 | null |
| 2025-05-02 | NeuroLoc: Encoding Navigation Cells for 6-DOF Camera Localization | Xun Li et.al. | 2505.01113 | null |
| 2025-05-01 | GSFeatLoc: Visual Localization Using Feature Correspondence on 3D Gaussian Splatting | Jongwon Lee et.al. | 2504.20379 | null |
| 2025-04-25 | From Mapping to Composing: A Two-Stage Framework for Zero-shot Composed Image Retrieval | Yabing Wang et.al. | 2504.17990 | null |
| 2025-04-24 | A Guide to Structureless Visual Localization | Vojtech Panek et.al. | 2504.17636 | null |
| 2025-04-23 | Rethinking Vision Transformer for Large-Scale Fine-Grained Image Retrieval | Xin Jiang et.al. | 2504.16691 | null |
| 2025-04-22 | Media Content Atlas: A Pipeline to Explore and Investigate Multidimensional Media Space using Multimodal LLMs | Merve Cerit et.al. | 2504.16323 | link |
| 2025-04-19 | A Multimodal Recaptioning Framework to Account for Perceptual Diversity in Multilingual Vision-Language Modeling | Kyle Buettner et.al. | 2504.14359 | null |
| 2025-04-17 | SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs | Haoxuan Li et.al. | 2504.13172 | null |
| 2025-04-16 | Generalized Visual Relation Detection with Diffusion Models | Kaifeng Gao et.al. | 2504.12100 | null |
| 2025-04-15 | Visual Re-Ranking with Non-Visual Side Information | Gustav Hanning et.al. | 2504.11134 | link |
| 2025-04-15 | TMCIR: Token Merge Benefits Composed Image Retrieval | Chaoyang Wang et.al. | 2504.10995 | null |
Keypoint Detection
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-07-17 | DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model | Maulana Bisyir Azhari et.al. | 2507.13145 | null |
| 2025-07-15 | KptLLM++: Towards Generic Keypoint Comprehension with Large Language Model | Jie Yang et.al. | 2507.11102 | null |
| 2025-07-15 | GKNet: Graph-based Keypoints Network for Monocular Pose Estimation of Non-cooperative Spacecraft | Weizhao Ma et.al. | 2507.11077 | null |
| 2025-07-14 | FPC-Net: Revisiting SuperPoint with Descriptor-Free Keypoint Detection via Feature Pyramids and Consistency-Based Implicit Matching | Ionuţ Grigore et.al. | 2507.10770 | null |
| 2025-07-11 | Doodle Your Keypoints: Sketch-Based Few-Shot Keypoint Detection | Subhajit Maity et.al. | 2507.07994 | null |
| 2025-07-09 | Reading a Ruler in the Wild | Yimu Pan et.al. | 2507.07077 | null |
| 2025-07-09 | MK-Pose: Category-Level Object Pose Estimation via Multimodal-Based Keypoint Learning | Yifan Yang et.al. | 2507.06662 | null |
| 2025-06-27 | MatChA: Cross-Algorithm Matching with Feature Augmentation | Paula Carbó Cubero et.al. | 2506.22336 | null |
| 2025-06-27 | SDRNET: Stacked Deep Residual Network for Accurate Semantic Segmentation of Fine-Resolution Remotely Sensed Images | Naftaly Wambugu et.al. | 2506.21945 | null |
| 2025-05-29 | TimePoint: Accelerated Time Series Alignment via Self-Supervised Keypoint and Descriptor Learning | Ron Shapira Weber et.al. | 2505.23475 | link |
| 2025-05-24 | Why Not Replace? Sustaining Long-Term Visual Localization via Handcrafted-Learned Feature Collaboration on CPU | Yicheng Lin et.al. | 2505.18652 | link |
| 2025-05-18 | SEPT: Standard-Definition Map Enhanced Scene Perception and Topology Reasoning for Autonomous Driving | Muleilan Pei et.al. | 2505.12246 | null |
| 2025-05-17 | Keypoints as Dynamic Centroids for Unified Human Pose and Segmentation | Niaz Ahmad et.al. | 2505.12130 | null |
| 2025-05-16 | Deepfake Forensic Analysis: Source Dataset Attribution and Legal Implications of Synthetic Media Manipulation | Massimiliano Cassia et.al. | 2505.11110 | null |
| 2025-06-19 | RDD: Robust Feature Detector and Descriptor using Deformable Transformer | Gonglin Chen et.al. | 2505.08013 | null |
| 2025-05-12 | Enabling Privacy-Aware AI-Based Ergonomic Analysis | Sander De Coninck et.al. | 2505.07306 | null |
| 2025-05-09 | My Emotion on your face: The use of Facial Keypoint Detection to preserve Emotions in Latent Space Editing | Jingrui He et.al. | 2505.06436 | null |
| 2025-05-05 | Unsupervised training of keypoint-agnostic descriptors for flexible retinal image registration | David Rivas-Villar et.al. | 2505.02787 | null |
| 2025-05-05 | Unsupervised Deep Learning-based Keypoint Localization Estimating Descriptor Matching Performance | David Rivas-Villar et.al. | 2505.02779 | null |
| 2025-05-04 | Focus What Matters: Matchability-Based Reweighting for Local Feature Matching | Dongyue Li et.al. | 2505.02161 | null |
| 2025-05-04 | Enhancing Lidar Point Cloud Sampling via Colorization and Super-Resolution of Lidar Imagery | Sier Ha et.al. | 2505.02049 | null |
| 2025-04-29 | Emotion Recognition in Contemporary Dance Performances Using Laban Movement Analysis | Muhammad Turab et.al. | 2504.21154 | null |
| 2025-04-29 | Learning a General Model: Folding Clothing with Topological Dynamics | Yiming Liu et.al. | 2504.20720 | null |
| 2025-04-26 | VISUALCENT: Visual Human Analysis using Dynamic Centroid Representation | Niaz Ahmad et.al. | 2504.19032 | null |
| 2025-04-24 | EdgePoint2: Compact Descriptors for Superior Efficiency and Accuracy | Haodi Yao et.al. | 2504.17280 | null |
| 2025-04-15 | UKDM: Underwater keypoint detection and matching using underwater image enhancement techniques | Pedro Diaz-Garcia et.al. | 2504.11063 | null |
| 2025-04-15 | Acquisition of high-quality images for camera calibration in robotics applications via speech prompts | Timm Linder et.al. | 2504.11031 | null |
2025-6
Pose Estimation
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-07-03 | Reconstructing Close Human Interaction with Appearance and Proxemics Reasoning | Buzhen Huang et.al. | 2507.02565 | null |
| 2025-07-03 | IMASHRIMP: Automatic White Shrimp (Penaeus vannamei) Biometrical Analysis from Laboratory Images Using Computer Vision and Deep Learning | Abiam Remache González et.al. | 2507.02519 | null |
| 2025-07-03 | 3D Heart Reconstruction from Sparse Pose-agnostic 2D Echocardiographic Slices | Zhurong Chen et.al. | 2507.02411 | null |
| 2025-07-03 | LMPNet for Weakly-supervised Keypoint Discovery | Pei Guo et.al. | 2507.02308 | null |
| 2025-07-02 | What does really matter in image goal navigation? | Gianluca Monaci et.al. | 2507.01667 | null |
| 2025-07-01 | 2024 NASA SUITS Report: LLM-Driven Immersive Augmented Reality User Interface for Robotics and Space Exploration | Kathy Zhuang et.al. | 2507.01206 | null |
| 2025-07-01 | Multi-Modal Graph Convolutional Network with Sinusoidal Encoding for Robust Human Action Segmentation | Hao Xing et.al. | 2507.00752 | null |
| 2025-07-01 | LoD-Loc v2: Aerial Visual Localization over Low Level-of-Detail City Models using Explicit Silhouette Alignment | Juelin Zhu et.al. | 2507.00659 | null |
| 2025-06-30 | Computer Vision for Objects used in Group Work: Challenges and Opportunities | Changsoo Jung et.al. | 2507.00224 | null |
| 2025-06-30 | Validation of AI-Based 3D Human Pose Estimation in a Cyber-Physical Environment | Lisa Marie Otto et.al. | 2506.23739 | null |
| 2025-06-30 | MGPRL: Distributed Multi-Gaussian Processes for Wi-Fi-based Multi-Robot Relative Localization in Large Indoor Environments | Sai Krishna Ghanta et.al. | 2506.23514 | null |
| 2025-06-29 | TVG-SLAM: Robust Gaussian Splatting SLAM with Tri-view Geometric Constraints | Zhen Tan et.al. | 2506.23207 | null |
| 2025-06-28 | Deterministic Object Pose Confidence Region Estimation | Jinghao Wang et.al. | 2506.22720 | null |
| 2025-06-27 | Evaluating Pointing Gestures for Target Selection in Human-Robot Collaboration | Noora Sassali et.al. | 2506.22116 | null |
| 2025-06-27 | Single-Scanline Relative Pose Estimation for Rolling Shutter Cameras | Petr Hruby et.al. | 2506.22069 | null |
| 2025-06-24 | ICP-3DGS: SfM-free 3D Gaussian Splatting for Large-scale Unbounded Scenes | Chenhao Zhang et.al. | 2506.21629 | null |
| 2025-06-26 | EndoFlow-SLAM: Real-Time Endoscopic SLAM with Flow-Constrained Gaussian Splatting | Taoyu Wu et.al. | 2506.21420 | null |
| 2025-06-26 | CURL-SLAM: Continuous and Compact LiDAR Mapping | Kaicheng Zhang et.al. | 2506.21077 | null |
| 2025-06-27 | DidSee: Diffusion-Based Depth Completion for Material-Agnostic Robotic Perception and Manipulation | Wenzhou Lyu et.al. | 2506.21034 | null |
| 2025-06-25 | How do Foundation Models Compare to Skeleton-Based Approaches for Gesture Recognition in Human-Robot Interaction? | Stephanie Käs et.al. | 2506.20795 | null |
| 2025-06-26 | Consensus-Driven Uncertainty for Robotic Grasping based on RGB Perception | Eric C. Joyce et.al. | 2506.20045 | null |
| 2025-06-24 | Systematic Comparison of Projection Methods for Monocular 3D Human Pose Estimation on Fisheye Images | Stephanie Käs et.al. | 2506.19747 | null |
| 2025-06-23 | RAG-6DPose: Retrieval-Augmented 6D Pose Estimation via Leveraging CAD as Knowledge Base | Kuanning Wang et.al. | 2506.18856 | null |
| 2025-06-19 | Reproducible Evaluation of Camera Auto-Exposure Methods in the Field: Platform, Benchmark and Lessons Learned | Olivier Gamache et.al. | 2506.18844 | null |
| 2025-06-23 | SViP: Sequencing Bimanual Visuomotor Policies with Object-Centric Motion Primitives | Yizhou Chen et.al. | 2506.18825 | null |
| 2025-06-20 | RGBTrack: Fast, Robust Depth-Free 6D Pose Estimation and Tracking | Teng Guo et.al. | 2506.17119 | link |
| 2025-06-20 | Monocular One-Shot Metric-Depth Alignment for RGB-Based Robot Grasping | Teng Guo et.al. | 2506.17110 | null |
| 2025-06-20 | LunarLoc: Segment-Based Global Localization on the Moon | Annika Thomas et.al. | 2506.16940 | link |
| 2025-06-19 | ControlVLA: Few-shot Object-centric Adaptation for Pre-trained Vision-Language-Action Models | Puhao Li et.al. | 2506.16211 | null |
| 2025-06-19 | STAR-Pose: Efficient Low-Resolution Video Human Pose Estimation via Spatial-Temporal Adaptive Super-Resolution | Yucheng Jin et.al. | 2506.16061 | null |
| 2025-06-19 | KARL: Kalman-Filter Assisted Reinforcement Learner for Dynamic Object Tracking and Grasping | Kowndinya Boyalakuntla et.al. | 2506.15945 | null |
| 2025-06-19 | Beyond Audio and Pose: A General-Purpose Framework for Video Synchronization | Yosub Shin et.al. | 2506.15937 | null |
| 2025-06-18 | Improving Robotic Manipulation: Techniques for Object Pose Estimation, Accommodating Positional Uncertainty, and Disassembly Tasks from Examples | Viral Rasik Galaiya et.al. | 2506.15865 | null |
| 2025-06-18 | PRISM-Loc: a Lightweight Long-range LiDAR Localization in Urban Environments with Topological Maps | Kirill Muravyev et.al. | 2506.15849 | null |
| 2025-06-18 | Human Motion Capture from Loose and Sparse Inertial Sensors with Garment-aware Diffusion Models | Andela Ilic et.al. | 2506.15290 | null |
| 2025-06-18 | RA-NeRF: Robust Neural Radiance Field Reconstruction with Accurate Camera Pose Estimation under Complex Trajectories | Qingsong Yan et.al. | 2506.15242 | null |
| 2025-06-17 | PoseGRAF: Geometric-Reinforced Adaptive Fusion for Monocular 3D Human Pose Estimation | Ming Xu et.al. | 2506.14596 | null |
| 2025-06-17 | Non-Overlap-Aware Egocentric Pose Estimation for Collaborative Perception in Connected Autonomy | Hong Huang et.al. | 2506.14180 | null |
| 2025-06-17 | TACS-Graphs: Traversability-Aware Consistent Scene Graphs for Ground Robot Indoor Localization and Mapping | Jeewon Kim et.al. | 2506.14178 | null |
| 2025-06-16 | Diffusion-based Inverse Observation Model for Artificial Skin | Ante Maric et.al. | 2506.13986 | null |
| 2025-06-16 | PF-LHM: 3D Animatable Avatar Reconstruction from Pose-free Articulated Human Images | Lingteng Qiu et.al. | 2506.13766 | null |
| 2025-06-16 | JENGA: Object selection and pose estimation for robotic grasping from a stack | Sai Srinivas Jeevanandam et.al. | 2506.13425 | null |
| 2025-06-16 | Automatic Multi-View X-Ray/CT Registration Using Bone Substructure Contours | Roman Flepp et.al. | 2506.13292 | null |
| 2025-06-16 | DETRPose: Real-time end-to-end transformer model for multi-person pose estimation | Sebastian Janampa et.al. | 2506.13027 | link |
| 2025-06-15 | A large-scale, physically-based synthetic dataset for satellite pose estimation | Szabolcs Velkei et.al. | 2506.12782 | null |
| 2025-06-13 | ViTaSCOPE: Visuo-tactile Implicit Representation for In-hand Pose and Extrinsic Contact Estimation | Jayjun Lee et.al. | 2506.12239 | null |
| 2025-06-10 | Monocular 3D Hand Pose Estimation with Implicit Camera Alignment | Christos Pantazopoulos et.al. | 2506.11133 | null |
| 2025-06-12 | Occlusion-Aware 3D Hand-Object Pose Estimation with Masked AutoEncoders | Hui Yang et.al. | 2506.10816 | null |
| 2025-06-12 | In-Hand Object Pose Estimation via Visual-Tactile Fusion | Felix Nonnengießer et.al. | 2506.10787 | null |
| 2025-06-11 | Fluoroscopic Shape and Pose Tracking of Catheters with Custom Radiopaque Markers | Jared Lawson et.al. | 2506.09934 | null |
| 2025-06-11 | EquiCaps: Predictor-Free Pose-Aware Pre-Trained Capsule Networks | Athinoulla Konstantinou et.al. | 2506.09895 | link |
| 2025-06-11 | Accurate and efficient zero-shot 6D pose estimation with frozen foundation models | Andrea Caraffa et.al. | 2506.09784 | null |
| 2025-06-11 | CHIP: A multi-sensor dataset for 6D pose estimation of chairs in industrial settings | Mattia Nardon et.al. | 2506.09699 | null |
| 2025-06-10 | Princeton365: A Diverse Dataset with Accurate Camera Pose | Karhan Kayan et.al. | 2506.09035 | null |
| 2025-06-10 | ArrowPose: Segmentation, Detection, and 5 DoF Pose Estimation Network for Colorless Point Clouds | Frederik Hagelskjaer et.al. | 2506.08699 | null |
| 2025-06-09 | UA-Pose: Uncertainty-Aware 6D Object Pose Estimation and Online Object Completion with Partial References | Ming-Feng Li et.al. | 2506.07996 | null |
| 2025-06-09 | Hierarchical Scoring with 3D Gaussian Splatting for Instance Image-Goal Navigation | Yijie Deng et.al. | 2506.07338 | null |
| 2025-06-10 | From Generation to Generalization: Emergent Few-Shot Learning in Video Diffusion Models | Pablo Acuaviva et.al. | 2506.07280 | null |
| 2025-06-08 | GoTrack: Generic 6DoF Object Pose Refinement and Tracking | Van Nguyen Nguyen et.al. | 2506.07155 | null |
| 2025-06-08 | UNO: Unified Self-Supervised Monocular Odometry for Platform-Agnostic Deployment | Wentao Zhao et.al. | 2506.07013 | null |
| 2025-06-07 | Deep Inertial Pose: A deep learning approach for human pose estimation | Sara M. Cerqueira et.al. | 2506.06850 | null |
| 2025-06-06 | Dy3DGS-SLAM: Monocular 3D Gaussian Splatting SLAM for Dynamic Environments | Mingrui Li et.al. | 2506.05965 | null |
| 2025-06-06 | SurGSplat: Progressive Geometry-Constrained Gaussian Splatting for Surgical Scene Reconstruction | Yuchao Zheng et.al. | 2506.05935 | null |
| 2025-06-06 | CryoFastAR: Fast Cryo-EM Ab Initio Reconstruction Made Easy | Jiakai Zhang et.al. | 2506.05864 | null |
| 2025-06-06 | You Only Estimate Once: Unified, One-stage, Real-Time Category-level Articulated Object 6D Pose Estimation for Robotic Grasping | Jingshun Huang et.al. | 2506.05719 | null |
| 2025-06-05 | On-the-fly Reconstruction for Large-Scale Novel View Synthesis from Unposed Images | Andreas Meuleman et.al. | 2506.05558 | null |
| 2025-06-05 | Rectified Point Flow: Generic Point Cloud Pose Estimation | Tao Sun et.al. | 2506.05282 | null |
| 2025-06-05 | Realizing Text-Driven Motion Generation on NAO Robot: A Reinforcement Learning-Optimized Control Pipeline | Zihan Xu et.al. | 2506.05117 | link |
| 2025-06-05 | CzechLynx: A Dataset for Individual Identification and Pose Estimation of the Eurasian Lynx | Lukas Picek et.al. | 2506.04931 | null |
| 2025-06-05 | SupeRANSAC: One RANSAC to Rule Them All | Daniel Barath et.al. | 2506.04803 | null |
| 2025-06-05 | LGM-Pose: A Lightweight Global Modeling Network for Real-time Human Pose Estimation | Biao Guo et.al. | 2506.04561 | null |
| 2025-06-04 | Photoreal Scene Reconstruction from an Egocentric Device | Zhaoyang Lv et.al. | 2506.04444 | link |
| 2025-06-04 | cuVSLAM: CUDA accelerated visual odometry | Alexander Korovko et.al. | 2506.04359 | null |
| 2025-06-04 | Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation | Tianyu Huang et.al. | 2506.04225 | null |
| 2025-06-04 | Accelerating SfM-based Pose Estimation with Dominating Set | Joji Joseph et.al. | 2506.03667 | null |
| 2025-06-03 | Learning Pyramid-structured Long-range Dependencies for 3D Human Pose Estimation | Mingjie Wei et.al. | 2506.02853 | null |
| 2025-06-03 | GeneA-SLAM2: Dynamic SLAM with AutoEncoder-Preprocessed Genetic Keypoints Resampling and Depth Variance-Guided Dynamic Region Removal | Shufan Qing et.al. | 2506.02736 | link |
| 2025-06-02 | Rig3R: Rig-Aware Conditioning for Learned 3D Reconstruction | Samuel Li et.al. | 2506.02265 | null |
| 2025-06-02 | E3D-Bench: A Benchmark for End-to-End 3D Geometric Foundation Models | Wenyan Cong et.al. | 2506.01933 | null |
| 2025-06-02 | SteerPose: Simultaneous Extrinsic Camera Calibration and Matching from Articulation | Sang-Eun Lee et.al. | 2506.01691 | null |
| 2025-06-01 | TIGeR: Text-Instructed Generation and Refinement for Template-Free Hand-Object Interaction | Yiyao Huang et.al. | 2506.00953 | null |
| 2025-05-31 | XYZ-IBD: High-precision Bin-picking Dataset for Object 6D Pose Estimation Capturing Real-world Industrial Complexity | Junwen Huang et.al. | 2506.00599 | null |
| 2025-05-30 | Lazy Heuristic Search for Solving POMDPs with Expensive-to-Compute Belief Transitions | Muhammad Suhail Saleem et.al. | 2506.00285 | null |
| 2025-05-30 | 6D Pose Estimation on Point Cloud Data through Prior Knowledge Integration: A Case Study in Autonomous Disassembly | Chengzhi Wu et.al. | 2505.24669 | null |
| 2025-05-30 | Category-Level 6D Object Pose Estimation in Agricultural Settings Using a Lattice-Deformation Framework and Diffusion-Augmented Synthetic Data | Marios Glytsos et.al. | 2505.24636 | null |
Visual Localization
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-07-08 | Automatic Synthesis of High-Quality Triplet Data for Composed Image Retrieval | Haiwen Li et.al. | 2507.05970 | null |
| 2025-07-08 | OFFSET: Segmentation-based Focus Shift Revision for Composed Image Retrieval | Zhiwei Chen et.al. | 2507.05631 | null |
| 2025-07-07 | Llama Nemoretriever Colembed: Top-Performing Text-Image Retrieval Model | Mengyao Xu et.al. | 2507.05513 | null |
| 2025-07-07 | An analysis of vision-language models for fabric retrieval | Francesco Giuliari et.al. | 2507.04735 | null |
| 2025-07-08 | What’s Making That Sound Right Now? Video-centric Audio-Visual Localization | Hahyeon Choi et.al. | 2507.04667 | null |
| 2025-07-06 | U-ViLAR: Uncertainty-Aware Visual Localization for Autonomous Driving via Differentiable Association and Registration | Xiaofan Li et.al. | 2507.04503 | null |
| 2025-07-04 | Query-Based Adaptive Aggregation for Multi-Dataset Joint Training Toward Universal Visual Place Recognition | Jiuhong Xiao et.al. | 2507.03831 | null |
| 2025-07-01 | LoD-Loc v2: Aerial Visual Localization over Low Level-of-Detail City Models using Explicit Silhouette Alignment | Juelin Zhu et.al. | 2507.00659 | null |
| 2025-06-28 | Utilizing a Novel Deep Learning Method for Scene Categorization in Remote Sensing Data | Ghufran A. Omran et.al. | 2506.22939 | null |
| 2025-06-28 | Mask-aware Text-to-Image Retrieval: Referring Expression Segmentation Meets Cross-modal Retrieval | Li-Cheng Shen et.al. | 2506.22864 | null |
| 2025-06-27 | MatChA: Cross-Algorithm Matching with Feature Augmentation | Paula Carbó Cubero et.al. | 2506.22336 | null |
| 2025-06-26 | OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography | Caoshuo Li et.al. | 2506.21101 | null |
| 2025-06-25 | Visualizing intercalation effects in 2D materials using AFM based techniques | Karmen Kapustić et.al. | 2506.20467 | null |
| 2025-06-25 | On the Burstiness of Faces in Set | Jiong Wang et.al. | 2506.20312 | null |
| 2025-06-24 | jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval | Michael Günther et.al. | 2506.18902 | null |
| 2025-06-26 | Referring Expression Instance Retrieval and A Strong End-to-End Baseline | Xiangzhao Hao et.al. | 2506.18246 | null |
| 2025-06-20 | Class Agnostic Instance-level Descriptor for Visual Instance Search | Qi-Ying Sun et.al. | 2506.16745 | null |
| 2025-06-19 | MambaHash: Visual State Space Deep Hashing Model for Large-Scale Image Retrieval | Chao He et.al. | 2506.16353 | link |
| 2025-06-19 | Fine-grained Image Retrieval via Dual-Vision Adaptation | Xin Jiang et.al. | 2506.16273 | null |
| 2025-06-19 | Adversarial Attacks and Detection in Visual Place Recognition for Safer Robot Navigation | Connor Malone et.al. | 2506.15988 | link |
| 2025-06-18 | Semantic and Feature Guided Uncertainty Quantification of Visual Localization for Autonomous Vehicles | Qiyuan Wu et.al. | 2506.15851 | null |
| 2025-06-18 | ReSeDis: A Dataset for Referring-based Object Search across Large-Scale Image Collections | Ziling Huang et.al. | 2506.15180 | null |
| 2025-06-17 | HARMONY: A Scalable Distributed Vector Database for High-Throughput Approximate Nearest Neighbor Search | Qian Xu et.al. | 2506.14707 | null |
| 2025-06-16 | A Semantically-Aware Relevance Measure for Content-Based Medical Image Retrieval Evaluation | Xiaoyang Wei et.al. | 2506.13509 | null |
| 2025-06-19 | Hierarchical Multi-Positive Contrastive Learning for Patent Image Retrieval | Kshitij Kavimandan et.al. | 2506.13496 | null |
| 2025-06-16 | EmbodiedPlace: Learning Mixture-of-Features with Embodied Constraints for Visual Place Recognition | Bingxi Liu et.al. | 2506.13133 | null |
| 2025-06-16 | SuperPlace: The Renaissance of Classical Feature Aggregation for Visual Place Recognition in the Era of Foundation Models | Bingxi Liu et.al. | 2506.13073 | null |
| 2025-06-14 | Feature Complementation Architecture for Visual Place Recognition | Weiwei Wang et.al. | 2506.12401 | null |
| 2025-06-11 | Towards a general-purpose foundation model for fMRI analysis | Cheng Wang et.al. | 2506.11167 | null |
| 2025-06-11 | Improving Personalized Search with Regularized Low-Rank Parameter Updates | Fiona Ryan et.al. | 2506.10182 | link |
| 2025-06-10 | Safeguarding Multimodal Knowledge Copyright in the RAG-as-a-Service Environment | Tianyu Chen et.al. | 2506.10030 | null |
| 2025-06-11 | Hierarchical Image Matching for UAV Absolute Visual Localization via Semantic and Structural Constraints | Xiangkai Zhang et.al. | 2506.09748 | null |
| 2025-06-10 | Robust Visual Localization via Semantic-Guided Multi-Scale Transformer | Zhongtao Tian et.al. | 2506.08526 | null |
| 2025-06-08 | Interpretable and Reliable Detection of AI-Generated Images via Grounded Reasoning in MLLMs | Yikun Ji et.al. | 2506.07045 | null |
| 2025-06-07 | Zero Shot Composed Image Retrieval | Santhosh Kakarla et.al. | 2506.06602 | null |
| 2025-06-06 | GenIR: Generative Visual Feedback for Mental Image Retrieval | Diji Yang et.al. | 2506.06220 | null |
| 2025-06-06 | Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning | Sheng Chen et.al. | 2506.06205 | null |
| 2025-06-05 | HypeVPR: Exploring Hyperbolic Space for Perspective to Equirectangular Visual Place Recognition | Suhan Woo et.al. | 2506.04764 | null |
| 2025-06-05 | Deep Learning Reforms Image Matching: A Survey and Outlook | Shihua Zhang et.al. | 2506.04619 | null |
| 2025-06-02 | Entity Image and Mixed-Modal Image Retrieval Datasets | Cristian-Ioan Blaga et.al. | 2506.02291 | null |
| 2025-06-01 | Quantization-based Bounds on the Wasserstein Metric | Jonathan Bobrutsky et.al. | 2506.00976 | null |
| 2025-05-30 | SORCE: Small Object Retrieval in Complex Environments | Chunxu Liu et.al. | 2505.24441 | link |
| 2025-05-29 | Sketch Down the FLOPs: Towards Efficient Networks for Human Sketch | Aneeshan Sain et.al. | 2505.23763 | null |
| 2025-05-28 | 4DTAM: Non-Rigid Tracking and Mapping via Dynamic Surface Gaussians | Hidenobu Matsuki et.al. | 2505.22859 | null |
| 2025-05-28 | UAVPairs: A Challenging Benchmark for Match Pair Retrieval of Large-scale UAV Images | Junhuan Liu et.al. | 2505.22098 | null |
| 2025-05-28 | Fast Feature Matching of UAV Images via Matrix Band Reduction-based GPU Data Schedule | San Jiang et.al. | 2505.22089 | null |
| 2025-05-27 | QuARI: Query Adaptive Retrieval Improvement | Eric Xing et.al. | 2505.21647 | null |
| 2025-05-27 | ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval | Eric Xing et.al. | 2505.20764 | null |
| 2025-05-26 | Visualized Text-to-Image Retrieval | Di Wu et.al. | 2505.20291 | link |
Keypoint Detection
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-07-23 | CartoonAlive: Towards Expressive Live2D Modeling from Single Portraits | Chao He et.al. | 2507.17327 | null |
| 2025-07-21 | Toward a Real-Time Framework for Accurate Monocular 3D Human Pose Estimation with Geometric Priors | Mohamed Adjel et.al. | 2507.16850 | null |
| 2025-07-17 | DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model | Maulana Bisyir Azhari et.al. | 2507.13145 | null |
| 2025-07-15 | KptLLM++: Towards Generic Keypoint Comprehension with Large Language Model | Jie Yang et.al. | 2507.11102 | null |
| 2025-07-15 | GKNet: Graph-based Keypoints Network for Monocular Pose Estimation of Non-cooperative Spacecraft | Weizhao Ma et.al. | 2507.11077 | null |
| 2025-07-14 | FPC-Net: Revisiting SuperPoint with Descriptor-Free Keypoint Detection via Feature Pyramids and Consistency-Based Implicit Matching | Ionuţ Grigore et.al. | 2507.10770 | null |
| 2025-07-11 | Doodle Your Keypoints: Sketch-Based Few-Shot Keypoint Detection | Subhajit Maity et.al. | 2507.07994 | null |
| 2025-07-09 | Reading a Ruler in the Wild | Yimu Pan et.al. | 2507.07077 | null |
| 2025-07-09 | MK-Pose: Category-Level Object Pose Estimation via Multimodal-Based Keypoint Learning | Yifan Yang et.al. | 2507.06662 | null |
| 2025-06-27 | MatChA: Cross-Algorithm Matching with Feature Augmentation | Paula Carbó Cubero et.al. | 2506.22336 | null |
| 2025-06-27 | SDRNET: Stacked Deep Residual Network for Accurate Semantic Segmentation of Fine-Resolution Remotely Sensed Images | Naftaly Wambugu et.al. | 2506.21945 | null |
| 2025-05-29 | TimePoint: Accelerated Time Series Alignment via Self-Supervised Keypoint and Descriptor Learning | Ron Shapira Weber et.al. | 2505.23475 | link |
| 2025-05-24 | Why Not Replace? Sustaining Long-Term Visual Localization via Handcrafted-Learned Feature Collaboration on CPU | Yicheng Lin et.al. | 2505.18652 | link |
| 2025-05-18 | SEPT: Standard-Definition Map Enhanced Scene Perception and Topology Reasoning for Autonomous Driving | Muleilan Pei et.al. | 2505.12246 | null |
| 2025-05-17 | Keypoints as Dynamic Centroids for Unified Human Pose and Segmentation | Niaz Ahmad et.al. | 2505.12130 | null |
| 2025-05-16 | Deepfake Forensic Analysis: Source Dataset Attribution and Legal Implications of Synthetic Media Manipulation | Massimiliano Cassia et.al. | 2505.11110 | null |
| 2025-06-19 | RDD: Robust Feature Detector and Descriptor using Deformable Transformer | Gonglin Chen et.al. | 2505.08013 | null |
| 2025-05-12 | Enabling Privacy-Aware AI-Based Ergonomic Analysis | Sander De Coninck et.al. | 2505.07306 | null |
| 2025-05-09 | My Emotion on your face: The use of Facial Keypoint Detection to preserve Emotions in Latent Space Editing | Jingrui He et.al. | 2505.06436 | null |
| 2025-05-05 | Unsupervised training of keypoint-agnostic descriptors for flexible retinal image registration | David Rivas-Villar et.al. | 2505.02787 | null |
| 2025-05-05 | Unsupervised Deep Learning-based Keypoint Localization Estimating Descriptor Matching Performance | David Rivas-Villar et.al. | 2505.02779 | null |
2025-7
Pose Estimation
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-07-23 | RemixFusion: Residual-based Mixed Representation for Large-scale Online RGB-D Reconstruction | Yuqing Lan et.al. | 2507.17594 | null |
| 2025-07-23 | Physics-based Human Pose Estimation from a Single Moving RGB Camera | Ayce Idil Aytekin et.al. | 2507.17406 | null |
| 2025-07-21 | Toward a Real-Time Framework for Accurate Monocular 3D Human Pose Estimation with Geometric Priors | Mohamed Adjel et.al. | 2507.16850 | null |
| 2025-07-22 | Adaptive Relative Pose Estimation Framework with Dual Noise Tuning for Safe Approaching Maneuvers | Batu Candan et.al. | 2507.16214 | null |
| 2025-07-21 | TONUS: Neuromorphic human pose estimation for artistic sound co-creation | Jules Lecomte et.al. | 2507.15734 | null |
| 2025-07-21 | Hi^2-GSLoc: Dual-Hierarchical Gaussian-Specific Visual Relocalization for Remote Sensing | Boni Hu et.al. | 2507.15683 | null |
| 2025-07-21 | Dense-depth map guided deep Lidar-Visual Odometry with Sparse Point Clouds and Images | JunYing Huang et.al. | 2507.15496 | null |
| 2025-07-20 | 3-Dimensional CryoEM Pose Estimation and Shift Correction Pipeline | Kaishva Chintan Shah et.al. | 2507.14924 | null |
| 2025-07-20 | An Evaluation of DUSt3R/MASt3R/VGGT 3D Reconstruction on Photogrammetric Aerial Blocks | Xinyi Wu et.al. | 2507.14798 | null |
| 2025-07-22 | AI-Enhanced Precision in Sport Taekwondo: Increasing Fairness, Speed, and Trust in Competition (FST.ai) | Keivan Shariatmadar et.al. | 2507.14657 | null |
| 2025-07-18 | C-DOG: Training-Free Multi-View Multi-Object Association in Dense Scenes Without Visual Feature via Connected δ-Overlap Graphs | Yung-Hong Sun et.al. | 2507.14095 | null |
| 2025-07-21 | PCR-GS: COLMAP-Free 3D Gaussian Splatting via Pose Co-Regularizations | Yu Wei et.al. | 2507.13891 | null |
| 2025-07-18 | MaskHOI: Robust 3D Hand-Object Interaction Estimation via Masked Pre-training | Yuechen Xie et.al. | 2507.13673 | null |
| 2025-07-17 | $π^3$: Scalable Permutation-Equivariant Visual Geometry Learning | Yifan Wang et.al. | 2507.13347 | null |
| 2025-07-17 | Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark | Junsu Kim et.al. | 2507.13314 | null |
| 2025-07-17 | DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model | Maulana Bisyir Azhari et.al. | 2507.13145 | null |
| 2025-07-17 | AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability | Tomohiro Suzuki et.al. | 2507.12905 | null |
| 2025-07-17 | From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation | Mengxi Liu et.al. | 2507.12884 | null |
| 2025-07-19 | SpatialTrackerV2: 3D Point Tracking Made Easy | Yuxi Xiao et.al. | 2507.12462 | null |
| 2025-07-16 | Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation | Antonio Finocchiaro et.al. | 2507.12292 | null |
| 2025-07-16 | UniLGL: Learning Uniform Place Recognition for FOV-limited/Panoramic LiDAR Global Localization | Hongming Shen et.al. | 2507.12194 | null |
| 2025-07-16 | BRUM: Robust 3D Vehicle Reconstruction from 360 Sparse Images | Davide Di Nucci et.al. | 2507.12095 | null |
| 2025-07-16 | SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation | Beining Xu et.al. | 2507.12027 | null |
| 2025-07-16 | SEPose: A Synthetic Event-based Human Pose Estimation Dataset for Pedestrian Monitoring | Kaustav Chanda et.al. | 2507.11910 | null |
| 2025-07-15 | GKNet: Graph-based Keypoints Network for Monocular Pose Estimation of Non-cooperative Spacecraft | Weizhao Ma et.al. | 2507.11077 | null |
| 2025-07-15 | Joint angle model based learning to refine kinematic human pose estimation | Chang Peng et.al. | 2507.11075 | null |
| 2025-07-14 | Raci-Net: Ego-vehicle Odometry Estimation in Adverse Weather Conditions | Mohammadhossein Talebi et.al. | 2507.10376 | null |
| 2025-07-14 | Kaleidoscopic Background Attack: Disrupting Pose Estimation with Multi-Fold Radial Symmetry Textures | Xinlong Ding et.al. | 2507.10265 | null |
| 2025-07-14 | ProGait: A Multi-Purpose Video Dataset and Benchmark for Transfemoral Prosthesis Users | Xiangyu Yin et.al. | 2507.10223 | null |
| 2025-07-13 | VST-Pose: A Velocity-Integrated Spatiotemporal Attention Network for Human WiFi Pose Estimation | Xinyu Zhang et.al. | 2507.09672 | null |
| 2025-07-13 | EHPE: A Segmented Architecture for Enhanced Hand Pose Estimation | Bolun Zheng et.al. | 2507.09560 | null |
| 2025-07-13 | Self-supervised pretraining of vision transformers for animal behavioral analysis and neural encoding | Yanchen Wang et.al. | 2507.09513 | null |
| 2025-07-12 | PoseLLM: Enhancing Language-Guided Human Pose Estimation with MLP Alignment | Dewen Zhang et.al. | 2507.09139 | null |
| 2025-07-10 | RegGS: Unposed Sparse Views Gaussian Splatting with 3DGS Registration | Chong Cheng et.al. | 2507.08136 | null |
| 2025-07-10 | SCREP: Scene Coordinate Regression and Evidential Learning-based Perception-Aware Trajectory Generation | Juyeop Han et.al. | 2507.07467 | null |
| 2025-07-09 | g2o vs. Ceres: Optimizing Scan Matching in Cartographer SLAM | Quanjie Qiu et.al. | 2507.07142 | null |
| 2025-07-09 | Smartphone Exergames with Real-Time Markerless Motion Capture: Challenges and Trade-offs | Mathieu Phosanarack et.al. | 2507.06669 | null |
| 2025-07-09 | MK-Pose: Category-Level Object Pose Estimation via Multimodal-Based Keypoint Learning | Yifan Yang et.al. | 2507.06662 | null |
| 2025-07-09 | Mask6D: Masked Pose Priors For 6D Object Pose Estimation | Yuechen Xie et.al. | 2507.06486 | null |
| 2025-07-08 | SenseShift6D: Multimodal RGB-D Benchmarking for Robust 6D Pose Estimation across Environment and Sensor Variations | Yegyu Han et.al. | 2507.05751 | null |
| 2025-07-08 | Event-RGB Fusion for Spacecraft Pose Estimation Under Harsh Lighting | Mohsi Jawaid et.al. | 2507.05698 | null |
| 2025-07-07 | W2W: A Simulated Exploration of IMU Placement Across the Human Body for Designing Smarter Wearable | Lala Shakti Swarup Ray et.al. | 2507.05532 | null |
| 2025-07-07 | UDF-GMA: Uncertainty Disentanglement and Fusion for General Movement Assessment | Zeqi Luo et.al. | 2507.04814 | null |
| 2025-07-06 | Thousand-Brains Systems: Sensorimotor Intelligence for Rapid, Robust Learning and Inference | Niels Leadholm et.al. | 2507.04494 | null |
| 2025-07-09 | Gaussian-LIC2: LiDAR-Inertial-Camera Gaussian Splatting SLAM | Xiaolei Lang et.al. | 2507.04004 | null |
| 2025-07-05 | Accurate Pose Estimation Using Contact Manifold Sampling for Safe Peg-in-Hole Insertion of Complex Geometries | Abhay Negi et.al. | 2507.03925 | null |
| 2025-07-02 | Markerless Stride Length estimation in Athletic using Pose Estimation with monocular vision | Patryk Skorupski et.al. | 2507.03016 | null |
| 2025-07-03 | Reconstructing Close Human Interaction with Appearance and Proxemics Reasoning | Buzhen Huang et.al. | 2507.02565 | null |
| 2025-07-03 | IMASHRIMP: Automatic White Shrimp (Penaeus vannamei) Biometrical Analysis from Laboratory Images Using Computer Vision and Deep Learning | Abiam Remache González et.al. | 2507.02519 | null |
| 2025-07-03 | 3D Heart Reconstruction from Sparse Pose-agnostic 2D Echocardiographic Slices | Zhurong Chen et.al. | 2507.02411 | null |
| 2025-07-03 | LMPNet for Weakly-supervised Keypoint Discovery | Pei Guo et.al. | 2507.02308 | null |
| 2025-07-02 | What does really matter in image goal navigation? | Gianluca Monaci et.al. | 2507.01667 | null |
| 2025-07-01 | 2024 NASA SUITS Report: LLM-Driven Immersive Augmented Reality User Interface for Robotics and Space Exploration | Kathy Zhuang et.al. | 2507.01206 | null |
| 2025-07-01 | Multi-Modal Graph Convolutional Network with Sinusoidal Encoding for Robust Human Action Segmentation | Hao Xing et.al. | 2507.00752 | null |
| 2025-07-01 | LoD-Loc v2: Aerial Visual Localization over Low Level-of-Detail City Models using Explicit Silhouette Alignment | Juelin Zhu et.al. | 2507.00659 | null |
| 2025-06-30 | Computer Vision for Objects used in Group Work: Challenges and Opportunities | Changsoo Jung et.al. | 2507.00224 | null |
| 2025-06-30 | Validation of AI-Based 3D Human Pose Estimation in a Cyber-Physical Environment | Lisa Marie Otto et.al. | 2506.23739 | null |
| 2025-06-30 | MGPRL: Distributed Multi-Gaussian Processes for Wi-Fi-based Multi-Robot Relative Localization in Large Indoor Environments | Sai Krishna Ghanta et.al. | 2506.23514 | null |
| 2025-06-29 | TVG-SLAM: Robust Gaussian Splatting SLAM with Tri-view Geometric Constraints | Zhen Tan et.al. | 2506.23207 | null |
| 2025-06-28 | Deterministic Object Pose Confidence Region Estimation | Jinghao Wang et.al. | 2506.22720 | null |
| 2025-06-27 | Evaluating Pointing Gestures for Target Selection in Human-Robot Collaboration | Noora Sassali et.al. | 2506.22116 | null |
Visual Localization
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-07-23 | VLM-Guided Visual Place Recognition for Planet-Scale Geo-Localization | Sania Waheed et.al. | 2507.17455 | null |
| 2025-07-23 | Content-based 3D Image Retrieval and a ColBERT-inspired Re-ranking for Tumor Flagging and Staging | Farnaz Khun Jush et.al. | 2507.17412 | null |
| 2025-07-20 | Visual Place Recognition for Large-Scale UAV Applications | Ioannis Tsampikos Papapetros et.al. | 2507.15089 | null |
| 2025-07-20 | U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding Learning with MLLMs | Xiaojie Li et.al. | 2507.14902 | null |
| 2025-07-19 | OptiCorNet: Optimizing Sequence-Based Context Correlation for Visual Place Recognition | Zhenyu Li et.al. | 2507.14477 | null |
| 2025-07-16 | Developing an AI-Guided Assistant Device for the Deaf and Hearing Impaired | Jiayu et.al. | 2507.14215 | null |
| 2025-07-17 | FAR-Net: Multi-Stage Fusion Network with Enhanced Semantic Alignment and Adaptive Reconciliation for Composed Image Retrieval | Jeong-Woo Park et.al. | 2507.12823 | null |
| 2025-07-17 | MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval | Jeong-Woo Park et.al. | 2507.12819 | null |
| 2025-07-16 | QuRe: Query-Relevant Retrieval through Hard Negative Sampling in Composed Image Retrieval | Jaehyun Kwak et.al. | 2507.12416 | null |
| 2025-07-16 | CorrMoE: Mixture of Experts with De-stylization Learning for Cross-Scene and Cross-Domain Correspondence Pruning | Peiwen Xia et.al. | 2507.11834 | null |
| 2025-07-09 | Orchestrator-Agent Trust: A Modular Agentic AI Visual Classification System with Trust-Aware Orchestration and RAG-Based Reasoning | Konstantinos I. Roumeliotis et.al. | 2507.10571 | null |
| 2025-07-14 | GT-Loc: Unifying When and Where in Images Through a Joint Embedding Space | David G. Shatwell et.al. | 2507.10473 | null |
| 2025-07-14 | Text-to-Remote-Sensing-Image Retrieval beyond RGB Sources | Daniele Rege Cambrin et.al. | 2507.10403 | null |
| 2025-07-14 | Kaleidoscopic Background Attack: Disrupting Pose Estimation with Multi-Fold Radial Symmetry Textures | Xinlong Ding et.al. | 2507.10265 | null |
| 2025-07-11 | RadiomicsRetrieval: A Customizable Framework for Medical Image Retrieval Using Radiomics Features | Inye Na et.al. | 2507.08546 | null |
| 2025-07-11 | Deep Hashing with Semantic Hash Centers for Image Retrieval | Li Chen et.al. | 2507.08404 | null |
| 2025-07-08 | Unveiling Effective In-Context Configurations for Image Captioning: An External & Internal Analysis | Li Li et.al. | 2507.08021 | null |
| 2025-07-10 | SCREP: Scene Coordinate Regression and Evidential Learning-based Perception-Aware Trajectory Generation | Juyeop Han et.al. | 2507.07467 | null |
| 2025-07-10 | VP-SelDoA: Visual-prompted Selective DoA Estimation of Target Sound via Semantic-Spatial Matching | Yu Chen et.al. | 2507.07384 | null |
| 2025-07-08 | FACap: A Large-scale Fashion Dataset for Fine-grained Composed Image Retrieval | François Gardères et.al. | 2507.07135 | null |
| 2025-07-09 | Evaluating Attribute Confusion in Fashion Text-to-Image Generation | Ziyue Liu et.al. | 2507.07079 | null |
| 2025-07-09 | MS-DPPs: Multi-Source Determinantal Point Processes for Contextual Diversity Refinement of Composite Attributes in Text to Image Retrieval | Naoya Sogi et.al. | 2507.06654 | null |
| 2025-07-08 | Automatic Synthesis of High-Quality Triplet Data for Composed Image Retrieval | Haiwen Li et.al. | 2507.05970 | null |
| 2025-07-08 | OFFSET: Segmentation-based Focus Shift Revision for Composed Image Retrieval | Zhiwei Chen et.al. | 2507.05631 | null |
| 2025-07-07 | Llama Nemoretriever Colembed: Top-Performing Text-Image Retrieval Model | Mengyao Xu et.al. | 2507.05513 | null |
| 2025-07-07 | An analysis of vision-language models for fabric retrieval | Francesco Giuliari et.al. | 2507.04735 | null |
| 2025-07-08 | What’s Making That Sound Right Now? Video-centric Audio-Visual Localization | Hahyeon Choi et.al. | 2507.04667 | null |
| 2025-07-06 | U-ViLAR: Uncertainty-Aware Visual Localization for Autonomous Driving via Differentiable Association and Registration | Xiaofan Li et.al. | 2507.04503 | null |
| 2025-07-04 | Query-Based Adaptive Aggregation for Multi-Dataset Joint Training Toward Universal Visual Place Recognition | Jiuhong Xiao et.al. | 2507.03831 | null |
| 2025-07-01 | LoD-Loc v2: Aerial Visual Localization over Low Level-of-Detail City Models using Explicit Silhouette Alignment | Juelin Zhu et.al. | 2507.00659 | null |
| 2025-06-28 | Utilizing a Novel Deep Learning Method for Scene Categorization in Remote Sensing Data | Ghufran A. Omran et.al. | 2506.22939 | null |
| 2025-06-28 | Mask-aware Text-to-Image Retrieval: Referring Expression Segmentation Meets Cross-modal Retrieval | Li-Cheng Shen et.al. | 2506.22864 | null |
| 2025-06-27 | MatChA: Cross-Algorithm Matching with Feature Augmentation | Paula Carbó Cubero et.al. | 2506.22336 | null |
| 2025-06-26 | OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography | Caoshuo Li et.al. | 2506.21101 | null |
| 2025-06-25 | Visualizing intercalation effects in 2D materials using AFM based techniques | Karmen Kapustić et.al. | 2506.20467 | null |
| 2025-06-25 | On the Burstiness of Faces in Set | Jiong Wang et.al. | 2506.20312 | null |
| 2025-06-24 | jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval | Michael Günther et.al. | 2506.18902 | null |
| 2025-06-26 | Referring Expression Instance Retrieval and A Strong End-to-End Baseline | Xiangzhao Hao et.al. | 2506.18246 | null |
| 2025-06-20 | Class Agnostic Instance-level Descriptor for Visual Instance Search | Qi-Ying Sun et.al. | 2506.16745 | null |
Keypoint Detection
| Publish Date | Title | Authors | arXiv ID | Code |
|---|---|---|---|---|
| 2025-07-23 | CartoonAlive: Towards Expressive Live2D Modeling from Single Portraits | Chao He et.al. | 2507.17327 | null |
| 2025-07-21 | Toward a Real-Time Framework for Accurate Monocular 3D Human Pose Estimation with Geometric Priors | Mohamed Adjel et.al. | 2507.16850 | null |
| 2025-07-17 | DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model | Maulana Bisyir Azhari et.al. | 2507.13145 | null |
| 2025-07-15 | KptLLM++: Towards Generic Keypoint Comprehension with Large Language Model | Jie Yang et.al. | 2507.11102 | null |
| 2025-07-15 | GKNet: Graph-based Keypoints Network for Monocular Pose Estimation of Non-cooperative Spacecraft | Weizhao Ma et.al. | 2507.11077 | null |
| 2025-07-14 | FPC-Net: Revisiting SuperPoint with Descriptor-Free Keypoint Detection via Feature Pyramids and Consistency-Based Implicit Matching | Ionuţ Grigore et.al. | 2507.10770 | null |
| 2025-07-11 | Doodle Your Keypoints: Sketch-Based Few-Shot Keypoint Detection | Subhajit Maity et.al. | 2507.07994 | null |
| 2025-07-09 | Reading a Ruler in the Wild | Yimu Pan et.al. | 2507.07077 | null |
| 2025-07-09 | MK-Pose: Category-Level Object Pose Estimation via Multimodal-Based Keypoint Learning | Yifan Yang et.al. | 2507.06662 | null |
| 2025-06-27 | MatChA: Cross-Algorithm Matching with Feature Augmentation | Paula Carbó Cubero et.al. | 2506.22336 | null |
| 2025-06-27 | SDRNET: Stacked Deep Residual Network for Accurate Semantic Segmentation of Fine-Resolution Remotely Sensed Images | Naftaly Wambugu et.al. | 2506.21945 | null |
| 2025-05-29 | TimePoint: Accelerated Time Series Alignment via Self-Supervised Keypoint and Descriptor Learning | Ron Shapira Weber et.al. | 2505.23475 | link |
| 2025-05-24 | Why Not Replace? Sustaining Long-Term Visual Localization via Handcrafted-Learned Feature Collaboration on CPU | Yicheng Lin et.al. | 2505.18652 | link |
| 2025-05-18 | SEPT: Standard-Definition Map Enhanced Scene Perception and Topology Reasoning for Autonomous Driving | Muleilan Pei et.al. | 2505.12246 | null |
| 2025-05-17 | Keypoints as Dynamic Centroids for Unified Human Pose and Segmentation | Niaz Ahmad et.al. | 2505.12130 | null |
| 2025-05-16 | Deepfake Forensic Analysis: Source Dataset Attribution and Legal Implications of Synthetic Media Manipulation | Massimiliano Cassia et.al. | 2505.11110 | null |
| 2025-06-19 | RDD: Robust Feature Detector and Descriptor using Deformable Transformer | Gonglin Chen et.al. | 2505.08013 | null |