Latent Representations for Visual Proprioception in Inexpensive Robots

Sahara Sheikholeslami, Ladislau Bölöni

2025-04-20 | Industrial Robots

Abstract

Robotic manipulation requires explicit or implicit knowledge of the robot's joint positions. Precise proprioception is standard in high-quality industrial robots but is often unavailable in inexpensive robots operating in unstructured environments. In this paper, we ask: to what extent can a fast, single-pass regression architecture perform visual proprioception from a single external camera image, available even in the simplest manipulation settings? We explore several latent representations, including CNNs, VAEs, ViTs, and bags of uncalibrated fiducial markers, using fine-tuning techniques adapted to the limited data available. We evaluate the achievable accuracy through experiments on an inexpensive 6-DoF robot.
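To make the idea of a single-pass regression from a latent representation concrete, here is a minimal sketch in Python with NumPy. It is not the authors' architecture. It assumes that some encoder (CNN, VAE, ViT, or a marker-based feature extractor) has already turned the camera image into a fixed-length latent vector `z`, and it shows only a small regression head that maps `z` to six joint values in one forward pass. The names `init_head`, `predict_joints`, and `mse`, the latent size of 128, and the hidden size of 64 are all hypothetical choices made for illustration.

```python
import numpy as np

N_JOINTS = 6  # the abstract describes a 6-DoF robot


def init_head(latent_dim, hidden_dim=64, n_joints=N_JOINTS, seed=0):
    """Create randomly initialised weights for a two-layer regression head.

    The layer sizes are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(latent_dim), (latent_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(hidden_dim), (hidden_dim, n_joints)),
        "b2": np.zeros(n_joints),
    }


def predict_joints(params, z):
    """Map latent vectors of shape (batch, latent_dim) to joint estimates
    of shape (batch, n_joints) in a single forward pass."""
    hidden = np.maximum(0.0, z @ params["W1"] + params["b1"])  # ReLU
    return hidden @ params["W2"] + params["b2"]


def mse(pred, target):
    """Mean squared error between predicted and reference joint values."""
    return float(np.mean((pred - target) ** 2))


# Stand-in latent vectors; in practice these would come from the image encoder.
z = np.random.default_rng(1).normal(size=(4, 128))
params = init_head(latent_dim=128)
q_hat = predict_joints(params, z)  # one row of 6 joint estimates per image
```

In such a setup, the head (and optionally the encoder) would be fine-tuned by minimising `mse` against joint positions recorded from the robot; the training loop is omitted here.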
