Abstract
High-quality demonstration data is indispensable for Vision-Language-Action (VLA) learning in humanoid robots, which we target
at labor shortages in agriculture. Conventional robot simulators, however, often lack the visual fidelity required to represent
complex natural environments. This study constructs a data collection system that combines a photorealistic environment, built with
Unreal Engine 5 and 3D Gaussian Splatting, with intuitive VR teleoperation. By performing demonstrations in a virtual space that
mirrors the actual robot's kinematics, we aim to generate data that captures the operator's intent. This report describes the system
architecture and the implementation of a 29-DOF robot model based on the official kinematic specifications.