In this study, we perform unified discrete and continuous state recognition for robots using pre-trained large-scale vision-language models. Here, we apply Visual Question Answering (VQA) and Image-to-Text Retrieval (ITR), which are tasks supported by these vision-language models. State recognition is achieved by selecting appropriate texts to be input to VQA and ITR for the current image, with the selection optimized by a genetic algorithm. This study shows that an appropriate state recognizer can be easily constructed from a small dataset without retraining the neural network, and that the system can recognize whether a glass door is open, whether water is running, whether water is boiling, and whether butter has melted, states that have so far been difficult for robots to recognize.
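The genetic-algorithm prompt selection described above can be illustrated with a minimal sketch. All names here are hypothetical: the candidate texts, the dataset, and the `similarity` function (a deterministic toy stand-in for a pre-trained vision-language model's image-text similarity score, e.g. a CLIP-style cosine similarity) are illustrative, not the paper's actual implementation. Each individual pairs one "positive" and one "negative" text, and fitness is classification accuracy on a small labeled image set:

```python
import random

random.seed(0)

# Hypothetical candidate prompts for an ITR-based "is the door open?" recognizer.
POS = ["an open glass door", "a glass door left open", "an open door"]
NEG = ["a closed glass door", "a shut door", "a closed door"]
# Toy labeled dataset: (image id, label), label 1 = open.
DATASET = [(f"img{i}", i % 2) for i in range(20)]

def similarity(text, image_id):
    # Stand-in for a VLM's image-text similarity; a real system would
    # query the pre-trained model here instead of this toy hash.
    return ((sum(map(ord, text)) * sum(map(ord, image_id))) % 997) / 997.0

def fitness(ind):
    # Accuracy when predicting "open" iff the positive text scores higher.
    p, n = ind
    correct = 0
    for img, label in DATASET:
        pred = 1 if similarity(POS[p], img) > similarity(NEG[n], img) else 0
        correct += (pred == label)
    return correct / len(DATASET)

def evolve(generations=30, pop_size=12, mut_rate=0.3):
    pop = [(random.randrange(len(POS)), random.randrange(len(NEG)))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            child = (a[0], b[1])                  # one-point crossover
            if random.random() < mut_rate:        # mutate positive gene
                child = (random.randrange(len(POS)), child[1])
            if random.random() < mut_rate:        # mutate negative gene
                child = (child[0], random.randrange(len(NEG)))
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
print(POS[best[0]], "vs", NEG[best[1]], "accuracy:", fitness(best))
```

In a real deployment the fitness evaluation would be the expensive step (one VLM inference per image-text pair), which is why the search is run once offline over a small labeled set.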
Cooking is one of the household support tasks that robots are expected to perform in today's society, where an aging population and a declining birthrate have become social problems. The execution of cooking tasks by robots poses various challenges in both recognition and action planning. For recognition, it is important to detect changes in the state of food ingredients; for action planning, the robot must execute cooking tasks based on recipes written in natural language. However, these issues have been studied separately and have not been integrated. In this research, we propose a robot system that recognizes changes in the state of ingredients and performs appropriate cooking actions based on recipes, combining analysis of recipe descriptions with time-series processing by a vision-language model. The effectiveness of the proposed system was verified in a real-world experiment with egg dishes.
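One way to picture the time-series side of such a system is a recognizer that monitors a vision-language model's similarity score for a state description (e.g., "the egg is cooked") and advances to the next recipe step once the smoothed score crosses a threshold. The sketch below is a hypothetical illustration, assuming a precomputed score series rather than live VLM queries; the values, window size, and threshold are all invented:

```python
# Toy time series of per-frame VLM similarity scores for the text
# "the egg is cooked" (hypothetical values; a real system would query
# the vision-language model once per camera frame).
scores = [0.12, 0.15, 0.14, 0.18, 0.35, 0.52, 0.61, 0.63, 0.64, 0.66]

def detect_state_change(series, window=3, threshold=0.5):
    """Return the first index where the moving average of the last
    `window` scores reaches `threshold`, or None if it never does."""
    for i in range(window - 1, len(series)):
        avg = sum(series[i - window + 1 : i + 1]) / window
        if avg >= threshold:
            return i
    return None

step = detect_state_change(scores)
print("advance recipe at frame:", step)
```

The moving average suppresses single-frame spikes in the raw similarity signal, so the robot only advances the recipe when the state change persists over several frames.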
Parallel wire-driven legs have been proposed as a leg structure that enables both high and continuous jumping. However, the jumping abilities of multiple leg structures, including parallel wire-driven legs, have not been compared on the basis of dynamic models. In this study, we develop dynamic models of parallel wire-driven legs and other representative leg structures, and compare their jumping abilities under identical conditions. In addition, the discrepancies between the models and the actual robot are discussed based on data from jumping experiments with the parallel wire-driven leg robot RAMIEL.
We propose a novel visual electromyography (EMG) biofeedback system that encourages patients to relearn their internal models by presenting limb characteristics that differ from their own in a Virtual Reality (VR) space. Based on our previous research, we developed a prototype system that estimates human elbow joint angles from EMG using a time-invariant model and applies those values to virtual upper-limb objects in the VR space. Furthermore, as a step toward establishing the system's utility for patients, we verified the effectiveness of the prototype by assessing it with healthy participants. We concluded that this system may interfere with motor control strategies.
In this work, we developed a sensorless method for discriminating objects caged by a robot hand in a two-dimensional plane, intended for part feeders based on sensorless in-hand caging manipulation. A particle filter represents possible object shapes as a probability distribution based only on the joint-angle information of the hand, and refines the estimated dimensions and shape of the object. In simulation, we confirmed that the method can discriminate among several object candidates, particularly from the joint information obtained when objects are jammed.
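The particle-filter idea can be sketched in one dimension. The forward model, noise levels, and disc-radius parameterization below are all toy assumptions (not the paper's hand model): each particle is a candidate object dimension, and repeated joint-angle measurements from caging the object weight and resample the particles until the estimate converges:

```python
import math
import random

random.seed(1)

def joint_angle(radius):
    # Hypothetical forward kinematics: the finger joint angle at which a
    # two-finger hand cages a disc of the given radius (toy model).
    return math.atan2(radius, 0.1)

TRUE_RADIUS = 0.03  # meters; unknown to the filter

def measure():
    # Simulated noisy joint-angle reading from one cage-and-measure cycle.
    return joint_angle(TRUE_RADIUS) + random.gauss(0, 0.01)

N = 500
# Particles: candidate disc radii, initially uniform over a plausible range.
particles = [random.uniform(0.01, 0.06) for _ in range(N)]

for _ in range(30):  # repeated caging cycles
    z = measure()
    # Weight each particle by measurement likelihood (Gaussian noise model).
    weights = [math.exp(-((joint_angle(r) - z) ** 2) / (2 * 0.01 ** 2))
               for r in particles]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Resample, then add small jitter to keep particle diversity.
    particles = random.choices(particles, weights=weights, k=N)
    particles = [max(0.005, r + random.gauss(0, 0.001)) for r in particles]

estimate = sum(particles) / N
print(f"estimated radius: {estimate:.4f} m")
```

Discriminating among discrete object candidates, as in the paper, would amount to running such a filter over a mixed particle population (one shape hypothesis per particle) and reading off which hypothesis dominates after resampling.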
To grasp parts with a variety of geometries and initial pose errors, several robotic hands have been developed that can self-align the parts and realize form-closure grasps. Depending on the hand mechanism and the part geometry, a 2nd-order form-closure grasp may be required. This paper discusses the feasibility of alignment for 2nd-order form-closure grasps of 2D parts with circular fingers, based on the required finger force calculated from a dynamic analysis, assuming that initial orientation errors of the parts exist. Alignment experiments are also conducted to examine feasibility in a real environment.
To address the stability of wall-climbing motions in legged robots, the Tumble Stability criterion offers several advantages in addition to its computational simplicity: 1) it can be applied in scenarios where the robot's centroid projection is not defined (e.g., wall climbing), and 2) it easily incorporates the ground-gripping capability into the calculation of the stability margin. In this study, the conventional Tumble Stability judgment theory is reorganized in the acceleration domain to derive a "GIA Stable Space," a stable region for the robot's gravito-inertial acceleration (GIA) visualized as a polyhedron. A quantitative metric, named the GIA Margin, is defined as the geometric distance between the robot's GIA vector and the polyhedron's faces. The GIA Margin serves as an extension of the conventional support polygon and centroid projection, providing a measure of the level of stability. As a representative application of this approach, stable motion planning based on the GIA Margin for a wall-climbing legged robot is presented.
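If the GIA Stable Space is represented as an intersection of halfspaces {x : n_i · x <= d_i}, the GIA Margin reduces to the minimum signed distance from the GIA vector to the bounding faces. The sketch below uses a toy axis-aligned box in place of a real tumble-stability polyhedron; the face normals, offsets, and acceleration values are illustrative assumptions, not derived from any particular robot:

```python
import math

# Toy GIA Stable Space: a box given as halfspaces (normal n, offset d),
# each constraining n . x <= d. Units: m/s^2.
FACES = [
    (( 1.0, 0.0, 0.0),  5.0), ((-1.0, 0.0, 0.0),  5.0),
    (( 0.0, 1.0, 0.0),  5.0), (( 0.0,-1.0, 0.0),  5.0),
    (( 0.0, 0.0, 1.0), -1.0),   # z <= -1
    (( 0.0, 0.0,-1.0), 12.0),   # z >= -12
]

def gia_margin(g):
    """Signed distance from the GIA vector g to the nearest face:
    positive inside the stable region, negative outside."""
    margins = []
    for n, d in FACES:
        norm = math.sqrt(sum(c * c for c in n))
        dot = sum(c * x for c, x in zip(n, g))
        margins.append((d - dot) / norm)
    return min(margins)

# Static stance: GIA equals gravity, well inside the region.
print("static margin:", gia_margin((0.0, 0.0, -9.8)))
# Aggressive lateral acceleration pushes the GIA outside: unstable.
print("dynamic margin:", gia_margin((6.0, 0.0, -9.8)))
```

A motion planner can then treat `gia_margin` as a constraint (keep it above a safety threshold) or as an objective (maximize it) when scheduling accelerations during a climb.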