This paper proposes methods for generating sentences from motions by using a large-scale word dataset, and for distributing the required computation across cloud computers. These methods make it possible for humanoid robots to interpret human motions as a variety of sentences in real time. The proposed framework consists of two modules: one represents the associations between motions and words, and the other represents sentence structure as word trigrams. The framework recognizes motions, associates words with them, and arranges the words into sentences while taking sentence length into account. We tested the framework on captured motion data and a large-scale dataset of word N-grams, running on Amazon Elastic Compute Cloud (Amazon EC2), and demonstrated its validity.
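The two-module idea can be illustrated with a minimal sketch. This is not the implementation evaluated in the paper: the probability tables, the function names (`sentence_log_score`, `generate_sentence`), the exhaustive search over word orderings, and the per-token length normalization are all assumptions made for illustration, and the cloud distribution step is omitted.

```python
import math
from itertools import permutations

# Hypothetical toy data: P(word | recognized motion) for the association module.
WORD_GIVEN_MOTION = {
    "walk": {"a": 0.3, "person": 0.3, "walks": 0.4},
}

# Hypothetical toy data: P(w3 | w1, w2) for the trigram module.
# "<s>" and "</s>" mark the sentence start and end.
TRIGRAM = {
    ("<s>", "<s>", "a"): 0.5,
    ("<s>", "a", "person"): 0.4,
    ("a", "person", "walks"): 0.3,
    ("person", "walks", "</s>"): 0.6,
}

FLOOR = 1e-6  # assumed probability for unseen words or trigrams


def sentence_log_score(words, motion):
    """Length-normalized log score of a candidate sentence for a motion."""
    assoc = WORD_GIVEN_MOTION[motion]
    padded = ["<s>", "<s>", *words, "</s>"]
    # Association module: how well each word matches the motion.
    logp = sum(math.log(assoc.get(w, FLOOR)) for w in words)
    # Trigram module: how plausible the word order is.
    for i in range(2, len(padded)):
        logp += math.log(TRIGRAM.get(tuple(padded[i - 2:i + 1]), FLOOR))
    # Assumed normalization: divide by the token count (words plus end
    # marker) so longer sentences are not penalized for having more factors.
    return logp / (len(words) + 1)


def generate_sentence(motion, max_len=3):
    """Return the best-scoring ordering of the words associated with a motion."""
    vocab = list(WORD_GIVEN_MOTION[motion])
    candidates = (
        list(p)
        for n in range(1, max_len + 1)
        for p in permutations(vocab, n)
    )
    return max(candidates, key=lambda ws: sentence_log_score(ws, motion))


print(" ".join(generate_sentence("walk")))
```

Exhaustive enumeration is only workable for a toy vocabulary like this one; with a large N-gram dataset the search would need pruning and, as the abstract notes, distribution across machines.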