This paper reviews the literature on gesture in order to elucidate why people gesture. Two major classes of reasons are explored: communicative and cognitive. One of the main communicative functions of gesture is to help achieve the act of conveying content to others. First, gestures can encode, in various ways, the information to be conveyed, and thus become the main vehicle of “content delivery”. Second, gesture can regulate the way content is conveyed to others. For example, gesture regulates the division of labor between gesture and speech with respect to content delivery. It also regulates and supplements content delivery in the speech channel (e.g., managing turns in conversation, marking the illocutionary force of an utterance). Another main communicative function is to create affective “bonding” among communicative partners, which is achieved by producing the same body movement synchronously. As for the cognitive function, gesturing helps the conceptual planning of speech production. It is argued that gesture is a form of thinking, “spatio-motoric thinking”, which is based on our ability to maneuver the body in the environment. This is qualitatively different from “analytic thinking”, the manipulation of abstract propositional representations. In the course of speech production, these two kinds of thinking collaboratively seek a “package” of information that fits the expressive resources of the language. Because the two forms of thinking have ready access to different organizations of information, it is advantageous for the speaker to run both of them simultaneously.
The growth point is a theoretical unit of language and thought proposed as the initiating cognitive unit in the production of speech. Growth points are inferred from speech-gesture synchrony and co-expressivity, and incorporate elements from the context of speaking by virtue of what Vygotsky (1987) called ‘psychological predicates.’ The psychological predicate in any given case can be known only in relation to its local context, from which it is the point of differentiation. Thus growth points, as psychological predicates, incorporate context as an inherent property. The first part of the paper consists of a detailed case study of the growth point of a recorded English utterance. The aim is to demonstrate how this growth point was inferred, and how it, and no other, together with the full sentence into which it was unpacked, could have been a product of contextual differentiation. The growth point ‘predicts’ a context, which must be observed in order to validate the growth point in each given case. An essential tool of analysis is the concept of a catchment: a thematic unit in gesture revealed through partial recurrences of form and space. In our case study, three catchments are involved. The second part of the paper compares the growth point to Kita's information packaging hypothesis. One difference is that the growth point defines psychological units whereas the information packaging hypothesis, despite its name, does not. This part of the paper also presents an analysis of gestural foreshadowing as seen in a cross-linguistic example (Turkish). Foreshadowing at first glance seems problematic for the growth point, because synchrony appears to break down, but in fact it leads to new insights into the mechanisms of growth point composition. The paper concludes with the statement that language is inseparable from imagery, a concept not deducible from the tradition of text-based linguistic analyses. Accounting for imagery requires finding new theoretical concepts about language in non-traditional domains.
Six normal young Japanese children and six children with Williams syndrome (WMS) were presented with a nonsense label for an unfamiliar target solid object that was either rigid (e.g., steel) or flexible (e.g., rubber). The presentation was either accompanied by a gesture or not. When a gesture was performed, it emphasized either the shape and function (e.g., a rolling gesture) or the material (e.g., squeezing) of the target. The children were then presented with a pair of unfamiliar objects, one matching the target in shape and function, the other in material. Given the same label, they were asked to choose the object that matched the target in shape and function, or in material. When a gesture had accompanied the preceding presentation of the target, the normal children were likely to choose the object that matched the target in shape and function if the gesture had emphasized the target's shape and function, but tended to choose the object that matched the target in material if the gesture had emphasized the target's material. When no gesture was provided, they chose either of the two objects at random. In contrast, the choices of the children with WMS were not influenced by the type of preceding gesture, but were made randomly regardless of the presence of a gesture. These findings are discussed in terms of a possible impairment of working memory in children with WMS, which would make their language acquisition unique.
The purposes of this study are to investigate factors related to individual variation in the frequency of gestures during speech, and to assess the functions of those gestures on the basis of these factors. An experiment was administered to 24 female subjects under two conditions: a face-to-face condition and a non-face-to-face condition. The task was to explain the meaning of a target word to a listener. The subjects completed two questionnaires measuring cognitive style and affiliation with the listener. The observed gestures were classified into representational gestures and beat gestures. The analysis shows that the “analysis and abstraction” factor of cognitive style correlated with the frequency of representational gestures under both the face-to-face and the non-face-to-face conditions, whereas affiliation correlated with the frequency of representational gestures only under the non-face-to-face condition. That is, the group that scored high on the “analysis and abstraction” factor of the cognitive style questionnaire performed representational gestures more frequently than the low-scoring group in both conditions, and the group with a high score on the affiliation factor performed representational gestures more frequently than the low-scoring group in the non-face-to-face condition. This study indicates that the emergence of gestures during speech is attributable to the achievement of communication, but, at the same time, gestures during speech help the speaker access lexical representations or represent spatial information for spatio-kinesic thinking. Moreover, it raises a question about the validity of the paradigm of previous experiments, which assumed that gestures appearing in the non-face-to-face condition have purely non-communicative, speaker-internal functions.
This study investigates the communicative factors that affect the production of gestures during speech. Twenty-four university students explained the definitions of nouns referring to concrete objects and abstract concepts to an imaginary child who did not know those words. In one condition, they were told that their explanations would be video-recorded; in the other, that their explanations would be audio-recorded. Their explanations in both conditions were in fact video-recorded and analyzed. The analysis shows that subjects produced more gestures in the video-recorded condition than in the audio-recorded condition. When the gestures were classified into two groups according to their viewpoints, gestures from the objective viewpoint were produced more frequently than those from the subjective viewpoint in the video-recorded condition. These results suggest a strong effect of communicative factors on the process of gesture production. Gestures are not mere externalizations of mental representations; rather, they are modified according to the demands of the communicative interaction between speaker and listener.
The present study argues for the communicational unity of spoken language and gesture. In particular, it argues that textual structure, in addition to grammar, is the plane on which language and gesture, through the emerging indexical relationships between and within the two different kinds of signs, together reveal communicational intent. The study first discusses how discursive textual structure is built up through various indexical relationships between and within the textual and gestural elements. We will see two analytically different arrangements of the linguistic and gestural signs: one is the indexical unity between speech and gesture realized by the performativity of speech as well as the tendency toward tight synchronization between the two signs; the other, which we examine here semiotically, is the poetic structure realized by the iconic-indexical relationships within each type of sign. It is emphasized that these two constitute, respectively, the vertical and horizontal threads interwoven into one piece of fabric which we call discursive textual structure. An example of such a structure is shown in an excerpt from a cartoon story narration by a Japanese female speaker and her interlocutor. The example is meant to demonstrate the significance and utility of the analytic framework promoted in the present study. It shows that what McNeill calls catchments, i.e., gesturally achieved poetic structure, help disambiguate the referent of the actor/subject NP of the verb phrase ‘soko made kuru (come up to there),’ which is ambiguous because of an apparent violation of Hinds' ellipsis chain hypothesis: the subject NP continues to be elided where it is expected to be explicitly marked as a topic. It is argued, finally, that communicational intent emerges, at least in part, through the indexical relationships between and within the linguistic and gestural signs.
Based on videotaped interactions between Japanese and American workers at a Japanese manufacturing plant in the US, this paper examines their use of gesture as an indispensable communicative strategy. Unlike most intercultural interactions examined in previous studies, the informants of the two groups had limited knowledge of one another's language. Nevertheless, they needed to communicate as accurately as possible in order to accomplish their tasks. Two functions of gesture were found in the present data. First, the interactants used iconic gestures together with some of the most frequently occurring vocabulary in their interactions: namely, gestures of ‘assembly’, ‘stamping’, ‘finish’, and the like. This type of gesture was far from redundant. Instead, it served to secure the major topic currently under discussion as a reference point for the interactants. That is, by amplifying the major topic being discussed, such as ‘assembly’, through combining speech and gesture, the interactants could make sure that they were indeed talking about assembly. Second, when the interactants were temporarily lost in their interaction, iconic gesture was used as a means of eliciting clarification. In the example segment, a Japanese engineer repeated the ‘stamping’ gesture used by his American interlocutor; this repetition of the gesture functioned as a request for clarification. These findings show ways in which gestures were actively and creatively utilized as an important communicative strategy in an intercultural work setting.
Nobe et al. (1998), using a gaze-tracking method, investigated which aspects of an anthropomorphic agent's hand gestures human listeners look at in a real-time setting. The current study was conducted to evaluate human-anthropomorphic agent interaction further, by increasing the number of human subjects involved and by focusing on how the subjects' attention to gestures leads to comprehension of them. Subjects gazed at and fixated on most gestures. Moreover, subjects imitated and understood most of the gestures in a reproduction test and a comprehension test. Sometimes subjects answered these tests correctly without any fixations, suggesting that they might have viewed the gestures in near-foveal or peripheral vision and/or used the co-occurring speech information. These results are compared with the gaze behavior toward gestures in human-human interaction reported by Gullberg & Holmqvist (1999).
Participants in conversation take speaking turns regularly and smoothly, although who speaks, and when, is not decided in advance. These phenomena have received great attention in various fields. As for processing models of turn-taking, however, only the code model, based on turn-taking signals, has been proposed so far, and several researchers have pointed out that it has serious problems. In this paper, we propose an alternative, the autonomous model. The basic assumptions of this model are: (1) the speaker and the hearer share a cognitive environment, including the speaker's speech, according to which the hearer takes her own speech actions; and (2) unlike in the code model, the hearer need not decode the speaker's intent regarding the hearer's speech actions. We predict the distributions of smooth, as well as non-smooth, transitions between speakers under the autonomous model and the code model, and compare them with those observed in real spoken dialogues, showing that the autonomous model accounts for the fundamental characteristics of turn-taking phenomena more precisely than the code model does. We further discuss how the autonomous model accounts for the predominant occurrence of smooth transitions.
It is known that chiming in and the postpositional particle “ne” play important roles when people try to communicate smoothly with one another, and that a relationship exists between them. It is not clear, however, how the frequency of chiming in affects the frequency of the particle “ne”. We conducted an experiment to examine this. The participants in the experiment were sixteen students (S1, ..., S16) and one teacher (T), and each subject-set consisted of two students (Si, Sj) and the teacher (T), for eight subject-sets in all. Before the experiment, the two students in each subject-set decided on the theme they would talk about and thought through the content of their talk. In the experiment, they talked about the theme to the teacher. For half of the eight subject-sets (the “Many Group”), the teacher chimed in at every transition relevance place (TRP) as much as he could. For the other half (the “Few Group”), the teacher chimed in only when an utterance terminated. The conversations were recorded on tape and transcribed in order to count the occurrences of the particle “ne”. The frequency of the particle “ne” tended to be higher in the Few Group than in the Many Group.