Eye tracking shows use of speech disfluencies among young children

The Rochester Baby Lab used eye tracking to test whether infants could make use of the information contained in speech disfluencies, such as "uh" and "um".


Adults tend to produce speech disfluencies before words that are infrequent (e.g., "theeee, uh, mangosteen") and before words that have not previously been mentioned in the conversation. Researchers at the Rochester Baby Lab used a Tobii 1750 Eye Tracker to examine whether young word learners (mean age 2;6) are sensitive to this statistical regularity and make use of disfluencies during comprehension. By monitoring children’s eye gaze to objects as they heard speech disfluencies embedded in sentences, the researchers demonstrated that young children use disfluencies to anticipate that an upcoming word is likely to refer to a previously unmentioned or novel object.

Language comprehension is not directly observable, so we use eye movements as an indirect way of inferring the possible referents a child is considering as a sentence is unfolding. Eye tracking uniquely enables us to investigate children’s expectations about upcoming material during comprehension in real time.

Celeste Kidd, Graduate Student Researcher, Brain & Cognitive Sciences, University of Rochester

Tools and methods

To test whether infants could make use of the information contained in speech disfluencies, researchers at the Rochester Baby Lab presented sixteen children (ages 2;4 – 2;8) with pictures of object pairs on their table-mounted Tobii 1750.

Within each trial, one known object (e.g., shoe) and one novel object (e.g., mog) were presented three times in succession. During the first two presentations, children heard an utterance referring to the known object.

Example stimuli used during the test (each display contained one familiar object and one novel object).
A toddler sitting on his mom’s lap watches the disfluency study movie on the Tobii Pro screen-based eye tracker.

During the critical third presentation, the child was told to look at either the known or the novel object. In one condition, the command was fluent; in the other, the command contained a disfluency before the object name. For example, in the fluent condition, the child might hear, "Look! Look at the mog!" In the disfluent condition, the child might hear, "Look! Look at theeeeee... uh.... mog!" Across trials, the critical presentation was fluent or disfluent and referred to the known or the novel object an equal number of times.

Researchers predicted that if children have learned that disfluencies occur before discourse-new and unfamiliar referents, then in the 2-second window prior to the onset of the object name (during the speech disfluency in the disfluent condition), they would look more toward the novel object in the disfluent condition than in the fluent condition. Thus, the researchers compared the total looking time and the proportion of looking time to the novel object in this critical window of interest.
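The two window measures described above can be sketched in a few lines of Python. This is only an illustration, not the lab's analysis code, which is not given here. The `GazeSample` type, the `window_looking` function, the area-of-interest labels, the 20 ms sample spacing, and the choice to compute the proportion over samples that land on either object are all assumptions, and the trial data are made up.

```python
from dataclasses import dataclass

SAMPLE_MS = 20  # assumption: one gaze sample every 20 ms (50 Hz)


@dataclass
class GazeSample:
    t_ms: int  # time since trial start, in milliseconds
    aoi: str   # hypothetical labels: "novel", "known", or "away"


def window_looking(samples, target_onset_ms, window_ms=2000, sample_ms=SAMPLE_MS):
    """Return (total ms on the novel object, proportion of object-directed
    looking that went to the novel object) in the window before target onset."""
    start = target_onset_ms - window_ms
    in_window = [s for s in samples if start <= s.t_ms < target_onset_ms]
    novel = sum(1 for s in in_window if s.aoi == "novel")
    known = sum(1 for s in in_window if s.aoi == "known")
    total_ms = novel * sample_ms
    on_objects = novel + known
    # Assumption: looks away from both objects are excluded from the proportion.
    proportion = novel / on_objects if on_objects else float("nan")
    return total_ms, proportion


def label(t):
    """Made-up gaze pattern for one trial."""
    if t < 1600:
        return "known"
    if t < 2800:
        return "novel"
    return "away"


# Made-up trial: 3 s of samples, object name starts at 3000 ms.
samples = [GazeSample(t, label(t)) for t in range(0, 3000, SAMPLE_MS)]
total_ms, proportion = window_looking(samples, target_onset_ms=3000)
print(total_ms, round(proportion, 2))
```

Running the same function on each child's fluent and disfluent trials and averaging per child would give one pair of values per child for the comparison.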

Our subjects here were toddlers, and toddlers are generally very active. Time is a huge constraint on how much data we can collect from subjects at this age. Tobii eye tracking enabled us to get more data from these children by greatly reducing the amount of time needed to calibrate each child.

Celeste Kidd, Graduate Student Researcher, Brain & Cognitive Sciences, University of Rochester


Researchers calculated the proportion of fixations to the novel object at each time point during the critical phase of the fluent and disfluent trials using Matlab.
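The researchers did this step in Matlab; the sketch below shows the same idea in Python for consistency with the other examples here. It assumes each trial has already been resampled to a common set of time points and labeled per time point. The function name `novel_timecourse`, the labels, and the rule of ignoring looks away from both objects are hypothetical, and the three trials are made up.

```python
def novel_timecourse(trials):
    """trials: equal-length lists of labels ("novel", "known", "away"),
    one label per time point. Returns one value per time point: the share
    of object-directed looks that were on the novel object, or None when
    no trial has gaze on either object at that time point."""
    n_points = len(trials[0])
    curve = []
    for i in range(n_points):
        labels = [trial[i] for trial in trials]
        novel = labels.count("novel")
        known = labels.count("known")
        curve.append(novel / (novel + known) if novel + known else None)
    return curve


# Made-up data: three trials, four time points each.
trials = [
    ["known", "known", "novel", "novel"],
    ["known", "novel", "novel", "away"],
    ["away", "known", "novel", "novel"],
]
curve = novel_timecourse(trials)
print(curve)
```

Computing one such curve for fluent trials and one for disfluent trials gives the two lines of a timecourse plot.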

Graphs 1 and 2.

Timecourse plots (Figures 1 and 2) suggest that children were biased to interpret the disfluency as signaling that the upcoming word would refer to the novel and previously unmentioned object.

To test that hypothesis, researchers compared average total looking time to the novel object across fluent and disfluent trials in the 2-second window of interest before the onset of the target word. Children looked longer overall at the novel object in disfluent trials (1158 ms) than in fluent trials (893 ms), a difference that a Wilcoxon signed-rank test found to be highly significant (p < 0.008). This result is illustrated in Figure 3. Children also looked proportionally longer at the novel object during the same temporal window of interest in disfluent trials (0.66) than in fluent trials (0.54). This difference (Figure 4) is also significant (p < 0.005). Importantly, the proportion of looking time to the novel object was significantly above chance in the disfluent trials (p < 0.001), but not in the fluent trials (p > 0.37).
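For readers unfamiliar with the Wilcoxon signed-rank test, here is a minimal pure-Python sketch of an exact two-sided version for paired data. The text does not say which variant or software the researchers used for the test, so this is an illustration of the technique rather than a reproduction of their analysis. The function name `wilcoxon_signed_rank` is hypothetical, and the six pairs of looking times are made up, not the study's data.

```python
def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.
    Zero differences are dropped; tied absolute differences share the
    average rank. Returns (W_plus, p_value)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank absolute differences. Ranks are stored doubled so that
    # averaged (half-integer) ranks remain integers.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2  # 2 * average of ranks i+1 .. j+1
        i = j + 1

    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    total2 = sum(ranks2)

    # Null distribution: every rank is positive or negative with equal
    # probability. counts[s] = number of sign patterns with doubled W+ == s.
    counts = [0] * (total2 + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total2, r - 1, -1):
            counts[s] += counts[s - r]

    # Two-sided p: share of sign patterns at least as far from the mean.
    dev = abs(2 * w_plus2 - total2)
    extreme = sum(c for s, c in enumerate(counts) if abs(2 * s - total2) >= dev)
    return w_plus2 / 2, extreme / 2 ** n


# Made-up looking times (ms) for six hypothetical children.
disfluent = [1200, 1100, 1300, 900, 1250, 1000]
fluent = [900, 950, 1000, 950, 1000, 800]
w_plus, p_value = wilcoxon_signed_rank(disfluent, fluent)
print(w_plus, p_value)
```

The sketch compares values with exact equality when looking for ties, so it is best suited to integer measurements such as milliseconds. Statistical packages typically offer the same test with additional options for ties and large samples.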

Graphs 3 and 4.

These results indicate that young children (1) have learned that disfluencies contain information, (2) attend to disfluencies in speech, and (3) can make use of the information contained in disfluencies online during comprehension in order to infer speaker intention.

We have found the Tobii Eye Tracker to be easy to use with infants and young children and to have sufficient spatial and temporal resolution to provide new insights about spoken word recognition.

Richard Aslin, Professor, Brain & Cognitive Sciences and Center for Visual Science, University of Rochester
