Spatial terms such as “above”, “in front of”, and “on the left of” are all essential for describing the location of one object relative to another object in everyday communication. Apprehending such spatial relations involves relating linguistic to object representations by means of attention. This requires at least one attentional shift, and models such as the Attentional Vector Sum (AVS) predict the direction of that attention shift, from the sausage to the box for spatial utterances such as “The box is above the sausage”. To the extent that this prediction generalizes to overt gaze shifts, a listener’s visual attention should shift from the sausage to the box. However, listeners tend to rapidly look at referents in their order of mention and even anticipate them based on linguistic cues, a behavior that predicts a converse attentional shift from the box to the sausage. Four eye-tracking experiments assessed the role of overt attention in spatial language comprehension by examining )