Chapter 3 Speed limits and temporal limits

Naturally there are speeds at which moving objects cannot be tracked. If we had a particular sort of brain, we’d be able to track any object whose motion we could perceive. But animal brains like ours provide for the perception of motion with a system that is quite independent of the processes that allow for tracking.

Motion direction is sensed by direction-selective cells that are arrayed in retinotopic cortex such that there are cells that respond fairly independently to motion in each part of the visual field. The responses of these cells, as they feed into higher motion-processing areas such as MT/MST, eventually give rise to the experience of motion, even though this does not seem to involve tracking. That is, these cells do not know where the object they are responding to has been previously; they are basically motion sensors that respond when there is motion of a particular direction in their receptive field.

Tracking implies the existence of some index or pointer that represents that a particular object is the same as one of the set of objects designated as targets at the beginning of the trial. Even when there is only one target, this process falters at far lower speeds than perception of the target’s motion (Verstraten, Cavanagh, and Labianca 2000). Moreover, the maximum speed at which one can track is lower when there are more targets (Alex O. Holcombe and Chen 2012). What is it about the tracking process that gives it these properties?

An increase in object speed will have multiple consequences in a typical MOT experiment. In a standard MOT display, as the targets and distractors wander around the screen, they occasionally come very close to each other (in some experiments, they touch each other or even pass through each other). As reviewed in this chapter of my Cambridge Element, very close encounters can result in the loss of a target. That is relevant to the issue of speed because when MOT researchers vary object speed, they typically keep trial duration constant, so that the objects travel farther during the higher-speed trials. As a result, the objects have more close encounters, and the poorer performance at higher speeds could simply be due to that.

A first step to revealing the effect of speed, then, is to assess it without the contaminating effect of an increase in close passes. Alex O. Holcombe and Chen (2012) did this by keeping the objects very far from each other as well as using shorter trials for fast speeds, so that objects traveled the same total distance for different speeds, just in case there were any long-distance spatial interactions. The speed thresholds that resulted were still far below those for motion perception, suggesting that speed has a deleterious effect on tracking even without any concomitant close encounters, and in a range where the simple perception of motion is yet to be affected. Moreover, participants’ speed thresholds were much lower when two targets had to be tracked compared to when just one target was tracked. One way to refer to this is to say that speed consumes the tracking resource.

It is tempting to conclude that devoting more tracking resource to a target results in the associated internal pointer being able to move faster across the retina. This conclusion would be premature. There remains another possible reason that tracking falters at high speeds.

3.1 A temporal limit on perception

When two objects appear in a common location very close in time, they will be combined by the visual system. If one flickers a light off and on at a very rapid rate (about 60 times a second, depending upon display characteristics), the flicker will not be perceived; instead, one perceives the average of the dark and light phases. That is, the individual on-phases of the light cannot be perceived due to their temporal proximity with the off-phases. This is the basis of projection in the cinema, and is the reason that you can’t perceive the flicker of the long tube-style fluorescent lights that fill the ceilings of old office buildings.

The same phenomenon occurs with moving objects, as Ptolemy noted in his Optics, a book written almost two thousand years ago. Viewing a rapidly rotating potter’s wheel inspired Ptolemy to write, “If spots of a color different from that of the disc are marked on it, they will appear to form circles of the same color [as the given spot] when the disc is rapidly spun.” He also noted that “This also happens in the case of shooting stars, whose light seems distended on account of their speed of motion, all according to the amount of perceptible distance it passes along with the sensible impression that arises in the visual faculty” (A. M. Smith 1996). Ptolemy was correct to suggest that these phenomena are caused by our “visual faculty” rather than the physics of light. Our visual systems combine photoreceptor activations that occur in a single location within a certain amount of time, resulting in the perception of trails behind shooting stars.

While the temporal blurring that fuses together the flickering phases of a fluorescent light, the different colors on a potter’s wheel, and the successive locations of a shooting star reflects the temporal resolution of early stages of our visual system, later stages of visual processing also have temporal limits.

3.2 Temporal limits on visual cognition

Figure 3.1: Task: judge whether the red color is paired with leftward tilt or rightward tilt.

In the above display, one can easily perceive that the color is alternating between green and red, and that the contour on the left is alternating rapidly between leftward tilt and rightward tilt. This means that the alternation rate does not exceed the temporal resolution of the early visual system - if it did, you would perceive just one color (yellow or brown).

Nevertheless, it is very difficult or impossible to judge which color, red or green, is presented at the same time as the leftward tilt (Alex O. Holcombe and Cavanagh 2001). When the animation is slowed so that each stimulus is presented for much longer than about 200 ms, however, the task becomes quite easy, as you can see below.

Figure 3.2: Task: judge whether the red color is paired with leftward tilt or rightward tilt.

In the first movie, the temporal resolution of one’s ability to pair the features was exceeded. The temporal dissociation here, and in other circumstances, between perceiving individual features and perceiving their pairing suggests that feature binding requires processes that take longer (have coarser temporal resolution) than those that provide perception of the individual features (A. O. Holcombe 2009; Fujisaki and Nishida 2010).

In the above example, it is tempting to suggest that the dissociation results from a need to make a spatial shift of attention from one of the features’ locations to the other in order to identify both before the other features are presented. However, the phenomenon can also occur with spatially superposed features, such as in the case of color and motion below:

Figure 3.3: Task: For each row, judge whether the dots, when white, are moving to the left or to the right.

While at the slow rate of the top row, it is easy to judge the pairing of motion direction and white/black color, it is very difficult in the middle row, where the speed is slightly faster.

The first to suggest that this sort of phenomenon reflects a general limit on temporal individuation was a Dutch researcher. The limit is not something specific to particular features, and it does not depend on attending first to the location of one feature and then to the location of the other, because the phenomenon can also occur for features that are superposed.

Thus, while early visual processing can deliver motion and color features even from stimuli that are temporally very close to each other, judging which features are present at the same time requires processing that fails when temporal proximity is very high.

3.3 Low-level and high-level temporal limits

In the previous two sections I pointed out that while we can perceive the flicker in a rapidly changing light at rates as high as 60 Hz, some feature binding judgments begin to fail at 3 Hz. A. O. Holcombe (2009) reviewed all the known temporal limits on human visual judgments, from flicker to binocular depth and motion as well as the binding of various features. These limits clustered into two groups, with one set of tasks limited to 8 Hz or below and another set with limits substantially greater than 8 Hz. The summary figure below, based on one in A. O. Holcombe (2009) but with the addition of more recent evidence, highlights these two groups.

Figure 3.4: Temporal limits on perception

(1) A. O. Holcombe and Judson (2007); (2) Werkhoven, Snippe, and Toet (1992); (3) Verstraten, Cavanagh, and Labianca (2000); (4) Clifford, Holcombe, and Pearson (2004); (5) Alex O. Holcombe and Cavanagh (2001); (6) D. H. Arnold (2005); (7) Maruya, Holcombe, and Nishida (2013); (8) Rogers-Ramachandran and Ramachandran (1998); (9) Morgan and Castet (1995); (10) Burr and Ross (1982); (11) von Segner (1740)

The percepts that persist up to fast rates are likely to be computed by specialized perceptual mechanisms, whereas those limited to slow rates may require attentional selection and possibly parietal or temporal cortex to bind together two of the constituent features. This idea is schematized in Figure 3.5.

Figure 3.5: Fast temporal limits on visual perception may reflect early and mid-level stages in the cortical processing hierarchy, while the slow limits seem to reflect later processing stages, often involving attentional selection.

3.4 Temporal limits on tracking

Where does object tracking fit into the above-reviewed temporal limits on visual judgments? A good starting point is the ambiguous apparent motion depicted in the “higher-order motion” part of Figure 3.4. If those two frames are alternated, one can see apparent motion clockwise or counter-clockwise, as both interpretations are equally viable. You may even be able to choose to see the figure rotate clockwise or counter-clockwise, even while keeping your eyes fixed (Wertheimer 1912). Verstraten, Cavanagh, and Labianca (2000) found that the maximum alternation rate at which this could be done was between 4 and 8 Hz, depending on the participant. These alternation rates, 4 to 8 Hz, are also the rate at which a dot is presented at any given location. In a further experiment, Verstraten, Cavanagh, and Labianca (2000) inserted frames between the two original frames to make the apparent motion unambiguously clockwise or counter-clockwise. They then used a tracking task, where participants had to follow with their attention one target disc as it stepped about the circular trajectory. None of their participants were able to do this when the flicker rate at an individual location exceeded 8 Hz. This truly seemed to be a temporal limit rather than a speed limit: by varying the number of concurrently-presented discs and the number of intervening steps, the speed about the circle was varied, but what mattered most was the rate at which a disc appeared at a location, that is, the temporal frequency.

Importantly, these findings are not specific to jumpy apparent motion displays. Verstraten, Cavanagh, and Labianca (2000) found a similar result with continuous motion of a grating, where temporal frequency is how often a bright (or dark) bar of the grating traverses any one location. Specifically, they used a circular sine-wave grating presented in an annulus. Participants fixated in the center, attempted to covertly track one light bar of the grating that was cued at the beginning of the trial, and performance fell to 75% correct when the time between successive light bars of the grating was shorter than about 150 milliseconds (6.7 Hz) for the best of the three participants and about 238 ms (4 Hz) for the worst of the three.

Alex O. Holcombe and Chen (2013) found a similar result using discs rather than a grating - with 6 participants, once the discs were moving fast enough that two visited a location within 150 ms (6.6 Hz), tracking performance fell to a similar criterion (halfway to chance) as that used by Verstraten, Cavanagh, and Labianca (2000).

To get a taste of this, view the movie below. When the movie is at its beginning (when the speed readout at top right indicates 0.02 rps), one object in each of the two rings is drawn in white. These are the targets for you to track while you keep your gaze fixed on the dot in the center. As the speed gradually increases, try to keep tracking and see how fast it goes before you lose the targets.

Figure 3.6: Task: fixate the white dot, track the initially-white targets, and note how fast you can track, using the speed in the upper right corner.

Many people can track the targets even at the movie’s fastest speed of approximately 0.6 rps (the exact speed it reaches depends on your computer). This is to be expected, because at 0.6 rps, 3 objects in a ring corresponds to an inter-object interval of 556 milliseconds, far longer than the interval at which the temporal limit is reached. The situation is quite different, however, for the movie below.

Figure 3.7: Fixate on the dot in the center, track the two targets that are initially white, and note the speed at which you are no longer able to track.

This movie uses the same speeds as the previous one. The only difference is that eight distractors are presented in each array instead of two. In this case people find that as the objects accelerate, very quickly they feel that they can no longer track the objects. Note that this is not due to spatial interference - only when the number of equidistant objects in an array exceeds 13 will spatial interference become significant (Alex O. Holcombe and Chen 2013, 11; Toet and Levi 1992; Pelli and Tillman 2008).
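The arithmetic behind the two movies can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the calculation assumes equally spaced objects in each ring and that both movies reach the nominal top speed of 0.6 rps.

```python
def inter_object_interval_ms(speed_rps, n_objects):
    """Time (ms) between successive objects passing one location in a ring.

    Assumes n_objects equally spaced discs sharing one circular trajectory.
    """
    # Objects arriving at any one location per second (temporal frequency).
    temporal_frequency_hz = speed_rps * n_objects
    return 1000.0 / temporal_frequency_hz


# First movie: 1 target + 2 distractors per ring.
# Second movie: 1 target + 8 distractors per ring.
for n_objects in (3, 9):
    interval = inter_object_interval_ms(0.6, n_objects)
    print(f"{n_objects} objects at 0.6 rps: {interval:.0f} ms between objects")
```

Under these assumptions, the interval at top speed shrinks from about 556 ms with three objects per ring to about 185 ms with nine, which is much closer to the temporal limits discussed in this section.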

Another way to think about temporal frequency limits such as this is that they reflect when temporal interference becomes strong due to close encounters in time. That is, just as making a display spatially dense enough will impair tracking due to spatial interference (crowding), a denser display will also mean a higher temporal frequency, with an object and a distractor visiting the same spatial region within a short span of time.

This brings me to how Verstraten, Cavanagh, and Labianca (2000) and Alex O. Holcombe and Chen (2013) established that the limit is a temporal one rather than a speed limit. They relied on the fact that a particular temporal frequency corresponds to different combinations of speed and spatial density, rather than a single speed as one would expect from a speed limit. The space-time diagrams in Figure 3.8 schematize this, using the height of the shaded rectangle to represent the temporal resolution of the tracking processes. At low stimulus speed and density the time between stimuli occupying any one spatial location is long, so tracking succeeds (top panel). When one increases either speed (middle panel) or density (lower panel), the interval between visits to each location decreases.

Figure 3.8: The purple rectangle represents the spatial resolution (width) and temporal resolution (height) of tracking. Top panel: In a space-time diagram, one piece of sushi of a sushi rain is designated as the target. Density and speed are low. Tracking processes are able to select an individual sushi. Middle panel: At medium speed, despite low density, tracking fails because temporal resolution is exceeded. Bottom panel: At low speed, but medium density, tracking fails because temporal resolution is exceeded.

Quantitatively, the rate of stimulation of each location is the product of speed and density. For a circular array, then, temporal frequency equals speed (in revolutions per second) times the number of discs in the circular array. This means that a temporal limit provides a quantitative prediction for the speed and density combinations that will correspond to participants’ thresholds. This prediction was confirmed by the aforementioned studies. Alex O. Holcombe and Chen (2013), for example, used four different inter-object spacings and many different speeds (speeds were adjusted by a staircase) to assess the speed threshold for each object spacing. The thresholds across the different spacings were close to those predicted by a 6.6 Hz limit. In a study discussed in more detail later, Roudaia and Faubert (2017) replicated this finding that thresholds were more consistent with a temporal frequency limit than with a speed limit: they tested two different spacings and found that the corresponding thresholds were more similar when expressed as temporal frequency than as speed.
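As a rough illustration of the prediction just described, the relationship can be inverted to give the speed threshold that a pure temporal frequency limit implies for each array size. In the sketch below the function names are hypothetical and the object counts (6, 9, and 12) are chosen purely for illustration; they are not the spacings used in the studies cited.

```python
def temporal_frequency_hz(speed_rps, n_objects):
    """Rate at which objects visit any one location in a circular array."""
    return speed_rps * n_objects


def predicted_speed_threshold_rps(temporal_limit_hz, n_objects):
    """Speed at which the array reaches the temporal limit.

    Assumes the temporal limit is the only constraint on tracking.
    """
    return temporal_limit_hz / n_objects


# Illustrative array sizes only (not the conditions of any cited study).
for n_objects in (6, 9, 12):
    threshold = predicted_speed_threshold_rps(6.6, n_objects)
    print(f"{n_objects} objects: predicted threshold {threshold:.2f} rps")
```

If a temporal limit is what constrains performance, measured speed thresholds should fall as the number of objects rises while the product of threshold and object count stays roughly constant. A pure speed limit would instead predict the same speed threshold at every density.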

The figure below summarises what we know about the limits on covertly tracking a single target.

Figure 3.9: Spatial and temporal limits on covertly tracking one object.

Based on the studies to date, the temporal frequency limit seems to vary substantially between different participants, but 7 Hz is near the top of the range and is used for Figure 3.9. For a circular array, 7 Hz corresponds to lower and lower speeds when more and more distractors are in the array. Note that these speeds are far, far below those that correspond to the limit on motion perception documented for drifting gratings - 25 Hz (Burr and Ross 1982). Spatial crowding likely imposes another limit on tracking (Alex O. Holcombe, Chen, and Howe 2014; Intriligator and Cavanagh 2001), at a point far lower than the spatial acuity limit (not shown). Finally, as we will describe in the “Speed limits” section below, an actual speed limit (as opposed to a temporal limit) also seems to constrain tracking. The combination of these limits yields the combinations of speeds and number of distractors in a circular array indicated by the pink region.

These findings suggest an individuation limit: if a stimulus repeats at a particular location within a certain amount of time (about 125 ms if the limit is 8 Hz), the processes responsible for tracking fail. While they never studied tracking, Van de Grind, Grusser, and Lunkenheimer (1973) anticipated this phenomenon to some extent. They coined the term “Gestalt fusion” to refer to how the two phases of a flickering light were perceived as a single thing when the flicker rate was above approximately 7 Hz. If this is an individuation limit, it might have broader consequences than simply limiting tracking. It might be the reason for some or all of the other slow temporal limits reviewed in Section 3.3 above. Before considering that in detail, however, an important property that we have not yet discussed is how resource-intensive the temporal limit on tracking is.

3.5 Temporal interference is highly resource-intensive

In addition to replicating the finding of Verstraten, Cavanagh, and Labianca (2000) of an approximately 7 Hz limit on attentional tracking, Alex O. Holcombe and Chen (2013) also investigated the limits on tracking with two targets and with three targets. They found that the temporal limit was markedly worse for higher target loads. Specifically, when tracking one target, about 143 ms had to elapse between a target and a distractor visiting a location (7 Hz), whereas for two targets the threshold was about 238 ms, and for three targets it was about 385 ms. This dramatic effect of target load on temporal limit was replicated by Roudaia and Faubert (2017), who also replicated the finding that these thresholds were more consistent with a temporal frequency limit than with a speed limit.
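For readers who prefer to think in terms of frequency, the interval thresholds above can be converted to temporal frequencies by taking their reciprocal. The short sketch below does only that conversion; the function and variable names are hypothetical, and the intervals are the approximate values quoted in the text.

```python
def interval_ms_to_hz(interval_ms):
    """Convert a minimum interval between visits (ms) to a temporal frequency (Hz)."""
    return 1000.0 / interval_ms


# Approximate threshold intervals quoted above, keyed by number of targets.
threshold_interval_ms = {1: 143, 2: 238, 3: 385}

for n_targets, interval in threshold_interval_ms.items():
    limit_hz = interval_ms_to_hz(interval)
    print(f"{n_targets} target(s): {interval} ms is about {limit_hz:.1f} Hz")
```

Expressed this way, the limit falls from about 7 Hz with one target to roughly 4.2 Hz with two and 2.6 Hz with three.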

The effect of target load on temporal limit observed by Roudaia and Faubert (2017) was remarkably similar in size to the large effect found by Alex O. Holcombe and Chen (2013). The eight participants tested by Alex O. Holcombe and Chen (2013) were all relatively young men, apart from two young women. Roudaia and Faubert (2017) tested both old and young participants, and reported a statistically significant difference between men and women. Among their young group, however, they tested only nine young men and nine young women, which for most known gender differences would provide low statistical power, so the finding of a gender difference should be considered provisional.

Figure 3.10: The results of the two experiments of Holcombe & Chen (2013) plotted with the comparable sample (young people) of Roudaia & Faubert (2017). The data symbols are horizontally offset to avoid overlap.

In each of the four datasets plotted (and also in the data from the old participants excluded because of outliers), the temporal limit decreases dramatically with increasing target load. Of all the effects of increasing target load that we have discussed, this one may be the largest.

Attentional tracking is a complex task - in A DELETED CHAPTER, six factors likely to affect tracking performance were listed. However, some of these factors might affect practically any task; it is those that are most resource-intensive, and thus most limit our capacity, that should be most illuminating for understanding tracking processes.

From the rather unconstrained trajectories used in most MOT studies, researchers were unable to make strong inferences about how target load was adversely affecting performance. For example, it was necessary to carefully control the distances between objects to disconfirm the suggestion that spatial interference even beyond the crowding range was the primary determinant of the effect of tracking load (Alex O. Holcombe, Chen, and Howe 2014; Alex O. Holcombe 2019).

For temporal interference, the evidence is strong that it is dramatically increased by target load. This leads to two important questions. The first is: what does this effect tell us about how tracking works? Discussion of this is deferred to section 4, but in short, it supports serial switching theories of tracking.

The second question is: what role does temporal interference play in typical MOT displays that use more linear trajectories? Unfortunately, no studies appear to have examined this for temporal interference. Moreover, some of the evidence from studies that set out to investigate the role of spatial proximity might alternatively be explained by temporal proximity (e.g., Bae and Flombaum 2012), because in typical MOT displays, spatial proximity is likely to be highly correlated with temporal proximity.

3.6 Relation to other temporal limits

A plausible interpretation of the temporal limit on tracking is that it is an attentional selection individuation limit. Above the limit, stimuli cannot be individually selected by attention for processing by higher-level, limited-capacity processes such as cognition. This could prevent successful performance of many, or all, of the tasks in the slow group of Figure 3.4. For example, to correctly identify that red is paired with leftward tilt in Figure 3.1, both the color and the orientation have to be identified. If attention is unable to select an individual color and orientation and instead has access only to two or more successive frames, then from the perspective of higher-level processes, both colors and both orientations were essentially presented simultaneously. This is illustrated in Figure 3.11.

Figure 3.11: A rapidly alternating color-orientation pairing stimulus (top) is processed first by high temporal resolution feature processors, which independently determine the color and orientation. Subsequently the pairing of the two features is determined by a process that, because it is low temporal resolution, unfortunately ‘sees’ multiple colors and orientations simultaneously.

While the temporal limit on binding for this and related tasks is less than 3 Hz (Alex O. Holcombe and Cavanagh 2001; D. H. Arnold 2005), the tracking limit for one target is significantly higher, close to 7 Hz (Figure 3.10). The reason for this may be that while both tasks require individuation, the binding tasks require additional processing such as labeling of the features (Alex O. Holcombe and Cavanagh 2001; Fujisaki and Nishida 2010). This theory is consistent with the evidence reviewed in a chapter of my Cambridge Element that encoding of features (such as color or orientation) does not occur in basic tracking. The successive locations of a tracked object must be paired, but this may be faster than other forms of binding, both because feature encoding is not required and possibly because it may piggy-back on motion perception.

Marinovic, Pearce, and Arnold (2013) investigated the role of motion mechanisms with an adaptation experiment in which participants were exposed to a prolonged period of either slow or fast motion. Exposure to slow motion decreased the participants’ maximum tracking speed, while exposure to fast motion actually increased it. Note that this is opposite in direction to what one might expect from other instances of sensory adaptation, wherein adaptation to high spatial frequency, for example, reduces the maximum spatial frequency one can perceive, which is thought to be because the underlying mechanism becomes less responsive.

Marinovic, Pearce, and Arnold (2013) theorized that this surprising effect reflected changes in the relative contribution of the two types of spatiotemporal filters thought to mediate speed perception. Specifically, they suggested that the low-pass filters are needed to individuate a target from the distractors because, they said, only the low-pass filters have small enough receptive fields. On that basis, adaptation to slow motion reduces the temporal limit on tracking. They further proposed that adapting to fast motion adapts the band-pass temporal frequency filters that, being large, tend to group the target with the distractors, so reducing their responsiveness is beneficial. This account seems plausible for the stimulus they used, in which the moving discs were quite close to each other because twelve of them shared the circular trajectory. I am not sure whether the same could be said if there were only six discs in the trajectory; unfortunately, Marinovic, Pearce, and Arnold (2013) did not test this. They also did not test whether the speed limit, as opposed to the temporal limit, was changed by adaptation; this could be investigated by using two or three discs in the circular trajectory. The finding of Marinovic, Pearce, and Arnold (2013) is an important one, as further investigation seems likely to provide additional insights.

3.7 Speed limits

We have seen that the apparent speed limits on tracking in dense displays may actually be caused by temporal limits. This is because when there are a lot of objects in a display, at high object speeds both targets and distractors may occupy a particular location within three or four hundred milliseconds, which can impair or prevent tracking. Temporal interference is less of an issue when the targets and distractors are kept very far apart from each other.

When using a circular trajectory with just one target and one distractor on opposite sides, such that the distractor did not replace the target very quickly, Verstraten, Cavanagh, and Labianca (2000) found evidence that tracking was truly limited by speed rather than by temporal interference. The evidence was that the maximum speed at which participants could track was much lower than what was predicted by the 7 Hz limit found when several distractors shared the circular trajectory with the target. With one target and one distractor, the 7 Hz limit should result in a speed threshold of 3.5 revolutions per second. Instead, participants’ thresholds were on average less than 2 revolutions per second. My lab found a very similar result (Alex O. Holcombe and Chen 2013). When we tested with 5, 8, or 12 distractors, the speed limit was close to that predicted by a 7 Hz temporal limit. But when there were only 2 distractors in the array, the average speed limit was only 1.7 rps, rather than 3.5 rps predicted by a 7 Hz limit.

It seems, then, that tracking of a single object is limited both by a speed limit of about 2 revolutions per second and by a temporal limit of about 7 Hz. The computer screens that were used for testing had refresh rates of 160 Hz, and when the speed of an object is very fast, one can see gaps between the successive frames, which conceivably could be contributing to the speed limit. However, when Wei-Ying Chen and I used a mechanical device rather than an intermittent computer display, we found a speed limit that was only slightly faster, about 2.3 rps (A. O. Holcombe and Chen, W-Y 2020).
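One simple way to picture how the two limits might jointly constrain tracking of a single target is to take whichever limit is reached first. The sketch below does this for a circular array. It is only an illustration under stated assumptions: the function name is hypothetical, the round values of 2 rps and 7 Hz are the approximate figures quoted above, and combining the limits with a hard minimum is a simplifying assumption rather than something established by the studies.

```python
def max_trackable_speed_rps(n_objects, speed_limit_rps=2.0, temporal_limit_hz=7.0):
    """Highest trackable speed for one target among n_objects in a ring.

    Assumption: tracking fails as soon as either the angular speed limit
    or the temporal frequency limit is exceeded (a hard minimum).
    """
    speed_at_temporal_limit = temporal_limit_hz / n_objects
    return min(speed_limit_rps, speed_at_temporal_limit)


for n_objects in (2, 3, 4, 7, 14):
    limit = max_trackable_speed_rps(n_objects)
    binding = "speed" if limit == 2.0 else "temporal"
    print(f"{n_objects} objects: {limit:.2f} rps ({binding} limit binds)")
```

With these illustrative values, the speed limit is the binding constraint for sparse arrays (two or three objects), and the temporal limit takes over once four or more objects share the trajectory.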

Changing the radius of the circular trajectory in experiments like these yielded a truly remarkable finding. Verstraten, Cavanagh, and Labianca (2000) mentioned that for their experiments where participants had to track one bar of a circular 2-cycle grating, they also informally tested annular gratings of different sizes. When the grating was larger, the length of the path traveled by the bar when making one revolution was, of course, longer. One would therefore expect that the speed threshold, when expressed in revolutions per second, should decrease in inverse proportion to the radius of the grating. Consider that a grating of radius 2 deg has twice the circumference of a 1 deg radius grating, so the revolutions per second should be halved in order for a bar to travel the same distance in the same amount of time. However, Verstraten, Cavanagh, and Labianca (2000) said that they saw no change in the speed limit in terms of rps. That is, participants could track an object moving twice as fast, in linear terms, when the trajectory had twice the radius.

Wei-Ying Chen and I also found that the speed threshold, when expressed in revolutions per second, was robust to the increases in length of the trajectory associated with larger radii (Alex O. Holcombe and Chen 2013; Alex O. Holcombe and Chen 2012), despite the substantial increase in speed in terms of linear distance traveled. We will therefore refer to the speed limit as an angular speed limit.

What does this mean for the processes that underlie tracking? Conceivably, the limit could be imposed by the cortical distance traveled, as the amount of retinotopic cortex per deg of visual angle may diminish linearly with eccentricity. However, for the limit to stay close to constant, the scaling constant would have to equal one, and empirically this does not seem to be the case, based on both psychophysical and physiological measures (Strasburger, Rentschler, and Jüttner 2011). Another possibility is what we might call the costly hemifield-crossing theory. Doubling the revolutions per second doubles the rate of crossing the vertical meridian, and crossing that meridian is known to impair tracking (Strong and Alvarez 2020). If so, a faster limit should be found for trajectories that do not cross the vertical meridian, but in an unpublished experiment (A. O. Holcombe and Chen 2020), this was not found. Finally, Verstraten, Cavanagh, and Labianca (2000) suggested that the limit may coincide with that found for mental rotation of objects (Cooper 1976), so possibly the same processes limit both.

Other than Verstraten, Cavanagh, and Labianca (2000) and myself and Chen, no tracking researchers appear to have grappled with the angular speed limit, or mentioned it in any published papers. Instead, MOT researchers continue to write as if tracking is limited by linear distance traveled per unit time, not by revolutions per second or by temporal interference. Admittedly, the speed limit may have little effect in conventional MOT displays with linear trajectories, because for the speeds tested in all or practically all such experiments, objects probably take longer than a second to complete a full revolution around a point in the display. However, many recent papers use circular trajectories, where the angular speed limit may come into play, although again they tend to use speeds slower than 1 rps (Carlson, Alvarez, and Cavanagh 2007).

Is the angular speed limit caused by a more structural limitation, what Norman and Bobrow (1975) called data-limited, or is it resource-intensive like the temporal frequency limit? If it is resource-intensive, the speed limit should be lower when more targets are tracked, as that would leave less resource available per target. Alex O. Holcombe and Chen (2013) did document such a decline in speed thresholds, from 1.7 rps with one target to 1.2 rps with two targets and 0.8 rps with three targets. Unfortunately, it is difficult to discern whether this decline reflects a reduction in the speed limit itself or was instead caused by the previously documented decline in the temporal frequency limit. The problem is that the temporal frequency limit with two targets (approximately 4 Hz) corresponds to a speed, with three objects in a trajectory, below the one-target speed limit, so one cannot tell whether the speed threshold reflects a decline in both the temporal limit and the speed limit or in the temporal limit alone.
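The confound can be illustrated with a small Python sketch. It rests on a simplifying assumption that is not established by the studies cited: that the measured speed threshold is simply the lower of the angular speed limit and the speed implied by the temporal frequency limit. The function name is hypothetical; the numerical values are those quoted in the text.

```python
def binding_limit(speed_limit_rps, temporal_limit_hz, n_objects):
    """Return which limit would determine the speed threshold, and the
    threshold in revolutions per second.

    Assumption (for illustration only): the threshold is the lower of the
    angular speed limit and the temporal limit converted to rps for
    n_objects evenly spaced on one circular trajectory.
    """
    rps_from_temporal = temporal_limit_hz / n_objects
    if rps_from_temporal < speed_limit_rps:
        return "temporal", rps_from_temporal
    return "speed", speed_limit_rps


# Two targets: temporal limit of about 4 Hz, three objects per trajectory,
# compared against the one-target speed limit of 1.7 rps.
kind, threshold = binding_limit(1.7, 4.0, 3)
print(kind, round(threshold, 2))  # temporal 1.33
```

Because the temporal limit alone already predicts a threshold below 1.7 rps here, a lower measured threshold with two targets does not by itself show that the speed limit has declined.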

Using fMRI, Shim et al. (2010) investigated the brain areas associated with multiple object tracking, and varied objects’ speeds. If higher speeds consumed more of the tracking resource, one might expect that higher speeds would increase the activation in the same areas that increase in activation with more targets. Shim et al. (2010) identified a parietal region that increased in activation with the number of targets. However, there was no increase in activation of any parietal area with target speed. In a whole-brain analysis, however, Shim et al. (2010) did find some scattered voxels whose activity increased with speed, including the frontal eye field, which is often associated with attentional tasks. Piers D. Howe et al. (2009) also found strong FEF activation in MOT. They did not vary speed, but did compare tracking moving targets to monitoring targets that were stationary, and found that that comparison also yielded strong FEF activation. They suggested that given the FEF’s involvement in eye movements, the activation might reflect suppression of eye movements. Although they did not cite any evidence that suppression of eye movements is more demanding for tracked moving targets than for monitored stationary targets, that seems highly plausible.

3.8 Putting it all together

The temporal interference and associated temporal limits on tracking are clearly highly resource-intensive. Tracking three targets rather than one almost triples the severity of the temporal limit. We will discuss the theoretical implications in a subsequent section. Tracking is also constrained by a speed limit, but we do not yet know whether the speed limit decreases with more targets or is instead a fixed limit. We also do not understand the nature of the speed limit: whether it is truly a rotational limit, and how it relates to other mental processes.

Figure 3.12: Spatial and temporal limits on covertly tracking one, two, and three targets.


Arnold, D H. 2005. “Perceptual Pairing of Colour and Motion.” Vision Research 45 (24): 3015–26.
Bae, Gi Yeul, and Jonathan I Flombaum. 2012. “Close Encounters of the Distracting Kind: Identifying the Cause of Visual Tracking Errors.” Attention, Perception & Psychophysics 74 (4): 703–15.
Burr, D C, and J Ross. 1982. “Contrast Sensitivity at High Velocities.” Vision Research 22 (4): 479–84.
Carlson, Thomas, George Alvarez, and Patrick Cavanagh. 2007. “Quadrantic Deficit Reveals Anatomical Constraints on Selection.” Proceedings of the National Academy of Sciences of the United States of America 104 (33): 13496–500.
Clifford, Colin W G, Alex O Holcombe, and Joel Pearson. 2004. “Rapid Global Form Binding with Loss of Associated Colors.” Journal of Vision 4 (12): 1090–1101.
Cooper, Lynne A. 1976. “Mental Transformations and Visual Comparison Processes: Effects of Complexity and Similarity.” Journal of Experimental Psychology. Human Perception and Performance 2 (4): 503–14.
Fujisaki, Waka, and Shin’ya Nishida. 2010. “A Common Perceptual Temporal Limit of Binding Synchronous Inputs Across Different Sensory Attributes and Modalities.” Proceedings of the Royal Society B: Biological Sciences 277 (1692): 2281–90.
Holcombe, A O. 2009. “Seeing Slow and Seeing Fast: Two Limits on Perception.” Trends in Cognitive Sciences 13 (5): 216–21.
Holcombe, A O, and W-Y Chen. 2020. “The Speed Limit on Attentional Tracking.”
Holcombe, A O, and J Judson. 2007. “Visual Binding of English and Chinese Word Parts Is Limited to Low Temporal Frequencies.” Perception 36 (1): 49–74.
Holcombe, Alex O. 2019. “Comment: Capacity Limits Are Caused by a Finite Resource, Not Spatial Competition.” PsyArXiv.
Holcombe, Alex O., and Patrick Cavanagh. 2001. “Early Binding of Feature Pairs for Visual Perception.” Nature Neuroscience 4 (2): 127–28.
Holcombe, Alex O., and Wei-Ying Chen. 2012. “Exhausting Attentional Tracking Resources with a Single Fast-Moving Object.” Cognition 123 (2).
Holcombe, Alex O, and Wei-Ying Chen. 2013. “Splitting Attention Reduces Temporal Resolution from 7 Hz for Tracking One Object to <3 Hz When Tracking Three.” Journal of Vision 13 (1): 1–19.
Holcombe, Alex O, W Chen, and Piers D L Howe. 2014. “Object Tracking: Absence of Long-Range Spatial Interference Supports Resource Theories.” Journal of Vision 14 (6): 1–21.
Howe, Piers D, Todd S Horowitz, Jeremy Wolfe, and Margaret S Livingstone. 2009. “Using fMRI to Distinguish Components of the Multiple Object Tracking Task.” Journal of Vision 9 (4): 1–11.
Intriligator, J, and P Cavanagh. 2001. “The Spatial Resolution of Visual Attention.” Cognitive Psychology 43 (3): 171–216.
Marinovic, Welber, Samuel L Pearce, and Derek H Arnold. 2013. “Attentional-Tracking Acuity Is Modulated by Illusory Changes in Perceived Speed.” Psychological Science 24 (2): 174–80.
Maruya, Kazushi, Alex O Holcombe, and Shinya Nishida. 2013. “Rapid Encoding of Relationships Between Spatially Remote Motion Signals.” Journal of Vision 13 (4): 1–20.
Morgan, M. J., and E. Castet. 1995. “Stereoscopic Depth Perception at High Velocities.” Nature 378 (6555): 380–83.
Norman, D A, and D G Bobrow. 1975. “On Data-Limited and Resource-Limited Processes.” Cognitive Psychology 7: 44–64.
Pelli, Denis G., and Katharine A. Tillman. 2008. “The Uncrowded Window of Object Recognition.” Nature Neuroscience 11 (10): 1129–35.
Rogers-Ramachandran, D C, and V S Ramachandran. 1998. “Psychophysical Evidence for Boundary and Surface Systems in Human Vision.” Vision Research 38 (1): 71–77.
Roudaia, Eugenie, and Jocelyn Faubert. 2017. “Different Effects of Aging and Gender on the Temporal Resolution in Attentional Tracking.” Journal of Vision 17 (11): 1.
Shim, Won Mok, G A Alvarez, T J Vickery, and Y V Jiang. 2010. “The Number of Attentional Foci and Their Precision Are Dissociated in the Posterior Parietal Cortex.” Cerebral Cortex 20 (6): 1341–49.
Smith, A. Mark. 1996. “Ptolemy’s Theory of Visual Perception: An English Translation of the "Optics" with Introduction and Commentary.” Transactions of the American Philosophical Society 86 (2): iii.
Strasburger, Hans, I Rentschler, and Martin Jüttner. 2011. “Peripheral Vision and Pattern Recognition: A Review.” Journal of Vision 11: 1–82.
Strong, Roger W., and George A. Alvarez. 2020. “Hemifield-Specific Control of Spatial Attention and Working Memory: Evidence from Hemifield Crossover Costs.” Journal of Vision 20 (8): 24.
Toet, A, and D M Levi. 1992. “The Two-Dimensional Shape of Spatial Interaction Zones in the Parafovea.” Vision Research 32 (7): 1349–57.
Van de Grind, W A, O J Grusser, and H U Lunkenheimer. 1973. “Temporal Transfer Properties of the Afferent Visual System.” In Handbook of Sensory Physiology, Central Vision Information, A, edited by R Jung, 431–573. Springer.
Verstraten, F A J, P Cavanagh, and A Labianca. 2000. “Limits of Attentive Tracking Reveal Temporal Properties of Attention.” Vision Research 40 (26): 3651–64.
von Segner, Joan Andreas. 1740. De Raritate Luminis. Göttingen: A. Vandenhoeck.
Werkhoven, P, H P Snippe, and A Toet. 1992. “Visual Processing of Optic Acceleration.” Vision Research 32 (12): 2313–29.
Wertheimer, Max. 1912. “Experimentelle Studien Über Das Sehen von Bewegung.” Zeitschrift Für Psychologie 61: 161–65.