Sound design and the perception of time

[A nonsense image generated by ruDALL-E with the text query “Game audio and perception of time”]

We often say that humans are visual creatures. The Colavita visual dominance effect demonstrates that visually abled people are strongly biased towards visual information. When they see an image and hear a sound, they may pay so much more attention to the visuals that they entirely neglect the audio. There are known phenomena where visual information overrides auditory input, such as the famous McGurk effect. But visual dominance is not universal. There are contexts and situations where vision becomes less efficient as our primary sense, and we start relying on other modalities. One such context is the subjective perception of time.

Disclaimer: When reading my blog, you may get a false impression that I know something about cognitive psychology and other research fields I typically refer to. I have no expertise in those. I am a practicing sound designer, curious enough to read a couple of research papers every now and then. I check my sources and mostly mention things that make sense based on my professional experience, but I lack the competence and supervision to make scientifically accurate statements. Keep in mind that most studies I refer to were done outside of the video game context, and I did not conduct any experiments to confirm my hypotheses. In other words, prepare your grains of salt!

There is a lot of evidence that audition dominates vision in temporal processing. Experiments show that we judge the duration of audiovisual events based on the duration of the auditory, not the visual, component. The effect of sound modulating the perceived duration of a visual stimulus is often called temporal ventriloquism, as opposed to the classic ventriloquist effect. The ventriloquist effect is the reason why we perceive movie characters’ speech as if it is coming from the TV screen itself, not from our speakers. In this case, our vision “captures” sound, influencing our judgment of its spatial location. Temporal ventriloquism is the reverse effect, happening in the time domain.

My posts are usually oversimplified but pragmatic explanations of complex perceptual mechanisms tailored to the game development context. This one is no different, so here is my take on the phenomenon. Whenever we perceive audiovisual information, we mostly rely on visual cues to understand how things exist and behave in space, but we prefer sound to understand how they exist and behave in time. I’m not saying we don’t perceive time visually; rather, we use auditory information as the clock or the source of truth whenever we get somewhat conflicting inputs on both sensory channels. Given that hearing is faster than sight (more on this in an upcoming separate post), it is not surprising that the faster sense delivers more reliable input data to inform us about time.

[A nonsense image generated by ruDALL-E with the text query “Temporal ventriloquism”]

How can we use this knowledge in our day-to-day work? I see three levels of practical application. On the smallest scale, we can use individual sound effects to alter the visual events on the screen, making them subjectively faster, smoother, snappier, etc. On a medium scale, rhythmic sound patterns become helpful in efficiently communicating the gameplay dynamics or timing of individual events. Finally, on the largest scale, we can alter the soundtrack or soundscape to make the player feel that time passes subjectively faster or slower.

Sound effects and visual motion

A well-timed sound may affect the perception of visual events. One famous example is the so-called Double Flash Illusion, where sound makes some people see two rapid flashes instead of one. A lesser-known but more fascinating example is the Motion-Bounce Illusion, which demonstrates how sound can alter visual motion perception and, in a way, completely change the meaning of a visual event. Those illusions are not particularly useful in the game development context, but they show how strongly audio can modulate what we see.

Thanks to auditory dominance, we can intentionally break audiovisual synchrony to separate simultaneous events in time and better communicate their quantity. I made a short example video to demonstrate this:

In the video, the arrows reach both targets at the same time. Intuitively, we want to synchronize sound with visuals and play them simultaneously, as in the beginning. However, without a proper separation in time, the two sounds blend into one, which feels odd: such a coincidence is unlikely in real life. But notice what happens when I start incrementally delaying the sound associated with the right target, 50 milliseconds at a time. Both 50 and 100 ms delays feel natural and believable. Some of you may even perceive the second hit as visually delayed (thanks, temporal ventriloquism!). At 150 ms, the delay is noticeable but still acceptable. And only at 200 ms does the lack of synchronization become apparent, which aligns with the thresholds mentioned in ITU-R BT.1359-1.
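If you want to play with this offset yourself, here is a minimal sketch of the setup using the Web Audio API (not the tool I used for the video). Buffer loading is omitted, and the helper names and parameters are my own illustrative choices:

```typescript
// Minimal sketch: two impact sounds separated by a configurable offset,
// scheduled on the Web Audio clock for sample-accurate timing.
const ctx = new AudioContext();

function playBuffer(buffer: AudioBuffer, when: number): void {
  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.connect(ctx.destination);
  source.start(when);
}

// Both arrows hit visually at the same moment; only the audio is staggered.
// offsetMs = 0, 50, 100, 150, 200 reproduces the steps described above.
function playDoubleHit(impactBuffer: AudioBuffer, offsetMs: number): void {
  const now = ctx.currentTime;
  playBuffer(impactBuffer, now);                   // left target
  playBuffer(impactBuffer, now + offsetMs / 1000); // right target, delayed
}
```

Scheduling via `source.start(when)` runs on the audio hardware clock, so the offsets stay precise regardless of what the main thread is doing.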

This trick is helpful for perceptually clarifying a visually cluttered scene with many simultaneous events. But there are other practical applications. One study shows that sound effects can influence the perceived smoothness of rendered animations: variations in motion smoothness at lower framerates became more apparent to the audience when the animations were presented without sound. On top of that, it is not uncommon to observe a sharp, snappy sound making a visual movement appear faster than it would with no audio cue.

Note: I do not advocate using audio to compensate for visual shortcomings. The proper way to solve the problem above would be to separate the events visually in the first place. Any lack of audiovisual congruity decreases perceptual fluency, potentially adding to the cognitive load the player experiences. But these tricks could be helpful when you are desperately short on resources or want to experiment with different feels.

Keep in mind that these effects only appear on a relatively short time scale, within a limited range of asynchrony. If audio and visuals are noticeably separated in time, they appear as two different messages, disconnected from each other. ITU-R BT.1359-1 recommends specific thresholds of audiovisual desynchronization in broadcasting: detectability thresholds of +45/-125 ms and acceptability thresholds of +90/-185 ms, where a positive value means that sound precedes the visuals. Given the interactive nature of our medium, I’d stick to even smaller ranges of detectability and acceptability to be safe.
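To make those numbers concrete, here is a hypothetical helper that classifies an audio-video offset against the broadcast thresholds above; the function and constant names are mine, and the standard itself obviously does not ship any code:

```typescript
// Sign convention from ITU-R BT.1359-1: positive = sound precedes visuals.
type SyncRating = "undetectable" | "detectable" | "unacceptable";

// Broadcast thresholds in milliseconds (hypothetical constant names).
const DETECTABILITY = { lead: 45, lag: -125 };
const ACCEPTABILITY = { lead: 90, lag: -185 };

function rateAvOffset(offsetMs: number): SyncRating {
  if (offsetMs <= DETECTABILITY.lead && offsetMs >= DETECTABILITY.lag) {
    return "undetectable";
  }
  if (offsetMs <= ACCEPTABILITY.lead && offsetMs >= ACCEPTABILITY.lag) {
    return "detectable";
  }
  return "unacceptable"; // noticeably out of sync for most viewers
}
```

For an interactive game, you might shrink these constants, as suggested above.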

Rhythmic patterns

Remember those countless YouTube videos of funny animals dancing to music? Watch them carefully, and you realize that the animal’s movement doesn’t usually match the music’s rhythm that well. Your brain adjusts your perception of the movement based on the song’s rhythmic structure, tricking you into thinking they fit, even when they are out of sync. Most humans are pretty bad at visually analyzing rhythmic sequences. If you want to find out how bad you are, check the video demo on this page. Unless you have a kind of synesthesia that allows you to “auralize” visual rhythms in your head, you will have difficulty differentiating between the pairs of flash sequences.

From the game development perspective, this means that audio becomes the primary information channel for communicating rhythmic patterns. This is obvious in rhythm- and music-based games but easy to overlook in cases where understanding a rhythm could help the player win a challenging fight or time their jumps in a platforming sequence. Of course, you don’t want to turn every game with repetitive event sequences into a rhythm game, but luckily you don’t have to. An accurate, synchronized sonic representation of in-game events is usually enough to guide the player. It is easy to grasp this idea if you carefully listen to any popular fighting game and notice how rhythmic the character moves are (not necessarily in the musical sense) and how sound helps you understand those rhythms.

You may argue that intentionally omitting the auditory component of a rhythmic action could add to the challenge. I think this is a valid point, but please remember that dealing with the shortcomings of our sensory systems is rarely a fun challenge. So, I’d strongly recommend carefully evaluating such design decisions in context.

Auditory dominance is also why many sound designers seek framerate-independent implementations of gunfire sounds in shooter games. The player may not notice when the game skips a frame or two, but any deviation in a steady auditory rhythm becomes too obvious to ignore. Check this video about the weapon audio of Borderlands 3 if you want to hear an example.
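For the curious, a common way to achieve framerate independence is to schedule upcoming shots on the audio clock slightly ahead of time instead of triggering one shot per rendered frame. Here is a minimal sketch of that lookahead pattern using the Web Audio API; the fire rate, lookahead window, and function names are arbitrary assumptions, not how any particular game does it:

```typescript
// Minimal sketch of framerate-independent gunfire scheduling.
// Shots are queued on the audio clock ahead of time, so a dropped
// game frame cannot disturb the auditory rhythm.
const ctx = new AudioContext();

const FIRE_INTERVAL = 60 / 600; // 600 rounds per minute, in seconds
const LOOKAHEAD = 0.1;          // schedule up to 100 ms into the future

let nextShotTime = ctx.currentTime;

// Call this from the game loop every frame.
function updateGunfire(triggerHeld: boolean, shotBuffer: AudioBuffer): void {
  if (!triggerHeld) {
    // Keep the minimum interval if the trigger is released and re-pressed.
    nextShotTime = Math.max(nextShotTime, ctx.currentTime);
    return;
  }
  // Queue every shot that falls inside the lookahead window. Even if the
  // next frame arrives late, these shots will still fire on time.
  while (nextShotTime < ctx.currentTime + LOOKAHEAD) {
    const source = ctx.createBufferSource();
    source.buffer = shotBuffer;
    source.connect(ctx.destination);
    source.start(nextShotTime); // sample-accurate start on the audio clock
    nextShotTime += FIRE_INTERVAL;
  }
}
```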

Music and temporal judgments

Although thematically connected to the other effects I describe in this post, long-interval judgments, at least to my knowledge, have nothing to do with temporal ventriloquism. But given the vast amount of research on the effects of music on the perception of time, I thought I should mention the topic in this post.

[A nonsense image generated by ruDALL-E with the text query “Music and chronoception”]

First, there is strong evidence that the mere presence of music leads to time overestimation in an audiovisual context. It means that whenever any music plays while we experience something audiovisually, we think the experience lasted longer than it did. Plot twist: there is also evidence that the mere presence of music causes people to underestimate time! And both sides tend to agree that the mere presence of music leads to less accurate time estimations than an absence of music.

A large-scale study by Ansani and colleagues links the overestimation with arousal: the more intense the music, the more people overestimate time. The authors particularly highlight tempo and musical complexity as factors increasing arousal and thus influencing the perception of time. And there is no shortage of studies that support those ideas. It also makes perfect sense to me: whenever we are aroused, time subjectively slows down so we can react to whatever happens around us.

Another study shows that adding music to a game causes the players to underestimate experienced (but not remembered) time. Researchers asked one group of test subjects to keep track of time while playing, while the other group was asked to evaluate the duration of play after the experiment. Only members of the first group significantly underestimated time when playing with music.

And as a cherry on top, a study on racing games demonstrated that players overestimated time when they selected the music themselves and underestimated time when others chose the music for them. It parallels the notion that people spend less time shopping when familiar music plays in the background. More interestingly, the racing game study found that arousing music makes people report shorter periods of time, not longer ones as the arousal explanation above would predict.

So, what the hell is going on? Overall, the research on music and the perception of time is a rabbit hole, and I could spend months investigating it. I did not, so my takeaways are probably not very profound, if not just lame. One logical explanation for the contradiction would be that the influence is bidirectional: some musical features cause overestimation, while others result in underestimation. But to my knowledge, there is no clear, non-conflicting evidence supporting this idea.

There is evidence that perception of time shifts depending on whether people like the music or not. In an audiovisual context, overestimation likely happens when music is congruent with the experience. Unpleasantness and incongruity result in underestimation. If this is true, we can expect overestimation in most game-related contexts: games usually have congruent music that supports the experience even when the music itself is unpleasant to hear. Both studies on games linked above showed underestimation, but none of them used pieces explicitly authored to support the gameplay experience, so people likely perceived the music as incongruent.

Ansani and colleagues propose an alternative explanation that aligns with what I see in the studies. In most cases of underestimation, people were consciously aware of time passing by, either waiting for something to happen or knowing that somebody would ask them to estimate the time spent. On the contrary, in most cases of overestimation, people did not track time and evaluated it retrospectively. So, music may have opposite effects on prospective and retrospective judgments of time. When people are aware of time, music can be a distractor that drags their attention away from monitoring the time flow. When they are not aware of time, it adds to the complexity of the experience, making the brain register more events and use more attention and memory resources, leading to overestimation. The intensity of the music, resulting in higher arousal, could be a modulating factor in both effects.

[A nonsense image generated by ruDALL-E with the text query “Temporal effects of music confuse me”]

Why would we want to shape the player’s perception of time in the first place? Game designers may have a better answer to this question, but I see a few creative applications. For instance, we could alter the soundscape to make certain moments perceptually longer and more memorable. Or we could try to increase the average session length in a free-to-play game. I am especially interested in the audio treatment of low-intensity moments when the players wait for something to happen, such as matchmaking, loading screens, or similar idle periods.

Being familiar with only part of the evidence before writing this post, I thought I’d finish it with a clear recommendation: don’t add any complex custom audio to idle moments in your game, or they will appear to last longer than they actually do. Every bit of my subjective experience and professional intuition screams that this is still true: when we add a custom music track to, say, a loading screen, we make the players consciously aware of the time they need to wait for the game to load and seemingly stretch that time for them. But as the evidence suggests, there could be an opposite effect.

As an individual who writes this blog on weekends, I cannot test this in a proper experiment. But I would be very interested in finding out: knowing how to make idle moments less noticeable, we could tremendously improve the player experience in many games. I’d be happy to discuss this! If you share my interest and know the answer or can find the answer (by experimenting or in any other way), please reach out.

