What is 'Time Scaling' and why do I need it?

Time Scaling (also known as 'Time Stretching', 'Time Compression/Expansion' and 'Time Correction') is the process of changing a sound's length without changing its pitch. When you transpose a sound by playing it back at a different speed, as when slowing down the playback speed of a tape recorder, it plays back at a different tempo but also at a different pitch. While this may be fine when tuning drum loops to match the speed of a recording, it makes pitched sounds - like vocals - sound completely out of tune. What is needed is a process that lets you change the duration and pitch of a recording independently of each other.
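
To make the coupling of speed and pitch concrete, here is a minimal sketch (added for illustration, not part of the original text) in Python. It 'speeds up' a test tone by simply dropping every second sample, the digital equivalent of doubling the tape speed; all numbers are illustrative assumptions.

    import numpy as np

    sample_rate = 44100                       # playback rate in Hz (assumed)
    duration = 2.0                            # original length in seconds
    frequency = 440.0                         # original pitch: A4

    t = np.arange(int(duration * sample_rate)) / sample_rate
    original = np.sin(2 * np.pi * frequency * t)

    # 'Tape-style' speed-up: keep only every second sample. The result lasts
    # half as long, but every frequency in it is doubled (one octave up).
    speed_factor = 2.0
    sped_up = original[::2]

    new_duration = len(sped_up) / sample_rate        # 1.0 s instead of 2.0 s
    pitch_shift = 12 * np.log2(speed_factor)         # +12 semitones, one octave
    print(new_duration, pitch_shift)

A proper Time Scaling process would instead return a file of the new length whose pitch shift is 0 semitones.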

What's wrong with common techniques?

All methods commonly used in today's recording hardware and software are based on strict mathematical models. They do not really 'understand' the sound they process; they just force-fit it to the new tempo. Some of them treat the sound as a sum of simple sine tones, much as a complex building can be made from simply shaped bricks. But what if your sound doesn't have much in common with a sine tone? It is like trying to build a round dome from square stones - you end up cutting and grinding them down to get a curved shell, losing material as dust along the way - or losing the quality of your audio recording. Other methods simply scatter your audio file into thousands of little fragments and put them together again to build a file of a different length (a sketch of this approach follows below). Ever tried to piece a shattered coffee cup back together? The result is never quite right.
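
For illustration, here is a deliberately naive sketch (written for this text, not taken from any product) of the 'fragments' family of methods: the signal is cut into overlapping windowed pieces and laid out again with a different spacing. The function name and parameters are made up for the example.

    import numpy as np

    def naive_fragment_stretch(signal, stretch, frame=2048, hop=512):
        """Toy overlap-add stretcher: cut out overlapping fragments and paste
        them back with a different spacing. Real products add cross-fading and
        waveform alignment, but the basic idea - and its artifacts - are the
        same."""
        window = np.hanning(frame)
        out_hop = int(hop * stretch)                  # new spacing of fragments
        n_frames = max(1, (len(signal) - frame) // hop)
        out = np.zeros(n_frames * out_hop + frame)
        norm = np.zeros_like(out)
        for i in range(n_frames):
            fragment = signal[i * hop : i * hop + frame] * window
            out[i * out_hop : i * out_hop + frame] += fragment
            norm[i * out_hop : i * out_hop + frame] += window
        norm[norm < 1e-8] = 1.0                       # avoid division by zero
        return out / norm

Because neighbouring fragments no longer line up in phase once they are moved apart, this kind of reassembly smears transients and adds a characteristic roughness - the audible equivalent of the glued-together coffee cup.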

Simply put, all Time Scaling processes in use today are lossy by nature. They alter the sound you process in unwanted ways, degrade its quality, and their outcome depends heavily on the source material.

What can be done about it?

Imagine listening to a piece of music, say a string quartet. Can you imagine the four players playing slower, but still at the same pitch? You surely can. Why do you have no problem with this, while your computer fails at the same task? Simply because you understand what is being played, and your computer doesn't. So, if we could find a way to make your computer somehow understand what is being played, we could achieve good Time Scaling.

What is MPEX and why is it better?

Prosoniq's main research interest over the last 8 years has been to find out how our perception works, and how the processes involved can be simulated in computer software. As it turns out, the techniques developed during this research can be successfully applied to common problems, one of which is Time Scaling. By simulating the behaviour of a network of artificial nerve cell models, the computer develops capabilities such as the ability to 'learn' and to 'generalize', i.e. to derive rules from a set of examples. This is still a long way from human sophistication, but far better than anything based on pure maths.

MPEX is an algorithm that simulates some properties of human perception. It makes your computer 'learn' what is being played, much as you can learn a melody someone is whistling. For this, it uses a technique called 'Artificial Neural Networks', a computer simulation of the activity of human nerve cells. Our ear is very good at adapting to and learning what is presented to it, so if we simulate the processes that make our ear work the way it does, we can to some extent reproduce part of its abilities.
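
MPEX itself is proprietary, so the following is only a minimal sketch of what 'learning from examples' means for a single artificial nerve cell (a perceptron); the task - learning the logical AND rule - is purely illustrative and has nothing to do with audio.

    import numpy as np

    inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    targets = np.array([0, 0, 0, 1], dtype=float)    # the rule to be learned

    weights = np.zeros(2)
    bias = 0.0
    rate = 0.1

    for _ in range(50):                              # show the examples repeatedly
        for x, target in zip(inputs, targets):
            output = 1.0 if x @ weights + bias > 0 else 0.0
            error = target - output
            weights += rate * error * x              # adjust the 'synapses'
            bias += rate * error

    print([1.0 if x @ weights + bias > 0 else 0.0 for x in inputs])   # [0, 0, 0, 1]

A network of many such units, trained on spectral data rather than truth tables, can pick up regularities in a recording instead of having them spelled out as fixed formulas.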

MPEX looks at your recording at regular intervals and 'learns' its musical aspects, then tries to extend it in a natural sounding way. You can think of this as an architect building a complicated structure from a set of differently shaped stones by selecting the stones that best fit a given part of the building. It is this 'intelligence' and flexibility in the sound representation that makes MPEX a better Time Scaling process.


Recommended further reading: "The DSP Dimension" by S.M. Bernsee.