What is 'Time Scaling' and why do I need it?

Time Scaling (also known as 'Time Stretching', 'Time Compression/Expansion', and 'Time Correction') is the process of changing a sound's length without changing its pitch. When a sound is transposed by playing it back at a different speed, as when slowing down the playback of a tape recorder, it plays back not only at a different tempo but also at a different pitch. This may be fine when tuning drum loops to match the speed of a recording, but it makes pitched sounds, like vocals, sound totally out of tune. What is needed, then, is a process that lets you change the duration and pitch of a recording independently of each other.

What's wrong with
common techniques?

All methods commonly used in today's recording hardware and software are based on strict mathematical models. They do not really 'understand' the sound they process; they just force-fit it to the new tempo. Some of them treat the sound as a sum of simple sine tones, the way a complex building can be made from simply shaped bricks. But what if your sound does not have much in common with a sine tone? It is like trying to build a round dome from square stones: you end up cutting and grinding them down to get a curved hull, losing material as dust, or, in this case, losing the quality of your audio recording. Other methods simply chop your audio into thousands of little fragments and glue them back together to build a file of a different length. Ever tried to put a shattered coffee cup back together? The result is never quite the same. Simply put, all Time Scaling processes presently in use are lossy by their nature. They alter the sound in unwanted ways, degrade its quality, and their outcome is highly dependent on the source material. What can be done about
it?

Imagine listening to a piece of music, say a string quartet. Can you imagine the four players playing slower, but still at the same pitch? You surely can. Why do you have no problem with this task, while your computer fails at it? Simply because you understand what is being played, and your computer doesn't. So if we could find a way to make your computer somehow understand what is being played, we could do good Time Scaling.

What is MPEX and why is
it better?

Prosoniq's main research interest over the last 8 years has been to find out how our perception works, and how the processes involved can be simulated in computer software. As it turns out, the techniques developed during this research can be successfully applied to common problems, one of which is Time Scaling. By simulating the behaviour of a network of artificial nerve cell models, the computer develops capabilities such as the ability to 'learn' and to 'generalize', i.e. to derive rules from a set of examples. This is still a long way from being as sophisticated as a human, but it is far better than anything based on pure maths.

MPEX is an algorithm that simulates some properties of human perception. It makes your computer 'learn' what is being played, much as you can learn a melody someone is whistling. For this it uses a technique called 'Artificial Neural Networks', a computer simulation of the activity of human nerve cells. Our ear is very good at adapting to and learning what is presented to it, so if we simulate the processes that make our ear work the way it does, we will to some extent reproduce a part of its abilities. MPEX looks at your recording at regular intervals, 'learns' its musical aspects, and then tries to extend it in a natural-sounding way. Think of an architect who builds a complicated building from a set of differently shaped stones by selecting the stones that best fit a given part of the building. It is this 'intelligence' and flexibility in the sound representation that makes MPEX a better Time Scaling process.
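The tape-recorder effect described at the start is easy to verify for yourself. Below is a minimal NumPy sketch (not Prosoniq code; the 440 Hz test tone and the decimation-by-two 'speed-up' are illustrative assumptions) showing that doubling the playback speed halves the duration and doubles the pitch at the same time, which is exactly the coupling that Time Scaling has to break:

```python
import numpy as np

SR = 8000    # sample rate in Hz (illustrative choice)
F0 = 440.0   # test tone frequency in Hz

# One second of a 440 Hz sine tone.
t = np.arange(SR) / SR
tone = np.sin(2 * np.pi * F0 * t)

def dominant_hz(x, sr):
    """Return the frequency of the strongest FFT bin, a simple pitch estimate."""
    spectrum = np.abs(np.fft.rfft(x))
    return np.fft.rfftfreq(len(x), 1 / sr)[np.argmax(spectrum)]

# 'Tape speed-up': play every second sample back at the original rate.
fast = tone[::2]

print(len(tone) / SR, dominant_hz(tone, SR))  # 1.0 s at 440.0 Hz
print(len(fast) / SR, dominant_hz(fast, SR))  # 0.5 s at 880.0 Hz: tempo AND pitch changed
```

Both duration and pitch change by the same factor of two; a Time Scaling algorithm is precisely what lets you pick the two factors independently.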
Recommended further reading: "The DSP Dimension" by S.M. Bernsee.