[H-GEN] Re: converting from speex to ogg vorbis

Tue Sep 14 20:53:21 EDT 2004

Clinton Roy wrote:
> speexdec --stereo foo.spx foo.wav
> oggenc foo.wav
> Like the problem, the solution doesn't make any sense, enlightenment
> appreciated :)

Digital audio is a signal level sampled at a given frequency.  When
people talk about 44 kHz audio, that refers to the sampling frequency:
roughly 44,000 samples per second.  CD audio is 44.1 kHz, telephones
operate somewhere around 8 kHz, FM radio is 22 kHz or so IIRC.

On top of that, there's the size of each sample.  Usually these days
that's 16 bits, though it can also be 8 bits or some weird arrangement.
When doing signal processing, you usually expand it up to 32 or 64
bits, or even more, to reduce error accumulation.
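
To see why the wider intermediate helps, here is a small Python sketch.
The function names and the gain values are made up for illustration, and
a Python float stands in for the "wide" accumulator.  It runs the same
chain of gain changes two ways: truncating back to an integer after
every step, and keeping full precision until the end.

```python
def gain_chain_truncating(sample, gains):
    """Apply each gain, truncating back to an integer after every step."""
    for g in gains:
        sample = int(sample * g)  # precision is thrown away here each time
    return sample


def gain_chain_wide(sample, gains):
    """Apply each gain in a wide (floating point) accumulator, round once."""
    acc = float(sample)
    for g in gains:
        acc *= g
    return int(round(acc))


# Halve twice, then double twice: mathematically a no-op.
gains = [0.5, 0.5, 2.0, 2.0]
narrow = gain_chain_truncating(1001, gains)
wide = gain_chain_wide(1001, gains)
print(narrow, wide)
```

The truncating version loses the odd bit on the first halving and never
gets it back, while the wide version returns the sample it started with.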

And finally, there's the number of audio channels being sampled.  Mono 
data is one channel.  Stereo data is two channels, left and right.  You 
can get more complex than that, if you really want.  Dolby 5.1 refers to
5.1 channels: front left, front right, rear left, rear right, center and
bass.  The bass (low-frequency effects) channel is the point one, since
it only carries a narrow band of low frequencies.  I don't know how Dolby
audio is encoded, though.

In raw PCM audio, which is essentially the WAV file minus its header, 
you have all the audio data in a time based stream.  That means, for a 
16 bit stereo file, you have 16 bits for the left channel, 16 bits for 
the right channel, 16 left, 16 right, etc.  In a 16 bit mono file, you
have 16 bits, 16 bits, 16 bits, ...  In other words, exactly half the
data of a 16 bit stereo file of the same length and sample rate.
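
The layout can be sketched in Python with the standard struct module.
This assumes signed 16 bit little-endian samples (the usual case for WAV
data); the helper names and sample values are invented for the example.

```python
import struct


def pack_stereo(left, right):
    """Interleave two channels as L, R, L, R, ... (16 bit little-endian)."""
    out = bytearray()
    for l, r in zip(left, right):
        out += struct.pack("<hh", l, r)
    return bytes(out)


def pack_mono(samples):
    """Pack one channel as consecutive 16 bit little-endian samples."""
    return struct.pack("<%dh" % len(samples), *samples)


left = [100, 200, 300]
right = [-100, -200, -300]

stereo_bytes = pack_stereo(left, right)
mono_bytes = pack_mono(left)

# Same number of sample times, but the stereo stream is twice the size.
print(len(stereo_bytes), len(mono_bytes))
# Reading the stereo stream back shows the L, R, L, R ordering.
print(struct.unpack("<6h", stereo_bytes))
```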

One of the properties of audio is that if you decrease the period of a 
signal (ie, bring peaks and troughs closer together, or increase the 
decode/playback frequency) you raise the pitch.
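
As a rough sketch of that relationship in Python (the function names are
made up here): a tone's pitch scales with the ratio of playback rate to
recorded rate, and twelve semitones make up one doubling.

```python
import math


def played_pitch_hz(tone_hz, recorded_rate, playback_rate):
    """Pitch heard when samples recorded at one rate are played at another."""
    return tone_hz * playback_rate / recorded_rate


def semitone_shift(recorded_rate, playback_rate):
    """Pitch change in semitones; 12 semitones is one octave."""
    return 12 * math.log2(playback_rate / recorded_rate)


# A 440 Hz tone recorded at 22050 Hz, played back at 44100 Hz.
pitch = played_pitch_hz(440, 22050, 44100)
shift = semitone_shift(22050, 44100)
print(pitch, shift)
```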

So, if you play a 22 kHz, 16 bit mono audio file as a 22 kHz, 16 bit
stereo file, you are playing the mono data at 44 kHz, since you're
consuming the data at twice the expected rate.  Doubling the playback
frequency raises the pitch by an octave and halves the running time,
which gives you a chipmunk sound.
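
Here is a small Python simulation of that mistake.  It is only a sketch:
the 440 Hz test tone, the helper names and the crude zero-crossing
frequency estimate are all assumptions of the example, not anything
speexdec or oggenc actually do.  It generates a mono tone, splits it the
way a stereo player would (every other sample to each channel), and
estimates the pitch each channel ends up with at the same sample rate.

```python
import math

RATE = 22050  # samples per second


def sine(freq_hz, seconds, rate=RATE):
    """Generate a mono sine tone as a list of floats."""
    n = int(seconds * rate)
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def estimate_freq(samples, rate=RATE):
    """Crude pitch estimate: upward zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * rate / len(samples)


mono = sine(440, 1.0)

# A stereo player reads the stream as L, R, L, R, ... so each channel
# only receives every other mono sample.
left, right = mono[0::2], mono[1::2]

mono_pitch = estimate_freq(mono)
left_pitch = estimate_freq(left)

print("mono:", round(mono_pitch), "Hz")
print("left channel when read as stereo:", round(left_pitch), "Hz")
print("running time halves:", len(left) / RATE, "seconds")
```

The left channel's estimate should come out at about twice the mono one,
which is the octave jump you hear.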

Ok?

-- 
bje