Libsox encoding - audio

Why do I get distorted output when I convert a WAV file using libsox with the following settings?
in->encoding.encoding = SOX_ENCODING_UNSIGNED;
in->encoding.bits_per_sample = 8;
The input file has bits_per_sample = 16.

So you're saying that you tell SoX to read a 16-bit WAV file as an 8-bit file? Knowing nothing about SoX, I would expect it to read each 16-bit sample as two 8-bit samples, the low-order byte and the high-order byte. WAV stores the low-order byte first, so the stream looks like this: ...LHLHLHLHLH...
For simplicity, we'll call high order byte samples 'A' samples. 'A' samples carry the original sound with less dynamic range, because the low order byte with the extra precision has been chopped off.
We'll call the low order byte samples "B samples." These will be roughly random and encode noise.
So, as a result we'll have the original sound, the 'A' samples, shifted down in frequency by a half. This is because there's a 'B' sample between every 'A' sample which halves the rate of the 'A' samples. The 'B' samples add noise to the original sound. So we'll have the original sound, shifted down by a half, with noise.
Is that what you're hearing?
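The byte-splitting effect described above can be reproduced without SoX. This is a minimal Python sketch, assuming 16-bit little-endian PCM as stored in a WAV file; the sample values are made up for illustration.

```python
import struct

# Four hypothetical 16-bit samples, packed the way a WAV file stores them
# (little-endian, low-order byte first).
samples16 = [0, 256, 513, -1]
raw = struct.pack("<4h", *samples16)

# Reinterpreting the same bytes as 8-bit samples yields twice as many values.
as_8bit = list(raw)

low_bytes = as_8bit[0::2]   # the 'B' samples: fine detail, roughly noise
high_bytes = as_8bit[1::2]  # the 'A' samples: a coarse copy of the original

print(as_8bit)
print(high_bytes)
```

Every second value is a high-order byte, which is why the original sound survives at half the rate with noise interleaved.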
Edit: Guest commented that the goal is to downconvert a WAV to 8-bit audio. Reading the man page for SoX, it looks like SoX always uses 32-bit samples in memory as a result of sox_read(). Passing it a format only makes it attempt to read the file as that format.
To downconvert in memory, use SOX_SAMPLE_TO_SIGNED_8BIT or SOX_SAMPLE_TO_UNSIGNED_8BIT from sox.h, i.e.:
sox_format_t *ft = sox_open_read("/file/blah.wav", NULL, NULL);
if (ft) {
    sox_ssample_t buffer[100];
    /* sox_read() takes a count of samples, not a size in bytes */
    sox_size_t amt = sox_read(ft, buffer, sizeof(buffer) / sizeof(buffer[0]));
    if (amt > 0) {
        /* the macro increments ft->clips whenever a sample clips */
        char sample8 = SOX_SAMPLE_TO_SIGNED_8BIT(buffer[0], ft->clips);
    }
    sox_close(ft);
}
To output a downconverted file, use the 8-bit format when writing instead of when reading.
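For intuition about what a 16-bit to 8-bit downconversion does to each sample, here is a rough Python sketch. It is not how the SoX macros are implemented (those also handle rounding and clip counting); it simply keeps the high-order byte and shifts it into the unsigned range that 8-bit WAV uses. The function name is hypothetical.

```python
def to_unsigned_8bit(sample16):
    """Truncate a signed 16-bit sample to unsigned 8 bits (no rounding, no dither)."""
    if not -32768 <= sample16 <= 32767:
        raise ValueError("sample out of 16-bit range")
    # Keep the top 8 bits (arithmetic shift), then move -128..127 to 0..255.
    return (sample16 >> 8) + 128

print([to_unsigned_8bit(s) for s in (-32768, 0, 32767)])
```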

Related

Working out sample rate and bit depth of aiff audio from file size

I need some help with Maths/logic here. Working with aif files.
I have written the following:
LnByte = FileLen(ToCheck) 'Returns Filesize in Bytes
LnBit = LnByte * 8 'Get filesize in Bits
Chan = 1 'Channels in audio: mono = 1
BDpth = 24 'Bit Depth
SRate = 48000 'Sample Rate
BRate = 1152000 'Expected Bit Rate
Time_Secs = LnBit / Chan / BDpth / SRate 'Size in Bits / Channels / Bit Depth / Sample Rate
FSize = (BRate / 8) * Time_Secs '(Bitrate / 8) * Length of file in seconds
ToCheck is the current file when looping through a folder of files.
So I'm finding the length of the audio from the file size in bits / channels / bit depth / sample rate. This assumes that the bit depth and sample rate are correct (I need the files to be 24-bit/48 kHz).
Time_Secs = length of the file in seconds.
FSize = file size based on 24-bit/48 kHz, using Time_Secs.
Probably because FSize uses Time_Secs, I can't work out how to tell from this whether the file's sample rate and/or bit depth are actually correct...
Assuming 24/48k should give 144,000 bytes per second
Assuming 16/48k should give 96,000 bytes per second
If I check a file that is 16-bit/48 kHz using the above code, it gives the incorrect time in seconds (naturally) but the correct file size, even though the bit rate of 1,152,000 should be wrong.
-- It would seem that the difference in time is making up for the difference in bit rate, or I'm looking at it wrong.
How would I adapt my formula, or do the maths, to work out whether the sample rate/bit depth of a file is actually 48,000 Hz/24-bit? Or is there a different way entirely? Remember that they are aif files, not wavs.
Hope that makes sense.
Many Thanks in advance!
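The cancellation suspected in the question can be checked directly. Since BRate equals Chan * BDpth * SRate, FSize reduces algebraically to LnByte whatever the file's real format is. Here is a minimal Python sketch of the same arithmetic; the 960,000-byte size is a hypothetical 10 seconds of 16-bit/48 kHz mono audio, with the header ignored.

```python
def estimate(ln_byte, chan=1, bdpth=24, srate=48000):
    """Repeat the question's calculation for an assumed format."""
    ln_bit = ln_byte * 8
    brate = chan * bdpth * srate              # 1,152,000 for 24-bit/48 kHz mono
    time_secs = ln_bit / chan / bdpth / srate
    fsize = (brate / 8) * time_secs           # algebraically equal to ln_byte
    return time_secs, fsize

# Hypothetical file: 10 s of 16-bit/48 kHz mono = 960,000 bytes of audio.
time_secs, fsize = estimate(960_000)
print(time_secs, fsize)
```

The duration comes out wrong (about 6.67 s instead of 10 s), but the size always matches, so the file size alone cannot confirm the format. The actual values would have to be read from the file itself; in AIFF they are stored in the COMM chunk.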

What is the correct audio volume slider formula?

I'm building a VoIP application. If I take the slider value and just multiply audio samples by it, I get incorrect, nonlinear sounding results. What's the correct formula to get smooth results?
The correct formula is the decibel formula solved for Prms. Here's example code in C++:
#include <cmath>
#include <cstddef>
#include <cstdint>
// level is 0 to 1, silence is dBFS at level 0
void AdjustVolume(int16_t* buffer, size_t length, float level, float silence = -96)
{
    float factor = std::pow(10.0f, (1 - level) * silence / 20.0f);
    for (size_t i = 0; i < length; i++)
        buffer[i] = static_cast<int16_t>(buffer[i] * factor);
}
There's one tweakable: silence. It's the level of the noise floor, that is, the loudness below which you can't hear the sound because of the background noise. The theoretical floor for 16-bit audio samples is about -96 dB (one step out of the 65,536 levels a 16-bit sample can take). In the real world, however, there's background noise produced by the audio equipment and the surroundings of the listener, so you might want to pick a noisier silence level, like -30 dB or something. Picking the correct silence value will maximize the useful area of your volume slider, or minimize the amount of slider travel where no perceptible change in volume occurs.
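To see how the slider position maps to a gain, the same formula can be sketched in Python. The function names are hypothetical, and int() truncates toward zero in the same way as the cast in the code above.

```python
def volume_factor(level, silence_db=-96.0):
    """Map a slider level in [0, 1] to a linear gain; silence_db is the gain in dB at level 0."""
    return 10.0 ** ((1.0 - level) * silence_db / 20.0)

def adjust_volume(samples, level, silence_db=-96.0):
    """Return a scaled copy of a list of 16-bit sample values."""
    factor = volume_factor(level, silence_db)
    return [int(s * factor) for s in samples]

# Each equal step of the slider changes the level by the same number of dB.
for level in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(level, volume_factor(level, silence_db=-40.0))
```

With silence_db = -40, the midpoint of the slider corresponds to -20 dB, a linear gain of 0.1.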

How to program sound card to output a specific signal?

I'm trying to program my sound card to output specific values.
Let's say I have the sequence below:
[1,2,3,2,10,4,1,50,20,1]
I want the sound card to output the analog signal specified by this sequence.
I could use the Windows Multimedia API, of course. However, my task is lightweight and I don't want to use such a heavy framework.
Any suggestions on this?
I propose you generate a .wav file and play it with a media player.
It's easy with Python and its wave module. In the example below I worked with Python 3.3.
import wave
import struct
# define your sequence
frames = [1,2,3,2,10,4,1,50,20,1]
output = wave.open('out.wav', mode='wb') # create the file that will contain your sequence
output.setnchannels(1) # 1 for mono
output.setsampwidth(1) # resolution in bytes; 1 byte means unsigned 8-bit samples (0-255)
output.setframerate(10) # sampling rate in Hz; usually 44100
output.setnframes(len(frames)) # sequence length
for value in frames:
    output.writeframes(struct.pack('B', value)) # pack each value as one unsigned byte to match the sample width
# close the file, you're done
output.close()
You can do this in one line if you use Matlab or the free equivalent, Octave. The relevant documentation is here.
soundsc(x, fs, [ lo, hi ])
Scale the signal so that [lo, hi] -> [-1, 1], then play it
at sampling rate fs. If fs is empty, then the default 8000 Hz
sampling rate is used.
Your function call in the console would look like this ...
soundsc([1,2,3,2,10,4,1,50,20,1], fs, [1 50]);
... or like this with manual normalisation of the positive integer vector to give values between +/- 1 ...
x = [1,2,3,2,10,4,1,50,20,1];
x=x-min(x); % get values to range from zero up
x=x/max(x); % get floating point values to range between 0.0 and 1.0
x=x*2-1; % get values to range between +/- 1.0;
soundsc(x);

How to change the volume of a PCM data stream (failed experiment)

Solved
My code was never before used for processing signed values and as such bytes -> short conversion was incorrectly handling the sign bit. Doing that properly solved the issue.
The question was...
I'm trying to change the volume of a PCM data stream. I can extract single channel data from a stereo file, do various silly experimental effects with the samples by skipping/duplicating them/inserting zeros/etc but I can't seem to find a way to modify actual sample values in any way and get a sensible output.
My attempts are really simple: http://i.imgur.com/FZ1BP.png
1. source audio data
2. values - 10000
3. values + 10000
4. values * 0.9
5. values * 1.1
(value = -value works fine -- reverses the wave and it sounds the same)
The code to do this is equally simple (I/O uses unsigned values in range 0-65535) <-- that was the problem, reading properly signed values solved the issue:
// NOTE: INVALID CODE
int sample = ...read unsigned 16 bit value from a stream...
sample -= 32768;
sample = (int)(sample * 0.9f);
sample += 32768;
...write unsigned 16 bit value to a stream...
// NOTE: VALID CODE
int sample = ...read *signed* 16 bit value from a stream...
sample = (int)(sample * 0.9f);
...write 16 bit value to a stream...
I'm trying to make the sample quieter. I'd imagine making the amplitude smaller (sample * 0.9) would result in a quieter file, but both 4. and 5. above are clearly invalid. There is a similar question on SO where MusiGenesis says he got correct results with 'sample *= 0.75' type of code (yes, I did experiment with other values besides 0.9 and 1.1).
The question is: am I doing something stupid or is the whole idea of multiplying by a constant wrong? I'd like the end result to be something like this: http://i.imgur.com/qUL10.png
Your 4th attempt is definitely the correct approach. Assuming your sample range is centered around 0, multiplying each sample by another value is how you change the volume or gain of a signal.
In this case, though, I'd guess something funny is happening behind the scenes when you multiply an int by a float and cast back to int. It's hard to say without knowing what language you're using, but that might be what's causing the problem.
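The sign-handling problem the asker eventually found can be reproduced in a few lines. This is a minimal Python sketch, assuming 16-bit little-endian signed PCM; the sample values and the function name are made up for illustration.

```python
import struct

def scale_signed(raw, gain):
    """Scale 16-bit little-endian signed PCM bytes by gain, clamping to the int16 range."""
    count = len(raw) // 2
    samples = struct.unpack("<%dh" % count, raw[:count * 2])
    scaled = [max(-32768, min(32767, int(s * gain))) for s in samples]
    return struct.pack("<%dh" % count, *scaled)

raw = struct.pack("<3h", 1000, -1000, 0)

# Correct: read as signed, multiply, write back.
quieter = struct.unpack("<3h", scale_signed(raw, 0.5))

# The mistake: the same bytes read as unsigned, then re-centred by 32768.
as_unsigned = struct.unpack("<3H", raw)
recentred = [u - 32768 for u in as_unsigned]

print(quieter)
print(as_unsigned, recentred)
```

Read as unsigned, the quiet sample -1000 turns into 64536, and after re-centring even the silent sample 0 lands at full negative scale, so any scaling applied afterwards distorts the waveform.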

Finding number of samples in .wav file and Hex Editor

I need help with a hex editor and audio files. I am having trouble figuring out the formula to get the number of samples in my .wav files.
I downloaded StripWav, which tells me the number of samples in the .wav files, but I still cannot figure out the formula.
Can you please download these two .wav files, open them in a hex editor, and tell me the formula to get the number of samples?
If you would kindly do this for me, please tell me the number of samples for each .wav so I can make sure the formula is correct.
http://sinewavemultimedia.com/1wav.wav
http://sinewavemultimedia.com/2wav.wav
Here is the problem: I have two programs.
One reads the WAV data and the other shows the number of samples.
Here is the data:
RIFF 'WAVE' (wave file)
<fmt > (format description)
PCM format
2 channel
44100 frames per sec
176400 bytes per sec
4 bytes per frame
16 bits per sample
<data> (waveform data - 92252 bytes)
But the other program says NumSamples is
23,063 samples
/*******UPDATE*********/
One more thing: I did the calculation with two files.
This one is correct:
92,296 bytes and the number of samples is 23,063
But the other one, which is over 2 MB, is not coming out correctly. I just subtracted 44 bytes; am I doing it wrong? Here is the file size:
2,473,696 bytes
But the correct number of samples is
617,400
WAVE format
You must read the fmt header to determine the number of channels and bits per sample, then read the size of the data chunk to determine how many bytes of data are in the audio. Then:
NumSamples = NumBytes / (NumChannels * BitsPerSample / 8)
There is no simple formula for determining the number of samples in a WAV file. A so-called "canonical" WAV file consists of a 44-byte header followed by the actual sample data. So, if you know that the file uses 2 bytes per sample, then the number of samples is equal to the size of the file in bytes, minus 44 (for the header), and then divided by 2 (since there are 2 bytes per sample).
Unfortunately, not all WAV files are "canonical" like this. A WAV file uses the RIFF format, so the proper way to parse a WAV file is to search through the file and locate the various chunks.
Here is a sample (not sure what language you need to do this in):
http://msdn.microsoft.com/en-us/library/ms712835
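The chunk search described above can be sketched in Python as follows. This is a rough illustration rather than a complete parser: it assumes a plain PCM fmt chunk of at least 16 bytes, and make_wav is a hypothetical helper that builds a test file in memory using the format values from the question's dump.

```python
import io
import struct

def count_frames(stream):
    """Walk the RIFF chunks of a WAV stream and return (frames, channels)."""
    riff, _size, wave_id = struct.unpack("<4sI4s", stream.read(12))
    if riff != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    block_align = channels = None
    while True:
        header = stream.read(8)
        if len(header) < 8:
            raise ValueError("no data chunk found")
        chunk_id, chunk_size = struct.unpack("<4sI", header)
        if chunk_id == b"fmt ":
            fmt = stream.read(chunk_size)
            # wFormatTag, wChannels, dwSamplesPerSec, dwAvgBytesPerSec,
            # wBlockAlign, wBitsPerSample
            _tag, channels, _rate, _avg, block_align, _bits = struct.unpack(
                "<HHIIHH", fmt[:16])
            stream.seek(chunk_size & 1, io.SEEK_CUR)  # skip pad byte, if any
        elif chunk_id == b"data":
            if block_align is None:
                raise ValueError("data chunk found before fmt chunk")
            return chunk_size // block_align, channels
        else:
            # Skip any other chunk; chunks are padded to an even length.
            stream.seek(chunk_size + (chunk_size & 1), io.SEEK_CUR)

def make_wav(data_bytes, extra=b""):
    """Hypothetical helper: 2-channel, 16-bit, 44100 Hz WAV with silent audio."""
    fmt = struct.pack("<HHIIHH", 1, 2, 44100, 176400, 4, 16)
    body = b"WAVE" + b"fmt " + struct.pack("<I", len(fmt)) + fmt + extra
    body += b"data" + struct.pack("<I", data_bytes) + bytes(data_bytes)
    return b"RIFF" + struct.pack("<I", len(body)) + body

# An extra LIST chunk before the data makes the header longer than 44 bytes.
extra = b"LIST" + struct.pack("<I", 4) + b"INFO"
frames, channels = count_frames(io.BytesIO(make_wav(92252, extra)))
print(frames, channels)
```

With the 92,252-byte data chunk from the question this gives 23,063 frames, matching StripWav. An extra chunk ahead of the data also explains why subtracting a fixed 44 bytes fails for the second file.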
A WAVE's format chunk (fmt) has the 'bytes per sample frame' specified as wBlockAlign.
So: framesTotal = data.ck_size / fmt.wBlockAlign;
and samplesTotal = framesTotal * wChannels;
Thus, samplesTotal === framesTotal IFF wChannels === 1!!
Note how the above answer elegantly avoided explaining that key equations in the spec (and answers based on them) are WRONG:
consider, for example, a 2-channel wave with 12 bits per sample (bps).
The spec explains that we put each 12 bps sample in a word:
note: t=point in time, chan = channel
+---------------------------+---------------------------+-----
| frame 1 | frame 2 | etc
+-------------+-------------+-------------+-------------+-----
| chan 1 # t1 | chan 2 # t1 | chan 1 # t2 | chan 2 # t2 | etc
+------+------+------+------+------+------+------+------+-----
| byte | byte | byte | byte | byte | byte | byte | byte | etc
+------+------+------+------+------+------+------+------+-----
So.. how many bytes does the sample-frame (BlockAlign) for a 2ch 12bps wave have according to spec?
<sarcasm> CEIL(wChannels * bps / 8) = 3 bytes.. </sarcasm>
Obviously the correct equation is: wBlockAlign=wChannels*CEIL(bps/8)
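The difference between the two equations shows up as soon as the bit depth is not a multiple of 8. A small Python sketch (function names are hypothetical):

```python
import math

def block_align(channels, bits_per_sample):
    """Bytes per frame when each sample is padded to whole bytes."""
    return channels * math.ceil(bits_per_sample / 8)

def naive_block_align(channels, bits_per_sample):
    """The equation criticised above: rounds only after multiplying by channels."""
    return math.ceil(channels * bits_per_sample / 8)

print(block_align(2, 12), naive_block_align(2, 12))
print(block_align(2, 16), naive_block_align(2, 16))
```

For 2 channels at 12 bits the first gives 4 bytes per frame and the second gives 3; at 16 bits both agree on 4.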

Resources