I have spent the evening messing around with raw A-law audio input/output using the built-in ALSA tools aplay and arecord, and passing it through an offline moving-average filter I have written.
My question is: the audio seems to be encoded using values between 0x2A and 0xAA - a range of 128. I have been reading through this guide, which is informative but doesn't really explain why an offset of 42 (0x2A) has been chosen. The file I used to examine this was a square wave exported from Audacity as unsigned 8-bit 8 kHz audio and examined in a hex editor.
Can anyone shed some light on how A-law is encoded in a file?
This may help:
/dev/dsp
8000 frames per second, 8 bits per frame (1 byte);
# Max volume = \xff (or \x00).
# No volume = \x80 (the middle).
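For the A-law coding itself: if the file really is A-law rather than plain unsigned 8-bit, the 0x2A/0xAA range falls out of the G.711 bit layout, in which every code word has its even bits inverted (XOR with 0x55) before being written out. A minimal decode sketch, closely following the classic public-domain reference routine (the function name is mine), looks like this:

/* Decode one G.711 A-law byte to a linear 16-bit sample. */
short alaw_decode(unsigned char a_val)
{
    short t, seg;

    a_val ^= 0x55;                      /* undo the even-bit inversion */
    t = (a_val & 0x0F) << 4;            /* mantissa */
    seg = (a_val & 0x70) >> 4;          /* segment (exponent) */

    switch (seg) {
    case 0:  t += 8;                    break;
    case 1:  t += 0x108;                break;
    default: t += 0x108; t <<= seg - 1; break;
    }
    return (a_val & 0x80) ? t : -t;     /* top bit carries the sign */
}

Because of that XOR with 0x55, the two extreme codes come out as 0xAA (positive full scale) and 0x2A (negative full scale), so a full-scale square wave shows up in a hex dump as runs of 0x2A and 0xAA rather than 0x00 and 0xFF. The "offset of 42" is just the inverted code pattern, not a DC offset.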
In audio terms there is no difference between AIF and WAV because they're both uncompressed audio. The only difference is the byte order (endianness).
My question is, can any software tell the difference between an AIF that was recorded as such and an AIF that was recorded as WAV and converted? I've looked in a hex editor and there appears to be a difference in the chunks - the recorded AIF has more empty space in the COMM and SSND chunks, it would seem.
Is there a reason for this?
Many Thanks
"...the recorded AIF has more empty space in the COMM and SSND chunks, it would seem."
That might be a problem with the specific recorder you use.
In general there is no size difference in the uncompressed PCM data. I've tested a 10-second AAC file converted into WAVE and also into AIFF; the result is that both formats end up with the PCM data 1572864 bytes long.
Also, please explain "more empty space in the COMM and SSND chunks", since:
COMM only holds 10 bytes' worth of metadata, but in a WAV file there can be up to 84 bytes of metadata.
SSND has a 16-byte header followed by the PCM data; in a .wav the data chunk has an 8-byte header followed by the PCM.
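If you want to see exactly where the two files differ, a rough chunk lister along these lines (a sketch only, assuming plain "RIFF"/WAVE and "FORM"/AIFF files with nothing exotic in them) prints each chunk ID and size so the layouts can be compared side by side:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* List the chunks in a WAV (RIFF, little-endian sizes) or AIFF
 * (FORM, big-endian sizes) file: chunk ID and chunk size. */
int main(int argc, char **argv)
{
    unsigned char hdr[12], ch[8];
    FILE *f;

    if (argc < 2 || !(f = fopen(argv[1], "rb")) || fread(hdr, 1, 12, f) != 12)
        return 1;

    int big_endian = (memcmp(hdr, "FORM", 4) == 0);   /* AIFF; "RIFF" means WAV */

    while (fread(ch, 1, 8, f) == 8) {
        uint32_t size = big_endian
            ? ((uint32_t)ch[4] << 24) | (ch[5] << 16) | (ch[6] << 8) | ch[7]
            : ((uint32_t)ch[7] << 24) | (ch[6] << 16) | (ch[5] << 8) | ch[4];
        printf("%.4s  %u bytes\n", (char *)ch, (unsigned)size);
        fseek(f, (long)(size + (size & 1)), SEEK_CUR);   /* chunks are padded to even length */
    }
    fclose(f);
    return 0;
}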
I have a large number of .PCM files (248 total) that are all encoded as:
Encoding: Signed 16-bit uncompressed PCM
Byte order: Little-endian
Channels: 2 channel (stereo)
Sample rate: 44100 Hz
8-byte header
I need to apply a -7.5 dB amplification (deamplification?) to every single one of these files.
The problem I have is that all of these tracks are looped, and I need to preserve the loop data (contained in the 8-byte header).
I've yet to see a batch audio editing problem that sox couldn't handle, so I'm hoping someone on here would know how to use sox to accomplish this, or failing that, know of a program that can do this for me.
Thanks for the help!
*Edit- A bit of research got me the exact encoding of the PCM audio I need to edit:
"The audio tracks are 44.1 kilohertz, 16-bit stereo uncompressed unsigned PCM files in little-endian order, left channel first, with a simple eight-byte header. The first four bytes spell out “MSU1” in ASCII. This is followed by a 32-bit unsigned integer used as the loop point, measured in samples (a sample being four bytes) – if the repeat bit is set in the audio state register, this value is used to determine where to seek the audio track to."
*Edit2 - I've managed to develop the needed sox command; I just have no idea how to turn it into a batch. Also, it turns out the files were 16-bit signed, not unsigned, PCM.
sox -t raw -e signed -b 16 -r 44100 -c 2 -L [filename].pcm -t raw -L [filename].raw vol -7.5dB
I'm fine with either a .BAT I drag and drop files onto or a .BAT that just converts every .PCM file in the folder.
Help appreciated, because I don't even know where to start looking for this one...
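One way to batch it (a sketch, not tested here; it assumes the .BAT sits in the same folder as the .pcm files and simply wraps the sox command from the edit above) is a plain cmd for-loop:

@echo off
rem Run the sox command above on every .pcm file in this folder,
rem writing [name].raw next to each input file.
for %%f in (*.pcm) do (
    sox -t raw -e signed -b 16 -r 44100 -c 2 -L "%%f" -t raw -L "%%~nf.raw" vol -7.5dB
)

One caveat: since the whole file is read as raw samples, the 8-byte MSU1 header gets scaled along with the audio, so if the loop point has to survive untouched you may still need to strip the header off first and splice the original one back onto the output.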
I am a newbie here.
I am looking for any tool or quick way to convert a 24-bit raw (headerless) PCM file, with 3-byte PCM samples,
into a 32-bit raw PCM file with 4 bytes per sample, where the most significant byte of each 4-byte word is the sign/zero extension of the 3-byte sample.
Apart from the 24-bit raw file, I have its corresponding WAVE file as well, if that helps.
When I tried this in Audacity, although it converted 24-bit to 32-bit, it did not sign/zero-extend; it left-shifted the 24-bit sample by 8. So in effect the 24-bit sample was sitting in the left-aligned 24 bits of the 32-bit word, which is not what was desired.
Thanks.
I'm going to assume you meant shifted left by 8 instead of shifted right by eight.
In this case the notion of sign extension is unnecessary. Imagine you have a negative 24-bit value 0x800000. Then the left shifted version would be 0x80000000. No sign extension but it still has the correct negative sign.
In summary I think Audacity is doing exactly as it should, which is to simply shift the bits up - unless for some reason your data is unsigned, which would be exceptionally unusual.
On searching further I was pointed to a way to do this using sox on Linux.
sox -t s24 --endian little input.pcm -t s32 output.pcm vol 0.00390625
It worked fine.
The vol 0.00390625 is there to reduce the volume by 48 dB: converting a raw PCM sample from 24-bit to 32-bit left-shifts it by 8 bits by default, but I want it shifted back down by 8 bits, which is a reduction in volume of 48 dB (a factor of 1/256 = 0.00390625).
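For reference, the same result (each 24-bit sample sitting sign-extended in the low 24 bits of a 32-bit word, i.e. what the sox command above produces) can also be done directly. A minimal C sketch, assuming raw little-endian input and using placeholder file names in.pcm / out.pcm:

#include <stdio.h>
#include <stdint.h>

/* Read raw 3-byte little-endian samples and write sign-extended
 * 4-byte little-endian samples. */
int main(void)
{
    FILE *in = fopen("in.pcm", "rb");
    FILE *out = fopen("out.pcm", "wb");
    unsigned char s[3];

    if (!in || !out)
        return 1;

    while (fread(s, 1, 3, in) == 3) {
        int32_t v = s[0] | (s[1] << 8) | ((int32_t)s[2] << 16);
        if (v & 0x800000)
            v -= 0x1000000;              /* sign-extend the 24-bit value */
        uint32_t u = (uint32_t)v;
        unsigned char o[4] = { u & 0xFF, (u >> 8) & 0xFF,
                               (u >> 16) & 0xFF, (u >> 24) & 0xFF };
        fwrite(o, 1, 4, out);
    }
    fclose(in);
    fclose(out);
    return 0;
}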
I want the relation between time and bytes in an Ogg file. Say I have a 5-second Ogg whose length is 68*1024 bytes. If I cut a chunk from that Ogg file and save it, can I know its size beforehand? For example, I want to cut from 2.4 s to 3.2 s.
Is there some mathematical calculation that gives an accurate answer in bytes? Can anyone tell me please if this is possible?
Bit rate 128 kbps, 16-bit, sample rate 44.1 kHz, stereo
I used the logic below but can't get an accurate answer.
Click here
Any such direct mapping between file size and play time will work, but not if the codec uses variable bit rate (VBR) encoding. A compression algorithm is VBR if its success in compressing depends on the informational density of the source media: repetitive audio compresses more efficiently than, say, random noise. VBR algorithms are typically more efficient, since to maintain a constant bit rate the algorithm pads the buffer with filler data just so its throughput stays at a constant number of bytes per second.
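For the numbers in the question, the constant-bit-rate estimate would be 128000 bits/s ÷ 8 = 16000 bytes of audio per second, so the 0.8 s between 2.4 and 3.2 comes to roughly 16000 × 0.8 = 12800 bytes of audio payload, before Ogg page and header overhead. Note, though, that the figures given (5 s in 68 × 1024 = 69632 bytes) average out to only about 111 kbps, which suggests the stream is VBR and that this kind of arithmetic will only ever be approximate.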
I'm working on an XNA script in which I want to read data from the microphone every couple of frames and estimate its pitch. I took input based almost exactly on this page (http://msdn.microsoft.com/en-us/library/ff827802.aspx).
Now I've got a buffer full of bytes. What does it represent? I reset everything and look at my buffer every 10th frame, so it appears to be a giant array holding 9 instances of 1764 bytes from different points in time (the whole thing is 15876 bytes). I'm assuming it's the time domain of sound pressure, because I can't find any information on the format of microphone input. Does anybody know how this works? I have a friend who has an FFT up and running, but we're trying to learn as much as we can about the data I'm collecting before we attempt to plug it in.
The samples are little-endian 16-bit linear PCM. Convert each pair of bytes into a signed short as
short sample = (short)(buffer[i] | buffer[i+1] << 8);
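If the goal is to hand the whole buffer to your friend's FFT, a small follow-on sketch (the names buffer, bufferLen and samples are placeholders for your own variables) converts every byte pair and normalises it into the -1.0..1.0 range:

// Convert the raw little-endian 16-bit PCM bytes into floats for the FFT.
for (int i = 0; i + 1 < bufferLen; i += 2)
{
    short sample = (short)(buffer[i] | (buffer[i + 1] << 8));
    samples[i / 2] = sample / 32768.0f;
}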