Sox audio concatenation file length wrong - audio

I am trying to concatenate multiple audio files using SoX. Each file is very high-resolution: 4 channels, PCM, 256 kHz sampling rate (yes, not a typo), 24-bit. Each file is approximately 2 minutes long. I can concatenate up to 9 files successfully with: sox file1.wav file2.wav file3.wav outfile.wav.
After 9 files I get the following sox summary, which is correct:
Channels : 4
Sample Rate : 256000
Precision : 24-bit
Duration : 00:21:33.89 = 331236000 samples ~ 97041.8 CDDA sectors
File Size : 3.97G
Bit Rate : 24.6M
Sample Encoding: 24-bit Signed Integer PCM
When I add a 10th ~2 minute file I get:
Channels : 4
Sample Rate : 256000
Precision : 24-bit
Duration : 00:00:25.18 = 6445658 samples ~ 1888.38 CDDA sectors
File Size : 4.37G
Bit Rate : 1.39G
Sample Encoding: 24-bit Signed Integer PCM
You'll note that the length went from 00:21:33.89 to 00:00:25.18, with a corresponding drop in samples. The expected result would be a file of ~00:23:xx.xx, with the 2 minutes added. The actual file size grew from 3.97 GB to 4.37 GB, so the data is there; the problem appears to be in the header.
Does anyone know of an upper limit in sox that we might be hitting?
Alternatively does anyone know how I might fix the file post facto? I tried sox --ignore-length infile.wav outfile.wav but the output file was identical.
Thanks
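One possible explanation, not confirmed anywhere in this thread: a plain RIFF/WAV header stores the data size in a 32-bit field, so it cannot describe more than 2^32 bytes (about 4.29 GB). The sketch below checks whether a single wrap of that field is consistent with the numbers soxi printed above. The constant names are illustrative, and the wrap itself is an assumption.

```python
# Sketch: does a wrapped 32-bit size field explain the soxi output?
# Assumption (not confirmed in the thread): the header's data size is
# stored modulo 2**32, as in a plain RIFF/WAV header.

FRAME_BYTES = 4 * 3        # 4 channels x 24-bit (3-byte) samples
RATE = 256_000             # frames per second
WRAP = 2 ** 32             # capacity of a 32-bit size field, in bytes

# Nine files: 331236000 frames, as reported by soxi.
nine_file_bytes = 331_236_000 * FRAME_BYTES
print(nine_file_bytes, nine_file_bytes < WRAP)

# Ten files: soxi reports only 6445658 frames. If the size field
# wrapped once, the real payload is that amount plus 2**32 bytes.
ten_file_bytes = 6_445_658 * FRAME_BYTES + WRAP
est_seconds = ten_file_bytes / FRAME_BYTES / RATE
minutes, seconds = divmod(est_seconds, 60)
print(ten_file_bytes, f"{int(minutes)}:{seconds:05.2f}")
```

Under that assumption the nine-file payload still fits in 32 bits, while the ten-file payload comes to about 4.37 GB, which matches the reported file size, and to roughly 23 minutes 43 seconds of audio, which is the length the question expected.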

Related

How to find sampleCount knowing the audio file's length and sampleRate?

I have been looking for a long time for how to find sampleCount, but have found no answer. Is there an algorithm or formula for calculating it? What I know: the duration is 850 ms, the file size is 37 KB, it is a WAV file, and sampleRate is 48000. As a check, the result should be a sampleCount of 40681, which is what my file has. I need this so that I can calculate sampleCount for other audio files. I am waiting for your help.
I found a way and I get 40800: I multiplied the rate by the time in seconds.
Yes, the sample count is equal to the sample rate, multiplied by the duration.
So for an audio file that is exactly 850 milliseconds long, at a 48 kHz sample rate:
0.850 s * 48000 samples/s = 40800 samples
Now, with MP3s you have to be careful. There is some padding at the beginning of the file for cleanly initializing the decoder, and the amount of padding can vary based on the encoder and its configuration. (You can read all about the troubles this has caused on the Wikipedia page for "gapless playback".) Additionally, your MP3 duration will be determined on MP3 frame boundaries, and not arbitrary PCM boundaries... assuming your decoder/player does not support gapless playback.
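As a small worked check of the rate-times-duration rule, here is a sketch in Python. The function names are made up for illustration. It also runs the formula backwards on the 40681 samples mentioned in the question, which suggests the file is slightly shorter than a full 850 ms.

```python
def sample_count(duration_ms, sample_rate):
    """Samples per channel in a clip of the given length."""
    return round(duration_ms * sample_rate / 1000)

def clip_length_ms(samples, sample_rate):
    """Clip length in milliseconds for a per-channel sample count."""
    return samples * 1000 / sample_rate

print(sample_count(850, 48000))      # 40800
print(clip_length_ms(40681, 48000))  # roughly 847.5 ms
```

So a duration that a tool displays as 850 ms may simply be a rounded value; the sample count stored in the file is the exact figure.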

How are samples aligned in an audio file?

I'm trying to better understand how samples are aligned in the audio file.
Let's say we have a 2s audio file with sampling rate = 3.
I think there are three possible ways to align those samples. Looking at the picture below, can you tell me which one is correct?
Also, is this a standard for all audio files, or do different formats have different rules?
Cheers!
Sampling rate in audio tells you how many samples there are in one second, expressed in hertz (Hz). Strictly speaking, the correct answer would be (1), as you have 3 samples within one second. Assuming there's no latency, PCM and other formats dictate that audio starts at 0. The next "cycle" (the next second) also starts at zero, the same principle as with a clock.
To get the total length of the audio (following the question in the comment), you simply take the number of samples / rate. Here is an example from a 30 s WAV using soxi, one of the canonical tools used in the community for sound manipulation:
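To make the timing concrete, here is a minimal sketch that lists the timestamp of every sample, assuming (as stated above) that the first sample sits at t = 0 and the rest follow at intervals of 1/rate. The function name is illustrative.

```python
from fractions import Fraction

def sample_times(duration_s, rate):
    """Timestamps in seconds of each sample, first sample at t = 0."""
    return [Fraction(n, rate) for n in range(duration_s * rate)]

# The 2 s clip at 3 samples per second from the question:
for t in sample_times(2, 3):
    print(t)
```

For the clip in the question this gives six samples, and the last one lands at 5/3 s, one sampling interval before the 2 s mark.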
Input File : 'book_00396_chp_0024_reader_11416_5_door_Freesound_validated_380721_0-door_Freesound_validated_381380_0-9IfN8dUgGaQ_snr10_fileid_1138.wav'
Channels : 1
Sample Rate : 16000
Precision : 16-bit
Duration : 00:00:30.00 = 480000 samples ~ 2250 CDDA sectors
File Size : 960k
Bit Rate : 256k
Sample Encoding: 16-bit Signed Integer PCM
480000 samples / (16000 samples / second) = 30 seconds exactly. Citing the manual, duration is "Equivalent to number of samples divided by the sample-rate."
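The same division can be done without soxi. Below is a sketch using Python's standard wave module, which reads plain PCM WAV files; duration_seconds is a hypothetical helper name.

```python
import wave

def duration_seconds(path_or_file):
    """Duration of a PCM WAV file: frame count divided by frame rate."""
    with wave.open(path_or_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

For a file like the one above, 480000 frames at 16000 Hz, this returns 30.0.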

Audio data format being rejected in Speech Studio

I'm uploading a zip file of audio data to a Custom Speech project in Speech Studio. However, the files are being rejected after upload.
I've tried sox and ffmpeg to do the file conversion. The output of sox is matching the requirements on the doc pages. I don't understand why the files are being rejected.
sox.exe --i audio1.wav
Input File : 'audio1.wav'
Channels : 1
Sample Rate : 16000
Precision : 16-bit
Duration : 00:00:02.27 = 36320 samples ~ 170.25 CDDA sectors
File Size : 72.7k
Bit Rate : 256k
Sample Encoding: 16-bit Signed Integer PCM
I zip up the file and upload it. I believe this matches with the requirements below.
File format RIFF (WAV)
Sample rate 8,000 Hz or 16,000 Hz
Channels 1 (mono)
Maximum length per audio 2 hours
Sample format PCM, 16-bit
Archive format .zip
Maximum archive size 2 GB
The UI displays "Failed to upload data. Please check your data format and try to upload again."
I can only believe that there's an issue with the service.
I have little experience with sox, but you can use ffmpeg with:
ffmpeg.exe -i <input> -ac 1 -ar 16000 <output>
You can find ffmpeg here: https://www.ffmpeg.org/
It is free.
Hope this helps.
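For checking files locally before zipping them, here is a rough sketch that tests a WAV against the per-file requirements listed in the question (mono, 8000 or 16000 Hz, 16-bit PCM, at most 2 hours). It uses Python's standard wave module, and check_wav is a hypothetical name. Passing these checks does not guarantee that the service accepts the upload: the sketch says nothing about the archive layout or about extra chunks in the file, and the file in the question already meets these criteria according to sox.

```python
import wave

ALLOWED_RATES = {8000, 16000}   # Hz, from the requirements list
MAX_SECONDS = 2 * 60 * 60       # 2 hours

def check_wav(path_or_file):
    """Return a list of problems; an empty list means every check passed."""
    try:
        wav = wave.open(path_or_file, "rb")
    except (wave.Error, EOFError) as exc:
        return [f"not readable as PCM RIFF/WAV: {exc}"]
    problems = []
    with wav:
        rate = wav.getframerate()
        if wav.getnchannels() != 1:
            problems.append(f"channels = {wav.getnchannels()}, expected 1")
        if wav.getsampwidth() != 2:
            problems.append(f"sample width = {8 * wav.getsampwidth()} bits, expected 16")
        if rate not in ALLOWED_RATES:
            problems.append(f"sample rate = {rate} Hz, expected 8000 or 16000")
        elif wav.getnframes() / rate > MAX_SECONDS:
            problems.append("longer than 2 hours")
    return problems
```

Calling check_wav("audio1.wav") on each file before zipping would at least rule out the basic format mismatches.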

What is the difference between these 2 wav files?

I am trying to use a program called arss to create a spectrogram from a wav file. I have 2 wav files, one works and the other does not (it was converted to wav from mp3).
The error that arss throws at me is:
This WAVE file is not currently supported.
Which is fine, but I have no idea what parts of my wav file to change so that it will be supported. The docs don't help here (as far as I can tell)
When I run mediainfo on both wav files, I get the following specs:
working wav:
General
Complete name : working.wav
Format : Wave
File size : 1.15 MiB
Duration : 6 s 306 ms
Overall bit rate mode : Constant
Overall bit rate : 1 536 kb/s
Audio
Format : PCM
Format settings : Little / Signed
Codec ID : 1
Duration : 6 s 306 ms
Bit rate mode : Constant
Bit rate : 1 536 kb/s
Channel(s) : 2 channels
Sampling rate : 48.0 kHz
Bit depth : 16 bits
Stream size : 1.15 MiB (100%)
not working wav:
General
Complete name : not_working.wav
Format : Wave
File size : 5.49 MiB
Duration : 30 s 0 ms
Overall bit rate mode : Constant
Overall bit rate : 1 536 kb/s
Writing application : Lavf57.83.100
Audio
Format : PCM
Format settings : Little / Signed
Codec ID : 1
Duration : 30 s 0 ms
Bit rate mode : Constant
Bit rate : 1 536 kb/s
Channel(s) : 2 channels
Sampling rate : 48.0 kHz
Bit depth : 16 bits
Stream size : 5.49 MiB (100%)
Comparing the audio specs of both files, I can't tell any difference between anything other than the file size and duration. I even updated the Sampling rate of the non-working wav using ffmpeg so that it would match the working one at 48.0kHz, but no luck.
Any idea?
Both wav files are available here.
FFmpeg, by default, writes a LIST chunk, with some metadata, before the data chunk. ARSS has a rigid parser and expects the data chunk to start at a fixed byte offset (0x24). FFmpeg can be told to skip writing the LIST chunk using the bitexact option.
ffmpeg -i not_working.wav -c copy -bitexact new.wav
Note that ARSS doesn't check for sampling rate, only that WAVs have little endian PCM.
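To see where the data chunk sits in a given file, one can walk the RIFF chunk list. The following is a minimal sketch (list_chunks is a hypothetical helper, not part of ARSS or FFmpeg) that reports each chunk's ID, byte offset, and size from the raw bytes of a WAV file.

```python
import struct

def list_chunks(wav_bytes):
    """Return (chunk_id, offset, size) for each top-level RIFF chunk."""
    if wav_bytes[:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks = []
    pos = 12  # skip "RIFF", the 32-bit RIFF size, and "WAVE"
    while pos + 8 <= len(wav_bytes):
        chunk_id, size = struct.unpack_from("<4sI", wav_bytes, pos)
        chunks.append((chunk_id.decode("ascii", "replace"), pos, size))
        pos += 8 + size + (size & 1)  # chunk bodies are padded to even length
    return chunks
```

Fed the contents of a minimal PCM WAV, it should list a "fmt " chunk at offset 12 and the "data" chunk at offset 36 (0x24). If the explanation above is right, a file written by FFmpeg without -bitexact would show a LIST chunk at 0x24 and the data chunk further along.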
Here's a related Q, not quite a duplicate, linked for future readers:
ffmpeg - Making a Clean WAV file
I have an older version of ffmpeg and the -bitexact option alone did not help much. That is, it did remove some of the data from the chunk, but the LIST chunk was still present with other data.
You may also have to ask for no metadata at all with the -map_metadata option, like so:
ffmpeg ... -i <input> ... -flags +bitexact -map_metadata -1 ... <output>
(Each ... stands for other command-line options as required in your case.)
Adding -map_metadata -1 really removed everything, and the LIST chunk is now fully gone.

WAV File missing Audio and Recordings

We use FreePBX to record a conference line. This line appears not to have disconnected, and it created a continuous WAV file 209 hours long.
[matt#ait-debian ~/SLP ]$ mediainfo 7000-7000-always-20170823-162901-1503469728.35757-1503469748.wav
General
Complete name : 7000-7000-always-20170823-162901-1503469728.35757-1503469748.wav
Format : Wave
File size : 11.2 GiB
Duration : 209 h
Overall bit rate mode : Constant
Overall bit rate : 128 kb/s
Audio
Format : PCM
Format settings, Endianness : Little
Format settings, Sign : Signed
Codec ID : 1
Duration : 209 h
Bit rate mode : Constant
Bit rate : 128 kb/s
Channel(s) : 1 channel
Sampling rate : 8 000 Hz
Bit depth : 16 bits
Stream size : 11.2 GiB (100%)
But when I check with sox (Sound eXchange), it shows only 60 hours' worth of audio. VLC shows the same when listening to the file.
[matt#ait-debian ~/SLP ]$ soxi 7000-7000-always-20170823-162901-1503469728.35757-1503469748.wav
Input File : '7000-7000-always-20170823-162901-1503469728.35757-1503469748.wav'
Channels : 1
Sample Rate : 8000
Precision : 16-bit
Duration : 60:22:32.63 = 1738821024 samples ~ 1.63014e+07 CDDA sectors
File Size : 12.1G
Bit Rate : 444k
Sample Encoding: 16-bit Signed Integer PCM
The issue is that some time after the 60 hours, at about the 72-hour mark, another conference call was made, and I need the recording of it.
I would have thought that the conference continued to record, so this audio should have been captured.
The problem is that VLC and SoX don't see it, but mediainfo says there are 209 hours' worth. So which is correct? I would think that VLC and SoX should also show a 209 h duration.
Can anyone help or advise what happened?
I also posted this to reddit - https://www.reddit.com/r/linuxquestions/comments/6xbz8u/wav_file_missing_audio/
And was able to get an answer:
Using Audacity:
Import the WAV file as Raw Data
Set the Encoding to Signed 16-bit PCM
Set the Start Offset to 44 bytes
Set the Sample Rate to 8000 Hz
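The raw import presumably works because it ignores the length recorded in the WAV header and treats everything after the first 44 bytes as samples. Under the same assumptions as the steps above (44-byte header, mono, signed 16-bit, 8000 Hz), the arithmetic can be sketched in Python. Both function names are illustrative, and the file size used is soxi's rounded 12.1G figure, so the result is only approximate.

```python
HEADER_BYTES = 44      # start offset used in the Audacity import
RATE = 8000            # frames per second
BYTES_PER_FRAME = 2    # mono, 16-bit

def raw_duration_hours(file_size_bytes):
    """Playing time implied by the file size alone, ignoring the header."""
    frames = (file_size_bytes - HEADER_BYTES) // BYTES_PER_FRAME
    return frames / RATE / 3600

def byte_offset(hours):
    """Byte position in the file of the sample at the given time."""
    return HEADER_BYTES + int(hours * 3600 * RATE) * BYTES_PER_FRAME

print(raw_duration_hours(12_100_000_000))  # from soxi's rounded file size
print(byte_offset(72))                     # where the 72-hour mark starts
```

The size-based estimate comes out at around 210 hours, in line with mediainfo's 209 h rather than the 60 h shown by soxi and VLC, and byte_offset gives a rough idea of where in the file the call near the 72-hour mark should begin.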

Resources