I was given an uncompressed .wav audio file (360 MB) which seems to be broken. The file was recorded using a small USB recorder (I don't have more information about the recorder at this moment). It was unreadable by any player, and I've tried GSpot (https://www.headbands.com/gspot/) to detect whether it was perhaps in a different format than WAV, but to no avail. The file is big, which hints at it being in some uncompressed format. It is missing the RIFF/WAVE characters at the start of the file though, which could indicate some other format or (more likely in this case) a missing header.
I've tried converting the bytes of the file directly to audio, and this creates a VERY noisy audio file, though voices could be made out. From that I was able to determine that the sample rate was probably 22050 Hz (given a sample size of 8 bits), which gives a file length of about 4 hours and 45 minutes. Running it through some filters in Audition resulted in a file that was understandable in some places, but still way too noisy in others.
Next I tried running the data through some Java code that produces an image out of the bytes. It showed me lots of noise, but also a 3-byte separator every 1024 bytes: first a byte close to either 0 or 255 (but not always), then a byte representing a number distributed somewhere around 25 (with some variation), and then a 00000000 byte (always, 100%). The first 'chunk header' (as I suppose these are) is located 513 bytes into the file, again close to a power of two, like the chunk size. That seems a bit too perfect to be a coincidence, so I'm mentioning it as it could be important. See https://imgur.com/a/sgZ0JFS: the first image is a 1024x1024 rendering of the first 1 MB of the file (row-wise) and the second image shows the distribution of the 3 'chunk header' bytes.
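The same periodic structure can also be found without rendering an image. The following is a minimal Python sketch (the function names, the default period of 1024 and the threshold are illustrative assumptions, not part of the original analysis) that reports which offsets within each period almost always hold a zero byte:

```python
def zero_fraction(data: bytes, period: int, offset: int) -> float:
    """Fraction of positions offset, offset+period, ... that hold a 0x00 byte."""
    positions = range(offset, len(data), period)
    if len(positions) == 0:
        return 0.0
    zeros = sum(1 for i in positions if data[i] == 0)
    return zeros / len(positions)


def find_marker_offsets(data: bytes, period: int = 1024, threshold: float = 0.999):
    """Return every offset within one period whose byte is (almost) always zero."""
    return [off for off in range(min(period, len(data)))
            if zero_fraction(data, period, off) >= threshold]
```

This is a pure-Python loop, so it is best run on the first few megabytes of the file rather than on all 360 MB.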
Next to these headers, the file also has areas that clearly show structure, almost wave-like structures. I suppose this is the actual audio I'm after, but it's riddled with noise: https://imgur.com/a/sgZ0JFS, third image, showing a region of the file with audio structures.
I also created a histogram for the entire file (ignoring the 3-byte 'chunk headers'): https://imgur.com/a/sgZ0JFS, fourth image. I've flipped the lower half of the range as I think audio data should be centered around some mean value, but correct me if I'm wrong. Maybe the non-symmetric nature of the histogram has something to do with signed/unsigned data or two's-complement. Perhaps the data representation is in 8-bit floats or something similar, I don't know.
I've run into a wall now. I have no idea what else I can try. Is there anyone out there who sees something I missed? Perhaps someone can give me some pointers on what else to try. I would really like to extract the audio data from this file, as it contains some important information.
Sorry for the bother. I've been able to track down the owner of the voice recorder and had him record a minute of audio with it and send me that file. I was able to determine the audio was IMA 4-bit ADPCM encoded, 16-bit audio at 48000 Hz. Looking at the structure of the file, I realized that simply placing the header of the good file in front of the data of the bad file should be possible, and lo and behold, I had a working file again :)
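For anyone in the same situation, the header transplant can be sketched roughly as follows. This is a minimal Python sketch under assumptions, not the exact steps used above: `splice_wav_header` is a hypothetical helper, the reference file is assumed to be a regular RIFF/WAVE file, and the broken file is assumed to consist of nothing but the payload that belongs in the `data` chunk.

```python
import struct


def splice_wav_header(good_wav: bytes, raw_payload: bytes) -> bytes:
    """Put everything that precedes the 'data' payload of good_wav in front of raw_payload."""
    if good_wav[:4] != b"RIFF" or good_wav[8:12] != b"WAVE":
        raise ValueError("reference file is not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(good_wav):
        chunk_id = good_wav[pos:pos + 4]
        (chunk_size,) = struct.unpack_from("<I", good_wav, pos + 4)
        if chunk_id == b"data":
            header = bytearray(good_wav[:pos + 8])
            pad = b"\x00" if len(raw_payload) & 1 else b""  # chunks are word-aligned
            # Patch the 'data' chunk size and the overall RIFF size for the new payload.
            struct.pack_into("<I", header, pos + 4, len(raw_payload))
            struct.pack_into("<I", header, 4,
                             len(header) - 8 + len(raw_payload) + len(pad))
            return bytes(header) + raw_payload + pad
        pos += 8 + chunk_size + (chunk_size & 1)  # skip to the next chunk
    raise ValueError("no 'data' chunk found in reference file")
```

Any other chunk in the reference header (for example a `fact` chunk) is copied unchanged, so it still describes the reference recording rather than the repaired one.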
I'm still very much interested in how ADPCM works and whether I can write my own decoder, but that's for another day when I'm strolling through Wikipedia again. Have a great day everyone!
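As a starting point for such a decoder, here is a minimal Python sketch of decoding a single mono IMA ADPCM block. It assumes the block layout used in Microsoft WAV files (a 4-byte block header holding a 16-bit predictor, a step index and a reserved zero byte, followed by 4-bit codes with the low nibble first); whether this recorder uses exactly that layout is an assumption. That layout would at least be consistent with the 3-byte pattern described above: the high byte of a small signed predictor is close to 0 or 255, the step index is a small number, and the reserved byte is always zero.

```python
import struct

INDEX_TABLE = [-1, -1, -1, -1, 2, 4, 6, 8]
STEP_TABLE = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767,
]


def decode_block(block: bytes) -> list[int]:
    """Decode one mono IMA ADPCM block into signed 16-bit sample values."""
    predictor, index, _reserved = struct.unpack_from("<hBB", block, 0)
    index = min(index, 88)
    samples = [predictor]  # the header's predictor is the first sample
    for byte in block[4:]:
        for nibble in (byte & 0x0F, byte >> 4):  # low nibble first
            step = STEP_TABLE[index]
            diff = step >> 3
            if nibble & 1:
                diff += step >> 2
            if nibble & 2:
                diff += step >> 1
            if nibble & 4:
                diff += step
            predictor += -diff if nibble & 8 else diff
            predictor = max(-32768, min(32767, predictor))
            index = max(0, min(88, index + INDEX_TABLE[nibble & 7]))
            samples.append(predictor)
    return samples
```

Each 4-bit code only says how far to move from the previous sample, in units of a step size that grows after large codes and shrinks after small ones, which is why a single wrong byte offset turns the whole stream into noise.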
Related
I am writing my own Opus Ogg writer following these specifications: RFC7845 and RFC3533.
Currently, I am facing an issue that I believe is related to how I am setting the lacing values (segment table).
My current setup is to basically read (using an existing Ogg reader) an Ogg file with a single Opus track and put that Opus track in another Ogg file that I create using my own Ogg writer.
So I have a function that takes the Opus content of each page from the original Ogg file and puts it in pages in my new Ogg file.
I am able to create the file successfully, but when I try playing it in VLC, it shows the correct timestamp but does not play any sound.
I noticed that the issue is caused by the way my segment table (or lacing values) is set.
I am currently creating it by filling each segment with as much data as possible (i.e. 255 bytes), and letting only the last segment have a size < 255. This seems to be the way that other implementations are doing it (see Rust implementation, C implementation).
However, when I inspect the lacing values for a page containing that Opus content in the original Ogg file, it is not filled with 255s. It's another combination of segment sizes that still sums up to the same page size, but that uses more segments (since it's not taking up the max segment size). When I try using the exact segments combination in the original file, the file plays on VLC successfully.
So that makes me conclude that the approach I am taking with creating as many 255-sized segments is incorrect. Does anyone have any idea how to properly set the lacing values?
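One possible explanation, going by RFC 3533: lacing values delimit packets, not just the page payload. A packet is written as a run of 255s terminated by one value below 255 (which may be 0), so a page that carries several Opus packets needs one such run per packet. Building the table from the total payload size makes the whole page look like a single oversized packet. A minimal Python sketch of per-packet lacing (the function names are illustrative, and it assumes every packet starts and ends on the same page):

```python
def lacing_values(packet_size: int) -> list[int]:
    """Lacing values for one complete packet: 255s, then a terminator below 255."""
    full, rest = divmod(packet_size, 255)
    # A packet whose size is a multiple of 255 still needs a terminating 0.
    return [255] * full + [rest]


def segment_table(packet_sizes: list[int]) -> list[int]:
    """Segment table for a page holding the given complete packets, in order."""
    table: list[int] = []
    for size in packet_sizes:
        table.extend(lacing_values(size))
    if len(table) > 255:
        raise ValueError("too many segments for a single Ogg page")
    return table
```

For example, packets of 100, 255 and 3 bytes give the table 100, 255, 0, 3 rather than 255, 103. This would also explain why copying the original page's lacing values makes the file play.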
The most common way an MP3 file ends up corrupted is when it has only been partially uploaded to the server. In this case, the indicated audio duration doesn't correspond to what is really in the MP3 file: we can hear the beginning, but at some point playback stops and the duration shown by the audio player is wrong.
I tried libraries like node-ffprobe, but it seems they just read metadata without comparing it with the real audio data in the file. Is there a way to efficiently detect a corrupted or incomplete MP3 file from Node.js?
Note: the client uploading the MP3 files is a hardware device (an audio recorder) that uploads files to an FTP server, not a browser. So I'm not able to send potentially more useful data from the client.
MP3 files don't normally have a duration. They're just a series of MPEG frames. Sometimes, there is an ID3 tag indicating duration, but not always.
Players can determine duration by choosing one of a few methods:
Decode the entire audio file. This is the slowest method, but if you're going to decode the file anyway, you might as well go this route as it gives you an exact duration.
Read the whole file, skimming through frame headers. You'll have to read the whole file from disk, but you won't have to decode it. Can be slow if I/O is slow, but gives you an exact duration.
Read the first frame's bitrate and estimate duration by file size. Definitely the fastest method, and the one most commonly used by players. Duration is an estimate only, and is reasonably accurate for CBR, but can be wildly inaccurate for VBR.
What I'm getting at is that these files might not actually be broken. They might just be VBR files that your player doesn't know the duration of.
If you're convinced they are broken (such as stopping in the middle of content), then you'll have to figure out how you want to handle it. There are probably only a couple ways to determine this:
Ideally, there's an ID3 tag indicating duration, and you can decode the whole file and determine its real duration to compare.
Usually, that ID3 tag won't exist, so you'll have to check to see if the last frame is complete or not.
Beyond that, you don't really have a good way of knowing if the stream is incomplete, since there is no outer container that actually specifies number of frames to expect.
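Checking whether the last frame is complete amounts to skimming frame headers. The sketch below is in Python rather than Node.js, and it is deliberately limited: it assumes MPEG-1 Layer III frames only, no free-format bitrate, no junk between frames, and an ID3v2 tag without a footer. The function names are illustrative.

```python
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128,
                 160, 192, 224, 256, 320, None]   # MPEG-1 Layer III
SAMPLE_RATES = [44100, 48000, 32000, None]        # MPEG-1


def skip_id3v2(data: bytes) -> int:
    """Return the offset just past a leading ID3v2 tag, or 0 if there is none."""
    if len(data) >= 10 and data[:3] == b"ID3":
        # The tag size is a 28-bit "synchsafe" integer (7 bits per byte).
        size = (data[6] << 21) | (data[7] << 14) | (data[8] << 7) | data[9]
        return 10 + size
    return 0


def scan_frames(data: bytes):
    """Walk the frames; return (complete_frames, ended_cleanly)."""
    pos = skip_id3v2(data)
    frames = 0
    while pos < len(data):
        if data[pos:pos + 3] == b"TAG":       # trailing ID3v1 tag
            return frames, True
        if pos + 4 > len(data):
            return frames, False              # truncated inside a header
        b0, b1, b2 = data[pos], data[pos + 1], data[pos + 2]
        if b0 != 0xFF or (b1 & 0xFE) != 0xFA:
            return frames, False              # not an MPEG-1 Layer III header
        bitrate = BITRATES_KBPS[b2 >> 4]
        sample_rate = SAMPLE_RATES[(b2 >> 2) & 0x03]
        if bitrate is None or sample_rate is None:
            return frames, False
        padding = (b2 >> 1) & 0x01
        frame_len = 144 * bitrate * 1000 // sample_rate + padding
        if pos + frame_len > len(data):
            return frames, False              # last frame is cut short
        pos += frame_len
        frames += 1
    return frames, True
```

If `ended_cleanly` is false, the file either stops in the middle of a frame or contains data this sketch does not understand. Since each MPEG-1 Layer III frame holds 1152 samples, `complete_frames * 1152 / sample_rate` also gives the exact duration of what was actually received.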
The expression for calculating the filesize of an mp3 based on duration and encoding (from this answer) is quite simple:
x = length of song in seconds
y = bitrate in kilobits per second
(x * y) / 8 = filesize (KB), since there are 8 bits in a byte
(x * y) / 8 / 1024 = filesize (MB)
There is also a javascript implementation for the Web Audio API in another answer on that same question. Perhaps that would be useful in your Node implementation.
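As a quick worked example of that arithmetic, here is a minimal Python sketch (function names are illustrative). Kilobits are divided by 8 to get kilobytes, then by 1024 to get megabytes; the result is only meaningful for constant-bitrate files and ignores tags.

```python
def mp3_size_mb(duration_s: float, bitrate_kbps: float) -> float:
    """Expected size in MB of a CBR MP3 with the given duration and bitrate."""
    kilobytes = duration_s * bitrate_kbps / 8   # kilobits -> kilobytes
    return kilobytes / 1024                     # kilobytes -> megabytes


def estimated_duration_s(size_mb: float, bitrate_kbps: float) -> float:
    """Inverse of mp3_size_mb: the duration a CBR file of this size should have."""
    return size_mb * 1024 * 8 / bitrate_kbps
```

A three-minute song at 128 kbps comes out at `mp3_size_mb(180, 128)`, which is 2.8125 MB. Comparing the duration a file should have with the duration implied by its actual size is a cheap first check for a truncated CBR upload.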
mp3diags is an older piece of open source software for fixing MP3s, and it was great for batch processing stuff like this. The source is C++ and still available if you're feeling nosy and want to see how some of these features are implemented.
Worth a look since it has some features that might be useful in your context:
What is MP3 Diags and what does it do?
low quality audio
missing VBR header
missing normalization data
Correcting files that show incorrect song duration
Correcting files in which the player cannot seek correctly
I receive data from a Kinect v2, which is (I believe, information is hard to find) 16kHz mono audio in 32-bit floating point PCM. The data arrives in up to 4 "SubFrames", which contain 256 samples each.
When I send this data to lame.exe with -r -s 16 --bitwidth 32 -m m I get an output containing gaps (supposedly where the second channel should be). These command line switches should however take stereo and downmix it to mono.
I've also tried importing the raw data into Audacity, but I still can't figure out the correct way to get continuous audio out of it.
EDIT: I can get continuous audio when I only save the first SubFrame. The audio still doesn't sound right though.
In the end I went with Ogg Vorbis. A free format, so no problems there either. I use the following command line switches for oggenc2.exe:
oggenc2.exe --raw-format=3 --raw-chan=1 --raw-rate=16000 - --output=[filename]
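For checking the raw data independently of any encoder, it can help to convert it to a plain 16-bit WAV first. The following is a minimal Python sketch under the assumptions stated in the question (16 kHz, mono, 32-bit little-endian float samples) and the further assumption that each SubFrame buffer holds exactly 256 valid samples; the function names are illustrative.

```python
import array
import sys
import wave


def float32_to_int16(raw: bytes) -> bytes:
    """Convert little-endian float32 samples in [-1, 1] to little-endian int16 PCM."""
    samples = array.array("f")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()          # the input is assumed to be little-endian
    pcm = array.array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()              # WAV data is little-endian
    return pcm.tobytes()


def subframes_to_wav(subframes, path, sample_rate=16000):
    """Concatenate the sub-frame buffers in order and write them as a mono WAV file."""
    pcm = b"".join(float32_to_int16(chunk) for chunk in subframes)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return len(pcm) // 2            # number of samples written
```

If the resulting WAV has gaps, the problem is in how the sub-frames are collected rather than in the encoder settings.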
I've written an MP4 parser that can read the atoms in an MP4 just fine and stitch them back together. The result is a technically valid MP4 file that QuickTime can open, but it can't play any audio, as I believe the timing/sampling information is all off. I should probably mention I'm only interested in audio.
What I'm doing is trying to take the moov atoms/etc from an existing MP4, and then take only a subset of the mdat atom in the file to create a new, smaller MP4. In doing so I've altered the duration in the mvhd atom, as well as the duration in the mdia header. There are no tkhd atoms in this file that have edits, so I believe I don't need to alter the durations there - what am I missing?
In creating the new MP4 I'm properly sectioning the mdat block with a wide box, and keeping the 'mdat' header/size in their right places - I make sure to update the size with the new content.
Now it's entirely 110% possible I'm missing something crucial about the format, but if this is possible I'd love to get the final piece. Anybody got any input/ideas?
Code can be found at the following link:
https://gist.github.com/ryanmcgrath/958c602cff133bd7fa0b
I'm going to take a stab in the dark here and say that you're not updating your stbl offsets properly. At least I didn't (at first glance) see your Python doing that anywhere.
STSC
Let's start with the location of data. Packets are written into the file in chunks, and the header tells the decoder where each "block" of these chunks exists. The stsc table says how many samples each chunk contains. The first-chunk field of an entry says at which chunk that entry starts to apply. It's a little confusing, but look at my example. This is saying that you have 100 samples per chunk, up to the 8th chunk. At the 8th chunk there are 98 samples.
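To make the run-length nature of stsc concrete, here is a minimal Python sketch that expands entries into a per-chunk sample count. The entries are given as (first_chunk, samples_per_chunk) pairs with 1-based chunk numbers; the sample description index that each real stsc entry also carries is left out, and the function name is illustrative.

```python
def samples_per_chunk(stsc_entries, total_chunks):
    """Expand (first_chunk, samples_per_chunk) runs into one count per chunk."""
    counts = []
    for i, (first_chunk, per_chunk) in enumerate(stsc_entries):
        if i + 1 < len(stsc_entries):
            end = stsc_entries[i + 1][0]    # this run stops where the next begins
        else:
            end = total_chunks + 1          # the last run covers the remaining chunks
        counts.extend([per_chunk] * (end - first_chunk))
    return counts
```

With the values described above, `samples_per_chunk([(1, 100), (8, 98)], 8)` yields seven chunks of 100 samples followed by one chunk of 98. The total number of chunks comes from the stco table.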
STCO
That said, you also have to track where the offsets of these chunks are. That's the job of the stco table. So, where in the file is chunk offset 1, or chunk offset 2, etc.
If you modify any data in mdat you have to maintain these tables. You can't just chop mdat data out, and expect the decoder to know what to do.
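As a minimal illustration of maintaining stco, the Python sketch below shifts every chunk offset by a fixed number of bytes, which is what is needed when the data in front of the retained chunks grows or shrinks. It operates on the stco box body (everything after the 8-byte size/type header) and assumes 32-bit offsets; `shift_stco` is a hypothetical helper, not something from the linked gist.

```python
import struct


def shift_stco(payload: bytes, delta: int) -> bytes:
    """Return a new stco box body with every chunk offset moved by delta bytes."""
    version_flags = payload[:4]
    (count,) = struct.unpack_from(">I", payload, 4)
    offsets = struct.unpack_from(f">{count}I", payload, 8)
    shifted = [off + delta for off in offsets]
    if any(off < 0 or off > 0xFFFFFFFF for off in shifted):
        raise ValueError("shifted offset does not fit in 32 bits")
    return version_flags + struct.pack(f">I{count}I", count, *shifted)
```

When whole chunks are cut out of mdat, their entries also have to be removed from stco, and the stsc, stsz and stts tables have to be trimmed to match.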
As if this wasn't enough, you now also have to maintain the sample time table (stts), the sample size table (stsz) and, if this were video, the sync sample table (stss).
STTS
stts says how long a sample should play for in units of the timescale. If you're doing audio the timescale is probably 44100 or 48000 (Hz).
If you've lopped off some data, now everything could potentially be out of sync. If all the values here have the exact same duration though you'd be OK.
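The track duration implied by stts is just the sum of count times delta over all entries. A minimal Python sketch (the function name and the example values are illustrative assumptions):

```python
def stts_duration(entries, timescale):
    """entries: (sample_count, sample_delta) pairs; returns (ticks, seconds)."""
    ticks = sum(count * delta for count, delta in entries)
    return ticks, ticks / timescale
```

For example, 798 audio frames of 1024 ticks each at a timescale of 44100 give 817152 ticks, or about 18.5 seconds. After cutting samples out of mdat, the sample counts here have to be reduced accordingly, and the resulting total is the value to write into the duration field of the media header.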
STSZ
stsz says what size each sample is in bytes. This is important for the decoder to be able to start at a chunk, and then go through each sample by its size.
Again, if all the sample sizes are exactly the same you'd be OK. Audio tends to be pretty much the same, but video stuff varies a lot (with keyframes and whatnot)
STSS
And last but not least we have the stss table, which says which frames are keyframes. I only have experience with AAC, but every audio frame is considered a keyframe. In that case you can have one entry that describes all the packets.
In relation to your original question, the time display isn't always honored the same way in each player. The most accurate way is to sum up the durations of all the frames in the header and use that as the total time. Other players use the metadata in the track headers. I've found it best to just keep all the values the same and then players are happy.
If you're doing all that and I missed it in the script then can you post a sample mp4 and a standalone app and I can try to help you out.
My code using NAudio to read one particular MP3 gets different results than several other commercial apps.
Specifically: My NAudio-based code finds ~1.4 sec of silence at the beginning of this MP3 before "audible audio" (a drum pickup) starts, whereas other apps (Windows Media Player, RealPlayer, WavePad) show ~2.5 sec of silence before that same drum pickup.
The particular MP3 is "Like A Rolling Stone" downloaded from Amazon.com. Tested several other MP3s and none show any similar difference between my code and other apps. Most MP3s don't start with such a long silence so I suspect that's the source of the difference.
Debugging problems:
I can't actually find a way to even prove that the other apps are right and NAudio/me is wrong, i.e. to compare block-by-block my code's results to a "known good reference implementation"; therefore I can't even precisely define the "error" I need to debug.
Since my code reads thousands of samples during those 1.4 sec with no obvious errors, I can't think how to narrow down where/when in the input stream to look for a bug.
The heart of the NAudio code is a P/Invoke call to acmStreamConvert(), which is a Windows "black box" call which I can't think how to error-check.
Can anyone think of any tricks/techniques to debug this?
The NAudio ACM code was never originally intended for MP3s, but for decoding constant bit rate telephony codecs. One day I tried setting up the WaveFormat to specify MP3 as an experiment, and what came out sounded good enough. However, I have always felt a bit nervous about decoding MP3s (especially VBR) with ACM (e.g. what comes out if ID3 tags or album art get passed in - could that account for extra silence?), and I've never been 100% convinced that NAudio does it right - there is very little documentation on how exactly you are supposed to use the ACM codecs. Sadly there is no managed MP3 decoder with a license I can use in NAudio, so ACM remains the only option for the time being.
I'm not sure what approach other media players take to playing back MP3, but I suspect many of them have their own built-in MP3 decoders, rather than relying on the operating system.
I've found some partial answers to my own Q:
Since my problem boils down to consuming too much MP3 w/o producing enough PCM, I used conditional-on-hit-count breakpoints to find just where this was happening, then drilled into that.
This showed me that some acmStreamConvert() calls are returning success, consuming 417 src bytes, but producing 0 "dest bytes used".
Next I plan to try acmStreamSize() to ask the codec how many src bytes it "wants" to consume, rather than "telling" it to consume 417.
Edit (followup): I fixed it!
It came down to passing acmStreamConvert() enough src bytes to make it happy. Giving it its acmStreamSize() requested size fixed the problem in some places but then it popped up in others; giving it its requested size times 3 seems to cure the "0 dest bytes used" result in all MP3s I've tested.
With this fix, acmStreamConvert() then sometimes returned much larger converted chunks (almost 32 KB), so I also had to modify some other NAudio code to pass in larger destination buffers to hold the results.