Differences in WAV format (JS / NodeJS) - node.js

I'm trying to use WebRTC to record audio and then store it on the server side. My server is made using NodeJS and express, and I'm using POST to transmit the data from the client to the server.
On the client I convert the WAV Blob to base64 and transfer that; on the server side I read it, convert it back to binary, and write it to a file. Should be fine, right?
There's just one problem: I'm getting some really bad inconsistencies between what can be downloaded on the client and what ends up on the server. Sometimes bytes are added, other times chunks of data are simply missing. If it were only added bytes, that would point to a charset problem (converting from one encoding to another, and then another, etc.), but at one point, for example, 280 bytes were added.
I've added a picture of a hex diff here:
http://i.stack.imgur.com/psqf4.png (sorry, I don't have enough reputation so far to post an image directly)
Also, running file on these gives me the following:
(uuid.wav is the server one, while output (1).wav is the client one)
9F2B75D3-4C34-4C8F-935E-FC7637D7A054.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 4 bit, stereo 11321924 Hz
output (1).wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, stereo 44100 Hz
... so clearly something is going wrong here. Also, trying to fix the headers or convert the WAV gives me an error along the lines of: could not find data chunk / data chunk has size 0.
Any ideas what might be causing this?

This looks suspiciously like some layer of code is attempting to convert the binary data to Unicode. 0x44 0xAC (0xAC44 read little-endian, i.e. 44100, the 44.1 kHz sample rate field) is turning into 0x44 0xC2 0xAC. Read back little-endian, that becomes 0x00ACC244, which is 11321924 Hz, matching what file reported for the corrupted file.
Those 0xC2 additions definitely look like Unicode (UTF-8) artifacts. I don't know exactly which data types and functions you are using, but you will need to audit the steps to make sure none of them attempt to do implicit Unicode conversions.
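For what it's worth, here is a minimal Node/Express sketch of one way to keep the bytes binary end to end; the route, field name and file name are placeholders, not the asker's actual code:

// server.js -- hedged sketch: assumes the client POSTs JSON like { audio: "<base64>" }
const express = require('express');
const fs = require('fs');
const app = express();

// Raise the body limit; a WAV blob encoded as base64 easily exceeds the 100kb default.
app.use(express.json({ limit: '50mb' }));

app.post('/upload', (req, res) => {
  // If the client used FileReader.readAsDataURL, strip the "data:audio/wav;base64," prefix first.
  const base64 = req.body.audio.replace(/^data:.*;base64,/, '');
  // Decode straight into a Buffer and write that Buffer. Never take a detour through a
  // JS string with a text encoding ('utf8', 'ascii'); that is where the 0xC2 bytes sneak in.
  const wavBytes = Buffer.from(base64, 'base64');
  fs.writeFile('upload.wav', wavBytes, (err) => {
    if (err) return res.sendStatus(500);
    res.sendStatus(200);
  });
});

app.listen(3000);

If you would rather skip base64 entirely, you can POST the Blob itself with a Content-Type of application/octet-stream and read it on the server with express.raw(), which hands you a Buffer on req.body with no string step at all.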

Related

Is it possible to splice advertisements or messages dynamically into an MP3 file via a standard GET request?

Say you have an MP3 file and it's 60,000,000 bytes, and you also have an MP3 advertisement that's 500,000 bytes, both encoded at the same bit rate.
Would it be possible using an nginx or apache module to change the MP3 "Content-Length" header value to 60,500,000 and then control the incoming "Content-Range" requests so the first 500,000 bytes return the advertisement audio, and any range request greater than 500,000 begins returning the regular audio file with a 500,000 byte offset?
Or is it only possible to splice advertisements (or messages) into an MP3 file using an application such as FFmpeg to re-render the entire file?
Apologies if this is a stupid question, I'm just trying to think outside of the box.
You cannot arbitrarily splice MP3 without artifacts and decoder errors.
You also generally cannot cut/splice MP3 on frame boundaries due to the bit reservoir. Basically, a particular MP3 frame may contain data from another frame so as to use the available bandwidth more efficiently when it's needed. Ignoring the bit reservoir can also cause artifacts and/or decoder errors.
What you can do is re-encode your advertisement and eventually re-join the stream. That is, at the point of ad insertion, decode the stream to PCM, mix in (or replace the audio with) your ad, and re-encode this parallel stream back to MP3. If the encoding parameters are the same, then eventually (after a couple of extra MP3 frames) you'll have identical bitstreams, and you can go back to reading the stream from the same buffer.
If you're doing this for ad insertion on internet radio (live) streams, keep in mind that you'll have to do this on the server for every client (or at least for each ad variant and timing variant). If this is for podcasts or other pre-recorded content, I'd recommend the FFmpeg route. You won't have to build anything, you can stream and cache the output as it's being encoded, and you'll have compatibility with other codecs without writing one-off code for each codec/container.
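For the pre-recorded case, a rough sketch of the FFmpeg route from Node (file names and bitrate are placeholders; both inputs are decoded and re-encoded into one continuous MP3, so there are no frame-boundary issues, assuming both inputs share the same sample rate and channel layout):

// splice-ad.js -- hedged sketch: prepend ad.mp3 to episode.mp3 by re-encoding.
const { spawn } = require('child_process');

const args = [
  '-i', 'ad.mp3',
  '-i', 'episode.mp3',
  // concat filter: 2 inputs, 0 video streams, 1 audio stream out
  '-filter_complex', '[0:a][1:a]concat=n=2:v=0:a=1[out]',
  '-map', '[out]',
  '-b:a', '128k', // match the bitrate of the original file
  'episode-with-ad.mp3',
];

spawn('ffmpeg', args, { stdio: 'inherit' })
  .on('close', (code) => console.log('ffmpeg exited with code', code));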

Decoding incomplete audio file

I was given an uncompressed .wav audio file (360 MB) which seems to be broken. The file was recorded using a small USB recorder (I don't have more information about the recorder at the moment). It was unreadable by any player, and I tried GSpot (https://www.headbands.com/gspot/) to detect whether it was perhaps a different format than WAV, but to no avail. The file is big, which hints at it being some uncompressed format. It is missing the RIFF-WAVE characters at the start of the file, though, which could indicate it is some other format or, perhaps more likely in this case, that the header is missing.
I tried converting the bytes of the file directly to audio, and this produces a VERY noisy audio file, though voices can be made out, and I was able to determine that the sample rate is probably 22050 Hz (given a sample size of 8 bits) and the file length about 4 hours and 45 minutes. Running it through some filters in Audition resulted in a file that was understandable in some places, but still way too noisy in others.
Next I ran the data through some Java code that produces an image out of the bytes, and it showed me lots of noise, but also 3-byte separators every 1024 bytes: first a byte close to either 0 or 255 (but not 100% of the time), then a byte representing a number distributed somewhere around 25 (with some variation), and then 00000000 (always, 100%). The first 'chunk header' (as I suppose these are) is located 513 bytes into the file, again close to a power of two, like the chunk size. Seems a bit too perfect to be coincidence, so I'm mentioning it as it could be important. https://imgur.com/a/sgZ0JFS: the first image shows a 1024x1024 image of the first 1 MB of the file (row-wise), and the second image shows the distribution of the 3 'chunk header' bytes.
Next to these headers, the file also has areas that clearly show structure, almost wave-like structures. I suppose this is the actual audio I'm after, but it's riddled with noise: https://imgur.com/a/sgZ0JFS, third image, showing a region of the file with audio structures.
I also created a histogram for the entire file (ignoring the 3-byte 'chunk headers'): https://imgur.com/a/sgZ0JFS, fourth image. I've flipped the lower half of the range as I think audio data should be centered around some mean value, but correct me if I'm wrong. Maybe the non-symmetric nature of the histogram has something to do with signed/unsigned data or two's-complement. Perhaps the data representation is in 8-bit floats or something similar, I don't know.
I've run into a wall now and have no idea what else to try. Is there anyone out there who sees something I missed? Perhaps someone can give me some pointers on what else to try. I would really like to extract the audio data from this file, as it contains some important information.
Sorry for the bother. I was able to track down the owner of the voice recorder and had him record a minute of audio with it and send me that file. I was able to determine the audio was IMA 4-bit ADPCM encoded, 16-bit audio at 48000 Hz. Looking at the structure of the file I realized that simply placing the header of the good file in front of the data of the bad file should be possible, and lo and behold, I had a working file again :)
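In case anyone needs to repeat the trick, a rough Node.js sketch of that header transplant (file names are placeholders; it copies everything up to and including the 'data' chunk header from the good file, appends the broken file's bytes as the data payload, and patches the two size fields):

// fix-wav.js -- hedged sketch of the "borrow a good header" repair.
const fs = require('fs');

const good = fs.readFileSync('good.wav');     // short recording from the same device
const broken = fs.readFileSync('broken.wav'); // headerless data from the damaged file

// Find the 'data' chunk in the good file; the header template ends 8 bytes after
// it ('data' plus a 4-byte size), and everything before that gets reused.
const dataPos = good.indexOf('data', 12, 'ascii');
if (dataPos < 0) throw new Error("no 'data' chunk found in the good file");

const header = Buffer.from(good.subarray(0, dataPos + 8));
header.writeUInt32LE(broken.length, dataPos + 4);            // data chunk size
header.writeUInt32LE(header.length + broken.length - 8, 4);  // RIFF chunk size
// Note: a 'fact' chunk (if present) still carries the good file's sample count;
// most players cope, but it can be patched the same way if needed.

fs.writeFileSync('repaired.wav', Buffer.concat([header, broken]));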
I'm still very much interested in how that ADPCM works and whether I can write my own decoder, but that's for another day when I'm strolling through Wikipedia again. Have a great day everyone!

Correct way to encode Kinect audio with lame.exe

I receive data from a Kinect v2, which is (I believe; information is hard to find) 16 kHz mono audio in 32-bit floating-point PCM. The data arrives in up to 4 "SubFrames", which contain 256 samples each.
When I send this data to lame.exe with -r -s 16 --bitwidth 32 -m m, I get output containing gaps (supposedly where the second channel should be). These command-line switches should, however, take stereo input and downmix it to mono.
I've also tried importing the raw data into Audacity, but I still can't figure out the correct way to get continuous audio out of it.
EDIT: I can get continuous audio when I only save the first SubFrame. The audio still doesn't sound right though.
In the end I went with Ogg Vorbis. A free format, so no problems there either. I use the following command line switches for oggenc2.exe:
oggenc2.exe --raw-format=3 --raw-chan=1 --raw-rate=16000 - --output=[filename]
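For completeness, a rough Node.js sketch of feeding the raw Kinect samples to oggenc2 over stdin with exactly those switches (writeSubFrame and the output name are placeholders; how you actually pull the 256-sample float subframes out of the Kinect SDK is up to you):

// encode-kinect.js -- hedged sketch: pipe raw mono float PCM at 16 kHz into oggenc2.
const { spawn } = require('child_process');

const enc = spawn('oggenc2.exe', [
  '--raw-format=3',   // raw float samples, per the command line above
  '--raw-chan=1',     // mono
  '--raw-rate=16000', // 16 kHz
  '-',                // read the raw PCM from stdin
  '--output=kinect.ogg',
]);

// Hypothetical entry point: each subframe is a Float32Array of 256 samples.
function writeSubFrame(samples) {
  // Write the samples back to back as little-endian float32, with no padding and no
  // second channel; inserting anything per sample is the likely source of the gaps
  // described above.
  enc.stdin.write(Buffer.from(samples.buffer, samples.byteOffset, samples.byteLength));
}

// ... call writeSubFrame() for every subframe of every audio frame, then:
// enc.stdin.end();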

MJPEG data doesn't have SOI marker. What kind of mjpeg I'm dealing with?

I'm writing a simple MJPEG streamer in Java.
Since it has a very specific application, I had to reverse engineer the RTP packets sent by GStreamer using the Wireshark sniffer. GStreamer, in turn, just restreams an MJPEG AVI file taken from a specific camera.
The first and most straightforward approach is to prepare the packets as an array of data and send them to the consumer, in this case VLC, which uses the live555 library. I'm populating the RTP headers correctly, and the internal payload is hardcoded; in my experiment VLC successfully displays the encoded screen, i.e. it decodes the hardcoded RTP payload fine.
So, this is the payload of the first two packets (out of 25 in total) in hex view:
String[] hexArray =
{
"0000000000ffa05a00000080100b0c0e0c0a100e0d0e1211101318281a181616183123251d283a333d3c3933383740485c4e404457453738506d51575f626768673e4d71797064785c6567631112121815182f1a1a2f634238426363636363636363636363636363636363636363636363636363636363636363636363636363636363636363636363636363ad8a5a00296801692800a5ed40052d001450014b40052f5a004dbce694e68000c7d29435002e452e68017a8a2800c51400518c500141a004c5348a0031498a005a2800a2800a2800a28014514001a2800a51400b450014500251400941a004a2800a5a004c668a004a5a00292800a5a0029734006697340052d0019a5cd004d15c4917dd6e3d0d5d8751523f7831ee2802d32c3728090187622aacba79eb1367d8d00547478db0ea54fbd20a0008a4e94005140052e6801411de9719a00694c8e29bb4d001d29680145385002629306800e476a379e38a00a9711bc97d0b80762839fd2926b677ba128e9b81340178e49a4a0028a004a2800a2801ebcd0450018a422801299707103ffba6802b45129d1d1d87ce4819a6c56ed2c45d0f4ed4010c91329c329151b479a008da32298548a004a334012a4acbdea64ba1d186280275955ba1a75002e714e0fc608cd002ec46e876d4b1d94afca1523d734012fd825c7381504700695e3923f987dd6cf7a009134fee5003f5a71b50a083b4714019de63472ba30fba719f5a956e79a0099270dc77a833feb1cf6a0087c3d8315c49d8c86b54d001d39a963b9963e8d91e86802c2df82312c60fd283159ce3e5f9188fa50042fa5b6331386fad5592d668cfcc87ea28021a280108a4c500262931400845215a00694cd31d38a00a8edf3800f5ab7167cb258f4a004a4ed4007bd283400b450014b40051400b4500140a005cd1400b45002d2e2801368349b481807a50028c8a5dd8eb400bb8529a000506800a2800a280034dc50018a4a0028a0028a0028a00296800a2800a05002d1400b4500145002628c5001494009de96800a280128a00292800a2800a5a0028068017345002e696800a5cd00392578db2848abb0ea2540120dc3d6802ec734370080437b1a8a5b147c98ced3e9da8029c904917df5e3d474a8e801297140094b400519a007034bd6801a541a69522800cfad2e680141a703400b46280136d2e280168c0228010ad26d34006293140062931400e14fc50021146df949c8a006e3bd57bdff8f57f718a0046f9747b65f5c1a92c86db66f76a0091803503db46dd06d3ed4015ded9d738f9876aaee9ce08c1a0089a3a8ca500260d1400a1b078a952775ef9a009d2e54f0dc54c1c1e86801c0d3d6464395623e940121bb988ff00586985d8924b1c9a008e57900cef6fceab34ae5c61cf1d6802c8903f2c050608dcf1c6680226b778db767a75fa5131db67330e323028022d29bc8d3e3ec5f923deafa5c83de80255954f7a76e1400b45003d26923395722ac25fb8e1d430a007f99693ff00ac4009f6a6369b138cc52fe7401564b09d3f8770f6aaecaca70ca41f7a0069146280022931400dc5262802bc96b193bb6f23de95b88cfd2800a3b5001411400500e68014519a00052d002f4a2800a2800a05002d2d0019a5cd002e68a005a050018e68c500041ed4738a003340606801720d2d001450036908c50025140051400514001a5a0028a002945002d250014b4005140052d002518a004c5140084514005140098a08a0028a00292800a280168a005a33400b9a5a00296801c188390715662be963c03f30f7a00bd15e4528c13b4fa1a596d62979c6d6f51401525b2953ee8de3d4557f62306800a4c50014a680014b400669d9a000804629bb3d2801304500f3400b9a706a00506945001462800a28016930280108a4c50018a70340013450004554d40edb5ff810fe74005e7cb6b6d1f60b9a9edc62", //1
"000004ec00ffa05ad23f53c9a0075260500211cd35a35718619a00aef66adf74e2abc96cebd5723da802131d4663a00615c52722800cd2ab9071934013a5cb0ebcd4cb72a7af140130604706968003cd44f0ab1c8e0d0048221b786e7de8daebd680012718a64f189ad9a20719e45003e2b74fb3c48c7e64183ef4c7b47ce55b1400d2658f191d0734b1dd6680275ba1532cc08eb400f0c294e2800a504af20e0d004e97b3271907eb538bc8a5189a3feb40086d6d67e51b6fb0a824d324519460d40151e0963fbc845474005262801920e2abce76c4dec2801d4940051d4d0021eb4a38a00297140074a5068003d68cd0014a38140051400b466800a5a0029680168cd0028a075a005ef4668014e29314009b79cd1839ce6800e7346ef6a000114b40084669b834008296800068a005a4a00296800a5a0028a0031450014a0d002d25002d140098a280131477a004a2800a280131462800a2800a2800a2800a2800a5cd002d2d0014b9a00506a68aea48beeb71e86802f43a8231c48369f5ed561922b850480c3b11401525b060731b6e1e86aaba321c3020fbd00251400519a005a41400a0d381a005e0d26d1400d2b8a43c50000d3b3400a1a9c0d0014500145001d697140098a5071400871da9a3ad002e6a9ea4730aafabaff3a005d48fef2251d92acc38fb3463da801d8a154b1c0ea68002b8241ea28c500376d211400c7851faad5692cffb87f034015e48193ef2d4263a00618e98460d001466801eb2329e0d4eb72470c280265995bbd48083400b9a5dc68011f0c3a60fad33045003be6039a72c8477a0078914f0c3229ad04321ce3140113da1504a1a882cabfc3400e5b82ad86e2a64ba05b19e940132ce0f7a915c3500381a5a004a952e254fbae7f1a00b09a81e049183f4a5c594e390149fc280227d3323314991ef55a4b39e3ce53207714015241818354af8e2dcfb9a009e92800029318a004ef4b400519cd002d038a005cd140052d00140a0028a00052e79a005a280169450014b40051400b4b400521a000d2e280108a4d9ce6800c11df349c8eb40099e69280168a005a4a005a3b500252d0014b4005140051400b40a005a28016931400946280128c500149400514001a4a0028a0028a004a280168a005a2801734b9a00296800cd491c8f19ca311401761d408e2519f7156d648a71c10d9ed4010cb60a7988edf6354e58258befaf1ea3a50047450019a5cd0019a280141a50d400b9a0806801a57d29bcd00283466801c1a9dba801783498a0028068017349400628228012a95f73244bfed668013516cdd7d302ae8e2341ed40052e6800dc73934a2800146280108cd3769a0031eb51496d1b83c60fb500577b361f77e6aaaf110704608a008da3a618e801b82282680141c53d6565e8680274b9fef0a996556e86801f9a5a0050d8a3e461c8e7d6800f2f8e0e693e65eb400a1cf4cd49bc746140018e293d326a07b21bb2a714010b452a1c0e68499875c8a009d6e7d4d4cb7008e680241203de9fb85002f1498e6801caee872ac454e97d2a8c361beb4014f559567f28c636b03f37b8ac7d41c2f94a7a33806802d5140094b400d22968010d274a005cd2d0014a0d0014500145002d1400b45001d296800ef477a005a51400a0f34b40051400b40e7ad0018e694f4a004a05002d21140098a42b91400d2b4878a00514b4001a4a005a280168a0028a0028a00296800cd19a005a33400b450025262801283400521a0028a0028a004a2800a280128a002945002d19a005a01a005a5a0001a72b10410706802e417ee9c3fcc3f5abb0dd4530c6707d0d0012da452f206d3ea2a9cb672c63206f1ea2802be292800a2800a5a0039a5cd0019a5a00695f4a6f22800cd2eea0050d4e0d400bba8cd0019a5cd002e68a004c550ba39d42dd3d89fa7228021ba7dd76deed5a8dc1c7b500369680128cd002834b", //2
According to the spec, the first bytes of the first packet are:
0000000000ffa05a
These are the Type-specific, Fragment Offset, Type, Q, Width, Height, MBZ, Precision, and Length fields.
The data that follows, starting with 0000008010 and ending with 636363, is the quantization table, 128 bytes long.
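To make the layout concrete, here is a small sketch (assuming this is the RFC 2435 RTP/JPEG payload format, which is what those field names correspond to, and taking hexArray[0] to be the first hex string quoted above) that pulls out the fixed fields:

// parse-rtpjpeg.js -- hedged sketch: decode the RFC 2435 main header and the
// quantization table header from the first payload quoted above.
const payload = Buffer.from(hexArray[0], 'hex');

const header = {
  typeSpecific: payload.readUInt8(0),
  fragmentOffset: payload.readUIntBE(1, 3),  // 3-byte fragment offset
  type: payload.readUInt8(4),
  q: payload.readUInt8(5),                   // >= 128 means an in-band quantization table follows
  width: payload.readUInt8(6) * 8,           // stored in units of 8-pixel blocks
  height: payload.readUInt8(7) * 8,
};

let offset = 8;
let qtable = null;
if (header.q >= 128) {
  // MBZ (1 byte), Precision (1 byte), Length (2 bytes, big-endian), then the table itself
  const length = payload.readUInt16BE(offset + 2);
  qtable = payload.subarray(offset + 4, offset + 4 + length);
  offset += 4 + length;
}

// For the packet above this yields Q = 0xff, 1280x720, a 128-byte table, and
// offset = 140, i.e. it now points at the 0xad byte the question asks about.
console.log(header, qtable && qtable.length, offset);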
So, the question is what is the data starting right after it?
ad8a5a0029
According to the specs, none of it can be recognized as a possible JPEG marker, so the question is: what encoding is being used, and how can I encode an image of my own with this particular encoding?
An attempt to encode in the manner of http://www.media.mit.edu/pia/Research/deepview/src/JpegEncoder.java works well for JPEG files, but is not decoded by VLC's live555.

Live audio streaming container formats

When I start receiving a live audio (radio) stream (e.g. MP3 or AAC), I think the received data is not a raw bitstream (i.e. raw encoder output), but is always wrapped in some container format. If this assumption is correct, then I guess I cannot start decoding from an arbitrary place in the stream, but have to wait for some sync byte. Is that right? Is it usual to have sync bytes? Is there any header following the sync byte from which I can determine the codec used, number of channels, sample rate, etc.?
When I connect to a live stream, will I receive data starting at the nearest sync byte, or will I get it from the current position, so that I have to search for the sync byte first?
Some streams, like Icecast, use headers in the HTTP response where stream-related information is included, but I think I can skip them and deal directly with the stream format.
Is that correct?
Regards,
STeN
When you look at SHOUTcast/Icecast, the data that comes across is pure MPEG Layer III audio data, and nothing more. (Provided you haven't requested metadata.)
It can be cut at an arbitrary place, so you need to sync to the stream. This is usually done by finding a potential header, and using the data in that header to find sequential headers. Once you have found a few frame headers, you can safely assume you have synced up to the stream and start decoding for playback.
Again, there is no "container format" for these. It's just raw data.
Now, if you want metadata, you have to request it from the server. The data is then just injected into the stream every x number of bytes. See http://www.smackfu.com/stuff/programming/shoutcast.html.
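As a rough illustration of that metadata mechanism (a sketch, not production code; it assumes you sent Icy-MetaData: 1 in the request and the server answered with an icy-metaint header):

// icy-strip.js -- hedged sketch: separate a SHOUTcast/Icecast response into raw
// audio bytes and the metadata blocks injected every `metaint` bytes.
function createIcyStripper(metaint, onAudio, onMetadata) {
  let audioLeft = metaint;        // audio bytes remaining until the next metadata block
  let pending = Buffer.alloc(0);  // unconsumed bytes carried over between chunks

  return function onChunk(chunk) {
    pending = Buffer.concat([pending, chunk]);
    while (true) {
      if (audioLeft > 0) {
        const take = Math.min(audioLeft, pending.length);
        if (take === 0) return;
        onAudio(pending.subarray(0, take));
        pending = pending.subarray(take);
        audioLeft -= take;
      } else {
        if (pending.length < 1) return;
        const metaLen = pending[0] * 16;           // length byte, in 16-byte units
        if (pending.length < 1 + metaLen) return;  // wait for the full metadata block
        if (metaLen > 0) {
          onMetadata(pending.subarray(1, 1 + metaLen).toString('utf8').replace(/\0+$/, ''));
        }
        pending = pending.subarray(1 + metaLen);
        audioLeft = metaint;
      }
    }
  };
}

Wiring it up looks something like res.on('data', createIcyStripper(parseInt(res.headers['icy-metaint'], 10), onAudio, onMetadata)) once the server has advertised icy-metaint.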
Doom9 has great starting info about both MPEG and AAC frame formats. SHOUTcast will add some 'metadata' now and then, and it's really trivial. The thing I want to share with you is this: I have an application that can capture all kinds of streams, and SHOUTcast, both AAC and MP3, is among them. The first versions cut their files at arbitrary points based purely on time, for example every 5 minutes, regardless of the MP3/AAC frames. That was somewhat OK for MP3 (the files were playable) but very bad for aacPlus.
The thing is, the aacPlus decoder ISN'T that forgiving of bad data, and I got everything from access violations to mysterious software shutdowns with no error messages of any kind.
Anyway, if you want to capture the stream: open a socket to the server and read the response; you'll have some headers there, then use that info to strip the metadata that gets injected now and then. Use the header information for both aacPlus and MP3 to determine frame boundaries, and try to honor them and split the file at the right place (a rough sketch follows the links below).
mp3 frame header:
http://www.mp3-tech.org/programmer/frame_header.html
aacplus frame header:
http://wiki.multimedia.cx/index.php?title=Understanding_AAC
also this:
aacplus frame alignment problems
Unfortunately it's not always that easy, check the format and notes here:
MPEG frame header format
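Putting those references together, a minimal sketch of the sync-and-split idea (Layer III only, with abbreviated tables; frameLength tells you how far ahead the next header should be, which is also where a capture file can safely be split):

// mp3-sync.js -- hedged sketch: find the next plausible Layer III frame header in a
// buffer and compute its frame length, so a capture can be split on frame boundaries.
const BITRATES = {  // kbps, indexed by the 4-bit bitrate field
  1: [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320],  // MPEG-1
  2: [0, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160],      // MPEG-2/2.5
};
const SAMPLERATES = { 1: [44100, 48000, 32000], 2: [22050, 24000, 16000], 2.5: [11025, 12000, 8000] };

function parseHeader(buf, i) {
  if (buf[i] !== 0xff || (buf[i + 1] & 0xe0) !== 0xe0) return null;  // 11-bit sync word
  const versionBits = (buf[i + 1] >> 3) & 0x03;  // 00 = MPEG 2.5, 10 = MPEG-2, 11 = MPEG-1
  const layerBits = (buf[i + 1] >> 1) & 0x03;    // 01 = Layer III
  if (versionBits === 1 || layerBits !== 1) return null;
  const version = versionBits === 3 ? 1 : versionBits === 2 ? 2 : 2.5;
  const bitrate = BITRATES[version === 1 ? 1 : 2][buf[i + 2] >> 4] * 1000;
  const samplerate = SAMPLERATES[version][(buf[i + 2] >> 2) & 0x03];
  const padding = (buf[i + 2] >> 1) & 0x01;
  if (!bitrate || !samplerate) return null;      // free/bad bitrate index or reserved rate
  // Layer III: 1152 samples per frame for MPEG-1, 576 for MPEG-2/2.5
  const coef = version === 1 ? 144 : 72;
  return { version, bitrate, samplerate, frameLength: Math.floor((coef * bitrate) / samplerate) + padding };
}

// "Find a potential header and check that another one follows where it should" --
// the same sequential-header check described in the first answer above.
function findSync(buf) {
  for (let i = 0; i + 4 <= buf.length; i++) {
    const h = parseHeader(buf, i);
    if (h && parseHeader(buf, i + h.frameLength)) return i;
  }
  return -1;
}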
I will continue the discussion by answering myself (even though we are discouraged from doing that):
I was also looking at the streamed data and found that the sequence ff f3 82 70 is frequently repeated. I suggest this is the MPEG frame header, so I tried to work out what it means:
ff f3 82 70 (hex) = 11111111 11110011 10000010 01110000 (bin)
Analysis
11111111111 | SYNC
10 | MPEG version 2
01 | Layer III
1 | No CRC
1000 | 64 kbps
00 | 22050 Hz
1 | Padding
0 | Private
01 | Joint stereo
11 | Mode extension
Any comments on that?
When I start receiving the streaming data, should I discard everything prior to this header before handing the buffer to the class that deals with the DSP? I know this can be implementation specific, but I would like to know what the general practice is here...
BR
STeN
