I'm trying to extract thumbnails from a torrent stream by downloading the first few chunks to get the headers, plus another set of chunks from the middle, and then concatenating them into a single video file.
For this I'm using Node.js, but I'm having trouble with the concatenation part. Obviously the headers include the length of the video, so if I simply concatenate another chunk onto the end of the header chunk, it won't work.
In other words, I have two chunks of a video file: the first contains the headers plus some material, and the other is composed entirely of video stream data. I want to combine the two into a single video file.
So my question is: how can I make this work properly, if at all?
I found out that this is actually not possible. It comes down to the stream being encoded: a chunk ripped out of the middle can't be decoded independently of the data that precedes it, so I can't simply make a sparse file out of the two chunks.
I am writing my own Opus Ogg writer following these specifications: RFC7845 and RFC3533.
Currently, I am facing an issue that I believe is related to how I am setting the lacing values (segment table).
My current setup is to basically read (using an existing Ogg reader) an Ogg file with a single Opus track and put that Opus track in another Ogg file that I create using my own Ogg writer.
So I have a function that takes the Opus content of each page from the original Ogg file and puts it into pages in my new Ogg file.
I am able to create the file successfully, but when I try playing it in VLC, it shows the correct timestamp yet plays no sound.
I noticed that the issue is caused by the way my segment table (or lacing values) is set.
I am currently creating it by filling each segment with as much data as possible (i.e., 255 bytes), and letting only the last segment have a size < 255. This seems to be the way other implementations do it (see the Rust implementation, C implementation).
However, when I inspect the lacing values for a page containing that Opus content in the original Ogg file, it is not filled with 255s. It's a different combination of segment sizes that still sums to the same page size, but uses more segments (since they don't take up the max segment size). When I try using the exact segment combination from the original file, the file plays in VLC successfully.
That makes me conclude that my approach of creating as many 255-sized segments as possible is incorrect. Does anyone have any idea how to properly set the lacing values?
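For reference, RFC 3533 defines the segmentation per packet, not per page: a packet of N bytes becomes floor(N/255) lacing values of 255 followed by one final value of N mod 255 (so a packet whose length is an exact multiple of 255 ends with a 0 lacing value). If a writer concatenates several packets and then 255-fills the combined payload, the packet boundaries are lost, which could explain the silent playback. A minimal sketch of the per-packet rule:

import math

def lacing_values(packet_len):
    # RFC 3533 segmentation for one packet: as many 255s as fit,
    # then a final value < 255 (possibly 0) that terminates the packet.
    return [255] * (packet_len // 255) + [packet_len % 255]

def segment_table(packet_lengths):
    # One run of lacing values per packet; the terminating (< 255)
    # value is what encodes each packet boundary on the page.
    table = []
    for n in packet_lengths:
        table.extend(lacing_values(n))
    return table

# Three 120-byte Opus packets on one page -> [120, 120, 120],
# while a single 600-byte packet -> [255, 255, 90].
print(segment_table([120, 120, 120]), segment_table([600]))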
I've run into a technical problem and could use your help.
The situation:
I record the screen as well as 1 to 2 audio tracks (microphone and speaker).
These recordings are done separately (they could be mixed, but I'd rather not), and every 10 s (this is configurable) I send the chunk of recorded data to my backend. So 2 to 3 chunks are sent every 10 s.
These data chunks are interdependent. For example, the first video chunk starts with the headers and a keyframe, while the second chunk may begin in the middle of a frame. It's like taking the entire video and splitting it at an arbitrary byte offset.
The video stream is H.264 in a WebM container. I don't have a lot of control over it.
The audio stream is Opus in a WebM container. I can't use AAC directly, nor do I have much control.
Realistically, the server may restart at random (crash, update, scaling, ...). It doesn't happen often (about 4 times a week). In addition, once the recording ends on the customer's side, they can close the application or their computer, which prevents the end of the recording from being sent; the missing data chunks are sent once they reconnect. This rules out using a "live" stream on the backend side.
Goals:
Store video and audio as it is received on the server in cloud storage.
Be able to start playing the video/audio even while the upload is still in progress (i.e., as a live stream)
As soon as the last chunks have been received on the server, I want the entire video to be already available in VoD (Video On Demand) with as little delay as possible.
Everything must be delivered with the audio in AAC. The audio tracks may or may not be mixed together, and may or may not be muxed with the video.
Current solution, and where it's blocked:
The most promising solution I have seen is using HLS to support the Live and VoD mode that I need. It would also bring a lot of optimization possibilities for the future.
Video isn't a problem in this context; here's what I do (a code sketch follows the notes below):
1. Every time I get a data chunk, I append it to a screen.webm file.
2. Then I split the file with ffmpeg:
   ffmpeg -ss {total_duration_in_storage} -i screen.webm -c:v copy -f hls -hls_time 8 -hls_list_size 0 output.m3u8
3. I ignore the last segment file unless it's the last chunk.
4. I upload all the files to cloud storage along with a newly updated output.m3u8 containing the new file information.
Note: total_duration_in_storage corresponds to the time already uploaded to cloud storage, i.e., the sum of the durations of the parts present in the last output.m3u8.
Note 2: I ignore the last file in point 3 because it guarantees a keyframe at the start of each segment of my playlist, and therefore lets me use seeking, which allows segmenting only the parts needed for each new chunk.
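As a concrete sketch of point 2 (hypothetical paths, just wrapping the ffmpeg call above):

import subprocess

def segment_new_chunk(total_duration_in_storage):
    # Seek past the duration already uploaded, then cut the appended
    # screen.webm into 8-second HLS segments without re-encoding video.
    subprocess.run([
        "ffmpeg",
        "-ss", str(total_duration_in_storage),
        "-i", "screen.webm",
        "-c:v", "copy",
        "-f", "hls",
        "-hls_time", "8",
        "-hls_list_size", "0",
        "output.m3u8",
    ], check=True)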
My problem is with the audio. The same method works fine as long as I don't re-encode. But I need to re-encode to AAC to be compatible with HLS, and also with Safari.
If I re-encode only the new chunks as they arrive, there is an audible glitch at each chunk boundary.
The only avenue I have found is to re-encode and re-segment all the files every time a new chunk arrives. This will be problematic for long recordings (multiple hours).
Do you have any solutions for this problem or another way to achieve my goal?
Thanks a lot for your help!
Say you have an MP3 file and it's 60,000,000 bytes, and you also have an MP3 advertisement that's 500,000 bytes, both encoded at the same bit rate.
Would it be possible using an nginx or apache module to change the MP3 "Content-Length" header value to 60,500,000 and then control the incoming "Content-Range" requests so the first 500,000 bytes return the advertisement audio, and any range request greater than 500,000 begins returning the regular audio file with a 500,000 byte offset?
Or is it only possible to splice advertisements (or messages) into an MP3 file using an application such as FFmpeg to re-render the entire file?
Apologies if this is a stupid question, I'm just trying to think outside of the box.
You cannot arbitrarily splice MP3 without artifacts and decoder errors.
You also generally cannot cut/splice MP3 on frame boundaries due to the bit reservoir. Basically, a particular MP3 frame may contain data from another frame to use the available bandwidth more efficiently when it's needed. Ignoring the bit reservoir can also cause artifacts and/or decoder errors.
What you can do is re-encode your advertisement and eventually re-join the stream. That is, at the point of ad insertion, decode the stream to PCM, mix in (or swap in) the audio for your ad, and re-encode this parallel stream back to MP3. If the encoding parameters are the same, eventually (after a couple of extra MP3 frames) you'll have identical bitstreams, and you can go back to reading the stream from the same buffer.
If you're doing this for ad insertion on internet radio (live) streams, keep in mind that you'll have to do this on the server for every client (or at least for each ad variant and timing variant). If this is for podcasts or other pre-recorded content, I'd recommend the FFmpeg route. You won't have to build anything, you can stream and cache the output as it's being encoded, and you'll have compatibility with other codecs without building one-off code for each codec/container.
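For the pre-recorded case, a single FFmpeg pass can decode both files to PCM, concatenate them, and re-encode the result once, which sidesteps frame and bit-reservoir boundaries entirely. A minimal sketch with hypothetical filenames:

import subprocess

# Decode ad + episode to PCM, join them, and re-encode once.
# The concat filter operates on decoded audio, so the frame
# structure of the source MP3s doesn't matter.
subprocess.run([
    "ffmpeg",
    "-i", "ad.mp3",        # hypothetical: the advertisement
    "-i", "episode.mp3",   # hypothetical: the main content
    "-filter_complex", "[0:a][1:a]concat=n=2:v=0:a=1[out]",
    "-map", "[out]",
    "-b:a", "128k",        # choose a bitrate matching the originals
    "with_ad.mp3",
], check=True)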
I've written a script to "normalize" all my FLAC files by stripping unneeded tags, padding tracknumber/discnumber, removing pictures, etc. As part of the normalization process, my script re-compresses the FLAC file to level 8. Since re-compressing an already level-8 FLAC is pointless and time consuming, I want a way to know if the audio of the FLAC file has been changed since my last compression (I don't want to use file modification time because changing the metadata would change this as well). Is there an easy way to get the MD5 hash or something of the FLAC audio section so I can quickly check if it's been altered? Thanks!
I ended up using the python-audio-tools over at http://audiotools.sourceforge.net/. Here's the relevant code, for future reference:
import audiotools

# Open the FLAC file and read its STREAMINFO metadata block, which
# stores an MD5 of the raw, unencoded audio stream; it changes only
# when the audio itself changes, not when tags are edited.
track = audiotools.open('file.flac')
metadata = track.get_metadata()
raw_hash = metadata.get_block(audiotools.flac.Flac_STREAMINFO.BLOCK_ID).md5sum
print(audiotools.hex_string(raw_hash))
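If the FLAC reference tools are installed, metaflac exposes the same STREAMINFO MD5 without any Python dependencies; a small sketch:

import subprocess

def flac_audio_md5(path):
    # metaflac prints the STREAMINFO MD5 of the unencoded audio data
    out = subprocess.check_output(["metaflac", "--show-md5sum", path])
    return out.decode().strip()

print(flac_audio_md5("file.flac"))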
How can we distinguish between PCM and BWF format?
Is it necessary for a BWF file to have a "bext" chunk?
I have some streams that don't have a "bext" chunk but contain a "JUNK" chunk... Are these files BWF files?
Thank you.
The JUNK chunk is reserved space that allows a BWF file to be converted into an RF64 file on the fly if the size goes over 4 GB. The JUNK chunk is the same size as a ds64 chunk, and will be replaced with a ds64 chunk if the conversion to RF64 is needed.
My reading of the BWF spec is that you have to have a bext chunk for it to be a BWF.
As far as I know, a broadcast wave file will have the 'bext' chunk.
If a file does not have the 'bext' chunk, it's a normal WAV/AIFF or similar file.
Broadcast wave headers are used especially when you want a file to carry more information about itself in its header than is apparent from its name.
For playback, this info isn't needed; it's only useful if you want to display or search the metadata.
PCM isn't a file format; it's the raw encoding of uncompressed audio samples.
Files that store uncompressed audio, such as WAV/BWF, AIFF, or SD2, contain PCM data.
With encoded files like MP3 or AAC, you get the raw PCM values only after decoding.
Yes. The 'bext' chunk is what distinguishes a BWF file from a wav file.
Some manufacturers actually use '.bwf' as a file extension but mostly the '.wav' extension will be used. It is only the presence of this chunk that makes the difference.
Other chunks can also be present and a well designed player will ignore chunks that it doesn't recognize.
Generally the 'data' chunk containing the audio data will be the last one in the file. However, I have seen a few examples of other chunks, usually XML metadata, appearing after the 'data' chunk. This confuses some players.
For more information search for tech3285.pdf from the European Broadcasting Union website (tech.EBU.ch).
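To check a file programmatically, here is a minimal sketch that walks the top-level RIFF chunks and reports whether a 'bext' chunk is present (assumes a plain RIFF/WAVE file, not RF64; the path is hypothetical):

import struct

def has_bext_chunk(path):
    with open(path, 'rb') as f:
        header = f.read(12)
        if len(header) < 12:
            return False
        riff, _size, wave = struct.unpack('<4sI4s', header)
        if riff != b'RIFF' or wave != b'WAVE':
            return False  # not a plain RIFF/WAVE file (could be RF64)
        while True:
            chunk_header = f.read(8)
            if len(chunk_header) < 8:
                return False  # end of file reached without finding 'bext'
            chunk_id, chunk_size = struct.unpack('<4sI', chunk_header)
            if chunk_id == b'bext':
                return True
            # chunk bodies are padded to an even byte count
            f.seek(chunk_size + (chunk_size & 1), 1)

print(has_bext_chunk('recording.wav'))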