I've gotten an Opus stream (specifically one from Discord voice servers), and I'm trying to convert it to a .wav file that can be used with DeepSpeech. I've already done some digging and found opusdec, which almost perfectly fits my use case. I exported some test files, but when I tried to use them with opusdec, I got an error:
$ opusdec 293434418808314550478788892773147202909.opus
Decoding complete.
This doesn't look like a Opus file
(the file used above can be downloaded here)
I know these files have no Ogg container in them, so what I'm mainly looking for is a nice way to create a .opus file with an Ogg container in Rust. All the libraries I have found so far seem to be poorly documented, and given how long compiling a test program on my laptop takes, I'm reluctant to do the fiddling around required to use them.
So far, to convert the files to .wav, I've tried using ffmpeg with several input formats (-f s16be, -f libopus), but they either threw an error or produced valid wav files containing only static.
I'm open to a completely different way of doing this, if any are suggested.
Thanks in advance!
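To illustrate what "adding an Ogg container" involves, here is a minimal Python sketch of the page layout described in RFC 3533 (Ogg) and RFC 7845 (Ogg Opus); the same byte layout could be reproduced in Rust. It rests on several assumptions that the question does not confirm: that the dump can already be split into individual Opus packets (how the test files delimit packets is unknown), that each packet holds 20 ms of 48 kHz audio (960 samples), and that the stream is stereo. The names wrap_opus_packets, ogg_page and ogg_crc are hypothetical helpers, not part of any library.

```python
import struct


def ogg_crc(data: bytes) -> int:
    """Ogg page checksum: CRC-32 with polynomial 0x04C11DB7, MSB first, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            crc = ((crc << 1) ^ 0x04C11DB7) if crc & 0x80000000 else (crc << 1)
            crc &= 0xFFFFFFFF
    return crc


def ogg_page(packet: bytes, serial: int, seq: int, granule: int, flags: int = 0) -> bytes:
    """Build one Ogg page that carries exactly one complete packet."""
    # Lacing values: one 255 per full 255-byte segment, then the remainder (may be 0).
    lacing = bytes([255] * (len(packet) // 255) + [len(packet) % 255])
    if len(lacing) > 255:
        raise ValueError("packet too large for a single page")
    header = struct.pack("<4sBBqIIIB", b"OggS", 0, flags, granule,
                         serial, seq, 0, len(lacing))
    page = header + lacing + packet
    # The CRC is computed over the whole page with the CRC field set to zero.
    return page[:22] + struct.pack("<I", ogg_crc(page)) + page[26:]


def wrap_opus_packets(packets, channels=2, samples_per_packet=960,
                      pre_skip=0, serial=0x4F505553) -> bytes:
    """Wrap raw Opus packets in an Ogg stream (assumed 48 kHz, fixed packet duration)."""
    head = struct.pack("<8sBBHIhB", b"OpusHead", 1, channels, pre_skip, 48000, 0, 0)
    vendor = b"sketch"  # arbitrary vendor string for illustration
    tags = b"OpusTags" + struct.pack("<I", len(vendor)) + vendor + struct.pack("<I", 0)
    pages = [ogg_page(head, serial, 0, 0, flags=0x02),  # 0x02 = beginning of stream
             ogg_page(tags, serial, 1, 0)]
    for i, packet in enumerate(packets):
        flags = 0x04 if i == len(packets) - 1 else 0  # 0x04 = end of stream
        granule = (i + 1) * samples_per_packet        # samples decoded so far
        pages.append(ogg_page(packet, serial, i + 2, granule, flags))
    return b"".join(pages)
```

If the assumptions hold, writing the result of wrap_opus_packets(packets) to a .opus file should give opusdec something it recognises. If the packets have a different duration or channel count, the granule positions and the header would need adjusting.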
Currently, I am implementing a new feature of my software using the Libav API. The requirement is to merge a list of audio files (MP3 and WAV) into a single audio file (MP3) as output. Note: The challenge is not about concatenating files, but merging them. When the output is played, all the input audio must sound at the same time, as when you merge several files in a video editor.
I was researching Libav audio streams, and I am guessing that my requirement is related to the "channels" concept, that is, that it is possible to include several audio sources in one stream, using one channel per source or something like that. I was hoping to find more information about this topic, but the FFmpeg/Libav documentation is scarce.
Right now, I am able to merge several audio streams with a video stream successfully and create a playable MP4 file. My problem is that players like MPlayer/VLC only play the first audio stream with the video; the other two audio streams are ignored.
I was looking at the set of examples included in the FFmpeg source code, but there is nothing specifically related to my requirement, so I would appreciate any source code reference or algorithm explanation about how to merge several audio files into one using libav. Thanks.
Update:
The ffmpeg command to merge several audio files requires the "amix" filter, as in this example:
ffmpeg -i 1.mp3 -i 2.mp3 -i 3.mp3 -filter_complex amix=inputs=3:duration=first result.mp3
All the syntax related to this option is described in the FFmpeg Documentation
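The command above generalises to any number of inputs, since only the repeated -i arguments and the inputs= count change. A small sketch that builds the argument list is shown here; amix_command is a hypothetical helper name, and the only ffmpeg options it uses are the ones already shown in the example.

```python
def amix_command(inputs, output, duration="first"):
    """Build the ffmpeg argument list that mixes all inputs with the amix filter."""
    cmd = ["ffmpeg"]
    for path in inputs:
        cmd += ["-i", path]
    cmd += ["-filter_complex", f"amix=inputs={len(inputs)}:duration={duration}", output]
    return cmd
```

The list can be passed to subprocess.run, for example subprocess.run(amix_command(["1.mp3", "2.mp3", "3.mp3"], "result.mp3"), check=True).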
Checking the FFmpeg source code, it seems the amix feature implementation is included in the file af_amix.c
I am not 100% sure, but it seems the general algorithm is described in the function:
static int activate(AVFilterContext *ctx)
Do you know how to merge several audio files using command-line ffmpeg? It would help to first understand how to do it with the ffmpeg command and then reverse engineer how it achieves it. It's all about how to construct a filtergraph and pass data through it.
As for examples, check out examples/filter_audio.c and examples/filtering_audio.c
This C example takes two WAV audio files and merges them to generate a new WAV file using the ffmpeg-4.4 API. Tip: The key to the process is to use these filters: abuffer, amix and abuffersink.
https://github.com/xtingray/audio_mixer/
Although it doesn't support MP3 format as the output, it gives you the basics to understand how to implement your own requirements. I hope it can be handy for anyone looking for references about this specific topic.
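Conceptually, mixing (as opposed to concatenating) means combining the decoded tracks sample by sample. The sketch below shows that idea on plain lists of signed 16-bit samples. It is not the algorithm in af_amix.c, which also deals with input weighting, differing durations and sample formats; it assumes all tracks are already decoded to the same sample rate and channel layout. mix_pcm16 is a hypothetical name.

```python
from itertools import zip_longest


def mix_pcm16(*tracks):
    """Mix tracks of signed 16-bit samples by summing and clipping to the valid range."""
    mixed = []
    # Shorter tracks are padded with silence (0) so every input is heard in full.
    for samples in zip_longest(*tracks, fillvalue=0):
        total = sum(samples)
        mixed.append(max(-32768, min(32767, total)))
    return mixed
```

Summing and clipping is the simplest choice; scaling each input down before summing avoids clipping at the cost of lower volume.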
I'm writing some code where I rely on the file utility to determine the file type of arbitrary files, typically audio files. For the most part it works great. An Ogg file, for example, might give the following output:
Ogg data, Vorbis audio, mono, 44100 Hz, ~80000 bps, created by: Xiph.Org libVorbis I (1.0.1)
A simple regexp can classify this as ogg vorbis.
But for some other file types, file tries to get clever. An NSF (NES Sound Format) file, for example, can yield this output:
NES Sound File ("The Legend of Zelda" by Konchano, copyright 1987 Nintendo), version 1, 8 tracks, NTSC
"NES Sound File" is clear enough, but it is followed by a string of unstructured data that is clearly just copied from the file itself. A malicious user could create an nsf file where this string is replaced by something like "Ogg data, Vorbis audio", making classification a lot harder.
Now let's say I fix this by discarding anything within parentheses (ignoring the fact that the title of the track could itself contain parentheses). Along comes a Protracker module:
4-channel Protracker module sound data Title: "space_debris"
Again, untrusted data straight from the file, in a different position, now with the prefix "Title:". I can attempt to filter it out but really this is becoming a hassle.
I'm not finding any help in the man page. Is there really no way to tell file not to mix these unsafe strings into its output? Or is file simply not the right tool for this job?
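One way to avoid parsing untrusted strings altogether is to check the magic bytes directly and return a label from a fixed set. The sketch below is an illustration of that approach, not a replacement for file's full database: it only knows a handful of signatures (OggS, NESM followed by 0x1A, RIFF/WAVE, fLaC, and the "M.K." marker at offset 1080 used by 4-channel Protracker modules), and classify_audio and classify_file are hypothetical names.

```python
def classify_audio(header: bytes) -> str:
    """Classify audio data from its leading bytes; only fixed labels are ever returned."""
    if header.startswith(b"OggS"):
        # The codec identification packet starts right after the first page header.
        if b"\x01vorbis" in header[:64]:
            return "ogg-vorbis"
        if b"OpusHead" in header[:64]:
            return "ogg-opus"
        return "ogg"
    if header.startswith(b"NESM\x1a"):
        return "nsf"
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header.startswith(b"fLaC"):
        return "flac"
    if header[1080:1084] == b"M.K.":
        return "protracker-mod"
    return "unknown"


def classify_file(path: str) -> str:
    """Read just enough of the file to cover every signature checked above."""
    with open(path, "rb") as handle:
        return classify_audio(handle.read(1084))
```

Because the title and author fields are never copied into the result, a crafted NSF file cannot make the classifier report "ogg-vorbis".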
Apple gives an example of support for byte-range segments in m3u8 files for HLS:
#EXTM3U
#EXT-X-TARGETDURATION:11
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-VERSION:4
#EXTINF:10.0,
#EXT-X-BYTERANGE:75232@0
media.ts
#EXTINF:10.0,
#EXT-X-BYTERANGE:82112@752321
media.ts
#EXTINF:10.0,
#EXT-X-BYTERANGE:69864
media.ts
But I cannot figure out how to create such a playlist for a given .ts file.
Are there any tools for that?
ffmpeg has an -hls_flags option. (https://www.ffmpeg.org/ffmpeg-formats.html)
The following command generates a single ts file that is segmented using the byte-range feature (supported from HLS version 4) in the m3u8 index file.
$ ffmpeg -i sample.mp3 -hls_time 20 -hls_flags single_file out.m3u8
Looks like
ffprobe -show_frames media.ts -print_format json
gives enough information about frames to build such playlist, although some scripting will be required to construct it.
I'll update this answer with a script if I succeed with that approach.
Update:
Here are a couple of useful links I've found so far:
Bash scripts for generating iframe playlists - needs a bit of optimization, as it calls ffprobe multiple times
iframe-playlist-generator - a Python project that can be used to generate iframe playlists from regular ones
It is not exactly what I was originally looking for, but I-Frame playlists are similar to byte-range ones and fit my task even better, so I'm going to use these two projects as a reference/starting point to create something a bit more suitable for me.
The projects actually use different methods to find the size of an I-Frame: the bash script just uses what ffprobe shows in pkt_size, while the Python project adds a bit of voodoo by calculating the size as the difference between packet positions and adding 188 to match the example playlists from Apple. 188 bytes is the size of an MPEG-TS packet, so that is probably related somehow, although I have not managed to understand how. This difference in size calculation causes different playlists to be generated. Probably one of them is incorrect in some way, but VLC plays both without any problems, so I'm going to stick to the simpler method until it is proven incorrect.
Update 2:
I've created a Ruby module that can extract the I-Frame information of a given .ts file with ffprobe and build both an I-Frame playlist and a regular byte-range m3u8 playlist (as requested in the question) based on that information.
I've found the simple method of creating an I-Frame playlist I mentioned before to be incorrect, so I used the method from iframe-playlist-generator. The output is almost identical to the I-Frame playlist generated by mediafilesegmenter -output-single-file -file-base output-dir/ input.ts, mentioned by Duvrai, but sometimes the sizes of random frames are off by 188 bytes. I could not work out the pattern, so it is currently ignored.
You can use a standard segmenter such as Apple's mediafilesegmenter, check the lengths of the files, and then concatenate (with the cat program) them into a single file. From the file sizes you have all the information needed to specify the byte ranges in a playlist file.
Not as nice as just downloading a tool from the net, but it's not a very complicated algorithm.
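The bookkeeping described here is small: each segment's byte range is its size, and its offset is the running total of the sizes before it. A sketch, assuming the segment durations and sizes have already been collected (for example from the segmenter's own playlist and the file lengths); byterange_playlist is a hypothetical name, and the tags written are the ones from Apple's example plus #EXT-X-ENDLIST, using the length@offset form of EXT-X-BYTERANGE.

```python
import math


def byterange_playlist(uri, segments):
    """Build a byte-range playlist.

    segments: list of (duration_seconds, size_bytes) in the order the
    segments appear in the concatenated file.
    """
    target = math.ceil(max(duration for duration, _ in segments))
    lines = [
        "#EXTM3U",
        f"#EXT-X-TARGETDURATION:{target}",
        "#EXT-X-MEDIA-SEQUENCE:0",
        "#EXT-X-VERSION:4",
    ]
    offset = 0
    for duration, size in segments:
        lines.append(f"#EXTINF:{duration:.1f},")
        lines.append(f"#EXT-X-BYTERANGE:{size}@{offset}")
        lines.append(uri)
        offset += size  # the next segment starts where this one ends
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"
```

For instance, byterange_playlist("media.ts", [(10.0, 75232), (10.0, 82112)]) places the second segment at offset 75232.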
Unified Streaming also offers a tool that can do this for you:
mp4split --package-hls output-single-file -o prog_index.m3u8 input.mp4
This is part of their commercial streaming package (they offer a free trial upon request). They also provide an Amazon AWS instance with hourly fees.
I've been asked to sample some data in a .wac file. I'm not familiar with this format and there is very little about it on the internet. I was given a .wav file, but I don't think it was converted correctly: the RIFF header was missing, so no .wav reader was able to read it.
Could anyone therefore shed some light on how I could convert the .wac file into a .wav file? I cannot find a converter tool on the internet, and MATLAB does not have a module for reading .wac data.
NOTE: I've put the tag "game-engine" because according to this website: Here it is used in the infinity game engine.
I've come up with the following solution; massive thanks to @jpaari for his input.
Basically, I used sox:
sox -r 44100 -e unsigned -b 8 -c 1 input.raw output.wav
I was able to rename the file to .raw and this worked. I'm going to update the sample rate to what @Aybe posted.
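The sox command treats the input as headerless PCM and writes it back out with a RIFF/WAVE header. The same step can be sketched with Python's standard wave module, under the same assumptions the command makes (8-bit unsigned samples, mono, 44100 Hz); raw_to_wav is a hypothetical name, and the sample rate is a parameter because the right value for a given file may differ.

```python
import wave


def raw_to_wav(raw_path, wav_path, rate=44100, channels=1, sample_width=1):
    """Wrap headerless PCM data in a WAV container without altering the samples."""
    with open(raw_path, "rb") as src:
        pcm = src.read()
    with wave.open(wav_path, "wb") as dst:
        dst.setnchannels(channels)
        dst.setsampwidth(sample_width)  # 1 byte per sample = 8-bit unsigned in WAV
        dst.setframerate(rate)
        dst.writeframes(pcm)
```

If the data is not actually plain PCM, the output will still be a valid WAV file but will sound like noise.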
Try this http://www.shsforums.net/topic/39117-ps-gui-v304/
I think Audacity can do it as well. Also, the "unity3d" tag is not quite right.
Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
I've got many, many mp3 files that I would like to merge into a single file. I've used the command line method
copy /b 1.mp3+2.mp3 3.mp3
but it's a pain when there are a lot of them and their names are inconsistent. The reported playing time never seems to come out right either.
David's answer is correct that just concatenating the files will leave ID3 tags scattered inside (although this doesn't normally affect playback, so you can do "copy /b" or on UNIX "cat a.mp3 b.mp3 > combined.mp3" in a pinch).
However, mp3wrap isn't exactly the right tool to just combine multiple MP3s into one "clean" file. Rather than using ID3, it actually inserts its own custom data format in amongst the MP3 frames (the "wrap" part), which causes issues with playback, particularly on iTunes and iPods. Although the file will play back fine if you just let it run from start to finish (because players will skip these arbitrary non-MPEG bytes), the file duration and bitrate will be reported incorrectly, which breaks seeking. Also, mp3wrap will wipe out all your ID3 metadata, including cover art, and fail to update the VBR header with the correct file length.
mp3cat on its own will produce a good concatenated data file (so, better than mp3wrap), but it also strips ID3 tags and fails to update the VBR header with the correct length of the joined file.
Here's a good explanation of these issues and a method (two, actually) to combine MP3 files and produce a "clean" final result with the original metadata intact. It's command-line, so it works on Mac/Linux/BSD etc. It uses:
mp3cat to combine the MPEG data frames only into a continuous file, then
id3cp to copy all metadata over to the combined file, and finally
VBRFix to update the VBR header.
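The first of these steps, keeping only the MPEG data, can be illustrated with a short sketch that drops a leading ID3v2 tag and a trailing ID3v1 tag from each file before joining. This is not how mp3cat itself is implemented (it works frame by frame), and it does not rewrite the VBR header, so the last step above would still be needed; strip_id3 and join_mp3 are hypothetical names.

```python
def strip_id3(data: bytes) -> bytes:
    """Remove a leading ID3v2 tag and a trailing ID3v1 tag, if present."""
    if len(data) >= 10 and data[:3] == b"ID3":
        # The tag size is a 28-bit "synchsafe" integer: 7 useful bits per byte.
        size = ((data[6] & 0x7F) << 21) | ((data[7] & 0x7F) << 14) \
             | ((data[8] & 0x7F) << 7) | (data[9] & 0x7F)
        footer = 10 if data[5] & 0x10 else 0  # optional ID3v2.4 footer
        data = data[10 + size + footer:]
    if len(data) >= 128 and data[-128:-125] == b"TAG":
        data = data[:-128]  # ID3v1 is always the final 128 bytes
    return data


def join_mp3(chunks) -> bytes:
    """Concatenate MP3 byte strings after stripping their ID3 tags."""
    return b"".join(strip_id3(chunk) for chunk in chunks)
```

Metadata for the combined file would then be copied over separately, as the id3cp step does.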
For a Windows GUI tool, take a look at Merge MP3 -- it takes care of everything. (VBRFix also comes in GUI form, but it doesn't do the joining.)
As Thomas Owens pointed out, simply concatenating the files will leave multiple ID3 headers scattered throughout the resulting concatenated file - so the time/bitrate info will be wildly wrong.
You're going to need to use a tool which can combine the audio data for you.
mp3wrap would be ideal for this - it's designed to join together MP3 files, without needing to decode + re-encode the data (which would result in a loss of audio quality) and will also deal with the ID3 tags intelligently.
The resulting file can also be split back into its component parts using the mp3splt tool - mp3wrap adds information to the ID3 comment to allow this.
Use ffmpeg or a similar tool to convert all of your MP3s into a consistent format, e.g.
ffmpeg -i originalA.mp3 -f mp3 -ab 128kb -ar 44100 -ac 2 intermediateA.mp3
ffmpeg -i originalB.mp3 -f mp3 -ab 128kb -ar 44100 -ac 2 intermediateB.mp3
Then, at runtime, concat your files together:
cat intermediateA.mp3 intermediateB.mp3 > output.mp3
Finally, run them through the tool MP3Val to fix any stream errors without forcing a full re-encode:
mp3val output.mp3 -f -nb
The time problem has to do with the ID3 headers of the MP3 files, which is something your method isn't taking into account as the entire file is copied.
Do you have a language of choice that you want to use or doesn't it matter? That will affect what libraries are available that support the operations you want.
MP3 files have headers you need to respect.
You could either use a library like Open Source Audio Library Project and write a tool around it.
Or you can use a tool that understands mp3 files like Audacity.
What I really wanted was a GUI to reorder them and output them as one file
Playlist Producer does exactly that, decoding and reencoding them into a combined MP3. It's designed for creating mix tapes or simple podcasts, but you might find it useful.
(Disclosure: I wrote the software, and I profit if you buy the Pro Edition. The Lite edition is a free version with a few limitations).
As David says, mp3wrap is the way to go. However, I found that it didn't fix the audio length header, so iTunes refused to play the whole file even though all the data was there. (I merged three 7-minute files, but it only saw up to the first 7 minutes.)
I dug up this blog post, which explains how to fix this and also how to copy the ID3 tags over from the original files (on its own, mp3wrap deletes your ID3 tags). Or to just copy the tags (using id3cp from id3lib), do:
id3cp original.mp3 new.mp3
I would use Winamp to do this. Create a playlist of the files you want to merge into one, select the Disk Writer output plugin, choose a filename and you're done. The resulting file will be a valid MP3 file and you can set the bitrate etc.
I'd not heard of mp3wrap before. Looks great. I'm guessing someone's made it into a GUI as well somewhere. But, just to respond to the original post, I've written a GUI that does the COPY /b method. So, under the covers, nothing new under the sun, but the program is all about making the process less painful if you have a lot of files to merge...AND you don't want to re-encode AND all the files in each set to merge have the same bitrate. If you have that (and you're on Windows), check out Mp3Merge at: http://www.leighweb.com/david/mp3merge and see if that's what you're looking for.
If you want something free with a simple user interface that makes a completely clean mp3 I recommend MP3 Joiner.
Features:
Strips ID3 data (both ID3v1 and ID3v2.x) and doesn't add its own (unlike mp3wrap)
Lossless joining (doesn't decode and re-encode the .mp3s). No codecs required.
Simple UI
Low memory usage (uses streams)
Very fast (compared to mp3wrap)
I wrote it :) - so you can request features and I'll add them.
Links:
MP3 Joiner website: Here
Latest installer: Here
Personally I would use something like mplayer with the audio pass-through option, e.g. -oac copy
Instead of using the command line to do
copy /b 1.mp3+2.mp3 3.mp3
you could instead use "The Rename" to rename all the MP3 fragments into a series of names that are in order based on some kind of counter. Then you could just use the same command line format but change it a little to:
copy /b *.mp3 output_name.mp3
That is assuming you ripped all of these MP3 fragments at the same time and they have the same audio settings. Worked great for me when I was converting an audiobook I had in .aa to a single .mp3. I had to burn all the .aa files to 9 CDs, then rip all 9 CDs, and then I was left with about 90 MP3s. Really a pain in the a55.
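The wildcard approach depends on the files being picked up in playing order, which is why renaming to a consistent counter helps. An alternative sketch is to sort the existing names "naturally" (so that 2.mp3 comes before 10.mp3) and concatenate in that order. Like copy /b, this is a plain byte-level join and inherits the same ID3 and duration caveats; natural_key and concat_files are hypothetical names.

```python
import re


def natural_key(name: str):
    """Sort key that compares runs of digits as numbers and other text case-insensitively."""
    # re.split with a capturing group alternates text and digit runs, starting with text.
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def concat_files(paths, output_path):
    """Byte-level concatenation of the given files in natural filename order."""
    with open(output_path, "wb") as out:
        for path in sorted(paths, key=natural_key):
            with open(path, "rb") as src:
                out.write(src.read())
```

With glob.glob("*.mp3") as the input list, make sure the output file does not match the same pattern, or write it to a different directory.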