I noticed that when I use ReplayGain in Liquidsoap to play all tracks at roughly the same volume, it adjusts the actual volume levels of each track, and the bass becomes very flat.
I've been searching for a function (or something similar) to correct this within my Liquidsoap script, but have been unable to find anything like it.
Does such a function/method even exist within the liquidsoap scripting language or will I end up just having to turn up my subwoofer to achieve this?
To clarify: the bass is not completely gone, it's just noticeably lower and sounds flatter.
After even more days of searching I managed to find the ladspa.* plugins.
Please note that you will need to install certain dependencies to get all of them to work. For me these consisted of:
swh-plugins
caps
I can only confirm this for Debian, although I suspect other Linux systems (e.g. Ubuntu) will be able to use the same packages.
Related
I'm looking for a program that can recognize individual audio samples from my computer and reroute them to trigger WAV files from a library. In my project, it would need to work in real time, as latency is not acceptable. I tried using dictation software that recognizes words to trigger opening a file, and that's the direction I want to go, but instead of words I want it to respond to sounds, and in real time. I'm not sure where to go and am just looking for some guidance. Does anyone have any suggestions for what I should do?
That's a fairly broad question, but I can tell you how I would do it. (Hardly the only way, but where I would start.)
If you're looking for real time input, the Java Sound library (excellent tutorial here) allows for that. (Just note that microphone input from a web page is difficult on anything, due to major security concerns, so this would be a desktop application.)
If it needs to be real time, the first thing I would suggest is stream and multithread the hell out of it. I would suggest the Java 8 Stream API, but since you're looking for subsamples that match a specific pattern, then each data point will have to be aware of the state of its neighbors, and that isn't easy with streams.
You will probably want to know if a sound roughly resembles an audio profile, so for that, I would pick a tolerance on just how close you want it to be for a match (remembering that samples may not line up 100% anyway, so "exact" is not an option), and then look up Hidden Markov Models. I suggest these because they're what voice recognition software typically uses, and while your sounds may not be voices, it will give you an idea of what has already been done.
You'll also want to maintain a limited list of audio samples in memory. Specifically, you will likely need the most recent data, because an audio signal is a time-variant signal, and you can't get a match from just one point. I wouldn't make it much longer than the longest sample you're looking to recognize, as audio takes up a boatload of memory.
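To make the last two points concrete, here is a minimal sketch in Python (not Java, purely to keep it short) of a bounded buffer of recent samples that is compared against a stored profile with a tolerance. Note that it uses a plain normalized correlation rather than a Hidden Markov Model, and the names `RollingMatcher`, `similarity`, and the default tolerance of 0.9 are assumptions made up for illustration.

```python
from collections import deque
import math


def similarity(window, template):
    """Normalized correlation in [-1, 1] between two equal-length sample sequences."""
    dot = sum(a * b for a, b in zip(window, template))
    norm = (math.sqrt(sum(a * a for a in window))
            * math.sqrt(sum(b * b for b in template)))
    return dot / norm if norm else 0.0


class RollingMatcher:
    """Keeps only the most recent samples and checks them against one template."""

    def __init__(self, template, tolerance=0.9):
        self.template = list(template)
        self.tolerance = tolerance  # how close counts as a match (assumed value)
        # The buffer is never longer than the sound we are looking for.
        self.recent = deque(maxlen=len(self.template))

    def push(self, sample):
        """Add one sample; return True if the buffer now resembles the template."""
        self.recent.append(sample)
        if len(self.recent) < len(self.template):
            return False  # not enough history yet
        return similarity(self.recent, self.template) >= self.tolerance
```

Recomputing the correlation on every sample is wasteful for long templates; a real implementation would probably compare once per block of samples, or compare features rather than raw samples.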
Lastly (for audio), I would recommend picking a standard format for comparison. Start with a high-quality format and lower it only as far as still gets you decent results. You will want to convert everything to that format before you compare it.
Once you recognize a specific sound, it's basically a Command Pattern. Specific sounds can be mapped, even with a java.util.HashMap, to specific files, which (if there are few enough) you might even have pre-loaded.
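As a rough illustration of that mapping, here is a sketch using a Python dict in place of java.util.HashMap. The sound labels and file paths are hypothetical placeholders, and `file_for` is just a name picked for this example.

```python
from pathlib import Path

# Hypothetical labels and file names, for illustration only.
SOUND_TO_FILE = {
    "clap": Path("library/snare.wav"),
    "whistle": Path("library/hihat.wav"),
}


def file_for(label, table=SOUND_TO_FILE):
    """Return the WAV path mapped to a recognized sound label, or None if unmapped."""
    return table.get(label)
```

If the library is small, the values could just as well be the pre-loaded audio data instead of paths, so that nothing has to be read from disk when a sound is recognized.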
Lastly, it's worth looking at the Java Speech API. It's not part of the JDK and it's quite dated, but you might get some good advice from its implementation.
This is of course the advice of a Java-preferring programmer, but I imagine that there might be some decent libraries in Python and Ruby to help you as well; and of course there's something in C somewhere. This may sound like a lot, but most of the material is already implemented and ready-to-go.
Hopefully this helps, let's look forward to other answers.
I want to start on a hobby project that focuses on displaying the audio files in a folder in a certain fashion, can play such an audio file, and shows basic playback controls. However, I'm struggling to find a suitable programming language for this.
The displaying part shouldn't be too hard and can probably be done in most programming languages. The audio part is what concerns me the most, since it's not the main focus of the project and should only do limited things (so it shouldn't be too hard), and I do not know anything about sound support in the programming languages I currently know (Java, C and C++).
Specifically, I would like to be able to do these things:
Play a sound file
Stop/pause a playing song
Adjust volume
Show a bar that displays the current position in the song
Most files will be .mp3 files, but being able to process other formats is certainly a plus. Since this is just a small project, it's OK if it runs only on Windows. Scalability would be nice but is not required.
It would be nice to have a small overview of the audio support and audio libraries of programming languages (I'm always up for something new) that can accomplish these simple things in a not too complicated way, as well as personal experiences.
In this way I hope to get a better understanding of which programming language fits my project best. (I would very much like to avoid changing languages midway through the project.)
--
Edit:
This is only for a later stage of the project, if the first part is successful: I will want to change the file names of the audio files that are displayed (to make them follow a specific format).
I haven't written many audio processing programs, but I know a lot of libraries exist for C and C++. Perhaps for Java too, but I don't know Java. I have used audio with SDL in a game, but it doesn't have many features and I don't recommend it.
There's this question asking for a library in C, and there are a couple of similar questions that SO brings up on the side. You may want to take a look at those.
You would also need to look for a library that loads different file types. SDL, at least, only opens .wav files, which I believe most playback libraries would support. For MP3, you will most likely need an additional library. I know Audacity uses LAME for MP3, so I'm guessing that should be good.
Some of the functionality you want is also doable by yourself. For example, knowing the length of the music and the amount you have already read, you know how far into the audio you are. Adjusting the volume is also just a multiplication (in the simplest case) that you can do on the audio data if the library doesn't provide it.
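Both ideas fit in a few lines. The following is only a sketch in Python (chosen for brevity, the same arithmetic applies in C, C++ or Java) and it assumes the audio is already decoded to signed 16-bit PCM in native byte order; `scale_volume` and `playback_fraction` are names invented for this example.

```python
import array


def scale_volume(pcm_bytes, gain):
    """Multiply signed 16-bit PCM samples by gain, clamping to the 16-bit range."""
    samples = array.array("h", pcm_bytes)  # "h" = signed 16-bit, native byte order
    for i, s in enumerate(samples):
        samples[i] = max(-32768, min(32767, int(s * gain)))
    return samples.tobytes()


def playback_fraction(bytes_read, total_bytes):
    """Position in the track as a value from 0.0 to 1.0, for a progress bar."""
    return bytes_read / total_bytes if total_bytes else 0.0
```

The clamping matters: without it, a gain above 1.0 can overflow the sample range and produce loud wrap-around distortion instead of simple clipping.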
A very good choice seems to be PortAudio which is used by Audacity, and also recommended in the accepted answer of the question I mentioned above.
I've done audio apps in both Java and C++. Java development goes way faster because it's a more powerful language and has garbage collection, but JavaSound is a pretty awful solution for audio. Of course, there are wrappers for FFMPEG and other stuff, so you can get a lot of things working. Here's an example of a Java audio app: http://www.indabamusic.com/help/mantis
OTOH, C++ gives you lots of control, low latency and a wealth of libraries. (Another answer mentioned PortAudio, which is, indeed, great.) But you will definitely find it also has a much longer development cycle.
You can certainly do everything you want to do with either language.
I need to develop a program that toggles a particular audio track on or off when it recognizes a parrot scream or screech. The software would need to recognize a particular range of sounds and allow some variation within that range (as a parrot likely won't replicate its screeches EXACTLY each time).
Example: Bird screeches, no audio. Bird stops screeching for five seconds, audio track praising the bird plays. Regular chattering needs to be ignored completely, as it is not to be discouraged.
I've heard of Java libraries that have speech recognition with dictionaries built in, but the software would need to be taught the particular sounds that my particular parrot makes, not words or any random bird sound. In addition, as I mentioned above, it would need to allow for slight variation in the sound, as the screech will likely never be 100% identical to the recorded version.
What would be the best way to go about this/what language should I look into?
Edit: Alternatively (and perhaps this would be a simpler solution), is there a way to make the audio toggle based on the volume of the input? So it wouldn't matter what kind of sound the parrot makes, just how loud it is?
This question seems to be closely related to voice recognition. I would recommend taking a look at this post: How to convert human voice into digital format?
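As for the volume-based alternative from the edit, that part needs no recognition at all. Here is a minimal sketch in Python, assuming the microphone input arrives as chunks of signed 16-bit PCM; the class name `QuietReward` and the threshold value are made up for illustration and would have to be tuned so that regular chattering stays below it. Note that in this sketch the praise track also starts after five quiet seconds at startup, not only after a screech.

```python
import array
import math


def rms(pcm_bytes):
    """Root-mean-square loudness of a chunk of signed 16-bit PCM samples."""
    samples = array.array("h", pcm_bytes)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class QuietReward:
    """Tracks whether the input has stayed below a loudness threshold long enough."""

    def __init__(self, threshold, quiet_seconds=5.0):
        self.threshold = threshold          # loudness that counts as a screech
        self.quiet_seconds = quiet_seconds  # required quiet time before praising
        self.quiet_for = 0.0
        self.playing = False

    def update(self, level, chunk_seconds):
        """Feed one chunk's loudness; return True while the praise track should play."""
        if level >= self.threshold:
            self.quiet_for = 0.0   # screech: reset the timer and stop the track
            self.playing = False
        else:
            self.quiet_for += chunk_seconds
            if self.quiet_for >= self.quiet_seconds:
                self.playing = True
        return self.playing
```

You would call `update(rms(chunk), chunk_duration)` for every chunk read from the microphone and start or stop the audio track whenever the returned value changes.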
I have a bunch of video files that I want to process. I want to write a program that can find the audio peaks in each file and return the times where those peaks occurred.
I've looked for a lot of different APIs in different languages but couldn't get any of them to work. I am partial to php and java, so if anyone knows any good audio processing libraries in those languages that would be great! But really I don't care too much about the language. I will need to run this program on a cron.
Also, is it possible to use system calls to ffmpeg from within a script to accomplish this? Thanks in advance.
While I've only used this to work directly with audio files, the Python wrapper around The Echo Nest's audio analysis service can slurp in the audio from various video files. It uses ffmpeg's shared libs to do this, though I find this wrapper much easier to work with via Python than the command line.
Of particular interest within the api is echonest.video which is, to quote the docs:
Framework that turns video into silly putty.
I'd add a couple other helpful urls but apparently I can only add one since I don't have a reputation...
anyway, hopefully that's a helpful lead.
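On the question of calling ffmpeg from a script: that is possible too. Below is a rough sketch in Python that does not use the Echo Nest wrapper at all. It assumes an `ffmpeg` binary on the PATH, asks it to decode the audio to raw mono 16-bit PCM, and then reports the windows whose loudest sample reaches a threshold. The function names, the 8000 Hz analysis rate, and the half-second window are assumptions chosen for the example, and "peak" here simply means "loud window".

```python
import array
import subprocess
import sys

SAMPLE_RATE = 8000  # assumed analysis rate; low rates are enough to find loud spots


def decode_audio(video_path):
    """Have ffmpeg decode the file's audio to mono signed 16-bit LE PCM on stdout."""
    cmd = ["ffmpeg", "-v", "error", "-i", video_path, "-vn",
           "-ac", "1", "-ar", str(SAMPLE_RATE), "-f", "s16le", "-"]
    return subprocess.run(cmd, check=True, stdout=subprocess.PIPE).stdout


def peak_times(pcm_bytes, threshold, window_seconds=0.5):
    """Return start times (in seconds) of windows whose loudest sample >= threshold."""
    samples = array.array("h", pcm_bytes)
    if sys.byteorder == "big":
        samples.byteswap()  # the data is little-endian
    window = int(SAMPLE_RATE * window_seconds)
    times = []
    for start in range(0, len(samples), window):
        if max(abs(s) for s in samples[start:start + window]) >= threshold:
            times.append(start / SAMPLE_RATE)
    return times
```

From a cron job you would loop over the video files and call `peak_times(decode_audio(path), threshold)` for each one, with a threshold picked by experiment.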
I am using SoX to create slow but pitch corrected audio files. The resulting files sound pretty good, but often have a very hard "S" sound that I would like to filter out. Many desktop programs include a "De-Essing" filter that works well, but I would like to have a filter that works on the server side.
What SoX filter and parameters should I use to De-Ess an audio file?
Edit: I should add that this needs to work on Linux.
There is a LADSPA DeEsser plugin that can be used from SoX. You need to have the TAP plugins installed and properly configured on your system. On Arch Linux this can be easily achieved with
pacman -S tap-plugins
You can specify the threshold and frequency as the first and second arguments. I successfully used a variant of the following command
# -30: threshold (dB)
# 6200: hiss frequency (Hz)
sox from.wav to.wav ladspa tap_deesser tap_deesser -30 6200
The filter has a handful of other options that I did not analyze. More details can be found here.
While far from perfect, you may be able to get sufficient results with a suitable low-pass filter. That should not affect other parts of a speech signal too much.
You could use a de-esser VST such as Spitfish and a command-line VST host such as MrsWatson. SoX has very limited plugin support, so if you need something more specific, you're better off going the VST route.