My question is not completely programming-related, but nevertheless I think SO is the right place to ask.
In my program I generate some audio data and save the track to a WAV file. Everything works fine with one sound generator. But now I want to add more generators and mix the generated audio data into one file. Unfortunately it is more complicated than it seems at first sight.
Moreover I didn't find much useful information on how to mix a set of audio samples.
So is there anyone who can give me advice?
edit:
I'm programming in C++. But it doesn't matter, since I was interested in the theory behind mixing two audio tracks. The problem I have is that I cannot just sum up the samples, because this often produces distorted sound.
I assume your problem is that for every audio source you're adding in, you're having to lower the levels.
If the app gives control to the user, just let them set the levels directly. How hot the mix gets is then their responsibility, not yours. This is plain "summing."
If the mixing is automated, you're about to go on a journey. You'll probably need compression, if not limiting. (Limiting is an extreme version of compression.)
Note that anything you do to the audio (including compression and limiting) is a form of distortion, so you WILL have coloration of the audio. Your choice of compression and limiting algorithms will affect the sound.
Since you're not generating the audio in real time, you have the possibility of doing "brick wall" limiting. That's because you have foreknowledge of the levels. Realtime limiting is more limited because you can't know what's coming up--you have to be reactive.
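As a rough illustration of what foreknowledge of the levels buys you, here is a minimal sketch of offline peak normalization. It is not a limiter or a compressor: it only scans the finished mix for its peak and scales everything down if that peak is too high. It assumes floating-point samples with full scale at 1.0; the function name and the `ceiling` parameter are made up for this example.

```python
def normalize_peak(samples, ceiling=1.0):
    """Scale the whole mix down so its loudest sample just reaches the ceiling.

    Assumes float samples where 1.0 is full scale (an assumption of this sketch).
    """
    # Offline processing lets us look at every sample before deciding the gain.
    peak = max((abs(s) for s in samples), default=0.0)
    if peak <= ceiling:
        # Nothing clips, so leave the audio untouched.
        return list(samples)
    scale = ceiling / peak
    return [s * scale for s in samples]
```

Because one gain is applied to the entire track, the relative dynamics are preserved, but a single loud peak makes everything else quieter. That trade-off is where compression and limiting come in.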
Is this music, sound effects, voices, what?
Programmers here deal with this all the time.
Mixing audio samples means adding them together, that's all. Typically you add them into a larger data type so that you can detect overflow and clamp the values before casting back into your destination buffer. If you know beforehand that you will have overflow, you can scale the amplitudes prior to addition: simply multiply by a floating point value between 0 and 1, again keeping the issue of precision in mind, perhaps by converting to a larger data type first.
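A minimal sketch of that idea, assuming signed 16-bit samples held in plain Python lists (the function name and the `gain` parameter are made up for this example). Python integers do not overflow, so they stand in for the "larger data type"; in C++ you would accumulate into an `int32_t` or a `float` instead.

```python
INT16_MIN = -32768
INT16_MAX = 32767

def mix_int16(tracks, gain=1.0):
    """Sum 16-bit tracks sample by sample, scale, then clamp to the int16 range."""
    if not tracks:
        return []
    length = max(len(track) for track in tracks)
    mixed = []
    for i in range(length):
        # Tracks shorter than the longest one simply contribute nothing here.
        total = sum(track[i] for track in tracks if i < len(track))
        # Optional scaling before the clamp, e.g. gain=0.5 for two loud tracks.
        total = int(round(total * gain))
        # Clamp instead of letting the value wrap around.
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed
```

For example, `mix_int16([[1000, 30000], [2000, 10000]])` clamps the second sample to 32767, while passing `gain=0.5` keeps it at 20000 without clipping.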
If you have a specific problem that is not addressed by this, feel free to update your original question.
dirty mix of two samples
mix = (a + b) - a * b * sign(a + b)
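A small sketch of that formula, under the assumption (not stated above) that both samples are floats normalized to the range -1.0 to 1.0. Under that assumption the result also stays within -1.0 to 1.0, so no clamping is needed, but the mix is no longer a linear sum and the sound is coloured whenever both inputs are non-zero.

```python
def sign(x):
    """Return -1, 0 or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)

def dirty_mix(a, b):
    """Mix two samples assumed to lie in [-1.0, 1.0] without leaving that range."""
    return (a + b) - a * b * sign(a + b)
```

Two full-scale samples give `dirty_mix(1.0, 1.0) == 1.0` instead of 2.0, and a sample mixed with silence passes through unchanged.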
You never said which programming language and platform you are using, so for now I'll assume Windows and C#.
http://www.codeplex.com/naudio
It's a great open source library that covers a lot of what you'd encounter in most audio operations.
I'm currently researching a problem regarding DOA (direction of arrival) regression for an audio source, and need to generate training data in the form of audio signals of moving sound sources. In particular, I have the stationary sound files, and I need to simulate a source and microphone(s) with the distances between them changing to reflect movement.
Is there any software online that could potentially do the trick? I've looked into pyroomacoustics and VA as well as other potential libraries, but none of them seem to deal with moving audio sources, due to the difficulties in simulating the Doppler effect.
If I were to write up my own simulation code for dealing with this, how difficult would it be? My use case would be an audio source and a microphone in some 2D landscape, both moving with their own velocities, where I would want to collect the recording from the microphone as an audio file.
Some speculation here on my part, as I have only dabbled with writing some aspects of what you are asking about and am not experienced with any particular libraries. Likelihood is good that something exists and will turn up.
That said, I wonder if it would be possible to use either the Unreal or Unity game engine. Both, as far as I can remember, grant the ability to load your own cues and support 3D including Doppler.
As far as writing your own goes, a lot depends on what you already know. With a single-point mike (as opposed to stereo) the pitch shifting involved is not that hard. There is a technique that involves stepping through the audio file's sample data, using linear interpolation for steps that fall between the data points, which is considered to have sufficient fidelity for most purposes. Lots of trig, too, to track the changes in velocity.
If we are dealing with stereo, though, it does get more complicated, depending on how far you want to go with it. The head masks high frequencies, so real-time filtering would be needed. It would also be good to implement a delay to match the different arrival times at each ear. And if you start talking about pinnae, I'm way out of my league.
As of now it seems that pyroomacoustics does not support moving sound sources. However, do check a possible workaround suggested by the developers in Issue #105, which proposes using a time-varying convolution on a dense microphone array.
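To make the interpolation idea concrete, here is a minimal sketch of stepping through sample data at a varying rate. The names are made up for this example, and `rate_at` is a hypothetical callback that a Doppler simulation would have to derive from the relative velocities of source and microphone; that part is not shown here.

```python
def read_interpolated(samples, position):
    """Linearly interpolate between the two samples around a fractional position."""
    i = int(position)
    frac = position - i
    if i + 1 >= len(samples):
        # At the very last sample there is nothing to interpolate towards.
        return samples[-1]
    return samples[i] * (1.0 - frac) + samples[i + 1] * frac

def resample_variable(samples, rate_at):
    """Step through `samples` with a playback rate that can change per output sample.

    rate_at(n) must return a positive rate for output sample n:
    above 1.0 raises the pitch, below 1.0 lowers it.
    """
    out = []
    position = 0.0
    n = 0
    while position <= len(samples) - 1:
        out.append(read_interpolated(samples, position))
        position += rate_at(n)
        n += 1
    return out
```

With a constant rate of 0.5, `resample_variable([0.0, 1.0, 2.0, 3.0], lambda n: 0.5)` fills in the midpoints between the original samples, so the sound plays an octave lower and roughly twice as long.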
I have an MP3 file stored on my server, and I'd like to modify it to be a bit lower in pitch. I know this can be achieved by increasing the length of the audio, but I don't know of any libraries in node that can do this.
I've tried using the node web audio api and soundbank-pitch-shift, but the former doesn't seem to be capable of pitch shifting (AFAIK), and the latter seems designed for client-side use.
I need the solution to stay within the realm of node ONLY, meaning no external programs, etc. It also needs to be automated, so I can't pitch shift manually.
An ideal solution would be a function that takes a file/filepath as an input, and then creates (or overwrites) another MP3 file but with the pitch shifted by x amount, but really, any solution that produces something with a lower pitch than the original, works.
I'm totally lost here. Please help.
An audio file is basically a list of numbers. Those numbers are read one at a time at a particular speed called the 'sample rate'. The sample rate is the number of audio samples read every second, e.g. if an audio file's sample rate is 44100, then 44100 samples (or numbers) are read every second.
If you are with me so far, the simplest way to lower the pitch of an audio file is to play the file back at a lower sample rate (the sample rate is normally fixed). In most cases you won't be able to do this, so you need to achieve the same effect by resampling the file, i.e. adding new samples in between the old samples to make the file literally longer. For this you would need to understand interpolation.
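As a rough sketch of the resampling route, assuming the audio has already been decoded into a list of floats: the function below stretches the list by linear interpolation so that, played at the original sample rate, it sounds lower. The function name and the `semitones` parameter are made up for this example; the step size uses the equal-temperament relation of a factor of 2 per 12 semitones.

```python
def lower_pitch(samples, semitones):
    """Stretch `samples` so they sound `semitones` lower at the same sample rate."""
    if not samples:
        return []
    # Reading the input in steps smaller than 1.0 produces more output samples.
    step = 2.0 ** (-semitones / 12.0)
    out_len = int((len(samples) - 1) / step) + 1
    out = []
    for n in range(out_len):
        position = n * step
        i = int(position)
        frac = position - i
        if i + 1 >= len(samples):
            out.append(samples[-1])
        else:
            # New sample placed between two old ones by linear interpolation.
            out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
    return out
```

Lowering by 12 semitones gives a step of 0.5, so `lower_pitch([0.0, 1.0, 2.0, 3.0], 12)` returns seven samples with the midpoints filled in. The result is also slower, which is the drawback discussed next.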
The drawback to this technique in either case is that the sound will also play back at a slower speed, as well as at a lower pitch. If it is a problem that the sound has slowed down as well as lowered in pitch as a result of your processing, then you will also have to use a timestretching algorithm to fix the playback speed.
You may also have problems doing this with MP3 files. In that case you may have to decompress the data in the MP3 file before you can operate on it in a way that changes the pitch. WAV files are better suited to audio processing. In any case, you essentially need to turn the file into a list of floating point numbers, and change those numbers so that they are effectively read back at a slower rate.
Other methods of pitch shifting would probably involve the use of FFTs, and would be a more complicated affair to say the least.
I am not familiar with Node.js, I'm afraid.
I managed to get it working with help from Ollie M's answer and node-lame.
I hadn't known previously that sample rate could affect the speed, but thanks to Ollie, suddenly this problem became a lot more simple.
Using node-lame, all I did was take one of the examples (mp32wav.js), and make it so that I change the parameter sampleRate of the format object, so that it is lower than the base sample rate, which in my application was always a static 24,000. I could also make it dynamic since node-lame can grab the parameters of the input file in the format object.
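For readers who want to see the effect in isolation, here is a minimal sketch of the same trick. It does not use node-lame; it swaps in Python's standard `wave` module and assumes the audio is already decoded to mono 16-bit samples. The helper names are made up for this example. The samples are written unchanged, and only the sample rate stored in the header is lowered.

```python
import io
import struct
import wave

def write_wav(samples, sample_rate, target):
    """Write mono 16-bit samples to `target` (a path or file-like object)."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)  # the player trusts this header value
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))

def duration_seconds(frame_count, sample_rate):
    """Playback time implied by a frame count and the declared sample rate."""
    return frame_count / sample_rate

# One second of silence at 24000 Hz, relabelled as 18000 Hz.
buffer = io.BytesIO()
write_wav([0] * 24000, 18000, buffer)
```

The relabelled file lasts 24000 / 18000 of a second (about 1.33 s), and everything in it is pitched down by the ratio 18000 / 24000 = 0.75.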
Ollie, however, perfectly describes the drawback of this method:
The drawback to this technique in either case is that the sound will
also play back at a slower speed, as well as at a lower pitch. If it
is a problem that the sound has slowed down as well as lowered in
pitch as a result of your processing, then you will also have to use a
timestretching algorithm to fix the playback speed.
I don't have a particular need to implement a time stretching algorithm at the moment (thankfully, because that's a whole other can of worms), since I have the ability to change the initial speed of the file, but others may in the future.
See https://www.npmjs.com/package/audio-decode, https://github.com/audiojs/audio-buffer, and the related packages linked at the bottom of the audio-buffer readme.
I can't seem to find any information regarding the process that Ableton uses to efficiently detect atonal percussion and convert it into MIDI. I assume feature extraction and onset detection algorithms are executed, but I'm intrigued as to which algorithms. I am particularly interested in how its efficiency is maintained for a beatboxed input.
Cheers
Your guesses are as good as everyone else's - although they look plausible. The reality is that the way this feature is implemented in Ableton is a trade secret and likely to remain that way.
If I'm not mistaken Ableton licenses technology from https://www.zplane.de/ for these things.
I don't know exactly how the software assigns the different drum sounds, but the chapter "Convert Drums to New MIDI Track" in the Live manual says that it can only detect kick, snare and hi-hat. An important point is that they are identified by the transient markers. For a good result you should check and adjust them manually. The transient markers look like the warp markers, but are grey.
Compared to a kick and a snare, for example, the sounds in a beatboxed input are likely to differ less from one another and are therefore likely to be harder for Ableton to separate (this depends on the beatboxer). In any case, some combination of frequency and amplitude (more specifically the envelope: attack, decay, sustain, release), and perhaps the overtone combinations that account for differences in timbre, are the characteristics that would have to be evaluated in order to separate the kick, snare and hi-hat.
Before this feature existed I used gates and hi/low pass filters to accomplish a similar task. So perhaps Ableton's solution is not as complicated as we might imagine.
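To illustrate the gate idea only, and with no claim about how Ableton does it, here is a minimal sketch of a threshold gate that reports where a sound starts. It assumes float samples that have already been band-limited (for example low-passed to isolate the kick); the function name, `threshold` and `hold` are made up for this example.

```python
def detect_onsets(samples, threshold=0.5, hold=2):
    """Return indices where the signal rises above `threshold` after a quiet gap.

    `hold` is the number of consecutive quiet samples required before a new
    onset may be reported, which stops one hit from triggering repeatedly.
    """
    onsets = []
    quiet_run = hold  # treat the start of the signal as silence
    for index, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if quiet_run >= hold:
                onsets.append(index)
            quiet_run = 0
        else:
            quiet_run += 1
    return onsets
```

Running one such gate per filtered band (low for kick, mid for snare, high for hi-hat) is roughly the manual approach described above; each reported index could then be turned into a MIDI note.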
I'm struggling to choose between a vast number of audio programming languages and APIs. I'm very (totally) new to audio programming so please bear with me.
Software
I need to be able to:
1. alter the volume of different sounds before outputting them to anything (these sounds can have a variety of different origins, for example mp3s and microphone input)
2. phase shift sounds
3. superimpose sounds that I have tweaked (as per items 1 and 2)
4. control the output to each of 8 channels independently of one another
5. make this all happen on Windows 7
These capabilities need to be abstracted by a graphical frontend I will probably make myself. What I want to be able to do is create 'sound sources' and move them around a 3D environment along pre-defined trajectories and/or in relation to the movement of whoever is inside the rig. The reason I want to do pitch bending is so I can mess with red-shift stuff.
I don't want to have to construct full tracks before-hand and just play them. I want the sound that is played to depend on external input from sensors as well as what I am doing on the frontend.
As far as I know this means I can't use any existing full audio-making app.
The Question
I've been looking around for the API or language I should use and I have not turned up a blank, quite the opposite actually. I'm struggling to narrow down my search. A lot of my problem stems from the fact that I have no experience in audio programming.
So, does anyone know off-hand of an API or language that meets my criteria?
Hardware stuff and goals
(I left this until last because I'm not sure how relevant it is)
My goal is to make three rings of speakers at different heights and to have enough control over them to be able to simulate any number of 'sound sources' within the array. The idea is to have someone stand in the middle of the rig and be able to make it sound like there are lots of things moving around them. To get this working I'm planning on doing a little trig and using 8 channels of audio from my PC. The maths is pretty straightforward; it's just the rest that I need to worry about.
What I want to do next is attach a bunch of cameras to the thing and do some simple image recognition to be able to 'attach sound sources' to different objects. E.g. if someone is standing in the right place, it can be made to seem as though all red balls quack like a duck, and all orange ones moan hauntingly.
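One possible shape for that trig, sketched for a single horizontal ring only: the source is placed between the two nearest speakers using equal-power panning. The technique, the function name and the even speaker spacing are all assumptions of this sketch, not something the rig above prescribes.

```python
import math

def ring_gains(source_angle, speaker_count=8):
    """Per-speaker gains for a source at `source_angle` (radians) on one ring.

    Assumes speakers evenly spaced around the listener, with speaker 0 at angle 0.
    Only the two speakers adjacent to the source receive any signal.
    """
    if speaker_count < 2:
        raise ValueError("need at least two speakers to pan between")
    spacing = 2.0 * math.pi / speaker_count
    position = (source_angle % (2.0 * math.pi)) / spacing
    lower = int(position) % speaker_count
    upper = (lower + 1) % speaker_count
    t = position - int(position)  # 0.0 at the lower speaker, 1.0 at the upper one
    gains = [0.0] * speaker_count
    # Equal-power law: the squared gains always sum to 1.
    gains[lower] = math.cos(t * math.pi / 2.0)
    gains[upper] = math.sin(t * math.pi / 2.0)
    return gains
```

Multiplying each output channel by its gain for every block of audio, and recomputing the gains as the source angle changes, would move the sound around the ring. Height would need a second pan between rings.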
This is not to detract from Richard Small's answer, but to comment on some of the other options out there:
If you are looking for something higher-level with which you can prototype and develop this faster, you want Max/MSP or its open source competitor Pure Data. These are designed for technically minded musicians rather than for programmers. As a result, you can build this sort of thing quickly and efficiently.
You also have some lower-level options. PortAudio can handle your audio I/O, but you would have to do the sound generation, effects and so on yourself or with other libraries. Cinder and openFrameworks both provide interfaces for audio, cameras, and other things for "creative programming". I'm afraid I don't know whether they meet your full requirements, but they are powerful and popular for this sort of thing, so I encourage you to look at them.
The two major ones these days tend to be:
Wwise
FMOD
These two engines may even in fact be overkill for what you need, but I can almost guarantee that they will be capable of anything you require.
I'm trying to compare sound clips based on a microphone recording. Simply put, I play an MP3 file while recording from the speakers, then attempt to match the two files. I have algorithms in place that work, but I'm seeing a slight difference that I'd like to sort out to get better accuracy.
The microphone seems to favor some frequencies (it adds amplitude) and to be slightly off on others (peaks are wider on the mic).
I'm wondering what the cause of this difference is, and how to compensate for it.
Background:
Because of speed issues in how I'm doing comparison I select certain frequencies with certain characteristics. The problem is that a high percentage of these (depending on how many I choose) don't match between MP3 and mic.
It's called the response characteristic of the microphone. Unfortunately, you can't easily get around it without buying a different, presumably more expensive, microphone.
If you can measure the actual microphone frequency response by some method (which generally requires a reference acoustic system and an anechoic chamber), you can compensate for it by applying an equaliser tuned to the exact inverse characteristic. But in practice, as Kilian says, it's much simpler to get a more precise microphone. I'd recommend a condenser or an electrostatic one.
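As a small sketch of what "inverse characteristic" means numerically: if the measured response says the microphone boosts a band by some number of decibels, the equaliser applies the opposite number. The function name and the measurement values in the usage note are hypothetical; the only fixed part is the standard amplitude conversion of 20 dB per factor of 10.

```python
def compensation_gains(response_db):
    """Turn a measured response (frequency in Hz -> deviation in dB) into linear gains.

    A band the microphone boosts by +6 dB gets a gain that cuts it by 6 dB,
    and a band it attenuates gets the matching boost.
    """
    return {
        frequency: 10.0 ** (-deviation_db / 20.0)
        for frequency, deviation_db in response_db.items()
    }
```

With a hypothetical measurement of `{1000: 6.0, 4000: 0.0}`, the 1000 Hz band would be multiplied by about 0.5 and the 4000 Hz band left alone. Large boosts also amplify noise, which is one more reason a better microphone is the simpler fix.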