SOX - Slow audio without stuttering effect - audio

So, I have an audio file and I would like to slow it down to 0.5x it's speed without changing the pitch, the problem is that when I do that, I get a weird stuttering effect. Is there any way to have sox slow the audio "smoothly" so there's no noticeable stuttering? Here is an example that I have found where somebody slowed down the Windows XP startup sound to make it 24 hours long. If you skip to the middle of the video you will notice it is playing smoothly.

I take it you're using the tempo effect? Have you tried playing around with the parameters, like reducing the segment size and increasing the search space and segment overlap, ending up with something like this:
play test.aiff tempo 0.5 10 20 30
Chances are, however, that you won't ever get a pleasing result using SoX to so drastically stretch audio without changing the pitch. Not that the SoX algorithm is bad, it just isn't quite the right tool for the job.
You'd be better off using something like Amazing Slow Downer, or Paul's Extreme Sound Stretch, both employing algorithms specifically designed for stuff like this.

Related

How do I remove noise from this audio file?

I am uploading an audacity project with 2 tracks, the 1st one contains a "bitbit" sound resulted from Speex echo cancellation. I tried to remove the sound using Audacity noise cancellation, didn't work. Tried equalizer to cut off some high frequency sounds, worked but somehow degraded the sound quality. Please help how I can clear the noisy audio without significantly degrading quality.
If Audacity doesn't work, any C/C++ library can also be used.
Audacity Project
I think you should use the Noice Reduction effect from the effect menu.

How to decrease pitch of audio file in nodejs server side?

I have a .MP3 file stored on my server, and I'd like to modify it to be a bit lower in pitch. I know this can be achieved by increasing the length of the audio, however, I don't know of any libraries in node that can do this.
I've tried using the node web audio api, and soundbank-pitch-shift, but the former doesn't seem to have the capabilities of pitch shifting (AFAIK), and the latter seems designed toward client
I need the solution within the realm of node ONLY- that means no external programs, etc., and it needs to be automated as well, so I can't manually pitch shift.
An ideal solution would be a function that takes a file/filepath as an input, and then creates (or overwrites) another MP3 file but with the pitch shifted by x amount, but really, any solution that produces something with a lower pitch than the original, works.
I'm totally lost here. Please help.
An audio file is basically a list of numbers. Those numbers are read one at a time at a particular speed called the 'sample rate'. The sample rate is otherwise defined as the number of audio samples read every second e.g. if an audio files sample rate is 44100, then there are 44100 samples (or numbers) read every second.
If you are with me so far, the simplest way to lower the pitch of an audio file is to play the file back at a lower sample rate (which is normally fixed in place). In most cases you wont be able to do this, so you need to achieve the same effect by resampling the file i.e adding new samples to the file in between the old samples to make it literally longer. For this you would need to understand interpolation.
The drawback to this technique in either case is that the sound will also play back at a slower speed, as well as at a lower pitch. If it is a problem that the sound has slowed down as well as lowered in pitch as a result of your processing, then you will also have to use a timestretching algorithm to fix the playback speed.
You may also have problems doing this using MP3 files. In this case you may have to uncompress the data in the MP3 file before you can operate on it in such a way that changes the pitch of the file. WAV files are more ideal in audio processing. In any case, you essentially need to turn the file into a list of floating point numbers, and change those numbers to be effectively read back at a slower rate.
Other methods of pitch shifting would probably need to involve the use of ffts, and would be a more complicated affair to say the least.
I am not familiar with nodejs I'm afraid.
I managed to get it working with help from Ollie M's answer and node-lame.
I hadn't known previously that sample rate could affect the speed, but thanks to Ollie, suddenly this problem became a lot more simple.
Using node-lame, all I did was take one of the examples (mp32wav.js), and make it so that I change the parameter sampleRate of the format object, so that it is lower than the base sample rate, which in my application was always a static 24,000. I could also make it dynamic since node-lame can grab the parameters of the input file in the format object.
Ollie, however perfectly describes the drawback with this method
The drawback to this technique in either case is that the sound will
also play back at a slower speed, as well as at a lower pitch. If it
is a problem that the sound has slowed down as well as lowered in
pitch as a result of your processing, then you will also have to use a
timestretching algorithm to fix the playback speed.
I don't have a particular need to implement a time stretching algorithm at the moment (thankfully, because that's a whole other can of worms), since I have the ability to change the initial speed of the file, but others may in the future.
See https://www.npmjs.com/package/audio-decode, https://github.com/audiojs/audio-buffer, and related linked at bottom of audio-buffer readme.

Is there a way to use ffmpeg audio filters to automatically synchronize 2 streams with similar content

I have a situation where I have a video capture of HD content via HDMI with audio from a sound board that goes through a impedance drop into a microphone input of a camcorder. That same signal is split at line level to a 'line in' jack on the same computer that is capturing the HDMI. Alternatively I can capture the audio via USB from the soundboard which is probably the best plan, but carries with it the same issue.
The point is that the line in or usb capture will be much higher quality than the one on HDMI because the line out -> impedance change -> mic in path generates inferior quality in that simply brushing the mic jack on the camera while trying to change the zoom (close proximity) can cause noise on the recording.
So I can do this today:
Take the good sound and the camera captured sound and load each into
audacity and pretty quickly use the timeshift toot to perfectly fit
the good audio to the questionable audio from the HDMI capture and
cut the good audio to the exact size of the video. Then I can use
ffmpeg or other video editing software to replace the questionable
audio with the better audio.
But while somewhat quick and easy, it always carries with it a bit of human error and time. I'd like to automate this if possible as this process is repeated at least weekly throughout the year.
Does anyone have a suggestion if any of these ideas have merit or could suggest another approach?
I suspect but have yet to confirm that the system timestamp of the start time may be recorded in both audio captured with something like Audacity, or the USB capture tool from the sound board as well as the HDMI mpeg-2 video. I tried ffprobe on a couple audacity captured .wav files but didn't see anything in the results about such a time code, but perhaps other audio formats or other probing tools may include this info. Can anyone advise if this is common with any particular capture tools or file formats?
if so, I think I could get best results by extracting this information and then using simple adelay and atrim filters in ffmpeg to sync reliably directly from the two sources in one ffmpeg call. This is all theoretical for me right now-- I've never tried either of these filters yet-- just trying to optimize against blind alleys by asking for advice up front.
If such timestamps are not embedded, possibly I can use the file system timestamp for the same idea expressed in 1a, but I suspect the file open of the two capture tools may have different inherant delays. Possibly these delays will be found to be nearly constant and the approach can work with a built-in constant anticipation delay but sounds messy and less reliable than idea 1. Still, I'd take it, if it turns out reasonably reliable
Are there any ffmpeg or general digital audio experts out there that know of particular filters that can be employed on the actual data to look for similarities like normalizing the peak amplitudes or normalizing the amplification of the two to some RMS value and then stepping through a short 10 second snippet of audio, moving one time stream .01s left against the other repeatedly and subtracting the two and looking for a minimum? Sounds like it could take a while, but if it could do this in less than a minute and be reliable, I suspect it could work. But I have only rudimentary knowledge of audio streams and perhaps what I suggest is just not plausible-- but since each stream starts with the same source I think there should be a chance. I am just way out of my depth as to how to go down this road, so if someone out there knows such magic or can throw me some names of filters and example calls, I can explore if I can make it work.
any hardware level suggestions to take a line level output down to a mic level input and not have the problems I am seeing using a simple in-line impedance drop module, so that I can simply rely on the audio from the HDMI?
Thanks in advance for any pointers or suggestinons!

Signal/Sound Processing: Making text vibrate to music

I'm working on a simple music visualization. Probably not relevant, but I am doing the sound processing using the new WebKit Audio Data API and the dsp.js library.
I want to make a text vibrate (grow/shrink) to the rhythm of the music. What is the best way to do this?
What I've done so far is ran the signals through a FFT. I look at the bottom 10% of frequencies (bass notes?) and when the amplitude surpasses a certain threshold, I animate the text.
Does this sound right? Or am I completely off?
You say you've done it, and then you ask if you are way off? Well, you tell us: does it work for your application?
One potential problem is that the FFT is slow, both in that there may be a lag between your input and output and there will be a lot of CPU used. I don't expect this will matter for your application, but, in general, you are better off using a low-pass filter. When the output of the low-pass goes above some level, you can use that to trigger something for some short amount of time.
Another issue is simply that this is only a very basic beat detection algorithm. It might work for bass-heavy "four on the floor" music, but you'll need to figure out where the threshold goes and how to keep it moving when the bass stops or something. You may want to research beat detection algorithms. The open source aubio has some.
http://aubio.org/

How to reproduce C64-like sounds?

I did some of my own research and found out that SID-chips had only few hardware supported synthesizing features. Including three audio oscillators with four possible waveforms (sawtooth, triangle, pulse, noise), with ADSR envelopes and ring modulators. Accompanied with oscillator sync and ring modulators. Also read there was a way to play single PCM sound as well.
It is all so little, but still I heard lots of different sounds from my TV sets. How were they combined to produce all that variety of audio?
To give some specifics, I'd like to know how to combine those components to produce guitar, piano or drum -like audio? Another interesting things would be different buzzes and sounds specific to C64.
I used to write music on the C64 for games, demos and even services (I wrote the official QuantumLink theme, even). As for your question, the four different waveforms were typically overlaid with the sync and ring mods (less often ring, because it was unpredictable on different versions of the SID chip), and sometimes used cleanly.
For example, a typical 'snare' sound would be composed of a noise waveform with a very fast attack and sustain, and depending on whether you wanted a drumstick or brush sound, either a very fast decay and moderately short release, or a short decay and slower release.
Getting the right sound was typically trial and error, and the limitations were pretty heavy. You really never got to the point of piano or guitar sound due to the simple waveforms without overlayable harmonic waveforms, about the best you could get was things that sounded beepy, things that sounded marimba-y, and things that sounded like a snare drum.
One of the tricks used most often to extend sound was done with fast machine code playback routines that could change the played notes on voices so quickly as to give the impression of a fuller, harmonic tone. We just called it arpeggiation, although at 10 to 12 note changes a second it sounded more like a buzzy chord.
As for the sampled waveforms, they were only available as single bit and later 4 bit samples. These sounded terrible despite our best attempts, because basically the method of playback for a sample on the 64 was to play a white noise waveform and rapidly alter the volume on the SID chip to produce a rising and falling wave. Do it fast enough and it sort of sounds like the original sound, poorly tuned in on a staticky radio.
I suggest you grab hold of a C64 emulator for the PC (CCS64 is a good one) and a 64 BASIC programming guide and just play around.... the SID chip is entirely manipulatable from BASIC.
To sum up, how did we get all of those piano and guitar sounds on a C64? We didn't, really.
Take a look at some of these docs related to producing music on the C64:
http://sid.kubarth.com/articles.html
This type of music you are describing falls into the category of "chiptunes". I'd recommend checking out some modern trackers like MilkyTracker, which are used to create music in this style. There are libraries like libmodplug that allow you to play tracker in your software.
Check out some of the C64 emulators out there. I've read that some of them are 100% accurate in ther sound reproduction, true to the original.

Resources