Creating .wav files of varying pitches but still having the same fundamental frequency - python-3.x

I am using pygame to play .wav files and want to change the pitch of a particular .wav file as each level in my game progresses. To explain, my game is a near copy of the old Oric1 computer OricMunch Pacman game, where there are a few hundred pills to be munched on each level, and for every pill that is munched a short sound is played, with the pitch of the sound increasing slightly for each pill eaten/munched.
Now here is what I have tried:
1) I have used Python's wave module to create multiple copies of the sound file, each newly created file having a slight increase in pitch (by changing the third parameter returned by getparams(), the framerate, sometimes referred to as the sample frequency) for each cycle of a for loop. Having achieved this, I could then, within the loop, create multiple Sound objects to add to a list, and then index through the list to play the sounds as each pill is eaten.
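(In outline, the list part looks roughly like the sketch below; the file names match the generation code further down in my reply, and the counts are only examples.)
import pygame

pygame.mixer.init()

# One Sound object per pre-generated file, indexed by how many pills have been eaten.
pill_sounds = [pygame.mixer.Sound('PillSound' + str(x) + '.wav') for x in range(25)]

pills_eaten = 0
# ... then, inside the game loop, whenever a pill is munched:
pill_sounds[pills_eaten].play()
pills_eaten += 1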
The problem is that even though I can create hundreds of files (using the wave module) that play perfectly, each with its own unique pitch, in Windows Media Player or even Python's winsound module, pygame does not seem to reflect the difference in pitch.
Now interestingly, I have downloaded the free trial version of Power Sound Editor which has the option to change the pitch, and so I’ve created just a few .wav files to test, and they clearly play with different pitches when played in pygame.
Observations:
From printing the params in my for loop, I can see that the framerate/frequency is changing as intended, which is obviously why the sounds play as intended through Windows Media Player and winsound.
Within pygame, I suspect the reason they don't play with different pitches is that the mixer frequency is fixed, either to the default settings or via pygame.mixer.pre_init, which I have indeed experimented with.
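(For reference, the sort of call I mean looks like this; the exact values are only an example.)
import pygame

# The mixer's playback frequency is fixed once it is initialised:
pygame.mixer.pre_init(44100, -16, 2, 512)   # frequency, size, channels, buffer
pygame.init()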
I then checked the params for each .wav file created by Power Sound Editor and noticed that even though the pitch was changing, the frequency stayed the same. This is not totally surprising, since you have to select one of three options when saving the files: 22050, 44100 or 96000 Hz.
So now I thought it was time to check the difference between pitch and frequency, specifically in relation to sound, since I had assumed they were the same. What I found is that there seem to be two principal aspects to a sound wave: 1) the framerate/sample frequency, and 2) the varying amplitudes of the samples taken at that frequency, which describe the waveform itself. I am far from clearly understanding this, but I realise Power Sound Editor must be altering the pitch by manipulating the waveform data, point 2) above, and not by changing the fundamental frequency, point 1) above.
I am a beginner to Python, pygame and programming in general, and have tried hard to find a simple way to change sound files so that they have gradually increasing pitches without changing the framerate/sample frequency. If there's a module I can import to change the pitch by manipulating the sample data (instead of changing the framerate/sample frequency, which is typically 22050 or 44100 Hz), then it needs to run quickly enough to be used on the fly without slowing the game down. If the potential module opens, changes and then saves sound files, as opposed to altering them on the fly, then I guess it doesn't matter if it's slow, because I will just be creating the sound files once so I can create sound objects from them in pygame to play.
Now, if the only way to avoid slowdown in pygame is to create sound objects from sound files as I have already done, and then play them, then I need a way to manipulate the sound files like Power Sound Editor does (again, I stress, not by changing the framerate/sample frequency of typically 22050 or 44100 Hz) and then save the changed file.
I suppose, in a nutshell, if I could magically automate Power Sound Editor to produce three to four hundred sound files without me having to click on the change-pitch option and then save each time, this would be like having my own Python way of doing it.
Conclusion:
Assuming creating sound objects from sound files is the only way not to slow my game down (as I suspect it might be) then I need the following:
An equivalent of Python's wave module, but one that changes the pitch the way Power Sound Editor does, rather than by changing the framerate as my current wave-module approach does.
Please can someone help me and let me know if there’s a way.
I am using python 3.2.3 and pygame 1.9.2
Also, I'm just using Python's IDLE and I'm not familiar with using other editors.
Also, I'm aware of NumPy and of various sound modules, but I definitely don't know how to use them, and any potential module would need to work with the above versions of Python and pygame.
Thank you in advance.
Gary Townsend.
My Reply To The First Answer From Andbdrew Is Below:
Thank you for your assistance.
It does sound like changing the wave file data rather than the wave file parameters is what I need to do. For reference, here is the code I have used to create the multiple files:
import wave

framerate = 44100  # original .wav file framerate/sample frequency

for x in range(0, 25):
    # Read the original sound's frames and parameters.
    file = wave.open('MunchEatPill3Amp.wav')
    nFrames = file.getnframes()
    wdata = file.readframes(nFrames)
    params = file.getparams()
    file.close()

    # Copy the params, forcing stereo and a new (higher) framerate.
    n = list(params)
    n[0] = 2            # nchannels
    n[2] = framerate    # framerate/sample frequency
    framerate += 500
    params = tuple(n)

    # Write the unchanged frame data back out under the new parameters.
    name = 'PillSound' + str(x) + '.wav'
    file = wave.open(name, 'wb')
    file.setparams(params)
    print(params)
    file.writeframes(wdata)
    file.close()
It sounds like writing different data would be equivalent or similar to how Power Sound Editor is changing the pitch.
So please can you tell me if you know a way to modify/manipulate wdata to effectively change the pitch, rather than altering the sample rate in the params? Would this mean some relatively simple operation applied to wdata after it's read from my .wav file? (I really hope so.) I've heard of using NumPy arrays, but I have no clue how to use them.
Please note that any .wav files modified in the above code do indeed play in Python using winsound, or in Windows Media Player, with the pitch increase sounding as intended. It's only in pygame that they don't.
As I've mentioned, it seems that because pygame has a set frequency (I guess this frequency is also the sample rate), the pitch sounds the same, as if it hadn't been increased at all, whereas when played with e.g. Windows Media Player the change in sample rate does result in a higher-sounding pitch.
I suppose I just need to achieve the same increase in pitch by changing the file data and not the file parameters, so please can you tell me if you know a way.
Thank you again for helping with this.
To Summarise My Initial Question Overall, Here It Is Again:
How do you change the pitch of a .wav file, without changing the framerate/sample frequency, using the Python programming language rather than a separate software program such as Power Sound Editor?
Thank You Again.

You should change the frequency of the wave in your sample instead of changing the sample rate. It seems like Python is playing back all of your wave files at the same sample rate (which is good), so your changes are not reflected.
Sample rate is sort of like meta information for a sound file. Read about it at http://en.m.wikipedia.org/wiki/Sampling_rate#mw-mf-search .
It tells you the amount of time between samples when you convert a continuous waveform into a discrete one. Although your (ab)use of it is cool, you would be better served by encoding different frequencies of sound in your different files all at the same sample rate.
I took a look at the docs for the wave module ( http://docs.python.org/3.3/library/wave.html ) and it looks like you should just write different data to your audio files when you call
Wave_write.writeframes(data)
That is the method that actually writes your audio data to your audio file.
The method you described is responsible for writing information about the audio file itself, not the content of the audio data.
Wave_write.setparams(tuple)
"... Where the tuple should be (nchannels, sampwidth, framerate, nframes, comptype, compname), with values valid for the set*() methods. Sets all parameters... " ( also from the docs )
If you post your code, maybe we can fix it.
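In the meantime, here is a minimal sketch of the idea (assuming NumPy is available and a 16-bit mono source file; the file names are taken from the question). It resamples the frame data itself, so every output file keeps the original frame rate and the pitch difference survives pygame's fixed mixer frequency:
import wave
import numpy as np

def shift_pitch(in_name, out_name, factor):
    # Raise the pitch by `factor` (e.g. 1.05 is roughly a semitone) while
    # keeping the frame rate, channel count and sample width unchanged.
    # Assumes a 16-bit mono source file.
    src = wave.open(in_name, 'rb')
    params = src.getparams()
    frames = src.readframes(src.getnframes())
    src.close()

    samples = np.frombuffer(frames, dtype=np.int16).astype(np.float64)

    # Resample: read the original waveform `factor` times faster.
    n_out = int(len(samples) / factor)
    new_positions = np.arange(n_out) * factor
    shifted = np.interp(new_positions, np.arange(len(samples)), samples)
    shifted = shifted.astype(np.int16)

    dst = wave.open(out_name, 'wb')
    dst.setparams(params)   # same framerate as the source, only the data changes
    dst.writeframes(shifted.tobytes())
    dst.close()

# For example, 25 files, each about 2% higher than the last:
for x in range(25):
    shift_pitch('MunchEatPill3Amp.wav', 'PillSound' + str(x) + '.wav', 1.02 ** x)
Note that, like changing the frame rate, this also makes the sound slightly shorter as the pitch rises; for a short munch effect that is usually acceptable.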

If you just want to create multiple files and you are using Linux, try SoX.
#!/bin/bash
# Generate output files pitched by -20 to +20 cents in steps of 10.
for i in `seq -20 10 20`; do
    sox 'input.wav' 'output_'$i'.wav' pitch $i;
done
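If SoX is installed, the same loop can be driven from Python, which comes close to the asker's wish to automate generating a few hundred files (a rough sketch; the file names and the 5-cent step are just examples, and sox must be on the PATH):
import subprocess

for i in range(300):
    cents = i * 5  # SoX's pitch effect takes an amount in cents (100 cents = 1 semitone)
    out_name = 'PillSound' + str(i) + '.wav'
    subprocess.call(['sox', 'MunchEatPill3Amp.wav', out_name, 'pitch', str(cents)])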

Related

Looking to split audio from different sources that's become enmeshed in recovery

My Zoom H4n somehow decided it didn't want to properly save two recordings this weekend, leaving me with four zero-byte files (which I have tried every which way to open/convert, but nothing has worked).
I then used CardRescue to scan the SD card for any audio it could find, and - lo and behold - I got .wav files! However, instead of two files for each session (one was an XLR output from the desk, the other the on-Zoom mics), or even a nice stereo with one left, the other right, I have a mess.
When importing as raw data into Audacity (the rescued .wavs themselves do not open), the right channel has the on-Zoom mic audio, with intermittent silence. The left has the on-Zoom audio, followed by the same part of the XLR input audio, following the same pattern as the silences.
I have spent hours chopping this up in GarageBand, but as it is audio for a video, it needs to match what 'really' happened perfectly (I appreciate that for a podcast or audio-only project I could relatively simply take away the on-Zoom mic audio from the left channel). I began attempting to sync the mic audio to the on-camera audio (which, despite playing around with settings, is as unusable as it always is), but because it's a pattern I can't help but wonder if there's a cleaner fix: either analysing the audio somehow, as there are clean lines if I look at the spectral data, or adding a couple of numbers to the .wav's binary that would click the two into place?
I've tried importing to Audacity with different settings, different offsets - this has ended up in either slow audio, fast audio, or heavily distorted audio (but always the same patterns with the files).
I use a Mac (and don't know any PC users close by!) so any software suggestions will need to run on Mac. However, I'm willing to try just about anything that's not dragging tiny clips.

I am trying to build a music visualizer but I am completely inexperienced

Where should I begin?
I am trying to build a real-time stem-split music visualizer for VJing and the like. What sets this apart is that I would like to split the input audio stream into its stems (either algorithmically or using something like Spleeter) and then use each stem's data to control different aspects of the visualization.
For example:
The isolated drums to play a BPM-synced video.
I'm hoping to achieve this by making a short looping video at a fixed BPM (say, 60) and then, by detecting the BPM of the stream, adjusting the playback speed of the video so that it stays in sync (for example, a track detected at 120 BPM would play the 60 BPM loop at double speed).
The isolated synth stream could control DMX lights.
I want to try and encode this data in, say, the last row of pixels in the above video. By reading the colour, intensity, and movement data from the pixels, moves and timings could be read and sent to the lights in real-time. I'm doing this so that the user can encode all the data needed for a scene into one video file.
The isolated vocals could be synced and displayed on screen using
MusixMatch.
The isolated bassline could be parsed into MIDI data and visualized on screen.
All of the above can be controlled live.
Now the problem is that I am relatively inexperienced with programming, and I am not sure where to start: which language to use, which IDE, how to display visuals, how to interact with audio input streams, how to use DMX, and how to visualize MIDI data. I know this is currently quite a bit out of my depth, but I'll manage with the right resources. Please give me some advice on where to begin with a project like this.

How to decrease pitch of audio file in nodejs server side?

I have a .MP3 file stored on my server, and I'd like to modify it to be a bit lower in pitch. I know this can be achieved by increasing the length of the audio; however, I don't know of any libraries in Node that can do this.
I've tried using the node web audio api and soundbank-pitch-shift, but the former doesn't seem to have pitch-shifting capabilities (AFAIK), and the latter seems designed for client-side use.
I need the solution within the realm of Node ONLY: that means no external programs, etc., and it needs to be automated as well, so I can't manually pitch shift.
An ideal solution would be a function that takes a file/filepath as an input, and then creates (or overwrites) another MP3 file but with the pitch shifted by x amount, but really, any solution that produces something with a lower pitch than the original, works.
I'm totally lost here. Please help.
An audio file is basically a list of numbers. Those numbers are read one at a time at a particular speed called the 'sample rate'. The sample rate is otherwise defined as the number of audio samples read every second, e.g. if an audio file's sample rate is 44100, then there are 44100 samples (or numbers) read every second.
If you are with me so far, the simplest way to lower the pitch of an audio file is to play the file back at a lower sample rate (which is normally fixed in place). In most cases you won't be able to do this, so you need to achieve the same effect by resampling the file, i.e. adding new samples in between the old samples to make it literally longer. For this you would need to understand interpolation.
The drawback to this technique in either case is that the sound will also play back at a slower speed, as well as at a lower pitch. If it is a problem that the sound has slowed down as well as lowered in pitch as a result of your processing, then you will also have to use a timestretching algorithm to fix the playback speed.
You may also have problems doing this with MP3 files. In this case you may have to decompress the data in the MP3 file before you can operate on it in a way that changes the pitch. WAV files are better suited to audio processing. In any case, you essentially need to turn the file into a list of floating-point numbers and change those numbers so they are effectively read back at a slower rate.
Other methods of pitch shifting would probably involve the use of FFTs, and would be a more complicated affair to say the least.
I am not familiar with nodejs I'm afraid.
I managed to get it working with help from Ollie M's answer and node-lame.
I hadn't known previously that sample rate could affect the speed, but thanks to Ollie, suddenly this problem became a lot simpler.
Using node-lame, all I did was take one of the examples (mp32wav.js) and change the sampleRate parameter of the format object so that it is lower than the base sample rate, which in my application was always a static 24,000. I could also make it dynamic, since node-lame can grab the parameters of the input file in the format object.
Ollie, however, perfectly describes the drawback of this method:
"The drawback to this technique in either case is that the sound will also play back at a slower speed, as well as at a lower pitch. If it is a problem that the sound has slowed down as well as lowered in pitch as a result of your processing, then you will also have to use a timestretching algorithm to fix the playback speed."
I don't have a particular need to implement a time stretching algorithm at the moment (thankfully, because that's a whole other can of worms), since I have the ability to change the initial speed of the file, but others may in the future.
See https://www.npmjs.com/package/audio-decode, https://github.com/audiojs/audio-buffer, and related linked at bottom of audio-buffer readme.

Is there a way to use ffmpeg audio filters to automatically synchronize 2 streams with similar content

I have a situation where I have a video capture of HD content via HDMI, with audio from a sound board that goes through an impedance drop into a microphone input of a camcorder. That same signal is split at line level to a 'line in' jack on the same computer that is capturing the HDMI. Alternatively, I can capture the audio via USB from the soundboard, which is probably the best plan, but it carries with it the same issue.
The point is that the line in or usb capture will be much higher quality than the one on HDMI because the line out -> impedance change -> mic in path generates inferior quality in that simply brushing the mic jack on the camera while trying to change the zoom (close proximity) can cause noise on the recording.
So I can do this today: take the good sound and the camera-captured sound, load each into Audacity, pretty quickly use the time-shift tool to perfectly fit the good audio to the questionable audio from the HDMI capture, and cut the good audio to the exact length of the video. Then I can use ffmpeg or other video-editing software to replace the questionable audio with the better audio.
But while somewhat quick and easy, it always carries with it a bit of human error and time. I'd like to automate this if possible as this process is repeated at least weekly throughout the year.
Does anyone have a suggestion if any of these ideas have merit or could suggest another approach?
I suspect, but have yet to confirm, that the system timestamp of the start time may be recorded both in audio captured with something like Audacity or the USB capture tool from the sound board, and in the HDMI MPEG-2 video. I tried ffprobe on a couple of Audacity-captured .wav files but didn't see anything in the results about such a time code; perhaps other audio formats or other probing tools include this info. Can anyone advise if this is common with any particular capture tools or file formats?
If so, I think I could get the best results by extracting this information and then using simple adelay and atrim filters in ffmpeg to sync reliably, directly from the two sources, in one ffmpeg call. This is all theoretical for me right now; I've never tried either of these filters yet, and I'm just trying to optimise against blind alleys by asking for advice up front.
If such timestamps are not embedded, possibly I can use the file-system timestamp for the same idea expressed in 1a, but I suspect the file open of the two capture tools may have different inherent delays. Possibly these delays will turn out to be nearly constant, and the approach could work with a built-in constant anticipation delay, but that sounds messy and less reliable than idea 1. Still, I'd take it if it turns out reasonably reliable.
Are there any ffmpeg or general digital-audio experts out there who know of particular filters that can be employed on the actual data to look for similarities? For example, normalising the peak amplitudes (or normalising the amplification of the two to some RMS value), then stepping through a short 10-second snippet of audio, moving one stream 0.01 s against the other repeatedly, subtracting the two, and looking for a minimum (a rough sketch of this idea appears after these points). It sounds like it could take a while, but if it could do this in less than a minute and be reliable, I suspect it could work. I have only rudimentary knowledge of audio streams, and perhaps what I suggest is just not plausible, but since each stream starts with the same source I think there should be a chance. I am just way out of my depth as to how to go down this road, so if someone out there knows such magic or can throw me some names of filters and example calls, I can explore whether I can make it work.
Are there any hardware-level suggestions for taking a line-level output down to a mic-level input without the problems I am seeing with a simple in-line impedance-drop module, so that I can simply rely on the audio from the HDMI?
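As a rough sketch of the similarity idea above (hypothetical file names; it assumes the soundfile and scipy packages, but any reader that yields the samples as NumPy arrays would do), cross-correlating a short snippet of the two captures gives an offset that could then be fed to ffmpeg's adelay/atrim filters:
import numpy as np
import soundfile as sf                      # one possible choice of WAV reader
from scipy.signal import correlate

def load_mono(path, seconds=10):
    # Read a capture, mix it down to mono, and keep only the first few seconds.
    data, rate = sf.read(path)
    if data.ndim > 1:
        data = data.mean(axis=1)
    return data[:int(seconds * rate)], rate

good, rate = load_mono('line_in_capture.wav')   # the clean board audio (hypothetical name)
cam, rate2 = load_mono('hdmi_capture.wav')      # the HDMI/camera audio (hypothetical name)
assert rate == rate2

# Normalise so the level difference between the two signal paths matters less.
good = (good - good.mean()) / (good.std() or 1.0)
cam = (cam - cam.mean()) / (cam.std() or 1.0)

# The peak of the cross-correlation gives the lag of the camera audio
# relative to the board audio.
corr = correlate(cam, good, mode='full')
lag = int(corr.argmax()) - (len(good) - 1)
print('offset: %d samples (%.3f s)' % (lag, lag / float(rate)))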
Thanks in advance for any pointers or suggestions!

sound synchronization in C or Python

I'd like to play a sound and have some way of reliably telling how much of it has thus far been played.
I've looked at several sound libraries but they are all horribly underdocumented and only seem to export a "PlaySound, no questions asked" routine.
I.e., I want this:
a = Sound(filename)
PlaySound(a)
while True:
    print(a.milliseconds_elapsed, a.length)
    sleep(1)
C, C++ or Python solutions preferred.
Thank you.
I use BASS Audio Library (http://www.un4seen.com/)
BASS is an audio library for use in Windows and Mac OSX software. Its purpose is to provide developers with powerful and efficient sample, stream (MP3, MP2, MP1, OGG, WAV, AIFF, custom generated, and more via add-ons), MOD music (XM, IT, S3M, MOD, MTM, UMX), MO3 music (MP3/OGG compressed MODs), and recording functions. All in a tiny DLL, under 100KB in size.
A C program using BASS is as simple as
HSTREAM str;
QWORD pos;

BASS_Init(-1, 44100, 0, 0, NULL);   /* default output device at 44100 Hz */
BASS_Start();
str = BASS_StreamCreateFile(FALSE, filename, 0, 0, 0);
BASS_ChannelPlay(str, FALSE);
while (BASS_ChannelIsActive(str) == BASS_ACTIVE_PLAYING) {
    pos = BASS_ChannelGetPosition(str, BASS_POS_BYTE);   /* playback position in bytes */
}
BASS_Stop();
BASS_Free();
This is most likely going to be both hardware-dependent (sound card etc) and OS-dependent (size of buffers used by OS etc).
Maybe it would help if you said a little more about what you're really trying to achieve, and also whether we can make any assumptions about what hardware and OS this will run on?
One possible solution: assume that the sound starts playing more or less immediately and then use a reasonably accurate timer to determine how much of the sound has played (since it will have a known, fixed sample rate).
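With pygame, for example, that timer-based estimate might look something like the following sketch ('sound.wav' is a placeholder file name, and the figure is only as accurate as the assumption that playback starts immediately):
import time
import pygame

pygame.mixer.init()
snd = pygame.mixer.Sound('sound.wav')   # placeholder file name
length = snd.get_length()               # total duration in seconds

start = time.time()
snd.play()
while time.time() - start < length:
    # Estimate the position from wall-clock time since play() was called.
    print('%.2f of %.2f seconds played (estimated)' % (time.time() - start, length))
    time.sleep(1)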
I'm also looking for a nice audio library where I can write directly to the sound card's buffer. I haven't had time to look at it closely myself yet, but PyAudio looks pretty nice. If you scroll down on the page you'll see an example similar to yours.
With the help of the buffer size, number of channels and sample rate, you can easily calculate how long each loop step lasts and print it out.
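A rough sketch of that calculation with PyAudio and the wave module is below ('sound.wav' is a placeholder). The position is estimated from the number of frames handed to the stream, so it runs slightly ahead of what is actually audible because of buffering:
import wave
import pyaudio

CHUNK = 1024  # frames per buffer

wf = wave.open('sound.wav', 'rb')       # placeholder file name
p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True)

frames_written = 0
total = wf.getnframes() / float(wf.getframerate())
data = wf.readframes(CHUNK)
while data:
    stream.write(data)                  # blocks until this chunk is queued
    frames_written += len(data) // (wf.getsampwidth() * wf.getnchannels())
    print('%.2f / %.2f seconds' % (frames_written / float(wf.getframerate()), total))
    data = wf.readframes(CHUNK)

stream.stop_stream()
stream.close()
p.terminate()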
