I am trying to build a music visualizer but I am completely inexperienced

Where should I begin?
I am trying to build a real-time stem-split music visualizer for VJing and the like. What sets this apart is that I would like to split the input audio stream into its stems (either algorithmically or using something like Spleeter) and then use each stem's data to control a different aspect of the visualization.
For example:
The isolated drums to play a BPM-synced video.
I'm hoping to achieve this by making a short looping video at a fixed BPM (say, 60), detecting the BPM of the incoming stream, and adjusting the video's playback speed so that it stays in sync (see the sketch after this list).
The isolated synth stream could control DMX lights.
I want to try encoding this data in, say, the last row of pixels of the above video. By reading the colour, intensity, and movement of those pixels, cues and timings could be decoded and sent to the lights in real time. I'm doing this so that the user can pack all the data needed for a scene into one video file (a reading sketch follows below the question).
The isolated vocals could be synced and displayed on screen using MusixMatch.
The isolated bassline could be parsed into MIDI data and visualized on screen.
All of the above can be controlled live.
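For the drum-sync idea above, here is a minimal sketch of the tempo-detection half, assuming the drum stem has already been separated (e.g. with Spleeter) and saved to a WAV file; librosa is one library that can estimate the tempo, and the filename below is made up:

import librosa

# Hypothetical filename: the drum stem exported from the splitter.
y, sr = librosa.load("drums_stem.wav")
tempo, _beats = librosa.beat.beat_track(y=y, sr=sr)

VIDEO_BPM = 60.0                          # the loop video was authored at 60 BPM
playback_rate = float(tempo) / VIDEO_BPM  # e.g. a 120 BPM track -> play the video at 2x
print("estimated tempo: %.1f BPM, playback rate: %.2fx" % (float(tempo), playback_rate))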
Now the problem is that I am relatively inexperienced with programming, and I am not sure where to start: which language to use, which IDE, how to display visuals, how to work with audio input streams, how to drive DMX, and how to visualize MIDI data. I know this is currently quite a bit out of my depth, but I'll manage with the right resources. Please give me some advice on where to begin for a project like this.
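And for the idea of encoding light-control data in the last row of pixels, here is a minimal reading sketch, assuming OpenCV is used for frame grabbing; the filename is hypothetical and the brightness-to-DMX mapping is purely illustrative (actual DMX output would go through a library or USB interface of your choice):

import cv2

cap = cv2.VideoCapture("drum_loop_60bpm.mp4")   # hypothetical filename
while True:
    ok, frame = cap.read()
    if not ok:
        break
    control_row = frame[-1]                     # last row of pixels, shape (width, 3), BGR
    # Illustrative mapping: treat each pixel's brightness (0-255) as one DMX channel value.
    dmx_values = control_row.mean(axis=1).astype(int).tolist()
    # ...send dmx_values[:512] to your DMX interface here...
cap.release()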

Related

Zoom and Moving based on audio information in FFMPEG

I recently wondered if it is possible to zoom or move things in FFMPEG based on an audio source.
I already played around with complex filters as they allow some audio visualization but didn't really manage to move/zoom things based on sound. See good examples of complex filters used for audio visualization at: https://hhsprings.bitbucket.io/docs/programming/examples/ffmpeg/audio_visualization/index.html
My current situation is that I have multiple inputs, one of which should react to sound, or perhaps even to specific frequencies.
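For reference, here is a minimal sketch of running one of the complex-filter visualizations mentioned above (showwaves) from Python; the filenames are hypothetical and ffmpeg is assumed to be on the PATH. It only draws a waveform video; driving zoom or movement from the audio would still need an extra analysis step on top of this.

import subprocess

subprocess.run([
    "ffmpeg", "-y",
    "-i", "input.mp3",   # hypothetical input file
    "-filter_complex", "[0:a]showwaves=s=1280x720:mode=line:rate=25[v]",
    "-map", "[v]", "-map", "0:a",
    "-c:v", "libx264", "-c:a", "aac",
    "visualization.mp4",
], check=True)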

Understanding Azure Media Service Encoding permutations - that increases file size

How can a system improve video quality automatically? For example, a dark line on my face in a video can't be removed by the system automatically, which makes sense. Here I'm trying to understand Azure Media Services encoding permutations.
When I uploaded a 55.5 MB mp4 file and encoded it with the "H264AdaptiveBitrateMP4Set720p" encoder preset, I received the following output files:
Now look at the video file highlighted with the green rectangle; its size looks reasonable given the input file size. But the video files highlighted with the red rectangles are the renditions produced for adaptive streaming, which look useless if you compare them with my 'dark line on my face' example. Here are my questions, and I would love to read your input on them:
What are the exact reasons the encoder increases the file size?
Why should I pay more for bandwidth and storage for these larger files, and how do I convince clients?
Is there any way to specify that such files should not be created when scheduling encoding?
Any input is highly appreciated.
1) The dark line appearing on your face has nothing to do with encoding. Encoding simply means re-encoding the bits that make up the video with a different compression algorithm than the one used in the source video.
2) As you can see from the filenames of the generated files, they all have different bitrates, denoted in kbps. This is the amount of data, i.e. the number of bits, that the decoder has to process to reconstruct one second of video footage. The higher the bitrate, the better the quality of the video, because more detail, such as light and colour information, is preserved in every frame.
As a corollary, a higher-bitrate video is better suited to faster internet connections.
So Azure generated, from your source video, these four videos at different bitrates, all using the same video (H.264) and audio (AAC) codecs.
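To make the size differences concrete, here is a rough back-of-the-envelope estimate of file size from bitrate and duration (the bitrates and duration below are illustrative, not taken from your output files):

def estimated_size_mb(video_kbps, audio_kbps, duration_s):
    """Rough size estimate: total bits divided by 8, reported in megabytes."""
    total_bits = (video_kbps + audio_kbps) * 1000 * duration_s
    return total_bits / 8 / 1000000

# A hypothetical 3400 kbps video + 128 kbps audio rendition of a 2-minute clip:
print(round(estimated_size_mb(3400, 128, 120), 1))   # -> 52.9 (MB)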
3) As for how to stop Azure from creating so many files, I do not know the answer. I am pretty sure it is some configuration somewhere, but I honestly have no idea; I am confident, though, that it is only a matter of configuration to tell it to skip the other bitrate renditions.
In summary:
a) To clear off the dark line on your face in the video, you have to edit the source video in a video editor; that has nothing to do with video encoding.
b) The file sizes differ because of the different bitrates, i.e. differences in how much light and colour information (such as shadow detail) is kept in every frame of the footage.
For users with a faster Internet connection, you could offer the option of downloading a higher-bitrate file. The higher-bitrate file will show slightly better quality even at the same resolution, i.e. 720p in your case.

Is there a way to use ffmpeg audio filters to automatically synchronize 2 streams with similar content

I have a situation where I capture HD content via HDMI, with audio from a sound board that goes through an impedance drop into the microphone input of a camcorder. That same signal is split at line level to a 'line in' jack on the same computer that is capturing the HDMI. Alternatively, I can capture the audio via USB from the soundboard, which is probably the best plan, but it carries with it the same issue.
The point is that the line-in or USB capture will be much higher quality than the HDMI one, because the line out -> impedance change -> mic in path produces inferior quality; simply brushing the mic jack on the camera while trying to change the zoom (they are in close proximity) can cause noise on the recording.
So I can do this today:
Take the good sound and the camera-captured sound, load each into Audacity, and pretty quickly use the time-shift tool to perfectly fit the good audio to the questionable audio from the HDMI capture, then cut the good audio to the exact length of the video. Then I can use ffmpeg or other video-editing software to replace the questionable audio with the better audio.
But while somewhat quick and easy, it always carries with it a bit of human error and time. I'd like to automate this if possible as this process is repeated at least weekly throughout the year.
Does anyone have a suggestion if any of these ideas have merit or could suggest another approach?
1) I suspect, but have yet to confirm, that the system timestamp of the start time may be recorded both in audio captured with something like Audacity (or with the USB capture tool from the sound board) and in the HDMI MPEG-2 video. I tried ffprobe on a couple of Audacity-captured .wav files but didn't see anything in the results about such a time code; perhaps other audio formats or other probing tools include this information. Can anyone advise whether this is common with any particular capture tools or file formats?
1a) If so, I think I could get the best results by extracting this information and then using simple adelay and atrim filters in ffmpeg to sync the two sources reliably in one ffmpeg call. This is all theoretical for me right now; I've never tried either of these filters, and I'm just trying to avoid blind alleys by asking for advice up front.
2) If such timestamps are not embedded, I could possibly use the file-system timestamps for the same idea expressed in 1a, but I suspect the file open of the two capture tools may have different inherent delays. Possibly these delays will turn out to be nearly constant, and the approach could work with a built-in constant anticipation delay, but that sounds messy and less reliable than idea 1. Still, I'd take it if it turns out reasonably reliable.
3) Are there any ffmpeg or general digital-audio experts out there who know of filters that can be applied to the actual data to look for similarities, for example normalizing the peak amplitudes (or normalizing both streams to some RMS value) and then stepping through a short 10-second snippet, shifting one stream 0.01 s against the other repeatedly, subtracting the two, and looking for a minimum? It sounds like it could take a while, but if it could do this in under a minute and be reliable, I suspect it could work. I have only rudimentary knowledge of audio streams, and perhaps what I suggest is just not plausible, but since each stream starts from the same source I think there should be a chance. I am way out of my depth as to how to go down this road, so if someone knows such magic or can throw me the names of some filters and example calls, I can explore whether I can make it work (a minimal correlation sketch follows below).
4) Any hardware-level suggestions for taking a line-level output down to a mic-level input without the problems I am seeing with a simple in-line impedance-drop module, so that I can simply rely on the audio from the HDMI?
Thanks in advance for any pointers or suggestions!
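For idea 3), here is a minimal offset-estimation sketch, assuming both captures have been exported as mono WAV files at the same sample rate; the filenames are hypothetical, and the sign convention of the reported offset should be checked against a known case before trusting it:

import numpy as np
from scipy.io import wavfile
from scipy.signal import correlate

rate_a, good = wavfile.read("usb_capture.wav")    # hypothetical filenames
rate_b, hdmi = wavfile.read("hdmi_capture.wav")
assert rate_a == rate_b, "resample one file first if the rates differ"

# Compare a short window and normalise, so level differences matter less.
n = 10 * rate_a                                   # 10 seconds
a = good[:n].astype(np.float64)
b = hdmi[:n].astype(np.float64)
a /= np.abs(a).max() + 1e-12
b /= np.abs(b).max() + 1e-12

# The peak of the cross-correlation gives the lag between the two streams.
lag = int(np.argmax(correlate(a, b, mode="full"))) - (len(b) - 1)
offset_seconds = lag / float(rate_a)
print("offset of roughly %+.3f s; apply it with ffmpeg adelay/atrim" % offset_seconds)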

Creating .wav files of varying pitches but still having the same fundamental frequency

I am using pygame to play .wav files and want to change the pitch of a particular .wav file as each level of my game progresses. To explain: my game is a near copy of the old Oric-1 computer OricMunch Pacman game, where there are a few hundred pills to be munched on each level, and for every pill that is munched a short sound is played, with the pitch of the sound increasing slightly for each pill eaten.
Now here is what I have tried:
1) I have used Python's wave module to create multiple copies of the sound file, each newly created file having a slight increase in pitch (by changing the third parameter of the params tuple, the framerate, sometimes referred to as the sample frequency) on each pass of a for loop. Having achieved this, I could then, within the loop, create multiple sound objects to add to a list, and then index through the list to play the sounds as each pill is eaten.
The problem is that even though I can create hundreds of files (using the wave module) that play perfectly, each with its own pitch, in Windows Media Player or even Python's winsound module, pygame does not seem to honour the difference in pitch.
Now, interestingly, I have downloaded the free trial version of Power Sound Editor, which has an option to change the pitch, and the few .wav files I created with it to test clearly play with different pitches in pygame.
Observations:
From printing the params in my for loop, I can see that the framerate/frequency is changing as intended, which is obviously why the sounds play as intended through Windows Media Player and winsound.
Within pygame, I suspect the reason they don't play with different pitches is that the mixer frequency is fixed, either to the default settings or via pygame.mixer.pre_init, which I have indeed experimented with.
I then checked the params of each .wav file created by Power Sound Editor and noticed that even though the pitch was changing, the framerate stayed the same, which is not totally surprising since you have to select one of three options when saving the files: 22050, 44100 or 96000 Hz.
So I thought it was time to check the difference between pitch and frequency specifically in relation to sound, since I thought they were the same. What I found is that there seem to be two principal aspects to a sound file: 1) the framerate/sample frequency, and 2) the waveform data itself, i.e. the varying amplitudes sampled at that rate. I am far from clearly understanding this, but I realise Power Sound Editor must be altering the pitch by manipulating the waveform data, point 2) above, and not by changing the fundamental frequency, point 1) above.
I am a beginner to Python, pygame and programming in general, and have tried hard to find a simple way to give sound files gradually increasing pitches without changing the framerate/sample frequency. If there's a module I can import that changes the pitch by manipulating the waveform data (instead of changing the framerate/sample frequency, which is typically 22050 or 44100 Hz), then it needs to take almost no time at all if done on the fly, so as not to slow the game down. If the potential module opens, changes and then saves sound files, as opposed to altering them on the fly, then I guess it does not matter if it's slow, because I will just be creating the sound files once so that I can create sound objects from them in pygame to play.
Now, if the only way to avoid slowdown in pygame is to create sound objects from sound files, as I have already done, and then play them, then I need a way to manipulate the sound files like Power Sound Editor does (again, I stress, not by changing the framerate/sample frequency of typically 22050 or 44100 Hz) and then save the changed files.
I suppose, in a nutshell, if I could magically automate Power Sound Editor to produce 300 to 400 sound files without me having to click the change-pitch option and then save each time, that would be like having my own Python way of doing it.
Conclusion:
Assuming that creating sound objects from sound files is the only way not to slow my game down (as I suspect it is), then I need the following:
An equivalent of the Python wave module, but one that changes the pitch the way Power Sound Editor does, rather than by changing the fundamental frequency the way the wave module approach does.
Please can someone help me and let me know if there’s a way.
I am using Python 3.2.3 and pygame 1.9.2.
Also, I'm just using Python's IDLE, and I'm not familiar with using other editors.
Also, I'm aware of NumPy and of various sound modules, but I definitely don't know how to use them. Any potential modules would need to work with the above versions of Python and pygame.
Thank you in advance.
Gary Townsend.
My Reply To The First Answer From Andbdrew Is Below:
Thank you for your assistance.
It does sound like changing the wave-file data rather than the wave-file parameters is what I need to do. For reference, here is the code I have used to create the multiple files:
import wave

framerate = 44100  # original .wav file framerate/sample frequency

for x in range(0, 25):
    # Read the original sound's frames and parameters.
    file = wave.open('MunchEatPill3Amp.wav')
    nFrames = file.getnframes()
    wdata = file.readframes(nFrames)
    params = file.getparams()
    file.close()

    # Copy the params tuple, setting the channel count and a higher framerate each pass.
    n = list(params)
    n[0] = 2              # nchannels
    n[2] = framerate      # framerate / sample frequency
    framerate += 500
    params = tuple(n)

    # Write the unchanged frame data back out under the new parameters.
    name = 'PillSound' + str(x) + '.wav'
    file = wave.open(name, 'wb')
    file.setparams(params)
    print(params)
    file.writeframes(wdata)
    file.close()
It sounds like writing different data would be equivalent or similar to how the Power Sound Editor is changing the pitch.
So please can you tell me if you know a way to modify/manipulate wdata to effectively change the pitch, rather than altering the sample rate in the params tuple. Would this mean some relatively simple operation applied to wdata after it's read from my .wav file? (I really hope so.) I've heard of using NumPy arrays, but I have no clue how to use them.
Please note that any .wav files modified by the above code do indeed play in Python using winsound, or in Windows Media Player, with the pitch increase sounding as intended. It's only in pygame that they don't.
As I've mentioned, it seems that because pygame has a set frequency (I guess this frequency is also the sample rate), the pitch sounds the same, as if it hadn't been increased at all, whereas when the files are played with e.g. Windows Media Player, the change in sample rate does result in a higher-sounding pitch.
I suppose I just need to achieve the same increase in pitch by changing the file data, and not the file parameters, so please can you tell me if you know a way.
Thank you again for helping with this.
To Summarise My Initial Question Overall, Here It Is Again:
How do you change the pitch of a .wav file without changing the framerate/sample frequency, using the Python programming language rather than a separate software program such as Power Sound Editor?
Thank You Again.
You should change the frequency of the wave in your sample data instead of changing the sample rate. It seems that pygame plays back all of your wave files at the same sample rate (which is good), so your changes are not reflected.
Sample rate is sort of like meta information for a sound file. Read about it at https://en.wikipedia.org/wiki/Sampling_rate .
It tells you the amount of time between samples when you convert a continuous waveform into a discrete one. Although your (ab)use of it is cool, you would be better served by encoding different frequencies of sound in your different files, all at the same sample rate.
I took a look at the docs for the wave module ( http://docs.python.org/3.3/library/wave.html ) and it looks like you should just write different data to your audio files when you call
Wave_write.writeframes(data)
That is the method that actually writes your audio data to your audio file.
The method you described is responsible for writing information about the audio file itself, not the content of the audio data.
Wave_write.setparams(tuple)
"... Where the tuple should be (nchannels, sampwidth, framerate, nframes, comptype, compname), with values valid for the set*() methods. Sets all parameters... " ( also from the docs )
If you post your code, maybe we can fix it.
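If you want to stay in Python, here is a minimal sketch of that idea, not Power Sound Editor's algorithm: it resamples the sample data with NumPy and writes the result back at the original frame rate, so pygame's fixed mixer frequency no longer matters. It assumes a mono, 16-bit PCM source file; the step factor and output names are illustrative, and the clips get shorter as the pitch rises because this is a plain resample:

import wave
import numpy as np

def write_pitched_copy(src, dst, factor):
    # Read the original frames (assumed mono, 16-bit PCM).
    f = wave.open(src, 'rb')
    framerate = f.getframerate()
    frames = f.readframes(f.getnframes())
    f.close()

    data = np.frombuffer(frames, dtype=np.int16)

    # Resample by linear interpolation: stepping through the samples faster
    # (factor > 1) and playing the result at the unchanged frame rate raises the pitch.
    old_idx = np.arange(len(data))
    new_idx = np.arange(0, len(data), factor)
    pitched = np.interp(new_idx, old_idx, data).astype(np.int16)

    out = wave.open(dst, 'wb')
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(framerate)        # frame rate stays at the original value
    out.writeframes(pitched.tobytes())
    out.close()

# Illustrative: 25 copies, each a little higher than the last.
for x in range(25):
    write_pitched_copy('MunchEatPill3Amp.wav', 'PillSound' + str(x) + '.wav', 1.0 + 0.02 * x)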
If you just want to create multiple files and you are using linux, try SoX.
#!/bin/bash
for i in `seq -20 10 20`; do
    sox 'input.wav' 'output_'$i'.wav' pitch $i;
done

Converting audio to code and vice-versa

Having just witnessed the Sound Load technology in the Nintendo DS game Bangai-O Spirits, I was curious how this technology works. Does anyone have any links, documentation or sample code on implementing such a feature, which would allow the state of an application to be saved and loaded via audio?
It's the same old thing used in the ZX Spectrum era, where you loaded programs/games from tape; only the sound quality and the filters are probably better now.
In my opinion, something like Bluetooth or WiFi is better. You can also send files that can be put on some storage and then loaded. I find these methods much easier than sound, because if there is a lot of noise around, you cannot do much.
It is just a conversion of data to audio and then back from audio to data.
Search Google for Zotyocopy and Copy86M; these are the utilities used on the ZX Spectrum for saving a game to tape after loading it into memory.
If you want to pass data as audio through the air, there are a few things you need to be aware of, such as how the speaker and microphone interact. It is important that they don't distort or alter the sound too much, because what you are sending are in fact the raw bytes.
Some audio software will let you open any file as audio so that you can listen to it. If you record audio as data, do not use lossy compression such as MP3 on the audio file!
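As a toy illustration of that conversion (not the scheme the DS game actually uses), here is a sketch that encodes bytes as a simple two-tone FSK signal and writes it to a WAV file; the bit rate, tone frequencies and filenames are arbitrary choices:

import wave
import numpy as np

RATE = 44100
BIT_SECONDS = 0.01            # 10 ms per bit, i.e. roughly 100 bits per second
F0, F1 = 1200.0, 2400.0       # tone frequencies for a 0 bit and a 1 bit

def encode_bytes_to_wav(payload, path):
    t = np.arange(int(RATE * BIT_SECONDS)) / float(RATE)
    tones = {0: np.sin(2 * np.pi * F0 * t), 1: np.sin(2 * np.pi * F1 * t)}
    bits = [(byte >> i) & 1 for byte in payload for i in range(8)]
    signal = np.concatenate([tones[b] for b in bits])
    samples = (signal * 32767).astype(np.int16)

    out = wave.open(path, 'wb')
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(RATE)
    out.writeframes(samples.tobytes())
    out.close()

encode_bytes_to_wav(b"SAVEGAME", "savestate_audio.wav")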
