VHDL audio sample volume control - audio

I have been searching a lot about this problem, but I can't find anything useful...
The problem is, I'm making an echo effect on an FPGA chip. I have everything prepared, like the BRAM for the delay, the input, and the delayed output, but I can't figure out how to change the volume of the output that comes back to the input, so I can mix them together and send them to the BRAM again.
Because when I simply connect the output to the input, it cycles through the BRAM forever, but I need to change the volume of the output that is coming back to the input to half of its original level.
I read it can be achieved by shifting the sample to the right, but that adds a lot of noise to the sample.
I'm using 16-bit samples.
So I'm asking for ideas about how to control the volume of a sample; everything else I have prepared.

So I found out what my problem was. I was shifting the sample vector right, but I did it as "0" & sample(15 downto 1). The samples are signed, so I had to copy the MSB instead of prepending a plain '0'. So the answer is
sample(15) & sample(15 downto 1)
This makes the sample half of the original volume; it's the same as sample * 0.5.
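Not VHDL, but a quick NumPy sketch of why copying the sign bit matters when halving signed 16-bit samples (the sample values are made up for illustration):

import numpy as np

# A few signed 16-bit samples, including negative ones.
samples = np.array([30000, -30000, 12345, -1], dtype=np.int16)

# Arithmetic shift (the sign bit is copied): halves the amplitude correctly.
halved = samples >> 1
print(halved)     # [ 15000 -15000   6172     -1]

# Logical shift (a plain 0 goes into the MSB): negative samples are corrupted,
# which is the noise described above.
logical = (samples.view(np.uint16) >> 1).view(np.int16)
print(logical)    # [15000 17768  6172 32767]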

Related

Synthesized polyphonic sound completely different from the "real" one

I am making a software audio synthesizer and so far I've managed to play a single tone at a time.
My goal was to make it polyphonic, i.e. when I press 2 keys both are active and produce sound (I'm aware that a speaker can only output one waveform at a time).
From what I've read so far, to achieve a pseudo-polyphonic effect you are supposed to add the tones to each other with different amplitudes.
The code I have is too big to post in its entirety, but I've tested it and it's correct (it implements what I described above; as for whether that is the correct thing to do, I'm not so sure anymore).
Here is some pseudo-code of my mixing:
sample = 0.8 * sin(2pi * freq[key1] * time) + 0.2 * sin(2pi * freq[key2] * time)
The issue I have with this approach is that when I tried to play C and C# it resulted in a weird wobbling sound with distortions; it appears to make the entire waveform oscillate at around 3-5 Hz.
I'm also aware that this is the "correct" behavior, because I graphed a scenario like this and the waveform is very similar to what I'm experiencing here.
I know this is the beat effect and that it's what happens when you add two tones that are close in frequency, but that's not what happens when you press 2 keys on a piano, which makes me think this approach is incorrect.
Just as a test I made a second version that uses a stereo configuration: when a second key is pressed it plays the second tone on a different channel, and it produces exactly the effect I was looking for.
Here is a comparison
Normal https://files.catbox.moe/2mq7zw.wav
Stereo https://files.catbox.moe/rqn2hr.wav
Any help would be appreciated, but please don't say it's impossible, because all serious synthesizers can achieve this effect.
Working backwards from the sound, the "beating" sound is one that would arise from two pitches in the vicinity of 5 or 6 Hz apart. (It was too short for me to count the exact number of beats per second.) Are you playing MIDI 36 (C2) = 65.4 Hz and MIDI 37 (C#2) = 69.3 Hz? Those could be expected to beat roughly 4 times per second; MIDI 48 & 49 would be closer to 8 times a second.
The pitch I'm hearing sounds more like an A than a C, and A2 (110 Hz) + A#2 (116.5 Hz) would have a beat rate that plausibly matches what's heard.
I would double-check that the code you are using in the two scenarios (mono and stereo) is truly sending the frequencies that you think you are.
What sample rate are you using? I wonder if the result could be an artifact of an abnormally low number of samples per second in your data generation. The tones I hear have a lot of overtones for being sine functions; I'm assuming the harmonics are due to a lack of smoothness from there being relatively few steps (a very "blocky" looking signal).
I'm not sure my reasoning is right here, but maybe this is a plausible scenario. Let's assume your computer is able to send out signals at 44100 fps. This should be able to reproduce a rather "blocky" sine (with lots of harmonics) pretty well, though there might be some aliasing due to high-frequency content (over the Nyquist value) arising from the blockiness.
Let's further assume that your addition function is NOT occurring at 44100 fps, but at a much lower sample rate. This would lower the Nyquist limit and increase the aliasing. Thus the mixed sounds would be more subject to aliasing-related distortion than the scenario where the signals are output separately.
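One way to test this is to generate the same mix directly at 44.1 kHz and compare it with the synth's output: if the clean mix only shows the expected |f1 - f2| beat, the extra distortion is coming from the generation/mixing rate rather than from the mixing itself. A throwaway sketch (NumPy/SciPy assumed, frequencies picked to match the A2/A#2 guess above):

import numpy as np
from scipy.io import wavfile

sr = 44100                          # samples per second
t = np.arange(sr * 2) / sr          # 2 seconds of time values

f1, f2 = 110.0, 116.54              # roughly A2 and A#2, about 6.5 Hz apart
mix = 0.8 * np.sin(2 * np.pi * f1 * t) + 0.2 * np.sin(2 * np.pi * f2 * t)

# The only "wobble" here should be the ~6.5 Hz beat; write it out for comparison.
wavfile.write("mix_44100.wav", sr, (mix * 0.9 * 32767).astype(np.int16))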

How to decrease pitch of audio file in nodejs server side?

I have a .MP3 file stored on my server, and I'd like to modify it to be a bit lower in pitch. I know this can be achieved by increasing the length of the audio; however, I don't know of any libraries in Node that can do this.
I've tried using the node web audio api and soundbank-pitch-shift, but the former doesn't seem to have pitch-shifting capabilities (AFAIK), and the latter seems designed for the client side.
I need the solution within the realm of Node ONLY, meaning no external programs, etc., and it needs to be automated as well, so I can't pitch shift manually.
An ideal solution would be a function that takes a file/filepath as an input, and then creates (or overwrites) another MP3 file but with the pitch shifted by x amount, but really, any solution that produces something with a lower pitch than the original, works.
I'm totally lost here. Please help.
An audio file is basically a list of numbers. Those numbers are read one at a time at a particular speed called the 'sample rate'. The sample rate is otherwise defined as the number of audio samples read every second, e.g. if an audio file's sample rate is 44100, then 44100 samples (or numbers) are read every second.
If you are with me so far, the simplest way to lower the pitch of an audio file is to play the file back at a lower sample rate (which is normally fixed in place). In most cases you won't be able to do this, so you need to achieve the same effect by resampling the file, i.e. adding new samples to the file in between the old samples to make it literally longer. For this you would need to understand interpolation.
The drawback to this technique in either case is that the sound will also play back at a slower speed, as well as at a lower pitch. If it is a problem that the sound has slowed down as well as lowered in pitch as a result of your processing, then you will also have to use a timestretching algorithm to fix the playback speed.
You may also have problems doing this with MP3 files. In that case you may have to uncompress the data in the MP3 file before you can operate on it in a way that changes the pitch of the file; WAV files are more suited to audio processing. In any case, you essentially need to turn the file into a list of floating-point numbers, and change those numbers so they are effectively read back at a slower rate.
Other methods of pitch shifting would probably involve the use of FFTs, and would be a more complicated affair to say the least.
I am not familiar with nodejs, I'm afraid.
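Not Node, but a small Python sketch just to make the resampling/interpolation step concrete (the function name and the semitone convention are my own; it assumes the MP3 has already been decoded to raw mono samples):

import numpy as np

def lower_pitch(samples, semitones):
    # Stretch the signal by linear interpolation; played back at the original
    # sample rate it sounds lower in pitch and correspondingly longer,
    # which is exactly the trade-off described above.
    factor = 2 ** (semitones / 12.0)               # e.g. 2 semitones -> ~1.12x longer
    old_idx = np.arange(len(samples))
    new_idx = np.arange(0, len(samples) - 1, 1.0 / factor)
    return np.interp(new_idx, old_idx, np.asarray(samples, dtype=np.float64))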
I managed to get it working with help from Ollie M's answer and node-lame.
I hadn't known previously that the sample rate could affect the speed, but thanks to Ollie, this problem suddenly became a lot simpler.
Using node-lame, all I did was take one of the examples (mp32wav.js) and change the sampleRate parameter of the format object so that it is lower than the base sample rate, which in my application was always a static 24,000. I could also make it dynamic, since node-lame can grab the parameters of the input file in the format object.
Ollie, however, perfectly describes the drawback of this method:
The drawback to this technique in either case is that the sound will
also play back at a slower speed, as well as at a lower pitch. If it
is a problem that the sound has slowed down as well as lowered in
pitch as a result of your processing, then you will also have to use a
timestretching algorithm to fix the playback speed.
I don't have a particular need to implement a time stretching algorithm at the moment (thankfully, because that's a whole other can of worms), since I have the ability to change the initial speed of the file, but others may in the future.
See https://www.npmjs.com/package/audio-decode, https://github.com/audiojs/audio-buffer, and the related packages linked at the bottom of the audio-buffer readme.

Creating .wav files of varying pitches but still having the same fundamental frequency

I am using pygame to play .wav files and want to change the pitch of a particular .wav file as each level in my game progresses. To explain, my game is a near copy of the old Oric1 computer OricMunch Pacman game, where there are a few hundred pills to be munched on each level, and for every pill that is munched a short sound is played, with the pitch of the sound increasing slightly for each pill eaten/munched.
Now here is what I have tried:
1) I have used Python's wave module to create multiple copies of the sound file, each newly created file having a slight increase in pitch (by changing the 3rd parameter, the framerate, sometimes referred to as the sample frequency) for each cycle of a for loop. Having achieved this, I could then, within the loop, create multiple sound objects to add to a list, and then index through the list to play the sounds as each pill is eaten.
The problem is that even though I can create hundreds of files (using the wave module) that play perfectly with their own unique pitches in Windows Media Player, or even Python's winsound module, pygame does not seem to pick up the difference in pitch.
Now interestingly, I have downloaded the free trial version of Power Sound Editor which has the option to change the pitch, and so I’ve created just a few .wav files to test, and they clearly play with different pitches when played in pygame.
Observations:
From printing the params in my for loop, I can see that the framerate/frequency is changing as intended, and so obviously this is why the sounds play as intended through Windows Media Player and winsound.
Within pygame, I suspect the reason they don't play with different pitches is that the frequency parameter is fixed, either to the default settings or via the use of pygame.mixer.pre_init, which I have indeed experimented with.
I then checked the params for each .wav file created by the Power Sound Editor, and noticed that even though the pitch was changing, the frequency stayed the same, which is not totally surprising since you have to select 1 of 3 options to save the files: 22050, 44100 or 96000 Hz.
So now I thought it was time to check out the difference between pitch and frequency specifically in relation to sound, since I thought they were the same. What I found is that there seem to be two principal aspects of sound waves: 1) the framerate/frequency, and 2) the varying amplitude of multiple waves based on that frequency. Now I'm far from clearly understanding this, but I realise the Power Sound Editor must be altering the shape/pitch of the sound by manipulating the varying amplitude of multiple waves, point 2) above, and not by changing the fundamental frequency, point 1) above.
I am a beginner to Python, pygame and programming in general, and have tried hard to find a simple way to change sound files to have gradually increasing pitches without changing the framerate/fundamental frequency. If there's a module that I can import to help me change the pitch by manipulating the varying amplitude of multiple waves (instead of changing the framerate/sample frequency, which typically is either 22050 or 44100 Hz), then it needs to take relatively no time at all if done on the fly, in order not to slow the game down. If the potential module opens, changes and then saves sound files, as opposed to altering them on the fly, then I guess it does not matter if it's slow, because I will just be creating the sound files so I can create sound objects from them in pygame to play.
Now if the only way to avoid slowing pygame down is to create sound objects from sound files as I have already done, and then play them, then I need a way to manipulate the sound files like the Power Sound Editor does (again, I stress, not by changing the framerate/sample frequency of typically 22050 or 44100) and then save the changed file.
I suppose, in a nutshell, if I could magically automate Power Sound Editor to produce 3 to 4 hundred sound files without me having to click on the change pitch option and then save each time, this would be like having my own Python way of doing it.
Conclusion:
Assuming creating sound objects from sound files is the only way not to slow my game down (as I suspect it might be) then I need the following:
An equivalent to the python wave module, but which changes the pitch like Power Sound Editor does, and not by changing the fundamental frequency like the wave module does.
Please can someone help me and let me know if there’s a way.
I am using Python 3.2.3 and pygame 1.9.2.
Also, I'm just using Python's IDLE and I'm not familiar with using other editors.
Also, I'm aware of NumPy and of various sound modules, but I definitely don't know how to use them. Any potential modules would need to work with the above versions of Python and pygame.
Thank you in advance.
Gary Townsend.
My Reply To The First Answer From Andbdrew Is Below:
Thank you for your assistance.
It does sound like changing the wave file data rather than the wave file parameters is what I need to do. For reference here is the code I have used to create the multiple files:
import wave

framerate = 44100  # original .wav file framerate/sample frequency

for x in range(0, 25):
    # Read the original sample data and parameters.
    file = wave.open('MunchEatPill3Amp.wav')
    nFrames = file.getnframes()
    wdata = file.readframes(nFrames)
    params = file.getparams()
    file.close()

    # Copy the params, overriding nchannels and the framerate.
    n = list(params)
    n[0] = 2            # nchannels
    n[2] = framerate    # framerate, increased by 500 Hz per file
    framerate += 500
    params = tuple(n)

    # Write the unchanged sample data back out with the new framerate.
    name = 'PillSound' + str(x) + '.wav'
    file = wave.open(name, 'wb')
    file.setparams(params)
    print(params)
    file.writeframes(wdata)
    file.close()
It sounds like writing different data would be equivalent or similar to how the Power Sound Editor is changing the pitch.
So please can you tell me if you know a way to modify/manipulate wdata to effectively change the pitch, rather than altering the sample rate in the params? Would this mean some relatively simple operation applied to wdata after it's read from my .wav file? (I really hope so.) I've heard of using NumPy arrays, but I have no clue how to use these.
Please note that any .wav files modified in the above code do indeed play in Python using winsound, or in Windows Media Player, with the pitch increase sounding as intended. It's only in pygame that they don't.
As I've mentioned, it seems that because pygame has a set frequency (I guess this frequency is also the sample rate), this might be the reason the pitch sounds the same, as if it wasn't increased at all, whereas when played with e.g. Windows Media Player, the change in sample rate does result in a higher-sounding pitch.
I suppose I just need to achieve the same increase in pitch sound by changing the file data, and not the file parameters, and so please can you tell me if you know a way.
Thank you again for helping with this.
To Summarise My Initial Question Overall, Here It Is Again:
How do you change the pitch of a .wav file without changing the framerate/sample frequency, by using the python programming language, and not some kind of separate software program such as Power Sound Editor?
Thank You Again.
You should change the frequency of the wave in your sample instead of changing the sample rate. It seems like python is playing back all of your wave files at the same sample rate (which is good), so your changes are not reflected.
Sample rate is sort of like meta information for a sound file. Read about it at http://en.m.wikipedia.org/wiki/Sampling_rate#mw-mf-search .
It tells you the amount of time between samples when you convert a continuous waveform into a discrete one. Although your (ab)use of it is cool, you would be better served by encoding different frequencies of sound in your different files all at the same sample rate.
I took a look at the docs for the wave module ( http://docs.python.org/3.3/library/wave.html ) and it looks like you should just write different data to your audio files when you call
Wave_write.writeframes(data)
That is the method that actually writes your audio data to your audio file.
The method you described is responsible for writing information about the audio file itself, not the content of the audio data.
Wave_write.setparams(tuple)
"... Where the tuple should be (nchannels, sampwidth, framerate, nframes, comptype, compname), with values valid for the set*() methods. Sets all parameters... " ( also from the docs )
If you post your code, maybe we can fix it.
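A rough sketch of that suggestion, resampling the frame data with NumPy while leaving the framerate in the header untouched (assumes a mono 16-bit file; write_pitched_copy and the factor convention are just illustrative):

import wave
import numpy as np

def write_pitched_copy(src_path, dst_path, factor):
    # factor > 1 raises the pitch, factor < 1 lowers it.
    f = wave.open(src_path, 'rb')
    params = list(f.getparams())      # (nchannels, sampwidth, framerate, nframes, ...)
    data = np.frombuffer(f.readframes(params[3]), dtype=np.int16)
    f.close()

    # Change the sample data itself; the framerate in params stays untouched,
    # so pygame's fixed mixer frequency no longer hides the pitch change.
    old_idx = np.arange(len(data))
    new_idx = np.arange(0, len(data) - 1, factor)
    pitched = np.interp(new_idx, old_idx, data).astype(np.int16)

    params[3] = len(pitched)          # update nframes only
    out = wave.open(dst_path, 'wb')
    out.setparams(tuple(params))
    out.writeframes(pitched.tobytes())
    out.close()

Note that the shifted copies also get shorter and faster, the same speed/pitch trade-off mentioned in the other answers.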
If you just want to create multiple files and you are using Linux, try SoX.
#!/bin/bash
for i in `seq -20 10 20`; do
    sox 'input.wav' 'output_'$i'.wav' pitch $i;
done

What is the best way to remove the echo from an audio file?

I want to run an audio file through something like skype's echo cancellation feature. iChat and other VoIPs have this feature too, but I can't find any software that I can import or open my file into.
Basic approach:
Determine the delay and the amplitude offset.
Invert the signal, apply the delay, adjust the amplitude, and play back both audio files together.
Any multitrack audio app is capable of this (e.g. Audacity, Pro Tools, or Logic).
For more complex signals you will need to be smarter about your filtering, and ideally you would suppress the signals before they interfere (in a Skype scenario).
Sounds cool, and makes a lot of sense theoretically, but I still don't really know how I should go about doing it. I am fairly experienced with Logic, but I don't know how to determine the delay. Should I just make a copy of the file, invert it and move it around until it sounds good?
Just line up the transients of the 2 signals visually to determine the delay. Then you have to zoom way in and determine the delay to the sample to achieve the best cancellation. If it's not close, it won't cancel but add.
What do you mean by amplitude offset, is that the volume difference between the original and the echo noise?
Exactly. Apart from very unusual cases, the echo is going to be a different (typically lower) amplitude than the source, and you need to know this difference to cancel it best (this offset is applied to the inverted signal, btw). If the amplitude is wrong, then you will introduce the inverted signal (audibly) or, in the odd event the echo is louder than the source, reduce only part of the echo.
Once the transients are aligned (to the sample) and the signal's inverted, then determine the difference in volume -- if it's too high or too low, it won't cancel as much as it could.
Again, that's a basic approach. You can do a lot to improve it, depending on the signals and processors you have. In most cases, this approach will result in suppression, not elimination.
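As a toy illustration of that basic approach (not real acoustic echo cancellation, which uses adaptive filters), assuming mic and reference are NumPy arrays at the same sample rate, and the delay and gain have been found by lining up transients and comparing levels as described above:

import numpy as np

def suppress_echo(mic, reference, delay_samples, gain):
    # Mix an inverted (negated), delayed, scaled copy of the reference
    # into the mic signal; with the right delay and gain the echo cancels.
    out = mic.astype(np.float64).copy()
    echo = gain * reference.astype(np.float64)
    n = max(0, min(len(out) - delay_samples, len(echo)))
    out[delay_samples:delay_samples + n] -= echo[:n]
    return out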
In order to remove echo, you need TWO files: mic & reference.
The mic is the signal that contains the echo.
The reference is the signal that contains the original audio that generated the echo.
After you have both these files, you can start building the logic of the echo removal. Start with the wiki page on the subject.

Echo Sound Effect

I am looking to build a small program that reads a sound file and applies an echo effect to it. I am seeking guidance on how to accomplish this.
For a simple echo (delay) effect, add a time-delayed copy of the signal to itself. You will need to make the sample longer to accommodate this. Attenuating the echo by a few dB (easily accomplished by multiplying individual sample values by a constant factor < 1) will make it sound a bit more realistic.
To achieve multiple echoes, apply the effect recursively, or set up a ring buffer with an attenuated feedback (add the output to the input).
For proper reverberation, the usual approach is to pre-calculate a reverb tail (the signal that the reverb should generate for a one-sample full-amplitude click) and convolve that with the original sample, typically with a bit of additional pre-delay.
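A minimal sketch of the single-echo case in Python/NumPy (floating-point samples assumed; the delay and gain values are arbitrary):

import numpy as np

def add_echo(samples, sample_rate, delay_s=0.25, gain=0.5):
    # Mix a delayed, attenuated copy of the signal into itself.
    delay = int(delay_s * sample_rate)
    out = np.zeros(len(samples) + delay)    # lengthen to fit the echo tail
    out[:len(samples)] += samples
    out[delay:] += gain * np.asarray(samples, dtype=np.float64)
    return out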
There's a pretty concise book about DSP in general called 'Getting started with DSP'. Google it, there's a free online version.
I agree with the idea of delay and mixing, but if you directly use a structure like this:

      ----<--------[low pass]-----
      |                          |
  ->-(+)----[ delay line ]-------.--->

use several of them in parallel, with different delays, to create the echo. The low pass (or other filter) is not strictly required, but in reality most of the reflected signal's spectrum is low, so it sounds better. You can also put them in series to decorrelate the signal (which makes it more realistic, like the physical diffusion of the sound).
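A toy rendering of that structure in Python/NumPy, a feedback delay line with a one-pole low pass in the loop (parameter values are arbitrary; keep the feedback gain below 1 or the loop will blow up):

import numpy as np

def feedback_echo(samples, sample_rate, delay_s=0.3, feedback=0.5, damping=0.4):
    delay = int(delay_s * sample_rate)
    buf = np.zeros(delay)                    # the delay line (circular buffer)
    out = np.zeros(len(samples))
    lp = 0.0                                 # one-pole low-pass state
    for i, x in enumerate(samples):
        delayed = buf[i % delay]             # output of the delay line
        lp += damping * (delayed - lp)       # low pass in the feedback path
        buf[i % delay] = x + feedback * lp   # (+): input plus filtered feedback
        out[i] = x + delayed                 # dry signal plus the echoing tail
    return out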
