Echo Sound Effect

I am looking to build a small program that reads a sound file and applies an echo effect to it. I am seeking guidance on how to accomplish this.

For a simple echo (delay) effect, add a time-delayed copy of the signal to itself. You will need to make the sample longer to accommodate this. Attenuating the echo by a few dB (easily accomplished by multiplying individual sample values by a constant factor < 1) will make it sound a bit more realistic.
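As a minimal sketch in Python/NumPy (function and parameter names are just illustrative):

```python
# Single echo: mix a delayed, attenuated copy of the signal into itself.
# Assumes `x` is a mono float NumPy array (e.g. loaded with soundfile.read).
import numpy as np

def add_echo(x, sample_rate, delay_s=0.3, attenuation=0.5):
    delay = int(delay_s * sample_rate)
    y = np.zeros(len(x) + delay)     # longer output to hold the echo tail
    y[:len(x)] += x                  # dry signal
    y[delay:] += attenuation * x     # delayed copy, a few dB down (0.5 ~ -6 dB)
    return y
```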
To achieve multiple echoes, apply the effect recursively, or set up a ring buffer with an attenuated feedback (add the output to the input).
For proper reverberation, the usual approach is to pre-calculate a reverb tail (the signal that the reverb should generate for a one-sample full-amplitude click) and convolve that with the original sample, typically with a bit of additional pre-delay.
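A rough sketch of that convolution approach, assuming you already have an impulse response as a NumPy array (names and the wet/pre-delay values are illustrative):

```python
# Convolution reverb with a bit of pre-delay. Assumes `x` (dry signal) and
# `ir` (pre-calculated reverb tail) are mono NumPy float arrays at the same
# sample rate.
import numpy as np
from scipy.signal import fftconvolve

def convolve_reverb(x, ir, sample_rate, pre_delay_s=0.02, wet=0.3):
    pre = int(pre_delay_s * sample_rate)
    tail = fftconvolve(x, ir)        # length: len(x) + len(ir) - 1
    y = np.zeros(pre + len(tail))
    y[:len(x)] += x                  # dry path
    y[pre:] += wet * tail            # pre-delayed wet path
    return y
```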
There's a pretty concise book about DSP in general called 'Getting started with DSP'. Google it, there's a free online version.

I agree with the idea of delay and mixing, but it's best implemented directly with a structure like this:
       .----<----[ low pass ]----<----.
       |                              |
 in -->(+)-----[ delay line ]---------+---> out
Use several of these in parallel with different delays to create the echoes (a low-pass or other filter in the loop makes that easier; also, in reality most of a reflected signal's spectrum is low, so it sounds better), and in series to decorrelate the signal (making it more realistic, like the physical diffusion of sound).
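A minimal sketch of one such comb (parameter names and values are illustrative):

```python
# Feedback comb filter with a one-pole low-pass in the feedback path,
# matching the diagram above. `x` is a mono float NumPy array; `delay`
# is the delay-line length in samples.
import numpy as np

def lowpass_comb(x, delay, feedback=0.5, damp=0.4):
    buf = np.zeros(delay)            # the delay line (a ring buffer)
    lp = 0.0                         # one-pole low-pass state
    y = np.empty(len(x))
    idx = 0
    for n in range(len(x)):
        delayed = buf[idx]                        # output of the delay line
        lp = (1.0 - damp) * delayed + damp * lp   # low-pass the feedback
        buf[idx] = x[n] + feedback * lp           # adder feeds the delay line
        idx = (idx + 1) % delay
        y[n] = delayed
    return y
```

Several of these in parallel with different delay lengths, followed by something in series to diffuse the result, is essentially the classic Schroeder reverb topology.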


Unknown events in nodejs/v8 flamegraph using perf_events

I'm trying to do some Node.js profiling using Linux perf_events, as described by Brendan Gregg here.
The workflow is as follows:
1. Run node >0.11.13 with --perf-basic-prof, which creates a /tmp/perf-(PID).map file where the JavaScript symbol mappings are written.
2. Capture stacks using perf record -F 99 -p `pgrep -n node` -g -- sleep 30
3. Fold the stacks using the stackcollapse-perf.pl script from this repository
4. Generate the SVG flame graph using the flamegraph.pl script
I get the following result (which looks really nice at first):
The problem is that there are a lot of [unknown] elements, which I suppose should be my Node.js function calls. I assume the whole process fails somewhere at point 3, where the perf data should be folded using the mappings generated by node/v8 executed with --perf-basic-prof. The /tmp/perf-PID.map file is created, and some mappings are written to it during node execution.
How to solve this problem?
I am using CentOS 6.5 x64, and have already tried this with node 0.11.13 and 0.11.14 (both prebuilt and compiled from source) with no success.
First of all, "[unknown]" means that the sampler couldn't figure out the name of the function, because it's a system or library function.
If so, that's OK - you don't care, because you're looking for things responsible for time in your code, not system code.
Actually, I'm suggesting this is one of those XY questions.
Even if you get a direct answer to what you asked, it is likely to be of little use.
Here are the reasons why:
1. CPU Profiling is of little use in an I/O bound program
The two towers on the left in your flame graph are doing I/O, so they probably take a lot more wall-time than the big pile on the right.
If this flame graph were derived from wall-time samples, rather than CPU-time samples, it could look more like the second graph below, which tells you where time actually goes:
What was a big juicy-looking pile on the right has shrunk, so it is nowhere near as significant.
On the other hand, the I/O towers are very wide.
Any one of those wide orange stripes, if it's in your code, represents a chance to save a lot of time, if some of the I/O could be avoided.
2. Whether the program is CPU- or I/O-bound, speedup opportunities can easily hide from flame graphs
Suppose there is some function Foo that really is doing something wasteful, that if you knew about it, you could fix.
Suppose in the flame graph, it is a dark red color.
Suppose it is called from numerous places in the code, so it's not all collected in one spot in the flame graph.
Rather it appears in multiple small places shown here by black outlines:
Notice, if all those rectangles were collected, you could see that it accounts for 11% of time, meaning it is worth looking at.
If you could cut its time in half, you could save 5.5% overall.
If what it's doing could actually be avoided entirely, you could save 11% overall.
Each of those little rectangles would shrink down to nothing, pulling the rest of the graph to its right along with it.
Now I'll show you the method I use. I take a moderate number of random stack samples and examine each one for routines that might be sped up.
That corresponds to taking samples in the flame graph like so:
The slender vertical lines represent twenty random-time stack samples.
As you can see, three of them are marked with an X.
Those are the ones that go through Foo.
That's about the right number, because 11% of 20 is 2.2.
(Confused? OK, here's a little probability for you. If you flip a coin 20 times, and it has an 11% chance of coming up heads, how many heads would you get? Technically it's a binomial distribution. The most likely number you would get is 2, and the next most likely numbers are 1 and 3. (If you only get 1, you keep going until you get 2.) Here's the distribution:)
(The average number of samples you have to take to see Foo twice is 2/0.11 = 18.2 samples.)
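If you want to check those numbers, here's a quick sketch:

```python
# Binomial distribution of "hits on Foo" in 20 samples when Foo is on
# the stack 11% of the time.
from scipy.stats import binom

n, p = 20, 0.11
for k in range(6):
    print(k, round(binom.pmf(k, n, p), 3))
# Peaks at k = 2; the expected count is n * p = 2.2.
```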
Looking at those 20 samples might seem a bit daunting, because they run between 20 and 50 levels deep.
However, you can basically ignore all the code that isn't yours.
Just examine them for your code.
You'll see precisely how you are spending time,
and you'll have a very rough measurement of how much.
Deep stacks are both bad news and good news -
they mean the code may well have lots of room for speedups, and they show you what those are.
Anything you see that you could speed up, if you see it on more than one sample, will give you a healthy speedup, guaranteed.
The reason you need to see it on more than one sample is that if you only see it on one sample, you only know its time isn't zero. If you see it on more than one sample, you still don't know how much time it takes, but you do know it's not small.
Here are the statistics.
Generally speaking it is a bad idea to disagree with a subject matter expert, but (with the greatest respect) here we go!
SO urges answerers to do the following:
"Please be sure to answer the question. Provide details and share your research!"
So the question was, at least as I interpret it: why are there [unknown] frames in the perf script output (and how do I turn these [unknown] frames into meaningful names)?
This question could be about "how to improve the performance of my system?", but I don't see it that way in this particular case. There is a genuine problem here with how the perf record data has been post-processed.
The answer to the question is that, although the prerequisite setup is correct (the correct node version, and the correct argument, --perf-basic-prof, present to generate the function names), the generated perf map file must be owned by root for perf script to produce the expected output.
That's it!
Writing some new scripts today, I stumbled upon this issue, which directed me to this SO question.
Here are a couple of additional references:
https://yunong.io/2015/11/23/generating-node-js-flame-graphs/
https://github.com/jrudolph/perf-map-agent/blob/d8bb58676d3d15eeaaf3ab3f201067e321c77560/bin/create-java-perf-map.sh#L22
[ non-root files can sometimes be forced ] http://www.spinics.net/lists/linux-perf-users/msg02588.html

Nexys3 interface to a VmodTFT

I'm trying to interface a Nexys3 board with a VmodTFT via a VHDCI connector. I am pretty new to FPGA design, although I have experience with micro-controllers. I am trying to approach the whole problem as an FSM. However, I've been stuck on this for quite some time now. What signals constitute my power-up sequence? When do I start sampling data? I've looked at the relevant datasheets and they don't make things much clearer. Any help would be greatly appreciated (P.S.: I use Verilog for the design).
EDIT:
Sorry for the vagueness of my question. Here's specifically what I am looking at.
For starters, I am going to overlook the touch module. I want to look at the whole setup as an FSM. I am assuming the following states:
1. Setup connection or handshake signals
2. Switch on the LCD
3. Receive pixel data
4. Display video
5. Power off the LCD
Would this be a reasonable FSM? My main concerns are with interpreting the signals. Table 5 in the VmodTFT_rm manual shows a list of signals; however, I am having trouble understanding which signals are for what (this is my first time with display modules). I am going to assume everything prefixed with TFT_ is for the display and everything with TP_ is for the touch panel (please correct me if I'm wrong). So which signals would I be changing in each state, and which would act as inputs?
Now what changes should I make to accommodate the touch panel too?
I understand I am probably asking for too much, but I would greatly appreciate a push in the right direction as I am pretty stuck with this for a long time.
Your question could be filled out a little better; it's not clear exactly what's giving you trouble.
I see two relevant docs online (you may have seen these):
Schematic: https://digilentinc.com/Data/Products/VMOD-TFT/VmodTFT_sch.pdf
User Guide: https://digilentinc.com/Data/Products/VMOD-TFT/VmodTFT_rm.pdf
The user guide explains which signals are part of the power-up sequence:
1. You must wait between 0.5 ms and 100 ms after driving TFT-EN before you can drive DE and the pixel bus.
2. You must wait 0 to 200 ms after setting up valid pixel data before enabling the display (with DISP).
3. You must wait 160 ms after enabling DISP before you start pulsing LED-EN (PWM controls the backlight).
Admittedly the documentation doesn't look great and some of the signal names are not consistent, but I think you can figure it out from there.
After looking at the user guide to understand what the signals do, look at the schematic to find the mapping between the signal names and the VHDCI pinout. Then, when you connect the VHDCI pinout to your FPGA, look at your FPGA board's manual to find the mapping between pins on the VHDCI connector and balls of the FPGA. Finally, use the FPGA's configuration settings to map an FPGA ball to a logical Verilog input to your top module.
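If it helps, here is a rough behavioural sketch of the power-up sequencing from the list above, in Python purely for illustration; the real design would be a clocked Verilog FSM with tick counters, and the concrete waits here are arbitrary values chosen inside the windows the user guide allows.

```python
# Behavioural model of the power-up sequencing FSM, advanced once per
# millisecond. State names are invented; the windows (0.5-100 ms,
# 0-200 ms, 160 ms) come from the user guide.
class PowerUpFsm:
    def __init__(self):
        self.state = "OFF"
        self.ms = 0
        self.tft_en = self.de = self.disp = self.led_en = False

    def tick_1ms(self):
        self.ms += 1
        if self.state == "OFF":
            self.tft_en = True                    # drive TFT-EN first
            self.ms, self.state = 0, "WAIT_TFT_EN"
        elif self.state == "WAIT_TFT_EN" and self.ms >= 1:    # in 0.5-100 ms
            self.de = True                        # start DE + pixel bus
            self.ms, self.state = 0, "WAIT_PIXELS"
        elif self.state == "WAIT_PIXELS" and self.ms >= 10:   # in 0-200 ms
            self.disp = True                      # enable the display
            self.ms, self.state = 0, "WAIT_DISP"
        elif self.state == "WAIT_DISP" and self.ms >= 160:    # 160 ms required
            self.led_en = True                    # safe to PWM the backlight
            self.state = "RUNNING"
```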
Hope that clears things up a bit, but please clarify your question about what you don't understand.

How does the Ableton Drum-To-MIDI function work?

I can't seem to find any information regarding the process Ableton uses to efficiently detect atonal percussion and convert it into MIDI. I assume feature extraction and onset detection algorithms are executed, but I'm intrigued as to which algorithms. I am particularly interested in how its efficiency is maintained for a beatboxed input.
Cheers
Your guesses are as good as everyone else's - although they look plausible. The reality is that the way this feature is implemented in Ableton is a trade secret and likely to remain that way.
If I'm not mistaken Ableton licenses technology from https://www.zplane.de/ for these things.
I don't know exactly how the software assigns the different drum sounds, but the chapter in the Live manual, Convert Drums to New MIDI Track, says that it can only detect kick, snare and hi-hat. An important thing is that they are identified by the transient markers. For a good result you should manually check and adjust them. The transient markers look like the warp markers, but are grey.
Compared to a kick and a snare, for example, a beatboxed input is likely to have less difference between the individual sounds, and is therefore likely to be harder for Ableton to separate into its component sounds (it depends on the beatboxer). In any case, some combination of frequency and amplitude, more specifically the envelope (attack, decay, sustain, release), as well as perhaps the different overtone combinations that account for differences in timbre, are the characteristics that would have to be evaluated in order to separate the kick, snare and hi-hat.
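None of us knows Ableton's actual algorithm, but to make the guesses above concrete, here is a speculative sketch of the general idea: spectral-flux onset detection plus a crude spectral-centroid classification. All thresholds are invented.

```python
# NOT Ableton's algorithm (that is a trade secret), just an illustration of
# onset detection and a crude kick/snare/hat decision from where the energy
# sits in the spectrum. `x` is a mono float NumPy array at sample rate `sr`.
import numpy as np

def detect_and_classify(x, sr, frame=1024, hop=512):
    hits = []
    prev_mag = np.zeros(frame // 2 + 1)
    prev_flux = 0.0
    window = np.hanning(frame)
    for i in range(0, len(x) - frame, hop):
        mag = np.abs(np.fft.rfft(window * x[i:i + frame]))
        flux = np.sum(np.maximum(mag - prev_mag, 0.0))   # spectral flux
        if flux > 2.0 * prev_flux and flux > 1e-3:       # crude onset test
            freqs = np.fft.rfftfreq(frame, 1.0 / sr)
            centroid = np.sum(freqs * mag) / (np.sum(mag) + 1e-12)
            if centroid < 400:                           # invented thresholds
                hits.append((i / sr, "kick"))
            elif centroid < 2500:
                hits.append((i / sr, "snare"))
            else:
                hits.append((i / sr, "hat"))
        prev_mag, prev_flux = mag, flux
    return hits
```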
Before this feature existed I used gates and hi/low pass filters to accomplish a similar task. So perhaps Ableton's solution is not as complicated as we might imagine.

XAudio2 occlusion processing

I'm working on a home brew game engine and I am currently working on the audio engine implementation. This is mostly for self-educational reasons. I want to create an interface wrapper for generic audio processing, so I can switch between OpenAL, XAudio2 or other platforms as appropriate or needed. I also want this code to be reusable, so I am trying to make it as complete as possible, and have various systems implement as much functionality as possible. For the time being, I am focusing on an XAudio2 implementation and may move on to an OpenAL implementation at a later date.
I've read a good deal over the past few months on 3D processing (listener/emitter), environmental effects (reverberation), exclusion, occlusion, obstruction and direct sound. I want to be able to use any of these effects with audio playback. While I've researched the topics as best I can, I can't find any examples of how occlusion (direct and reflection signal muffling), obstruction (direct signal muffling) or exclusion (reflection signal muffling) are actually implemented. Reading the MSDN documentation turns up only passing references to occlusion, and nothing directly about implementation. The best I've found is a generic "use a low-pass filter", which doesn't help me much.
So my question is this: using XAudio2, how would one implement audio reflection signal muffling (exclusion) and audio direct signal muffling (obstruction) or both simultaneously (occlusion)? What would the audio graph look like, and how would these relate to reverberation environmental effects?
Edit 2013-03-26:
On further thinking about the graph, I realized that I may not be looking at the graph from the correct perspective.
Should the graph appear to be: Source → Effects (Submix) → Mastering
-or-
Should the graph appear generically as follows:

         ↗ Direct → Effects ↘
Source →                      → Mastering
         ↘ Reflections → Effects ↗
The second graph would split the graph such that exclusion and obstruction could be calculated separately; part of my confusion has been how they would be processed independently.
I would think, then, that the reverb settings from the 3D audio DSP structure would be applied to the reflections path; that the doppler would be applied to either just the direct or both the direct and the reflections path; and that the reverb environmental effects would affect the reflections path only. Is this getting close to the correct audio graph model?
You want your graph to look something along the lines of:
Input Data ---> Lowpass Filter ---> Output
You adjust the low-pass filter as the source becomes more obstructed. You can also use the low-pass filter's gain to simulate absorption. The filter settings are best exposed in a way that they can be adjusted by the sound designer.
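As an API-agnostic sketch of that mapping (in XAudio2 you would pass the computed cutoff to the source voice's built-in filter via SetFilterParameters and the gain via SetVolume, rather than filtering samples yourself; the curve below is invented):

```python
# Illustrative only: map an occlusion amount in [0, 1] to a low-pass cutoff
# and a gain, then apply a one-pole low-pass directly to the samples.
import numpy as np

def occlude(x, sr, occlusion):
    """occlusion: 0.0 = fully open, 1.0 = fully occluded."""
    cutoff = 20000.0 * (1.0 - occlusion) ** 2 + 200.0   # Hz, arbitrary curve
    gain = 1.0 - 0.7 * occlusion                        # absorption loss
    a = np.exp(-2.0 * np.pi * cutoff / sr)   # one-pole low-pass coefficient
    y = np.empty(len(x))
    state = 0.0
    for n in range(len(x)):
        state = (1.0 - a) * x[n] + a * state
        y[n] = gain * state
    return y
```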
This article covers sound propagation in more detail: http://engineroom.ubi.com/sound-propagation-and-diffraction-simulation/
In terms of this then being passed along the graph for environmental effects such as reverb, you just want those further down the graph:
Input ---> Low pass filter ---> Output ---> Reverb ----> Master Out
This way the reverberated sound will match the occluded sound (otherwise it will sound odd having the reverb mismatched to the direct signal).
Using a low pass filter sounds vague and incomplete, but there is not actually much more to the effect than filtering the high frequencies and adjusting the gain. For more advanced environmental modelling you want to research something like "Precomputed Wave Simulation for Real-Time Sound Propagation of Dynamic Sources in Complex Scenes" (I'm unable to link directly as I don't have enough rep yet!) but it may well be beyond the scope of what you are trying to achieve.

What is the best way to remove the echo from an audio file?

I want to run an audio file through something like Skype's echo cancellation feature. iChat and other VoIP apps have this feature too, but I can't find any software into which I can import or open my file.
Basic approach:
1. Determine the delay.
2. Determine the amplitude offset.
3. Invert the signal.
4. Apply the delay.
5. Adjust the amplitude.
6. Play back both audio files.
Any multitrack audio app is capable of this (e.g. Audacity, Pro Tools, or Logic).
For more complex signals, you will need to be smarter about your filtering, and ideally you would suppress the signals before they interfere (as in a Skype scenario).
Sounds cool, and makes a lot of sense theoretically, but I still don't really know how I should go about doing it. I am fairly experienced with Logic, but I don't know how to determine the delay. Should I just make a copy of the file, invert it and move it around until it sounds good?
Just line up the transients of the two signals visually to determine the delay. Then zoom way in and refine the delay to the sample to achieve the best cancellation. If it's not close, it won't cancel but add.
What do you mean by amplitude offset? Is that the volume difference between the original and the echo noise?
Exactly. Apart from very unusual cases, the echo is going to be at a different (typically lower) amplitude than the source, and you need to know this difference to cancel it best (this offset is applied to the inverted signal, by the way). If the amplitude is wrong, then you will audibly introduce the inverted signal or, in the odd event that the echo is louder than the source, reduce only part of the echo.
Once the transients are aligned (to the sample) and the signal is inverted, determine the difference in volume: if it's too high or too low, it won't cancel as much as it could.
Again, that's a basic approach. You can do a lot to improve it, depending on the signals and processors you have. In most cases, this approach will result in suppression, not elimination.
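To make the delay and amplitude steps concrete, here's a rough NumPy sketch that estimates both automatically and then subtracts the scaled, delayed copy. It assumes the echo is a single delayed copy within one file, and as noted above it suppresses rather than eliminates.

```python
# Estimate the echo delay via autocorrelation and its amplitude via least
# squares, then mix in the inverted, delayed copy. `x` is a mono float
# NumPy array; min_lag keeps us off the zero-lag peak. (np.correlate is
# O(n^2), so this is only practical for short files.)
import numpy as np

def cancel_single_echo(x, min_lag=100):
    corr = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0..len(x)-1
    delay = min_lag + int(np.argmax(corr[min_lag:]))
    # least-squares amplitude of the echo at that lag
    a = np.dot(x[delay:], x[:-delay]) / np.dot(x[:-delay], x[:-delay])
    y = x.copy()
    y[delay:] -= a * x[:-delay]      # add the inverted, delayed copy
    # Note: exact inversion of y[n] = s[n] + a*s[n-delay] would be the IIR
    # s[n] = y[n] - a*s[n-delay]; this one-shot subtraction only suppresses.
    return y, delay, a
```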
In order to remove echo, you need TWO files: mic & reference.
The mic is the signal that contains the echo.
The reference is the signal that contains the original audio that generated the echo.
After you have both these files, you can start building the echo-removal logic. Start with the wiki page on the subject.
