I have spent quite a bit of time searching for the answer, but I keep finding the exact opposite of what I need (something I already know how to do): muting audio below a certain threshold, which can be done with either ffmpeg or sox.
What I need is a command that searches my audio for anything that exceeds -18 dB and mutes all such segments. (The reason: I have a recording where I am playing an electric piano, with its audio output recorded directly through the cable, while two microphones record what I am speaking between my playing. When I am playing the keyboard, the mics pick up a lot of action noise, which I want to mute. That should be easy to detect, because the channel recording my piano input has piano sound at exactly those points, and that corresponds exactly to when I want the audio from my mics muted, or at least attenuated.)
Does anyone know if such a feat is possible?
If it cannot be done with just one command, that's fine; I'll try anything.
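For what it's worth, ffmpeg's sidechaincompress filter is designed for exactly this kind of ducking: feed the piano channel in as the sidechain and it attenuates the mic track whenever the piano exceeds the threshold (its ratio is capped, so it attenuates heavily rather than hard-mutes). The same idea is easy to see in a Python sketch: measure the piano channel's short-term level and zero the mic track wherever it exceeds -18 dBFS. The filenames and the 50 ms window are placeholders, and 16-bit PCM input is assumed:

    import numpy as np
    from scipy.io import wavfile

    # Hypothetical filenames -- substitute your own tracks.
    rate, piano = wavfile.read("piano_di.wav")   # piano recorded through the cable
    rate2, mics = wavfile.read("mics.wav")       # the speech mics
    assert rate == rate2

    # Assumes 16-bit PCM; adjust the scaling for other sample formats.
    piano_f = piano.astype(np.float64) / np.iinfo(piano.dtype).max
    mics_f = mics.astype(np.float64)

    WIN = int(0.05 * rate)      # 50 ms analysis window -- a guess, tune to taste
    THRESHOLD_DB = -18.0

    n = min(len(piano_f), len(mics_f))
    for start in range(0, n, WIN):
        chunk = piano_f[start:start + WIN]
        rms = np.sqrt(np.mean(chunk ** 2)) + 1e-12
        if 20 * np.log10(rms) > THRESHOLD_DB:
            # Mute; multiply by e.g. 0.1 instead to merely attenuate.
            mics_f[start:start + WIN] = 0

    wavfile.write("mics_gated.wav", rate, mics_f.astype(mics.dtype))

A short crossfade at the window edges would avoid clicks; this is only meant to show the shape of the approach.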
My Zoom H4n somehow decided it didn't want to properly save two recordings this weekend, leaving me with four zero-byte files (which I have tried every which way to open/convert, but nothing has worked).
I then used CardRescue to scan the SD card for any audio it could find, and - lo and behold - I got .wav files! However, instead of two files for each session (one was an XLR output from the desk, the other the on-Zoom mics), or even a nice stereo with one left, the other right, I have a mess.
When importing as raw data into Audacity (the rescued .wavs themselves won't open), the right channel has the on-Zoom mic audio, with intermittent silences. The left has the on-Zoom audio, followed by the same part of the XLR-input audio, following the same pattern as the silences.
I have spent hours chopping things up in GarageBand, but as this is audio for a video, it needs to match what 'really' happened perfectly (I appreciate that for a podcast/audio-only piece I could fairly simply strip the on-Zoom mic audio out of the left channel). I began attempting to sync the mic audio to the on-camera audio (which, despite playing around with settings, is as unusable as it always is), but because it's a pattern, I can't help but wonder if there's a cleaner fix: either analysing the audio somehow, as there are clean lines when I look at the spectral data, or adding a couple of numbers to the WAV's binary that would click the two into place?
I've tried importing into Audacity with different settings and different offsets; this has resulted in slow audio, fast audio, or heavily distorted audio (but always the same patterns in the files).
I use a Mac (and don't know any PC users close by!) so any software suggestions will need to run on Mac. However, I'm willing to try just about anything that's not dragging tiny clips.
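On the "adding a couple of numbers to the wav's binary" idea: since the raw import recovers audio, the PCM data is evidently intact and only the header is wrong, so stamping a fresh canonical header onto the raw bytes may make the files open normally. A minimal sketch with Python's wave module; the rate, channel count, sample width, and junk-to-skip values are guesses and should match whatever raw-import settings gave you normal-speed audio (this fixes the header only, not the channel-swapping pattern):

    import wave

    SAMPLE_RATE = 44100   # guess -- the H4n's default
    CHANNELS = 2
    SAMPWIDTH = 2         # bytes per sample, i.e. 16-bit PCM
    SKIP = 0              # bytes of junk to skip at the start, if any

    with open("rescued.wav", "rb") as f:
        raw = f.read()[SKIP:]

    with wave.open("repaired.wav", "wb") as out:
        out.setnchannels(CHANNELS)
        out.setsampwidth(SAMPWIDTH)
        out.setframerate(SAMPLE_RATE)
        out.writeframes(raw)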
I am having so much trouble doing something that should be SIMPLE. I do sales for a golf course, and I have to read the same thing over and over again on every call, and it gets so damn annoying. I want to be able to play a pre-recorded WAV/MP3 file through the mic input of my headset, so I can just play the recording at the right point in the sales cycle instead of repeating it 200 times a day. I have succeeded in doing it with Stereo Mix, BUT it disables the voice aspect of the microphone, so when the recording finishes I have to jump into settings real fast and switch the mic input back -- which is not workable.
I know there is a way to do this. I see twitch streamers do this sort of thing all the time. I have tried SO MANY methods and nothing seems to work.
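The usual streamer trick is a virtual audio cable (e.g., VB-CABLE): the cable's output end is set as the "microphone" in the calling software, and both the real mic and the recording are routed into its input end, so playback doesn't knock out the voice. As a sketch of the playback half, using the python-sounddevice package; the device name is whatever the cable registers as on your machine:

    import sounddevice as sd
    from scipy.io import wavfile

    rate, data = wavfile.read("pitch.wav")   # hypothetical pre-recorded spiel

    # Run once to find the virtual cable's playback device name/index.
    print(sd.query_devices())

    # Play into the cable; its other end then appears as a microphone.
    sd.play(data, rate, device="CABLE Input (VB-Audio Virtual Cable)")
    sd.wait()

Mixing the real mic into the same cable is a routing setting (Windows' "listen to this device" option on the mic, or a mixer app like VoiceMeeter), not something this script does.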
I have a situation where I have a video capture of HD content via HDMI, with audio from a sound board that goes through an impedance drop into a microphone input of a camcorder. That same signal is split at line level to a 'line in' jack on the same computer that is capturing the HDMI. Alternatively, I can capture the audio via USB from the soundboard, which is probably the best plan but carries the same issue.
The point is that the line-in or USB capture will be much higher quality than the one on HDMI, because the line out -> impedance change -> mic in path produces inferior quality: simply brushing the mic jack on the camera while trying to change the zoom (close proximity) can put noise on the recording.
So I can do this today:
Take the good sound and the camera-captured sound, load each into Audacity, and pretty quickly use the Time Shift tool to fit the good audio perfectly against the questionable audio from the HDMI capture, then cut the good audio to the exact length of the video. Then I can use ffmpeg or other video-editing software to replace the questionable audio with the better audio.
But while somewhat quick and easy, it always carries with it a bit of human error and time. I'd like to automate this if possible as this process is repeated at least weekly throughout the year.
Does anyone have a view on whether any of the following ideas have merit, or can anyone suggest another approach?
1. I suspect, but have yet to confirm, that the system timestamp of the start time may be recorded both in audio captured with something like Audacity (or with the USB capture tool from the sound board) and in the HDMI MPEG-2 video. I tried ffprobe on a couple of Audacity-captured .wav files but didn't see anything in the results about such a timecode; perhaps other audio formats or other probing tools include this info. Can anyone advise whether this is common with any particular capture tools or file formats?

1a. If so, I think I could get the best results by extracting this information and then using simple adelay and atrim filters in ffmpeg to sync reliably, straight from the two sources, in one ffmpeg call (see the first sketch after this list). This is all theoretical for me right now -- I've never tried either of these filters -- I'm just trying to avoid blind alleys by asking for advice up front.

2. If such timestamps are not embedded, I could possibly use the file-system timestamps for the same idea expressed in 1a, but I suspect the file opens of the two capture tools have different inherent delays. Those delays may turn out to be nearly constant, in which case the approach could work with a built-in constant compensation delay, but that sounds messy and less reliable than idea 1. Still, I'd take it if it proves reasonably reliable.

3. Are there any ffmpeg or general digital-audio experts out there who know of filters that can be employed on the actual data to look for similarities? Something like: normalize the peak amplitudes (or normalize both streams to some RMS value), then step through a short 10-second snippet, sliding one stream 0.01 s against the other repeatedly, subtracting the two, and looking for a minimum (see the second sketch after this list). It sounds like it could take a while, but if it ran in under a minute and were reliable, I suspect it could work. I have only rudimentary knowledge of audio streams, and perhaps what I suggest is simply not plausible -- but since each stream starts from the same source, I think there should be a chance. I am way out of my depth on how to go down this road, so if someone knows such magic or can throw me some filter names and example calls, I can explore whether I can make it work.

4. Are there any hardware-level suggestions for taking a line-level output down to a mic-level input without the problems I am seeing with a simple in-line impedance-drop module, so that I can simply rely on the audio from the HDMI?
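For reference, here is roughly what the muxing step in idea 1a could look like once an offset is known -- a sketch only, with placeholder filenames and an offset assumed to come from elsewhere (embedded timestamps, file times, or the correlation in idea 3):

    import subprocess

    OFFSET_MS = 1234        # placeholder: good audio starts 1.234 s late
    VIDEO = "capture.mpg"   # HDMI capture (video + questionable audio)
    GOOD = "good.wav"       # line-in/USB capture

    # adelay takes per-channel delays in milliseconds; -shortest trims the
    # output to the video's length. If the good audio instead starts early,
    # cut its head off with atrim (e.g. "atrim=start=1.234") rather than adelay.
    subprocess.run([
        "ffmpeg", "-i", VIDEO, "-i", GOOD,
        "-filter_complex", f"[1:a]adelay={OFFSET_MS}|{OFFSET_MS}[a]",
        "-map", "0:v", "-map", "[a]",
        "-c:v", "copy", "-shortest",
        "synced.mpg",
    ], check=True)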
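And idea 3 is essentially cross-correlation, which numpy/scipy computes directly instead of sliding-and-subtracting by hand (ffmpeg also has an axcorrelate filter, but extracting a single offset number from it is awkward). A sketch, assuming both tracks are decoded to WAV at the same sample rate; filenames are placeholders:

    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import correlate

    rate, good = wavfile.read("good.wav")   # clean line-in/USB capture
    rate2, ref = wavfile.read("hdmi.wav")   # audio extracted from the HDMI video
    assert rate == rate2

    def mono(x):
        x = x.astype(np.float64)
        return x.mean(axis=1) if x.ndim == 2 else x

    # Correlate a 10 s snippet of the HDMI audio against a window of the
    # good audio; the correlation peak gives the offset between them.
    snip = mono(ref)[:10 * rate]
    win = mono(good)[:30 * rate]
    snip /= max(np.max(np.abs(snip)), 1e-12)
    win /= max(np.max(np.abs(win)), 1e-12)

    corr = correlate(win, snip, mode="full")
    lag = int(np.argmax(corr)) - (len(snip) - 1)
    print(f"the HDMI audio starts {lag / rate:.3f} s into good.wav")

On a 10-second snippet this runs in seconds, well under the minute budget, and the resulting lag feeds straight into the adelay/atrim call above.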
Thanks in advance for any pointers or suggestions!
I'm trying to make a video tutorial, so I decided to record the speeches using an online TTS service.
I used Audacity to capture the sound, and the sound was clear!
After dinner, I wanted to finish the last speeches, but the sound wasn't the same anymore: there is a disturbing background noise (interference). I removed it with Audacity, but despite this, the voice isn't the same...
Here you can see the difference between the soundtracks of the same speech before and after the problem appeared.
The codec used by the stereo mix peripheral is "IDT High Definition Codec".
Thank you.
Perhaps some cable or plug got loose? Do check for this!
If you are using really cheap gear (a built-in sound card and the like), it might very well also be a problem of electrical interference; anything from ...
Switching on some device emitting an electromagnetic field (e.g. another monitor close by)
Repositioning electrical devices on your desk
Changes in CPU load on your computer (yes, I'm serious!)
... could very well cause all kinds of noise with lo-fi sound hardware.
Generally, if you need help with audio that sounds wrong, make sure you provide a way to LISTEN to the files, not just a visual representation.
Also, in your posted waveform graphics I can see that the later signal is more compressed, which may point to some kind of automatic levelling going on somewhere in the audio chain.
I need to detect acoustic echo/sidetone from an unmuted telephone handset.
Basically, I am calling a telephone handset from my computer, with the computer's mic muted. I then play a sound from the computer to the phone and record the incoming audio from the handset.
I need to detect if the telephone I called was on mute or not.
If it's not muted, I should see some sidetone/echo in the audio file.
Currently I am having issues seeing any echo in the raw audio.
Is there any software or algorithms I can run the audio file through to detect the echo/sidetone?
Are there any specific tones or frequencies I should play to generate the biggest echo?
Echo and sidetone are generally treated separately, although they do overlap. Sidetone is sound that 'leaks' from the microphone in a handset to the speaker in the same handset (the leak can be in the handset, the phone, or sometimes the local line card in the exchange/PABX). Echo is sound that travels from one party in a call to the other party at the far end, then 'leaks' back along the connection to the original party.
For echo, if the distance is short, it effectively behaves the same as sidetone: the user simply hears a portion of what they have said played back at almost exactly the same time they are saying it. If the distance is long enough that the reflected sound arrives after the user has spoken, it sounds more like an 'echo'.
You should be able to see the effect visually by playing a sound clip with a very distinctive shape into the microphone of the handset under test, then comparing its graph with a graph of the sound received in the earpiece. To test echo you generally need to simulate a delay somehow, or else the echo will look just like sidetone.
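If you'd rather automate the comparison than eyeball graphs, cross-correlating the played stimulus against the recording is a reasonable approach, and a swept sine (chirp) makes a good stimulus because its autocorrelation is a single sharp spike, so any leaked copy shows up as a clear peak. A rough sketch in Python; the filenames and the detection threshold are placeholders:

    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import chirp, correlate

    RATE = 8000   # telephone-ish sample rate
    t = np.linspace(0, 2.0, int(2.0 * RATE), endpoint=False)
    # Sweep across the voice band the phone network actually passes.
    stim = chirp(t, f0=300, t1=2.0, f1=3000, method="logarithmic")
    wavfile.write("stimulus.wav", RATE, (stim * 32767).astype(np.int16))

    # Later: correlate the stimulus against what came back from the handset
    # (the recording must be at the same rate and longer than the stimulus).
    rate, rec = wavfile.read("recorded.wav")
    rec = rec.astype(np.float64)
    if rec.ndim == 2:
        rec = rec.mean(axis=1)
    rec /= max(np.max(np.abs(rec)), 1e-12)

    corr = np.abs(correlate(rec, stim, mode="valid"))
    peak_ratio = corr.max() / (np.sqrt(np.mean(corr ** 2)) + 1e-12)
    print("correlation peak-to-RMS:", peak_ratio)
    # Placeholder rule of thumb: a strong isolated peak suggests the
    # stimulus leaked back, i.e. the handset was probably not muted.
    if peak_ratio > 10:
        print("echo/sidetone detected")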