I'm trying to make a video tutorial, so i decided to record the speeches using a TTS online service.
I use Audacity to capture the sound, and the sound was clear !
After dinning, i wanted to finish the last speeches, but the sound wasn't the same anymore, there is a background noise(parasite) which is disturbing, i removed it with Audacity, but despite this, the voice isn't the same ...
You can see here the difference between the soundtrack of the same speech before and after the occurrence of the problem.
The codec used by the stereo mix peripheral is "IDT High Definition Codec".
Thank you.
Perhaps some cable or plug got loose? Do check for this!
If you are using really cheap gear (built-in soundcard and the likes) it might very well also be a problem of electrical interference, anything from ...
Switching on some device emitting a electro magnetic field (e.g. another monitor close by)
Repositioning electrical devices on your desk
Changes in CPU load on your computer (yes i'm serious!)
... could very well cause some kinds of noises with low-fi sound hardware.
Generally, if you need help on audio sounding wrong make sure that you provide a way to LISTEN to the files, not just a visual representation.
Also in your posted waveform graphics i can see that the latter signal is more compressed, which may point to some kind of automated levelling going on somewhere in the audio chain.
Related
What I want is to be able to get a signal at my raspberry pi at home when I'm not at home so I can e.g. wake up my PC. I always have an old phone lying around that I never really use. So I thought, I can call my phone, a specific mp3 ringtone plays, my raspberry pi listens and recognizes the ringtone and therefore the signal. So I can pretty much chose whatever ringtone I want (but hopefully a not too long one). But the problem is, that it should be recognizable by the raspberry and it should be distinguishable from other sounds. At best I can play random music at home and it will not get signalled until it's the specific ringtone i chose.
So I'm at the very beginning of the project and I have a lot of question. Is this even feasible? How do I listen to the ringtone? Should I use a normal microphone or could I e.g. trigger some gpio pin as long as a specific frequency is played? What kind of ringtone should I use to be as distinguishable as possible? And how to create the software to recognize the sound?
I know this is a lot and I don't expect a step by step solution. But maybe you got some hints to get me in the right direction?
If someone has a similar problem, I found a solution: First I had to choose between a mostly hardware solution and a mostly software solution. The hardware solution is to filter specific frequencies. This seems to be pretty hard using normal band-pass filters if you want narrow bands. There are also components that can do that, now I know of the NE567. But this component only reacts to one frequency and takes quite a lot of energy. To recognize a ringtone, more of these components are needes which means more power consumption. Additionally this solution is pretty unflexible.
So I went for the software solution. Now I have an Arduino Uno that gets an amplified electret microphone signal at an analog input pin. The data is collected and simultaneously analysed with an FFT algorithm. Then I check the dominant frequency if there is any and safe it in an array. Everytime a got a new data point I compare the array with the pattern of my ringtone and calculate a score for the match. If the score is big enough the ringtone is "found" and I can trigger my event.
I'm actually pretty pleased with the solution because it works quite well even with the phone some feet away from the microphone. I thought I need to put the microphone almost directly next to the phone to get good results, but I dont have to. It's still a little sensitive, because the sound volume shouldnt be too high or to low. But with the right volume settings it works with a quite big area when the phone is in the same room. It works even better with some space between microphone and phone, because the phones radiation from the call seems to disturb the circuit quite a lot. There is also the problem, that other noises block the ringtone recognition. I could compensate that with my algorithm, but I almost used up all resources of the Arduino, so I had to keep the algorithm simple. But in my case I dont have a noisy environment, so this is not a problem for me. Another pro is that my event was never triggered from another sound and it seems almost impossible that this could happen by accident.
So it is feasible and I think its actually a quite elegant solution. I also thought about a vibration detection or even directly using the vibration motor's signal but I have no control over the vibration function of that old phone. But I can chose the ringtone for every contact, so I only gave the "magic" ringtone to myself and so the event can only be triggered by myself. I only have to say, that writing the software was kind of hard with the Arduinos limitations. Because I need the data in real time I have limited time for the calculation. I had to limit the incomping data and therefore I can only listen to frequencies up to 10kHz. But the ringtone recognition is still possible and I think it was worth the effort. :)
I have recorded an audio.
I dont know how it happened that only one sided speech is recorded and the other speech is recorded with a very low sound.
Is there any solution to amplify the other side signal.
any help would be much appreciated.
This question is probably more appropriately asked at a forum where recording and mixing is discussed. For example: https://sound.stackexchange.com/
The ideal would be to improve your recording situation, to control factors so the sound are more closely matched. (Match microphones, isolate the speakers from environmental sounds, optimize input levels, etc.)
After that, the next option or step is to pre-process your audio files with a tool like Audacity. Use this or another DAW (Digital Audio Workstation) tool to match amplitudes or employ noise filtering or a range of other tools.
Audio processing is both tricky (an "art") and cpu intensive, so it's good to get as much of this handled as possible before the sounds are imported into a program.
I have a situation where I have a video capture of HD content via HDMI with audio from a sound board that goes through a impedance drop into a microphone input of a camcorder. That same signal is split at line level to a 'line in' jack on the same computer that is capturing the HDMI. Alternatively I can capture the audio via USB from the soundboard which is probably the best plan, but carries with it the same issue.
The point is that the line in or usb capture will be much higher quality than the one on HDMI because the line out -> impedance change -> mic in path generates inferior quality in that simply brushing the mic jack on the camera while trying to change the zoom (close proximity) can cause noise on the recording.
So I can do this today:
Take the good sound and the camera captured sound and load each into
audacity and pretty quickly use the timeshift toot to perfectly fit
the good audio to the questionable audio from the HDMI capture and
cut the good audio to the exact size of the video. Then I can use
ffmpeg or other video editing software to replace the questionable
audio with the better audio.
But while somewhat quick and easy, it always carries with it a bit of human error and time. I'd like to automate this if possible as this process is repeated at least weekly throughout the year.
Does anyone have a suggestion if any of these ideas have merit or could suggest another approach?
I suspect but have yet to confirm that the system timestamp of the start time may be recorded in both audio captured with something like Audacity, or the USB capture tool from the sound board as well as the HDMI mpeg-2 video. I tried ffprobe on a couple audacity captured .wav files but didn't see anything in the results about such a time code, but perhaps other audio formats or other probing tools may include this info. Can anyone advise if this is common with any particular capture tools or file formats?
if so, I think I could get best results by extracting this information and then using simple adelay and atrim filters in ffmpeg to sync reliably directly from the two sources in one ffmpeg call. This is all theoretical for me right now-- I've never tried either of these filters yet-- just trying to optimize against blind alleys by asking for advice up front.
If such timestamps are not embedded, possibly I can use the file system timestamp for the same idea expressed in 1a, but I suspect the file open of the two capture tools may have different inherant delays. Possibly these delays will be found to be nearly constant and the approach can work with a built-in constant anticipation delay but sounds messy and less reliable than idea 1. Still, I'd take it, if it turns out reasonably reliable
Are there any ffmpeg or general digital audio experts out there that know of particular filters that can be employed on the actual data to look for similarities like normalizing the peak amplitudes or normalizing the amplification of the two to some RMS value and then stepping through a short 10 second snippet of audio, moving one time stream .01s left against the other repeatedly and subtracting the two and looking for a minimum? Sounds like it could take a while, but if it could do this in less than a minute and be reliable, I suspect it could work. But I have only rudimentary knowledge of audio streams and perhaps what I suggest is just not plausible-- but since each stream starts with the same source I think there should be a chance. I am just way out of my depth as to how to go down this road, so if someone out there knows such magic or can throw me some names of filters and example calls, I can explore if I can make it work.
any hardware level suggestions to take a line level output down to a mic level input and not have the problems I am seeing using a simple in-line impedance drop module, so that I can simply rely on the audio from the HDMI?
Thanks in advance for any pointers or suggestinons!
On a basic embedded systems speaker with a single line of output, wiggling the output as 0 or 1 in a for given periods produces sound.
I'd like to do something similar on a modern Linux desktop. A brief look-see of Portaudio, OpenAL, and ALSA suggests to me that most people do things at a considerable higher level. That's ok, but not what I'm looking for.
(I've never worked with sounds on Linux before, so if a tutorial exists, I'd love to see it).
Actually, it... kinda is. While you can generate the waveform yourself, you still need to use an API to queue it and send it to the audio hardware; there no longer even exists a sane way to twiddle the audio line directly. Plus you get cross-platform compatibility for free.
[...] embedded systems speaker with a single line of output, wiggling the output as 0 or 1 in a for given periods produces sound.
Sounds a lot like the old PC speaker. You might still find code for it in the Linux kernel.
I'd like to do something similar on a modern Linux desktop.
Then you need AFAIK a driver for ALSA. There you can find infos on how to write an ALSA driver. Use PWM to produce the sound.
Since there are many different sound cards and audio interfaces produced by different companies, there is no uniform way to have a low level access to them. With most sound I/O APIs what you need to do is to generate the PCM data and send that to the driver. That's pretty much the lowest level you can go.
But PCM data is very similar to the 0-1 approach you describe. It's just that you have the in-between options too. 0-1 is 1-bit audio. 8-, 16-, 24-bit audio is what you'll find on a modern sound card. There are also 32- and 64-bit float formats. But they're still similar.
I am currently thinking about what I could do to measure the time it takes from the point where the computer gets audio input (through a normal audio input on a soundcard) to the point where there's something to work with, e.g. noise cancellation or something like that.
The main problem I reckon is to measure when the audio signal was created and the synchronization of the sender and receiver.
So far I came up with the following ideas:
Use the serial port to transmit timing information
Put a timestamp into the audio signal
Transmit a recurring signal - a delay would be visible
Do you have more ideas or something that I m not seeing in mine? I thought I would find more academic work on this matter but was sad to see that this is not the case, am I searching wrong?
you can check the latency in windows with this tool they also have some great info on the site also you can read up on the ASIO drivers or try to reach out to the communities that use these tools (DJs Guitar modeling scene) another great source of information is open Source projects like JACK that have more technical information:
Latency Tool:
http://www.thesycon.de/deu/latency_check.shtml
Asio Wikipedia Page:
http://en.wikipedia.org/wiki/Audio_Stream_Input/Output
Guitar Amp Modeling:
http://www.guitarampmodeling.com/
JACK Project homepage:
http://jackaudio.org/
Hope that helps.