Closed 6 years ago. This question needs to be more focused and is not currently accepting answers.
I want to build a simple Audio Converter (between major file formats) using C#.NET, so I need to know the basic steps to do so.
Thanks.
Step 1: find a good third-party component (or components) that does the conversion between file formats.
Step 2: use this component in your app.
If your intent is to write all the raw conversion code yourself, get ready for some pain. The WAV file format (which usually just contains linear PCM samples) is easy enough to deal with, as long as the file is simply a header plus the sample data. Often, however, WAV files are a lot more complicated than this and require much more elaborate code to locate and parse the various RIFF chunks.
And that's just for a very straightforward format that (usually) does not encode the audio at all. The MP3 format is vastly more complex, and implementing it yourself requires a good knowledge of signal processing (filter banks, transforms such as the FFT and MDCT, psychoacoustic modeling).
Update: Alvas.Audio is one third-party C# component that may do what you need. NAudio is another.
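To give an idea of what step 2 can look like with NAudio, here is a minimal sketch (assuming the NAudio NuGet package is referenced; file names are placeholders) that decodes an MP3 into a PCM WAV file. Going the other direction (encoding to MP3) needs an additional encoder such as LAME.

using NAudio.Wave;

class Mp3ToWav
{
    static void Main()
    {
        // Mp3FileReader decodes the MP3 to PCM as it is read;
        // CreateWaveFile writes those samples out under a standard WAV header.
        using (var reader = new Mp3FileReader("input.mp3"))
        {
            WaveFileWriter.CreateWaveFile("output.wav", reader);
        }
    }
}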
Closed 1 year ago. This question is opinion-based and is not currently accepting answers.
I'm developing a GStreamer plugin using the Rust programming language. It's a source element that takes text as a parameter and returns the corresponding speech using one of several TTS providers (Google, Amazon, WellSaid, etc.). Some providers return an MP3 file and some a WAV file. So what is the best approach for sending the received sound file to the src pad of the element?
Decode MP3 and return PCM for both MP3 and WAV files. (I don't know whether it's possible to decode it inside the plugin or not)
Make the source dynamic to have an MP3 pad or WAV pad.
I'm new to gstreamer and I don't know which approach is better.
Output whatever you receive and let the next elements worry about decoding. That way you're not adding unnecessary complexity to your element, and applications can decide which MP3 decoder they want to use or if they want to directly forward the MP3 elsewhere without re-encoding.
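For example, with a hypothetical element name (ttssrc below is just a placeholder for your element), a downstream decodebin picks the right decoder for either MP3 or WAV on its own:

gst-launch-1.0 ttssrc text="Hello world" ! decodebin ! audioconvert ! audioresample ! autoaudiosink

Your element then only needs to set appropriate caps on its src pad (e.g. audio/mpeg for MP3 or audio/x-wav for WAV) for the buffers it pushes.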
Closed 7 years ago. This question needs to be more focused and is not currently accepting answers.
I want to add timestamps to the sentences of a book, matching the corresponding audiobook.
In various languages ideally.
Here's an example:
Pride and Prejudice
text from Project Gutenberg
audio from LibriVox
My idea was to find a voice recognition tool that puts timestamps on sentences (step 1), and then map the messy transcription to the original text using Levenshtein distances (step 2).
The website https://speechlogger.appspot.com/ offers a solution to the first step, but it's limited in character output. I could theoretically use web automation to get the job done by starting a new recording every minute or so, but that's really dirty.
I scripted step 2 in R and tested it on a sample I got from speechlogger, and it works okay-ish, but this could be greatly improved if the program knew the text in advance, like when you read a passage aloud to train speech recognition software. By transcribing first, I'm not using all the information I have.
So my questions are: what alternative ways are there to timestamp audio files, and is there a way I can make my process smarter by letting the recognition engine know what it's supposed to recognize?
There are many nice software packages developed for that, with various levels of accuracy:
Gentle - Kaldi-based aligner that works as a service (see the example after this list).
Older implementations:
Aligner Demo in Sphinx4 - CMUSphinx toolkit, in Java.
SAIL Align - HTK-based aligner, a fairly large collection of Perl scripts.
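For the second part of the question, an aligner like Gentle is exactly the "let the engine know the text" approach: you give it both the audio and the known transcript, and it returns word-level timestamps. Once the Gentle service is running locally, the call looks roughly like this (file names are placeholders; check the project's README for the exact port and form fields):

curl -F "audio=@pride_and_prejudice.mp3" -F "transcript=@book.txt" "http://localhost:8765/transcriptions?async=false"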
Closed 8 years ago. This question needs to be more focused and is not currently accepting answers.
I'm not quite sure where to post this question, but I think Stack Overflow has a lot of smart people who could help.
I'm wondering if there is a way I can combine programming and electrical circuits. Can I somehow turn my computer into a signal generator that creates AC waveforms which I could apply to an external circuit I've built? Could I then program my computer, say with C++ code, to change the amplitude/frequency of the waveform? (Hopefully this doesn't require assembly language, which I know nothing about except that it's code that operates more directly on the CPU.) Basically, I'm looking for a way to combine coding with electrical circuits. Anything will do; I just want to get better at both because they both interest me.
Yes, you can use your audio channel.
You have to consider its frequency response (a theoretical maximum of around 20 kHz).
You also have to buffer the audio output; use an op-amp as a buffer so you do not overload your audio jack.
You will run into limits on how fast you can send data to your audio channel, but I think it is possible.
Another way is to use a good old parallel port, if you have one :). Those are nice for controlling simple electronics.
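To make the audio-output idea concrete, here is a rough sketch in C# using the NAudio library (the same approach works from C++ with a library such as PortAudio). The frequency and gain values are only examples, and the signal should still go through the op-amp buffer mentioned above before it reaches your circuit.

using System;
using NAudio.Wave;
using NAudio.Wave.SampleProviders;

class ToneGenerator
{
    static void Main()
    {
        // 1 kHz sine wave on the default audio output; change Frequency and Gain
        // at runtime to control the waveform from code.
        var tone = new SignalGenerator(44100, 1)
        {
            Type = SignalGeneratorType.Sin,
            Frequency = 1000,
            Gain = 0.25   // keep the level low to avoid overloading the jack
        };
        using (var output = new WaveOutEvent())
        {
            output.Init(tone);
            output.Play();
            Console.WriteLine("Playing a 1 kHz sine wave. Press Enter to stop.");
            Console.ReadLine();
        }
    }
}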
Closed 8 years ago. This question needs to be more focused and is not currently accepting answers.
I want to know about Linux audio. I have spent a lot of time reading about it, but I still don't understand it clearly. Can anybody give a brief overview of the various Linux audio subsystems (like OSS, ALSA, JACK, GStreamer, Phonon, Xine)?
Any help is appreciated; thanks in advance.
I once wrote a famous blog post about the jungle of Linux audio output formats. You can find it here.
Regrettably, the picture is no longer there, so here's a copy: (diagram of the Linux audio stack, with OSS and ALSA at the bottom and the various libraries and frameworks layered on top)
It's a bit old (dating from 2007), but I hope it gives you the general idea. OSS and ALSA are the layers closest to the actual audio hardware. All the other libraries and frameworks simply talk to those lower layers. And as you can see, some of these libs and frameworks actually have wrappers around other libs and frameworks.
Which layer you want to call upon depends largely on what you wish to accomplish.
Closed 9 years ago. This question is opinion-based and is not currently accepting answers.
Hi, I am working on audio compression and I would like to ask which audio format is most suitable for human voice, one that can preserve the quality of my files while transferring them to the server? Thanks.
With standard audio formats, there's not much of a difference between music and speech compression. MP3, for example, is designed to only lose information that is largely imperceptible to the human ear, especially at high bit rates. MP3 is nice because you can choose a bit rate that meets your data needs. If you need more extreme compression, you'll definitely lose a noticeable amount of quality.
You will not be able to tune the FLAC codec much, and it seems overkill to use it for voice recording.
Even though MP3 is not supported natively in Java, you should take a look at "lame", a command-line MP3 encoder that is very easy to use from Java (create a Process object with the parameters you want).
usage:
lame.exe -V2 file.wav file.mp3
or from a WAV stream piped to standard input (if your application records the voice itself):
lame.exe -V2 - file.mp3