Create timestamps for subtitles in audiobook [closed] - audio

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
I want to add timestamps to book sentences, fitting the relevant audiobook.
In various languages ideally.
Here's an example:
Pride and Prejudice: text from Project Gutenberg, audio from LibriVox.
My idea was to find a speech recognition tool that puts timestamps on sentences (step 1), and then map the messy transcription back to the original text using Levenshtein distances (step 2).
The website https://speechlogger.appspot.com/ offers a solution to the first step, but its character output is limited. I could theoretically use web automation to get the job done by starting a new recording every minute or so, but that would be a dirty workaround.
I scripted step 2 in R and tested it on a sample I got from speechlogger; it works reasonably well, but it could be greatly improved if the program knew the text in advance, the way dictation software does when you read a known passage to train it. By transcribing first, I'm not using all the information I have.
So my questions are: what alternative ways are there to timestamp audio files, and is there a way to make my process smarter by letting the recognition engine know what it's supposed to recognize?
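Step 2 (mapping the messy transcription back to the original sentences) can be sketched in a few lines. The poster scripted it in R; this is the same idea in Python with a plain dynamic-programming Levenshtein distance, and the matching example is a made-up toy, not the poster's actual data:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def best_match(transcribed, original_sentences):
    """Map one transcribed chunk to the closest original sentence."""
    return min(original_sentences,
               key=lambda s: levenshtein(transcribed.lower(), s.lower()))

# Toy example: a noisy ASR chunk matched against candidate book sentences.
match = best_match("it is a truth universaly acknowleged",
                   ["It is a truth universally acknowledged",
                    "However little known the feelings or views"])
```

For real audiobook chapters you would compare against a sliding window of candidate sentences rather than the whole book, since edit distance over every pair is quadratic in practice.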

There are several software packages developed for exactly this task (aligning audio against a known text), with varying levels of accuracy:
Gentle - a Kaldi-based aligner that works as a service.
Older implementations:
Aligner Demo in Sphinx4 - part of the CMUSphinx toolkit, in Java.
SAIL Align - an HTK-based aligner, a sizable collection of Perl scripts.

Related

Using your computer as a signal generator [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I'm not quite sure where to post this question, but I think Stack Overflow has a lot of smart people who could help.
I'm wondering if there is a way I can combine programming and electrical circuits. Can I somehow turn my computer into a signal generator to create AC waveforms that I could apply to an external circuit I've built? Could I then program my computer, say in C++, to change the amplitude and frequency of the waveform? (Hopefully this doesn't require assembly language, which I know nothing about except that it's code that operates more directly on the CPU.) Basically, I'm looking for a way to combine coding with electrical circuits. Anything will do; I just want to get better at both because they both interest me.
Yes, you can use your audio channel.
You have to consider its frequency response (a theoretical maximum of about 20 kHz).
You also have to buffer the audio output; use an op-amp as a buffer so you don't overload your audio jack.
You will run into limits on how fast you can send data to your audio channel, but I think it is possible.
Another way is to use a good old parallel port, if you still have one; parallel ports are handy for driving simple electronics.
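As a minimal sketch of the audio-channel approach, here is Python (standard library only) generating a sine tone as a PCM WAV file that you could play out of the audio jack; the function name and parameters are my own, and amplitude/frequency are exactly the knobs the question wants to control in code:

```python
import math
import struct
import wave

def write_sine(path, freq_hz=440.0, amp=0.5, seconds=1.0, rate=44100):
    """Write a mono 16-bit PCM sine tone; playing it uses the
    sound card as a crude, band-limited signal generator."""
    n = int(rate * seconds)
    frames = b"".join(
        struct.pack("<h", int(amp * 32767 * math.sin(2 * math.pi * freq_hz * i / rate)))
        for i in range(n)
    )
    with wave.open(path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(rate)
        w.writeframes(frames)
    return n

write_sine("tone.wav")  # vary freq_hz and amp programmatically
```

Note the Nyquist limit: with a 44.1 kHz sample rate you cannot generate anything above about 22 kHz, which matches the frequency-response caveat above.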

How to build a simple application that uses an audio filter (e.g. damping of sound level with distance) [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
I would like to build a really simple app.
Let's say the app's GUI consists of two buttons: "5 meters" and "15 meters".
When the first button is clicked, an audio file would play. When the second button is clicked, the app would apply a filter to the same audio file, so that the user can hear what the same sound would sound like 10 meters farther away.
Firstly, I would like to know in which programming language an application like this could be written. I have some experience in Java and C++.
Secondly, I would like to know how to build audio filters (e.g. damping of sound level with distance) and how to integrate them into the app.
I really don't know where to start. Any practical example or similar application with available source code would be a great help!
The sound pressure decreases with 1/r, so a doubling of the distance results in a 6 dB lower amplitude. This should be easy to model with a distance-dependent amplification.
The interesting part of the problem is the sound absorption caused by air. This absorption is frequency dependent (it is higher for high frequencies) and also depends on air pressure, humidity and temperature. You can find a detailed quantitative model in the ISO 9613-1 standard.
What would be the platform for your app: iOS, Android, Linux, Windows...? In any case, I recommend you have a look at SFML, a C++ multimedia library that could help with tasks like this. In its audio module there is an example of audio levels that change with distance.
Good luck!

measuring precision and recall [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
We are building a text search solution and want a way to measure the precision and recall of the system every time we add new document types. From reading some of the posts here, it sounds like a machine-learning-based solution is the way to go. Can an expert comment on this? We would then look to add machine-learning folks to our team.
Computing the F1-score requires knowing the correct class and rank of all samples returned by evaluation queries, and you need those evaluation queries in the first place.
Any machine-learning approach will require a large amount of manual work to provide those samples and/or queries; so large that it won't save you any time.
Another drawback of such an evaluation is the learning-related intrinsic error, which grows with the size of the search engine's index and with the number of examples required, so you never get a good evaluation.
Forget machine learning for evaluating the search engine itself. Build your test queries and samples by hand; over time the set will become large and reliable.
If you really want machine learning in your system, look at query pre-processing instead. Getting meta-information about the query by another means (you say SVN, why not?) is generally good for performance, and since it doesn't change the result, you can reuse the same samples for an end-to-end evaluation.
That is what I did a few years ago, with a naive Bayes classifier for natural-language analysis.
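Once you have the hand-built query set recommended above, scoring it takes only a few lines. A sketch in Python; the document IDs and judgments here are made-up placeholders:

```python
def precision_recall_f1(retrieved, relevant):
    """Score one query: `retrieved` is the set of document ids the
    engine returned, `relevant` is the hand-labeled ground-truth set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Hypothetical judgments for one test query:
p, r, f = precision_recall_f1(retrieved={"d1", "d2", "d3", "d4"},
                              relevant={"d1", "d3", "d7"})
```

Averaging these per-query scores over the whole hand-built set gives a regression number you can track each time a new document type is added.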

helping getting started with audio programming languages [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
I have a small application I have been working on for some time to help my son with his speech delay. I am using gstreamer to play phonemes, small audio clips about 100ms long.
Things are fine but I find it a bit distasteful to use a library that has such powerful video features for an audio only application.
I was thinking that an audio programming language might be able to play short audio clips on-the-fly too.
I've been reading about these DSLs and there are quite a few of them, but it seems that Csound, SuperCollider and ChucK are the front runners.
All I really need to do right now is to play small audio clips, preferably from a C binding, in near real time and I only need to run on Linux.
Could anyone help me pick a language for this? There are so many features I "can't see the forest for the trees".
Once I have one picked out, I will have a tool for my immediate needs and a platform to grow with as my needs change.
SuperCollider has great real-time capabilities and makes it pretty easy to play sound files via its sclang interpreter. It's also not hard to communicate with the SuperCollider server (scsynth, the part of SuperCollider that actually synthesizes sound) via Open Sound Control (OSC) messages, so you can control the synth from another, separate application.
I don't know much about ChucK, but I hear it's good for on-the-fly, live audio programming too, so it might also work.
I wouldn't recommend Csound, since it's meant for composition: more of a "compiled" language than an interpreted one. You basically write a score file and generate a whole sound file from it, so it's probably not what you want.
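Controlling scsynth from a separate application, as suggested above, means sending OSC packets (typically UDP to scsynth's default port 57110). As a sketch of how little that requires, here is a minimal OSC message encoder in pure Python; the `/s_new` arguments are hypothetical placeholders, not a working synth definition:

```python
import struct

def _osc_str(s):
    """OSC-pad a string: NUL-terminated, length a multiple of 4 bytes."""
    b = s.encode("ascii") + b"\x00"
    return b + b"\x00" * (-len(b) % 4)

def osc_message(address, *args):
    """Encode a minimal OSC message supporting int and string arguments."""
    tags, body = ",", b""
    for a in args:
        if isinstance(a, int):
            tags += "i"
            body += struct.pack(">i", a)   # OSC ints are big-endian int32
        elif isinstance(a, str):
            tags += "s"
            body += _osc_str(a)
        else:
            raise TypeError(f"unsupported OSC argument: {a!r}")
    return _osc_str(address) + _osc_str(tags) + body

# Hypothetical scsynth command: start node 1000 of a synth named "playbuf".
msg = osc_message("/s_new", "playbuf", 1000, 0, 0)
# socket.socket(AF_INET, SOCK_DGRAM).sendto(msg, ("127.0.0.1", 57110))
```

In practice a ready-made OSC library (or the question's existing C binding via liblo) would be less work, but the wire format really is this simple.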

Audio Conversion using C#.Net [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I want to build a simple Audio Converter (between major file formats) using C#.NET, so I need to know the basic steps to do so.
Thanks.
Step 1: find a good third-party component that does conversion between the file formats you care about.
Step 2: use that component in your app.
If your intent is to write all the raw conversion code yourself, get ready for some pain. The WAV file format (usually containing linear PCM) is easy enough to deal with, as long as the file is just a header plus the sample data. Often, however, WAV files are a lot more complicated than this and require much more elaborate code to locate and parse the various RIFF chunks.
And that's just for a very straightforward file format that (usually) does not do any encoding at all. The MP3 format is vastly more complex, and requires a good knowledge of signal-processing techniques such as the FFT (Fast Fourier Transform).
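To illustrate the "simple" end of the spectrum, here is a sketch (in Python rather than C#, for brevity) that writes a minimal PCM WAV with the standard library and then walks its RIFF chunks by hand; a C# version would do the same with `BinaryReader`:

```python
import struct
import wave

# Write a minimal mono 16-bit PCM file to parse: one second of silence.
with wave.open("demo.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 8000)

def riff_chunks(path):
    """Walk the top-level RIFF chunks of a WAV file; return {chunk id: size}."""
    chunks = {}
    with open(path, "rb") as f:
        riff, _total, wave_id = struct.unpack("<4sI4s", f.read(12))
        assert riff == b"RIFF" and wave_id == b"WAVE"
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            cid, size = struct.unpack("<4sI", header)
            chunks[cid] = size
            f.seek(size + (size % 2), 1)  # chunk bodies are word-aligned
    return chunks

chunks = riff_chunks("demo.wav")  # expect at least b"fmt " and b"data"
```

Even this simple walker only covers canonical files; real-world WAVs add `LIST`, `fact`, and other chunks, which is exactly the "more elaborate code" the answer warns about, before any actual format conversion begins.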
Update: Alvas.Audio is one third-party C# component that may do what you need. NAudio is another.
