Perform Speech-to-Text in Python using pre-transcribed text as guide


I'm working on a Python application that's meant to align video clips based on what the actors are saying on screen.
As an example, I have a scene where actors are reading dialogue from a script. They perform the 3-minute scene 10 times.
I am currently transcribing what they say using speech-to-text, but because the actors are reading the same dialogue repeatedly, I want to use the pre-transcribed dialogue (the movie script) to help guide the speech-to-text engine to be more accurate.
For example:
"Are you telling me you built a time machine out of a Delorean?"
Speech to text returns:
"Are you talking me you building a time machine out of a daylight?"
I should be able to figure out where the mistakes are and estimate the correct line using the original script and lock everything against the movie script.
I'm currently using CMUSphinx in Python to get my STT data and it works very well. But I'm having some trouble with the logic on this next part.
I'll post some code shortly!
EDIT: I discovered that the search terms I was looking for are "audio aligner" and "long audio aligner." These seem to be tools included in some STT packages. CMUSphinx in particular may have the ability to do this built in. I'm exploring that.
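
As a starting point for the "figure out where the mistakes are" step, here is a minimal sketch that compares the script line with the recognizer output word by word. It uses Python's standard difflib module rather than a real audio aligner, and the function name align_to_script is hypothetical, not part of CMUSphinx.

```python
import difflib
import re


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def align_to_script(script_line, recognized):
    """Word-level diff between the script and the STT output.

    Returns a list of (tag, script_words, recognized_words) spans, where
    tag is one of 'equal', 'replace', 'delete' or 'insert'.
    """
    script_words = tokenize(script_line)
    heard_words = tokenize(recognized)
    matcher = difflib.SequenceMatcher(a=script_words, b=heard_words, autojunk=False)
    return [
        (tag, script_words[i1:i2], heard_words[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    ]


script = "Are you telling me you built a time machine out of a Delorean?"
heard = "Are you talking me you building a time machine out of a daylight?"

# Keep only the spans where the recognizer disagrees with the script.
mismatches = [span for span in align_to_script(script, heard) if span[0] != "equal"]
for tag, expected, got in mismatches:
    print(tag, expected, "->", got)
```

Each non-equal span is a place where the recognized words could be replaced by the script words. This only works on text, so it says nothing about timing; the timestamps would still have to come from the recognizer or from an aligner.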

Related

Microsoft Speech Recognition defaults vs API

I've been using Microsoft Speech Recognition in Windows 10: doing the training exercises, dictating text into WordPad and correcting it, adding words to the dictionary, and so on. I would like to use the software to transcribe .wav files. It appears one can do this using the Windows Speech Recognition API, but this seems to involve creating and loading one's own grammar files. That suggests to me that it would basically create a new speech recognizer that uses the same building blocks but is a different program from the one that runs when I click "Start Speech Recognition" in the Start menu. In particular, it would perform differently because of differences in training or configuration.
Am I wrong about this? And if I'm not, is there still a way of retrieving all the data the default speech recognizer uses so I can reproduce its behavior exactly? If I need to create a separate speech recognizer with its own grammar files, separate training history, and so on in order to transcribe .wav files, then so be it, but I'd like to better understand what's going on here.

The Woundify open source project contains examples of how to convert .wav files to text (STT).

Search in a book with speech

I am trying to build a program that finds which page or sentence of a book is being read into a microphone. I have the book's text and its audio content. The user will start reading from a random page, and the program is supposed to sync to the user and show the section of the book that is being read. It might seem like a useless program, but please bear with me.
Would an approach similar to Shazam-like programs work? I am not sure how effective those algorithms are for speech. Also, the speaker will be different and might have an accent and a different reading speed.
Another approach would be converting the speech to text and searching for that text in the book. The problem is that the book is written in a rare language for which there is no language model available. In addition, the script does not use Latin characters, which makes programming difficult (for me at least).
Are there any solutions that anyone can recommend? Would extracting features from the audio file and comparing them with the features extracted in real time from the microphone work? Which features?
Is there any implementation or code that I can start with? Any language is OK, but I prefer C.

You need to use a speech recognizer.
Create a language model directly from the book text. That will make recognition of the book reading very accurate, both for the original reading and for the reading by the user.
Use this language model to recognize the book audio and assign timestamps to the words, or use a more advanced algorithm to perform text-to-audio alignment.
Recognize the user's speech with the book-specific language model and use the recognized text to display a position in the book.
You can use CMUSphinx for all of these tasks.
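
The last step above, turning recognized text into a position in the book, can be sketched as a fuzzy search over the book's words. This is only an illustration under stated assumptions: it uses Python's standard difflib module instead of anything from CMUSphinx, the function name locate_in_book is hypothetical, and it assumes words are separated by whitespace. The brute-force scan is fine for a short text but would need an index for a whole book.

```python
import difflib


def locate_in_book(book_text, recognized):
    """Return (word_index, score) of the book window most similar to `recognized`.

    The book is scanned with a sliding window as long as the recognized
    phrase; each window is scored with difflib's similarity ratio (0.0 to 1.0).
    """
    book_words = book_text.casefold().split()
    heard = recognized.casefold().split()
    window = len(heard)
    best_index, best_score = 0, 0.0
    for start in range(max(1, len(book_words) - window + 1)):
        candidate = book_words[start:start + window]
        score = difflib.SequenceMatcher(a=candidate, b=heard, autojunk=False).ratio()
        if score > best_score:
            best_index, best_score = start, score
    return best_index, best_score


book = "the quick brown fox jumps over the lazy dog and runs far away"
# The recognizer got one word wrong ("a" instead of "the").
index, score = locate_in_book(book, "jumps over a lazy dog")
print(index, round(score, 2))
```

Because str.split and str.casefold work on Unicode text, the same sketch applies to scripts that do not use Latin characters, as long as words are whitespace-delimited.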

real time refreshing in processing

I am new to Processing. I found it by searching for "draw with coding" and tried it, but it seems that every time I modify the code, I have to stop and render again to get the final result.
Is there any way to get an updated graph without re-rendering? That would be much more convenient for creating simple figures.
If not, is there any alternative to Processing that can draw a graph from code?
I've used TikZ in LaTeX, but that is just for LaTeX. I want something that lets me draw a figure by coding. I've suffered enough using software like CorelDRAW; it lacks the fundamental elegance of coding.
Thanks a lot!

Please have a look at the FluidForms libraries.
easy to setup
documentation and video tutorials
as long as you don't run into exceptions, live code comfortably
if you prefix public variables with param you also get sliders for free :)
Do check out the video tutorials, especially this one:
Also, if using Python isn't a problem I recommend having a look at:
NodeBox
Field
Python is a brilliant scripting language, which makes prototyping and "live coding" easy (although it can also be compiled, and it plays nicely with C/C++). It is easy to pick up and a joy to use.

In Processing, you must re-run your program to see the changes graphically, unless you write code that takes input from the user to dynamically adjust what you are drawing. For creating user interfaces there is, for example, the controlP5 library (http://www.sojamo.de/libraries/controlP5/).
Processing doesn't support "live coding" (at least as far as I know).

You must re-run the code to see the new result.
If live coding is what you're looking for, check out Fluxus (http://www.pawfal.org/fluxus/) or Impromptu (http://en.wikipedia.org/wiki/Impromptu_(programming_environment)).

walking through a building created in google sketchup

I am looking for a way to take a model of a building and allow people to walk through it, like in a video game engine.
We are also looking to run this on a visualization wall, which requires OpenGL on Linux, and we would like it to be open source. Still, something that runs on Windows, or something closed source on Linux, would be better than nothing.
I have found Panda3D, but I am not sure it will perform well enough for such a large model: the .egg file was over 200MB and took over 8GB of RAM to convert to Panda3D's binary format.
None of our professors know about this, and we are having trouble finding the tools we need.

Try Flux Player. I used it on a school project a while ago.
If I recall correctly, I had to export the SketchUp file with the extension that the program needs, but I don't remember at the moment which one that was.
http://www.redorbit.com/news/technology/854384/media_machines_releases_flux_studiotm_20_and_flux_playertm_20/
The software belongs to "Media Machines".
It is basically a plugin that lets you navigate the models in a web browser using your cursor or mouse.
I believe there are other solutions out there that let you do exactly what you want.

How do you visualize logfiles in realtime?

Sometimes it might be useful, but mostly it just looks cool or impressive, to visualize log files (anything from HTTP requests to bandwidth usage to cups of coffee drunk per day).
I know about VisitorVille, which I think looks a bit silly, and then there's glTail.
How do you "visualize" your log files in real time?

There is also the Logstalgia tool, which visualizes Apache logs. See http://code.google.com/p/logstalgia/ for more details and a YouTube video.

You may take a look at Apache Chainsaw. This nifty tool accepts log input from nearly everywhere and has live filtering and coloring. I'm not sure whether it can read a log that has already been written; it's been a while since I last used it (it was very useful during the prototyping phase of our JBoss server).

Google has released the Visualization API that is probably flexible enough to help you:
The Google Visualization API lets you access multiple sources of structured data that you can display, choosing from a large selection of visualizations. The Google Visualization API also provides a platform that can be used to create, share and reuse visualizations written by the developer community at large.
It requires some JavaScript knowledge and includes Google Docs and Spreadsheet integration. Check out the Gallery for some examples.

You could take a look at http://www.intalisys.com, a 3D real-time visualization app.

We use Awk and Perl scripts to parse the log files and create summary reports and "databases" (technically databases in that each row corresponds to a unique event with many columns of data about that event, but not stored in a traditional database format. We're moving in that direction). I like Awk because you can very quickly search for specific strings in the log files using regular expressions, keep counters, gather data from the log file entries, and do all kinds of calculations with that data. Then use your favorite plotting software. We use Excel, mainly because that's what was here before I started this job. I prefer MATLAB and its open-source cousin, Octave, which is built on gnuplot.
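
The "regex plus counters" approach described above can be sketched in a few lines. The example below uses Python instead of Awk or Perl, and it assumes log lines in the Apache Common Log Format; the sample lines, the LINE_RE pattern, and the summarize function are made up for illustration.

```python
import re
from collections import Counter

# Assumed format: Apache Common Log Format, for example
# host ident user [time] "METHOD path protocol" status size
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def summarize(lines):
    """Count responses per status code and sum the bytes sent per path."""
    status_counts = Counter()
    bytes_by_path = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        status_counts[match["status"]] += 1
        size = match["size"]
        bytes_by_path[match["path"]] += 0 if size == "-" else int(size)
    return status_counts, bytes_by_path


# Hypothetical sample data.
sample = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '127.0.0.1 - - [10/Oct/2000:13:55:40 -0700] "GET /missing HTTP/1.0" 404 -',
    '10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] "GET /index.html HTTP/1.0" 200 2326',
]
status_counts, bytes_by_path = summarize(sample)
print(dict(status_counts))
print(dict(bytes_by_path))
```

The two counters are the kind of summary table that can then be handed to whatever plotting software you prefer.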

I prefer Sawmill for visualizing data. You can throw basically any log file at it, and it will not only autodetect the file's structure but also decide how to analyze it. Even if you have a custom log file, you can still define what should be analyzed and visualized, and how.

I mainly use R to visualize data, but I've heard of Orange, too.

Not sure if it fits the question, but I just released this:
numStepCsvLogVis - analyze logfile data in CSV format
It uses Python's matplotlib and was motivated by the need to visualize syslog data while debugging kernel circular buffer operation (and variables) in C. It visualizes the logfile data by using the CSV file format as an intermediary (I cannot explain it better in brief; take a look at the README for more detail).
It has a "step" player accessed in the terminal and can handle "live" stdin input. Unfortunately, I cannot get a better response than 1 FPS when the plot renders, so I wouldn't really call it "realtime" per se, but you can use it to eventually generate sonified videos of plot animations.

A simple solution is to use Logstalgia alongside the lightweight local-web-server.
First install the above. Then, from the root folder of your site visualise your logs in realtime with:
$ ws --log-format default | logstalgia -

Use SciTE, Notepad++, or another powerful text editor that has file-processing routines, so you can create a script that colorizes parts of the log or simply deletes unimportant lines from it.
