How to detect language from audio in Python?

I have used FFmpeg to extract the audio from a video.
How can I transcribe the speech to text and detect the spoken language?
So far I have tried:
Myspokenlanguage
Google Cloud Speech-to-Text API

You can use a speech-to-text engine to get the words.
Send the words to Google Translate, which detects the source language automatically.
Then you can web-scrape the auto-detected language tag from the result.

You can transcribe the audio with a speech-to-text (STT) engine set to one of the languages most common in your use case.
Create a labelled dataset and train an NLP model on it for text classification.
Use this model to identify the language of the text coming out of the STT engine.
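The text-classification step could look something like the sketch below. It is not taken from the answer: it assumes a character-trigram Naive Bayes model with add-one smoothing, written with the standard library only, and the class name, function names and the tiny training set are all made up for illustration. A real dataset would need many more sentences per language, and the noisy output of an STT engine running in the wrong language may be harder to classify than clean text.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramLanguageClassifier:
    """Toy Naive Bayes language identifier (hypothetical helper, uniform class prior)."""

    def __init__(self, n=3):
        self.n = n
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.totals = Counter()             # label -> total n-grams seen
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, samples):
        """samples: iterable of (text, language_label) pairs."""
        for text, label in samples:
            grams = char_ngrams(text, self.n)
            self.counts[label].update(grams)
            self.totals[label] += len(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        """Return the label with the highest smoothed log-likelihood."""
        grams = char_ngrams(text, self.n)
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, gram_counts in self.counts.items():
            denom = self.totals[label] + vocab_size
            score = sum(math.log((gram_counts[g] + 1) / denom) for g in grams)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Tiny illustrative dataset (an assumption, far too small for real use).
TOY_SAMPLES = [
    ("the quick brown fox jumps over the lazy dog", "en"),
    ("where is the nearest train station", "en"),
    ("i would like to have a cup of coffee please", "en"),
    ("el rápido zorro marrón salta sobre el perro perezoso", "es"),
    ("dónde está la estación de tren más cercana", "es"),
    ("me gustaría tomar una taza de café por favor", "es"),
    ("der schnelle braune fuchs springt über den faulen hund", "de"),
    ("wo ist der nächste bahnhof", "de"),
    ("ich hätte gerne eine tasse kaffee bitte", "de"),
]

model = NgramLanguageClassifier().fit(TOY_SAMPLES)
print(model.predict("where is the station"))
```

Character n-grams are used here rather than whole words because they tolerate misspelled or partially recognised words, which is what a transcript in the wrong language tends to contain.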

Related

A way to translate images?

In Python, I am attempting to translate Arabic characters within an image. I can provide the source language (Arabic) and the destination language (English). Is there a free Python library or API that I can use for this, i.e. one that provides a service like https://translate.google.com and allows cloud image translation (uploading images containing untranslated characters and downloading images with the translated characters rendered in the image)? Or is there a library to do this locally on my system, i.e. detect the Arabic characters in an image, extract them, send them to a cloud translation service (e.g. Google Translate), and then modify the image by replacing the Arabic characters with the translated English text? So my goal is to replace the Arabic characters in an image with their English translation. I know Yandex (https://translate.yandex.com/ocr) allows this, but you must pay for their translation API. How could I do this?
While I'm not sure whether Arabic is supported, there are libraries such as OpenCV (cv2) for Python and pytesseract for extracting text from an image. You can then use another library such as translate to finish the process from there: https://pypi.org/project/translate/

How to do isolated word recognition using pocketsphinx

I tried to follow this link for isolated word recognition:
Speech to text for single word
But when I provide -keyphrase, it returns the keyphrase as the result even if I speak a different word.
Is this expected?

Do I need to add updated phoneme sequences of words to the .dict file while adapting an acoustic model using CMUSphinx?

I am trying to adapt the en-us acoustic model with Indian English accent recordings. Since many words are pronounced differently in this accent, do I need to add the updated phoneme representations of those words? Currently I am following this link: https://cmusphinx.github.io/wiki/tutorialadapt/#accumulating-observation-counts and nothing is mentioned there about updating the .dict file.
PS: Should I add new words directly to the dictionary?
There is an Indian English model in the downloads; you should use it instead. It comes with an Indian English dictionary.

Getting speaker labels or splitting an audio file that contains a dialogue between two people?

I have an audio file that contains a dialogue between two people. I want to split it into two audio files, each containing the speech of one speaker.
I want to use these audio files further for speech to text.
Or
I want to get speaker labels for the dialogue spoken by each of them.
IBM provides speaker labels, but I want something other than IBM.
Is there any way of splitting the audio file according to speaker?
I want to do this through some API or program.

Text to phonemes converter

I'm searching for a tool that converts text to phonemes (like text-to-speech software does).
I could program one myself, but it would not be free of errors and would take a lot of time!
So my question is:
Is there a simple tool for converting e.g.
"hello" to "HH AH0 L OW1"
Maybe some command-line tool, so I can capture the stdout?
I'm looking for the phonemes in 'Arpabet' style (see the 'hello' example).
espeak does something like that, but the output is not in Arpabet style and the phonemes are not separated by a delimiter.
If you had searched for Arpabet on the wiki you would have found your answer. The CMU guys have prepared scripts which convert most English words to their respective Arpabet phonetic breakdown.
If you want the phone sequence of a couple of words, you can use their interface here. But if you want it for a big file, then you might have to run their scripts on your own. They used to have a working page here, but it seems not to be working now.
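For the big-file case, one option is a plain lookup against a local copy of the CMU Pronouncing Dictionary. The sketch below is not the CMU scripts mentioned above. It assumes the plain-text CMUdict layout (a word, then its phones, with alternate pronunciations marked as WORD(1) and comment lines starting with ;;;), and it uses a three-entry inline sample in place of the real file. The function names are made up for illustration.

```python
import re

# A few sample entries in the assumed CMUdict text layout (not the full dictionary).
SAMPLE_CMUDICT = """\
;;; sample entries only
HELLO  HH AH0 L OW1
HELLO(1)  HH EH0 L OW1
WORLD  W ER1 L D
"""

# Matches the "(1)", "(2)", ... suffix that marks alternate pronunciations.
_VARIANT_SUFFIX = re.compile(r"\(\d+\)$")


def parse_cmudict(text):
    """Map each lowercased word to a list of pronunciations (lists of phones)."""
    pronunciations = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        head, *phones = line.split()
        word = _VARIANT_SUFFIX.sub("", head).lower()
        pronunciations.setdefault(word, []).append(phones)
    return pronunciations


def text_to_arpabet(text, pronunciations):
    """Return one space-separated phone string per word, or None if the word is unknown."""
    result = []
    for word in re.findall(r"[a-z']+", text.lower()):
        variants = pronunciations.get(word)
        # Use the first listed pronunciation; alternates stay available in the dict.
        result.append(" ".join(variants[0]) if variants else None)
    return result


lexicon = parse_cmudict(SAMPLE_CMUDICT)
print(text_to_arpabet("Hello, world!", lexicon))
```

Words missing from the dictionary come back as None here. Handling them would need a grapheme-to-phoneme model, which this sketch does not attempt.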
