Azure Speech to Text Translations with multiple languages - speech-to-text

I'm fairly new to Azure's Speech SDK, so it's quite possible I'm missing something obvious; apologies if that's the case.
I've been working on a project where I want to translate an audio file/stream from one language to another. It works decently when the entire conversation is in one language (all Spanish), but it falls apart when I feed it a real conversation where there's both English and Spanish. It tries to recognize the English words as Spanish words (so it'll transcribe something like 'I'm sorry' as mangled Spanish).
From what I can tell, you can set multiple target languages (languages to translate into) but only one speechRecognitionLanguage. That seems to imply that it can't handle conversations with multiple languages (like a phone call with a translator) or speakers who flip between languages. Is there a way to make it work with multiple languages, or is that just something Microsoft hasn't quite gotten around to yet?
Here's the code I have right now (it's just a lightly modified version of the example on their GitHub):
// pull in the required packages.
var sdk = require("microsoft-cognitiveservices-speech-sdk");

(function() {
    "use strict";

    module.exports = {
        main: function(settings, audioStream) {
            // now create the audio-config pointing to our stream and
            // the speech config specifying the language.
            var audioConfig = sdk.AudioConfig.fromStreamInput(audioStream);
            var translationConfig = sdk.SpeechTranslationConfig.fromSubscription(settings.subscriptionKey, settings.serviceRegion);

            // setting the recognition language.
            translationConfig.speechRecognitionLanguage = settings.language;

            // target language (to be translated to).
            translationConfig.addTargetLanguage("en");

            // create the translation recognizer.
            var recognizer = new sdk.TranslationRecognizer(translationConfig, audioConfig);

            recognizer.recognized = function(s, e) {
                if (e.result.reason === sdk.ResultReason.NoMatch) {
                    var noMatchDetail = sdk.NoMatchDetails.fromResult(e.result);
                    console.log("\r\nDidn't find a match: " + sdk.NoMatchReason[noMatchDetail.reason]);
                } else {
                    var str = "\r\nNext Line: " + e.result.text + "\nTranslations:";
                    var language = "en";
                    str += " [" + language + "] " + e.result.translations.get(language);
                    str += "\r\n";
                    console.log(str);
                }
            };

            // two possible states: Error or EndOfStream
            recognizer.canceled = function(s, e) {
                var str = "(cancel) Reason: " + sdk.CancellationReason[e.reason];
                // if it was because of an error
                if (e.reason === sdk.CancellationReason.Error) {
                    str += ": " + e.errorDetails;
                    console.log(str);
                }
                // we've reached the end of the file; stop the recognizer
                else {
                    recognizer.stopContinuousRecognitionAsync(function() {
                        console.log("End of file.");
                        recognizer.close();
                        recognizer = undefined;
                    },
                    function(err) {
                        console.trace("err - " + err);
                        recognizer.close();
                        recognizer = undefined;
                    });
                }
            };

            // start the recognizer and wait for a result.
            recognizer.startContinuousRecognitionAsync(
                function() {
                    console.log("Starting speech recognition");
                },
                function(err) {
                    console.trace("err - " + err);
                    recognizer.close();
                    recognizer = undefined;
                }
            );
        }
    };
}());

As of now (August), Speech SDK translation supports translation from one input language into multiple output languages.
There are services in development that support recognition of the spoken language. These will enable us to run translation from multiple input languages into multiple output languages (both sets of languages you would specify in the config). There is no ETA for the availability yet ...
Wolfgang

According to the Speech translation section of the official document Language and region support for the Speech Services, quoted below, I think you can use Speech translation instead of Speech-to-Text to realize your needs.
Speech translation
The Speech Translation API supports different languages for speech-to-speech and speech-to-text translation. The source language must always be from the Speech-to-Text language table. The available target languages depend on whether the translation target is speech or text. You may translate incoming speech into more than 60 languages. A subset of these languages are available for speech synthesis.
Meanwhile, there is the official sample code Azure-Samples/cognitive-services-speech-sdk/samples/js/node/translation.js for Speech translation.
I do not speak Spanish, so I can not test an audio mixing English and Spanish for you.
Hope it helps.

How to pass source language code to Google Cloud Translate Basic API Edition in Node.js

Any ideas?
This is what my code looks like. I followed the basic project setup guide for Node.
Init
const {Translate} = require('@google-cloud/translate').v2;
const translate = new Translate({projectId, credentials});

const text = 'The text to translate, e.g. Hello, world!';
const target = 'es';
The function to translate.
async function translateText() {
  let [translations] = await translate.translate(text, target);
  translations = Array.isArray(translations) ? translations : [translations];
  console.log('Translations:');
  translations.forEach((translation, i) => {
    // note: `text[i]` only makes sense when `text` is an array of inputs;
    // for a single string it would index individual characters.
    console.log(`${text} => (${target}) ${translation}`);
  });
}
It's pretty frustrating that every API doc mentions that the source language gets automatically "detected", without any information about how to stop Google from guessing at it. Seems like something Google would benefit from also... Less work for them.
Found the answer in the package's README.md.
They want us to pass an "options" object that contains the from/to language codes instead of just the target (to) language code.
So we define this object:
const options = {
  from: 'en',
  to: 'es'
};
and then our
translate.translate(text, target);
becomes
translate.translate(text, options);
It's also possible to pass a callback:
translate.translate(text, options, (err, translation) => {
  if (!err) {
    // translation = 'Hola'
  }
});

Why can't my translation script handle cyrillic or chinese letters?

I'm trying to build a really simple translation script for people who want to say something in their native language and have it translated into English. It works surprisingly well, but it cannot handle Cyrillic (or Chinese, for example) letters at all.
if (message.content.toLowerCase().startsWith("+t ")) {
    var args = message.content.substring(3).split(" ");
    if (!args[0]) {
        message.channel.send("What do you want to translate?");
        return;
    }
    let gurl = "https://translate.googleapis.com/translate_a/single?client=gtx&sl=auto&tl=english&dt=t&q=" + args.join("%20");
    request(gurl, function(error, response, body) {
        try {
            let translated = body.match(/^\[\[\[".+?",/)[0];
            translated = translated.substring(4, translated.length - 2);
            message.delete().catch(O_o => {});
            message.channel.send("**" + message.author.username + "** (via translator): " + translated);
        } catch (err) {
            message.channel.send("Failed: " + err);
        }
    });
}
I get err = TypeError: Cannot read property 'match' of undefined if I try to translate them. Do I need to encode them into something Latin-compatible first, or how should I approach the problem?
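A likely culprit: `args.join("%20")` only substitutes the spaces, so Cyrillic or Chinese characters go into the query string raw, and the endpoint returns a body the regex can't match. A sketch of building the URL with `encodeURIComponent` instead (using the standard language code `en` as the target; the rest of the query string is taken from the code above):

```javascript
// Percent-encode the whole text so non-Latin characters survive the query string.
function buildTranslateUrl(args) {
  var query = encodeURIComponent(args.join(" "));
  return "https://translate.googleapis.com/translate_a/single" +
    "?client=gtx&sl=auto&tl=en&dt=t&q=" + query;
}

var url = buildTranslateUrl(["привет", "мир"]);
// The Cyrillic is now percent-encoded UTF-8 (%D0%BF%D1%80... instead of raw п, р, ...)
```

The resulting URL contains only ASCII, which is what the HTTP request layer expects.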

Are callbacks important in an Alexa skill?

I'm trying to understand how to use Node.js callbacks, specifically why and when you would use them in an Alexa skill.
The high-low game sample https://github.com/alexa/skill-sample-nodejs-highlowgame uses a callback when the correct number has been guessed, but
if I move the callback code into the NumberGuessIntent function the skill appears to behave exactly the same, so what is the purpose of that callback?
Code without a callback:
'NumberGuessIntent': function() {
    var guessNum = parseInt(this.event.request.intent.slots.number.value);
    var targetNum = this.attributes["guessNumber"];
    console.log('user guessed: ' + guessNum);

    if (guessNum > targetNum) {
        this.emit('TooHigh', guessNum);
    } else if (guessNum < targetNum) {
        this.emit('TooLow', guessNum);
    } else if (guessNum === targetNum) {
        this.handler.state = states.STARTMODE;
        this.attributes['gamesPlayed']++;
        this.emit(':ask', guessNum.toString() + ' is correct! Would you like to play a new game?',
            'Say yes to start a new game, or no to end the game.');
    } else {
        this.emit('NotANum');
    }
},
Callbacks are used for async operations such as fetching data from a Web API.
It may be easier for you to use Promises, which let you write async operations in much the same shape as synchronous code. See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Using_promises for more.
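To make the callback-vs-Promise point concrete, here is a minimal sketch of wrapping a callback-style call in a Promise. `fetchScore` is a made-up stand-in for a real async call (a database or Web API lookup), not part of the Alexa sample:

```javascript
// A hypothetical callback-style async operation (stand-in for a real API call).
function fetchScore(user, callback) {
  setTimeout(function () {
    callback(null, { user: user, score: 42 });
  }, 10);
}

// Wrap it in a Promise so callers can chain .then() instead of nesting callbacks.
function fetchScoreAsync(user) {
  return new Promise(function (resolve, reject) {
    fetchScore(user, function (err, data) {
      if (err) reject(err);
      else resolve(data);
    });
  });
}

// Inside an intent handler you could then write:
fetchScoreAsync("alice").then(function (data) {
  console.log(data.user + " scored " + data.score);
});
```

The callback in the sample matters only when the handler genuinely waits on something asynchronous; moving synchronous code in or out of it changes nothing, which is why the skill behaved the same.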

Incremental and non-incremental urls in node js with cheerio and request

I am trying to scrape data from a page using cheerio and request in the following way:
1) go to url 1a (http://example.com/0)
2) extract url 1b (http://example2.com/52)
3) go to url 1b
4) extract some data and save
5) go to url 1a+1 (http://example.com/1, let's call it 2a)
6) extract url 2b (http://example2.com/693)
7) go to url 2b
8) extract some data and save etc...
I am struggling to work out how to do this (note, I am only familiar with Node.js and cheerio/request for this task even though it is likely not elegant, so I am not looking for alternative libraries or languages to do this in, sorry). I think I am missing something because I can't even think how this could work.
EDIT
Let me try this in another way. here is the first part of code:
var request = require('request'),
    cheerio = require('cheerio');

request('http://api.trove.nla.gov.au/result?key=6k6oagt6ott4ohno&zone=book&l-advformat=Thesis&sortby=dateDesc&q=+date%3A[2000+TO+2014]&l-availability=y&l-australian=y&n=1&s=0', function(error, response, html) {
    if (!error && response.statusCode == 200) {
        var $ = cheerio.load(html, {
            xmlMode: true
        });
        var id = $('work').attr('id');
        var total = $('records').attr('total'); // note: the element is <records>, not <record>
    }
});
The first returned page looks like this
<response>
  <query>date:[2000 TO 2014]</query>
  <zone name="book">
    <records s="0" n="1" total="69977" next="/result?l-advformat=Thesis&sortby=dateDesc&q=+date%3A%5B2000+TO+2014%5D&l-availability=y&l-australian=y&n=1&zone=book&s=1">
      <work id="189231549" url="/work/189231549">
        <troveUrl>http://trove.nla.gov.au/work/189231549</troveUrl>
        <title>
          Design of physiological control and magnetic levitation systems for a total artificial heart
        </title>
        <contributor>Greatrex, Nicholas Anthony</contributor>
        <issued>2014</issued>
        <type>Thesis</type>
        <holdingsCount>1</holdingsCount>
        <versionCount>1</versionCount>
        <relevance score="0.001961126">vaguely relevant</relevance>
        <identifier type="url" linktype="fulltext">http://eprints.qut.edu.au/65642/</identifier>
      </work>
    </records>
  </zone>
</response>
The URL above needs to increase incrementally s=0, s=1 etc. for 'total' number of times.
'id' needs to be fed into the url below in a second request:
request('http://api.trove.nla.gov.au/work/' + id + '?key=6k6oagt6ott4ohno&reclevel=full', function(error, response, html) {
    if (!error && response.statusCode == 200) {
        var $ = cheerio.load(html, {
            xmlMode: true
        });
        // extract data here etc.
    }
});
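The two URL shapes involved (step through `s=0, s=1, ...` for the list, then substitute the extracted `id` into the work URL) can be sketched as small helpers, using the key and query parameters exactly as they appear in the requests above:

```javascript
// The API key embedded in the question's URLs.
var API_KEY = "6k6oagt6ott4ohno";

// First request: one record per page, stepped through with s=0, s=1, ...
function buildListUrl(s) {
  return "http://api.trove.nla.gov.au/result?key=" + API_KEY +
    "&zone=book&l-advformat=Thesis&sortby=dateDesc" +
    "&q=+date%3A[2000+TO+2014]&l-availability=y&l-australian=y" +
    "&n=1&s=" + s;
}

// Second request: the work id extracted from the first response.
function buildWorkUrl(id) {
  return "http://api.trove.nla.gov.au/work/" + id +
    "?key=" + API_KEY + "&reclevel=full";
}
```

Keeping the URL construction in one place makes the later loop (increment `s`, extract `id`, fetch the work) much easier to follow.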
For example when using id="189231549" returned by the first request the second returned page looks like this
<work id="189231549" url="/work/189231549">
  <troveUrl>http://trove.nla.gov.au/work/189231549</troveUrl>
  <title>
    Design of physiological control and magnetic levitation systems for a total artificial heart
  </title>
  <contributor>Greatrex, Nicholas Anthony</contributor>
  <issued>2014</issued>
  <type>Thesis</type>
  <subject>Total Artificial Heart</subject>
  <subject>Magnetic Levitation</subject>
  <subject>Physiological Control</subject>
  <abstract>
    Total Artificial Hearts are mechanical pumps which can be used to replace the failing natural heart. This novel study developed a means of controlling a new design of pump to reproduce physiological flow bringing closer the realisation of a practical artificial heart. Using a mathematical model of the device, an optimisation algorithm was used to determine the best configuration for the magnetic levitation system of the pump. The prototype device was constructed and tested in a mock circulation loop. A physiological controller was designed to replicate the Frank-Starling like balancing behaviour of the natural heart. The device and controller provided sufficient support for a human patient while also demonstrating good response to various physiological conditions and events. This novel work brings the design of a practical artificial heart closer to realisation.
  </abstract>
  <language>English</language>
  <holdingsCount>1</holdingsCount>
  <versionCount>1</versionCount>
  <tagCount>0</tagCount>
  <commentCount>0</commentCount>
  <listCount>0</listCount>
  <identifier type="url" linktype="fulltext">http://eprints.qut.edu.au/65642/</identifier>
</work>
So my question is now how do I tie these two parts (loops) together to achieve the result (download and parse about 70000 pages)?
I have no idea how to code this in JavaScript for Node.js; I am new to JavaScript.
You can find out how to do it by studying existing well-known website copiers (closed source or open source).
For example, use the trial copy of http://www.tenmax.com/teleport/pro/home.htm to scrape your pages and then try the same with http://www.httrack.com, and you should get a clear idea of how they did it (and how you can do it).
The key programming concepts are a lookup cache and a task queue.
Recursion is not the right approach here if your solution should scale well up to several Node.js worker processes and up to many pages.
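The task-queue idea can be sketched without any extra libraries: run at most `limit` tasks at once and start the next one as each finishes. The fake "page fetches" below are stand-ins for real request calls.

```javascript
// Minimal concurrency-limited task queue: at most `limit` tasks in flight.
function runQueue(tasks, limit, done) {
  var next = 0, active = 0, results = [];
  function startNext() {
    if (next >= tasks.length && active === 0) return done(results);
    while (active < limit && next < tasks.length) {
      (function (i) {
        active++;
        next++;
        tasks[i](function (result) {
          results[i] = result; // keep results in original order
          active--;
          startNext();
        });
      })(next);
    }
  }
  startNext();
}

// Example: three fake "page fetches", at most two in flight at a time.
var tasks = [1, 2, 3].map(function (n) {
  return function (cb) {
    setTimeout(function () { cb("page " + n); }, 10);
  };
});
runQueue(tasks, 2, function (results) {
  console.log(results); // [ 'page 1', 'page 2', 'page 3' ]
});
```

For the real scraper, each task would be one `request(...)` call (list page or work page), which keeps ~70000 fetches from all firing at once.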
EDIT: after clarifying comments
Before you start reworking your scraping engine into a more scalable architecture, as a new Node.js developer you can start simply with a synchronized alternative to Node.js callback hell, as provided by the wait.for package created by @lucio-m-tato.
The code below worked for me with the links you provided
var request = require('request');
var cheerio = require('cheerio');
var wait = require("wait.for");

function requestWaitForWrapper(url, callback) {
    request(url, function(error, response, html) {
        if (error)
            callback(error, response);
        else if (response.statusCode == 200)
            callback(null, html);
        else
            callback(new Error("Status not 200 OK"), response);
    });
}

function readBookInfo(baseUrl, s) {
    var html = wait.for(requestWaitForWrapper, baseUrl + '&s=' + s.toString());
    var $ = cheerio.load(html, {
        xmlMode: true
    });

    return {
        s: s,
        id: $('work').attr('id'),
        total: parseInt($('records').attr('total'))
    };
}

function readWorkInfo(id) {
    var html = wait.for(requestWaitForWrapper, 'http://api.trove.nla.gov.au/work/' + id.toString() + '?key=6k6oagt6ott4ohno&reclevel=full');
    var $ = cheerio.load(html, {
        xmlMode: true
    });

    return {
        title: $('title').text(),
        contributor: $('contributor').text()
    };
}

function main() {
    var baseBookUrl = 'http://api.trove.nla.gov.au/result?key=6k6oagt6ott4ohno&zone=book&l-advformat=Thesis&sortby=dateDesc&q=+date%3A[2000+TO+2014]&l-availability=y&l-australian=y&n=1';
    var baseInfo = readBookInfo(baseBookUrl, 0);

    for (var s = 0; s < baseInfo.total; s++) {
        var bookInfo = readBookInfo(baseBookUrl, s);
        var workInfo = readWorkInfo(bookInfo.id);
        console.log(bookInfo.id + ";" + workInfo.contributor + ";" + workInfo.title);
    }
}

wait.launchFiber(main);
You could use the additional async module to handle multiple requests and iterate through several pages. Read more about async at https://github.com/caolan/async.

IndexedDB very slow compared to WebSQL, what am i doing wrong?

I made a demo Chrome extension to compare WebSQL and IndexedDB and to learn how both work in more detail.
To my surprise it showed that IndexedDB is a lot slower, even compared to the most naive SQL command.
Since WebSQL has been deprecated in favor of IndexedDB, I assumed IndexedDB would be as fast or faster than WebSQL.
I'm assuming I'm doing something wrong in the IndexedDB code, because deprecating something that is much faster would be stupid, and I assume they knew what they were doing when deprecating WebSQL in favor of IndexedDB.
The sql search code:
// Search entries
var term = search_query;
db.transaction(function(tx) {
    tx.executeSql('SELECT * FROM places', [], function(tx, results) {
        console.log("sql search");
        var count = 0;
        var wm = WordsMatch.init(term.trim().toLowerCase());
        var len = results.rows.length;
        for (var i = 0; i < len; ++i) {
            var item = results.rows.item(i);
            if (wm.search(item.url.toLowerCase())) {
                //console.log(item.id, item.url);
                ++count;
            }
        }
        console.log("Search matches:", count);
        console.log("\n");
    });
}, reportError);
The indexeddb search code:
PlacesStore.searchPlaces(search_query, function(places) {
    console.log("indexedDB search");
    var count = places.length;
    console.log("Search matches:", count);
    console.log("\n");
});
var PlacesStore = {
    searchPlaces: function(term, callback) {
        var self = this,
            txn = self.db.transaction([self.store_name], IDBTransaction.READ_ONLY),
            places = [],
            store = txn.objectStore(self.store_name);

        var wm = WordsMatch.init(term.trim().toLowerCase());

        Utils.request(store.openCursor(), function(e) {
            var cursor = e.target.result;
            if (cursor) {
                if (wm.search(cursor.value.url.toLowerCase())) {
                    places.push(cursor.value);
                }
                cursor.continue();
            } else {
                // we are done retrieving rows; invoke callback
                callback(places);
            }
        });
    }
};
var Utils = {
    errorHandler: function(cb) {
        return function(e) {
            if (cb) {
                cb(e);
            } else {
                throw e;
            }
        };
    },
    request: function(req, callback, err_callback) {
        if (callback) {
            req.onsuccess = function(e) {
                callback(e);
            };
        }
        req.onerror = Utils.errorHandler(err_callback);
    }
};
I have also filed a Chrome bug report and uploaded the full extension code there:
http://code.google.com/p/chromium/issues/detail?id=122831
(I can't upload the extension zip file here; there's no such feature.)
I filled both the WebSQL and IndexedDB databases with the same 38862 URLs that I used as test data.
Part of the problem is that IndexedDB implementations have so far mostly been working on getting the full spec implemented, and less focused on performance. We recently found some really stupid bugs in Firefox which got fixed and should make us significantly faster.
I know the chrome team has suffered some challenges because of their multi-process architecture. I'm told that they've fixed some of these issues recently.
So I'd encourage you to try the latest version of all browsers, possibly including nightly/canary builds.
However note that we didn't deprecate WebSQL because IndexedDB was faster. We deprecated WebSQL because it wasn't future proof. WebSQL was defined to use a specific SQLite backend (if you look at the spec it's actually written clearly there). However all browser manufacturers need to use the latest version of SQLite in order to pick up security, performance and stability fixes. And the latest versions always change the SQL syntax in subtle ways. Which means that we would have broken your WebSQL-using web applications in subtle ways. This didn't seem ok to us.
Answer: You're not doing anything wrong. Your IndexedDB code is correct. As for the conclusion, others have found this to be true as well.
Extra: One interesting thing to note is that IndexedDB is implemented differently across browsers. Firefox uses SQLite and Chrome uses LevelDB, so even if you're using IndexedDB in FF you're still using a SQL-backed technology with SQL-like overhead (plus everything else).
I would be curious to see your results at different sized databases. I would hope, but cannot yet confirm, that IndexedDB would scale better across larger datasets (even though 38862 does seem sufficiently large).
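One technique that can help at larger sizes, when the query is a prefix of the indexed key (which the substring WordsMatch search above is not): put an index on `url` and open the cursor over a key range instead of the whole store, so only matching keys are visited. The upper bound of a prefix range can be computed by bumping the prefix's last character. A sketch, assuming an index named "url" that the code above does not create:

```javascript
// Upper bound for a prefix key range: bump the prefix's last character,
// so the half-open range [prefix, upperBound) covers exactly the keys
// that start with the prefix.
function prefixUpperBound(prefix) {
  var last = prefix.charCodeAt(prefix.length - 1);
  return prefix.slice(0, -1) + String.fromCharCode(last + 1);
}

// In the browser this would drive the cursor like:
//   var range = IDBKeyRange.bound(prefix, prefixUpperBound(prefix), false, true);
//   store.index("url").openCursor(range);
// visiting only keys that start with `prefix`, instead of every row.
```

For arbitrary substring search this doesn't apply directly, which is part of why a full-store cursor scan is hard to avoid in IndexedDB.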