Google speech to text not working on nodejs - node.js

I have created app for speech to text converter. react frontend and nodejs API.i record audio from react and post it to nodejs.but google API result is empty.how can I fix it?
why getting always empty results?
that's my code.
ReactMic Recorder
<ReactMic
record={record}
className="sound-wave"
onStop={onStop}
onData={onData}
strokeColor="#000000"
backgroundColor="#FF4081"
mimeType="audio/wav"/>
<button onClick={startRecording} type="button">Start</button>
<button onClick={stopRecording} type="button">Stop</button>
NodeJs API
app.post('/SpeechConvert', (req, res) => {
const client = new speech.SpeechClient();
console.log(req.files.file);
req.files.file.mv('./input.wav',function (err) {
if (err) {
console.log(err);
}
})
async function speechToText() {
// The name of the audio file to transcribe
const fileData = req.files.file.data;
// Reads a local audio file and converts it to base64
const file = fs.readFileSync('input.wav');
const audioBytes = fileData.toString('base64');
// console.log(audioBytes);
// The audio file's encoding, sample rate in hertz, and BCP-47 language code
const audio = {
content: audioBytes,
};
const config = {
enableAutomaticPunctuation: true,
encoding: 'LINEAR16',
sampleRateHertz: 44100,
languageCode: 'en-US',
};
const request = {
audio: audio,
config: config,
};
// Detects speech in the audio file
const [response] = await client.recognize(request);
console.log(response);
const transcription = response.results
.map(result => result.alternatives[0].transcript)
.join('\n');
console.log(`Transcription: ${transcription}`);
res.send({ 'transcription': transcription, 'msg': 'The Audio successfully converted to the text' });
}
speechToText().catch(console.error);
});
can anyone help me to fix this?

Related

save mp4 file in nodejs is saved with some error

I am creating a video/mp4 from a canvas in the front with react:
const canvasResult = document.getElementById(
"canvasResult"
) as HTMLCanvasElement;
const createVideo = () => {
// create video:
const chunks: any[] = []; // here we will store our recorded media chunks (Blobs)
const stream = canvasResult.captureStream(); // grab our canvas MediaStream
const rec = new MediaRecorder(stream); // init the recorder
// every time the recorder has new data, we will store it in our array
rec.ondataavailable = (e) => chunks.push(e.data);
// only when the recorder stops, we construct a complete Blob from all the chunks
rec.onstop = (e) => {
setLoadingCanvaGif(100);
resolve(new Blob(chunks, { type: "video/mp4" }));
};
rec.start();
setTimeout(() => {
clearInterval(canvaInterval);
setShowCanva(false);
rec.stop();
}, 6000); // stop recording in 6s
}
const blobvideo = await createVideo();
const fileVideo = new File([blobvideo], "video.mp4" , {
type: blobvideo.type,
});
let formData = new FormData();
formData.append("file", file);
await axios.post(
`/uploadFile`,
formData,
{
headers: {
"Content-Type": "multipart/form-data",
},
}
);
and receiving it in the backend with nodejs, I save it like this:
// using express-fileupload
const file: UploadedFile = req?.files?.file;
const targetPath = path.join(
__dirname,
`../../uploads`
);
const fileName = path.join(targetPath, `/design_${ms}.mp4`); // ms is a ramdon id
await new Promise<void>((resolve, reject) => {
file.mv(fileName, function (err: any) {
if (err) {
throw {
code: 400,
message: err,
};
}
resolve();
});
});
the problem I have is that when it is saved with the .mp4 extension, the file is not saved correctly, windows shows it like this (case 3):
If i save it as webm
If the .mp4 file (of the case 3) is passed through a video to mp4 converter (https://video-converter.com/es/)
if i save it as mp4
The problem is that when I want to use case 3 (I need it in mp4 and not in webm and I can't manually upload each video to a converter) I can't use it correctly, it generates errors.
note: the three files are played correctly by opening it with vlc or any video player
I believe MediaRecorder in Chrome only supports video/webm mimeType. Your node service will need to convert the file to mp4.
You can use MediaRecorder.isTypeSupported('video/mp4') to check if it is supported.
const getMediaRecorder = (stream) => {
// If video/mp4 is supported
if (MediaRecorder.isTypeSupported('video/mp4')) {
return new MediaRecorder(stream, { mimeType: 'video/mp4' });
}
// Let the browser pick default
return new MediaRecorder(stream);
};

google.cloud.speech.v1.RecognizeRequest.audio: object expected when I passed the object

I tried to use google cloud speech to text in my node.js project. I followed he quick startup guides from docs. I pass a audio file in wav and flac format but in both cases I receive an error: error TypeError: .google.cloud.speech.v1.RecognizeRequest.audio: object expected
try {
const speech = require('#google-cloud/speech');
const client = new speech.SpeechClient();
const filename = require('path').join(__dirname + '/../temp/test.wav');
const config = {
encoding: 'LINEAR16',
sampleRateHertz: 16000,
languageCode: 'en-US'
};
const audio = fs.readFileSync(filename).toString('base64');
const request = {
config: config,
audio: audio
};
const [response] = await client.recognize(request);
const transcription = response.results
.map(result => result.alternatives[0].transcript)
.join('\n');
console.log(`Transcription: ${transcription}`);
as you can see request is an object and when I consoled.log it it is properly converted toString. But the function client.recognize doesn't work. Where do you think is the problem?
The audio property in the request needs to be an object with the audioBytes as a property called content, like in the example below:
// Reads a local audio file and converts it to base64
const audioBytes = fs.readFileSync(fileName).toString('base64');
// The audio file's encoding, sample rate in hertz, and BCP-47 language code
const audio = {
content: audioBytes,
};
const request = {
audio: audio,
config: config,
};

Convert mediarecorder blobs to a type that google speech to text can transcribe

I am making an app where the user browser records the user speaking and sends it to the server which then passes it on to the Google speech to the text interface. I am using mediaRecorder to get 1-second blobs which are sent to a server. On the server-side, I send these blobs over to the Google speech to the text interface. However, I am getting an empty transcriptions.
I know what the issue is. Mediarecorder's default Mime Type id audio/WebM codec=opus, which is not accepted by google's speech to text API. After doing some research, I realize I need to use ffmpeg to convert blobs to LInear16. However, ffmpeg only accepts audio FILES and I want to be able to convert BLOBS. Then I can send the resulting converted blobs over to the API interface.
server.js
wsserver.on('connection', socket => {
console.log("Listening on port 3002")
audio = {
content: null
}
socket.on('message',function(message){
// const buffer = new Int16Array(message, 0, Math.floor(data.byteLength / 2));
// console.log(`received from a client: ${new Uint8Array(message)}`);
// console.log(message);
audio.content = message.toString('base64')
console.log(audio.content);
livetranscriber.createRequest(audio).then(request => {
livetranscriber.recognizeStream(request);
});
});
});
livetranscriber
module.exports = {
createRequest: function(audio){
const encoding = 'LINEAR16';
const sampleRateHertz = 16000;
const languageCode = 'en-US';
return new Promise((resolve, reject, err) =>{
if (err){
reject(err)
}
else{
const request = {
audio: audio,
config: {
encoding: encoding,
sampleRateHertz: sampleRateHertz,
languageCode: languageCode,
},
interimResults: false, // If you want interim results, set this to true
};
resolve(request);
}
});
},
recognizeStream: async function(request){
const [response] = await client.recognize(request)
const transcription = response.results
.map(result => result.alternatives[0].transcript)
.join('\n');
console.log(`Transcription: ${transcription}`);
// console.log(message);
// message.pipe(recognizeStream);
},
}
client
recorder.ondataavailable = function(e) {
console.log('Data', e.data);
var ws = new WebSocket('ws://localhost:3002/websocket');
ws.onopen = function() {
console.log("opening connection");
// const stream = websocketStream(ws)
// const duplex = WebSocket.createWebSocketStream(ws, { encoding: 'utf8' });
var blob = new Blob(e, { 'type' : 'audio/wav; base64' });
ws.send(blob.data);
// e.data).pipe(stream);
// console.log(e.data);
console.log("Sent the message")
};
// chunks.push(e.data);
// socket.emit('data', e.data);
}
I wrote a similar script several years ago. However, I used a JS frontend and a Python backend instead of NodeJS. I remember using a sox transformer to transform the audio input into to an output that the Google Speech API could use.
Perhaps this might be useful for you.
https://github.com/bitnahian/speech-transcriptor/blob/9f186e5416566aa8a6959fc1363d2e398b902822/app.py#L27
TLDR:
Converted from a .wav format to .raw format using ffmpeg and sox.

Send audio stream from microphone to Google Speech - Javascript

I am trying to send microphone input at the client(nuxt) side to the node + socket.io server and then to the google speech api. I am getting stream from navigator.mediaDevices.getUserMedia({ audio: true }) and send it to back end using socket.io-stream. My client side code as follows.
import ss from 'socket.io-stream'
navigator.mediaDevices.getUserMedia({ audio: true }).then((mediaStream) => {
ss(this.$socket).emit('audio', mediaStream);
});
And my server code as follows.
const io = require('socket.io')(3555);
const ss = require('socket.io-stream');
const speech = require('#google-cloud/speech');
io.on('connection', (socket) => {
const client = new speech.SpeechClient({ keyFilename: 'key.json' });
const encoding = 'LINEAR16';
const sampleRateHertz = 16000;
const languageCode = 'en-US';
const request = {
config: {
encoding: encoding,
sampleRateHertz: sampleRateHertz,
languageCode: languageCode,
},
interimResults: true,
};
ss(socket).on('audio', (stream) => {
const recognizeStream = client.streamingRecognize(request)
.on('error', console.error)
.on('data', data => {
process.stdout.write(
data.results[0] && data.results[0].alternatives[0]
? `Transcription: ${data.results[0].alternatives[0].transcript}\n`
: `\n\nReached transcription time limit, press Ctrl+C\n`
);
});
stream.pipe(recognizeStream);
});
});
But this code doesn't work and display the error TypeError: stream.pipe is not a function.
Someone please point out the error or tell me a way to achieve this. Thank you!

Save speech to text in local using node js

I'm trying to replicate the code given at https://github.com/googleapis/nodejs-speech/blob/master/samples/recognize.js. There is no error when I run it locally. But here I'm confused on where can I see the result that is created. Is there a way that I can write the result to a file?
Here is the code.
const record = require('node-record-lpcm16');
// Imports the Google Cloud client library
const speech = require('#google-cloud/speech');
// Creates a client
const client = new speech.SpeechClient();
/**
* TODO(developer): Uncomment the following lines before running the sample.
*/
const encoding = 'LINEAR16';
const sampleRateHertz = 16000;
const languageCode = 'en-US';
const request = {
config: {
encoding: encoding,
sampleRateHertz: sampleRateHertz,
languageCode: languageCode,
},
interimResults: false, // If you want interim results, set this to true
};
// Create a recognize stream
const recognizeStream = client
.streamingRecognize(request)
.on('error', console.error)
.on('data', data =>
process.stdout.write(
data.results[0] && data.results[0].alternatives[0] ?
`Transcription: ${data.results[0].alternatives[0].transcript}\n` :
`\n\nReached transcription time limit, press Ctrl+C\n`
)
);
// Start recording and send the microphone input to the Speech API
record
.start({
sampleRateHertz: sampleRateHertz,
threshold: 0,
// Other options, see https://www.npmjs.com/package/node-record-lpcm16#options
verbose: false,
recordProgram: 'sox', // Try also "arecord" or "sox"
silence: '10.0',
})
.on('error', console.error)
.pipe(recognizeStream);
console.log('Listening, press Ctrl+C to stop.');
This is very confusing :(. please let me know how can I achieve this.
Thanks
It's in the "data". Please looking into the code and see how the console logs the data.
Example:
client
.recognize(request)
.then(data => {
const response = data[0];
const transcription = response.results
.map(result => result.alternatives[0].transcript)
.join('\n');
console.log(`Transcription: `, transcription);
})

Resources