We are using an AWS serverless architecture for our contact center. We store audio recordings in an S3 bucket and use Lambda functions to process them.
Our requirement is to remove sensitive details, such as payment information, from the audio recordings.
So we need to fetch a recording from the S3 bucket, cut it using the start time and duration of the sensitive payment segment, and then join the remaining clips back into one file.
How can we achieve this using AWS Lambda (Node.js/Python) and S3?
Thanks,
Ganesh
I have not tried this myself yet, but I'd use the lambda-audio package, which bundles SoX (a Swiss Army knife for sound files), and then run the trim command as described here.
Here is some code to get you started:
const lambdaAudio = require('lambda-audio');

// Extract the first 10 seconds of the input file into /tmp/output.wav
lambdaAudio.sox('./input.mp3 /tmp/output.wav trim 0 10')
  .then(response => {
    // Do something when the first 10 seconds of the file have been extracted
  })
  .catch(errorResponse => {
    console.log('Error from the sox command:', errorResponse);
  });
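Untested, but the same approach should cover your redaction case: trim out the part before the payment segment, trim out the part after it, then let SoX concatenate the two clips. A rough sketch, assuming the recording has already been downloaded from S3 to /tmp/input.wav and that sensitiveStart and sensitiveDuration are placeholders for the values you already know for each call:

const lambdaAudio = require('lambda-audio');

// Hypothetical values: where the payment segment starts and how long it lasts (seconds)
const sensitiveStart = 42;
const sensitiveDuration = 15;

async function redact(inputPath, outputPath) {
  // Keep everything before the sensitive segment
  await lambdaAudio.sox(`${inputPath} /tmp/before.wav trim 0 ${sensitiveStart}`);
  // Keep everything after the sensitive segment ("trim N" keeps from N seconds to the end)
  await lambdaAudio.sox(`${inputPath} /tmp/after.wav trim ${sensitiveStart + sensitiveDuration}`);
  // Concatenate the two remaining clips into a single file
  await lambdaAudio.sox(`/tmp/before.wav /tmp/after.wav ${outputPath}`);
}

redact('/tmp/input.wav', '/tmp/redacted.wav')
  .then(() => console.log('Redacted recording written to /tmp/redacted.wav'))
  .catch(err => console.error('sox failed:', err));

From there you would upload /tmp/redacted.wav back to S3 with the SDK's putObject (or upload) call.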
Currently, I'm trying to convert a response of the Kinesis Video Streams GetMedia API to an audio file, but have had no success with this. According to the AWS documentation, it's recommended to use the Kinesis Video Stream Parser Library, but I'd like to use a js/ts implementation. Is it possible to convert this stream to an audio file using just js/ts?
Thank you for your help.
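One pragmatic approach (not pure js/ts, since it shells out to ffmpeg to demux the MKV fragments that GetMedia returns) is to pipe the GetMedia payload straight into ffmpeg. A rough, untested sketch with the AWS SDK for JavaScript v2; the stream name and region are placeholders:

const AWS = require('aws-sdk');
const { spawn } = require('child_process');

const streamName = 'my-video-stream'; // placeholder stream name
const region = 'us-east-1';           // placeholder region

const kinesisVideo = new AWS.KinesisVideo({ region });

// GetMedia must be called against the stream's data endpoint
kinesisVideo.getDataEndpoint({ StreamName: streamName, APIName: 'GET_MEDIA' }, (err, data) => {
  if (err) throw err;

  const media = new AWS.KinesisVideoMedia({ endpoint: data.DataEndpoint, region });
  const request = media.getMedia({
    StreamName: streamName,
    StartSelector: { StartSelectorType: 'EARLIEST' },
  });

  // ffmpeg reads the MKV container on stdin and writes just the audio track to an mp3 file
  const ffmpeg = spawn('ffmpeg', ['-i', 'pipe:0', '-vn', '-acodec', 'libmp3lame', 'output.mp3']);
  request.createReadStream().pipe(ffmpeg.stdin);
});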
Can I use Lambda to compress images under a bucket?
I can get the images under a particular bucket via listObjects. How do I compress those objects and write them to another bucket?
Yes, you can absolutely use Lambda for this. Try this library: aws-lambda-image-compressor
AWS lambda function to compress and resize images
This is a Lambda function which resizes/reduces images automatically. When an image is uploaded to an AWS S3 bucket, this function will resize/reduce it and save it into a new bucket. I have used it in the past and I loved it.
Usage
edit the lambda-config.js file and set the name, description, memory size, and timeout of your Lambda function
edit the .env file with your AWS credentials
npm install
gulp deploy
You can also try this other library, which is more popular: aws-lambda-image
If you really want to create something of your own and want a good starting point, I would recommend these two articles, which explain it very well (there is also a rough sketch of the core flow after the links):
Image conversion using Amazon Lambda and S3 in Node.js
Automating Image Compression Using S3 & Lambda
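If you do roll your own, here is a rough sketch of the core flow. It is untested, and the bucket names and the choice of the sharp library are my own assumptions rather than something taken from the articles above:

const AWS = require('aws-sdk');
const sharp = require('sharp'); // assumption: sharp is bundled with the function package

const s3 = new AWS.S3();
const DEST_BUCKET = 'my-compressed-images'; // hypothetical destination bucket

// Triggered by an S3 "ObjectCreated" event on the source bucket
exports.handler = async (event) => {
  const srcBucket = event.Records[0].s3.bucket.name;
  const key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));

  // Download the original image
  const original = await s3.getObject({ Bucket: srcBucket, Key: key }).promise();

  // Resize and recompress it
  const compressed = await sharp(original.Body)
    .resize({ width: 1024, withoutEnlargement: true })
    .jpeg({ quality: 80 })
    .toBuffer();

  // Write the result to the destination bucket
  await s3.putObject({
    Bucket: DEST_BUCKET,
    Key: key,
    Body: compressed,
    ContentType: 'image/jpeg',
  }).promise();
};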
If you are fine with using Amazon API Gateway, you can follow this AWS Compute Blog post:
Resize Images on the Fly with Amazon S3, AWS Lambda, and Amazon API Gateway
Hope this was useful.
I'm storing audio files on Google Cloud Storage (through Firebase storage).
I need to use FFMPEG to convert the audio file from stereo (two channels) to mono (one channel).
How can I perform the above conversion on Google Cloud Platform?
Update:
I suspect one possibility is to use Google Compute Engine to create a virtual machine, install ffmpeg, and somehow gain access to the audio files.
I'm not sure if this is the best way or even possible. So I'm still investigating.
If you already have code that can talk to Google Cloud Storage, you can deploy it as an App Engine application running on a custom runtime. To ensure the ffmpeg binary is available to your application, you'd add this to your app's Dockerfile:
RUN apt-get update && apt-get install -y ffmpeg
Then, it is just a matter of having your code save the audio file from GCS somewhere in /tmp, then shelling out to /usr/bin/ffmpeg to do your conversion, then having your code do something else with the resulting output file (like serving it back to the client or saving it back to Cloud Storage).
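For illustration, a minimal Node.js sketch of that flow (untested; the bucket and file names are placeholders, and it assumes the @google-cloud/storage client plus an ffmpeg binary on the PATH):

const { Storage } = require('@google-cloud/storage');
const { execFileSync } = require('child_process');

const storage = new Storage();
const bucket = storage.bucket('my-audio-bucket'); // placeholder bucket name

async function stereoToMono(srcName, dstName) {
  const src = '/tmp/input.mp3';
  const dst = '/tmp/output.mp3';

  // Download the stereo file from Cloud Storage
  await bucket.file(srcName).download({ destination: src });

  // -ac 1 mixes the two channels down to one
  execFileSync('ffmpeg', ['-y', '-i', src, '-ac', '1', dst]);

  // Upload the mono result back to Cloud Storage
  await bucket.upload(dst, { destination: dstName });
}

stereoToMono('calls/stereo.mp3', 'calls/mono.mp3').catch(console.error);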
If you're not using the flexible environment or Kubernetes, download the ffmpeg binaries (Linux-64) from https://ffbinaries.com/downloads and include ffmpeg and ffprobe directly in your app. For apps using the standard environment this is really the only way without switching.
Once you've added them, you'll need to point to them in your options array:
$options = array(
    'ffmpeg.binaries'  => '/workspace/ffmpeg-binaries/ffmpeg',
    'ffprobe.binaries' => '/workspace/ffmpeg-binaries/ffprobe',
    'timeout'          => 3600,
    'ffmpeg.threads'   => 12,
);
To have it work locally, you should use environment variables that point to the correct path in each setup. Add something like export FFMPEG_BINARIES_PATH="/usr/local/bin" (or wherever you have them locally) to your .zshrc or other rc file, and the code below to your app.yaml:
env_variables:
  FFMPEG_BINARIES_PATH: '/workspace/ffmpeg-binaries'
And then change the options array to:
$options = array(
    'ffmpeg.binaries'  => getenv("FFMPEG_BINARIES_PATH") . '/ffmpeg',
    'ffprobe.binaries' => getenv("FFMPEG_BINARIES_PATH") . '/ffprobe',
    'timeout'          => 3600,
    'ffmpeg.threads'   => 12,
);
The gist of the issue is that IBM Watson Speech to Text only allows for FLAC, WAV, and OGG file formats to be uploaded and used with the API.
My solution to that would be that if the user uploads an mp3, BEFORE sending the file to Watson, a data conversion would take place. Essentially, the user uploads an mp3, then using ffmpeg or sox the audio would be converted to an OGG, after which the audio would then be uploaded to Watson.
What I am unsure about is: what exactly do I have to modify in the Node.js Watson code to allow the audio conversion to happen? Linked below is the Watson repo I am working through. I am fairly sure the file that will have to be changed is fileupload.js, which I have linked, but where exactly the changes go is what I am uncertain about.
I have looked through both SO and developerWorks (the IBM SO) for answers to this issue, but I have not found any, which is why I am posting here. I would be happy to clarify my question if necessary.
Watson Speech to Text Repo
The Speech to Text sample application you are trying to use doesn't convert MP3 files to OGG. The src folder (with fileupload.js in it) is just JavaScript that is used on the client side (thanks to Browserify).
The application basically connects the browser to the API using CORS, so the audio goes straight from the browser to the Watson API.
If you want to convert the audio using ffmpeg or sox, you will need to install the dependencies using a custom buildpack, since those modules have binary dependencies (C++ code in them).
James Thomas has a buildpack with sox on it: https://github.com/jthomas/nodejs-buildpack.
You need to update your manifest.yml to be something like:
memory: 256M
buildpack: https://github.com/jthomas/nodejs-buildpack.git
command: npm start
Node:
var sox = require('sox');

var job = sox.transcode('audio.mp3', 'audio.ogg', {
  sampleRate: 16000,
  format: 'ogg',
  channelCount: 2,
  bitRate: 192 * 1024,
  compressionQuality: -1
});

job.on('error', function(err) { console.error(err); });
job.on('end', function() { console.log('audio.ogg is ready to send to Watson'); });
job.start(); // the transcode job does not run until start() is called
I'm currently working on a tool that lets me read all my notifications by connecting to different APIs.
It's working great, but now I would like to add some voice commands to perform certain actions.
For example, when the software says "One mail from Bob", I would like to say "Read it" or "Archive it".
My software runs on a Node server; currently I don't have any browser implementation, but that could be an option later.
What is the best way in Node.js to enable speech to text?
I've seen a lot of threads about this, but most of them use the browser, and if possible I would like to avoid that at first. Is that possible?
Another issue is that some software requires a WAV file as input. I don't have any file; I just want my software to always be listening so it can react when I say a command.
Do you have any information on how I could do that?
Cheers
Both of the answers here already are good, but what I think you're looking for is Sonus. It takes care of audio encoding and streaming for you. It's always listening offline for a customizable hotword (like Siri or Alexa). You can also trigger listening programmatically. In combination with a module like say, you could enable your example by doing something like:
const say = require('say');
// `sonus` is assumed to be an already-initialized Sonus instance (see the init sketch below)

say.speak('One mail from Bob', function (err) {
  Sonus.trigger(sonus, 1); // start listening
});
You can also use different hotwords to handle the subsequent recognized speech in a different way. For instance:
"Notifications. Most recent." and "Send message. How are you today"
Throw that onto a Pi or a CHIP with a microphone on your desk and you have a personal assistant that reads your notifications and reacts to commands.
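For completeness, initializing Sonus with a hotword and a streaming speech client looks roughly like this (recalled from the Sonus README rather than copied, so treat the exact option names as assumptions to verify):

const Sonus = require('sonus');
const speech = require('@google-cloud/speech');

// Streaming recognizer that Sonus uses once the hotword fires
const client = new speech.SpeechClient({ keyFilename: './keyfile.json' });

// One entry per hotword; the .umdl/.pmdl model files come from Snowboy
const hotwords = [{ file: 'resources/snowboy.umdl', hotword: 'snowboy' }];

const sonus = Sonus.init({ hotwords, language: 'en-US' }, client);
Sonus.start(sonus);

sonus.on('hotword', (index, keyword) => console.log('Hotword detected:', keyword));
sonus.on('final-result', transcript => {
  // e.g. "read it" or "archive it"
  console.log('Heard:', transcript);
});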
Simple Example:
https://twitter.com/_evnc/status/811290460174041090
Something a bit more complex:
https://youtu.be/pm0F_WNoe9k?t=20s
Full documentation:
https://github.com/evancohen/sonus/blob/master/docs/API.md
Disclaimer: This is my project :)
To recognize a few commands without streaming them to a server, you can use the node-pocketsphinx module, available on NPM.
The code to recognize a few commands in a continuous stream should look like this:
var fs = require('fs');
var ps = require('pocketsphinx').ps;

var modeldir = "../../pocketsphinx/model/en-us/";

var config = new ps.Decoder.defaultConfig();
config.setString("-hmm", modeldir + "en-us");
config.setString("-dict", modeldir + "cmudict-en-us.dict");
config.setString("-kws", "keyword list"); // path to a keyword list file (format shown below)
var decoder = new ps.Decoder(config);

fs.readFile("../../pocketsphinx/test/data/goforward.raw", function(err, data) {
  if (err) throw err;
  decoder.startUtt();
  decoder.processRaw(data, false, false);
  decoder.endUtt();
  console.log(decoder.hyp());
});
Instead of readFile, you just read the data from the microphone and pass it to the recognizer. The keyword list file should look like this:
read it /1e-20/
archive it /1e-20/
For more details on spotting with pocketsphinx see Keyword Spotting in Speech and Recognizing multiple keywords using PocketSphinx
To get audio data into your application, you could try a module like microphone, which I haven't used but which looks promising. This could be a way to avoid having to use the browser for audio input.
To do actual speech recognition, you could use the Speech to Text service of IBM Watson Developer Cloud. This service supports a websocket interface, so that you can have a full duplex service, piping audio data to the cloud and getting back the resulting transcription. You may want to consider implementing a form of onset detection in order to avoid transmitting a lot of (relative) silence to the service - that way, you can stay within the free tier.
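A rough sketch of that full-duplex flow with the ibm-watson Node SDK (the mic package and the exact option names here are my assumptions, so check the current SDK docs before relying on this):

const SpeechToTextV1 = require('ibm-watson/speech-to-text/v1');
const { IamAuthenticator } = require('ibm-watson/auth');
const mic = require('mic'); // assumption: using the `mic` package for microphone capture

const speechToText = new SpeechToTextV1({
  authenticator: new IamAuthenticator({ apikey: 'YOUR_API_KEY' }),
  serviceUrl: 'YOUR_SERVICE_URL',
});

// Open a WebSocket recognize stream: audio goes up, transcriptions come back
const recognizeStream = speechToText.recognizeUsingWebSocket({
  contentType: 'audio/l16; rate=16000; channels=1',
  objectMode: true,
  interimResults: false,
});

recognizeStream.on('data', event => {
  const alt = event.results && event.results[0] && event.results[0].alternatives[0];
  if (alt) console.log('Heard:', alt.transcript); // e.g. "read it"
});

// Pipe raw microphone audio straight into the recognize stream
const micInstance = mic({ rate: '16000', channels: '1', exitOnSilence: 6 });
micInstance.getAudioStream().pipe(recognizeStream);
micInstance.start();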
There is also a text-to-speech service, but it sounds like you have a solution already for that part of your tool.
Disclosure: I am an evangelist for IBM Watson.