API to generate server-side static visualizations of sound clips - audio

I'm developing an app that will pull in static PNG visualizations of sound clips (30 seconds max). The images will then act as the background image of the player / scrubber in the UI.
I'm looking for APIs / tools that would support the processing and visualization of sound clips on the back-end, generating and saving a quality PNG. I thought Processing might be an option, but am not yet sure if it has these specific capabilities (it's also not really designed to be server-side). Any and all suggestions would be great.
Related - if anyone is an expert in this, and can give me insight into the type of data that can be extracted and visualized from sound, that would also be great. Though, I am hoping by identifying possible tools or APIs, that information will become more clear.
Thank you.
Claudia

Related

analysing video stream for conditions

I'm not entirely new to Azure, but I am new to the Media Services available on Azure. I am looking for suggestions on which Azure components I should consider to build a solution that analyzes video for certain conditions.
(e.g. 1) Presence of a human - Yes/No, 2) alert if no human presence detected for a certain number of minutes, 3) confirmation if identified human is wearing a uniform or not, etc. )
I have built a somewhat similar on-premises solution in the past using OpenCV and some open source ML libraries, but I'm not sure which Azure services I can use if this will be running in Azure.
I can live stream this to azure and am not looking for an edge solution.
I looked at Azure Video Indexer and it looks promising, but it seems more tuned for audio analysis rather than image frame analysis.
Suggestions would be appreciated.
Azure Video Indexer is optimized for files, not streams, but it is capable of meeting the requirement since it detects faces and people (in the advanced preset).
Regarding uniform or not, this is not supported in Video Indexer at the moment, but the ability to detect clothing color will come in the future.
By fragmenting the video, Azure Video Indexer provides a near live solution. It means there will be a few minutes delay, so it depends on how time-sensitive your requirements are.
Regarding your second question, it will be possible to customize a model to identify specific uniforms in a few months. When the bounding boxes of the uniforms match the bounding boxes of the detected people, you can identify if a person is wearing a uniform.
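The bounding-box matching described above can be sketched roughly as follows. This is only an illustration: the `Box` format, the `wears_uniform` helper, and the 0.8 coverage threshold are assumptions made for the example, not part of the Video Indexer API or its output format.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in pixel coordinates (hypothetical format)."""
    left: float
    top: float
    right: float
    bottom: float


def area(b: Box) -> float:
    # Degenerate (inverted) boxes count as empty.
    return max(0.0, b.right - b.left) * max(0.0, b.bottom - b.top)


def intersection_area(a: Box, b: Box) -> float:
    overlap = Box(max(a.left, b.left), max(a.top, b.top),
                  min(a.right, b.right), min(a.bottom, b.bottom))
    return area(overlap)


def wears_uniform(person: Box, uniforms: list[Box], min_coverage: float = 0.8) -> bool:
    """True if some uniform box lies mostly inside the person box.

    min_coverage is an assumed tuning value: the fraction of the uniform
    box that must fall within the person box to count as a match.
    """
    for u in uniforms:
        u_area = area(u)
        if u_area > 0 and intersection_area(person, u) / u_area >= min_coverage:
            return True
    return False


person = Box(100, 50, 200, 350)
uniform_on_person = Box(110, 120, 190, 300)
uniform_elsewhere = Box(400, 100, 480, 300)
print(wears_uniform(person, [uniform_on_person]))   # True
print(wears_uniform(person, [uniform_elsewhere]))   # False
```

Measuring how much of the uniform box falls inside the person box (rather than a symmetric overlap score) is a design choice here, since a uniform normally covers only part of a person.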

Storing video that can be played from the middle

I am using Azure Blob Storage to store a video. I would like a user to be able to scroll the video to any point in time and play it from there.
For short videos, there is no problem because the whole video loads and you can do that, but for larger videos, it does not seem to work out of the box. And in some sense it makes sense - files by default do not have the functionality to be downloadable from the middle. But all decent video streaming websites offer this functionality. I must be missing some video concepts, would appreciate, if someone linked me to some articles explaining how things like this are done. Bonus points if the solution is using Microsoft Azure.
Large video example (28 secs, 126MB):
https://www.w3schools.com/code/tryit.asp?filename=GP328W3SEY77
Small video example (10 secs, 1MB):
https://www.w3schools.com/html/tryit.asp?filename=tryhtml5_video
Video streaming servers or cloud services are usually dedicated specialised servers and their functionality can be quite complex.
A video 'file' typically consists of one or more video and audio tracks in a 'container' like MP4. The container will have header information and pointers to the track info.
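To make the container idea a bit more concrete, here is a minimal sketch that lists the top-level "boxes" of an MP4 file and checks whether the header/index box comes before the media data. The box names `moov` (header and track pointers) and `mdat` (media data) come from the MP4 format itself and are not mentioned in the answer; the helper names and the synthetic test bytes are made up for the example.

```python
import struct


def top_level_boxes(data: bytes) -> list[tuple[str, int, int]]:
    """Return (type, offset, size) for each top-level box in MP4 bytes."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        # Each box starts with a 4-byte big-endian size and a 4-byte type.
        size, kind = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the file.
            size = len(data) - offset
        if size < header:
            break  # malformed box; stop rather than loop forever
        boxes.append((kind.decode("latin-1"), offset, size))
        offset += size
    return boxes


def header_before_media(data: bytes) -> bool:
    """True if the 'moov' box precedes the 'mdat' box."""
    kinds = [kind for kind, _, _ in top_level_boxes(data)]
    return ("moov" in kinds and "mdat" in kinds
            and kinds.index("moov") < kinds.index("mdat"))


def make_box(kind: bytes, payload: bytes) -> bytes:
    """Build a synthetic box for the demo (not a playable file)."""
    return struct.pack(">I4s", 8 + len(payload), kind) + payload


ftyp = make_box(b"ftyp", b"isom")
moov = make_box(b"moov", b"x" * 10)
mdat = make_box(b"mdat", b"y" * 20)
print(header_before_media(ftyp + moov + mdat))  # True
print(header_before_media(ftyp + mdat + moov))  # False
```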
Simple HTTP streaming of an MP4 file is possible if your server supports range requests, i.e. downloading parts of the file at a time, and if the header information is at the start of the video file. In MP4 it is usually at the end by default, but it can be moved to the start.
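As a rough sketch of what "supports range requests" means on the server side: the player sends a `Range: bytes=start-end` header and the server answers with status 206 and only that slice. The function names below are hypothetical and the code handles a single range only. As a simplification, malformed headers are treated the same as unsatisfiable ones (416), whereas a real server would normally ignore a malformed header and send the whole file.

```python
import re

_RANGE = re.compile(r"^bytes=(\d*)-(\d*)$")


def resolve_range(header: str, file_size: int):
    """Turn a single-range header into inclusive (start, end), or None."""
    m = _RANGE.match(header.strip())
    if not m or file_size <= 0:
        return None
    first, last = m.groups()
    if first == "" and last == "":
        return None
    if first == "":
        # Suffix form "bytes=-N": the last N bytes of the file.
        n = int(last)
        if n == 0:
            return None
        return max(0, file_size - n), file_size - 1
    start = int(first)
    end = int(last) if last else file_size - 1
    end = min(end, file_size - 1)
    if start > end:
        return None
    return start, end


def partial_response(data: bytes, header: str):
    """Return (status, headers, body) for a ranged read of in-memory data."""
    rng = resolve_range(header, len(data))
    if rng is None:
        return 416, {"Content-Range": f"bytes */{len(data)}"}, b""
    start, end = rng
    body = data[start:end + 1]
    headers = {
        "Accept-Ranges": "bytes",
        "Content-Range": f"bytes {start}-{end}/{len(data)}",
        "Content-Length": str(len(body)),
    }
    return 206, headers, body


video = bytes(range(100))  # stand-in for a video file
status, headers, body = partial_response(video, "bytes=10-19")
print(status, headers["Content-Range"], len(body))  # 206 bytes 10-19/100 10
```

When the user drags the scrubber, the browser issues a new request like this starting at the byte offset it looked up in the file's header information, which is why that header needs to be available early.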
More sophisticated streaming servers, including most/all of the popular commercial services, use a dedicated streaming protocol, typically ABR HLS or DASH these days. These provide multiple versions of the video at different bit rates, each split into chunks, and allow the client to switch between bit rates for each chunk it downloads - see more info here: https://stackoverflow.com/a/42365034/334402
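The per-chunk bit rate switching can be illustrated with a very simplified selection rule: before each chunk, pick the highest bit rate that fits within the throughput measured so far. The bit rate ladder, the throughput figures, and the 0.8 safety margin below are invented for the example; real HLS/DASH players use more elaborate heuristics (buffer level, smoothing, and so on).

```python
def pick_bitrate(available_kbps: list[int], measured_kbps: float,
                 safety: float = 0.8) -> int:
    """Pick the highest rendition within a safety fraction of throughput.

    Falls back to the lowest rendition when nothing fits.
    """
    budget = measured_kbps * safety
    fitting = [b for b in available_kbps if b <= budget]
    return max(fitting) if fitting else min(available_kbps)


ladder = [400, 1200, 2500, 5000]           # renditions offered by the server
throughput = [6500, 3000, 900, 4000]       # measured before each chunk
choices = [pick_bitrate(ladder, t) for t in throughput]
print(choices)  # [5000, 1200, 400, 2500]
```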
The thumbnails you see when you scroll along a video timeline are actually usually a separate track in the video file or stream. They are a set of images at timepoints and the entire set of thumbnails can be downloaded quickly at the start of playback to give a view of what a particular part of the video will look like if the user wants to jump to it. When the user actually jumps to that section of the video, the client requests from the server the chunks or section of the file corresponding to that thumbnail.
Azure CDN actually provides some nice functionality:
video starts to play instantly (browser doesn't need to wait for the video to fully load)
you can scroll back/forward in time.
(Obviously, this in addition to the standard CDN functionality of multiple PoP, caching, etc.)
The above CDN setup was tested on Standard Microsoft, but Verizon and Akamai seem to be offering similar functionality.

To read the text from PAN Card

I have a use case where I need to read the text from a PAN card. Ideally the application should have a screen to scan the PAN card, and the text should be extracted from the scan. The extracted text will be auto-populated on the subsequent screens.
I have read about the tesseract npm module, but I still don't have a clue where to start, as there are no complete blog posts available for this use case on the internet. I also tried the npm module okrabyte, but it does not give 100% accurate results. Any guidance or help would be appreciated.
I tried AWS Textract service as well. This is not helping to parse the PAN CARD as the extracted results were completely different.
You need to use OCR to achieve this. There are various options for doing this. Tesseract is open source. I hope this blog helps you get started with tesseract on nodejs.
You can use OCR apis from different cloud providers to achieve this as well. Example: Microsoft Cognitive Services Vision API, Abbyy Cloud, etc.
Also, improving the quality of your image helps in extracting text with higher accuracy. Personally, I've seen a big difference between 200 dpi images and 600 dpi images.
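Whichever OCR engine you use, the raw text still has to be turned into fields before it can be auto-populated. A small sketch of that step, assuming the standard PAN layout of five letters, four digits, and one letter: the OCR output string below is invented for illustration, and `find_pan` is a hypothetical helper, not part of tesseract or any OCR API.

```python
import re

# PAN layout: five letters, four digits, one letter (e.g. ABCDE1234F).
PAN_PATTERN = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")


def find_pan(ocr_text: str):
    """Return the first PAN-shaped token in OCR output, or None."""
    match = PAN_PATTERN.search(ocr_text.upper())
    return match.group(0) if match else None


# Made-up OCR output, standing in for whatever the OCR engine returns.
sample = "INCOME TAX DEPARTMENT\nPermanent Account Number\nABCDE1234F\n"
print(find_pan(sample))            # ABCDE1234F
print(find_pan("no number here"))  # None
```

A pattern check like this also gives you a cheap way to detect a bad scan: if no PAN-shaped token is found, ask the user to rescan instead of populating the form with garbage.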
Hope this helps!

Is Node.JS audio mixing + MP3 generation possible?

In short, I have a site where on the client side the user has a "beat maker" app. The user can turn on / turn off noises, background beats, etc, to essentially create their own custom "song" based on the pre-defined noises, tones and tunes that I have on the client side.
I need to somehow translate the beat they're making (in HTML5 canvas) to my server-side (currently Node.JS) and spit out an MP3 of their creation.
Basically I have to somehow have my server-side backend gracefully concatenate + overlap + mix various smaller MP3/wav files into one MP3 file that matches the beat that they created on the client side. I then have to return that MP3 to the client side for download.
Anyone able to point me in the right direction?
As far as my research indicates, this isn't easily accomplished or feasible at all (I.E. within realistic budget / time constraints of the project) due to the complexity of the problem at hand.
This is possible, and there are some audio libraries for JavaScript, but I would take a different approach.
The Web Audio API is very solid these days. You can have your user make all the adjustments client-side, and then generate the audio file right there in the user's browser. If you need to get a copy server-side, you can upload the raw PCM to your server (bandwidth intensive), or send the parameters to the server and re-generate the file.
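For a sense of what "re-generate the file" from parameters involves if you do it without a browser, the core of mixing is just summing samples at the right offsets and clipping the result. This is a bare-bones sketch, not the Web Audio approach: the `(start_sample, samples)` clip format is an assumption for the example, samples are floats in [-1, 1], and decoding the source files and encoding the result to MP3 are not covered.

```python
def mix(clips, total_samples):
    """Mix clips into one track.

    clips: iterable of (start_sample, samples), samples as floats in [-1, 1].
    Overlapping samples are summed, then hard-clipped to [-1, 1].
    """
    out = [0.0] * total_samples
    for start, samples in clips:
        for i, sample in enumerate(samples):
            pos = start + i
            if 0 <= pos < total_samples:
                out[pos] += sample
    return [max(-1.0, min(1.0, v)) for v in out]


kick = [0.5, 0.5]
hat = [0.75, -0.25]
# The beat the user built: kick at sample 0, hat at 1, kick again at 3.
track = mix([(0, kick), (1, hat), (3, kick)], 5)
print(track)  # [0.5, 1.0, -0.25, 0.5, 0.5]
```

The "parameters" you send from the client would then be little more than a list of clip names and start times.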
Now unfortunately, PhantomJS doesn't support Web Audio. To generate a perfect server-side copy, I would execute Chrome with a special page that renders the audio and then uploads to the local server. This guarantees that the sound output you get is the exact same as that of the client, and leaves all of the heavy lifting to the Web Audio API already implemented in the browser.
You won't find much off-the-shelf for a project like this, but with a little creativity I think you will find that this isn't too difficult.

how to implement a web site like youtube?

I'm building a language website for my university's language center, where students log in and watch videos to learn English. It has to work like this:
A person logs in to the system, searches using a search box, and finds the details, lessons, and videos relevant to that search. This functionality closely matches the YouTube scenario.
For implementing Twitter-like functionality we can use StatusNet. Is there a similar well-known implementation for YouTube, or some kind of platform or framework (like CodeIgniter) that we can use to implement a YouTube-like site very easily?
Please suggest some options, open source or commercial.
And what is the best video format to use on such a website? FLV, MP4, or MOV?
regards,
Rangana
Your best option is to use a 'cloud' based video processing service. Most have a sample project / library for many different languages and frameworks. Here is a list of a few I've tried and liked:
http://zencoder.com/
http://transloadit.com/
http://pandastream.com/
The typical steps involve uploading the video files to a large 'cloud' static asset host (such as S3) through the browser. If you are inexperienced it is best to select a processor that provides an uploader (it will handle putting the files in the right spot). Of the three, Transloadit and Panda both have custom uploaders.
Usually the service will allow you to either pass the encoding settings (what formats and qualities to output to) as parameters or configure them in your account. To support all current HTML5 browsers you just need H.264 (.mp4) and Ogg (.ogv). However, the new trend in the video world is WebM (.webm), so you might want to include it as well.
Next you will receive a unique code from the web service that you must store in persistent storage (database). The web service can be configured to 'callback' (perform an HTTP POST or GET request to your service) once the video is encoded.
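The store-then-callback flow could look roughly like this on your side. Everything here is hypothetical: the payload fields (`job_id`, `state`, `outputs`), the status values, and the in-memory dictionary standing in for your database are invented for the sketch, and each encoding service defines its own callback format.

```python
# Stands in for a database table: job code -> video record.
videos = {}


def submit_video(title: str, job_id: str) -> None:
    """Store the code returned by the encoding service when a job starts."""
    videos[job_id] = {"title": title, "status": "processing", "outputs": []}


def handle_callback(payload: dict) -> bool:
    """Process the service's HTTP callback; return False for unknown jobs."""
    record = videos.get(payload.get("job_id"))
    if record is None:
        return False
    if payload.get("state") == "finished":
        record["status"] = "active"
        record["outputs"] = list(payload.get("outputs", []))
    else:
        record["status"] = "failed"
    return True


submit_video("Lesson 1", "job-123")
handle_callback({"job_id": "job-123", "state": "finished",
                 "outputs": ["lesson1.mp4", "lesson1.ogv"]})
print(videos["job-123"]["status"])  # active
```

Only videos whose status is `active` would be shown on your pages; everything else is still encoding or has failed.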
Once you receive a callback you can activate your video and start displaying it on your pages. For displaying, if you are inexperienced I'd highly recommend you use one of the following players:
http://sublimevideo.net/
http://longtailvideo.com/
http://videojs.com/
They all do similar things for different prices. My current personal favourite is Sublime Video (it offers cool light box effects and a gorgeous player).
Why do you have to re-implement Youtube when you can just use it for hosting your videos for free? Many online e-learning portals (e.g. Khan academy) do exactly that.
As far as the best video format to use -- go read about H.264/AVC. It's what Youtube currently uses.
I don't think you will find an already-built solution ;)
But it's not really that hard. You can use existing frameworks that will make your life easier while you build the account management system; the rest shouldn't be that hard (assuming you don't really want to rebuild the whole of YouTube ;D ).
For playing videos, you can use JW Player. A great piece of software, you should check it out.
